Hello,
This is a new attempt to implement cumulative hist period report.
This work begins from Arun's SORT_INCLUSIVE patch [1] but I completely
rewrote it from scratch.
In this version, I separated out the --percentage patchset not to have
dependency on it since it requires further works.
Please see the patch 01/21. I refactored functions that add hist
entries with struct hist_entry_iter. While I converted all functions
carefully, it'd be better anyone can test and confirm that I didn't
mess up something - especially for branch stack and mem stuff.
This patchset basically adds period in a sample to every node in the
callchain. A hist_entry now has an additional fields to keep the
cumulative period if --children option is given on perf report.
I changed the option as a separate --children and added a new
"Children" column (and renamed the default "Overhead" column into
"Self"). The output will be sorted by children (cumulative) overhead
for now. The reason I changed to the --children is that I still think
it's much different from other --call-graph options. The --call-graph
option will take care of it even with --children option.
I know that the UI should be changed also to be more flexible as Ingo
requested, but I'd like to do this first and then move to work on the
next. I also added a new config option to enable it by default.
* changes in v8:
- not depends on --percentage patchkit
- fix callchain resolving bug (Jiri)
- convert to sample__resolve_{mem,bstack}
- eliminate 'event' field from hist_entry_iter
* changes in v7:
- add Tested-by tags from Arun
- rebase onto current acme/perf/core
* changes in v6:
- separate struct hist_iter_ops (Jiri)
- check iter->he before calling ->add_entry_cb (Jiri)
- fix locking issue on perf top (Jiri)
* changes in v5:
- support both of --children and --call-graph (Arun)
- refactor hist_entry_iter to share with perf top (Jiri)
- various cleanups and fixes (Jiri)
- add ack's from Jiri
* changes in v4:
- change to --children option (Ingo)
- rebased on new annotation change (Arnaldo)
- support perf top also
- enable --children option by default (Ingo)
* changes in v3:
- change to --cumulate option
- fix a couple of bugs (Jiri, Rodrigo)
- rename some help functions (Arnaldo)
- cache previous hist entries rathen than just symbol and dso
- add some preparatory cleanups
- add report.cumulate config option
Let me show you an example:
$ cat abc.c
#define barrier() asm volatile("" ::: "memory")
void a(void)
{
int i;
for (i = 0; i < 1000000; i++)
barrier();
}
void b(void)
{
a();
}
void c(void)
{
b();
}
int main(void)
{
c();
return 0;
}
With this simple program I ran perf record and report:
$ perf record -g -e cycles:u ./abc
Case 1.
$ perf report --stdio --no-call-graph --no-children
# Overhead Command Shared Object Symbol
# ........ ....... ................. ..............
#
91.50% abc abc [.] a
8.18% abc ld-2.17.so [.] strlen
0.31% abc [kernel.kallsyms] [k] page_fault
0.01% abc ld-2.17.so [.] _start
Case 2. (current default behavior)
$ perf report --stdio --call-graph --no-children
# Overhead Command Shared Object Symbol
# ........ ....... ................. ..............
#
91.50% abc abc [.] a
|
--- a
b
c
main
__libc_start_main
8.18% abc ld-2.17.so [.] strlen
|
--- strlen
_dl_sysdep_start
0.31% abc [kernel.kallsyms] [k] page_fault
|
--- page_fault
_start
0.01% abc ld-2.17.so [.] _start
|
--- _start
Case 3.
$ perf report --no-call-graph --children --stdio
# Self Children Command Shared Object Symbol
# ........ ........ ....... ................. .....................
#
0.00% 91.50% abc libc-2.17.so [.] __libc_start_main
0.00% 91.50% abc abc [.] main
0.00% 91.50% abc abc [.] c
0.00% 91.50% abc abc [.] b
91.50% 91.50% abc abc [.] a
0.00% 8.18% abc ld-2.17.so [.] _dl_sysdep_start
8.18% 8.18% abc ld-2.17.so [.] strlen
0.01% 0.33% abc ld-2.17.so [.] _start
0.31% 0.31% abc [kernel.kallsyms] [k] page_fault
As you can see __libc_start_main -> main -> c -> b -> a callchain show
up in the output.
Finally, it looks like below with both option enabled:
Case 4. (default behavior?)
$ perf report --call-graph --children --stdio
# Self Children Command Shared Object Symbol
# ........ ........ ....... ................. .....................
#
0.00% 91.50% abc libc-2.17.so [.] __libc_start_main
|
--- __libc_start_main
0.00% 91.50% abc abc [.] main
|
--- main
__libc_start_main
0.00% 91.50% abc abc [.] c
|
--- c
main
__libc_start_main
0.00% 91.50% abc abc [.] b
|
--- b
c
main
__libc_start_main
91.50% 91.50% abc abc [.] a
|
--- a
b
c
main
__libc_start_main
...
Currently the perf enables both of --call-graph and --children when it
finds callchains in the samples. While this is useful for TUI or GTK,
I'm not sure for stdio as it'd consume so much lines.
It does not handle all kind of cases like event groups and annotations
yet, but I really want to release it and get reviews.
You can also get this series on 'perf/cumulate-v8' branch in my tree at:
git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
Any comments are welcome, thanks.
Namhyung
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
[1] https://lkml.org/lkml/2012/3/31/6
Namhyung Kim (21):
perf tools: Introduce struct hist_entry_iter
perf hists: Add support for accumulated stat of hist entry
perf hists: Check if accumulated when adding a hist entry
perf hists: Accumulate hist entry stat based on the callchain
perf tools: Update cpumode for each cumulative entry
perf report: Cache cumulative callchains
perf callchain: Add callchain_cursor_snapshot()
perf tools: Save callchain info for each cumulative entry
perf hists: Sort hist entries by accumulated period
perf ui/hist: Add support to accumulated hist stat
perf ui/browser: Add support to accumulated hist stat
perf ui/gtk: Add support to accumulated hist stat
perf tools: Apply percent-limit to cumulative percentage
perf tools: Add more hpp helper functions
perf report: Add --children option
perf report: Add report.children config option
perf tools: Add callback function to hist_entry_iter
perf top: Convert to hist_entry_iter
perf top: Add --children option
perf top: Add top.children config option
perf tools: Enable --children option by default
tools/perf/Documentation/perf-report.txt | 5 +
tools/perf/Documentation/perf-top.txt | 6 +
tools/perf/builtin-annotate.c | 3 +-
tools/perf/builtin-diff.c | 2 +-
tools/perf/builtin-report.c | 178 +++--------
tools/perf/builtin-top.c | 89 +++---
tools/perf/tests/hists_link.c | 4 +-
tools/perf/ui/browsers/hists.c | 50 ++--
tools/perf/ui/gtk/hists.c | 20 +-
tools/perf/ui/hist.c | 62 ++++
tools/perf/ui/stdio/hist.c | 4 +-
tools/perf/util/callchain.c | 45 ++-
tools/perf/util/callchain.h | 11 +
tools/perf/util/hist.c | 499 ++++++++++++++++++++++++++++++-
tools/perf/util/hist.h | 45 ++-
tools/perf/util/sort.h | 18 +-
tools/perf/util/symbol.c | 11 +-
tools/perf/util/symbol.h | 1 +
18 files changed, 836 insertions(+), 217 deletions(-)
--
1.7.11.7
Add top.children config option for setting default value of
callchain accumulation. It affects the output only if one of
-g or --call-graph option is given as well.
A user can write .perfconfig file like below to enable accumulation
by default:
$ cat ~/.perfconfig
[top]
children = true
And it can be disabled through command line:
$ perf top --no-children
Tested-by: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-top.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 2a0bc3a8e634..259c34e9914f 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1005,6 +1005,10 @@ static int perf_top_config(const char *var, const char *value, void *cb)
if (!strcmp(var, "top.call-graph"))
return record_parse_callchain(value, &top->record_opts);
+ if (!strcmp(var, "top.children")) {
+ symbol_conf.cumulate_callchain = perf_config_bool(var, value);
+ return 0;
+ }
return perf_default_config(var, value, cb);
}
--
1.7.11.7
The --children option is for showing accumulated overhead (period)
value as well as self overhead. It should be used with one of -g or
--call-graph option.
Tested-by: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/Documentation/perf-top.txt | 6 ++++++
tools/perf/builtin-top.c | 7 +++++++
2 files changed, 13 insertions(+)
diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index cdd8d4946dba..01b6fd1a4428 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -149,6 +149,12 @@ Default is to monitor all CPUS.
Setup and enable call-graph (stack chain/backtrace) recording,
implies -g.
+--children::
+ Accumulate callchain of children to parent entry so that then can
+ show up in the output. The output will have a new "Children" column
+ and will be sorted on the data. It requires -g/--call-graph option
+ enabled.
+
--max-stack::
Set the stack depth limit when parsing the callchain, anything
beyond the specified depth will be ignored. This is a trade-off
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index e7d67421eb0f..2a0bc3a8e634 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1097,6 +1097,8 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
OPT_CALLBACK(0, "call-graph", &top.record_opts,
"mode[,dump_size]", record_callchain_help,
&parse_callchain_opt),
+ OPT_BOOLEAN(0, "children", &symbol_conf.cumulate_callchain,
+ "Accumulate callchains of children and show total overhead as well"),
OPT_INTEGER(0, "max-stack", &top.max_stack,
"Set the maximum stack depth when parsing the callchain. "
"Default: " __stringify(PERF_MAX_STACK_DEPTH)),
@@ -1198,6 +1200,11 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
top.sym_evsel = perf_evlist__first(top.evlist);
+ if (!symbol_conf.use_callchain) {
+ symbol_conf.cumulate_callchain = false;
+ perf_hpp__cancel_cumulate();
+ }
+
symbol_conf.priv_size = sizeof(struct annotation);
symbol_conf.try_vmlinux_path = (symbol_conf.vmlinux_name == NULL);
--
1.7.11.7
The cpumode and level in struct addr_localtion was set for a sample
and but updated as cumulative callchains were added. This led to have
non-matching symbol and cpumode in the output.
Update it accordingly based on the fact whether the map is a part of
the kernel or not. This is a reverse of what thread__find_addr_map()
does.
Tested-by: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/callchain.c | 42 ++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/callchain.h | 2 ++
tools/perf/util/hist.c | 13 ++-----------
3 files changed, 46 insertions(+), 11 deletions(-)
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 35b12a06bc39..c1d4ccd755ed 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -552,3 +552,45 @@ int hist_entry__append_callchain(struct hist_entry *he, struct perf_sample *samp
return 0;
return callchain_append(he->callchain, &callchain_cursor, sample->period);
}
+
+int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *node,
+ bool hide_unresolved)
+{
+ al->map = node->map;
+ al->sym = node->sym;
+ if (node->map)
+ al->addr = node->map->map_ip(node->map, node->ip);
+ else
+ al->addr = node->ip;
+
+ if (al->sym == NULL) {
+ if (hide_unresolved)
+ return 0;
+ if (al->map == NULL)
+ goto out;
+ }
+
+ if (al->map->groups == &al->machine->kmaps) {
+ if (machine__is_host(al->machine)) {
+ al->cpumode = PERF_RECORD_MISC_KERNEL;
+ al->level = 'k';
+ } else {
+ al->cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
+ al->level = 'g';
+ }
+ } else {
+ if (machine__is_host(al->machine)) {
+ al->cpumode = PERF_RECORD_MISC_USER;
+ al->level = '.';
+ } else if (perf_guest) {
+ al->cpumode = PERF_RECORD_MISC_GUEST_USER;
+ al->level = 'u';
+ } else {
+ al->cpumode = PERF_RECORD_MISC_HYPERVISOR;
+ al->level = 'H';
+ }
+ }
+
+out:
+ return 1;
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 8ad97e9b119f..66faae21370d 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -155,6 +155,8 @@ int sample__resolve_callchain(struct perf_sample *sample, struct symbol **parent
struct perf_evsel *evsel, struct addr_location *al,
int max_stack);
int hist_entry__append_callchain(struct hist_entry *he, struct perf_sample *sample);
+int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *node,
+ bool hide_unresolved);
extern const char record_callchain_help[];
#endif /* __PERF_CALLCHAIN_H */
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 4691313f8748..15e619d6654f 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -740,18 +740,9 @@ iter_next_cumulative_entry(struct hist_entry_iter *iter,
if (node == NULL)
return 0;
- al->map = node->map;
- al->sym = node->sym;
- if (node->map)
- al->addr = node->map->map_ip(node->map, node->ip);
- else
- al->addr = node->ip;
-
- if (iter->hide_unresolved && al->sym == NULL)
- return 0;
-
callchain_cursor_advance(&callchain_cursor);
- return 1;
+
+ return fill_callchain_info(al, node, iter->hide_unresolved);
}
static int
--
1.7.11.7
It is possble that a callchain has cycles or recursive calls. In that
case it'll end up having entries more than 100% overhead in the
output. In order to prevent such entries, cache each callchain node
and skip if same entry already cumulated.
Tested-by: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/hist.c | 43 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 15e619d6654f..7529d216986a 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -709,7 +709,22 @@ static int
iter_prepare_cumulative_entry(struct hist_entry_iter *iter __maybe_unused,
struct addr_location *al __maybe_unused)
{
+ struct hist_entry **he_cache;
+
callchain_cursor_commit(&callchain_cursor);
+
+ /*
+ * This is for detecting cycles or recursions so that they're
+ * cumulated only one time to prevent entries more than 100%
+ * overhead.
+ */
+ he_cache = malloc(sizeof(*he_cache) * (PERF_MAX_STACK_DEPTH + 1));
+ if (he_cache == NULL)
+ return -ENOMEM;
+
+ iter->priv = he_cache;
+ iter->curr = 0;
+
return 0;
}
@@ -719,6 +734,7 @@ iter_add_single_cumulative_entry(struct hist_entry_iter *iter,
{
struct perf_evsel *evsel = iter->evsel;
struct perf_sample *sample = iter->sample;
+ struct hist_entry **he_cache = iter->priv;
struct hist_entry *he;
he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
@@ -727,6 +743,8 @@ iter_add_single_cumulative_entry(struct hist_entry_iter *iter,
if (he == NULL)
return -ENOMEM;
+ he_cache[iter->curr++] = he;
+
return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
}
@@ -751,7 +769,29 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
{
struct perf_evsel *evsel = iter->evsel;
struct perf_sample *sample = iter->sample;
+ struct hist_entry **he_cache = iter->priv;
struct hist_entry *he;
+ struct hist_entry he_tmp = {
+ .cpu = al->cpu,
+ .thread = al->thread,
+ .comm = thread__comm(al->thread),
+ .ip = al->addr,
+ .ms = {
+ .map = al->map,
+ .sym = al->sym,
+ },
+ .parent = iter->parent,
+ };
+ int i;
+
+ /*
+ * Check if there's duplicate entries in the callchain.
+ * It's possible that it has cycles or recursive calls.
+ */
+ for (i = 0; i < iter->curr; i++) {
+ if (hist_entry__cmp(he_cache[i], &he_tmp) == 0)
+ return 0;
+ }
he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
sample->period, sample->weight,
@@ -759,6 +799,8 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
if (he == NULL)
return -ENOMEM;
+ he_cache[iter->curr++] = he;
+
return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
}
@@ -772,6 +814,7 @@ iter_finish_cumulative_entry(struct hist_entry_iter *iter,
evsel->hists.stats.total_period += sample->period;
hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+ zfree(&iter->priv);
return 0;
}
--
1.7.11.7
The callchain_cursor_snapshot() is for saving current status of the
callchain. It'll be used to accumulate callchain information for each node.
Tested-by: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/callchain.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 66faae21370d..bbd63dfbe112 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -159,4 +159,13 @@ int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *
bool hide_unresolved);
extern const char record_callchain_help[];
+
+static inline void callchain_cursor_snapshot(struct callchain_cursor *dest,
+ struct callchain_cursor *src)
+{
+ *dest = *src;
+
+ dest->first = src->curr;
+ dest->nr -= src->pos;
+}
#endif /* __PERF_CALLCHAIN_H */
--
1.7.11.7
Print accumulated stat of a hist entry if requested.
Tested-by: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/ui/hist.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/hist.h | 1 +
2 files changed, 46 insertions(+)
diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index 78f4c92e9b73..eb9c07bcdb01 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -129,6 +129,28 @@ static int hpp__entry_##_type(struct perf_hpp_fmt *_fmt __maybe_unused, \
scnprintf, true); \
}
+#define __HPP_COLOR_ACC_PERCENT_FN(_type, _field) \
+static u64 he_get_acc_##_field(struct hist_entry *he) \
+{ \
+ return he->stat_acc->_field; \
+} \
+ \
+static int hpp__color_acc_##_type(struct perf_hpp_fmt *fmt __maybe_unused, \
+ struct perf_hpp *hpp, struct hist_entry *he) \
+{ \
+ return __hpp__fmt(hpp, he, he_get_acc_##_field, " %6.2f%%", \
+ (hpp_snprint_fn)percent_color_snprintf, true); \
+}
+
+#define __HPP_ENTRY_ACC_PERCENT_FN(_type, _field) \
+static int hpp__entry_acc_##_type(struct perf_hpp_fmt *_fmt __maybe_unused, \
+ struct perf_hpp *hpp, struct hist_entry *he) \
+{ \
+ const char *fmt = symbol_conf.field_sep ? " %.2f" : " %6.2f%%"; \
+ return __hpp__fmt(hpp, he, he_get_acc_##_field, fmt, \
+ scnprintf, true); \
+}
+
#define __HPP_ENTRY_RAW_FN(_type, _field) \
static u64 he_get_raw_##_field(struct hist_entry *he) \
{ \
@@ -148,17 +170,25 @@ __HPP_WIDTH_FN(_type, _min_width, _unit_width) \
__HPP_COLOR_PERCENT_FN(_type, _field) \
__HPP_ENTRY_PERCENT_FN(_type, _field)
+#define HPP_PERCENT_ACC_FNS(_type, _str, _field, _min_width, _unit_width)\
+__HPP_HEADER_FN(_type, _str, _min_width, _unit_width) \
+__HPP_WIDTH_FN(_type, _min_width, _unit_width) \
+__HPP_COLOR_ACC_PERCENT_FN(_type, _field) \
+__HPP_ENTRY_ACC_PERCENT_FN(_type, _field)
+
#define HPP_RAW_FNS(_type, _str, _field, _min_width, _unit_width) \
__HPP_HEADER_FN(_type, _str, _min_width, _unit_width) \
__HPP_WIDTH_FN(_type, _min_width, _unit_width) \
__HPP_ENTRY_RAW_FN(_type, _field)
+__HPP_HEADER_FN(overhead_self, "Self", 8, 8)
HPP_PERCENT_FNS(overhead, "Overhead", period, 8, 8)
HPP_PERCENT_FNS(overhead_sys, "sys", period_sys, 8, 8)
HPP_PERCENT_FNS(overhead_us, "usr", period_us, 8, 8)
HPP_PERCENT_FNS(overhead_guest_sys, "guest sys", period_guest_sys, 9, 8)
HPP_PERCENT_FNS(overhead_guest_us, "guest usr", period_guest_us, 9, 8)
+HPP_PERCENT_ACC_FNS(overhead_acc, "Children", period, 8, 8)
HPP_RAW_FNS(samples, "Samples", nr_events, 12, 12)
HPP_RAW_FNS(period, "Period", period, 12, 12)
@@ -171,6 +201,14 @@ HPP_RAW_FNS(period, "Period", period, 12, 12)
.entry = hpp__entry_ ## _name \
}
+#define HPP__COLOR_ACC_PRINT_FNS(_name) \
+ { \
+ .header = hpp__header_ ## _name, \
+ .width = hpp__width_ ## _name, \
+ .color = hpp__color_acc_ ## _name, \
+ .entry = hpp__entry_acc_ ## _name \
+ }
+
#define HPP__PRINT_FNS(_name) \
{ \
.header = hpp__header_ ## _name, \
@@ -184,6 +222,7 @@ struct perf_hpp_fmt perf_hpp__format[] = {
HPP__COLOR_PRINT_FNS(overhead_us),
HPP__COLOR_PRINT_FNS(overhead_guest_sys),
HPP__COLOR_PRINT_FNS(overhead_guest_us),
+ HPP__COLOR_ACC_PRINT_FNS(overhead_acc),
HPP__PRINT_FNS(samples),
HPP__PRINT_FNS(period)
};
@@ -208,6 +247,12 @@ void perf_hpp__init(void)
{
perf_hpp__column_enable(PERF_HPP__OVERHEAD);
+ if (symbol_conf.cumulate_callchain) {
+ perf_hpp__format[PERF_HPP__OVERHEAD].header =
+ hpp__header_overhead_self;
+ perf_hpp__column_enable(PERF_HPP__OVERHEAD_ACC);
+ }
+
if (symbol_conf.show_cpu_utilization) {
perf_hpp__column_enable(PERF_HPP__OVERHEAD_SYS);
perf_hpp__column_enable(PERF_HPP__OVERHEAD_US);
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 6da98cb2ee7e..1b4fb71a0bf8 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -191,6 +191,7 @@ enum {
PERF_HPP__OVERHEAD_US,
PERF_HPP__OVERHEAD_GUEST_SYS,
PERF_HPP__OVERHEAD_GUEST_US,
+ PERF_HPP__OVERHEAD_ACC,
PERF_HPP__SAMPLES,
PERF_HPP__PERIOD,
--
1.7.11.7
Print accumulated stat of a hist entry if requested.
Tested-by: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/ui/browsers/hists.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index b720b92eba6e..c8e3a2e7b5c0 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -693,11 +693,26 @@ hist_browser__hpp_color_##_type(struct perf_hpp_fmt *fmt __maybe_unused,\
return __hpp__color_fmt(hpp, he, __hpp_get_##_field, _cb); \
}
+#define __HPP_COLOR_ACC_PERCENT_FN(_type, _field, _cb) \
+static u64 __hpp_get_acc_##_field(struct hist_entry *he) \
+{ \
+ return he->stat_acc->_field; \
+} \
+ \
+static int \
+hist_browser__hpp_color_##_type(struct perf_hpp_fmt *fmt __maybe_unused,\
+ struct perf_hpp *hpp, \
+ struct hist_entry *he) \
+{ \
+ return __hpp__color_fmt(hpp, he, __hpp_get_acc_##_field, _cb); \
+}
+
__HPP_COLOR_PERCENT_FN(overhead, period, __hpp__color_callchain)
__HPP_COLOR_PERCENT_FN(overhead_sys, period_sys, NULL)
__HPP_COLOR_PERCENT_FN(overhead_us, period_us, NULL)
__HPP_COLOR_PERCENT_FN(overhead_guest_sys, period_guest_sys, NULL)
__HPP_COLOR_PERCENT_FN(overhead_guest_us, period_guest_us, NULL)
+__HPP_COLOR_ACC_PERCENT_FN(overhead_acc, period, NULL)
#undef __HPP_COLOR_PERCENT_FN
@@ -715,6 +730,8 @@ void hist_browser__init_hpp(void)
hist_browser__hpp_color_overhead_guest_sys;
perf_hpp__format[PERF_HPP__OVERHEAD_GUEST_US].color =
hist_browser__hpp_color_overhead_guest_us;
+ perf_hpp__format[PERF_HPP__OVERHEAD_ACC].color =
+ hist_browser__hpp_color_overhead_acc;
}
static int hist_browser__show_entry(struct hist_browser *browser,
--
1.7.11.7
Print accumulated stat of a hist entry if requested.
Tested-by: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/ui/gtk/hists.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 5b95c44f3435..49531c1c7121 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -98,11 +98,25 @@ static int perf_gtk__hpp_color_##_type(struct perf_hpp_fmt *fmt __maybe_unused,
return __hpp__color_fmt(hpp, he, he_get_##_field); \
}
+#define __HPP_COLOR_ACC_PERCENT_FN(_type, _field) \
+static u64 he_get_acc_##_field(struct hist_entry *he) \
+{ \
+ return he->stat_acc->_field; \
+} \
+ \
+static int perf_gtk__hpp_color_##_type(struct perf_hpp_fmt *fmt __maybe_unused, \
+ struct perf_hpp *hpp, \
+ struct hist_entry *he) \
+{ \
+ return __hpp__color_fmt(hpp, he, he_get_acc_##_field); \
+}
+
__HPP_COLOR_PERCENT_FN(overhead, period)
__HPP_COLOR_PERCENT_FN(overhead_sys, period_sys)
__HPP_COLOR_PERCENT_FN(overhead_us, period_us)
__HPP_COLOR_PERCENT_FN(overhead_guest_sys, period_guest_sys)
__HPP_COLOR_PERCENT_FN(overhead_guest_us, period_guest_us)
+__HPP_COLOR_ACC_PERCENT_FN(overhead_acc, period)
#undef __HPP_COLOR_PERCENT_FN
@@ -121,6 +135,8 @@ void perf_gtk__init_hpp(void)
perf_gtk__hpp_color_overhead_guest_sys;
perf_hpp__format[PERF_HPP__OVERHEAD_GUEST_US].color =
perf_gtk__hpp_color_overhead_guest_us;
+ perf_hpp__format[PERF_HPP__OVERHEAD_ACC].color =
+ perf_gtk__hpp_color_overhead_acc;
}
static void callchain_list__sym_name(struct callchain_list *cl,
--
1.7.11.7
Reuse hist_entry_iter__add() function to share the similar code with
perf report. Note that it needs to be called with hists.lock so tweak
some internal functions not to deadlock or hold the lock too long.
Tested-by: Arun Sharma <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-top.c | 78 ++++++++++++++++++++++++++----------------------
1 file changed, 43 insertions(+), 35 deletions(-)
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 40e430530168..e7d67421eb0f 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -194,6 +194,12 @@ static void perf_top__record_precise_ip(struct perf_top *top,
pthread_mutex_unlock(¬es->lock);
+ /*
+ * This function is now called with he->hists->lock held.
+ * Release it before going to sleep.
+ */
+ pthread_mutex_unlock(&he->hists->lock);
+
if (err == -ERANGE && !he->ms.map->erange_warned)
ui__warn_map_erange(he->ms.map, sym, ip);
else if (err == -ENOMEM) {
@@ -201,6 +207,8 @@ static void perf_top__record_precise_ip(struct perf_top *top,
sym->name);
sleep(1);
}
+
+ pthread_mutex_lock(&he->hists->lock);
}
static void perf_top__show_details(struct perf_top *top)
@@ -236,24 +244,6 @@ out_unlock:
pthread_mutex_unlock(¬es->lock);
}
-static struct hist_entry *perf_evsel__add_hist_entry(struct perf_evsel *evsel,
- struct addr_location *al,
- struct perf_sample *sample)
-{
- struct hist_entry *he;
-
- pthread_mutex_lock(&evsel->hists.lock);
- he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL,
- sample->period, sample->weight,
- sample->transaction, true);
- pthread_mutex_unlock(&evsel->hists.lock);
- if (he == NULL)
- return NULL;
-
- hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
- return he;
-}
-
static void perf_top__print_sym_table(struct perf_top *top)
{
char bf[160];
@@ -657,6 +647,28 @@ static int symbol_filter(struct map *map __maybe_unused, struct symbol *sym)
return 0;
}
+static int hist_iter_cb(struct hist_entry_iter *iter, struct addr_location *al,
+ bool single, void *arg)
+{
+ struct perf_top *top = arg;
+ struct hist_entry *he = iter->he;
+ struct perf_evsel *evsel = iter->evsel;
+
+ if (single)
+ hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+ if (sort__has_sym) {
+ u64 ip = al->addr;
+
+ if (al->map)
+ ip = al->map->unmap_ip(al->map, ip);
+
+ perf_top__record_precise_ip(top, he, evsel->idx, ip);
+ }
+
+ return 0;
+}
+
static void perf_event__process_sample(struct perf_tool *tool,
const union perf_event *event,
struct perf_evsel *evsel,
@@ -664,8 +676,6 @@ static void perf_event__process_sample(struct perf_tool *tool,
struct machine *machine)
{
struct perf_top *top = container_of(tool, struct perf_top, tool);
- struct symbol *parent = NULL;
- u64 ip = sample->ip;
struct addr_location al;
int err;
@@ -741,25 +751,23 @@ static void perf_event__process_sample(struct perf_tool *tool,
}
if (al.sym == NULL || !al.sym->ignore) {
- struct hist_entry *he;
+ struct hist_entry_iter iter = {
+ .add_entry_cb = hist_iter_cb,
+ };
- err = sample__resolve_callchain(sample, &parent, evsel, &al,
- top->max_stack);
- if (err)
- return;
+ if (symbol_conf.cumulate_callchain)
+ iter.ops = &hist_iter_cumulative;
+ else
+ iter.ops = &hist_iter_normal;
- he = perf_evsel__add_hist_entry(evsel, &al, sample);
- if (he == NULL) {
- pr_err("Problem incrementing symbol period, skipping event\n");
- return;
- }
+ pthread_mutex_lock(&evsel->hists.lock);
- err = hist_entry__append_callchain(he, sample);
- if (err)
- return;
+ err = hist_entry_iter__add(&iter, &al, evsel, sample,
+ top->max_stack, top);
+ if (err < 0)
+ pr_err("Problem incrementing symbol period, skipping event\n");
- if (sort__has_sym)
- perf_top__record_precise_ip(top, he, evsel->idx, ip);
+ pthread_mutex_unlock(&evsel->hists.lock);
}
return;
--
1.7.11.7
Now perf top and perf report will show children column by default if
it has callchain information.
Requested-by: Ingo Molnar <[email protected]>
Tested-by: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/symbol.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index a9d758a3b371..abebc2591960 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -29,11 +29,12 @@ int vmlinux_path__nr_entries;
char **vmlinux_path;
struct symbol_conf symbol_conf = {
- .use_modules = true,
- .try_vmlinux_path = true,
- .annotate_src = true,
- .demangle = true,
- .symfs = "",
+ .use_modules = true,
+ .try_vmlinux_path = true,
+ .annotate_src = true,
+ .demangle = true,
+ .cumulate_callchain = true,
+ .symfs = "",
};
static enum dso_binary_type binary_type_symtab[] = {
--
1.7.11.7
When accumulating callchain entry, also save current snapshot of the
chain so that it can show the rest of the chain.
Tested-by: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/hist.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 7529d216986a..af99ab33bcd4 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -745,6 +745,14 @@ iter_add_single_cumulative_entry(struct hist_entry_iter *iter,
he_cache[iter->curr++] = he;
+ callchain_append(he->callchain, &callchain_cursor, sample->period);
+
+ /*
+ * We need to re-initialize the cursor since callchain_append()
+ * advanced the cursor to the end.
+ */
+ callchain_cursor_commit(&callchain_cursor);
+
return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
}
@@ -758,8 +766,6 @@ iter_next_cumulative_entry(struct hist_entry_iter *iter,
if (node == NULL)
return 0;
- callchain_cursor_advance(&callchain_cursor);
-
return fill_callchain_info(al, node, iter->hide_unresolved);
}
@@ -783,6 +789,11 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
.parent = iter->parent,
};
int i;
+ struct callchain_cursor cursor;
+
+ callchain_cursor_snapshot(&cursor, &callchain_cursor);
+
+ callchain_cursor_advance(&callchain_cursor);
/*
* Check if there's duplicate entries in the callchain.
@@ -801,6 +812,8 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
he_cache[iter->curr++] = he;
+ callchain_append(he->callchain, &cursor, sample->period);
+
return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
}
--
1.7.11.7
The new ->add_entry_cb() will be called after an entry was added to
the histogram. It's used for code sharing between perf report and
perf top. Note that ops->add_*_entry() should set iter->he properly
in order to call the ->add_entry_cb.
Also pass @arg to the callback function. It'll be used by perf top
later.
Tested-by: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-report.c | 31 ++++++++++++++++++++++++++++---
tools/perf/util/hist.c | 44 ++++++++++++++++++++++++--------------------
tools/perf/util/hist.h | 5 ++++-
3 files changed, 56 insertions(+), 24 deletions(-)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 48d2981877b1..066354fc00de 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -79,6 +79,27 @@ static int report__config(const char *var, const char *value, void *cb)
return perf_default_config(var, value, cb);
}
+static int hist_iter_cb(struct hist_entry_iter *iter,
+ struct addr_location *al, bool single,
+ void *arg __maybe_unused)
+{
+ int err;
+ struct hist_entry *he = iter->he;
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+
+ err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+
+ if (!single)
+ goto out;
+
+ evsel->hists.stats.total_period += sample->period;
+ hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+out:
+ return err;
+}
+
static int process_sample_event(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
@@ -108,15 +129,19 @@ static int process_sample_event(struct perf_tool *tool,
iter.ops = &hist_iter_branch;
else if (rep->mem_mode)
iter.ops = &hist_iter_mem;
- else if (symbol_conf.cumulate_callchain)
+ else if (symbol_conf.cumulate_callchain) {
iter.ops = &hist_iter_cumulative;
- else
+ iter.add_entry_cb = hist_iter_cb;
+ } else {
iter.ops = &hist_iter_normal;
+ iter.add_entry_cb = hist_iter_cb;
+ }
if (al.map != NULL)
al.map->dso->hit = 1;
- ret = hist_entry_iter__add(&iter, &al, evsel, sample, rep->max_stack);
+ ret = hist_entry_iter__add(&iter, &al, evsel, sample, rep->max_stack,
+ NULL);
if (ret < 0)
pr_debug("problem adding hist entry, skipping event\n");
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index d7912a5c14dc..688700bab4c4 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -684,11 +684,10 @@ iter_add_single_normal_entry(struct hist_entry_iter *iter, struct addr_location
}
static int
-iter_finish_normal_entry(struct hist_entry_iter *iter, struct addr_location *al)
+iter_finish_normal_entry(struct hist_entry_iter *iter,
+ struct addr_location *al __maybe_unused)
{
- int err;
struct hist_entry *he = iter->he;
- struct perf_evsel *evsel = iter->evsel;
struct perf_sample *sample = iter->sample;
if (he == NULL)
@@ -696,13 +695,6 @@ iter_finish_normal_entry(struct hist_entry_iter *iter, struct addr_location *al)
iter->he = NULL;
- err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
- if (err)
- return err;
-
- evsel->hists.stats.total_period += sample->period;
- hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
-
return hist_entry__append_callchain(he, sample);
}
static int
@@ -743,6 +735,7 @@ iter_add_single_cumulative_entry(struct hist_entry_iter *iter,
if (he == NULL)
return -ENOMEM;
+ iter->he = he;
he_cache[iter->curr++] = he;
callchain_append(he->callchain, &callchain_cursor, sample->period);
@@ -753,7 +746,7 @@ iter_add_single_cumulative_entry(struct hist_entry_iter *iter,
*/
callchain_cursor_commit(&callchain_cursor);
- return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+ return 0;
}
static int
@@ -800,8 +793,11 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
* It's possible that it has cycles or recursive calls.
*/
for (i = 0; i < iter->curr; i++) {
- if (hist_entry__cmp(he_cache[i], &he_tmp) == 0)
+ if (hist_entry__cmp(he_cache[i], &he_tmp) == 0) {
+ /* to avoid calling callback function */
+ iter->he = NULL;
return 0;
+ }
}
he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
@@ -810,24 +806,20 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
if (he == NULL)
return -ENOMEM;
+ iter->he = he;
he_cache[iter->curr++] = he;
callchain_append(he->callchain, &cursor, sample->period);
- return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+ return 0;
}
static int
iter_finish_cumulative_entry(struct hist_entry_iter *iter,
struct addr_location *al __maybe_unused)
{
- struct perf_evsel *evsel = iter->evsel;
- struct perf_sample *sample = iter->sample;
-
- evsel->hists.stats.total_period += sample->period;
- hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
-
zfree(&iter->priv);
+ iter->he = NULL;
return 0;
}
@@ -865,7 +857,7 @@ const struct hist_iter_ops hist_iter_cumulative = {
int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
struct perf_evsel *evsel, struct perf_sample *sample,
- int max_stack_depth)
+ int max_stack_depth, void *arg)
{
int err, err2;
@@ -885,10 +877,22 @@ int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
if (err)
goto out;
+ if (iter->he && iter->add_entry_cb) {
+ err = iter->add_entry_cb(iter, al, true, arg);
+ if (err)
+ goto out;
+ }
+
while (iter->ops->next_entry(iter, al)) {
err = iter->ops->add_next_entry(iter, al);
if (err)
break;
+
+ if (iter->he && iter->add_entry_cb) {
+ err = iter->add_entry_cb(iter, al, false, arg);
+ if (err)
+ goto out;
+ }
}
out:
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index a02342141db9..28a88ea6e5ce 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -107,6 +107,9 @@ struct hist_entry_iter {
void *priv;
const struct hist_iter_ops *ops;
+ /* user-defined callback function (optional) */
+ int (*add_entry_cb)(struct hist_entry_iter *iter,
+ struct addr_location *al, bool single, void *arg);
};
extern const struct hist_iter_ops hist_iter_normal;
@@ -123,7 +126,7 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
bool sample_self);
int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
struct perf_evsel *evsel, struct perf_sample *sample,
- int max_stack_depth);
+ int max_stack_depth, void *arg);
int64_t hist_entry__cmp(struct hist_entry *left, struct hist_entry *right);
int64_t hist_entry__collapse(struct hist_entry *left, struct hist_entry *right);
--
1.7.11.7
When callchain accumulation is requested, we need to sort the entries
by accumulated period value. When accumulated periods of two entries
are same (i.e. single path callchain) put the caller above since
accumulation tends to put callers on higher position for obvious
reason.
Tested-by: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/hist.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index af99ab33bcd4..d7912a5c14dc 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -1063,6 +1063,18 @@ static int hist_entry__sort_on_period(struct hist_entry *a,
struct hist_entry *pair;
u64 *periods_a, *periods_b;
+ if (symbol_conf.cumulate_callchain) {
+ /*
+ * Put caller above callee when they have equal period.
+ */
+ if (a->stat_acc->period != b->stat_acc->period)
+ return a->stat_acc->period > b->stat_acc->period ? 1 : -1;
+
+ if (a->callchain->max_depth != b->callchain->max_depth)
+ return a->callchain->max_depth < b->callchain->max_depth ?
+ 1 : -1;
+ }
+
ret = period_cmp(a->stat.period, b->stat.period);
if (ret || !symbol_conf.event_group)
return ret;
--
1.7.11.7
The --children option is for showing accumulated overhead (period)
value as well as self overhead.
Tested-by: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/Documentation/perf-report.txt | 5 +++++
tools/perf/builtin-report.c | 10 ++++++++++
2 files changed, 15 insertions(+)
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 8eab8a4bdeb8..710fc8b18f6b 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -141,6 +141,11 @@ OPTIONS
Default: fractal,0.5,callee,function.
+--children::
+ Accumulate callchain of children to parent entry so that then can
+ show up in the output. The output will have a new "Children" column
+ and will be sorted on the data. It requires callchains are recorded.
+
--max-stack::
Set the stack depth limit when parsing the callchain, anything
beyond the specified depth will be ignored. This is a trade-off
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 967ba75277fd..fba27e2e1491 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -172,6 +172,14 @@ static int report__setup_sample_type(struct report *rep)
}
}
+ if (symbol_conf.cumulate_callchain) {
+ /* Silently ignore if callchain is missing */
+ if (!(sample_type & PERF_SAMPLE_CALLCHAIN)) {
+ symbol_conf.cumulate_callchain = false;
+ perf_hpp__cancel_cumulate();
+ }
+ }
+
if (sort__mode == SORT_MODE__BRANCH) {
if (!is_pipe &&
!(sample_type & PERF_SAMPLE_BRANCH_STACK)) {
@@ -636,6 +644,8 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
OPT_CALLBACK_DEFAULT('g', "call-graph", &report, "output_type,min_percent[,print_limit],call_order",
"Display callchains using output_type (graph, flat, fractal, or none) , min percent threshold, optional print limit, callchain order, key (function or address). "
"Default: fractal,0.5,callee,function", &parse_callchain_opt, callchain_default_opt),
+ OPT_BOOLEAN(0, "children", &symbol_conf.cumulate_callchain,
+ "Accumulate callchains of children and show total overhead as well"),
OPT_INTEGER(0, "max-stack", &report.max_stack,
"Set the maximum stack depth when parsing the callchain, "
"anything beyond the specified depth will be ignored. "
--
1.7.11.7
Add report.children config option for setting default value of
callchain accumulation. It affects the report output only if
perf.data contains callchain info.
A user can write .perfconfig file like below to enable accumulation
by default:
$ cat ~/.perfconfig
[report]
children = true
And it can be disabled through command line:
$ perf report --no-children
Tested-by: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-report.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index fba27e2e1491..48d2981877b1 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -71,6 +71,10 @@ static int report__config(const char *var, const char *value, void *cb)
rep->min_percent = strtof(value, NULL);
return 0;
}
+ if (!strcmp(var, "report.children")) {
+ symbol_conf.cumulate_callchain = perf_config_bool(var, value);
+ return 0;
+ }
return perf_default_config(var, value, cb);
}
--
1.7.11.7
There're some duplicate code when adding hist entries. They are
different in that some have branch info or mem info but generally do
same thing. So introduce new struct hist_entry_iter and add callbacks
to customize each case in general way.
The new perf_evsel__add_entry() function will look like:
iter->prepare_entry();
iter->add_single_entry();
while (iter->next_entry())
iter->add_next_entry();
iter->finish_entry();
This will help further work like the cumulative callchain patchset.
Tested-by: Arun Sharma <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-report.c | 165 +++---------------------
tools/perf/util/hist.c | 298 ++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/hist.h | 33 +++++
3 files changed, 348 insertions(+), 148 deletions(-)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index d882b6f96411..53b150fb9036 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -75,138 +75,6 @@ static int report__config(const char *var, const char *value, void *cb)
return perf_default_config(var, value, cb);
}
-static int report__add_mem_hist_entry(struct report *rep, struct addr_location *al,
- struct perf_sample *sample, struct perf_evsel *evsel)
-{
- struct symbol *parent = NULL;
- struct hist_entry *he;
- struct mem_info *mi, *mx;
- uint64_t cost;
- int err = sample__resolve_callchain(sample, &parent, evsel, al, rep->max_stack);
-
- if (err)
- return err;
-
- mi = sample__resolve_mem(sample, al);
- if (!mi)
- return -ENOMEM;
-
- if (rep->hide_unresolved && !al->sym)
- return 0;
-
- cost = sample->weight;
- if (!cost)
- cost = 1;
-
- /*
- * must pass period=weight in order to get the correct
- * sorting from hists__collapse_resort() which is solely
- * based on periods. We want sorting be done on nr_events * weight
- * and this is indirectly achieved by passing period=weight here
- * and the he_stat__add_period() function.
- */
- he = __hists__add_entry(&evsel->hists, al, parent, NULL, mi,
- cost, cost, 0);
- if (!he)
- return -ENOMEM;
-
- err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
- if (err)
- goto out;
-
- mx = he->mem_info;
- err = addr_map_symbol__inc_samples(&mx->daddr, evsel->idx);
- if (err)
- goto out;
-
- evsel->hists.stats.total_period += cost;
- hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
- err = hist_entry__append_callchain(he, sample);
-out:
- return err;
-}
-
-static int report__add_branch_hist_entry(struct report *rep, struct addr_location *al,
- struct perf_sample *sample, struct perf_evsel *evsel)
-{
- struct symbol *parent = NULL;
- unsigned i;
- struct hist_entry *he;
- struct branch_info *bi, *bx;
- int err = sample__resolve_callchain(sample, &parent, evsel, al, rep->max_stack);
-
- if (err)
- return err;
-
- bi = sample__resolve_bstack(sample, al);
- if (!bi)
- return -ENOMEM;
-
- for (i = 0; i < sample->branch_stack->nr; i++) {
- if (rep->hide_unresolved && !(bi[i].from.sym && bi[i].to.sym))
- continue;
-
- err = -ENOMEM;
-
- /* overwrite the 'al' to branch-to info */
- al->map = bi[i].to.map;
- al->sym = bi[i].to.sym;
- al->addr = bi[i].to.addr;
- /*
- * The report shows the percentage of total branches captured
- * and not events sampled. Thus we use a pseudo period of 1.
- */
- he = __hists__add_entry(&evsel->hists, al, parent, &bi[i], NULL,
- 1, 1, 0);
- if (he) {
- bx = he->branch_info;
- err = addr_map_symbol__inc_samples(&bx->from, evsel->idx);
- if (err)
- goto out;
-
- err = addr_map_symbol__inc_samples(&bx->to, evsel->idx);
- if (err)
- goto out;
-
- evsel->hists.stats.total_period += 1;
- hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
- } else
- goto out;
- }
- err = 0;
-out:
- free(bi);
- return err;
-}
-
-static int report__add_hist_entry(struct report *rep, struct perf_evsel *evsel,
- struct addr_location *al, struct perf_sample *sample)
-{
- struct symbol *parent = NULL;
- struct hist_entry *he;
- int err = sample__resolve_callchain(sample, &parent, evsel, al, rep->max_stack);
-
- if (err)
- return err;
-
- he = __hists__add_entry(&evsel->hists, al, parent, NULL, NULL,
- sample->period, sample->weight,
- sample->transaction);
- if (he == NULL)
- return -ENOMEM;
-
- err = hist_entry__append_callchain(he, sample);
- if (err)
- goto out;
-
- err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
- evsel->hists.stats.total_period += sample->period;
- hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
-out:
- return err;
-}
-
-
static int process_sample_event(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
@@ -215,6 +83,9 @@ static int process_sample_event(struct perf_tool *tool,
{
struct report *rep = container_of(tool, struct report, tool);
struct addr_location al;
+ struct hist_entry_iter iter = {
+ .hide_unresolved = rep->hide_unresolved,
+ };
int ret;
if (perf_event__preprocess_sample(event, machine, &al, sample) < 0) {
@@ -229,22 +100,20 @@ static int process_sample_event(struct perf_tool *tool,
if (rep->cpu_list && !test_bit(sample->cpu, rep->cpu_bitmap))
return 0;
- if (sort__mode == SORT_MODE__BRANCH) {
- ret = report__add_branch_hist_entry(rep, &al, sample, evsel);
- if (ret < 0)
- pr_debug("problem adding lbr entry, skipping event\n");
- } else if (rep->mem_mode == 1) {
- ret = report__add_mem_hist_entry(rep, &al, sample, evsel);
- if (ret < 0)
- pr_debug("problem adding mem entry, skipping event\n");
- } else {
- if (al.map != NULL)
- al.map->dso->hit = 1;
-
- ret = report__add_hist_entry(rep, evsel, &al, sample);
- if (ret < 0)
- pr_debug("problem incrementing symbol period, skipping event\n");
- }
+ if (sort__mode == SORT_MODE__BRANCH)
+ iter.ops = &hist_iter_branch;
+ else if (rep->mem_mode)
+ iter.ops = &hist_iter_mem;
+ else
+ iter.ops = &hist_iter_normal;
+
+ if (al.map != NULL)
+ al.map->dso->hit = 1;
+
+ ret = hist_entry_iter__add(&iter, &al, evsel, sample, rep->max_stack);
+ if (ret < 0)
+ pr_debug("problem adding hist entry, skipping event\n");
+
return ret;
}
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 0466efa71140..e7a6d9091455 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -4,6 +4,7 @@
#include "session.h"
#include "sort.h"
#include "evsel.h"
+#include "annotate.h"
#include <math.h>
static bool hists__filter_entry_by_dso(struct hists *hists,
@@ -439,6 +440,303 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
return add_hist_entry(hists, &entry, al);
}
+static int
+iter_next_nop_entry(struct hist_entry_iter *iter __maybe_unused,
+ struct addr_location *al __maybe_unused)
+{
+ return 0;
+}
+
+static int
+iter_add_next_nop_entry(struct hist_entry_iter *iter __maybe_unused,
+ struct addr_location *al __maybe_unused)
+{
+ return 0;
+}
+
+static int
+iter_prepare_mem_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ struct perf_sample *sample = iter->sample;
+ struct mem_info *mi;
+
+ mi = sample__resolve_mem(sample, al);
+ if (mi == NULL)
+ return -ENOMEM;
+
+ iter->priv = mi;
+ return 0;
+}
+
+static int
+iter_add_single_mem_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ u64 cost;
+ struct mem_info *mi = iter->priv;
+ struct hist_entry *he;
+
+ if (mi == NULL)
+ return -EINVAL;
+
+ cost = iter->sample->weight;
+ if (!cost)
+ cost = 1;
+
+ /*
+ * must pass period=weight in order to get the correct
+ * sorting from hists__collapse_resort() which is solely
+ * based on periods. We want sorting be done on nr_events * weight
+ * and this is indirectly achieved by passing period=weight here
+ * and the he_stat__add_period() function.
+ */
+ he = __hists__add_entry(&iter->evsel->hists, al, iter->parent, NULL, mi,
+ cost, cost, 0);
+ if (!he)
+ return -ENOMEM;
+
+ iter->he = he;
+ return 0;
+}
+
+static int
+iter_finish_mem_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ struct perf_evsel *evsel = iter->evsel;
+ struct hist_entry *he = iter->he;
+ struct mem_info *mx;
+ int err = -EINVAL;
+ u64 cost;
+
+ if (he == NULL)
+ goto out;
+
+ err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+ if (err)
+ goto out;
+
+ mx = he->mem_info;
+ err = addr_map_symbol__inc_samples(&mx->daddr, evsel->idx);
+ if (err)
+ goto out;
+
+ cost = iter->sample->weight;
+ if (!cost)
+ cost = 1;
+
+ evsel->hists.stats.total_period += cost;
+ hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+ err = hist_entry__append_callchain(he, iter->sample);
+
+out:
+ /*
+ * We don't need to free iter->priv (mem_info) here since
+ * the mem info was either already freed in add_hist_entry() or
+ * passed to a new hist entry by hist_entry__new().
+ */
+ iter->priv = NULL;
+
+ iter->he = NULL;
+ return err;
+}
+
+static int
+iter_prepare_branch_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ struct branch_info *bi;
+ struct perf_sample *sample = iter->sample;
+
+ bi = sample__resolve_bstack(sample, al);
+ if (!bi)
+ return -ENOMEM;
+
+ iter->curr = 0;
+ iter->total = sample->branch_stack->nr;
+
+ iter->priv = bi;
+ return 0;
+}
+
+static int
+iter_add_single_branch_entry(struct hist_entry_iter *iter __maybe_unused,
+ struct addr_location *al __maybe_unused)
+{
+ return 0;
+}
+
+static int
+iter_next_branch_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ struct branch_info *bi = iter->priv;
+ int i = iter->curr;
+
+ if (bi == NULL)
+ return 0;
+
+ if (iter->curr >= iter->total)
+ return 0;
+
+ al->map = bi[i].to.map;
+ al->sym = bi[i].to.sym;
+ al->addr = bi[i].to.addr;
+ return 1;
+}
+
+static int
+iter_add_next_branch_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ struct branch_info *bi, *bx;
+ struct perf_evsel *evsel = iter->evsel;
+ struct hist_entry *he;
+ int i = iter->curr;
+ int err = 0;
+
+ bi = iter->priv;
+
+ if (iter->hide_unresolved && !(bi[i].from.sym && bi[i].to.sym))
+ goto out;
+
+ /*
+ * The report shows the percentage of total branches captured
+ * and not events sampled. Thus we use a pseudo period of 1.
+ */
+ he = __hists__add_entry(&evsel->hists, al, iter->parent, &bi[i], NULL,
+ 1, 1, 0);
+ if (he == NULL)
+ return -ENOMEM;
+
+ bx = he->branch_info;
+ err = addr_map_symbol__inc_samples(&bx->from, evsel->idx);
+ if (err)
+ goto out;
+
+ err = addr_map_symbol__inc_samples(&bx->to, evsel->idx);
+ if (err)
+ goto out;
+
+ evsel->hists.stats.total_period += 1;
+ hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+out:
+ iter->curr++;
+ return err;
+}
+
+static int
+iter_finish_branch_entry(struct hist_entry_iter *iter,
+ struct addr_location *al __maybe_unused)
+{
+ zfree(&iter->priv);
+
+ return iter->curr >= iter->total ? 0 : -1;
+}
+
+static int
+iter_prepare_normal_entry(struct hist_entry_iter *iter __maybe_unused,
+ struct addr_location *al __maybe_unused)
+{
+ return 0;
+}
+
+static int
+iter_add_single_normal_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+ struct hist_entry *he;
+
+ he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
+ sample->period, sample->weight,
+ sample->transaction);
+ if (he == NULL)
+ return -ENOMEM;
+
+ iter->he = he;
+ return 0;
+}
+
+static int
+iter_finish_normal_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ int err;
+ struct hist_entry *he = iter->he;
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+
+ if (he == NULL)
+ return 0;
+
+ iter->he = NULL;
+
+ err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+ if (err)
+ return err;
+
+ evsel->hists.stats.total_period += sample->period;
+ hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+ return hist_entry__append_callchain(he, sample);
+}
+const struct hist_iter_ops hist_iter_mem = {
+ .prepare_entry = iter_prepare_mem_entry,
+ .add_single_entry = iter_add_single_mem_entry,
+ .next_entry = iter_next_nop_entry,
+ .add_next_entry = iter_add_next_nop_entry,
+ .finish_entry = iter_finish_mem_entry,
+};
+
+const struct hist_iter_ops hist_iter_branch = {
+ .prepare_entry = iter_prepare_branch_entry,
+ .add_single_entry = iter_add_single_branch_entry,
+ .next_entry = iter_next_branch_entry,
+ .add_next_entry = iter_add_next_branch_entry,
+ .finish_entry = iter_finish_branch_entry,
+};
+
+const struct hist_iter_ops hist_iter_normal = {
+ .prepare_entry = iter_prepare_normal_entry,
+ .add_single_entry = iter_add_single_normal_entry,
+ .next_entry = iter_next_nop_entry,
+ .add_next_entry = iter_add_next_nop_entry,
+ .finish_entry = iter_finish_normal_entry,
+};
+
+int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
+ struct perf_evsel *evsel, struct perf_sample *sample,
+ int max_stack_depth)
+{
+ int err, err2;
+
+ err = sample__resolve_callchain(sample, &iter->parent, evsel, al,
+ max_stack_depth);
+ if (err)
+ return err;
+
+ iter->evsel = evsel;
+ iter->sample = sample;
+
+ err = iter->ops->prepare_entry(iter, al);
+ if (err)
+ goto out;
+
+ err = iter->ops->add_single_entry(iter, al);
+ if (err)
+ goto out;
+
+ while (iter->ops->next_entry(iter, al)) {
+ err = iter->ops->add_next_entry(iter, al);
+ if (err)
+ break;
+ }
+
+out:
+ err2 = iter->ops->finish_entry(iter, al);
+ if (!err)
+ err = err2;
+
+ return err;
+}
+
int64_t
hist_entry__cmp(struct hist_entry *left, struct hist_entry *right)
{
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index a59743fa3ef7..49fe04bf10ee 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -84,12 +84,45 @@ struct hists {
u16 col_len[HISTC_NR_COLS];
};
+struct hist_entry_iter;
+
+struct hist_iter_ops {
+ int (*prepare_entry)(struct hist_entry_iter *, struct addr_location *);
+ int (*add_single_entry)(struct hist_entry_iter *, struct addr_location *);
+ int (*next_entry)(struct hist_entry_iter *, struct addr_location *);
+ int (*add_next_entry)(struct hist_entry_iter *, struct addr_location *);
+ int (*finish_entry)(struct hist_entry_iter *, struct addr_location *);
+};
+
+struct hist_entry_iter {
+ int total;
+ int curr;
+
+ bool hide_unresolved;
+
+ struct perf_evsel *evsel;
+ struct perf_sample *sample;
+ struct hist_entry *he;
+ struct symbol *parent;
+ void *priv;
+
+ const struct hist_iter_ops *ops;
+};
+
+extern const struct hist_iter_ops hist_iter_normal;
+extern const struct hist_iter_ops hist_iter_branch;
+extern const struct hist_iter_ops hist_iter_mem;
+
struct hist_entry *__hists__add_entry(struct hists *hists,
struct addr_location *al,
struct symbol *parent,
struct branch_info *bi,
struct mem_info *mi, u64 period,
u64 weight, u64 transaction);
+int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
+ struct perf_evsel *evsel, struct perf_sample *sample,
+ int max_stack_depth);
+
int64_t hist_entry__cmp(struct hist_entry *left, struct hist_entry *right);
int64_t hist_entry__collapse(struct hist_entry *left, struct hist_entry *right);
int hist_entry__transaction_len(void);
--
1.7.11.7
Maintain accumulated stat information in hist_entry->stat_acc if
symbol_conf.cumulate_callchain is set. Fields in ->stat_acc have same
vaules initially, and will be updated as callchain is processed later.
Tested-by: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/hist.c | 28 ++++++++++++++++++++++++++--
tools/perf/util/sort.h | 1 +
tools/perf/util/symbol.h | 1 +
3 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index e7a6d9091455..4119a9dd665f 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -238,6 +238,8 @@ static bool hists__decay_entry(struct hists *hists, struct hist_entry *he)
return true;
he_stat__decay(&he->stat);
+ if (symbol_conf.cumulate_callchain)
+ he_stat__decay(he->stat_acc);
if (!he->filtered)
hists->stats.total_period -= prev_period - he->stat.period;
@@ -279,12 +281,26 @@ void hists__decay_entries(struct hists *hists, bool zap_user, bool zap_kernel)
static struct hist_entry *hist_entry__new(struct hist_entry *template)
{
- size_t callchain_size = symbol_conf.use_callchain ? sizeof(struct callchain_root) : 0;
- struct hist_entry *he = zalloc(sizeof(*he) + callchain_size);
+ size_t callchain_size = 0;
+ struct hist_entry *he;
+
+ if (symbol_conf.use_callchain || symbol_conf.cumulate_callchain)
+ callchain_size = sizeof(struct callchain_root);
+
+ he = zalloc(sizeof(*he) + callchain_size);
if (he != NULL) {
*he = *template;
+ if (symbol_conf.cumulate_callchain) {
+ he->stat_acc = malloc(sizeof(he->stat));
+ if (he->stat_acc == NULL) {
+ free(he);
+ return NULL;
+ }
+ memcpy(he->stat_acc, &he->stat, sizeof(he->stat));
+ }
+
if (he->ms.map)
he->ms.map->referenced = true;
@@ -296,6 +312,7 @@ static struct hist_entry *hist_entry__new(struct hist_entry *template)
*/
he->branch_info = malloc(sizeof(*he->branch_info));
if (he->branch_info == NULL) {
+ free(he->stat_acc);
free(he);
return NULL;
}
@@ -368,6 +385,8 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
if (!cmp) {
he_stat__add_period(&he->stat, period, weight);
+ if (symbol_conf.cumulate_callchain)
+ he_stat__add_period(he->stat_acc, period, weight);
/*
* This mem info was allocated from sample__resolve_mem
@@ -404,6 +423,8 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
rb_insert_color(&he->rb_node_in, hists->entries_in);
out:
he_stat__add_cpumode_period(&he->stat, al->cpumode, period);
+ if (symbol_conf.cumulate_callchain)
+ he_stat__add_cpumode_period(he->stat_acc, al->cpumode, period);
return he;
}
@@ -775,6 +796,7 @@ void hist_entry__free(struct hist_entry *he)
{
zfree(&he->branch_info);
zfree(&he->mem_info);
+ zfree(&he->stat_acc);
free_srcline(he->srcline);
free(he);
}
@@ -800,6 +822,8 @@ static bool hists__collapse_insert_entry(struct hists *hists __maybe_unused,
if (!cmp) {
he_stat__add_stat(&iter->stat, &he->stat);
+ if (symbol_conf.cumulate_callchain)
+ he_stat__add_stat(iter->stat_acc, he->stat_acc);
if (symbol_conf.use_callchain) {
callchain_cursor_reset(&callchain_cursor);
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 43e5ff42a609..309f2838a1b4 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -81,6 +81,7 @@ struct hist_entry {
struct list_head head;
} pairs;
struct he_stat stat;
+ struct he_stat *stat_acc;
struct map_symbol ms;
struct thread *thread;
struct comm *comm;
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index fffe2888a1c7..410d01da1546 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -97,6 +97,7 @@ struct symbol_conf {
show_nr_samples,
show_total_period,
use_callchain,
+ cumulate_callchain,
exclude_other,
show_cpu_utilization,
initialized,
--
1.7.11.7
To support callchain accumulation, @entry should be recognized if it's
accumulated or not when add_hist_entry() called. The period of an
accumulated entry should be added to ->stat_acc but not ->stat. Add
@sample_self arg for that.
Tested-by: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-annotate.c | 3 ++-
tools/perf/builtin-diff.c | 2 +-
tools/perf/builtin-top.c | 2 +-
tools/perf/tests/hists_link.c | 4 ++--
tools/perf/util/hist.c | 29 ++++++++++++++++++-----------
tools/perf/util/hist.h | 3 ++-
6 files changed, 26 insertions(+), 17 deletions(-)
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 0da603b79b61..70b2d52c3b2e 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -65,7 +65,8 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
return 0;
}
- he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL, 1, 1, 0);
+ he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL, 1, 1, 0,
+ true);
if (he == NULL)
return -ENOMEM;
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index a77e31246c00..93912add75b5 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -308,7 +308,7 @@ static int hists__add_entry(struct hists *hists,
u64 weight, u64 transaction)
{
if (__hists__add_entry(hists, al, NULL, NULL, NULL, period, weight,
- transaction) != NULL)
+ transaction, true) != NULL)
return 0;
return -ENOMEM;
}
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index ed99ec4a309f..40e430530168 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -245,7 +245,7 @@ static struct hist_entry *perf_evsel__add_hist_entry(struct perf_evsel *evsel,
pthread_mutex_lock(&evsel->hists.lock);
he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL,
sample->period, sample->weight,
- sample->transaction);
+ sample->transaction, true);
pthread_mutex_unlock(&evsel->hists.lock);
if (he == NULL)
return NULL;
diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
index 2b6519e0e36f..e4e931ec1dbb 100644
--- a/tools/perf/tests/hists_link.c
+++ b/tools/perf/tests/hists_link.c
@@ -223,7 +223,7 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
goto out;
he = __hists__add_entry(&evsel->hists, &al, NULL,
- NULL, NULL, 1, 1, 0);
+ NULL, NULL, 1, 1, 0, true);
if (he == NULL)
goto out;
@@ -246,7 +246,7 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
goto out;
he = __hists__add_entry(&evsel->hists, &al, NULL,
- NULL, NULL, 1, 1, 0);
+ NULL, NULL, 1, 1, 0, true);
if (he == NULL)
goto out;
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 4119a9dd665f..db474526ba66 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -279,7 +279,8 @@ void hists__decay_entries(struct hists *hists, bool zap_user, bool zap_kernel)
* histogram, sorted on item, collects periods
*/
-static struct hist_entry *hist_entry__new(struct hist_entry *template)
+static struct hist_entry *hist_entry__new(struct hist_entry *template,
+ bool sample_self)
{
size_t callchain_size = 0;
struct hist_entry *he;
@@ -299,6 +300,8 @@ static struct hist_entry *hist_entry__new(struct hist_entry *template)
return NULL;
}
memcpy(he->stat_acc, &he->stat, sizeof(he->stat));
+ if (!sample_self)
+ memset(&he->stat, 0, sizeof(he->stat));
}
if (he->ms.map)
@@ -360,7 +363,8 @@ static u8 symbol__parent_filter(const struct symbol *parent)
static struct hist_entry *add_hist_entry(struct hists *hists,
struct hist_entry *entry,
- struct addr_location *al)
+ struct addr_location *al,
+ bool sample_self)
{
struct rb_node **p;
struct rb_node *parent = NULL;
@@ -384,7 +388,8 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
cmp = hist_entry__cmp(he, entry);
if (!cmp) {
- he_stat__add_period(&he->stat, period, weight);
+ if (sample_self)
+ he_stat__add_period(&he->stat, period, weight);
if (symbol_conf.cumulate_callchain)
he_stat__add_period(he->stat_acc, period, weight);
@@ -414,7 +419,7 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
p = &(*p)->rb_right;
}
- he = hist_entry__new(entry);
+ he = hist_entry__new(entry, sample_self);
if (!he)
return NULL;
@@ -422,7 +427,8 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
rb_link_node(&he->rb_node_in, parent, p);
rb_insert_color(&he->rb_node_in, hists->entries_in);
out:
- he_stat__add_cpumode_period(&he->stat, al->cpumode, period);
+ if (sample_self)
+ he_stat__add_cpumode_period(&he->stat, al->cpumode, period);
if (symbol_conf.cumulate_callchain)
he_stat__add_cpumode_period(he->stat_acc, al->cpumode, period);
return he;
@@ -433,7 +439,8 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
struct symbol *sym_parent,
struct branch_info *bi,
struct mem_info *mi,
- u64 period, u64 weight, u64 transaction)
+ u64 period, u64 weight, u64 transaction,
+ bool sample_self)
{
struct hist_entry entry = {
.thread = al->thread,
@@ -458,7 +465,7 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
.transaction = transaction,
};
- return add_hist_entry(hists, &entry, al);
+ return add_hist_entry(hists, &entry, al, sample_self);
}
static int
@@ -511,7 +518,7 @@ iter_add_single_mem_entry(struct hist_entry_iter *iter, struct addr_location *al
* and the he_stat__add_period() function.
*/
he = __hists__add_entry(&iter->evsel->hists, al, iter->parent, NULL, mi,
- cost, cost, 0);
+ cost, cost, 0, true);
if (!he)
return -ENOMEM;
@@ -622,7 +629,7 @@ iter_add_next_branch_entry(struct hist_entry_iter *iter, struct addr_location *a
* and not events sampled. Thus we use a pseudo period of 1.
*/
he = __hists__add_entry(&evsel->hists, al, iter->parent, &bi[i], NULL,
- 1, 1, 0);
+ 1, 1, 0, true);
if (he == NULL)
return -ENOMEM;
@@ -668,7 +675,7 @@ iter_add_single_normal_entry(struct hist_entry_iter *iter, struct addr_location
he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
sample->period, sample->weight,
- sample->transaction);
+ sample->transaction, true);
if (he == NULL)
return -ENOMEM;
@@ -1170,7 +1177,7 @@ static struct hist_entry *hists__add_dummy_entry(struct hists *hists,
p = &(*p)->rb_right;
}
- he = hist_entry__new(pair);
+ he = hist_entry__new(pair, true);
if (he) {
memset(&he->stat, 0, sizeof(he->stat));
he->hists = hists;
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 49fe04bf10ee..348448d983ec 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -118,7 +118,8 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
struct symbol *parent,
struct branch_info *bi,
struct mem_info *mi, u64 period,
- u64 weight, u64 transaction);
+ u64 weight, u64 transaction,
+ bool sample_self);
int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
struct perf_evsel *evsel, struct perf_sample *sample,
int max_stack_depth);
--
1.7.11.7
Call __hists__add_entry() for each callchain node to get an
accumulated stat for an entry. Introduce new cumulative_iter ops to
process them properly.
Tested-by: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-report.c | 2 ++
tools/perf/util/callchain.c | 3 +-
tools/perf/util/hist.c | 87 +++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/hist.h | 1 +
4 files changed, 92 insertions(+), 1 deletion(-)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 53b150fb9036..967ba75277fd 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -104,6 +104,8 @@ static int process_sample_event(struct perf_tool *tool,
iter.ops = &hist_iter_branch;
else if (rep->mem_mode)
iter.ops = &hist_iter_mem;
+ else if (symbol_conf.cumulate_callchain)
+ iter.ops = &hist_iter_cumulative;
else
iter.ops = &hist_iter_normal;
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 8d9db454f1a9..35b12a06bc39 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -538,7 +538,8 @@ int sample__resolve_callchain(struct perf_sample *sample, struct symbol **parent
if (sample->callchain == NULL)
return 0;
- if (symbol_conf.use_callchain || sort__has_parent) {
+ if (symbol_conf.use_callchain || symbol_conf.cumulate_callchain ||
+ sort__has_parent) {
return machine__resolve_callchain(al->machine, evsel, al->thread,
sample, parent, al, max_stack);
}
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index db474526ba66..4691313f8748 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -705,6 +705,85 @@ iter_finish_normal_entry(struct hist_entry_iter *iter, struct addr_location *al)
return hist_entry__append_callchain(he, sample);
}
+static int
+iter_prepare_cumulative_entry(struct hist_entry_iter *iter __maybe_unused,
+ struct addr_location *al __maybe_unused)
+{
+ callchain_cursor_commit(&callchain_cursor);
+ return 0;
+}
+
+static int
+iter_add_single_cumulative_entry(struct hist_entry_iter *iter,
+ struct addr_location *al)
+{
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+ struct hist_entry *he;
+
+ he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
+ sample->period, sample->weight,
+ sample->transaction, true);
+ if (he == NULL)
+ return -ENOMEM;
+
+ return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+}
+
+static int
+iter_next_cumulative_entry(struct hist_entry_iter *iter,
+ struct addr_location *al)
+{
+ struct callchain_cursor_node *node;
+
+ node = callchain_cursor_current(&callchain_cursor);
+ if (node == NULL)
+ return 0;
+
+ al->map = node->map;
+ al->sym = node->sym;
+ if (node->map)
+ al->addr = node->map->map_ip(node->map, node->ip);
+ else
+ al->addr = node->ip;
+
+ if (iter->hide_unresolved && al->sym == NULL)
+ return 0;
+
+ callchain_cursor_advance(&callchain_cursor);
+ return 1;
+}
+
+static int
+iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
+ struct addr_location *al)
+{
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+ struct hist_entry *he;
+
+ he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
+ sample->period, sample->weight,
+ sample->transaction, false);
+ if (he == NULL)
+ return -ENOMEM;
+
+ return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+}
+
+static int
+iter_finish_cumulative_entry(struct hist_entry_iter *iter,
+ struct addr_location *al __maybe_unused)
+{
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+
+ evsel->hists.stats.total_period += sample->period;
+ hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+ return 0;
+}
+
const struct hist_iter_ops hist_iter_mem = {
.prepare_entry = iter_prepare_mem_entry,
.add_single_entry = iter_add_single_mem_entry,
@@ -729,6 +808,14 @@ const struct hist_iter_ops hist_iter_normal = {
.finish_entry = iter_finish_normal_entry,
};
+const struct hist_iter_ops hist_iter_cumulative = {
+ .prepare_entry = iter_prepare_cumulative_entry,
+ .add_single_entry = iter_add_single_cumulative_entry,
+ .next_entry = iter_next_cumulative_entry,
+ .add_next_entry = iter_add_next_cumulative_entry,
+ .finish_entry = iter_finish_cumulative_entry,
+};
+
int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
struct perf_evsel *evsel, struct perf_sample *sample,
int max_stack_depth)
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 348448d983ec..6da98cb2ee7e 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -112,6 +112,7 @@ struct hist_entry_iter {
extern const struct hist_iter_ops hist_iter_normal;
extern const struct hist_iter_ops hist_iter_branch;
extern const struct hist_iter_ops hist_iter_mem;
+extern const struct hist_iter_ops hist_iter_cumulative;
struct hist_entry *__hists__add_entry(struct hists *hists,
struct addr_location *al,
--
1.7.11.7
If -g cumulative option is given, it needs to show entries which don't
have self overhead. So apply percent-limit to accumulated overhead
percentage in this case.
Tested-by: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/ui/browsers/hists.c | 33 +++++++++++----------------------
tools/perf/ui/gtk/hists.c | 4 ++--
tools/perf/ui/stdio/hist.c | 4 ++--
tools/perf/util/hist.h | 1 +
tools/perf/util/sort.h | 17 ++++++++++++++++-
5 files changed, 32 insertions(+), 27 deletions(-)
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index c8e3a2e7b5c0..0a088535ebac 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -828,12 +828,12 @@ static unsigned int hist_browser__refresh(struct ui_browser *browser)
for (nd = browser->top; nd; nd = rb_next(nd)) {
struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
- float percent = h->stat.period * 100.0 /
- hb->hists->stats.total_period;
+ float percent;
if (h->filtered)
continue;
+ percent = hist_entry__get_percent_limit(h);
if (percent < hb->min_pcnt)
continue;
@@ -846,18 +846,13 @@ static unsigned int hist_browser__refresh(struct ui_browser *browser)
}
static struct rb_node *hists__filter_entries(struct rb_node *nd,
- struct hists *hists,
float min_pcnt)
{
while (nd != NULL) {
struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
- float percent = h->stat.period * 100.0 /
- hists->stats.total_period;
+ float percent = hist_entry__get_percent_limit(h);
- if (percent < min_pcnt)
- return NULL;
-
- if (!h->filtered)
+ if (!h->filtered && percent >= min_pcnt)
return nd;
nd = rb_next(nd);
@@ -867,13 +862,11 @@ static struct rb_node *hists__filter_entries(struct rb_node *nd,
}
static struct rb_node *hists__filter_prev_entries(struct rb_node *nd,
- struct hists *hists,
float min_pcnt)
{
while (nd != NULL) {
struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
- float percent = h->stat.period * 100.0 /
- hists->stats.total_period;
+ float percent = hist_entry__get_percent_limit(h);
if (!h->filtered && percent >= min_pcnt)
return nd;
@@ -902,14 +895,14 @@ static void ui_browser__hists_seek(struct ui_browser *browser,
switch (whence) {
case SEEK_SET:
nd = hists__filter_entries(rb_first(browser->entries),
- hb->hists, hb->min_pcnt);
+ hb->min_pcnt);
break;
case SEEK_CUR:
nd = browser->top;
goto do_offset;
case SEEK_END:
nd = hists__filter_prev_entries(rb_last(browser->entries),
- hb->hists, hb->min_pcnt);
+ hb->min_pcnt);
first = false;
break;
default:
@@ -952,8 +945,7 @@ do_offset:
break;
}
}
- nd = hists__filter_entries(rb_next(nd), hb->hists,
- hb->min_pcnt);
+ nd = hists__filter_entries(rb_next(nd), hb->min_pcnt);
if (nd == NULL)
break;
--offset;
@@ -986,7 +978,7 @@ do_offset:
}
}
- nd = hists__filter_prev_entries(rb_prev(nd), hb->hists,
+ nd = hists__filter_prev_entries(rb_prev(nd),
hb->min_pcnt);
if (nd == NULL)
break;
@@ -1157,7 +1149,6 @@ static int hist_browser__fprintf_entry(struct hist_browser *browser,
static int hist_browser__fprintf(struct hist_browser *browser, FILE *fp)
{
struct rb_node *nd = hists__filter_entries(rb_first(browser->b.entries),
- browser->hists,
browser->min_pcnt);
int printed = 0;
@@ -1165,8 +1156,7 @@ static int hist_browser__fprintf(struct hist_browser *browser, FILE *fp)
struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
printed += hist_browser__fprintf_entry(browser, h, fp);
- nd = hists__filter_entries(rb_next(nd), browser->hists,
- browser->min_pcnt);
+ nd = hists__filter_entries(rb_next(nd), browser->min_pcnt);
}
return printed;
@@ -1390,8 +1380,7 @@ static void hist_browser__update_pcnt_entries(struct hist_browser *hb)
while (nd) {
nr_entries++;
- nd = hists__filter_entries(rb_next(nd), hb->hists,
- hb->min_pcnt);
+ nd = hists__filter_entries(rb_next(nd), hb->min_pcnt);
}
hb->nr_pcnt_entries = nr_entries;
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 49531c1c7121..ded22b4da677 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -296,12 +296,12 @@ static void perf_gtk__show_hists(GtkWidget *window, struct hists *hists,
for (nd = rb_first(&hists->entries); nd; nd = rb_next(nd)) {
struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
GtkTreeIter iter;
- float percent = h->stat.period * 100.0 /
- hists->stats.total_period;
+ float percent;
if (h->filtered)
continue;
+ percent = hist_entry__get_percent_limit(h);
if (percent < min_pcnt)
continue;
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 831fbb77d1ff..b48c7ba49a1c 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -487,12 +487,12 @@ print_entries:
for (nd = rb_first(&hists->entries); nd; nd = rb_next(nd)) {
struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
- float percent = h->stat.period * 100.0 /
- hists->stats.total_period;
+ float percent;
if (h->filtered)
continue;
+ percent = hist_entry__get_percent_limit(h);
if (percent < min_pcnt)
continue;
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 1b4fb71a0bf8..60e2ef2a4c6e 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -132,6 +132,7 @@ int hist_entry__sort_snprintf(struct hist_entry *he, char *bf, size_t size,
struct hists *hists);
void hist_entry__free(struct hist_entry *);
+
void hists__output_resort(struct hists *hists);
void hists__collapse_resort(struct hists *hists, struct ui_progress *prog);
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 309f2838a1b4..103748ba25f2 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -20,7 +20,7 @@
#include "parse-options.h"
#include "parse-events.h"
-
+#include "hist.h"
#include "thread.h"
extern regex_t parent_regex;
@@ -130,6 +130,21 @@ static inline void hist_entry__add_pair(struct hist_entry *pair,
list_add_tail(&pair->pairs.node, &he->pairs.head);
}
+static inline float hist_entry__get_percent_limit(struct hist_entry *he)
+{
+ u64 period = he->stat.period;
+ u64 total_period = he->hists->stats.total_period;
+
+ if (unlikely(total_period == 0))
+ return 0;
+
+ if (symbol_conf.cumulate_callchain)
+ period = he->stat_acc->period;
+
+ return period * 100.0 / total_period;
+}
+
+
enum sort_mode {
SORT_MODE__NORMAL,
SORT_MODE__BRANCH,
--
1.7.11.7
Sometimes it needs to disable some columns at runtime. Add help
functions to support that.
Cc: Jiri Olsa <[email protected]>
Tested-by: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/ui/hist.c | 17 +++++++++++++++++
tools/perf/util/hist.h | 3 +++
2 files changed, 20 insertions(+)
diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index eb9c07bcdb01..1f9e25211da2 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -275,12 +275,29 @@ void perf_hpp__column_register(struct perf_hpp_fmt *format)
list_add_tail(&format->list, &perf_hpp__list);
}
+void perf_hpp__column_unregister(struct perf_hpp_fmt *format)
+{
+ list_del(&format->list);
+}
+
void perf_hpp__column_enable(unsigned col)
{
BUG_ON(col >= PERF_HPP__MAX_INDEX);
perf_hpp__column_register(&perf_hpp__format[col]);
}
+void perf_hpp__column_disable(unsigned col)
+{
+ BUG_ON(col >= PERF_HPP__MAX_INDEX);
+ perf_hpp__column_unregister(&perf_hpp__format[col]);
+}
+
+void perf_hpp__cancel_cumulate(void)
+{
+ perf_hpp__column_disable(PERF_HPP__OVERHEAD_ACC);
+ perf_hpp__format[PERF_HPP__OVERHEAD].header = hpp__header_overhead;
+}
+
int hist_entry__sort_snprintf(struct hist_entry *he, char *s, size_t size,
struct hists *hists)
{
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 60e2ef2a4c6e..a02342141db9 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -201,7 +201,10 @@ enum {
void perf_hpp__init(void);
void perf_hpp__column_register(struct perf_hpp_fmt *format);
+void perf_hpp__column_unregister(struct perf_hpp_fmt *format);
void perf_hpp__column_enable(unsigned col);
+void perf_hpp__column_disable(unsigned col);
+void perf_hpp__cancel_cumulate(void);
static inline size_t perf_hpp__use_color(void)
{
--
1.7.11.7
On Fri, Feb 07, 2014 at 10:35:02AM +0900, Namhyung Kim wrote:
>
> Currently the perf enables both of --call-graph and --children when it
> finds callchains in the samples. While this is useful for TUI or GTK,
> I'm not sure for stdio as it'd consume so much lines.
>
> It does not handle all kind of cases like event groups and annotations
> yet, but I really want to release it and get reviews.
>
> You can also get this series on 'perf/cumulate-v8' branch in my tree at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
I've modified a little your example to link against two libraries dynamically
and see different cases of how is self vs children attributed when running with
"--sort dso" and it seems great.
I've tried simple examples just as: calling a lib1 function that calls lib2
function which uses lot of CPU, calling lib1 function that uses lot of CPU and
also calls lib2 func that uses lot of CPU. And things like that. Probably using
one lib was enough (I first played with only one) as it's a different symbol
from the main, but just in case some weird bug (was really easy) I tested with
two :-)
I might be able to test with more real world scenarios on Saturday, although I'm
sick now and I'm not sure how I will feel :S
Thanks a lot!
Rodrigo
On Fri, Feb 07, 2014 at 10:35:20AM +0900, Namhyung Kim wrote:
> Reuse hist_entry_iter__add() function to share the similar code with
> perf report. Note that it needs to be called with hists.lock so tweak
> some internal functions not to deadlock or hold the lock too long.
>
> Tested-by: Arun Sharma <[email protected]>
> Signed-off-by: Namhyung Kim <[email protected]>
> ---
> tools/perf/builtin-top.c | 78 ++++++++++++++++++++++++++----------------------
> 1 file changed, 43 insertions(+), 35 deletions(-)
>
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index 40e430530168..e7d67421eb0f 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -194,6 +194,12 @@ static void perf_top__record_precise_ip(struct perf_top *top,
>
> pthread_mutex_unlock(¬es->lock);
>
> + /*
> + * This function is now called with he->hists->lock held.
> + * Release it before going to sleep.
> + */
> + pthread_mutex_unlock(&he->hists->lock);
> +
> if (err == -ERANGE && !he->ms.map->erange_warned)
> ui__warn_map_erange(he->ms.map, sym, ip);
> else if (err == -ENOMEM) {
> @@ -201,6 +207,8 @@ static void perf_top__record_precise_ip(struct perf_top *top,
> sym->name);
> sleep(1);
> }
> +
> + pthread_mutex_lock(&he->hists->lock);
> }
I've seen Jiri's comment on this on a previous version of the patch, and the
comment now really helps. Thanks :)
But it still feels weird for me unlocking and then re-aquiring the lock at the
end. Maybe is error prone too, as it's easy to add a "return" in between where
we don't have yet aquired the lock and everything breaks, for example. I would
document that this function must always return with that lock acquired. But
maybe this is too personal :)
And just curious, isn't it possible that the caller releases the lock before
calling the function and aquires it after ? This function could then aquire +
release the lock and be done with it. Or do you need that atomicity and know
that nobody else have taken the lock between the caller took it and this
function was called ? Although I'm not sure if this is better... Maybe is the
same at the end, as the caller should (maybe) always unlock before calling.
Also, I realize this is shared code so it might not make any sense or even
complicate things a lot more in some other places where this ends-up being
called. Sorry in advance if this is the case, I really don't know about perf :S
Thanks a lot, and sorry again,
Rodrigo
Hi Rodrigo,
On Fri, 7 Feb 2014 03:31:10 +0000, Rodrigo Campos wrote:
> On Fri, Feb 07, 2014 at 10:35:02AM +0900, Namhyung Kim wrote:
>>
>> Currently the perf enables both of --call-graph and --children when it
>> finds callchains in the samples. While this is useful for TUI or GTK,
>> I'm not sure for stdio as it'd consume so much lines.
>>
>> It does not handle all kind of cases like event groups and annotations
>> yet, but I really want to release it and get reviews.
>>
>> You can also get this series on 'perf/cumulate-v8' branch in my tree at:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>
>
> I've modified a little your example to link against two libraries dynamically
> and see different cases of how is self vs children attributed when running with
> "--sort dso" and it seems great.
Thanks for testing!
>
> I've tried simple examples just as: calling a lib1 function that calls lib2
> function which uses lot of CPU, calling lib1 function that uses lot of CPU and
> also calls lib2 func that uses lot of CPU. And things like that. Probably using
> one lib was enough (I first played with only one) as it's a different symbol
> from the main, but just in case some weird bug (was really easy) I tested with
> two :-)
Nice.
>
> I might be able to test with more real world scenarios on Saturday, although I'm
> sick now and I'm not sure how I will feel :S
Oh, sorry to hear that. Hope you get very well soon!
Thanks,
Namhyung
On Fri, 7 Feb 2014 13:26:01 +0000, Rodrigo Campos wrote:
> On Fri, Feb 07, 2014 at 10:35:20AM +0900, Namhyung Kim wrote:
>> Reuse hist_entry_iter__add() function to share the similar code with
>> perf report. Note that it needs to be called with hists.lock so tweak
>> some internal functions not to deadlock or hold the lock too long.
>>
>> Tested-by: Arun Sharma <[email protected]>
>> Signed-off-by: Namhyung Kim <[email protected]>
>> ---
>> tools/perf/builtin-top.c | 78 ++++++++++++++++++++++++++----------------------
>> 1 file changed, 43 insertions(+), 35 deletions(-)
>>
>> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
>> index 40e430530168..e7d67421eb0f 100644
>> --- a/tools/perf/builtin-top.c
>> +++ b/tools/perf/builtin-top.c
>> @@ -194,6 +194,12 @@ static void perf_top__record_precise_ip(struct perf_top *top,
>>
>> pthread_mutex_unlock(¬es->lock);
>>
>> + /*
>> + * This function is now called with he->hists->lock held.
>> + * Release it before going to sleep.
>> + */
>> + pthread_mutex_unlock(&he->hists->lock);
>> +
>> if (err == -ERANGE && !he->ms.map->erange_warned)
>> ui__warn_map_erange(he->ms.map, sym, ip);
>> else if (err == -ENOMEM) {
>> @@ -201,6 +207,8 @@ static void perf_top__record_precise_ip(struct perf_top *top,
>> sym->name);
>> sleep(1);
>> }
>> +
>> + pthread_mutex_lock(&he->hists->lock);
>> }
>
>
> I've seen Jiri's comment on this on a previous version of the patch, and the
> comment now really helps. Thanks :)
>
> But it still feels weird for me unlocking and then re-aquiring the lock at the
> end. Maybe is error prone too, as it's easy to add a "return" in between where
> we don't have yet aquired the lock and everything breaks, for example. I would
> document that this function must always return with that lock acquired. But
> maybe this is too personal :)
Yeah, it's a bit sad to require such weird locking behavior. And I
agree that adding more comments will help preventing those possible
mistakes.
>
> And just curious, isn't it possible that the caller releases the lock before
> calling the function and aquires it after ? This function could then aquire +
> release the lock and be done with it. Or do you need that atomicity and know
> that nobody else have taken the lock between the caller took it and this
> function was called ? Although I'm not sure if this is better... Maybe is the
> same at the end, as the caller should (maybe) always unlock before calling.
>
> Also, I realize this is shared code so it might not make any sense or even
> complicate things a lot more in some other places where this ends-up being
> called. Sorry in advance if this is the case, I really don't know about perf :S
Don't say sorry. :)
The perf_top__record_precise_ip() is called in hist_iter_cb() callback
function so that we cannot control how it'll be called since it's now in
the generic code. So it acquires the lock before calling the generic
code and then release-and-reacquire in the callback function.
And the hist lock is needed during inserting an entry since display
thread can switch to another tree anytime.
Thanks,
Namhyung
On Fri, Feb 07, 2014 at 10:35:02AM +0900, Namhyung Kim wrote:
SNIP
>
>
> Namhyung Kim (21):
> perf tools: Introduce struct hist_entry_iter
> perf hists: Add support for accumulated stat of hist entry
> perf hists: Check if accumulated when adding a hist entry
> perf hists: Accumulate hist entry stat based on the callchain
> perf tools: Update cpumode for each cumulative entry
> perf report: Cache cumulative callchains
> perf callchain: Add callchain_cursor_snapshot()
> perf tools: Save callchain info for each cumulative entry
> perf hists: Sort hist entries by accumulated period
> perf ui/hist: Add support to accumulated hist stat
> perf ui/browser: Add support to accumulated hist stat
> perf ui/gtk: Add support to accumulated hist stat
> perf tools: Apply percent-limit to cumulative percentage
> perf tools: Add more hpp helper functions
> perf report: Add --children option
> perf report: Add report.children config option
> perf tools: Add callback function to hist_entry_iter
> perf top: Convert to hist_entry_iter
> perf top: Add --children option
> perf top: Add top.children config option
> perf tools: Enable --children option by default
heya, I wasn't CC-ed ;-)
Acked-by: Jiri Olsa <[email protected]>
for the patchset
thanks,
jirka
2014-02-10 (월), 09:07 +0100, Jiri Olsa:
> On Fri, Feb 07, 2014 at 10:35:02AM +0900, Namhyung Kim wrote:
> heya, I wasn't CC-ed ;-)
Oops, my bad, sorry about that.
>
> Acked-by: Jiri Olsa <[email protected]>
>
> for the patchset
Thank you so much!
Namhyung