Hello,
This is my third attempt to implement the cumulative hist period
report. This work started from Arun's SORT_INCLUSIVE patch [1], but I
completely rewrote it from scratch.
Patches 01 to 03 are independent cleanups and can be applied
separately.

Please see patch 04/28. It refactors the functions that add hist
entries to use the new struct hist_entry_iter. While I converted all
the functions carefully, it'd be better if someone could test and
confirm that I didn't mess anything up - especially the branch stack
and mem code.
This patchset basically adds the period of a sample to every node in
its callchain. A hist_entry now has an additional field to keep the
cumulative period when the --children option is given to perf report.

I made it a separate --children option and added a new "Children"
column (renaming the default "Overhead" column to "Self"). The output
is sorted by children (cumulative) overhead for now. The reason I
made it a separate option is that I still think it's quite different
from the other --call-graph options; the --call-graph option still
takes effect even when --children is given.

I know the UI should also be changed to be more flexible, as Ingo
requested, but I'd like to finish this first and then move on to that
work. I also added a new config option to enable it by default.
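For example, to enable it by default you could put something like
this in ~/.perfconfig (a sketch using the usual perf config syntax
and the report.children / top.children options added by this series):

  [report]
          children = true

  [top]
          children = true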
* changes in v5:
- support both --children and --call-graph (Arun)
- refactor hist_entry_iter to share with perf top (Jiri)
- various cleanups and fixes (Jiri)
- add acks from Jiri
* changes in v4:
- change to --children option (Ingo)
- rebased on new annotation change (Arnaldo)
- support perf top also
- enable --children option by default (Ingo)
* changes in v3:
- change to --cumulate option
- fix a couple of bugs (Jiri, Rodrigo)
- rename some help functions (Arnaldo)
- cache previous hist entries rather than just symbol and dso
- add some preparatory cleanups
- add report.cumulate config option
Let me show you an example:
$ cat abc.c
#define barrier() asm volatile("" ::: "memory")

void a(void)
{
        int i;

        for (i = 0; i < 1000000; i++)
                barrier();
}

void b(void)
{
        a();
}

void c(void)
{
        b();
}

int main(void)
{
        c();
        return 0;
}
With this simple program I ran perf record and report:
$ perf record -g -e cycles:u ./abc
Case 1.
$ perf report --stdio --no-call-graph --no-children
# Overhead Command Shared Object Symbol
# ........ ....... ................. ..............
#
91.50% abc abc [.] a
8.18% abc ld-2.17.so [.] strlen
0.31% abc [kernel.kallsyms] [k] page_fault
0.01% abc ld-2.17.so [.] _start
Case 2. (current default behavior)
$ perf report --stdio --call-graph --no-children
# Overhead Command Shared Object Symbol
# ........ ....... ................. ..............
#
91.50% abc abc [.] a
|
--- a
b
c
main
__libc_start_main
8.18% abc ld-2.17.so [.] strlen
|
--- strlen
_dl_sysdep_start
0.31% abc [kernel.kallsyms] [k] page_fault
|
--- page_fault
_start
0.01% abc ld-2.17.so [.] _start
|
--- _start
Case 3.
$ perf report --no-call-graph --children --stdio
# Self Children Command Shared Object Symbol
# ........ ........ ....... ................. .....................
#
0.00% 91.50% abc libc-2.17.so [.] __libc_start_main
0.00% 91.50% abc abc [.] main
0.00% 91.50% abc abc [.] c
0.00% 91.50% abc abc [.] b
91.50% 91.50% abc abc [.] a
0.00% 8.18% abc ld-2.17.so [.] _dl_sysdep_start
8.18% 8.18% abc ld-2.17.so [.] strlen
0.01% 0.33% abc ld-2.17.so [.] _start
0.31% 0.31% abc [kernel.kallsyms] [k] page_fault
As you can see, the __libc_start_main -> main -> c -> b -> a callchain
shows up in the output.

Finally, it looks like below with both options enabled:
Case 4. (default behavior?)
$ perf report --call-graph --children --stdio
# Self Children Command Shared Object Symbol
# ........ ........ ....... ................. .....................
#
0.00% 91.50% abc libc-2.17.so [.] __libc_start_main
|
--- __libc_start_main
0.00% 91.50% abc abc [.] main
|
--- main
__libc_start_main
0.00% 91.50% abc abc [.] c
|
--- c
main
__libc_start_main
0.00% 91.50% abc abc [.] b
|
--- b
c
main
__libc_start_main
91.50% 91.50% abc abc [.] a
|
--- a
b
c
main
__libc_start_main
...
Currently perf enables both --call-graph and --children when it finds
callchains in the samples. While this is useful for the TUI or GTK,
I'm not sure it is for stdio, since it'd consume so many lines.

I know it has some rough edges and maybe even bugs, but I really
wanted to release it and get reviews. It does not handle event groups
and annotations yet.
You can also get this series on 'perf/cumulate-v5' branch in my tree at:
git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
Any comments are welcome, thanks.
Namhyung
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
[1] https://lkml.org/lkml/2012/3/31/6
Namhyung Kim (28):
perf tools: Insert filtered entries to hists also
perf tools: Do not update total period of a hists when filtering
perf tools: Remove symbol_conf.use_callchain check
perf tools: Introduce struct hist_entry_iter
perf hists: Convert hist entry functions to use struct he_stat
perf hists: Add support for accumulated stat of hist entry
perf hists: Check if accumulated when adding a hist entry
perf hists: Accumulate hist entry stat based on the callchain
perf tools: Update cpumode for each cumulative entry
perf report: Cache cumulative callchains
perf callchain: Add callchain_cursor_snapshot()
perf tools: Save callchain info for each cumulative entry
perf hists: Sort hist entries by accumulated period
perf ui/hist: Add support to accumulated hist stat
perf ui/browser: Add support to accumulated hist stat
perf ui/gtk: Add support to accumulated hist stat
perf tools: Apply percent-limit to cumulative percentage
perf tools: Add more hpp helper functions
perf report: Add --children option
perf report: Add report.children config option
perf tools: Factor out sample__resolve_callchain()
perf tools: Factor out fill_callchain_info()
perf tools: Factor out hist_entry_iter code
perf tools: Add callback function to hist_entry_iter
perf top: Convert to hist_entry_iter
perf top: Add --children option
perf top: Add top.children config option
perf tools: Enable --children option by default
tools/perf/Documentation/perf-report.txt | 5 +
tools/perf/Documentation/perf-top.txt | 6 +
tools/perf/builtin-annotate.c | 3 +-
tools/perf/builtin-diff.c | 2 +-
tools/perf/builtin-report.c | 202 +++---------
tools/perf/builtin-top.c | 104 +++---
tools/perf/tests/hists_link.c | 4 +-
tools/perf/ui/browsers/hists.c | 50 +--
tools/perf/ui/gtk/hists.c | 20 +-
tools/perf/ui/hist.c | 62 ++++
tools/perf/ui/stdio/hist.c | 4 +-
tools/perf/util/callchain.c | 66 ++++
tools/perf/util/callchain.h | 17 +
tools/perf/util/event.c | 18 +-
tools/perf/util/hist.c | 542 +++++++++++++++++++++++++++++--
tools/perf/util/hist.h | 49 ++-
tools/perf/util/machine.c | 2 -
tools/perf/util/sort.h | 18 +-
tools/perf/util/symbol.c | 11 +-
tools/perf/util/symbol.h | 3 +-
20 files changed, 899 insertions(+), 289 deletions(-)
--
1.7.11.7
Currently, if a sample is filtered by a command line option, it is
just dropped. But this affects the final output in that the
percentages can differ, since the filtered entries are not included
in the total. For example, the original output might look like below:
$ perf report --stdio -s comm
# Overhead Command
# ........ .......
#
74.19% cc1
7.61% gcc
6.11% as
4.35% sh
4.14% make
1.13% fixdep
With a filter applied, the output changes like below:
$ perf report --stdio -s comm -c gcc,as
# Overhead Command
# ........ .......
#
55.49% gcc
44.51% as
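(Indeed, 7.61 / (7.61 + 6.11) is roughly 55.5% - the same samples,
just measured against the smaller, filtered total.)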
But the user might want to see this instead:
$ perf report --stdio -s comm -c gcc,as
# Overhead Command
# ........ .......
#
7.61% gcc
6.11% as
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-report.c | 2 +-
tools/perf/util/event.c | 18 ++++++++----------
tools/perf/util/hist.c | 11 ++---------
tools/perf/util/hist.h | 7 +++++++
tools/perf/util/symbol.h | 2 +-
5 files changed, 19 insertions(+), 21 deletions(-)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index bf8dd2e893e4..6bfcb83db8ff 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -247,7 +247,7 @@ static int process_sample_event(struct perf_tool *tool,
return -1;
}
- if (al.filtered || (rep->hide_unresolved && al.sym == NULL))
+ if (rep->hide_unresolved && al.sym == NULL)
return 0;
if (rep->cpu_list && !test_bit(sample->cpu, rep->cpu_bitmap))
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 45a76c69a9ed..2c6a583bc2e9 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -3,6 +3,7 @@
#include "debug.h"
#include "machine.h"
#include "sort.h"
+#include "hist.h"
#include "string.h"
#include "strlist.h"
#include "thread.h"
@@ -663,7 +664,7 @@ void thread__find_addr_map(struct thread *thread,
al->thread = thread;
al->addr = addr;
al->cpumode = cpumode;
- al->filtered = false;
+ al->filtered = 0;
if (machine == NULL) {
al->map = NULL;
@@ -750,9 +751,6 @@ int perf_event__preprocess_sample(const union perf_event *event,
if (thread == NULL)
return -1;
- if (thread__is_filtered(thread))
- goto out_filtered;
-
dump_printf(" ... thread: %s:%d\n", thread__comm_str(thread), thread->tid);
/*
* Have we already created the kernel maps for this machine?
@@ -767,6 +765,10 @@ int perf_event__preprocess_sample(const union perf_event *event,
thread__find_addr_map(thread, machine, cpumode, MAP__FUNCTION,
sample->ip, al);
+
+ if (thread__is_filtered(thread))
+ al->filtered |= (1 << HIST_FILTER__THREAD);
+
dump_printf(" ...... dso: %s\n",
al->map ? al->map->dso->long_name :
al->level == 'H' ? "[hypervisor]" : "<not found>");
@@ -782,7 +784,7 @@ int perf_event__preprocess_sample(const union perf_event *event,
(dso->short_name != dso->long_name &&
strlist__has_entry(symbol_conf.dso_list,
dso->long_name)))))
- goto out_filtered;
+ al->filtered |= (1 << HIST_FILTER__DSO);
al->sym = map__find_symbol(al->map, al->addr,
machine->symbol_filter);
@@ -791,11 +793,7 @@ int perf_event__preprocess_sample(const union perf_event *event,
if (symbol_conf.sym_list &&
(!al->sym || !strlist__has_entry(symbol_conf.sym_list,
al->sym->name)))
- goto out_filtered;
-
- return 0;
+ al->filtered |= (1 << HIST_FILTER__SYMBOL);
-out_filtered:
- al->filtered = true;
return 0;
}
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 4ed3e883240d..64df6b96e7ea 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -13,13 +13,6 @@ static bool hists__filter_entry_by_thread(struct hists *hists,
static bool hists__filter_entry_by_symbol(struct hists *hists,
struct hist_entry *he);
-enum hist_filter {
- HIST_FILTER__DSO,
- HIST_FILTER__THREAD,
- HIST_FILTER__PARENT,
- HIST_FILTER__SYMBOL,
-};
-
struct callchain_param callchain_param = {
.mode = CHAIN_GRAPH_REL,
.min_percent = 0.5,
@@ -329,8 +322,8 @@ void hists__inc_nr_entries(struct hists *hists, struct hist_entry *h)
if (!h->filtered) {
hists__calc_col_len(hists, h);
++hists->nr_entries;
- hists->stats.total_period += h->stat.period;
}
+ hists->stats.total_period += h->stat.period;
}
static u8 symbol__parent_filter(const struct symbol *parent)
@@ -429,7 +422,7 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
.weight = weight,
},
.parent = sym_parent,
- .filtered = symbol__parent_filter(sym_parent),
+ .filtered = symbol__parent_filter(sym_parent) | al->filtered,
.hists = hists,
.branch_info = bi,
.mem_info = mi,
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index a59743fa3ef7..7d1d973d9a39 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -14,6 +14,13 @@ struct hist_entry;
struct addr_location;
struct symbol;
+enum hist_filter {
+ HIST_FILTER__DSO,
+ HIST_FILTER__THREAD,
+ HIST_FILTER__PARENT,
+ HIST_FILTER__SYMBOL,
+};
+
/*
* The kernel collects the number of events it couldn't send in a stretch and
* when possible sends this number in a PERF_RECORD_LOST event. The number of
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index cbd680361806..e42b410c3f11 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -170,7 +170,7 @@ struct addr_location {
struct symbol *sym;
u64 addr;
char level;
- bool filtered;
+ u8 filtered;
u8 cpumode;
s32 cpu;
};
--
1.7.11.7
machine__resolve_callchain() is called only if
symbol_conf.use_callchain is set, so there's no need to check it
again.
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/machine.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index a98538dc465a..4dcc89490858 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1313,8 +1313,6 @@ static int machine__resolve_callchain_sample(struct machine *machine,
*root_al = al;
callchain_cursor_reset(&callchain_cursor);
}
- if (!symbol_conf.use_callchain)
- break;
}
err = callchain_cursor_append(&callchain_cursor,
--
1.7.11.7
When filtering by thread, dso or symbol in the TUI, the total period
is also updated, so the output shows a different result than with no
filter - the percentages become relative to the filtered entries
only. Sometimes (always?) this is not desired, since users might
expect the same percentages when a filter is applied.
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/hist.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 64df6b96e7ea..266d9b238856 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -698,7 +698,6 @@ static void hists__remove_entry_filter(struct hists *hists, struct hist_entry *h
if (h->ms.unfolded)
hists->nr_entries += h->nr_rows;
h->row_offset = 0;
- hists->stats.total_period += h->stat.period;
hists->stats.nr_events[PERF_RECORD_SAMPLE] += h->stat.nr_events;
hists__calc_col_len(hists, h);
@@ -721,7 +720,7 @@ void hists__filter_by_dso(struct hists *hists)
{
struct rb_node *nd;
- hists->nr_entries = hists->stats.total_period = 0;
+ hists->nr_entries = 0;
hists->stats.nr_events[PERF_RECORD_SAMPLE] = 0;
hists__reset_col_len(hists);
@@ -754,7 +753,7 @@ void hists__filter_by_thread(struct hists *hists)
{
struct rb_node *nd;
- hists->nr_entries = hists->stats.total_period = 0;
+ hists->nr_entries = 0;
hists->stats.nr_events[PERF_RECORD_SAMPLE] = 0;
hists__reset_col_len(hists);
@@ -785,7 +784,7 @@ void hists__filter_by_symbol(struct hists *hists)
{
struct rb_node *nd;
- hists->nr_entries = hists->stats.total_period = 0;
+ hists->nr_entries = 0;
hists->stats.nr_events[PERF_RECORD_SAMPLE] = 0;
hists__reset_col_len(hists);
--
1.7.11.7
When accumulating a callchain entry, also save the current snapshot
of the chain so that it can show the rest of the chain.
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-report.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 087d01cac1aa..5f07da10f83c 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -363,7 +363,6 @@ static int
iter_prepare_cumulative_entry(struct hist_entry_iter *iter __maybe_unused,
struct addr_location *al __maybe_unused)
{
- struct callchain_cursor_node *node;
struct hist_entry **he_cache;
callchain_cursor_commit(&callchain_cursor);
@@ -380,10 +379,6 @@ iter_prepare_cumulative_entry(struct hist_entry_iter *iter __maybe_unused,
iter->priv = he_cache;
iter->curr = 0;
- node = callchain_cursor_current(&callchain_cursor);
- if (node == NULL)
- return 0;
-
return 0;
}
@@ -404,6 +399,14 @@ iter_add_single_cumulative_entry(struct hist_entry_iter *iter,
he_cache[iter->curr++] = he;
+ callchain_append(he->callchain, &callchain_cursor, sample->period);
+
+ /*
+ * We need to re-initialize the cursor since callchain_append()
+ * advanced the cursor to the end.
+ */
+ callchain_cursor_commit(&callchain_cursor);
+
return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
}
@@ -453,7 +456,6 @@ iter_next_cumulative_entry(struct hist_entry_iter *iter,
}
out:
- callchain_cursor_advance(&callchain_cursor);
return 1;
}
@@ -477,6 +479,11 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
.parent = iter->parent,
};
int i;
+ struct callchain_cursor cursor;
+
+ callchain_cursor_snapshot(&cursor, &callchain_cursor);
+
+ callchain_cursor_advance(&callchain_cursor);
/*
* Check if there's duplicate entries in the callchain.
@@ -495,6 +502,8 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
he_cache[iter->curr++] = he;
+ callchain_append(he->callchain, &cursor, sample->period);
+
return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
}
--
1.7.11.7
callchain_cursor_snapshot() saves the current state of the callchain
cursor. It'll be used to accumulate callchain information for each
node.
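For illustration, the intended usage (matching a later patch in this
series) looks like:

  struct callchain_cursor cursor;

  /* save the rest of the chain, starting at the current node */
  callchain_cursor_snapshot(&cursor, &callchain_cursor);

  /* ... then move the main cursor past that node */
  callchain_cursor_advance(&callchain_cursor);

Note that it does not copy the nodes themselves - the snapshot just
points into the source cursor's node list with an adjusted start and
count, so it's only valid as long as those nodes are.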
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/callchain.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 08b25af9eea1..bad694287cb8 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -151,4 +151,13 @@ int record_parse_callchain_opt(const struct option *opt, const char *arg, int un
int record_callchain_opt(const struct option *opt, const char *arg, int unset);
extern const char record_callchain_help[];
+
+static inline void callchain_cursor_snapshot(struct callchain_cursor *dest,
+ struct callchain_cursor *src)
+{
+ *dest = *src;
+
+ dest->first = src->curr;
+ dest->nr -= src->pos;
+}
#endif /* __PERF_CALLCHAIN_H */
--
1.7.11.7
It is possible that a callchain has cycles or recursive calls. In
that case some entries would end up with more than 100% overhead in
the output. In order to prevent such entries, cache each callchain
node and skip it if the same entry was already accumulated.
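For example, given a (hypothetical) recursive chain like:

  main -> f -> g -> f -> g

the nodes f and g each appear twice, and without the cache their
entries would be accumulated twice for a single sample, exceeding
100%.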
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-report.c | 54 ++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 51 insertions(+), 3 deletions(-)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 06330736485b..087d01cac1aa 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -363,7 +363,27 @@ static int
iter_prepare_cumulative_entry(struct hist_entry_iter *iter __maybe_unused,
struct addr_location *al __maybe_unused)
{
+ struct callchain_cursor_node *node;
+ struct hist_entry **he_cache;
+
callchain_cursor_commit(&callchain_cursor);
+
+ /*
+ * This is for detecting cycles or recursions so that they're
+ * cumulated only one time to prevent entries more than 100%
+ * overhead.
+ */
+ he_cache = malloc(sizeof(*he_cache) * (PERF_MAX_STACK_DEPTH + 1));
+ if (he_cache == NULL)
+ return -ENOMEM;
+
+ iter->priv = he_cache;
+ iter->curr = 0;
+
+ node = callchain_cursor_current(&callchain_cursor);
+ if (node == NULL)
+ return 0;
+
return 0;
}
@@ -373,6 +393,7 @@ iter_add_single_cumulative_entry(struct hist_entry_iter *iter,
{
struct perf_evsel *evsel = iter->evsel;
struct perf_sample *sample = iter->sample;
+ struct hist_entry **he_cache = iter->priv;
struct hist_entry *he;
he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
@@ -381,6 +402,8 @@ iter_add_single_cumulative_entry(struct hist_entry_iter *iter,
if (he == NULL)
return -ENOMEM;
+ he_cache[iter->curr++] = he;
+
return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
}
@@ -408,8 +431,8 @@ iter_next_cumulative_entry(struct hist_entry_iter *iter,
goto out;
}
- if (al->map->groups == &iter->machine->kmaps) {
- if (machine__is_host(iter->machine)) {
+ if (al->map->groups == &al->machine->kmaps) {
+ if (machine__is_host(al->machine)) {
al->cpumode = PERF_RECORD_MISC_KERNEL;
al->level = 'k';
} else {
@@ -417,7 +440,7 @@ iter_next_cumulative_entry(struct hist_entry_iter *iter,
al->level = 'g';
}
} else {
- if (machine__is_host(iter->machine)) {
+ if (machine__is_host(al->machine)) {
al->cpumode = PERF_RECORD_MISC_USER;
al->level = '.';
} else if (perf_guest) {
@@ -440,7 +463,29 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
{
struct perf_evsel *evsel = iter->evsel;
struct perf_sample *sample = iter->sample;
+ struct hist_entry **he_cache = iter->priv;
struct hist_entry *he;
+ struct hist_entry he_tmp = {
+ .cpu = al->cpu,
+ .thread = al->thread,
+ .comm = thread__comm(al->thread),
+ .ip = al->addr,
+ .ms = {
+ .map = al->map,
+ .sym = al->sym,
+ },
+ .parent = iter->parent,
+ };
+ int i;
+
+ /*
+ * Check if there's duplicate entries in the callchain.
+ * It's possible that it has cycles or recursive calls.
+ */
+ for (i = 0; i < iter->curr; i++) {
+ if (hist_entry__cmp(he_cache[i], &he_tmp) == 0)
+ return 0;
+ }
he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
sample->period, sample->weight,
@@ -448,6 +493,8 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
if (he == NULL)
return -ENOMEM;
+ he_cache[iter->curr++] = he;
+
return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
}
@@ -461,6 +508,7 @@ iter_finish_cumulative_entry(struct hist_entry_iter *iter,
evsel->hists.stats.total_period += sample->period;
hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+ zfree(&iter->priv);
return 0;
}
--
1.7.11.7
The cpumode and level in struct addr_location were set for the sample
itself but not updated as cumulative callchain entries were added.
This led to non-matching symbol and cpumode in the output.

Update them based on whether the map is a part of the kernel or not.
This is the reverse of what thread__find_addr_map() does.
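The mapping applied (matching the diff below) is:

  kernel map, host machine   ->  PERF_RECORD_MISC_KERNEL        level 'k'
  kernel map, guest machine  ->  PERF_RECORD_MISC_GUEST_KERNEL  level 'g'
  user map, host machine     ->  PERF_RECORD_MISC_USER          level '.'
  user map, perf_guest       ->  PERF_RECORD_MISC_GUEST_USER    level 'u'
  otherwise                  ->  PERF_RECORD_MISC_HYPERVISOR    level 'H'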
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-report.c | 30 ++++++++++++++++++++++++++++--
1 file changed, 28 insertions(+), 2 deletions(-)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 27e477df0730..06330736485b 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -401,9 +401,35 @@ iter_next_cumulative_entry(struct hist_entry_iter *iter,
else
al->addr = node->ip;
- if (iter->rep->hide_unresolved && al->sym == NULL)
- return 0;
+ if (al->sym == NULL) {
+ if (iter->rep->hide_unresolved)
+ return 0;
+ if (al->map == NULL)
+ goto out;
+ }
+
+ if (al->map->groups == &iter->machine->kmaps) {
+ if (machine__is_host(iter->machine)) {
+ al->cpumode = PERF_RECORD_MISC_KERNEL;
+ al->level = 'k';
+ } else {
+ al->cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
+ al->level = 'g';
+ }
+ } else {
+ if (machine__is_host(iter->machine)) {
+ al->cpumode = PERF_RECORD_MISC_USER;
+ al->level = '.';
+ } else if (perf_guest) {
+ al->cpumode = PERF_RECORD_MISC_GUEST_USER;
+ al->level = 'u';
+ } else {
+ al->cpumode = PERF_RECORD_MISC_HYPERVISOR;
+ al->level = 'H';
+ }
+ }
+out:
callchain_cursor_advance(&callchain_cursor);
return 1;
}
--
1.7.11.7
When callchain accumulation is requested, we need to sort the entries
by the accumulated period value. When the accumulated periods of two
entries are the same (i.e. a single-path callchain), put the caller
above the callee, since accumulation tends to put callers in higher
positions for the obvious reason. (A caller's remaining callchain is
shorter, so the entry with the smaller max_depth ranks higher.)
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/hist.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index a6bb3e329857..daf39fc8fe33 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -624,6 +624,18 @@ static int hist_entry__sort_on_period(struct hist_entry *a,
struct hist_entry *pair;
u64 *periods_a, *periods_b;
+ if (symbol_conf.cumulate_callchain) {
+ /*
+ * Put caller above callee when they have equal period.
+ */
+ if (a->stat_acc->period != b->stat_acc->period)
+ return a->stat_acc->period > b->stat_acc->period ? 1 : -1;
+
+ if (a->callchain->max_depth != b->callchain->max_depth)
+ return a->callchain->max_depth < b->callchain->max_depth ?
+ 1 : -1;
+ }
+
ret = period_cmp(a->stat.period, b->stat.period);
if (ret || !symbol_conf.event_group)
return ret;
--
1.7.11.7
Call __hists__add_entry() for each callchain node to get accumulated
stats for an entry. Introduce the new cumulative_iter ops to process
them properly.
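For reference, these ops are invoked by iter_add_entry() in this
order (error handling omitted):

  prepare_entry(iter, al);
  add_single_entry(iter, al);         /* the sampled location itself */
  while (next_entry(iter, al))
          add_next_entry(iter, al);   /* one per remaining callchain node */
  finish_entry(iter, al);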
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-report.c | 95 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 94 insertions(+), 1 deletion(-)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 80c50f243f34..27e477df0730 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -79,7 +79,11 @@ static int report__resolve_callchain(struct report *rep, struct symbol **parent,
struct perf_evsel *evsel, struct addr_location *al,
struct perf_sample *sample)
{
- if ((sort__has_parent || symbol_conf.use_callchain) && sample->callchain) {
+ if (sample->callchain == NULL)
+ return 0;
+
+ if (symbol_conf.use_callchain || symbol_conf.cumulate_callchain ||
+ sort__has_parent) {
return machine__resolve_callchain(al->machine, evsel, al->thread, sample,
parent, al, rep->max_stack);
}
@@ -355,6 +359,85 @@ iter_finish_normal_entry(struct hist_entry_iter *iter, struct addr_location *al)
return hist_entry__append_callchain(he, sample);
}
+static int
+iter_prepare_cumulative_entry(struct hist_entry_iter *iter __maybe_unused,
+ struct addr_location *al __maybe_unused)
+{
+ callchain_cursor_commit(&callchain_cursor);
+ return 0;
+}
+
+static int
+iter_add_single_cumulative_entry(struct hist_entry_iter *iter,
+ struct addr_location *al)
+{
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+ struct hist_entry *he;
+
+ he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
+ sample->period, sample->weight,
+ sample->transaction, true);
+ if (he == NULL)
+ return -ENOMEM;
+
+ return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+}
+
+static int
+iter_next_cumulative_entry(struct hist_entry_iter *iter,
+ struct addr_location *al)
+{
+ struct callchain_cursor_node *node;
+
+ node = callchain_cursor_current(&callchain_cursor);
+ if (node == NULL)
+ return 0;
+
+ al->map = node->map;
+ al->sym = node->sym;
+ if (node->map)
+ al->addr = node->map->map_ip(node->map, node->ip);
+ else
+ al->addr = node->ip;
+
+ if (iter->rep->hide_unresolved && al->sym == NULL)
+ return 0;
+
+ callchain_cursor_advance(&callchain_cursor);
+ return 1;
+}
+
+static int
+iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
+ struct addr_location *al)
+{
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+ struct hist_entry *he;
+
+ he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
+ sample->period, sample->weight,
+ sample->transaction, false);
+ if (he == NULL)
+ return -ENOMEM;
+
+ return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+}
+
+static int
+iter_finish_cumulative_entry(struct hist_entry_iter *iter,
+ struct addr_location *al __maybe_unused)
+{
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+
+ evsel->hists.stats.total_period += sample->period;
+ hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+ return 0;
+}
+
static struct hist_entry_iter mem_iter = {
.prepare_entry = iter_prepare_mem_entry,
.add_single_entry = iter_add_single_mem_entry,
@@ -379,6 +462,14 @@ static struct hist_entry_iter normal_iter = {
.finish_entry = iter_finish_normal_entry,
};
+static struct hist_entry_iter cumulative_iter = {
+ .prepare_entry = iter_prepare_cumulative_entry,
+ .add_single_entry = iter_add_single_cumulative_entry,
+ .next_entry = iter_next_cumulative_entry,
+ .add_next_entry = iter_add_next_cumulative_entry,
+ .finish_entry = iter_finish_cumulative_entry,
+};
+
static int
iter_add_entry(struct hist_entry_iter *iter, struct addr_location *al)
{
@@ -438,6 +529,8 @@ static int process_sample_event(struct perf_tool *tool,
iter = &branch_iter;
else if (rep->mem_mode == 1)
iter = &mem_iter;
+ else if (symbol_conf.cumulate_callchain)
+ iter = &cumulative_iter;
else
iter = &normal_iter;
--
1.7.11.7
The new ->add_entry_cb() will be called after an entry is added to
the histogram. It's used to share code between perf report and perf
top.

Also pass @arg to the callback function; it'll be used by perf top
later.
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-report.c | 30 +++++++++++++++++++++++++++---
tools/perf/util/hist.c | 38 +++++++++++++++++++-------------------
tools/perf/util/hist.h | 6 +++++-
3 files changed, 51 insertions(+), 23 deletions(-)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 2d603304e779..896d80c86347 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -79,6 +79,27 @@ static int report__config(const char *var, const char *value, void *cb)
return perf_default_config(var, value, cb);
}
+static int hist_iter_cb(struct hist_entry_iter *iter,
+ struct addr_location *al, bool single,
+ void *arg __maybe_unused)
+{
+ int err;
+ struct hist_entry *he = iter->he;
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+
+ err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+
+ if (!single)
+ goto out;
+
+ evsel->hists.stats.total_period += sample->period;
+ hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+out:
+ return err;
+}
+
static int process_sample_event(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
@@ -106,16 +127,19 @@ static int process_sample_event(struct perf_tool *tool,
iter = &hist_iter_branch;
else if (rep->mem_mode == 1)
iter = &hist_iter_mem;
- else if (symbol_conf.cumulate_callchain)
+ else if (symbol_conf.cumulate_callchain) {
iter = &hist_iter_cumulative;
- else
+ iter->add_entry_cb = hist_iter_cb;
+ } else {
iter = &hist_iter_normal;
+ iter->add_entry_cb = hist_iter_cb;
+ }
if (al.map != NULL)
al.map->dso->hit = 1;
ret = hist_entry_iter__add(iter, &al, evsel, event, sample,
- rep->hide_unresolved, rep->max_stack);
+ rep->hide_unresolved, rep->max_stack, NULL);
if (ret < 0)
pr_debug("problem adding hist entry, skipping event\n");
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index c1f5b664545a..3408b5095f0c 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -682,11 +682,10 @@ iter_add_single_normal_entry(struct hist_entry_iter *iter, struct addr_location
}
static int
-iter_finish_normal_entry(struct hist_entry_iter *iter, struct addr_location *al)
+iter_finish_normal_entry(struct hist_entry_iter *iter,
+ struct addr_location *al __maybe_unused)
{
- int err;
struct hist_entry *he = iter->he;
- struct perf_evsel *evsel = iter->evsel;
struct perf_sample *sample = iter->sample;
if (he == NULL)
@@ -694,13 +693,6 @@ iter_finish_normal_entry(struct hist_entry_iter *iter, struct addr_location *al)
iter->he = NULL;
- err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
- if (err)
- return err;
-
- evsel->hists.stats.total_period += sample->period;
- hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
-
return hist_entry__append_callchain(he, sample);
}
@@ -742,6 +734,7 @@ iter_add_single_cumulative_entry(struct hist_entry_iter *iter,
if (he == NULL)
return -ENOMEM;
+ iter->he = he;
he_cache[iter->curr++] = he;
callchain_append(he->callchain, &callchain_cursor, sample->period);
@@ -752,7 +745,7 @@ iter_add_single_cumulative_entry(struct hist_entry_iter *iter,
*/
callchain_cursor_commit(&callchain_cursor);
- return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+ return 0;
}
static int
@@ -809,23 +802,18 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
if (he == NULL)
return -ENOMEM;
+ iter->he = he;
he_cache[iter->curr++] = he;
callchain_append(he->callchain, &cursor, sample->period);
- return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+ return 0;
}
static int
iter_finish_cumulative_entry(struct hist_entry_iter *iter,
struct addr_location *al __maybe_unused)
{
- struct perf_evsel *evsel = iter->evsel;
- struct perf_sample *sample = iter->sample;
-
- evsel->hists.stats.total_period += sample->period;
- hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
-
zfree(&iter->priv);
return 0;
}
@@ -865,7 +853,7 @@ struct hist_entry_iter hist_iter_cumulative = {
int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
struct perf_evsel *evsel, const union perf_event *event,
struct perf_sample *sample, bool hide_unresolved,
- int max_stack_depth)
+ int max_stack_depth, void *arg)
{
int err, err2;
@@ -887,10 +875,22 @@ int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
if (err)
goto out;
+ if (iter->add_entry_cb) {
+ err = iter->add_entry_cb(iter, al, true, arg);
+ if (err)
+ goto out;
+ }
+
while (iter->next_entry(iter, al)) {
err = iter->add_next_entry(iter, al);
if (err)
break;
+
+ if (iter->add_entry_cb) {
+ err = iter->add_entry_cb(iter, al, false, arg);
+ if (err)
+ goto out;
+ }
}
out:
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 95fff1cd6727..90a94d6f0cae 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -109,6 +109,10 @@ struct hist_entry_iter {
int (*next_entry)(struct hist_entry_iter *, struct addr_location *);
int (*add_next_entry)(struct hist_entry_iter *, struct addr_location *);
int (*finish_entry)(struct hist_entry_iter *, struct addr_location *);
+
+ /* user-defined callback function (optional) */
+ int (*add_entry_cb)(struct hist_entry_iter *iter,
+ struct addr_location *al, bool single, void *arg);
};
extern struct hist_entry_iter hist_iter_normal;
@@ -126,7 +130,7 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
struct perf_evsel *evsel, const union perf_event *event,
struct perf_sample *sample, bool hide_unresolved,
- int max_stack_depth);
+ int max_stack_depth, void *arg);
int64_t hist_entry__cmp(struct hist_entry *left, struct hist_entry *right);
int64_t hist_entry__collapse(struct hist_entry *left, struct hist_entry *right);
--
1.7.11.7
fill_callchain_info() will be shared with perf top --children.
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-report.c | 38 +-------------------------------------
tools/perf/util/callchain.c | 42 ++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/callchain.h | 2 ++
3 files changed, 45 insertions(+), 37 deletions(-)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 06eceab8d4af..ef9c67ff9e32 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -402,43 +402,7 @@ iter_next_cumulative_entry(struct hist_entry_iter *iter,
if (node == NULL)
return 0;
- al->map = node->map;
- al->sym = node->sym;
- if (node->map)
- al->addr = node->map->map_ip(node->map, node->ip);
- else
- al->addr = node->ip;
-
- if (al->sym == NULL) {
- if (iter->rep->hide_unresolved)
- return 0;
- if (al->map == NULL)
- goto out;
- }
-
- if (al->map->groups == &al->machine->kmaps) {
- if (machine__is_host(al->machine)) {
- al->cpumode = PERF_RECORD_MISC_KERNEL;
- al->level = 'k';
- } else {
- al->cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
- al->level = 'g';
- }
- } else {
- if (machine__is_host(al->machine)) {
- al->cpumode = PERF_RECORD_MISC_USER;
- al->level = '.';
- } else if (perf_guest) {
- al->cpumode = PERF_RECORD_MISC_GUEST_USER;
- al->level = 'u';
- } else {
- al->cpumode = PERF_RECORD_MISC_HYPERVISOR;
- al->level = 'H';
- }
- }
-
-out:
- return 1;
+ return fill_callchain_info(al, node, iter->rep->hide_unresolved);
}
static int
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 92b9010a651c..0799759a8e27 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -555,3 +555,45 @@ int hist_entry__append_callchain(struct hist_entry *he, struct perf_sample *samp
return 0;
return callchain_append(he->callchain, &callchain_cursor, sample->period);
}
+
+int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *node,
+ bool hide_unresolved)
+{
+ al->map = node->map;
+ al->sym = node->sym;
+ if (node->map)
+ al->addr = node->map->map_ip(node->map, node->ip);
+ else
+ al->addr = node->ip;
+
+ if (al->sym == NULL) {
+ if (hide_unresolved)
+ return 0;
+ if (al->map == NULL)
+ goto out;
+ }
+
+ if (al->map->groups == &al->machine->kmaps) {
+ if (machine__is_host(al->machine)) {
+ al->cpumode = PERF_RECORD_MISC_KERNEL;
+ al->level = 'k';
+ } else {
+ al->cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
+ al->level = 'g';
+ }
+ } else {
+ if (machine__is_host(al->machine)) {
+ al->cpumode = PERF_RECORD_MISC_USER;
+ al->level = '.';
+ } else if (perf_guest) {
+ al->cpumode = PERF_RECORD_MISC_GUEST_USER;
+ al->level = 'u';
+ } else {
+ al->cpumode = PERF_RECORD_MISC_HYPERVISOR;
+ al->level = 'H';
+ }
+ }
+
+out:
+ return 1;
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 209538924cb6..bbd63dfbe112 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -155,6 +155,8 @@ int sample__resolve_callchain(struct perf_sample *sample, struct symbol **parent
struct perf_evsel *evsel, struct addr_location *al,
int max_stack);
int hist_entry__append_callchain(struct hist_entry *he, struct perf_sample *sample);
+int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *node,
+ bool hide_unresolved);
extern const char record_callchain_help[];
--
1.7.11.7
The --children option is for showing the accumulated overhead
(period) value as well as the self overhead. It should be used with
the -g or --call-graph option.
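For example (assuming call-graph recording is enabled via -g):

  $ perf top -g --children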
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/Documentation/perf-top.txt | 6 ++++++
tools/perf/builtin-top.c | 7 +++++++
2 files changed, 13 insertions(+)
diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index cdd8d4946dba..01b6fd1a4428 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -149,6 +149,12 @@ Default is to monitor all CPUS.
Setup and enable call-graph (stack chain/backtrace) recording,
implies -g.
+--children::
+ Accumulate callchain of children to parent entry so that they can
+ show up in the output. The output will have a new "Children" column
+ and will be sorted on the data. It requires -g/--call-graph option
+ enabled.
+
--max-stack::
Set the stack depth limit when parsing the callchain, anything
beyond the specified depth will be ignored. This is a trade-off
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index cf330c66bed7..fa34bcbca587 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1078,6 +1078,8 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
OPT_CALLBACK(0, "call-graph", &top.record_opts,
"mode[,dump_size]", record_callchain_help,
&parse_callchain_opt),
+ OPT_BOOLEAN(0, "children", &symbol_conf.cumulate_callchain,
+ "Accumulate callchains of children and show total overhead as well"),
OPT_INTEGER(0, "max-stack", &top.max_stack,
"Set the maximum stack depth when parsing the callchain. "
"Default: " __stringify(PERF_MAX_STACK_DEPTH)),
@@ -1177,6 +1179,11 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
top.sym_evsel = perf_evlist__first(top.evlist);
+ if (!symbol_conf.use_callchain) {
+ symbol_conf.cumulate_callchain = false;
+ perf_hpp__cancel_cumulate();
+ }
+
symbol_conf.priv_size = sizeof(struct annotation);
symbol_conf.try_vmlinux_path = (symbol_conf.vmlinux_name == NULL);
--
1.7.11.7
Reuse the hist_entry_iter__add() function to share similar code with
perf report. Note that it needs to be called with hists.lock held, so
tweak some internal functions to avoid deadlocking or holding the
lock too long.
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-top.c | 75 ++++++++++++++++++++++++------------------------
1 file changed, 37 insertions(+), 38 deletions(-)
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index f0f55e6030cd..cf330c66bed7 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -186,9 +186,6 @@ static void perf_top__record_precise_ip(struct perf_top *top,
sym = he->ms.sym;
notes = symbol__annotation(sym);
- if (pthread_mutex_trylock(&notes->lock))
- return;
-
ip = he->ms.map->map_ip(he->ms.map, ip);
err = hist_entry__inc_addr_samples(he, counter, ip);
@@ -201,6 +198,8 @@ static void perf_top__record_precise_ip(struct perf_top *top,
sym->name);
sleep(1);
}
+
+ pthread_mutex_lock(&notes->lock);
}
static void perf_top__show_details(struct perf_top *top)
@@ -236,24 +235,6 @@ out_unlock:
pthread_mutex_unlock(&notes->lock);
}
-static struct hist_entry *perf_evsel__add_hist_entry(struct perf_evsel *evsel,
- struct addr_location *al,
- struct perf_sample *sample)
-{
- struct hist_entry *he;
-
- pthread_mutex_lock(&evsel->hists.lock);
- he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL,
- sample->period, sample->weight,
- sample->transaction, true);
- pthread_mutex_unlock(&evsel->hists.lock);
- if (he == NULL)
- return NULL;
-
- hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
- return he;
-}
-
static void perf_top__print_sym_table(struct perf_top *top)
{
char bf[160];
@@ -657,6 +638,28 @@ static int symbol_filter(struct map *map __maybe_unused, struct symbol *sym)
return 0;
}
+static int hist_iter_cb(struct hist_entry_iter *iter, struct addr_location *al,
+ bool single, void *arg)
+{
+ struct perf_top *top = arg;
+ struct hist_entry *he = iter->he;
+ struct perf_evsel *evsel = iter->evsel;
+
+ if (single)
+ hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+ if (sort__has_sym) {
+ u64 ip = al->addr;
+
+ if (al->map)
+ ip = al->map->unmap_ip(al->map, ip);
+
+ perf_top__record_precise_ip(top, he, evsel->idx, ip);
+ }
+
+ return 0;
+}
+
static void perf_event__process_sample(struct perf_tool *tool,
const union perf_event *event,
struct perf_evsel *evsel,
@@ -664,8 +667,6 @@ static void perf_event__process_sample(struct perf_tool *tool,
struct machine *machine)
{
struct perf_top *top = container_of(tool, struct perf_top, tool);
- struct symbol *parent = NULL;
- u64 ip = sample->ip;
struct addr_location al;
int err;
@@ -741,25 +742,23 @@ static void perf_event__process_sample(struct perf_tool *tool,
}
if (al.sym == NULL || !al.sym->ignore) {
- struct hist_entry *he;
+ struct hist_entry_iter *iter;
- err = sample__resolve_callchain(sample, &parent, evsel, &al,
- top->max_stack);
- if (err)
- return;
+ if (symbol_conf.cumulate_callchain)
+ iter = &hist_iter_cumulative;
+ else
+ iter = &hist_iter_normal;
- he = perf_evsel__add_hist_entry(evsel, &al, sample);
- if (he == NULL) {
- pr_err("Problem incrementing symbol period, skipping event\n");
- return;
- }
+ iter->add_entry_cb = hist_iter_cb;
- err = hist_entry__append_callchain(he, sample);
- if (err)
- return;
+ pthread_mutex_lock(&evsel->hists.lock);
+
+ err = hist_entry_iter__add(iter, &al, evsel, event, sample,
+ false, top->max_stack, top);
+ if (err < 0)
+ pr_err("Problem incrementing symbol period, skipping event\n");
- if (sort__has_sym)
- perf_top__record_precise_ip(top, he, evsel->idx, ip);
+ pthread_mutex_unlock(&evsel->hists.lock);
}
return;
--
1.7.11.7
Now the hist_entry_iter code will be shared with the perf top code
base, so move it to util/hist.c and do some necessary cleanups and
renames.
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-report.c | 468 +-------------------------------------------
tools/perf/util/hist.c | 441 +++++++++++++++++++++++++++++++++++++++++
tools/perf/util/hist.h | 30 +++
3 files changed, 477 insertions(+), 462 deletions(-)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index ef9c67ff9e32..2d603304e779 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -79,458 +79,6 @@ static int report__config(const char *var, const char *value, void *cb)
return perf_default_config(var, value, cb);
}
-struct hist_entry_iter {
- int total;
- int curr;
-
- struct report *rep;
- union perf_event *event;
- struct perf_evsel *evsel;
- struct perf_sample *sample;
- struct hist_entry *he;
- struct symbol *parent;
- void *priv;
-
- int (*prepare_entry)(struct hist_entry_iter *, struct addr_location *);
- int (*add_single_entry)(struct hist_entry_iter *, struct addr_location *);
- int (*next_entry)(struct hist_entry_iter *, struct addr_location *);
- int (*add_next_entry)(struct hist_entry_iter *, struct addr_location *);
- int (*finish_entry)(struct hist_entry_iter *, struct addr_location *);
-};
-
-static int
-iter_next_nop_entry(struct hist_entry_iter *iter __maybe_unused,
- struct addr_location *al __maybe_unused)
-{
- return 0;
-}
-
-static int
-iter_add_next_nop_entry(struct hist_entry_iter *iter __maybe_unused,
- struct addr_location *al __maybe_unused)
-{
- return 0;
-}
-
-static int
-iter_prepare_mem_entry(struct hist_entry_iter *iter, struct addr_location *al)
-{
- union perf_event *event = iter->event;
- struct perf_sample *sample = iter->sample;
- struct mem_info *mi;
- u8 cpumode;
-
- cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
-
- mi = machine__resolve_mem(al->machine, al->thread, sample, cpumode);
- if (mi == NULL)
- return -ENOMEM;
-
- iter->priv = mi;
- return 0;
-}
-
-static int
-iter_add_single_mem_entry(struct hist_entry_iter *iter, struct addr_location *al)
-{
- u64 cost;
- struct mem_info *mi = iter->priv;
- struct hist_entry *he;
-
- if (mi == NULL)
- return -EINVAL;
-
- cost = iter->sample->weight;
- if (!cost)
- cost = 1;
-
- /*
- * must pass period=weight in order to get the correct
- * sorting from hists__collapse_resort() which is solely
- * based on periods. We want sorting be done on nr_events * weight
- * and this is indirectly achieved by passing period=weight here
- * and the he_stat__add_period() function.
- */
- he = __hists__add_entry(&iter->evsel->hists, al, iter->parent, NULL, mi,
- cost, cost, 0, true);
- if (!he)
- return -ENOMEM;
-
- iter->he = he;
- return 0;
-}
-
-static int
-iter_finish_mem_entry(struct hist_entry_iter *iter, struct addr_location *al)
-{
- struct perf_evsel *evsel = iter->evsel;
- struct hist_entry *he = iter->he;
- struct mem_info *mx;
- int err = -EINVAL;
- u64 cost;
-
- if (he == NULL)
- goto out;
-
- err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
- if (err)
- goto out;
-
- mx = he->mem_info;
- err = addr_map_symbol__inc_samples(&mx->daddr, evsel->idx);
- if (err)
- goto out;
-
- cost = iter->sample->weight;
- if (!cost)
- cost = 1;
-
- evsel->hists.stats.total_period += cost;
- hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
-
- err = hist_entry__append_callchain(he, iter->sample);
-
-out:
- /*
- * We don't need to free iter->priv (mem_info) here since
- * the mem info was either already freed in add_hist_entry() or
- * passed to a new hist entry by hist_entry__new().
- */
- iter->priv = NULL;
-
- iter->he = NULL;
- return err;
-}
-
-static int
-iter_prepare_branch_entry(struct hist_entry_iter *iter, struct addr_location *al)
-{
- struct branch_info *bi;
- struct perf_sample *sample = iter->sample;
-
- bi = machine__resolve_bstack(al->machine, al->thread,
- sample->branch_stack);
- if (!bi)
- return -ENOMEM;
-
- iter->curr = 0;
- iter->total = sample->branch_stack->nr;
-
- iter->priv = bi;
- return 0;
-}
-
-static int
-iter_add_single_branch_entry(struct hist_entry_iter *iter __maybe_unused,
- struct addr_location *al __maybe_unused)
-{
- return 0;
-}
-
-static int
-iter_next_branch_entry(struct hist_entry_iter *iter, struct addr_location *al)
-{
- struct branch_info *bi = iter->priv;
- int i = iter->curr;
-
- if (bi == NULL)
- return 0;
-
- if (iter->curr >= iter->total)
- return 0;
-
- al->map = bi[i].to.map;
- al->sym = bi[i].to.sym;
- al->addr = bi[i].to.addr;
- return 1;
-}
-
-static int
-iter_add_next_branch_entry(struct hist_entry_iter *iter, struct addr_location *al)
-{
- struct branch_info *bi, *bx;
- struct perf_evsel *evsel = iter->evsel;
- struct hist_entry *he;
- int i = iter->curr;
- int err = 0;
-
- bi = iter->priv;
-
- if (iter->rep->hide_unresolved && !(bi[i].from.sym && bi[i].to.sym))
- goto out;
-
- /*
- * The report shows the percentage of total branches captured
- * and not events sampled. Thus we use a pseudo period of 1.
- */
- he = __hists__add_entry(&evsel->hists, al, iter->parent, &bi[i], NULL,
- 1, 1, 0, true);
- if (he == NULL)
- return -ENOMEM;
-
- bx = he->branch_info;
- err = addr_map_symbol__inc_samples(&bx->from, evsel->idx);
- if (err)
- goto out;
-
- err = addr_map_symbol__inc_samples(&bx->to, evsel->idx);
- if (err)
- goto out;
-
- evsel->hists.stats.total_period += 1;
- hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
-
-out:
- iter->curr++;
- return err;
-}
-
-static int
-iter_finish_branch_entry(struct hist_entry_iter *iter,
- struct addr_location *al __maybe_unused)
-{
- zfree(&iter->priv);
-
- return iter->curr >= iter->total ? 0 : -1;
-}
-
-static int
-iter_prepare_normal_entry(struct hist_entry_iter *iter __maybe_unused,
- struct addr_location *al __maybe_unused)
-{
- return 0;
-}
-
-static int
-iter_add_single_normal_entry(struct hist_entry_iter *iter, struct addr_location *al)
-{
- struct perf_evsel *evsel = iter->evsel;
- struct perf_sample *sample = iter->sample;
- struct hist_entry *he;
-
- he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
- sample->period, sample->weight,
- sample->transaction, true);
- if (he == NULL)
- return -ENOMEM;
-
- iter->he = he;
- return 0;
-}
-
-static int
-iter_finish_normal_entry(struct hist_entry_iter *iter, struct addr_location *al)
-{
- int err;
- struct hist_entry *he = iter->he;
- struct perf_evsel *evsel = iter->evsel;
- struct perf_sample *sample = iter->sample;
-
- if (he == NULL)
- return 0;
-
- iter->he = NULL;
-
- err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
- if (err)
- return err;
-
- evsel->hists.stats.total_period += sample->period;
- hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
-
- return hist_entry__append_callchain(he, sample);
-}
-
-static int
-iter_prepare_cumulative_entry(struct hist_entry_iter *iter __maybe_unused,
- struct addr_location *al __maybe_unused)
-{
- struct hist_entry **he_cache;
-
- callchain_cursor_commit(&callchain_cursor);
-
- /*
- * This is for detecting cycles or recursions so that they're
- * cumulated only one time to prevent entries more than 100%
- * overhead.
- */
- he_cache = malloc(sizeof(*he_cache) * (PERF_MAX_STACK_DEPTH + 1));
- if (he_cache == NULL)
- return -ENOMEM;
-
- iter->priv = he_cache;
- iter->curr = 0;
-
- return 0;
-}
-
-static int
-iter_add_single_cumulative_entry(struct hist_entry_iter *iter,
- struct addr_location *al)
-{
- struct perf_evsel *evsel = iter->evsel;
- struct perf_sample *sample = iter->sample;
- struct hist_entry **he_cache = iter->priv;
- struct hist_entry *he;
-
- he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
- sample->period, sample->weight,
- sample->transaction, true);
- if (he == NULL)
- return -ENOMEM;
-
- he_cache[iter->curr++] = he;
-
- callchain_append(he->callchain, &callchain_cursor, sample->period);
-
- /*
- * We need to re-initialize the cursor since callchain_append()
- * advanced the cursor to the end.
- */
- callchain_cursor_commit(&callchain_cursor);
-
- return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
-}
-
-static int
-iter_next_cumulative_entry(struct hist_entry_iter *iter,
- struct addr_location *al)
-{
- struct callchain_cursor_node *node;
-
- node = callchain_cursor_current(&callchain_cursor);
- if (node == NULL)
- return 0;
-
- return fill_callchain_info(al, node, iter->rep->hide_unresolved);
-}
-
-static int
-iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
- struct addr_location *al)
-{
- struct perf_evsel *evsel = iter->evsel;
- struct perf_sample *sample = iter->sample;
- struct hist_entry **he_cache = iter->priv;
- struct hist_entry *he;
- struct hist_entry he_tmp = {
- .cpu = al->cpu,
- .thread = al->thread,
- .comm = thread__comm(al->thread),
- .ip = al->addr,
- .ms = {
- .map = al->map,
- .sym = al->sym,
- },
- .parent = iter->parent,
- };
- int i;
- struct callchain_cursor cursor;
-
- callchain_cursor_snapshot(&cursor, &callchain_cursor);
-
- callchain_cursor_advance(&callchain_cursor);
-
- /*
- * Check if there's duplicate entries in the callchain.
- * It's possible that it has cycles or recursive calls.
- */
- for (i = 0; i < iter->curr; i++) {
- if (hist_entry__cmp(he_cache[i], &he_tmp) == 0)
- return 0;
- }
-
- he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
- sample->period, sample->weight,
- sample->transaction, false);
- if (he == NULL)
- return -ENOMEM;
-
- he_cache[iter->curr++] = he;
-
- callchain_append(he->callchain, &cursor, sample->period);
-
- return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
-}
-
-static int
-iter_finish_cumulative_entry(struct hist_entry_iter *iter,
- struct addr_location *al __maybe_unused)
-{
- struct perf_evsel *evsel = iter->evsel;
- struct perf_sample *sample = iter->sample;
-
- evsel->hists.stats.total_period += sample->period;
- hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
-
- zfree(&iter->priv);
- return 0;
-}
-
-static struct hist_entry_iter mem_iter = {
- .prepare_entry = iter_prepare_mem_entry,
- .add_single_entry = iter_add_single_mem_entry,
- .next_entry = iter_next_nop_entry,
- .add_next_entry = iter_add_next_nop_entry,
- .finish_entry = iter_finish_mem_entry,
-};
-
-static struct hist_entry_iter branch_iter = {
- .prepare_entry = iter_prepare_branch_entry,
- .add_single_entry = iter_add_single_branch_entry,
- .next_entry = iter_next_branch_entry,
- .add_next_entry = iter_add_next_branch_entry,
- .finish_entry = iter_finish_branch_entry,
-};
-
-static struct hist_entry_iter normal_iter = {
- .prepare_entry = iter_prepare_normal_entry,
- .add_single_entry = iter_add_single_normal_entry,
- .next_entry = iter_next_nop_entry,
- .add_next_entry = iter_add_next_nop_entry,
- .finish_entry = iter_finish_normal_entry,
-};
-
-static struct hist_entry_iter cumulative_iter = {
- .prepare_entry = iter_prepare_cumulative_entry,
- .add_single_entry = iter_add_single_cumulative_entry,
- .next_entry = iter_next_cumulative_entry,
- .add_next_entry = iter_add_next_cumulative_entry,
- .finish_entry = iter_finish_cumulative_entry,
-};
-
-static int
-iter_add_entry(struct hist_entry_iter *iter, struct addr_location *al)
-{
- int err, err2;
-
- err = sample__resolve_callchain(iter->sample, &iter->parent,
- iter->evsel, al, iter->rep->max_stack);
- if (err)
- return err;
-
- err = iter->prepare_entry(iter, al);
- if (err)
- goto out;
-
- err = iter->add_single_entry(iter, al);
- if (err)
- goto out;
-
- while (iter->next_entry(iter, al)) {
- err = iter->add_next_entry(iter, al);
- if (err)
- break;
- }
-
-out:
- err2 = iter->finish_entry(iter, al);
- if (!err)
- err = err2;
-
- return err;
-}
-
static int process_sample_event(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
@@ -555,23 +103,19 @@ static int process_sample_event(struct perf_tool *tool,
return 0;
if (sort__mode == SORT_MODE__BRANCH)
- iter = &branch_iter;
+ iter = &hist_iter_branch;
else if (rep->mem_mode == 1)
- iter = &mem_iter;
+ iter = &hist_iter_mem;
else if (symbol_conf.cumulate_callchain)
- iter = &cumulative_iter;
+ iter = &hist_iter_cumulative;
else
- iter = &normal_iter;
+ iter = &hist_iter_normal;
if (al.map != NULL)
al.map->dso->hit = 1;
- iter->rep = rep;
- iter->evsel = evsel;
- iter->event = event;
- iter->sample = sample;
-
- ret = iter_add_entry(iter, &al);
+ ret = hist_entry_iter__add(iter, &al, evsel, event, sample,
+ rep->hide_unresolved, rep->max_stack);
if (ret < 0)
pr_debug("problem adding hist entry, skipping event\n");
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index daf39fc8fe33..c1f5b664545a 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -4,6 +4,7 @@
#include "session.h"
#include "sort.h"
#include "evsel.h"
+#include "annotate.h"
#include <math.h>
static bool hists__filter_entry_by_dso(struct hists *hists,
@@ -460,6 +461,446 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
return add_hist_entry(hists, &entry, al, sample_self);
}
+static int
+iter_next_nop_entry(struct hist_entry_iter *iter __maybe_unused,
+ struct addr_location *al __maybe_unused)
+{
+ return 0;
+}
+
+static int
+iter_add_next_nop_entry(struct hist_entry_iter *iter __maybe_unused,
+ struct addr_location *al __maybe_unused)
+{
+ return 0;
+}
+
+static int
+iter_prepare_mem_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ const union perf_event *event = iter->event;
+ struct perf_sample *sample = iter->sample;
+ struct mem_info *mi;
+ u8 cpumode;
+
+ cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+
+ mi = machine__resolve_mem(al->machine, al->thread, sample, cpumode);
+ if (mi == NULL)
+ return -ENOMEM;
+
+ iter->priv = mi;
+ return 0;
+}
+
+static int
+iter_add_single_mem_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ u64 cost;
+ struct mem_info *mi = iter->priv;
+ struct hist_entry *he;
+
+ if (mi == NULL)
+ return -EINVAL;
+
+ cost = iter->sample->weight;
+ if (!cost)
+ cost = 1;
+
+ /*
+ * must pass period=weight in order to get the correct
+ * sorting from hists__collapse_resort() which is solely
+ * based on periods. We want sorting to be done on nr_events * weight
+ * and this is indirectly achieved by passing period=weight here
+ * and the he_stat__add_period() function.
+ */
+ he = __hists__add_entry(&iter->evsel->hists, al, iter->parent, NULL, mi,
+ cost, cost, 0, true);
+ if (!he)
+ return -ENOMEM;
+
+ iter->he = he;
+ return 0;
+}
+
+static int
+iter_finish_mem_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ struct perf_evsel *evsel = iter->evsel;
+ struct hist_entry *he = iter->he;
+ struct mem_info *mx;
+ int err = -EINVAL;
+ u64 cost;
+
+ if (he == NULL)
+ goto out;
+
+ err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+ if (err)
+ goto out;
+
+ mx = he->mem_info;
+ err = addr_map_symbol__inc_samples(&mx->daddr, evsel->idx);
+ if (err)
+ goto out;
+
+ cost = iter->sample->weight;
+ if (!cost)
+ cost = 1;
+
+ evsel->hists.stats.total_period += cost;
+ hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+ err = hist_entry__append_callchain(he, iter->sample);
+
+out:
+ /*
+ * We don't need to free iter->priv (mem_info) here since
+ * the mem info was either already freed in add_hist_entry() or
+ * passed to a new hist entry by hist_entry__new().
+ */
+ iter->priv = NULL;
+
+ iter->he = NULL;
+ return err;
+}
+
+static int
+iter_prepare_branch_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ struct branch_info *bi;
+ struct perf_sample *sample = iter->sample;
+
+ bi = machine__resolve_bstack(al->machine, al->thread,
+ sample->branch_stack);
+ if (!bi)
+ return -ENOMEM;
+
+ iter->curr = 0;
+ iter->total = sample->branch_stack->nr;
+
+ iter->priv = bi;
+ return 0;
+}
+
+static int
+iter_add_single_branch_entry(struct hist_entry_iter *iter __maybe_unused,
+ struct addr_location *al __maybe_unused)
+{
+ return 0;
+}
+
+static int
+iter_next_branch_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ struct branch_info *bi = iter->priv;
+ int i = iter->curr;
+
+ if (bi == NULL)
+ return 0;
+
+ if (iter->curr >= iter->total)
+ return 0;
+
+ al->map = bi[i].to.map;
+ al->sym = bi[i].to.sym;
+ al->addr = bi[i].to.addr;
+ return 1;
+}
+
+static int
+iter_add_next_branch_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ struct branch_info *bi, *bx;
+ struct perf_evsel *evsel = iter->evsel;
+ struct hist_entry *he;
+ int i = iter->curr;
+ int err = 0;
+
+ bi = iter->priv;
+
+ if (iter->hide_unresolved && !(bi[i].from.sym && bi[i].to.sym))
+ goto out;
+
+ /*
+ * The report shows the percentage of total branches captured
+ * and not events sampled. Thus we use a pseudo period of 1.
+ */
+ he = __hists__add_entry(&evsel->hists, al, iter->parent, &bi[i], NULL,
+ 1, 1, 0, true);
+ if (he == NULL)
+ return -ENOMEM;
+
+ bx = he->branch_info;
+ err = addr_map_symbol__inc_samples(&bx->from, evsel->idx);
+ if (err)
+ goto out;
+
+ err = addr_map_symbol__inc_samples(&bx->to, evsel->idx);
+ if (err)
+ goto out;
+
+ evsel->hists.stats.total_period += 1;
+ hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+out:
+ iter->curr++;
+ return err;
+}
+
+static int
+iter_finish_branch_entry(struct hist_entry_iter *iter,
+ struct addr_location *al __maybe_unused)
+{
+ zfree(&iter->priv);
+
+ return iter->curr >= iter->total ? 0 : -1;
+}
+
+static int
+iter_prepare_normal_entry(struct hist_entry_iter *iter __maybe_unused,
+ struct addr_location *al __maybe_unused)
+{
+ return 0;
+}
+
+static int
+iter_add_single_normal_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+ struct hist_entry *he;
+
+ he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
+ sample->period, sample->weight,
+ sample->transaction, true);
+ if (he == NULL)
+ return -ENOMEM;
+
+ iter->he = he;
+ return 0;
+}
+
+static int
+iter_finish_normal_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ int err;
+ struct hist_entry *he = iter->he;
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+
+ if (he == NULL)
+ return 0;
+
+ iter->he = NULL;
+
+ err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+ if (err)
+ return err;
+
+ evsel->hists.stats.total_period += sample->period;
+ hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+ return hist_entry__append_callchain(he, sample);
+}
+
+static int
+iter_prepare_cumulative_entry(struct hist_entry_iter *iter __maybe_unused,
+ struct addr_location *al __maybe_unused)
+{
+ struct hist_entry **he_cache;
+
+ callchain_cursor_commit(&callchain_cursor);
+
+ /*
+ * This is for detecting cycles or recursion so that they're
+ * accumulated only once, preventing entries from exceeding
+ * 100% overhead.
+ */
+ he_cache = malloc(sizeof(*he_cache) * (PERF_MAX_STACK_DEPTH + 1));
+ if (he_cache == NULL)
+ return -ENOMEM;
+
+ iter->priv = he_cache;
+ iter->curr = 0;
+
+ return 0;
+}
+
+static int
+iter_add_single_cumulative_entry(struct hist_entry_iter *iter,
+ struct addr_location *al)
+{
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+ struct hist_entry **he_cache = iter->priv;
+ struct hist_entry *he;
+
+ he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
+ sample->period, sample->weight,
+ sample->transaction, true);
+ if (he == NULL)
+ return -ENOMEM;
+
+ he_cache[iter->curr++] = he;
+
+ callchain_append(he->callchain, &callchain_cursor, sample->period);
+
+ /*
+ * We need to re-initialize the cursor since callchain_append()
+ * advanced the cursor to the end.
+ */
+ callchain_cursor_commit(&callchain_cursor);
+
+ return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+}
+
+static int
+iter_next_cumulative_entry(struct hist_entry_iter *iter,
+ struct addr_location *al)
+{
+ struct callchain_cursor_node *node;
+
+ node = callchain_cursor_current(&callchain_cursor);
+ if (node == NULL)
+ return 0;
+
+ return fill_callchain_info(al, node, iter->hide_unresolved);
+}
+
+static int
+iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
+ struct addr_location *al)
+{
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+ struct hist_entry **he_cache = iter->priv;
+ struct hist_entry *he;
+ struct hist_entry he_tmp = {
+ .cpu = al->cpu,
+ .thread = al->thread,
+ .comm = thread__comm(al->thread),
+ .ip = al->addr,
+ .ms = {
+ .map = al->map,
+ .sym = al->sym,
+ },
+ .parent = iter->parent,
+ };
+ int i;
+ struct callchain_cursor cursor;
+
+ callchain_cursor_snapshot(&cursor, &callchain_cursor);
+
+ callchain_cursor_advance(&callchain_cursor);
+
+ /*
+ * Check if there are duplicate entries in the callchain.
+ * It's possible that it has cycles or recursive calls.
+ */
+ for (i = 0; i < iter->curr; i++) {
+ if (hist_entry__cmp(he_cache[i], &he_tmp) == 0)
+ return 0;
+ }
+
+ he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
+ sample->period, sample->weight,
+ sample->transaction, false);
+ if (he == NULL)
+ return -ENOMEM;
+
+ he_cache[iter->curr++] = he;
+
+ callchain_append(he->callchain, &cursor, sample->period);
+
+ return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+}
+
+static int
+iter_finish_cumulative_entry(struct hist_entry_iter *iter,
+ struct addr_location *al __maybe_unused)
+{
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+
+ evsel->hists.stats.total_period += sample->period;
+ hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+ zfree(&iter->priv);
+ return 0;
+}
+
+struct hist_entry_iter hist_iter_mem = {
+ .prepare_entry = iter_prepare_mem_entry,
+ .add_single_entry = iter_add_single_mem_entry,
+ .next_entry = iter_next_nop_entry,
+ .add_next_entry = iter_add_next_nop_entry,
+ .finish_entry = iter_finish_mem_entry,
+};
+
+struct hist_entry_iter hist_iter_branch = {
+ .prepare_entry = iter_prepare_branch_entry,
+ .add_single_entry = iter_add_single_branch_entry,
+ .next_entry = iter_next_branch_entry,
+ .add_next_entry = iter_add_next_branch_entry,
+ .finish_entry = iter_finish_branch_entry,
+};
+
+struct hist_entry_iter hist_iter_normal = {
+ .prepare_entry = iter_prepare_normal_entry,
+ .add_single_entry = iter_add_single_normal_entry,
+ .next_entry = iter_next_nop_entry,
+ .add_next_entry = iter_add_next_nop_entry,
+ .finish_entry = iter_finish_normal_entry,
+};
+
+struct hist_entry_iter hist_iter_cumulative = {
+ .prepare_entry = iter_prepare_cumulative_entry,
+ .add_single_entry = iter_add_single_cumulative_entry,
+ .next_entry = iter_next_cumulative_entry,
+ .add_next_entry = iter_add_next_cumulative_entry,
+ .finish_entry = iter_finish_cumulative_entry,
+};
+
+int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
+ struct perf_evsel *evsel, const union perf_event *event,
+ struct perf_sample *sample, bool hide_unresolved,
+ int max_stack_depth)
+{
+ int err, err2;
+
+ err = sample__resolve_callchain(sample, &iter->parent, evsel, al,
+ max_stack_depth);
+ if (err)
+ return err;
+
+ iter->evsel = evsel;
+ iter->event = event;
+ iter->sample = sample;
+ iter->hide_unresolved = hide_unresolved;
+
+ err = iter->prepare_entry(iter, al);
+ if (err)
+ goto out;
+
+ err = iter->add_single_entry(iter, al);
+ if (err)
+ goto out;
+
+ while (iter->next_entry(iter, al)) {
+ err = iter->add_next_entry(iter, al);
+ if (err)
+ break;
+ }
+
+out:
+ err2 = iter->finish_entry(iter, al);
+ if (!err)
+ err = err2;
+
+ return err;
+}
+
int64_t
hist_entry__cmp(struct hist_entry *left, struct hist_entry *right)
{
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 2f5db3a6562e..95fff1cd6727 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -91,6 +91,31 @@ struct hists {
u16 col_len[HISTC_NR_COLS];
};
+struct hist_entry_iter {
+ int total;
+ int curr;
+
+ bool hide_unresolved;
+
+ const union perf_event *event;
+ struct perf_evsel *evsel;
+ struct perf_sample *sample;
+ struct hist_entry *he;
+ struct symbol *parent;
+ void *priv;
+
+ int (*prepare_entry)(struct hist_entry_iter *, struct addr_location *);
+ int (*add_single_entry)(struct hist_entry_iter *, struct addr_location *);
+ int (*next_entry)(struct hist_entry_iter *, struct addr_location *);
+ int (*add_next_entry)(struct hist_entry_iter *, struct addr_location *);
+ int (*finish_entry)(struct hist_entry_iter *, struct addr_location *);
+};
+
+extern struct hist_entry_iter hist_iter_normal;
+extern struct hist_entry_iter hist_iter_branch;
+extern struct hist_entry_iter hist_iter_mem;
+extern struct hist_entry_iter hist_iter_cumulative;
+
struct hist_entry *__hists__add_entry(struct hists *hists,
struct addr_location *al,
struct symbol *parent,
@@ -98,6 +123,11 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
struct mem_info *mi, u64 period,
u64 weight, u64 transaction,
bool sample_self);
+int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
+ struct perf_evsel *evsel, const union perf_event *event,
+ struct perf_sample *sample, bool hide_unresolved,
+ int max_stack_depth);
+
int64_t hist_entry__cmp(struct hist_entry *left, struct hist_entry *right);
int64_t hist_entry__collapse(struct hist_entry *left, struct hist_entry *right);
int hist_entry__transaction_len(void);
--
1.7.11.7
Now perf top and perf report will show a children column by default if
the recorded data has callchain information.
Requested-by: Ingo Molnar <[email protected]>
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/symbol.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 39ce9adbaaf0..186dca9df434 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -29,11 +29,12 @@ int vmlinux_path__nr_entries;
char **vmlinux_path;
struct symbol_conf symbol_conf = {
- .use_modules = true,
- .try_vmlinux_path = true,
- .annotate_src = true,
- .demangle = true,
- .symfs = "",
+ .use_modules = true,
+ .try_vmlinux_path = true,
+ .annotate_src = true,
+ .demangle = true,
+ .cumulate_callchain = true,
+ .symfs = "",
};
static enum dso_binary_type binary_type_symtab[] = {
--
1.7.11.7
The report__resolve_callchain() function can be shared with the perf top
code as it doesn't really depend on the perf report code. Factor it
out as sample__resolve_callchain(). The same goes for
hist_entry__append_callchain().
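After the change, a caller boils down to the following shape (a
condensed sketch; see the builtin-top.c conversion below for the real
thing, where max_stack comes from the tool's own state):

	struct symbol *parent = NULL;
	int err;

	err = sample__resolve_callchain(sample, &parent, evsel, &al,
					max_stack);
	if (err)
		return err;

	he = __hists__add_entry(...);	/* add the hist entry as before */

	err = hist_entry__append_callchain(he, sample);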
Acked-by: Jiri Olsa <[email protected]>
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-report.c | 26 ++------------------------
tools/perf/builtin-top.c | 22 +++++++---------------
tools/perf/util/callchain.c | 24 ++++++++++++++++++++++++
tools/perf/util/callchain.h | 6 ++++++
4 files changed, 39 insertions(+), 39 deletions(-)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index a15fa05e4a70..06eceab8d4af 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -79,28 +79,6 @@ static int report__config(const char *var, const char *value, void *cb)
return perf_default_config(var, value, cb);
}
-static int report__resolve_callchain(struct report *rep, struct symbol **parent,
- struct perf_evsel *evsel, struct addr_location *al,
- struct perf_sample *sample)
-{
- if (sample->callchain == NULL)
- return 0;
-
- if (symbol_conf.use_callchain || symbol_conf.cumulate_callchain ||
- sort__has_parent) {
- return machine__resolve_callchain(al->machine, evsel, al->thread, sample,
- parent, al, rep->max_stack);
- }
- return 0;
-}
-
-static int hist_entry__append_callchain(struct hist_entry *he, struct perf_sample *sample)
-{
- if (!symbol_conf.use_callchain)
- return 0;
- return callchain_append(he->callchain, &callchain_cursor, sample->period);
-}
-
struct hist_entry_iter {
int total;
int curr;
@@ -562,8 +540,8 @@ iter_add_entry(struct hist_entry_iter *iter, struct addr_location *al)
{
int err, err2;
- err = report__resolve_callchain(iter->rep, &iter->parent, iter->evsel,
- al, iter->sample);
+ err = sample__resolve_callchain(iter->sample, &iter->parent,
+ iter->evsel, al, iter->rep->max_stack);
if (err)
return err;
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 6d721bccdd26..f0f55e6030cd 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -743,15 +743,10 @@ static void perf_event__process_sample(struct perf_tool *tool,
if (al.sym == NULL || !al.sym->ignore) {
struct hist_entry *he;
- if ((sort__has_parent || symbol_conf.use_callchain) &&
- sample->callchain) {
- err = machine__resolve_callchain(machine, evsel,
- al.thread, sample,
- &parent, &al,
- top->max_stack);
- if (err)
- return;
- }
+ err = sample__resolve_callchain(sample, &parent, evsel, &al,
+ top->max_stack);
+ if (err)
+ return;
he = perf_evsel__add_hist_entry(evsel, &al, sample);
if (he == NULL) {
@@ -759,12 +754,9 @@ static void perf_event__process_sample(struct perf_tool *tool,
return;
}
- if (symbol_conf.use_callchain) {
- err = callchain_append(he->callchain, &callchain_cursor,
- sample->period);
- if (err)
- return;
- }
+ err = hist_entry__append_callchain(he, sample);
+ if (err)
+ return;
if (sort__has_sym)
perf_top__record_precise_ip(top, he, evsel->idx, ip);
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index e3970e3eaacf..92b9010a651c 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -17,6 +17,8 @@
#include "hist.h"
#include "util.h"
+#include "sort.h"
+#include "machine.h"
#include "callchain.h"
__thread struct callchain_cursor callchain_cursor;
@@ -531,3 +533,25 @@ int callchain_cursor_append(struct callchain_cursor *cursor,
return 0;
}
+
+int sample__resolve_callchain(struct perf_sample *sample, struct symbol **parent,
+ struct perf_evsel *evsel, struct addr_location *al,
+ int max_stack)
+{
+ if (sample->callchain == NULL)
+ return 0;
+
+ if (symbol_conf.use_callchain || symbol_conf.cumulate_callchain ||
+ sort__has_parent) {
+ return machine__resolve_callchain(al->machine, evsel, al->thread,
+ sample, parent, al, max_stack);
+ }
+ return 0;
+}
+
+int hist_entry__append_callchain(struct hist_entry *he, struct perf_sample *sample)
+{
+ if (!symbol_conf.use_callchain)
+ return 0;
+ return callchain_append(he->callchain, &callchain_cursor, sample->period);
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index bad694287cb8..209538924cb6 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -145,11 +145,17 @@ static inline void callchain_cursor_advance(struct callchain_cursor *cursor)
}
struct option;
+struct hist_entry;
int record_parse_callchain(const char *arg, struct record_opts *opts);
int record_parse_callchain_opt(const struct option *opt, const char *arg, int unset);
int record_callchain_opt(const struct option *opt, const char *arg, int unset);
+int sample__resolve_callchain(struct perf_sample *sample, struct symbol **parent,
+ struct perf_evsel *evsel, struct addr_location *al,
+ int max_stack);
+int hist_entry__append_callchain(struct hist_entry *he, struct perf_sample *sample);
+
extern const char record_callchain_help[];
static inline void callchain_cursor_snapshot(struct callchain_cursor *dest,
--
1.7.11.7
Add the top.children config option for setting the default value of
callchain accumulation. It affects the output only if the -g or
--call-graph option is given as well.
A user can add the following to the .perfconfig file to enable
accumulation by default:
$ cat ~/.perfconfig
[top]
children = true
And it can be disabled through command line:
$ perf top --no-children
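The override works because of the call ordering in cmd_top() (condensed
from the hunk below): the config file is read first, then the command
line is parsed on top of it.

	perf_config(top__config, &top);		/* sets the default */

	argc = parse_options(argc, argv, options, top_usage, 0);
	/* --no-children, if given, overwrites the config value here */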
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-top.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index fa34bcbca587..ed2aaad60813 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -87,6 +87,16 @@ static void perf_top__sig_winch(int sig __maybe_unused,
perf_top__update_print_entries(top);
}
+static int top__config(const char *var, const char *value, void *cb)
+{
+ if (!strcmp(var, "top.children")) {
+ symbol_conf.cumulate_callchain = perf_config_bool(var, value);
+ return 0;
+ }
+
+ return perf_default_config(var, value, cb);
+}
+
static int perf_top__parse_source(struct perf_top *top, struct hist_entry *he)
{
struct symbol *sym;
@@ -1116,6 +1126,8 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
if (top.evlist == NULL)
return -ENOMEM;
+ perf_config(top__config, &top);
+
argc = parse_options(argc, argv, options, top_usage, 0);
if (argc)
usage_with_options(top_usage, options);
--
1.7.11.7
Add the report.children config option for setting the default value of
callchain accumulation. It affects the report output only if perf.data
contains callchain info.
A user can add the following to the .perfconfig file to enable
accumulation by default:
$ cat ~/.perfconfig
[report]
children = true
And it can be disabled through command line:
$ perf report --no-children
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-report.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 6db8d6d74ad1..a15fa05e4a70 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -71,6 +71,10 @@ static int report__config(const char *var, const char *value, void *cb)
rep->min_percent = strtof(value, NULL);
return 0;
}
+ if (!strcmp(var, "report.children")) {
+ symbol_conf.cumulate_callchain = perf_config_bool(var, value);
+ return 0;
+ }
return perf_default_config(var, value, cb);
}
--
1.7.11.7
The --children option is for showing the accumulated overhead (period)
value as well as the self overhead.
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/Documentation/perf-report.txt | 5 +++++
tools/perf/builtin-report.c | 10 ++++++++++
2 files changed, 15 insertions(+)
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 8eab8a4bdeb8..710fc8b18f6b 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -141,6 +141,11 @@ OPTIONS
Default: fractal,0.5,callee,function.
+--children::
+ Accumulate callchain of children to parent entry so that they can
+ show up in the output. The output will have a new "Children" column
+ and will be sorted on the data. It requires callchains to be recorded.
+
--max-stack::
Set the stack depth limit when parsing the callchain, anything
beyond the specified depth will be ignored. This is a trade-off
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 5f07da10f83c..6db8d6d74ad1 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -685,6 +685,14 @@ static int report__setup_sample_type(struct report *rep)
}
}
+ if (symbol_conf.cumulate_callchain) {
+ /* Silently ignore if callchain is missing */
+ if (!(sample_type & PERF_SAMPLE_CALLCHAIN)) {
+ symbol_conf.cumulate_callchain = false;
+ perf_hpp__cancel_cumulate();
+ }
+ }
+
if (sort__mode == SORT_MODE__BRANCH) {
if (!is_pipe &&
!(sample_type & PERF_SAMPLE_BRANCH_STACK)) {
@@ -1115,6 +1123,8 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
OPT_CALLBACK_DEFAULT('g', "call-graph", &report, "output_type,min_percent[,print_limit],call_order",
"Display callchains using output_type (graph, flat, fractal, or none) , min percent threshold, optional print limit, callchain order, key (function or address). "
"Default: fractal,0.5,callee,function", &parse_callchain_opt, callchain_default_opt),
+ OPT_BOOLEAN(0, "children", &symbol_conf.cumulate_callchain,
+ "Accumulate callchains of children and show total overhead as well"),
OPT_INTEGER(0, "max-stack", &report.max_stack,
"Set the maximum stack depth when parsing the callchain, "
"anything beyond the specified depth will be ignored. "
--
1.7.11.7
Print accumulated stat of a hist entry if requested.
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/ui/gtk/hists.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 2ca66cc1160f..70ed0d5e1b94 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -98,11 +98,25 @@ static int perf_gtk__hpp_color_##_type(struct perf_hpp_fmt *fmt __maybe_unused,
return __hpp__color_fmt(hpp, he, he_get_##_field); \
}
+#define __HPP_COLOR_ACC_PERCENT_FN(_type, _field) \
+static u64 he_get_acc_##_field(struct hist_entry *he) \
+{ \
+ return he->stat_acc->_field; \
+} \
+ \
+static int perf_gtk__hpp_color_##_type(struct perf_hpp_fmt *fmt __maybe_unused, \
+ struct perf_hpp *hpp, \
+ struct hist_entry *he) \
+{ \
+ return __hpp__color_fmt(hpp, he, he_get_acc_##_field); \
+}
+
__HPP_COLOR_PERCENT_FN(overhead, period)
__HPP_COLOR_PERCENT_FN(overhead_sys, period_sys)
__HPP_COLOR_PERCENT_FN(overhead_us, period_us)
__HPP_COLOR_PERCENT_FN(overhead_guest_sys, period_guest_sys)
__HPP_COLOR_PERCENT_FN(overhead_guest_us, period_guest_us)
+__HPP_COLOR_ACC_PERCENT_FN(overhead_acc, period)
#undef __HPP_COLOR_PERCENT_FN
@@ -121,6 +135,8 @@ void perf_gtk__init_hpp(void)
perf_gtk__hpp_color_overhead_guest_sys;
perf_hpp__format[PERF_HPP__OVERHEAD_GUEST_US].color =
perf_gtk__hpp_color_overhead_guest_us;
+ perf_hpp__format[PERF_HPP__OVERHEAD_ACC].color =
+ perf_gtk__hpp_color_overhead_acc;
}
static void callchain_list__sym_name(struct callchain_list *cl,
--
1.7.11.7
If the --children option is given, entries which don't have any self
overhead still need to be shown. So apply the percent limit to the
accumulated overhead percentage in this case.
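The key helper, condensed from the sort.h hunk below (the real version
also guards against a zero total period):

	static inline float hist_entry__get_percent_limit(struct hist_entry *he)
	{
		u64 period = symbol_conf.cumulate_callchain ?
			     he->stat_acc->period : he->stat.period;

		return period * 100.0 / he->hists->stats.total_period;
	}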
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/ui/browsers/hists.c | 33 +++++++++++----------------------
tools/perf/ui/gtk/hists.c | 4 ++--
tools/perf/ui/stdio/hist.c | 4 ++--
tools/perf/util/hist.h | 1 +
tools/perf/util/sort.h | 17 ++++++++++++++++-
5 files changed, 32 insertions(+), 27 deletions(-)
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 250f42b1ad81..88e102972e51 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -828,12 +828,12 @@ static unsigned int hist_browser__refresh(struct ui_browser *browser)
for (nd = browser->top; nd; nd = rb_next(nd)) {
struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
- float percent = h->stat.period * 100.0 /
- hb->hists->stats.total_period;
+ float percent;
if (h->filtered)
continue;
+ percent = hist_entry__get_percent_limit(h);
if (percent < hb->min_pcnt)
continue;
@@ -846,18 +846,13 @@ static unsigned int hist_browser__refresh(struct ui_browser *browser)
}
static struct rb_node *hists__filter_entries(struct rb_node *nd,
- struct hists *hists,
float min_pcnt)
{
while (nd != NULL) {
struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
- float percent = h->stat.period * 100.0 /
- hists->stats.total_period;
+ float percent = hist_entry__get_percent_limit(h);
- if (percent < min_pcnt)
- return NULL;
-
- if (!h->filtered)
+ if (!h->filtered && percent >= min_pcnt)
return nd;
nd = rb_next(nd);
@@ -867,13 +862,11 @@ static struct rb_node *hists__filter_entries(struct rb_node *nd,
}
static struct rb_node *hists__filter_prev_entries(struct rb_node *nd,
- struct hists *hists,
float min_pcnt)
{
while (nd != NULL) {
struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
- float percent = h->stat.period * 100.0 /
- hists->stats.total_period;
+ float percent = hist_entry__get_percent_limit(h);
if (!h->filtered && percent >= min_pcnt)
return nd;
@@ -902,14 +895,14 @@ static void ui_browser__hists_seek(struct ui_browser *browser,
switch (whence) {
case SEEK_SET:
nd = hists__filter_entries(rb_first(browser->entries),
- hb->hists, hb->min_pcnt);
+ hb->min_pcnt);
break;
case SEEK_CUR:
nd = browser->top;
goto do_offset;
case SEEK_END:
nd = hists__filter_prev_entries(rb_last(browser->entries),
- hb->hists, hb->min_pcnt);
+ hb->min_pcnt);
first = false;
break;
default:
@@ -952,8 +945,7 @@ do_offset:
break;
}
}
- nd = hists__filter_entries(rb_next(nd), hb->hists,
- hb->min_pcnt);
+ nd = hists__filter_entries(rb_next(nd), hb->min_pcnt);
if (nd == NULL)
break;
--offset;
@@ -986,7 +978,7 @@ do_offset:
}
}
- nd = hists__filter_prev_entries(rb_prev(nd), hb->hists,
+ nd = hists__filter_prev_entries(rb_prev(nd),
hb->min_pcnt);
if (nd == NULL)
break;
@@ -1157,7 +1149,6 @@ static int hist_browser__fprintf_entry(struct hist_browser *browser,
static int hist_browser__fprintf(struct hist_browser *browser, FILE *fp)
{
struct rb_node *nd = hists__filter_entries(rb_first(browser->b.entries),
- browser->hists,
browser->min_pcnt);
int printed = 0;
@@ -1165,8 +1156,7 @@ static int hist_browser__fprintf(struct hist_browser *browser, FILE *fp)
struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
printed += hist_browser__fprintf_entry(browser, h, fp);
- nd = hists__filter_entries(rb_next(nd), browser->hists,
- browser->min_pcnt);
+ nd = hists__filter_entries(rb_next(nd), browser->min_pcnt);
}
return printed;
@@ -1390,8 +1380,7 @@ static void hist_browser__update_pcnt_entries(struct hist_browser *hb)
while (nd) {
nr_entries++;
- nd = hists__filter_entries(rb_next(nd), hb->hists,
- hb->min_pcnt);
+ nd = hists__filter_entries(rb_next(nd), hb->min_pcnt);
}
hb->nr_pcnt_entries = nr_entries;
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 70ed0d5e1b94..4f12ff96fdac 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -296,12 +296,12 @@ static void perf_gtk__show_hists(GtkWidget *window, struct hists *hists,
for (nd = rb_first(&hists->entries); nd; nd = rb_next(nd)) {
struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
GtkTreeIter iter;
- float percent = h->stat.period * 100.0 /
- hists->stats.total_period;
+ float percent;
if (h->filtered)
continue;
+ percent = hist_entry__get_percent_limit(h);
if (percent < min_pcnt)
continue;
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 831fbb77d1ff..b48c7ba49a1c 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -487,12 +487,12 @@ print_entries:
for (nd = rb_first(&hists->entries); nd; nd = rb_next(nd)) {
struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
- float percent = h->stat.period * 100.0 /
- hists->stats.total_period;
+ float percent;
if (h->filtered)
continue;
+ percent = hist_entry__get_percent_limit(h);
if (percent < min_pcnt)
continue;
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index e632b29fe5d6..b6f2fdc49079 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -105,6 +105,7 @@ int hist_entry__sort_snprintf(struct hist_entry *he, char *bf, size_t size,
struct hists *hists);
void hist_entry__free(struct hist_entry *);
+
void hists__output_resort(struct hists *hists);
void hists__collapse_resort(struct hists *hists, struct ui_progress *prog);
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 309f2838a1b4..103748ba25f2 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -20,7 +20,7 @@
#include "parse-options.h"
#include "parse-events.h"
-
+#include "hist.h"
#include "thread.h"
extern regex_t parent_regex;
@@ -130,6 +130,21 @@ static inline void hist_entry__add_pair(struct hist_entry *pair,
list_add_tail(&pair->pairs.node, &he->pairs.head);
}
+static inline float hist_entry__get_percent_limit(struct hist_entry *he)
+{
+ u64 period = he->stat.period;
+ u64 total_period = he->hists->stats.total_period;
+
+ if (unlikely(total_period == 0))
+ return 0;
+
+ if (symbol_conf.cumulate_callchain)
+ period = he->stat_acc->period;
+
+ return period * 100.0 / total_period;
+}
+
+
enum sort_mode {
SORT_MODE__NORMAL,
SORT_MODE__BRANCH,
--
1.7.11.7
Sometimes a column needs to be disabled at runtime. Add helper
functions to support that.
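For example, the report code in this series uses them to drop the
Children column when the data has no callchains (a condensed sketch of
the report__setup_sample_type() change):

	if (symbol_conf.cumulate_callchain &&
	    !(sample_type & PERF_SAMPLE_CALLCHAIN)) {
		symbol_conf.cumulate_callchain = false;
		perf_hpp__cancel_cumulate();
	}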
Cc: Jiri Olsa <[email protected]>
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/ui/hist.c | 17 +++++++++++++++++
tools/perf/util/hist.h | 3 +++
2 files changed, 20 insertions(+)
diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index eb9c07bcdb01..1f9e25211da2 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -275,12 +275,29 @@ void perf_hpp__column_register(struct perf_hpp_fmt *format)
list_add_tail(&format->list, &perf_hpp__list);
}
+void perf_hpp__column_unregister(struct perf_hpp_fmt *format)
+{
+ list_del(&format->list);
+}
+
void perf_hpp__column_enable(unsigned col)
{
BUG_ON(col >= PERF_HPP__MAX_INDEX);
perf_hpp__column_register(&perf_hpp__format[col]);
}
+void perf_hpp__column_disable(unsigned col)
+{
+ BUG_ON(col >= PERF_HPP__MAX_INDEX);
+ perf_hpp__column_unregister(&perf_hpp__format[col]);
+}
+
+void perf_hpp__cancel_cumulate(void)
+{
+ perf_hpp__column_disable(PERF_HPP__OVERHEAD_ACC);
+ perf_hpp__format[PERF_HPP__OVERHEAD].header = hpp__header_overhead;
+}
+
int hist_entry__sort_snprintf(struct hist_entry *he, char *s, size_t size,
struct hists *hists)
{
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index b6f2fdc49079..2f5db3a6562e 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -174,7 +174,10 @@ enum {
void perf_hpp__init(void);
void perf_hpp__column_register(struct perf_hpp_fmt *format);
+void perf_hpp__column_unregister(struct perf_hpp_fmt *format);
void perf_hpp__column_enable(unsigned col);
+void perf_hpp__column_disable(unsigned col);
+void perf_hpp__cancel_cumulate(void);
static inline size_t perf_hpp__use_color(void)
{
--
1.7.11.7
Print accumulated stat of a hist entry if requested.
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/ui/browsers/hists.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index a7045ea6d1d5..250f42b1ad81 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -693,11 +693,26 @@ hist_browser__hpp_color_##_type(struct perf_hpp_fmt *fmt __maybe_unused,\
return __hpp__color_fmt(hpp, he, __hpp_get_##_field, _cb); \
}
+#define __HPP_COLOR_ACC_PERCENT_FN(_type, _field, _cb) \
+static u64 __hpp_get_acc_##_field(struct hist_entry *he) \
+{ \
+ return he->stat_acc->_field; \
+} \
+ \
+static int \
+hist_browser__hpp_color_##_type(struct perf_hpp_fmt *fmt __maybe_unused,\
+ struct perf_hpp *hpp, \
+ struct hist_entry *he) \
+{ \
+ return __hpp__color_fmt(hpp, he, __hpp_get_acc_##_field, _cb); \
+}
+
__HPP_COLOR_PERCENT_FN(overhead, period, __hpp__color_callchain)
__HPP_COLOR_PERCENT_FN(overhead_sys, period_sys, NULL)
__HPP_COLOR_PERCENT_FN(overhead_us, period_us, NULL)
__HPP_COLOR_PERCENT_FN(overhead_guest_sys, period_guest_sys, NULL)
__HPP_COLOR_PERCENT_FN(overhead_guest_us, period_guest_us, NULL)
+__HPP_COLOR_ACC_PERCENT_FN(overhead_acc, period, NULL)
#undef __HPP_COLOR_PERCENT_FN
@@ -715,6 +730,8 @@ void hist_browser__init_hpp(void)
hist_browser__hpp_color_overhead_guest_sys;
perf_hpp__format[PERF_HPP__OVERHEAD_GUEST_US].color =
hist_browser__hpp_color_overhead_guest_us;
+ perf_hpp__format[PERF_HPP__OVERHEAD_ACC].color =
+ hist_browser__hpp_color_overhead_acc;
}
static int hist_browser__show_entry(struct hist_browser *browser,
--
1.7.11.7
hist_entry__add_cpumode_period() and hist_entry__decay() deal with
hist_entry's stat fields only, so make them take the struct directly.
Acked-by: Jiri Olsa <[email protected]>
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/hist.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 266d9b238856..ca3b3b707983 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -174,21 +174,21 @@ void hists__output_recalc_col_len(struct hists *hists, int max_rows)
}
}
-static void hist_entry__add_cpumode_period(struct hist_entry *he,
- unsigned int cpumode, u64 period)
+static void he_stat__add_cpumode_period(struct he_stat *he_stat,
+ unsigned int cpumode, u64 period)
{
switch (cpumode) {
case PERF_RECORD_MISC_KERNEL:
- he->stat.period_sys += period;
+ he_stat->period_sys += period;
break;
case PERF_RECORD_MISC_USER:
- he->stat.period_us += period;
+ he_stat->period_us += period;
break;
case PERF_RECORD_MISC_GUEST_KERNEL:
- he->stat.period_guest_sys += period;
+ he_stat->period_guest_sys += period;
break;
case PERF_RECORD_MISC_GUEST_USER:
- he->stat.period_guest_us += period;
+ he_stat->period_guest_us += period;
break;
default:
break;
@@ -215,10 +215,10 @@ static void he_stat__add_stat(struct he_stat *dest, struct he_stat *src)
dest->weight += src->weight;
}
-static void hist_entry__decay(struct hist_entry *he)
+static void he_stat__decay(struct he_stat *he_stat)
{
- he->stat.period = (he->stat.period * 7) / 8;
- he->stat.nr_events = (he->stat.nr_events * 7) / 8;
+ he_stat->period = (he_stat->period * 7) / 8;
+ he_stat->nr_events = (he_stat->nr_events * 7) / 8;
/* XXX need decay for weight too? */
}
@@ -229,7 +229,7 @@ static bool hists__decay_entry(struct hists *hists, struct hist_entry *he)
if (prev_period == 0)
return true;
- hist_entry__decay(he);
+ he_stat__decay(&he->stat);
if (!he->filtered)
hists->stats.total_period -= prev_period - he->stat.period;
@@ -395,7 +395,7 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
rb_link_node(&he->rb_node_in, parent, p);
rb_insert_color(&he->rb_node_in, hists->entries_in);
out:
- hist_entry__add_cpumode_period(he, al->cpumode, period);
+ he_stat__add_cpumode_period(&he->stat, al->cpumode, period);
return he;
}
--
1.7.11.7
Print accumulated stat of a hist entry if requested.
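For reference, here is roughly what the new HPP_PERCENT_ACC_FNS() macro
below expands to for the Children column; the only difference from the
existing percent functions is reading he->stat_acc instead of he->stat:

	static u64 he_get_acc_period(struct hist_entry *he)
	{
		return he->stat_acc->period;
	}

	static int hpp__color_acc_overhead_acc(struct perf_hpp_fmt *fmt __maybe_unused,
					       struct perf_hpp *hpp,
					       struct hist_entry *he)
	{
		return __hpp__fmt(hpp, he, he_get_acc_period, " %6.2f%%",
				  (hpp_snprint_fn)percent_color_snprintf, true);
	}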
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/ui/hist.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/hist.h | 1 +
2 files changed, 46 insertions(+)
diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index 78f4c92e9b73..eb9c07bcdb01 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -129,6 +129,28 @@ static int hpp__entry_##_type(struct perf_hpp_fmt *_fmt __maybe_unused, \
scnprintf, true); \
}
+#define __HPP_COLOR_ACC_PERCENT_FN(_type, _field) \
+static u64 he_get_acc_##_field(struct hist_entry *he) \
+{ \
+ return he->stat_acc->_field; \
+} \
+ \
+static int hpp__color_acc_##_type(struct perf_hpp_fmt *fmt __maybe_unused, \
+ struct perf_hpp *hpp, struct hist_entry *he) \
+{ \
+ return __hpp__fmt(hpp, he, he_get_acc_##_field, " %6.2f%%", \
+ (hpp_snprint_fn)percent_color_snprintf, true); \
+}
+
+#define __HPP_ENTRY_ACC_PERCENT_FN(_type, _field) \
+static int hpp__entry_acc_##_type(struct perf_hpp_fmt *_fmt __maybe_unused, \
+ struct perf_hpp *hpp, struct hist_entry *he) \
+{ \
+ const char *fmt = symbol_conf.field_sep ? " %.2f" : " %6.2f%%"; \
+ return __hpp__fmt(hpp, he, he_get_acc_##_field, fmt, \
+ scnprintf, true); \
+}
+
#define __HPP_ENTRY_RAW_FN(_type, _field) \
static u64 he_get_raw_##_field(struct hist_entry *he) \
{ \
@@ -148,17 +170,25 @@ __HPP_WIDTH_FN(_type, _min_width, _unit_width) \
__HPP_COLOR_PERCENT_FN(_type, _field) \
__HPP_ENTRY_PERCENT_FN(_type, _field)
+#define HPP_PERCENT_ACC_FNS(_type, _str, _field, _min_width, _unit_width)\
+__HPP_HEADER_FN(_type, _str, _min_width, _unit_width) \
+__HPP_WIDTH_FN(_type, _min_width, _unit_width) \
+__HPP_COLOR_ACC_PERCENT_FN(_type, _field) \
+__HPP_ENTRY_ACC_PERCENT_FN(_type, _field)
+
#define HPP_RAW_FNS(_type, _str, _field, _min_width, _unit_width) \
__HPP_HEADER_FN(_type, _str, _min_width, _unit_width) \
__HPP_WIDTH_FN(_type, _min_width, _unit_width) \
__HPP_ENTRY_RAW_FN(_type, _field)
+__HPP_HEADER_FN(overhead_self, "Self", 8, 8)
HPP_PERCENT_FNS(overhead, "Overhead", period, 8, 8)
HPP_PERCENT_FNS(overhead_sys, "sys", period_sys, 8, 8)
HPP_PERCENT_FNS(overhead_us, "usr", period_us, 8, 8)
HPP_PERCENT_FNS(overhead_guest_sys, "guest sys", period_guest_sys, 9, 8)
HPP_PERCENT_FNS(overhead_guest_us, "guest usr", period_guest_us, 9, 8)
+HPP_PERCENT_ACC_FNS(overhead_acc, "Children", period, 8, 8)
HPP_RAW_FNS(samples, "Samples", nr_events, 12, 12)
HPP_RAW_FNS(period, "Period", period, 12, 12)
@@ -171,6 +201,14 @@ HPP_RAW_FNS(period, "Period", period, 12, 12)
.entry = hpp__entry_ ## _name \
}
+#define HPP__COLOR_ACC_PRINT_FNS(_name) \
+ { \
+ .header = hpp__header_ ## _name, \
+ .width = hpp__width_ ## _name, \
+ .color = hpp__color_acc_ ## _name, \
+ .entry = hpp__entry_acc_ ## _name \
+ }
+
#define HPP__PRINT_FNS(_name) \
{ \
.header = hpp__header_ ## _name, \
@@ -184,6 +222,7 @@ struct perf_hpp_fmt perf_hpp__format[] = {
HPP__COLOR_PRINT_FNS(overhead_us),
HPP__COLOR_PRINT_FNS(overhead_guest_sys),
HPP__COLOR_PRINT_FNS(overhead_guest_us),
+ HPP__COLOR_ACC_PRINT_FNS(overhead_acc),
HPP__PRINT_FNS(samples),
HPP__PRINT_FNS(period)
};
@@ -208,6 +247,12 @@ void perf_hpp__init(void)
{
perf_hpp__column_enable(PERF_HPP__OVERHEAD);
+ if (symbol_conf.cumulate_callchain) {
+ perf_hpp__format[PERF_HPP__OVERHEAD].header =
+ hpp__header_overhead_self;
+ perf_hpp__column_enable(PERF_HPP__OVERHEAD_ACC);
+ }
+
if (symbol_conf.show_cpu_utilization) {
perf_hpp__column_enable(PERF_HPP__OVERHEAD_SYS);
perf_hpp__column_enable(PERF_HPP__OVERHEAD_US);
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 212aec21735f..e632b29fe5d6 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -164,6 +164,7 @@ enum {
PERF_HPP__OVERHEAD_US,
PERF_HPP__OVERHEAD_GUEST_SYS,
PERF_HPP__OVERHEAD_GUEST_US,
+ PERF_HPP__OVERHEAD_ACC,
PERF_HPP__SAMPLES,
PERF_HPP__PERIOD,
--
1.7.11.7
Maintain accumulated stat information in hist_entry->stat_acc if
symbol_conf.cumulate_callchain is set. Fields in ->stat_acc have the
same values initially, and will be updated as the callchain is
processed later.
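A condensed sketch of what hist_entry__new() now does (the full version
is in the hunk below); from then on, every update or decay of ->stat is
mirrored on ->stat_acc when the option is set:

	if (symbol_conf.cumulate_callchain) {
		he->stat_acc = malloc(sizeof(he->stat));
		if (he->stat_acc == NULL) {
			free(he);
			return NULL;
		}
		memcpy(he->stat_acc, &he->stat, sizeof(he->stat));
	}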
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/hist.c | 28 ++++++++++++++++++++++++++--
tools/perf/util/sort.h | 1 +
tools/perf/util/symbol.h | 1 +
3 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index ca3b3b707983..50de02c478b4 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -230,6 +230,8 @@ static bool hists__decay_entry(struct hists *hists, struct hist_entry *he)
return true;
he_stat__decay(&he->stat);
+ if (symbol_conf.cumulate_callchain)
+ he_stat__decay(he->stat_acc);
if (!he->filtered)
hists->stats.total_period -= prev_period - he->stat.period;
@@ -271,12 +273,26 @@ void hists__decay_entries(struct hists *hists, bool zap_user, bool zap_kernel)
static struct hist_entry *hist_entry__new(struct hist_entry *template)
{
- size_t callchain_size = symbol_conf.use_callchain ? sizeof(struct callchain_root) : 0;
- struct hist_entry *he = zalloc(sizeof(*he) + callchain_size);
+ size_t callchain_size = 0;
+ struct hist_entry *he;
+
+ if (symbol_conf.use_callchain || symbol_conf.cumulate_callchain)
+ callchain_size = sizeof(struct callchain_root);
+
+ he = zalloc(sizeof(*he) + callchain_size);
if (he != NULL) {
*he = *template;
+ if (symbol_conf.cumulate_callchain) {
+ he->stat_acc = malloc(sizeof(he->stat));
+ if (he->stat_acc == NULL) {
+ free(he);
+ return NULL;
+ }
+ memcpy(he->stat_acc, &he->stat, sizeof(he->stat));
+ }
+
if (he->ms.map)
he->ms.map->referenced = true;
@@ -288,6 +304,7 @@ static struct hist_entry *hist_entry__new(struct hist_entry *template)
*/
he->branch_info = malloc(sizeof(*he->branch_info));
if (he->branch_info == NULL) {
+ free(he->stat_acc);
free(he);
return NULL;
}
@@ -360,6 +377,8 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
if (!cmp) {
he_stat__add_period(&he->stat, period, weight);
+ if (symbol_conf.cumulate_callchain)
+ he_stat__add_period(he->stat_acc, period, weight);
/*
* This mem info was allocated from machine__resolve_mem
@@ -396,6 +415,8 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
rb_insert_color(&he->rb_node_in, hists->entries_in);
out:
he_stat__add_cpumode_period(&he->stat, al->cpumode, period);
+ if (symbol_conf.cumulate_callchain)
+ he_stat__add_cpumode_period(he->stat_acc, al->cpumode, period);
return he;
}
@@ -470,6 +491,7 @@ void hist_entry__free(struct hist_entry *he)
{
zfree(&he->branch_info);
zfree(&he->mem_info);
+ zfree(&he->stat_acc);
free_srcline(he->srcline);
free(he);
}
@@ -495,6 +517,8 @@ static bool hists__collapse_insert_entry(struct hists *hists __maybe_unused,
if (!cmp) {
he_stat__add_stat(&iter->stat, &he->stat);
+ if (symbol_conf.cumulate_callchain)
+ he_stat__add_stat(iter->stat_acc, he->stat_acc);
if (symbol_conf.use_callchain) {
callchain_cursor_reset(&callchain_cursor);
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 43e5ff42a609..309f2838a1b4 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -81,6 +81,7 @@ struct hist_entry {
struct list_head head;
} pairs;
struct he_stat stat;
+ struct he_stat *stat_acc;
struct map_symbol ms;
struct thread *thread;
struct comm *comm;
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index e42b410c3f11..a384fba8e585 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -92,6 +92,7 @@ struct symbol_conf {
show_nr_samples,
show_total_period,
use_callchain,
+ cumulate_callchain,
exclude_other,
show_cpu_utilization,
initialized,
--
1.7.11.7
To support callchain accumulation, add_hist_entry() needs to know
whether @entry is an accumulated one or not when it's called. The
period of an accumulated entry should be added to ->stat_acc but not to
->stat. Add a @sample_self argument for that.
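With the new argument, the period accounting in add_hist_entry() ends
up as (condensed from the hunk below):

	if (sample_self)
		he_stat__add_period(&he->stat, period, weight);
	if (symbol_conf.cumulate_callchain)
		he_stat__add_period(he->stat_acc, period, weight);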
Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-annotate.c | 3 ++-
tools/perf/builtin-diff.c | 2 +-
tools/perf/builtin-report.c | 6 +++---
tools/perf/builtin-top.c | 2 +-
tools/perf/tests/hists_link.c | 4 ++--
tools/perf/util/hist.c | 23 +++++++++++++++--------
tools/perf/util/hist.h | 3 ++-
7 files changed, 26 insertions(+), 17 deletions(-)
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index ab65057a0317..adf4bb3a9414 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -65,7 +65,8 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
return 0;
}
- he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL, 1, 1, 0);
+ he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL, 1, 1, 0,
+ true);
if (he == NULL)
return -ENOMEM;
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index e6a0844bc2f0..2aa1cbbe4f2d 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -308,7 +308,7 @@ static int hists__add_entry(struct hists *hists,
u64 weight, u64 transaction)
{
if (__hists__add_entry(hists, al, NULL, NULL, NULL, period, weight,
- transaction) != NULL)
+ transaction, true) != NULL)
return 0;
return -ENOMEM;
}
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index c11c4e8493da..80c50f243f34 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -166,7 +166,7 @@ iter_add_single_mem_entry(struct hist_entry_iter *iter, struct addr_location *al
* and the he_stat__add_period() function.
*/
he = __hists__add_entry(&iter->evsel->hists, al, iter->parent, NULL, mi,
- cost, cost, 0);
+ cost, cost, 0, true);
if (!he)
return -ENOMEM;
@@ -278,7 +278,7 @@ iter_add_next_branch_entry(struct hist_entry_iter *iter, struct addr_location *a
* and not events sampled. Thus we use a pseudo period of 1.
*/
he = __hists__add_entry(&evsel->hists, al, iter->parent, &bi[i], NULL,
- 1, 1, 0);
+ 1, 1, 0, true);
if (he == NULL)
return -ENOMEM;
@@ -324,7 +324,7 @@ iter_add_single_normal_entry(struct hist_entry_iter *iter, struct addr_location
he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
sample->period, sample->weight,
- sample->transaction);
+ sample->transaction, true);
if (he == NULL)
return -ENOMEM;
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index e0fd0aa57f06..6d721bccdd26 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -245,7 +245,7 @@ static struct hist_entry *perf_evsel__add_hist_entry(struct perf_evsel *evsel,
pthread_mutex_lock(&evsel->hists.lock);
he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL,
sample->period, sample->weight,
- sample->transaction);
+ sample->transaction, true);
pthread_mutex_unlock(&evsel->hists.lock);
if (he == NULL)
return NULL;
diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
index 173bf42cc03e..b829c2a1a598 100644
--- a/tools/perf/tests/hists_link.c
+++ b/tools/perf/tests/hists_link.c
@@ -223,7 +223,7 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
goto out;
he = __hists__add_entry(&evsel->hists, &al, NULL,
- NULL, NULL, 1, 1, 0);
+ NULL, NULL, 1, 1, 0, true);
if (he == NULL)
goto out;
@@ -246,7 +246,7 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
goto out;
he = __hists__add_entry(&evsel->hists, &al, NULL,
- NULL, NULL, 1, 1, 0);
+ NULL, NULL, 1, 1, 0, true);
if (he == NULL)
goto out;
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 50de02c478b4..a6bb3e329857 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -271,7 +271,8 @@ void hists__decay_entries(struct hists *hists, bool zap_user, bool zap_kernel)
* histogram, sorted on item, collects periods
*/
-static struct hist_entry *hist_entry__new(struct hist_entry *template)
+static struct hist_entry *hist_entry__new(struct hist_entry *template,
+ bool sample_self)
{
size_t callchain_size = 0;
struct hist_entry *he;
@@ -291,6 +292,8 @@ static struct hist_entry *hist_entry__new(struct hist_entry *template)
return NULL;
}
memcpy(he->stat_acc, &he->stat, sizeof(he->stat));
+ if (!sample_self)
+ memset(&he->stat, 0, sizeof(he->stat));
}
if (he->ms.map)
@@ -352,7 +355,8 @@ static u8 symbol__parent_filter(const struct symbol *parent)
static struct hist_entry *add_hist_entry(struct hists *hists,
struct hist_entry *entry,
- struct addr_location *al)
+ struct addr_location *al,
+ bool sample_self)
{
struct rb_node **p;
struct rb_node *parent = NULL;
@@ -376,7 +380,8 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
cmp = hist_entry__cmp(he, entry);
if (!cmp) {
- he_stat__add_period(&he->stat, period, weight);
+ if (sample_self)
+ he_stat__add_period(&he->stat, period, weight);
if (symbol_conf.cumulate_callchain)
he_stat__add_period(he->stat_acc, period, weight);
@@ -406,7 +411,7 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
p = &(*p)->rb_right;
}
- he = hist_entry__new(entry);
+ he = hist_entry__new(entry, sample_self);
if (!he)
return NULL;
@@ -414,7 +419,8 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
rb_link_node(&he->rb_node_in, parent, p);
rb_insert_color(&he->rb_node_in, hists->entries_in);
out:
- he_stat__add_cpumode_period(&he->stat, al->cpumode, period);
+ if (sample_self)
+ he_stat__add_cpumode_period(&he->stat, al->cpumode, period);
if (symbol_conf.cumulate_callchain)
he_stat__add_cpumode_period(he->stat_acc, al->cpumode, period);
return he;
@@ -425,7 +431,8 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
struct symbol *sym_parent,
struct branch_info *bi,
struct mem_info *mi,
- u64 period, u64 weight, u64 transaction)
+ u64 period, u64 weight, u64 transaction,
+ bool sample_self)
{
struct hist_entry entry = {
.thread = al->thread,
@@ -450,7 +457,7 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
.transaction = transaction,
};
- return add_hist_entry(hists, &entry, al);
+ return add_hist_entry(hists, &entry, al, sample_self);
}
int64_t
@@ -864,7 +871,7 @@ static struct hist_entry *hists__add_dummy_entry(struct hists *hists,
p = &(*p)->rb_right;
}
- he = hist_entry__new(pair);
+ he = hist_entry__new(pair, true);
if (he) {
memset(&he->stat, 0, sizeof(he->stat));
he->hists = hists;
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 7d1d973d9a39..212aec21735f 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -96,7 +96,8 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
struct symbol *parent,
struct branch_info *bi,
struct mem_info *mi, u64 period,
- u64 weight, u64 transaction);
+ u64 weight, u64 transaction,
+ bool sample_self);
int64_t hist_entry__cmp(struct hist_entry *left, struct hist_entry *right);
int64_t hist_entry__collapse(struct hist_entry *left, struct hist_entry *right);
int hist_entry__transaction_len(void);
--
1.7.11.7
There is some duplicated code when adding hist entries. The variants
differ in that some have branch info or mem info, but they generally do
the same thing. So introduce a new struct hist_entry_iter and add
callbacks to customize each case in a generic way.
The new perf_evsel__add_entry() function will look like:
iter->prepare_entry();
iter->add_single_entry();
while (iter->next_entry())
	iter->add_next_entry();
iter->finish_entry();
This will help further work like the cumulative callchain patchset.
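In the actual code the driver also propagates errors and always calls
finish_entry(), roughly:

	err = iter->prepare_entry(iter, al);
	if (!err)
		err = iter->add_single_entry(iter, al);

	while (!err && iter->next_entry(iter, al))
		err = iter->add_next_entry(iter, al);

	err2 = iter->finish_entry(iter, al);
	if (!err)
		err = err2;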
Cc: Jiri Olsa <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-report.c | 362 +++++++++++++++++++++++++++++++++-----------
1 file changed, 273 insertions(+), 89 deletions(-)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 6bfcb83db8ff..c11c4e8493da 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -93,29 +93,68 @@ static int hist_entry__append_callchain(struct hist_entry *he, struct perf_sampl
return callchain_append(he->callchain, &callchain_cursor, sample->period);
}
-static int report__add_mem_hist_entry(struct perf_tool *tool, struct addr_location *al,
- struct perf_sample *sample, struct perf_evsel *evsel,
- union perf_event *event)
-{
- struct report *rep = container_of(tool, struct report, tool);
- struct symbol *parent = NULL;
- u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+struct hist_entry_iter {
+ int total;
+ int curr;
+
+ struct report *rep;
+ union perf_event *event;
+ struct perf_evsel *evsel;
+ struct perf_sample *sample;
struct hist_entry *he;
- struct mem_info *mi, *mx;
- uint64_t cost;
- int err = report__resolve_callchain(rep, &parent, evsel, al, sample);
+ struct symbol *parent;
+ void *priv;
+
+ int (*prepare_entry)(struct hist_entry_iter *, struct addr_location *);
+ int (*add_single_entry)(struct hist_entry_iter *, struct addr_location *);
+ int (*next_entry)(struct hist_entry_iter *, struct addr_location *);
+ int (*add_next_entry)(struct hist_entry_iter *, struct addr_location *);
+ int (*finish_entry)(struct hist_entry_iter *, struct addr_location *);
+};
- if (err)
- return err;
+static int
+iter_next_nop_entry(struct hist_entry_iter *iter __maybe_unused,
+ struct addr_location *al __maybe_unused)
+{
+ return 0;
+}
+
+static int
+iter_add_next_nop_entry(struct hist_entry_iter *iter __maybe_unused,
+ struct addr_location *al __maybe_unused)
+{
+ return 0;
+}
+
+static int
+iter_prepare_mem_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ union perf_event *event = iter->event;
+ struct perf_sample *sample = iter->sample;
+ struct mem_info *mi;
+ u8 cpumode;
+
+ cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
mi = machine__resolve_mem(al->machine, al->thread, sample, cpumode);
- if (!mi)
+ if (mi == NULL)
return -ENOMEM;
- if (rep->hide_unresolved && !al->sym)
- return 0;
+ iter->priv = mi;
+ return 0;
+}
+
+static int
+iter_add_single_mem_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ u64 cost;
+ struct mem_info *mi = iter->priv;
+ struct hist_entry *he;
+
+ if (mi == NULL)
+ return -EINVAL;
- cost = sample->weight;
+ cost = iter->sample->weight;
if (!cost)
cost = 1;
@@ -126,11 +165,27 @@ static int report__add_mem_hist_entry(struct perf_tool *tool, struct addr_locati
* and this is indirectly achieved by passing period=weight here
* and the he_stat__add_period() function.
*/
- he = __hists__add_entry(&evsel->hists, al, parent, NULL, mi,
+ he = __hists__add_entry(&iter->evsel->hists, al, iter->parent, NULL, mi,
cost, cost, 0);
if (!he)
return -ENOMEM;
+ iter->he = he;
+ return 0;
+}
+
+static int
+iter_finish_mem_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ struct perf_evsel *evsel = iter->evsel;
+ struct hist_entry *he = iter->he;
+ struct mem_info *mx;
+ int err = -EINVAL;
+ u64 cost;
+
+ if (he == NULL)
+ goto out;
+
err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
if (err)
goto out;
@@ -140,97 +195,222 @@ static int report__add_mem_hist_entry(struct perf_tool *tool, struct addr_locati
if (err)
goto out;
+ cost = iter->sample->weight;
+ if (!cost)
+ cost = 1;
+
evsel->hists.stats.total_period += cost;
hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
- err = hist_entry__append_callchain(he, sample);
+
+ err = hist_entry__append_callchain(he, iter->sample);
+
out:
+ /*
+ * We don't need to free iter->priv (mem_info) here since
+ * the mem info was either already freed in add_hist_entry() or
+ * passed to a new hist entry by hist_entry__new().
+ */
+ iter->priv = NULL;
+
+ iter->he = NULL;
return err;
}
-static int report__add_branch_hist_entry(struct perf_tool *tool, struct addr_location *al,
- struct perf_sample *sample, struct perf_evsel *evsel)
+static int
+iter_prepare_branch_entry(struct hist_entry_iter *iter, struct addr_location *al)
{
- struct report *rep = container_of(tool, struct report, tool);
- struct symbol *parent = NULL;
- unsigned i;
- struct hist_entry *he;
- struct branch_info *bi, *bx;
- int err = report__resolve_callchain(rep, &parent, evsel, al, sample);
-
- if (err)
- return err;
+ struct branch_info *bi;
+ struct perf_sample *sample = iter->sample;
bi = machine__resolve_bstack(al->machine, al->thread,
sample->branch_stack);
if (!bi)
return -ENOMEM;
- for (i = 0; i < sample->branch_stack->nr; i++) {
- if (rep->hide_unresolved && !(bi[i].from.sym && bi[i].to.sym))
- continue;
+ iter->curr = 0;
+ iter->total = sample->branch_stack->nr;
- err = -ENOMEM;
+ iter->priv = bi;
+ return 0;
+}
- /* overwrite the 'al' to branch-to info */
- al->map = bi[i].to.map;
- al->sym = bi[i].to.sym;
- al->addr = bi[i].to.addr;
- /*
- * The report shows the percentage of total branches captured
- * and not events sampled. Thus we use a pseudo period of 1.
- */
- he = __hists__add_entry(&evsel->hists, al, parent, &bi[i], NULL,
- 1, 1, 0);
- if (he) {
- bx = he->branch_info;
- err = addr_map_symbol__inc_samples(&bx->from, evsel->idx);
- if (err)
- goto out;
-
- err = addr_map_symbol__inc_samples(&bx->to, evsel->idx);
- if (err)
- goto out;
-
- evsel->hists.stats.total_period += 1;
- hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
- } else
- goto out;
- }
- err = 0;
-out:
- free(bi);
- return err;
+static int
+iter_add_single_branch_entry(struct hist_entry_iter *iter __maybe_unused,
+ struct addr_location *al __maybe_unused)
+{
+ return 0;
}
-static int report__add_hist_entry(struct perf_tool *tool, struct perf_evsel *evsel,
- struct addr_location *al, struct perf_sample *sample)
+static int
+iter_next_branch_entry(struct hist_entry_iter *iter, struct addr_location *al)
{
- struct report *rep = container_of(tool, struct report, tool);
- struct symbol *parent = NULL;
+ struct branch_info *bi = iter->priv;
+ int i = iter->curr;
+
+ if (bi == NULL)
+ return 0;
+
+ if (iter->curr >= iter->total)
+ return 0;
+
+ al->map = bi[i].to.map;
+ al->sym = bi[i].to.sym;
+ al->addr = bi[i].to.addr;
+ return 1;
+}
+
+static int
+iter_add_next_branch_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ struct branch_info *bi, *bx;
+ struct perf_evsel *evsel = iter->evsel;
struct hist_entry *he;
- int err = report__resolve_callchain(rep, &parent, evsel, al, sample);
+ int i = iter->curr;
+ int err = 0;
+
+ bi = iter->priv;
+
+ if (iter->rep->hide_unresolved && !(bi[i].from.sym && bi[i].to.sym))
+ goto out;
+ /*
+ * The report shows the percentage of total branches captured
+ * and not events sampled. Thus we use a pseudo period of 1.
+ */
+ he = __hists__add_entry(&evsel->hists, al, iter->parent, &bi[i], NULL,
+ 1, 1, 0);
+ if (he == NULL)
+ return -ENOMEM;
+
+ bx = he->branch_info;
+ err = addr_map_symbol__inc_samples(&bx->from, evsel->idx);
if (err)
- return err;
+ goto out;
+
+ err = addr_map_symbol__inc_samples(&bx->to, evsel->idx);
+ if (err)
+ goto out;
+
+ evsel->hists.stats.total_period += 1;
+ hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+out:
+ iter->curr++;
+ return err;
+}
+
+static int
+iter_finish_branch_entry(struct hist_entry_iter *iter,
+ struct addr_location *al __maybe_unused)
+{
+ zfree(&iter->priv);
- he = __hists__add_entry(&evsel->hists, al, parent, NULL, NULL,
+ return iter->curr >= iter->total ? 0 : -1;
+}
+
+static int
+iter_prepare_normal_entry(struct hist_entry_iter *iter __maybe_unused,
+ struct addr_location *al __maybe_unused)
+{
+ return 0;
+}
+
+static int
+iter_add_single_normal_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+ struct hist_entry *he;
+
+ he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
sample->period, sample->weight,
sample->transaction);
if (he == NULL)
return -ENOMEM;
- err = hist_entry__append_callchain(he, sample);
- if (err)
- goto out;
+ iter->he = he;
+ return 0;
+}
+
+static int
+iter_finish_normal_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ int err;
+ struct hist_entry *he = iter->he;
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+
+ if (he == NULL)
+ return 0;
+
+ iter->he = NULL;
err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+ if (err)
+ return err;
+
evsel->hists.stats.total_period += sample->period;
hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+ return hist_entry__append_callchain(he, sample);
+}
+
+static struct hist_entry_iter mem_iter = {
+ .prepare_entry = iter_prepare_mem_entry,
+ .add_single_entry = iter_add_single_mem_entry,
+ .next_entry = iter_next_nop_entry,
+ .add_next_entry = iter_add_next_nop_entry,
+ .finish_entry = iter_finish_mem_entry,
+};
+
+static struct hist_entry_iter branch_iter = {
+ .prepare_entry = iter_prepare_branch_entry,
+ .add_single_entry = iter_add_single_branch_entry,
+ .next_entry = iter_next_branch_entry,
+ .add_next_entry = iter_add_next_branch_entry,
+ .finish_entry = iter_finish_branch_entry,
+};
+
+static struct hist_entry_iter normal_iter = {
+ .prepare_entry = iter_prepare_normal_entry,
+ .add_single_entry = iter_add_single_normal_entry,
+ .next_entry = iter_next_nop_entry,
+ .add_next_entry = iter_add_next_nop_entry,
+ .finish_entry = iter_finish_normal_entry,
+};
+
+static int
+iter_add_entry(struct hist_entry_iter *iter, struct addr_location *al)
+{
+ int err, err2;
+
+ err = report__resolve_callchain(iter->rep, &iter->parent, iter->evsel,
+ al, iter->sample);
+ if (err)
+ return err;
+
+ err = iter->prepare_entry(iter, al);
+ if (err)
+ goto out;
+
+ err = iter->add_single_entry(iter, al);
+ if (err)
+ goto out;
+
+ while (iter->next_entry(iter, al)) {
+ err = iter->add_next_entry(iter, al);
+ if (err)
+ break;
+ }
+
out:
+ err2 = iter->finish_entry(iter, al);
+ if (!err)
+ err = err2;
+
return err;
}
-
static int process_sample_event(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
@@ -239,6 +419,7 @@ static int process_sample_event(struct perf_tool *tool,
{
struct report *rep = container_of(tool, struct report, tool);
struct addr_location al;
+ struct hist_entry_iter *iter;
int ret;
if (perf_event__preprocess_sample(event, machine, &al, sample) < 0) {
@@ -253,22 +434,25 @@ static int process_sample_event(struct perf_tool *tool,
if (rep->cpu_list && !test_bit(sample->cpu, rep->cpu_bitmap))
return 0;
- if (sort__mode == SORT_MODE__BRANCH) {
- ret = report__add_branch_hist_entry(tool, &al, sample, evsel);
- if (ret < 0)
- pr_debug("problem adding lbr entry, skipping event\n");
- } else if (rep->mem_mode == 1) {
- ret = report__add_mem_hist_entry(tool, &al, sample, evsel, event);
- if (ret < 0)
- pr_debug("problem adding mem entry, skipping event\n");
- } else {
- if (al.map != NULL)
- al.map->dso->hit = 1;
-
- ret = report__add_hist_entry(tool, evsel, &al, sample);
- if (ret < 0)
- pr_debug("problem incrementing symbol period, skipping event\n");
- }
+ if (sort__mode == SORT_MODE__BRANCH)
+ iter = &branch_iter;
+ else if (rep->mem_mode == 1)
+ iter = &mem_iter;
+ else
+ iter = &normal_iter;
+
+ if (al.map != NULL)
+ al.map->dso->hit = 1;
+
+ iter->rep = rep;
+ iter->evsel = evsel;
+ iter->event = event;
+ iter->sample = sample;
+
+ ret = iter_add_entry(iter, &al);
+ if (ret < 0)
+ pr_debug("problem adding hist entry, skipping event\n");
+
return ret;
}
--
1.7.11.7
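The control flow that iter_add_entry() above enforces on all three iterators - prepare, add_single, zero or more next/add_next rounds, then an unconditional finish - can be modeled with a tiny standalone program (plain C, no perf types; purely an illustration of the protocol, not code from the patch):

#include <stdio.h>

struct iter {
	int curr, total;
	int (*prepare)(struct iter *);
	int (*add_single)(struct iter *);
	int (*next)(struct iter *);	/* returns 1 while entries remain */
	int (*add_next)(struct iter *);
	int (*finish)(struct iter *);
};

/* A "branch"-like iterator that walks three entries. */
static int prepare(struct iter *it)    { it->curr = 0; it->total = 3; return 0; }
static int add_single(struct iter *it) { (void)it; return 0; }
static int next_entry(struct iter *it) { return it->curr < it->total; }
static int add_next(struct iter *it)   { printf("entry %d\n", it->curr++); return 0; }
static int finish(struct iter *it)     { return it->curr >= it->total ? 0 : -1; }

static int iter_add_entry(struct iter *it)
{
	int err, err2;

	err = it->prepare(it);
	if (err)
		goto out;

	err = it->add_single(it);
	if (err)
		goto out;

	while (it->next(it)) {
		err = it->add_next(it);
		if (err)
			break;
	}
out:
	err2 = it->finish(it);	/* cleanup always runs, as in the patch */
	return err ? err : err2;
}

int main(void)
{
	struct iter it = {
		.prepare	= prepare,
		.add_single	= add_single,
		.next		= next_entry,
		.add_next	= add_next,
		.finish		= finish,
	};

	return iter_add_entry(&it);
}

This prints entry 0..2 and returns 0; an error from any callback short-circuits the loop but still reaches finish, which is why iter_finish_branch_entry() above can free the branch_info unconditionally.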
On Wed, Jan 08, 2014 at 05:46:06PM +0900, Namhyung Kim wrote:
> Currently if a sample was filtered by a command line option, it is just
> dropped. But this affects the final output in that the percentages can
> be different since the filtered entries were not included in the total.
>
> For example, if an original output looked like below:
Humm, if one says that he/she is interested in just samples for a and b,
the current behaviour will state how many of the filtered samples are
for a and b, which is valid.
I bet the number of samples will reflect that as well; you filtered it
out, so the percentages are relative to that number of samples.
So I think this change in behaviour is wrong, no?
Example used:
[root@ssdandy ~]# perf report --stdio -s comm
# To display the perf.data header info, please use
# --header/--header-only options.
#
# Samples: 158 of event 'cycles'
# Event count (approx.): 75273747
#
# Overhead Command
# ........ ...............
#
42.65% swapper
25.58% qemu-kvm
8.85% Xorg
7.81% gnome-shell
3.81% vhost-7290
3.25% gnome-terminal
2.74% gnome-settings-
2.68% irq/55-iwlwifi
1.71% perf
0.61% gnome-session
0.31% kworker/4:2
[root@ssdandy ~]#
[root@ssdandy ~]# perf report --stdio -s comm -c swapper,qemu-kvm
# To display the perf.data header info, please use
# --header/--header-only options.
#
# Samples: 97 of event 'cycles'
# Event count (approx.): 51360039
#
# Overhead Command
# ........ ........
#
62.50% swapper
37.50% qemu-kvm
[root@ssdandy ~]#
To get what you want, kinda:
[root@ssdandy ~]# perf report --stdio -s comm | egrep -w swapper\|qemu-kvm
42.65% swapper
25.58% qemu-kvm
[root@ssdandy ~]#
- Arnaldo
> $ perf report --stdio -s comm
>
> # Overhead Command
> # ........ .......
> #
> 74.19% cc1
> 7.61% gcc
> 6.11% as
> 4.35% sh
> 4.14% make
> 1.13% fixdep
>
> If we gave a filter, the output changed like below:
>
> $ perf report --stdio -s comm -c gcc,as
>
> # Overhead Command
> # ........ .......
> #
> 55.49% gcc
> 44.51% as
>
> But user might want to see this:
>
> $ perf report --stdio -s comm -c gcc,as
>
> # Overhead Command
> # ........ .......
> #
> 7.61% gcc
> 6.11% as
>
> Signed-off-by: Namhyung Kim <[email protected]>
> ---
> tools/perf/builtin-report.c | 2 +-
> tools/perf/util/event.c | 18 ++++++++----------
> tools/perf/util/hist.c | 11 ++---------
> tools/perf/util/hist.h | 7 +++++++
> tools/perf/util/symbol.h | 2 +-
> 5 files changed, 19 insertions(+), 21 deletions(-)
>
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index bf8dd2e893e4..6bfcb83db8ff 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -247,7 +247,7 @@ static int process_sample_event(struct perf_tool *tool,
> return -1;
> }
>
> - if (al.filtered || (rep->hide_unresolved && al.sym == NULL))
> + if (rep->hide_unresolved && al.sym == NULL)
> return 0;
>
> if (rep->cpu_list && !test_bit(sample->cpu, rep->cpu_bitmap))
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index 45a76c69a9ed..2c6a583bc2e9 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -3,6 +3,7 @@
> #include "debug.h"
> #include "machine.h"
> #include "sort.h"
> +#include "hist.h"
> #include "string.h"
> #include "strlist.h"
> #include "thread.h"
> @@ -663,7 +664,7 @@ void thread__find_addr_map(struct thread *thread,
> al->thread = thread;
> al->addr = addr;
> al->cpumode = cpumode;
> - al->filtered = false;
> + al->filtered = 0;
>
> if (machine == NULL) {
> al->map = NULL;
> @@ -750,9 +751,6 @@ int perf_event__preprocess_sample(const union perf_event *event,
> if (thread == NULL)
> return -1;
>
> - if (thread__is_filtered(thread))
> - goto out_filtered;
> -
> dump_printf(" ... thread: %s:%d\n", thread__comm_str(thread), thread->tid);
> /*
> * Have we already created the kernel maps for this machine?
> @@ -767,6 +765,10 @@ int perf_event__preprocess_sample(const union perf_event *event,
>
> thread__find_addr_map(thread, machine, cpumode, MAP__FUNCTION,
> sample->ip, al);
> +
> + if (thread__is_filtered(thread))
> + al->filtered |= (1 << HIST_FILTER__THREAD);
> +
> dump_printf(" ...... dso: %s\n",
> al->map ? al->map->dso->long_name :
> al->level == 'H' ? "[hypervisor]" : "<not found>");
> @@ -782,7 +784,7 @@ int perf_event__preprocess_sample(const union perf_event *event,
> (dso->short_name != dso->long_name &&
> strlist__has_entry(symbol_conf.dso_list,
> dso->long_name)))))
> - goto out_filtered;
> + al->filtered |= (1 << HIST_FILTER__DSO);
>
> al->sym = map__find_symbol(al->map, al->addr,
> machine->symbol_filter);
> @@ -791,11 +793,7 @@ int perf_event__preprocess_sample(const union perf_event *event,
> if (symbol_conf.sym_list &&
> (!al->sym || !strlist__has_entry(symbol_conf.sym_list,
> al->sym->name)))
> - goto out_filtered;
> -
> - return 0;
> + al->filtered |= (1 << HIST_FILTER__SYMBOL);
>
> -out_filtered:
> - al->filtered = true;
> return 0;
> }
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> index 4ed3e883240d..64df6b96e7ea 100644
> --- a/tools/perf/util/hist.c
> +++ b/tools/perf/util/hist.c
> @@ -13,13 +13,6 @@ static bool hists__filter_entry_by_thread(struct hists *hists,
> static bool hists__filter_entry_by_symbol(struct hists *hists,
> struct hist_entry *he);
>
> -enum hist_filter {
> - HIST_FILTER__DSO,
> - HIST_FILTER__THREAD,
> - HIST_FILTER__PARENT,
> - HIST_FILTER__SYMBOL,
> -};
> -
> struct callchain_param callchain_param = {
> .mode = CHAIN_GRAPH_REL,
> .min_percent = 0.5,
> @@ -329,8 +322,8 @@ void hists__inc_nr_entries(struct hists *hists, struct hist_entry *h)
> if (!h->filtered) {
> hists__calc_col_len(hists, h);
> ++hists->nr_entries;
> - hists->stats.total_period += h->stat.period;
> }
> + hists->stats.total_period += h->stat.period;
> }
>
> static u8 symbol__parent_filter(const struct symbol *parent)
> @@ -429,7 +422,7 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
> .weight = weight,
> },
> .parent = sym_parent,
> - .filtered = symbol__parent_filter(sym_parent),
> + .filtered = symbol__parent_filter(sym_parent) | al->filtered,
> .hists = hists,
> .branch_info = bi,
> .mem_info = mi,
> diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
> index a59743fa3ef7..7d1d973d9a39 100644
> --- a/tools/perf/util/hist.h
> +++ b/tools/perf/util/hist.h
> @@ -14,6 +14,13 @@ struct hist_entry;
> struct addr_location;
> struct symbol;
>
> +enum hist_filter {
> + HIST_FILTER__DSO,
> + HIST_FILTER__THREAD,
> + HIST_FILTER__PARENT,
> + HIST_FILTER__SYMBOL,
> +};
> +
> /*
> * The kernel collects the number of events it couldn't send in a stretch and
> * when possible sends this number in a PERF_RECORD_LOST event. The number of
> diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> index cbd680361806..e42b410c3f11 100644
> --- a/tools/perf/util/symbol.h
> +++ b/tools/perf/util/symbol.h
> @@ -170,7 +170,7 @@ struct addr_location {
> struct symbol *sym;
> u64 addr;
> char level;
> - bool filtered;
> + u8 filtered;
> u8 cpumode;
> s32 cpu;
> };
> --
> 1.7.11.7
On Wed, Jan 08, 2014 at 05:46:08PM +0900, Namhyung Kim wrote:
> The machine__resolve_callchain() is called only if symbol_conf.
> use_callchain is set, so there is no need to check it again.
Not really, it may be called with use_callchain not set; I'm checking if
the logic is needed (doesn't look like it is).
- Arnaldo
> Signed-off-by: Namhyung Kim <[email protected]>
> ---
> tools/perf/util/machine.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index a98538dc465a..4dcc89490858 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -1313,8 +1313,6 @@ static int machine__resolve_callchain_sample(struct machine *machine,
> *root_al = al;
> callchain_cursor_reset(&callchain_cursor);
> }
> - if (!symbol_conf.use_callchain)
> - break;
> }
>
> err = callchain_cursor_append(&callchain_cursor,
> --
> 1.7.11.7
On Wed, Jan 08, 2014 at 09:41:13AM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Jan 08, 2014 at 05:46:06PM +0900, Namhyung Kim wrote:
> > Currently if a sample was filtered by a command line option, it is just
> > dropped. But this affects the final output in that the percentages can
> > be different since the filtered entries were not included in the total.
> >
> > For example, if an original output looked like below:
>
> Humm, if one says that he/she is interested in just samples for a and b,
> the current behaviour will state how many of the filtered samples are
> for a and b, which is valid.
>
> I bet the number of samples will reflect that as well; you filtered it
> out, so the percentages are relative to that number of samples.
>
> So I think this change in behaviour is wrong, no?
>
hi,
haven't checked the implementation yet, but it kind of does
what I'd expect for symbol filtering:
perf report
...
22.00% yes libc-2.17.so [.] __strlen_sse2
11.79% yes libc-2.17.so [.] fputs_unlocked
9.65% yes libc-2.17.so [.] __GI___mempcpy
1.91% yes yes [.] fputs_unlocked@plt
...
search (press '/') for fputs_unlocked (with Namhyung's change):
11.79% yes libc-2.17.so [.] fputs_unlocked
1.91% yes yes [.] fputs_unlocked@plt
while the current one shows:
86.08% yes libc-2.17.so [.] fputs_unlocked
13.92% yes yes [.] fputs_unlocked@plt
which annoys me when searching for an 'invisible' symbol
within tons of others.. I had to do that grep thing
you showed.
I'd like to have the Namhyung's change behaviour as default,
but I'll be happy with some switch as well ;-)
thanks,
jirka
On Wed, Jan 08, 2014 at 05:22:53PM +0100, Jiri Olsa wrote:
> On Wed, Jan 08, 2014 at 09:41:13AM -0300, Arnaldo Carvalho de Melo wrote:
> > [SNIP]
> hi,
> haven't checked the implementation yet, but it kind of does
> what I'd expect for symbol filtering:
>
> perf report
> ...
> 22.00% yes libc-2.17.so [.] __strlen_sse2
> 11.79% yes libc-2.17.so [.] fputs_unlocked
> 9.65% yes libc-2.17.so [.] __GI___mempcpy
> 1.91% yes yes [.] fputs_unlocked@plt
> ...
>
> search (press '/') for fputs_unlocked (with Namhyung's change):
> 11.79% yes libc-2.17.so [.] fputs_unlocked
> 1.91% yes yes [.] fputs_unlocked@plt
>
> while the current one shows:
> 86.08% yes libc-2.17.so [.] fputs_unlocked
> 13.92% yes yes [.] fputs_unlocked@plt
>
> which annoys me when searching for an 'invisible' symbol
> within tons of others.. I had to do that grep thing
> you showed.
>
> I'd like to have the Namhyung's change behaviour as default,
> but I'll be happy with some switch as well ;-)
I understand the desire for this different mode, looks indeed useful.
So I think that this is a new feature and as such we should provide it as
an option, which may (or may not) become the default.
Some concerns I have are that when we go on filtering, we have to have
all the things that are zeroed to then get accrued for each hist entry
that matches the filter being applied, and now at least a nr_entries
field got out of the if (al.filtered) block, i.e. in the end we will
have the number of hist entries filtered but continue having the
total period for all (filtered or not) hist entries.
Having it as a separate feature would allow to have both views:
1. the percentages relative to the filtered samples
2. the percentages relative to all (filtered or not) samples
Being selectable on the command line and also with a hotkey to provide
two columns: %total, %filtered.
So we would have new field: hists->stats.total_filtered_period, and
hists->stats.nr_filtered_entries, for this, etc.
What do you think?
- Arnaldo
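To make the two views concrete, here is a standalone sketch (not perf code - total_filtered_period exists only in the proposal above, and the period value is reverse-engineered from the -s comm example earlier in the thread, so treat all numbers as illustrative):

#include <stdio.h>

struct stats {
	unsigned long long total_period;		/* all samples */
	unsigned long long total_filtered_period;	/* proposed field */
};

int main(void)
{
	struct stats st = {
		.total_period		= 75273747,	/* all 158 samples */
		.total_filtered_period	= 51360039,	/* swapper + qemu-kvm */
	};
	unsigned long long period = 32102000;	/* ~ the swapper entry */

	/* view 1: relative to the filtered total (current behaviour) */
	printf("%%filtered: %5.2f%%\n",
	       100.0 * period / st.total_filtered_period);

	/* view 2: relative to the full total (keeps filtered-out
	 * samples in the denominator) */
	printf("%%total:    %5.2f%%\n",
	       100.0 * period / st.total_period);

	return 0;
}

This prints 62.50% and 42.65%, i.e. the two values the swapper line shows in the filtered and unfiltered reports above.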
2014-01-08 (Wed), 15:59 -0300, Arnaldo Carvalho de Melo:
> On Wed, Jan 08, 2014 at 05:22:53PM +0100, Jiri Olsa wrote:
> > [SNIP]
>
> I understand the desire for this different mode, looks indeed useful.
Yeah, the above is the reason why I wrote this in the first place. And
then I thought it should be applied to the command line filter options too.
>
> So I think that this is a new feature and as such we should provide it as
> an option, which may (or may not) become the default.
>
> Some concerns I have are that when we go on filtering, we have to have
> all the things that are zeroed to then get accrued for each hist entry
> that matches the filter being applied, and now at least a nr_entries
> field got out of the if (al.filtered) block, i.e. in the end we will
> have the number of hist entries filtered but continue having the
> total period for all (filtered or not) hist entries.
One thing related to this is when the --children option is used. Since
the total period is added only for a real sample, if the sample is
filtered but its parents are not, the parents might end up with more
than 100% overhead.
>
> Having it as a separate feature would allow to have both views:
>
> 1. the percentages relative to the filtered samples
> 2. the percentages relative to all (filtered or not) samples
>
> Being selectable on the command line and also with a hotkey to provide
> two columns: %total, %filtered.
Hmm.. do you really want two columns instead of a single column and a
switch/option? Then the (second) %filtered column will show up only
if filtering is enabled. Isn't that annoying for dynamic filtering
(i.e. the '/' key in the TUI)?
>
> So we would have new field: hists->stats.total_filtered_period, and
> hists->stats.nr_filtered_entries, for this, etc.
>
> What do you think?
I'm fine with it if we decide to support two columns at the same time.
Thanks,
Namhyung
2014-01-08 (Wed), 09:57 -0300, Arnaldo Carvalho de Melo:
> On Wed, Jan 08, 2014 at 05:46:08PM +0900, Namhyung Kim wrote:
> > The machine__resolve_callchain() is called only if symbol_conf.
> > use_callchain is set, so there is no need to check it again.
>
> Not really, it may be called with use_callchain not set; I'm checking if
> the logic is needed (doesn't look like it is).
To be more precise, it's called with use_callchain not set if "--sort
parent" was given on the command line. But even in this case we need to
proceed with the callchain traversal; otherwise the parent column will
have invalid results since it only sees the first entry (usually itself)
and then stops.
$ perf report -s sym,parent
+ 2.66% [k] rb_next sys_ioctl
+ 1.52% [.] pthread_mutex_lock [other]
+ 0.92% [k] page_fault [other]
+ 0.92% [.] 0x00000000008dca3f [other]
+ 0.86% [.] 0x0000000000a052e9 [other]
+ 0.76% [.] pthread_mutex_unlock [other]
+ 0.73% [.] 0x0000000000976361 [other]
+ 0.69% [k] find_vma sys_mmap_pgoff
$ perf report -s sym,parent -g none
+ 2.66% [k] rb_next [other]
+ 1.52% [.] pthread_mutex_lock [other]
+ 0.92% [k] page_fault [other]
+ 0.92% [.] 0x00000000008dca3f [other]
+ 0.86% [.] 0x0000000000a052e9 [other]
+ 0.76% [.] pthread_mutex_unlock [other]
+ 0.73% [.] 0x0000000000976361 [other]
+ 0.69% [k] find_vma [other]
With the --children patchset, we'll need to traverse the callstack
regardless of use_callchain anyway. So we need this patch.
Thanks,
Namhyung
On Thu, Jan 09, 2014 at 09:57:35PM +0900, Namhyung Kim wrote:
> 2014-01-08 (Wed), 15:59 -0300, Arnaldo Carvalho de Melo:
> > [SNIP]
> >
> > I understand the desire for this different mode, looks indeed useful.
>
> Yeah, the above is the reason why I wrote this in the first place. And
> then I thought it should be applied to the command line filter options too.
I don't have a problem with providing a new option, but for those who
think that when you filter samples based on some criteria, the
percentages that appear should be relative to the new, filtered
total_period: that is a change in behaviour, so it needs to be switchable.
> > So I think that this is a new feature and as such we should provide it as
> > an option, which may (or may not) become the default.
> >
> > Some concerns I have are that when we go on filtering, we have to have
> > all the things that are zeroed to then get accrued for each hist entry
> > that matches the filter being applied, and now at least a nr_entries
> > field got out of the if (al.filtered) block, i.e. in the end we will
> > have the number of hist entries filtered but continue having the
> > total period for all (filtered or not) hist entries.
> One thing related to this is when the --children option is used. Since
> the total period is added only for a real sample, if the sample is
> filtered but its parents are not, the parents might end up with more
> than 100% overhead.
So when implementing the new option this has to be taken into account,
no problem (haven't really thought about the full implications here).
> > Having it as a separate feature would allow to have both views:
> >
> > 1. the percentages relative to the filtered samples
> > 2. the percentages relative to all (filtered or not) samples
> >
> > Being selectable on the command line and also with a hotkey to provide
> > two columns: %total, %filtered.
>
> Hmm.. do you really want two columns instead of a single column and a
> switch/option? Then the (second) %filtered column will show up only
> if filtering is enabled. Isn't that annoying for dynamic filtering
> (i.e. the '/' key in the TUI)?
Hey, I'm not the one to decide this :-)
There _are_ two choices for how the percentage gets computed. The hard
part here is to decide the default, but showing one, the other, or both
should be left to the user, even if after one or two keystrokes :)
> > So we would have new field: hists->stats.total_filtered_period, and
> > hists->stats.nr_filtered_entries, for this, etc.
> >
> > What do you think?
>
> I'm fine with it if we decide to support two columns at the same time.
See above, the difficult part is to figure out what should appear by
default.
- Arnaldo
On Wed, Jan 08, 2014 at 05:46:15PM +0900, Namhyung Kim wrote:
SNIP
> goto out;
> }
>
> - if (al->map->groups == &iter->machine->kmaps) {
> - if (machine__is_host(iter->machine)) {
> + if (al->map->groups == &al->machine->kmaps) {
> + if (machine__is_host(al->machine)) {
> al->cpumode = PERF_RECORD_MISC_KERNEL;
> al->level = 'k';
> } else {
> @@ -417,7 +440,7 @@ iter_next_cumulative_entry(struct hist_entry_iter *iter,
> al->level = 'g';
> }
> } else {
> - if (machine__is_host(iter->machine)) {
> + if (machine__is_host(al->machine)) {
> al->cpumode = PERF_RECORD_MISC_USER;
> al->level = '.';
> } else if (perf_guest) {
> @@ -440,7 +463,29 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
> {
> struct perf_evsel *evsel = iter->evsel;
> struct perf_sample *sample = iter->sample;
> + struct hist_entry **he_cache = iter->priv;
> struct hist_entry *he;
> + struct hist_entry he_tmp = {
> + .cpu = al->cpu,
> + .thread = al->thread,
> + .comm = thread__comm(al->thread),
> + .ip = al->addr,
> + .ms = {
> + .map = al->map,
> + .sym = al->sym,
> + },
> + .parent = iter->parent,
> + };
> + int i;
> +
> + /*
> + * Check if there's duplicate entries in the callchain.
> + * It's possible that it has cycles or recursive calls.
> + */
> + for (i = 0; i < iter->curr; i++) {
> + if (hist_entry__cmp(he_cache[i], &he_tmp) == 0)
we need here:
iter->he = he_cache[i];
> + return 0;
> + }
otherwise iter->he and al are not in sync and
hist_entry__inc_addr_samples fails in hist_iter_cb
jirka
On Wed, Jan 08, 2014 at 05:46:15PM +0900, Namhyung Kim wrote:
> It is possible that a callchain has cycles or recursive calls. In that
> case it'll end up having entries with more than 100% overhead in the
> output. In order to prevent such entries, cache each callchain node
> and skip it if the same entry was already cumulated.
>
> Cc: Arun Sharma <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Signed-off-by: Namhyung Kim <[email protected]>
> ---
> tools/perf/builtin-report.c | 54 ++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 51 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index 06330736485b..087d01cac1aa 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -363,7 +363,27 @@ static int
> iter_prepare_cumulative_entry(struct hist_entry_iter *iter __maybe_unused,
> struct addr_location *al __maybe_unused)
> {
> + struct callchain_cursor_node *node;
> + struct hist_entry **he_cache;
> +
> callchain_cursor_commit(&callchain_cursor);
> +
> + /*
> + * This is for detecting cycles or recursions so that they're
> + * cumulated only one time to prevent entries more than 100%
> + * overhead.
> + */
> + he_cache = malloc(sizeof(*he_cache) * (PERF_MAX_STACK_DEPTH + 1));
> + if (he_cache == NULL)
> + return -ENOMEM;
> +
> + iter->priv = he_cache;
> + iter->curr = 0;
> +
> + node = callchain_cursor_current(&callchain_cursor);
> + if (node == NULL)
> + return 0;
some leftover? looks like nop..
jirka
> +
> return 0;
> }
>
SNIP
On Wed, Jan 08, 2014 at 05:46:28PM +0900, Namhyung Kim wrote:
> Now the hist_entry_iter code will be shared with the perf top code base.
> So move it to util/hist.c and do some necessary cleanups and renames.
factoring code from this very patchset.. looks like it could be easily
squashed with:
perf tools: Introduce struct hist_entry_iter
jirka
>
> Cc: Arun Sharma <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Signed-off-by: Namhyung Kim <[email protected]>
> ---
> tools/perf/builtin-report.c | 468 +-------------------------------------------
> tools/perf/util/hist.c | 441 +++++++++++++++++++++++++++++++++++++++++
> tools/perf/util/hist.h | 30 +++
> 3 files changed, 477 insertions(+), 462 deletions(-)
>
SNIP
> - err = iter->prepare_entry(iter, al);
> - if (err)
> - goto out;
> -
> - err = iter->add_single_entry(iter, al);
> - if (err)
> - goto out;
> -
> - while (iter->next_entry(iter, al)) {
> - err = iter->add_next_entry(iter, al);
> - if (err)
> - break;
> - }
> -
> -out:
> - err2 = iter->finish_entry(iter, al);
> - if (!err)
> - err = err2;
> -
> - return err;
> -}
> -
> static int process_sample_event(struct perf_tool *tool,
> union perf_event *event,
> struct perf_sample *sample,
> @@ -555,23 +103,19 @@ static int process_sample_event(struct perf_tool *tool,
> return 0;
>
> if (sort__mode == SORT_MODE__BRANCH)
> - iter = &branch_iter;
> + iter = &hist_iter_branch;
> else if (rep->mem_mode == 1)
> - iter = &mem_iter;
> + iter = &hist_iter_mem;
> else if (symbol_conf.cumulate_callchain)
> - iter = &cumulative_iter;
> + iter = &hist_iter_cumulative;
> else
> - iter = &normal_iter;
> + iter = &hist_iter_normal;
looks like we could add 'struct hist_entry_iter_ops' and an ops
pointer in 'struct hist_entry_iter' ... just feels better ;-)
jirka
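For the record, a sketch of that split against the callbacks introduced earlier in the series (the ops type and table names are only a guess at what a respin could look like, not the final code):

struct hist_entry_iter;
struct addr_location;

struct hist_entry_iter_ops {
	int (*prepare_entry)(struct hist_entry_iter *, struct addr_location *);
	int (*add_single_entry)(struct hist_entry_iter *, struct addr_location *);
	int (*next_entry)(struct hist_entry_iter *, struct addr_location *);
	int (*add_next_entry)(struct hist_entry_iter *, struct addr_location *);
	int (*finish_entry)(struct hist_entry_iter *, struct addr_location *);
};

struct hist_entry_iter {
	int total;
	int curr;
	/* ... the per-sample state stays as before ... */
	const struct hist_entry_iter_ops *ops;
};

static const struct hist_entry_iter_ops hist_iter_branch = {
	.prepare_entry		= iter_prepare_branch_entry,
	.add_single_entry	= iter_add_single_branch_entry,
	.next_entry		= iter_next_branch_entry,
	.add_next_entry		= iter_add_next_branch_entry,
	.finish_entry		= iter_finish_branch_entry,
};

Call sites then become iter->ops->add_single_entry(iter, al), the const tables can be shared, and the mutable per-sample fields can live in a stack-local iterator instead of being written into a static one.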
On Wed, Jan 08, 2014 at 05:46:30PM +0900, Namhyung Kim wrote:
> Reuse the hist_entry_iter__add() function to share similar code with
> perf report. Note that it needs to be called with hists.lock, so tweak
> some internal functions not to deadlock or hold the lock too long.
>
> Signed-off-by: Namhyung Kim <[email protected]>
> ---
> tools/perf/builtin-top.c | 75 ++++++++++++++++++++++++------------------------
> 1 file changed, 37 insertions(+), 38 deletions(-)
>
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index f0f55e6030cd..cf330c66bed7 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -186,9 +186,6 @@ static void perf_top__record_precise_ip(struct perf_top *top,
> sym = he->ms.sym;
> notes = symbol__annotation(sym);
>
> - if (pthread_mutex_trylock(&notes->lock))
> - return;
> -
> ip = he->ms.map->map_ip(he->ms.map, ip);
> err = hist_entry__inc_addr_samples(he, counter, ip);
>
> @@ -201,6 +198,8 @@ static void perf_top__record_precise_ip(struct perf_top *top,
> sym->name);
> sleep(1);
> }
> +
> + pthread_mutex_lock(&notes->lock);
> }
locking on function exit.. does not look right ;-)
jirka
Hi Jiri,
On Sat, 11 Jan 2014 17:02:54 +0100, Jiri Olsa wrote:
> On Wed, Jan 08, 2014 at 05:46:15PM +0900, Namhyung Kim wrote:
>> It is possible that a callchain has cycles or recursive calls. In that
>> case it'll end up having entries with more than 100% overhead in the
>> output. In order to prevent such entries, cache each callchain node
>> and skip it if the same entry was already cumulated.
[SNIP]
>> + node = callchain_cursor_current(&callchain_cursor);
>> + if (node == NULL)
>> + return 0;
>
> some leftover? looks like nop..
Ah, right. I'll get rid of it.
Thanks,
Namhyung
On Sat, 11 Jan 2014 17:24:08 +0100, Jiri Olsa wrote:
> On Wed, Jan 08, 2014 at 05:46:28PM +0900, Namhyung Kim wrote:
>> Now the hist_entry_iter code will be shared with the perf top code base.
>> So move it to util/hist.c and do some necessary cleanups and renames.
>
> factoring code from this very patchset.. looks like it could be easily
> squashed with:
> perf tools: Introduce struct hist_entry_iter
OK, I'll squash it. It was a result of extending it to support perf top,
but yes, it'd be better if it's in separate code from the beginning.
[SNIP]
>> @@ -555,23 +103,19 @@ static int process_sample_event(struct perf_tool *tool,
>> return 0;
>>
>> if (sort__mode == SORT_MODE__BRANCH)
>> - iter = &branch_iter;
>> + iter = &hist_iter_branch;
>> else if (rep->mem_mode == 1)
>> - iter = &mem_iter;
>> + iter = &hist_iter_mem;
>> else if (symbol_conf.cumulate_callchain)
>> - iter = &cumulative_iter;
>> + iter = &hist_iter_cumulative;
>> else
>> - iter = &normal_iter;
>> + iter = &hist_iter_normal;
>
> looks like we could add 'struct hist_entry_iter_ops' and an ops
> pointer in 'struct hist_entry_iter' ... just feels better ;-)
Yes, it looks better - I'll change it too.
Thanks for your review!
Namhyung
On Sat, 11 Jan 2014 17:35:29 +0100, Jiri Olsa wrote:
> On Wed, Jan 08, 2014 at 05:46:30PM +0900, Namhyung Kim wrote:
>> Reuse the hist_entry_iter__add() function to share similar code with
>> perf report. Note that it needs to be called with hists.lock, so tweak
>> some internal functions not to deadlock or hold the lock too long.
>>
>> Signed-off-by: Namhyung Kim <[email protected]>
>> ---
>> tools/perf/builtin-top.c | 75 ++++++++++++++++++++++++------------------------
>> 1 file changed, 37 insertions(+), 38 deletions(-)
>>
>> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
>> index f0f55e6030cd..cf330c66bed7 100644
>> --- a/tools/perf/builtin-top.c
>> +++ b/tools/perf/builtin-top.c
>> @@ -186,9 +186,6 @@ static void perf_top__record_precise_ip(struct perf_top *top,
>> sym = he->ms.sym;
>> notes = symbol__annotation(sym);
>>
>> - if (pthread_mutex_trylock(&notes->lock))
>> - return;
>> -
>> ip = he->ms.map->map_ip(he->ms.map, ip);
>> err = hist_entry__inc_addr_samples(he, counter, ip);
>>
>> @@ -201,6 +198,8 @@ static void perf_top__record_precise_ip(struct perf_top *top,
>> sym->name);
>> sleep(1);
>> }
>> +
>> + pthread_mutex_lock(&notes->lock);
>> }
>
> locking on function exit.. does not look right ;-)
Yes, it looks weird.. but it's because of the change in locking. After
I changed perf top to use the hist_entry_iter, it needed to protect
the whole hist_entry_iter__add() with hists->lock, so
perf_top__record_precise_ip() is now called with the lock acquired.
I might remove the lock in the function completely, but that seemed
bad since it would then go to sleep while holding the lock. So I added
the (reverse) unlock/lock dance to avoid it.
So yes, I think I need to add a comment to explain this. :)
Thanks,
Namhyung
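The shape of that dance, as a generic standalone sketch (not the perf top code itself; just the pattern of a function that is entered and left with the lock held but must not sleep under it):

#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Called with 'lock' held. It may sleep, so it drops the lock first
 * and re-acquires it before returning - the caller sees no change in
 * the lock state (the "reverse" unlock/lock dance).
 */
static void do_slow_work(void)
{
	pthread_mutex_unlock(&lock);

	sleep(1);	/* stands in for the sleep(1) in the error path */

	pthread_mutex_lock(&lock);
}

int main(void)
{
	pthread_mutex_lock(&lock);
	do_slow_work();		/* lock is still held afterwards */
	pthread_mutex_unlock(&lock);
	return 0;
}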
On Mon, 13 Jan 2014 17:55:46 +0900, Namhyung Kim wrote:
> On Sat, 11 Jan 2014 17:35:29 +0100, Jiri Olsa wrote:
>> [SNIP]
>>
>> locking on function exit.. does not look right ;-)
>
> Yes, it looks weird.. but it's because of the change in locking. After
> I changed perf top to use the hist_entry_iter, it needed to protect
> the whole hist_entry_iter__add() with hists->lock, so
> perf_top__record_precise_ip() is now called with the lock acquired.
Argh, it was a different lock.. But it seems we still need the dance
anyway. ;-)
Thanks,
Namhyung
Hi Jiri,
Thanks for reminding me. :)
On Thu, 9 Jan 2014 19:06:28 +0100, Jiri Olsa wrote:
> On Wed, Jan 08, 2014 at 05:46:15PM +0900, Namhyung Kim wrote:
[SNIP]
>> + /*
>> + * Check if there's duplicate entries in the callchain.
>> + * It's possible that it has cycles or recursive calls.
>> + */
>> + for (i = 0; i < iter->curr; i++) {
>> + if (hist_entry__cmp(he_cache[i], &he_tmp) == 0)
>
> we need here:
> iter->he = he_cache[i];
>
>> + return 0;
>> + }
>
> otherwise iter->he and al are not in sync and
> hist_entry__inc_addr_samples fails in hist_iter_cb
Right. But the point of the he_cache is to skip duplicate entries. So
I'd like to change it by setting it to NULL and checking it before the
callback function:
while (iter->next_entry(iter, al)) {
err = iter->add_next_entry(iter, al);
if (err)
break;
if (iter->he && iter->add_entry_cb) {
err = iter->add_entry_cb(iter, al, ...);
if (err)
break;
}
}
What do you think?
Thanks,
Namhyung
Hi Arnaldo,
On Thu, 9 Jan 2014 11:37:24 -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Jan 09, 2014 at 09:57:35PM +0900, Namhyung Kim wrote:
>> 2014-01-08 (Wed), 15:59 -0300, Arnaldo Carvalho de Melo:
>> > [SNIP]
>> >
>> > I understand the desire for this different mode, looks indeed useful.
>>
>> Yeah, the above is the reason why I wrote this in the first place. And
>> then I thought it should be applied to the command line filter options too.
>
> I don't have a problem with providing a new option, but for those who
> think that when you filter samples based on some criteria, the
> percentages that appear should be relative to the new, filtered
> total_period: that is a change in behaviour, so it needs to be switchable.
>
>> > So I think that this is a new feature and as such we should provide it as
>> > an option, which may (or may not) become the default.
>> >
>> > Some concerns I have are that when we go on filtering, we have to have
>> > all the things that are zeroed to then get accrued for each hist entry
>> > that matches the filter being applied, and now at least a nr_entries
>> > field got out of the if (al.filtered) block, i.e. in the end we will
>> > have the number of hist entries filtered but continue having the
>> > total period for all (filtered or not) hist entries.
>
>> One thing related to this is when the --children option is used. Since
>> the total period is added only for a real sample, if the sample is
>> filtered but its parents are not, the parents might end up with more
>> than 100% overhead.
>
> So when implementing the new option this has to be taken into account,
> no problem (haven't really thought about the full implications here).
>
>> > Having it as a separate feature would allow to have both views:
>> >
>> > 1. the percentages relative to the filtered samples
>> > 2. the percentages relative to all (filtered or not) samples
>> >
>> > Being selectable on the command line and also with a hotkey to provide
>> > two columns: %total, %filtered.
>>
>> Hmm.. do you really want two columns instead of a single column and a
>> switch/option? Then the (second) %filtered column will show up only
>> if filtering is enabled. Isn't that annoying for dynamic filtering
>> (i.e. the '/' key in the TUI)?
>
> Hey, I'm not the one to decide this :-)
>
> There _are_ two choices for how the percentage gets computed. The hard
> part here is to decide the default, but showing one, the other, or both
> should be left to the user, even if after one or two keystrokes :)
So I'd like to make this changed behavior the default, like Jiri said.
But adding a new percentage column will be a headache since it'll
increase the combinations of current behavior - total x sys/user x group
x children - and I'd really want to keep that small..
What about just adding a --percentage <relative|absolute> option and
making "absolute" the default (it can also be changed via a config
option, of course)?
Thanks,
Namhyung
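A possible shape for that, sketched against perf's struct hists and the stats fields Arnaldo proposed earlier (every name below is hypothetical; nothing here is from the patchset):

/* set from --percentage <relative|absolute>; "absolute" by default */
static bool symbol_conf_filter_relative;

static u64 hists__total_period(struct hists *hists)
{
	return symbol_conf_filter_relative ?
	       hists->stats.total_filtered_period :
	       hists->stats.total_period;
}

Every percentage would then be computed as 100.0 * he->stat.period / hists__total_period(hists): "relative" reproduces the current behaviour of dividing by the filtered total, while "absolute" divides by the full total as this patch proposes.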
On Tue, Jan 14, 2014 at 08:55:50AM +0900, Namhyung Kim wrote:
> Hi Jiri,
>
> Thanks for reminding me. :)
>
> On Thu, 9 Jan 2014 19:06:28 +0100, Jiri Olsa wrote:
> > On Wed, Jan 08, 2014 at 05:46:15PM +0900, Namhyung Kim wrote:
>
> [SNIP]
> > [SNIP]
>
> Right. But the point of the he_cache is to skip duplicate entries. So
> I'd like to change it by setting it to NULL and checking it before the
> callback function:
>
> while (iter->next_entry(iter, al)) {
> err = iter->add_next_entry(iter, al);
> if (err)
> break;
>
> if (iter->he && iter->add_entry_cb) {
> err = iter->add_entry_cb(iter, al, ...);
> if (err)
> break;
> }
> }
>
> What do you think?
yep, better ;-) I saw you sent v6.. I'll check that
jirka