2013-10-31 06:56:22

by Namhyung Kim

Subject: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)

Hi,

This is my second attempt to implement cumulative hist period
reporting. This work started from Arun's SORT_INCLUSIVE patch [1],
but I completely rewrote it from scratch.

Please see the first two patches. I refactored the functions that add
hist entries to use struct add_entry_iter. While I converted all of
the functions carefully, it'd be good if anyone could test and confirm
that I didn't mess anything up - especially the branch stack and mem
code.

This patchset basically adds the period of a sample to every node in
its callchain. A hist_entry now has an additional field to keep the
cumulative period, used when the -g cumulative option is given to
perf report.
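
The gist of the change, as a rough sketch (the field name follows the
later patches in this series; this is not the literal code):

  /* each hist entry carries a second stat for accumulated periods */
  struct hist_entry {
          struct he_stat   stat;       /* self overhead, as before */
          struct he_stat  *stat_acc;   /* accumulated overhead (new) */
          /* ... */
  };

For a sample whose callchain is a <- b <- c <- main, the period is
added to ->stat of 'a' only, but to ->stat_acc of all of a, b, c and
main, so callers show up with their accumulated overhead.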

Let me show you an example:

$ cat abc.c
#define barrier() asm volatile("" ::: "memory")

void a(void)
{
	int i;
	for (i = 0; i < 1000000; i++)
		barrier();
}

void b(void)
{
	a();
}

void c(void)
{
	b();
}

int main(void)
{
	c();
	return 0;
}

With this simple program I ran perf record and report:

$ perf record -g -e cycles:u ./abc

$ perf report --stdio
    88.29%  abc  abc                [.] a
            |
            --- a
                b
                c
                main
                __libc_start_main

     9.43%  abc  ld-2.17.so         [.] _dl_relocate_object
            |
            --- _dl_relocate_object
                dl_main
                _dl_sysdep_start

     2.27%  abc  [kernel.kallsyms]  [k] page_fault
            |
            --- page_fault
               |
               |--95.94%-- _dl_sysdep_start
               |           _dl_start_user
               |
                --4.06%-- _start

     0.00%  abc  ld-2.17.so         [.] _start
            |
            --- _start


When the -g cumulative option is given, it'll be shown like this:

$ perf report -g cumulative --stdio

# Overhead  Overhead (Acc)  Command  Shared Object      Symbol
# ........  ..............  .......  .................  .......................
#
     0.00%          88.29%      abc  libc-2.17.so       [.] __libc_start_main
     0.00%          88.29%      abc  abc                [.] main
     0.00%          88.29%      abc  abc                [.] c
     0.00%          88.29%      abc  abc                [.] b
    88.29%          88.29%      abc  abc                [.] a
     0.00%          11.61%      abc  ld-2.17.so         [k] _dl_sysdep_start
     0.00%           9.43%      abc  ld-2.17.so         [.] dl_main
     9.43%           9.43%      abc  ld-2.17.so         [.] _dl_relocate_object
     2.27%           2.27%      abc  [kernel.kallsyms]  [k] page_fault
     0.00%           2.18%      abc  ld-2.17.so         [k] _dl_start_user
     0.00%           0.10%      abc  ld-2.17.so         [.] _start

As you can see, the __libc_start_main -> main -> c -> b -> a callchain
shows up in the output.

I know it has some rough edges and maybe even bugs, but I really
wanted to release it and get reviews. It does not handle event groups
and annotations yet, and it has a bug in the TUI.

You can also get this series on 'perf/cumulate-v2' branch in my tree at:

git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git


Any comments are welcome, thanks.
Namhyung


Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>

[1] https://lkml.org/lkml/2012/3/31/6

Namhyung Kim (14):
perf tools: Consolidate __hists__add_*entry()
perf tools: Introduce struct add_entry_iter
perf hists: Convert hist entry functions to use struct he_stat
perf hists: Add support for accumulated stat of hist entry
perf hists: Check if accumulated when adding a hist entry
perf hists: Accumulate hist entry stat based on the callchain
perf tools: Update cpumode for each cumulative entry
perf report: Cache cumulative callchains
perf hists: Sort hist entries by accumulated period
perf ui/hist: Add support to accumulated hist stat
perf ui/browser: Add support to accumulated hist stat
perf ui/gtk: Add support to accumulated hist stat
perf tools: Apply percent-limit to cumulative percentage
perf report: Add -g cumulative option

tools/perf/Documentation/perf-report.txt | 2 +
tools/perf/builtin-annotate.c | 3 +-
tools/perf/builtin-diff.c | 3 +-
tools/perf/builtin-report.c | 659 ++++++++++++++++++++++++-------
tools/perf/builtin-top.c | 5 +-
tools/perf/tests/hists_link.c | 6 +-
tools/perf/ui/browsers/hists.c | 32 +-
tools/perf/ui/gtk/hists.c | 20 +
tools/perf/ui/hist.c | 41 ++
tools/perf/ui/stdio/hist.c | 5 +
tools/perf/util/callchain.c | 12 +
tools/perf/util/callchain.h | 3 +-
tools/perf/util/hist.c | 142 +++----
tools/perf/util/hist.h | 22 +-
tools/perf/util/sort.h | 1 +
15 files changed, 701 insertions(+), 255 deletions(-)

--
1.7.11.7


2013-10-31 06:56:24

by Namhyung Kim

Subject: [PATCH 01/14] perf tools: Consolidate __hists__add_*entry()

From: Namhyung Kim <[email protected]>

The __hists__add_{branch,mem}_entry() functions did almost the same
thing that __hists__add_entry() does. Consolidate them into one.
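
After this change there is a single entry point that optionally takes
branch and mem info; its signature (as in the hist.c hunk below) is:

  struct hist_entry *__hists__add_entry(struct hists *hists,
                                        struct addr_location *al,
                                        struct symbol *sym_parent,
                                        struct branch_info *bi,
                                        struct mem_info *mi,
                                        u64 period, u64 weight,
                                        u64 transaction);

Plain callers simply pass NULL for both bi and mi.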

Cc: Jiri Olsa <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-annotate.c | 2 +-
tools/perf/builtin-diff.c | 3 +-
tools/perf/builtin-report.c | 16 +++++++---
tools/perf/builtin-top.c | 5 +--
tools/perf/tests/hists_link.c | 6 ++--
tools/perf/util/hist.c | 73 +++++--------------------------------------
tools/perf/util/hist.h | 18 ++---------
7 files changed, 30 insertions(+), 93 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 6c5ae57831f6..4087ab19823c 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -65,7 +65,7 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
return 0;
}

- he = __hists__add_entry(&evsel->hists, al, NULL, 1, 1, 0);
+ he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL, 1, 1, 0);
if (he == NULL)
return -ENOMEM;

diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index b605009e803f..3b67ea2444bd 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -307,7 +307,8 @@ static int hists__add_entry(struct hists *hists,
struct addr_location *al, u64 period,
u64 weight, u64 transaction)
{
- if (__hists__add_entry(hists, al, NULL, period, weight, transaction) != NULL)
+ if (__hists__add_entry(hists, al, NULL, NULL, NULL, period, weight,
+ transaction) != NULL)
return 0;
return -ENOMEM;
}
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 98d3891392e2..a7a8f7769629 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -115,7 +115,8 @@ static int perf_report__add_mem_hist_entry(struct perf_tool *tool,
* and this is indirectly achieved by passing period=weight here
* and the he_stat__add_period() function.
*/
- he = __hists__add_mem_entry(&evsel->hists, al, parent, mi, cost, cost);
+ he = __hists__add_entry(&evsel->hists, al, parent, NULL, mi,
+ cost, cost, 0);
if (!he)
return -ENOMEM;

@@ -200,12 +201,16 @@ static int perf_report__add_branch_hist_entry(struct perf_tool *tool,

err = -ENOMEM;

+ /* overwrite the 'al' to branch-to info */
+ al->map = bi[i].to.map;
+ al->sym = bi[i].to.sym;
+ al->addr = bi[i].to.addr;
/*
* The report shows the percentage of total branches captured
* and not events sampled. Thus we use a pseudo period of 1.
*/
- he = __hists__add_branch_entry(&evsel->hists, al, parent,
- &bi[i], 1, 1);
+ he = __hists__add_entry(&evsel->hists, al, parent, &bi[i], NULL,
+ 1, 1, 0);
if (he) {
struct annotation *notes;
bx = he->branch_info;
@@ -266,8 +271,9 @@ static int perf_evsel__add_hist_entry(struct perf_tool *tool,
return err;
}

- he = __hists__add_entry(&evsel->hists, al, parent, sample->period,
- sample->weight, sample->transaction);
+ he = __hists__add_entry(&evsel->hists, al, parent, NULL, NULL,
+ sample->period, sample->weight,
+ sample->transaction);
if (he == NULL)
return -ENOMEM;

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 04f5bf2d8e10..5c538a43f263 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -246,8 +246,9 @@ static struct hist_entry *perf_evsel__add_hist_entry(struct perf_evsel *evsel,
struct hist_entry *he;

pthread_mutex_lock(&evsel->hists.lock);
- he = __hists__add_entry(&evsel->hists, al, NULL, sample->period,
- sample->weight, sample->transaction);
+ he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL,
+ sample->period, sample->weight,
+ sample->transaction);
pthread_mutex_unlock(&evsel->hists.lock);
if (he == NULL)
return NULL;
diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
index b51abcb2c243..76992aba00a7 100644
--- a/tools/perf/tests/hists_link.c
+++ b/tools/perf/tests/hists_link.c
@@ -223,7 +223,7 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
goto out;

he = __hists__add_entry(&evsel->hists, &al, NULL,
- 1, 1, 0);
+ NULL, NULL, 1, 1, 0);
if (he == NULL)
goto out;

@@ -245,8 +245,8 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
&sample) < 0)
goto out;

- he = __hists__add_entry(&evsel->hists, &al, NULL, 1, 1,
- 0);
+ he = __hists__add_entry(&evsel->hists, &al, NULL,
+ NULL, NULL, 1, 1, 0);
if (he == NULL)
goto out;

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 7e80253074b0..e6f953190443 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -407,71 +407,12 @@ out:
return he;
}

-struct hist_entry *__hists__add_mem_entry(struct hists *hists,
- struct addr_location *al,
- struct symbol *sym_parent,
- struct mem_info *mi,
- u64 period,
- u64 weight)
-{
- struct hist_entry entry = {
- .thread = al->thread,
- .ms = {
- .map = al->map,
- .sym = al->sym,
- },
- .stat = {
- .period = period,
- .weight = weight,
- .nr_events = 1,
- },
- .cpu = al->cpu,
- .ip = al->addr,
- .level = al->level,
- .parent = sym_parent,
- .filtered = symbol__parent_filter(sym_parent),
- .hists = hists,
- .mem_info = mi,
- .branch_info = NULL,
- };
- return add_hist_entry(hists, &entry, al, period, weight);
-}
-
-struct hist_entry *__hists__add_branch_entry(struct hists *hists,
- struct addr_location *al,
- struct symbol *sym_parent,
- struct branch_info *bi,
- u64 period,
- u64 weight)
-{
- struct hist_entry entry = {
- .thread = al->thread,
- .ms = {
- .map = bi->to.map,
- .sym = bi->to.sym,
- },
- .cpu = al->cpu,
- .ip = bi->to.addr,
- .level = al->level,
- .stat = {
- .period = period,
- .nr_events = 1,
- .weight = weight,
- },
- .parent = sym_parent,
- .filtered = symbol__parent_filter(sym_parent),
- .branch_info = bi,
- .hists = hists,
- .mem_info = NULL,
- };
-
- return add_hist_entry(hists, &entry, al, period, weight);
-}
-
struct hist_entry *__hists__add_entry(struct hists *hists,
struct addr_location *al,
- struct symbol *sym_parent, u64 period,
- u64 weight, u64 transaction)
+ struct symbol *sym_parent,
+ struct branch_info *bi,
+ struct mem_info *mi,
+ u64 period, u64 weight, u64 transaction)
{
struct hist_entry entry = {
.thread = al->thread,
@@ -483,15 +424,15 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
.ip = al->addr,
.level = al->level,
.stat = {
- .period = period,
.nr_events = 1,
+ .period = period,
.weight = weight,
},
.parent = sym_parent,
.filtered = symbol__parent_filter(sym_parent),
.hists = hists,
- .branch_info = NULL,
- .mem_info = NULL,
+ .branch_info = bi,
+ .mem_info = mi,
.transaction = transaction,
};

diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 9d2d022cdb79..307f1c742563 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -86,7 +86,9 @@ struct hists {

struct hist_entry *__hists__add_entry(struct hists *self,
struct addr_location *al,
- struct symbol *parent, u64 period,
+ struct symbol *parent,
+ struct branch_info *bi,
+ struct mem_info *mi, u64 period,
u64 weight, u64 transaction);
int64_t hist_entry__cmp(struct hist_entry *left, struct hist_entry *right);
int64_t hist_entry__collapse(struct hist_entry *left, struct hist_entry *right);
@@ -95,20 +97,6 @@ int hist_entry__sort_snprintf(struct hist_entry *self, char *bf, size_t size,
struct hists *hists);
void hist_entry__free(struct hist_entry *);

-struct hist_entry *__hists__add_branch_entry(struct hists *self,
- struct addr_location *al,
- struct symbol *sym_parent,
- struct branch_info *bi,
- u64 period,
- u64 weight);
-
-struct hist_entry *__hists__add_mem_entry(struct hists *self,
- struct addr_location *al,
- struct symbol *sym_parent,
- struct mem_info *mi,
- u64 period,
- u64 weight);
-
void hists__output_resort(struct hists *self);
void hists__collapse_resort(struct hists *self, struct ui_progress *prog);

--
1.7.11.7

2013-10-31 06:56:28

by Namhyung Kim

Subject: [PATCH 14/14] perf report: Add -g cumulative option

From: Namhyung Kim <[email protected]>

The -g cumulative option shows the accumulated overhead (period)
value as well as the self overhead.
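
Usage follows the cover letter, e.g.:

  $ perf record -g -e cycles:u ./abc
  $ perf report -g cumulative --stdio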

Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/Documentation/perf-report.txt | 2 ++
tools/perf/builtin-report.c | 3 +++
tools/perf/util/callchain.c | 12 +++++++++++-
3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 10a279871251..b150bfb734f4 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -130,6 +130,8 @@ OPTIONS
- graph: use a graph tree, displaying absolute overhead rates.
- fractal: like graph, but displays relative rates. Each branch of
the tree is considered as a new profiled object. +
+ - cumulative: accumulate callchain to parent entry so that they can
+ show up in the output.

order can be either:
- callee: callee based call graph.
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 3b626127c8d6..281053b28898 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1030,6 +1030,9 @@ parse_callchain_opt(const struct option *opt, const char *arg, int unset)
else if (!strncmp(tok, "fractal", strlen(arg)))
callchain_param.mode = CHAIN_GRAPH_REL;

+ else if (!strncmp(tok, "cumulative", strlen(arg)))
+ callchain_param.mode = CHAIN_CUMULATIVE;
+
else if (!strncmp(tok, "none", strlen(arg))) {
callchain_param.mode = CHAIN_NONE;
symbol_conf.use_callchain = false;
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index c10052c6e73c..c3c73eb839df 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -150,6 +150,14 @@ sort_chain_graph_rel(struct rb_root *rb_root, struct callchain_root *chain_root,
rb_root->rb_node = chain_root->node.rb_root.rb_node;
}

+static void
+sort_chain_cumulative(struct rb_root *rb_root __maybe_unused,
+ struct callchain_root *chain_root __maybe_unused,
+ u64 min_hit __maybe_unused,
+ struct callchain_param *param __maybe_unused)
+{
+}
+
int callchain_register_param(struct callchain_param *param)
{
switch (param->mode) {
@@ -162,8 +170,10 @@ int callchain_register_param(struct callchain_param *param)
case CHAIN_FLAT:
param->sort = sort_chain_flat;
break;
- case CHAIN_NONE:
case CHAIN_CUMULATIVE:
+ param->sort = sort_chain_cumulative;
+ break;
+ case CHAIN_NONE:
default:
return -1;
}
--
1.7.11.7

2013-10-31 06:56:27

by Namhyung Kim

Subject: [PATCH 08/14] perf report: Cache cumulative callchains

From: Namhyung Kim <[email protected]>

It is possible that a callchain has cycles or recursive calls. In
that case it'll end up having entries with more than 100% overhead in
the output: e.g. for a sample in f whose callchain is f <- g <- f <-
main, f would be accumulated twice. In order to prevent such entries,
cache each callchain node and skip it if the same entry was already
accumulated.

Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-report.c | 49 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 1b152a8b7f51..1a0de9a4a568 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -387,6 +387,11 @@ iter_finish_normal_entry(struct add_entry_iter *iter, struct addr_location *al)
return err;
}

+struct cumulative_cache {
+ struct dso *dso;
+ struct symbol *sym;
+};
+
static int
iter_prepare_cumulative_entry(struct add_entry_iter *iter,
struct machine *machine,
@@ -394,9 +399,31 @@ iter_prepare_cumulative_entry(struct add_entry_iter *iter,
struct addr_location *al __maybe_unused,
struct perf_sample *sample)
{
+ struct callchain_cursor_node *node;
+ struct cumulative_cache *ccache;
+
callchain_cursor_commit(&callchain_cursor);

/*
+ * This is for detecting cycles or recursions so that they're
+ * cumulated only one time to prevent entries more than 100%
+ * overhead.
+ */
+ ccache = malloc(sizeof(*ccache) * PERF_MAX_STACK_DEPTH);
+ if (ccache == NULL)
+ return -ENOMEM;
+
+ node = callchain_cursor_current(&callchain_cursor);
+ if (node == NULL)
+ return 0;
+
+ ccache[0].dso = node->map->dso;
+ ccache[0].sym = node->sym;
+
+ iter->priv = ccache;
+ iter->curr = 1;
+
+ /*
* The first callchain node always contains same information
* as a hist entry itself. So skip it in order to prevent
* double accounting.
@@ -501,8 +528,29 @@ iter_add_next_cumulative_entry(struct add_entry_iter *iter,
{
struct perf_evsel *evsel = iter->evsel;
struct perf_sample *sample = iter->sample;
+ struct cumulative_cache *ccache = iter->priv;
struct hist_entry *he;
int err = 0;
+ int i;
+
+ /*
+ * Check if there's duplicate entries in the callchain.
+ * It's possible that it has cycles or recursive calls.
+ */
+ for (i = 0; i < iter->curr; i++) {
+ if (sort__has_sym) {
+ if (ccache[i].sym == al->sym)
+ return 0;
+ } else {
+ /* Not much we can do - just compare the dso. */
+ if (ccache[i].dso == al->map->dso)
+ return 0;
+ }
+ }
+
+ ccache[i].dso = al->map->dso;
+ ccache[i].sym = al->sym;
+ iter->curr++;

he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
sample->period, sample->weight,
@@ -538,6 +586,7 @@ iter_finish_cumulative_entry(struct add_entry_iter *iter,
evsel->hists.stats.total_period += sample->period;
hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);

+ free(iter->priv);
return 0;
}

--
1.7.11.7

2013-10-31 06:56:54

by Namhyung Kim

Subject: [PATCH 13/14] perf tools: Apply percent-limit to cumulative percentage

From: Namhyung Kim <[email protected]>

If the -g cumulative option is given, entries which don't have any
self overhead also need to show up. So apply the percent limit to the
accumulated overhead percentage in this case.
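
Each UI below repeats the same pattern; in essence (a condensed
sketch, not the literal hunks):

  if (callchain_param.mode == CHAIN_CUMULATIVE)
          percent = h->stat_acc->period * 100.0 /
                    hists->stats.total_period;

  if (percent < min_pcnt)
          continue;       /* filtered out by the percent limit */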

Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/ui/browsers/hists.c | 15 ++++++++++++---
tools/perf/ui/gtk/hists.c | 4 ++++
tools/perf/ui/stdio/hist.c | 4 ++++
3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 2cea6cd9824e..07f500d8df1b 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -834,6 +834,10 @@ static unsigned int hist_browser__refresh(struct ui_browser *browser)
if (h->filtered)
continue;

+ if (callchain_param.mode == CHAIN_CUMULATIVE)
+ percent = h->stat_acc->period * 100.0 /
+ hb->hists->stats.total_period;
+
if (percent < hb->min_pcnt)
continue;

@@ -854,10 +858,11 @@ static struct rb_node *hists__filter_entries(struct rb_node *nd,
float percent = h->stat.period * 100.0 /
hists->stats.total_period;

- if (percent < min_pcnt)
- return NULL;
+ if (callchain_param.mode == CHAIN_CUMULATIVE)
+ percent = h->stat_acc->period * 100.0 /
+ hists->stats.total_period;

- if (!h->filtered)
+ if (!h->filtered && percent >= min_pcnt)
return nd;

nd = rb_next(nd);
@@ -875,6 +880,10 @@ static struct rb_node *hists__filter_prev_entries(struct rb_node *nd,
float percent = h->stat.period * 100.0 /
hists->stats.total_period;

+ if (callchain_param.mode == CHAIN_CUMULATIVE)
+ percent = h->stat_acc->period * 100.0 /
+ hists->stats.total_period;
+
if (!h->filtered && percent >= min_pcnt)
return nd;

diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 70ed0d5e1b94..b93302840cce 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -302,6 +302,10 @@ static void perf_gtk__show_hists(GtkWidget *window, struct hists *hists,
if (h->filtered)
continue;

+ if (callchain_param.mode == CHAIN_CUMULATIVE)
+ percent = h->stat_acc->period * 100.0 /
+ hists->stats.total_period;
+
if (percent < min_pcnt)
continue;

diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 302f08afd6a2..49db435f0cc2 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -496,6 +496,10 @@ print_entries:
if (h->filtered)
continue;

+ if (callchain_param.mode == CHAIN_CUMULATIVE)
+ percent = h->stat_acc->period * 100.0 /
+ hists->stats.total_period;
+
if (percent < min_pcnt)
continue;

--
1.7.11.7

2013-10-31 06:56:26

by Namhyung Kim

Subject: [PATCH 07/14] perf tools: Update cpumode for each cumulative entry

From: Namhyung Kim <[email protected]>

The cpumode and level in struct addr_location were set for the sample
itself but not updated as cumulative callchain entries were added.
This led to non-matching symbol and cpumode in the output.

Update them accordingly based on whether the map is part of the
kernel or not.

Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-report.c | 25 ++++++++++++++++++++++++-
1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 92cbd5cd1ab1..1b152a8b7f51 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -80,6 +80,7 @@ struct add_entry_iter {
struct perf_report *rep;
struct perf_evsel *evsel;
struct perf_sample *sample;
+ struct machine *machine;
struct hist_entry *he;
struct symbol *parent;
void *priv;
@@ -388,7 +389,7 @@ iter_finish_normal_entry(struct add_entry_iter *iter, struct addr_location *al)

static int
iter_prepare_cumulative_entry(struct add_entry_iter *iter,
- struct machine *machine __maybe_unused,
+ struct machine *machine,
struct perf_evsel *evsel,
struct addr_location *al __maybe_unused,
struct perf_sample *sample)
@@ -404,6 +405,7 @@ iter_prepare_cumulative_entry(struct add_entry_iter *iter,

iter->evsel = evsel;
iter->sample = sample;
+ iter->machine = machine;
return 0;
}

@@ -468,6 +470,27 @@ iter_next_cumulative_entry(struct add_entry_iter *iter __maybe_unused,
if (al->sym == NULL)
return 0;

+ if (al->map->groups == &iter->machine->kmaps) {
+ if (machine__is_host(iter->machine)) {
+ al->cpumode = PERF_RECORD_MISC_KERNEL;
+ al->level = 'k';
+ } else {
+ al->cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
+ al->level = 'g';
+ }
+ } else {
+ if (machine__is_host(iter->machine)) {
+ al->cpumode = PERF_RECORD_MISC_USER;
+ al->level = '.';
+ } else if (perf_guest) {
+ al->cpumode = PERF_RECORD_MISC_GUEST_USER;
+ al->level = 'u';
+ } else {
+ al->cpumode = PERF_RECORD_MISC_HYPERVISOR;
+ al->level = 'H';
+ }
+ }
+
callchain_cursor_advance(&callchain_cursor);
return 1;
}
--
1.7.11.7

2013-10-31 06:57:09

by Namhyung Kim

Subject: [PATCH 12/14] perf ui/gtk: Add support to accumulated hist stat

From: Namhyung Kim <[email protected]>

Print accumulated stat of a hist entry if requested.

Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/ui/gtk/hists.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 2ca66cc1160f..70ed0d5e1b94 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -98,11 +98,25 @@ static int perf_gtk__hpp_color_##_type(struct perf_hpp_fmt *fmt __maybe_unused,
return __hpp__color_fmt(hpp, he, he_get_##_field); \
}

+#define __HPP_COLOR_ACC_PERCENT_FN(_type, _field) \
+static u64 he_get_acc_##_field(struct hist_entry *he) \
+{ \
+ return he->stat_acc->_field; \
+} \
+ \
+static int perf_gtk__hpp_color_##_type(struct perf_hpp_fmt *fmt __maybe_unused, \
+ struct perf_hpp *hpp, \
+ struct hist_entry *he) \
+{ \
+ return __hpp__color_fmt(hpp, he, he_get_acc_##_field); \
+}
+
__HPP_COLOR_PERCENT_FN(overhead, period)
__HPP_COLOR_PERCENT_FN(overhead_sys, period_sys)
__HPP_COLOR_PERCENT_FN(overhead_us, period_us)
__HPP_COLOR_PERCENT_FN(overhead_guest_sys, period_guest_sys)
__HPP_COLOR_PERCENT_FN(overhead_guest_us, period_guest_us)
+__HPP_COLOR_ACC_PERCENT_FN(overhead_acc, period)

#undef __HPP_COLOR_PERCENT_FN

@@ -121,6 +135,8 @@ void perf_gtk__init_hpp(void)
perf_gtk__hpp_color_overhead_guest_sys;
perf_hpp__format[PERF_HPP__OVERHEAD_GUEST_US].color =
perf_gtk__hpp_color_overhead_guest_us;
+ perf_hpp__format[PERF_HPP__OVERHEAD_ACC].color =
+ perf_gtk__hpp_color_overhead_acc;
}

static void callchain_list__sym_name(struct callchain_list *cl,
--
1.7.11.7

2013-10-31 06:57:42

by Namhyung Kim

Subject: [PATCH 09/14] perf hists: Sort hist entries by accumulated period

From: Namhyung Kim <[email protected]>

When callchain accumulation is requested, we need to sort the entries
by the accumulated period value. When the accumulated periods of two
entries are the same (i.e. a single-path callchain), put the caller
above the callee, since accumulation tends to put callers in a higher
position for an obvious reason. For example, in the abc program from
the cover letter, main, c, b and a all end up with the same 88.29%
accumulated period, and this rule keeps main on top.

Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-report.c | 13 +++++++++++++
tools/perf/util/hist.c | 12 ++++++++++++
2 files changed, 25 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 1a0de9a4a568..3b626127c8d6 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -349,6 +349,13 @@ iter_add_single_normal_entry(struct add_entry_iter *iter, struct addr_location *
if (he == NULL)
return -ENOMEM;

+ /*
+ * This is for putting parents upward during output resort iff
+ * only a child gets sampled. See hist_entry__sort_on_period().
+ */
+ he->callchain->max_depth = callchain_cursor.nr - callchain_cursor.pos;
+ he->callchain->max_depth += PERF_MAX_STACK_DEPTH + 1;
+
iter->he = he;
return 0;
}
@@ -559,6 +566,12 @@ iter_add_next_cumulative_entry(struct add_entry_iter *iter,
return -ENOMEM;

/*
+ * This is for putting parents upward during output resort iff
+ * only a child gets sampled. See hist_entry__sort_on_period().
+ */
+ he->callchain->max_depth = callchain_cursor.nr - callchain_cursor.pos;
+
+ /*
* Only in the TUI browser we are doing integrated annotation,
* so we don't allocated the extra space needed because the stdio
* code will not use it.
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 229329a20ba5..c1d5d6b43bba 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -623,6 +623,18 @@ static int hist_entry__sort_on_period(struct hist_entry *a,
struct hist_entry *pair;
u64 *periods_a, *periods_b;

+ if (callchain_param.mode == CHAIN_CUMULATIVE) {
+ /*
+ * Put caller above callee when they have equal period.
+ */
+ if (a->stat_acc->period != b->stat_acc->period)
+ return a->stat_acc->period > b->stat_acc->period ? 1 : -1;
+
+ if (a->callchain->max_depth != b->callchain->max_depth)
+ return a->callchain->max_depth < b->callchain->max_depth ?
+ 1 : -1;
+ }
+
ret = period_cmp(a->stat.period, b->stat.period);
if (ret || !symbol_conf.event_group)
return ret;
--
1.7.11.7

2013-10-31 06:57:43

by Namhyung Kim

Subject: [PATCH 05/14] perf hists: Check if accumulated when adding a hist entry

From: Namhyung Kim <[email protected]>

To support callchain accumulation, @entry needs to be recognized as
accumulated or not when add_hist_entry() is called. The period of an
accumulated entry should be added to ->stat_acc but not to ->stat.
Add a @sample_self arg for that.
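
So the intended call sites look roughly like this (a sketch; the
sample_self=false caller is only added in the next patch):

  /* the sampled entry itself: counts as self and accumulated */
  he = __hists__add_entry(hists, al, parent, NULL, NULL,
                          period, weight, transaction, true);

  /* an entry synthesized from a callchain parent: accumulated only */
  he = __hists__add_entry(hists, al, parent, NULL, NULL,
                          period, weight, transaction, false);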

Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-annotate.c | 3 ++-
tools/perf/builtin-diff.c | 2 +-
tools/perf/builtin-report.c | 6 +++---
tools/perf/builtin-top.c | 2 +-
tools/perf/tests/hists_link.c | 4 ++--
tools/perf/util/hist.c | 23 ++++++++++++++---------
tools/perf/util/hist.h | 3 ++-
7 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 4087ab19823c..dcc7706e2136 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -65,7 +65,8 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
return 0;
}

- he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL, 1, 1, 0);
+ he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL, 1, 1, 0,
+ true);
if (he == NULL)
return -ENOMEM;

diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 3b67ea2444bd..dd2638001372 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -308,7 +308,7 @@ static int hists__add_entry(struct hists *hists,
u64 weight, u64 transaction)
{
if (__hists__add_entry(hists, al, NULL, NULL, NULL, period, weight,
- transaction) != NULL)
+ transaction, true) != NULL)
return 0;
return -ENOMEM;
}
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index f18cd43687d9..d171f4d18b67 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -152,7 +152,7 @@ iter_add_single_mem_entry(struct add_entry_iter *iter, struct addr_location *al)
* and the he_stat__add_period() function.
*/
he = __hists__add_entry(&iter->evsel->hists, al, iter->parent, NULL, mi,
- cost, cost, 0);
+ cost, cost, 0, true);
if (!he)
return -ENOMEM;

@@ -280,7 +280,7 @@ iter_add_next_branch_entry(struct add_entry_iter *iter, struct addr_location *al
* and not events sampled. Thus we use a pseudo period of 1.
*/
he = __hists__add_entry(&evsel->hists, al, iter->parent, &bi[i], NULL,
- 1, 1, 0);
+ 1, 1, 0, true);
if (he == NULL)
return -ENOMEM;

@@ -344,7 +344,7 @@ iter_add_single_normal_entry(struct add_entry_iter *iter, struct addr_location *

he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
sample->period, sample->weight,
- sample->transaction);
+ sample->transaction, true);
if (he == NULL)
return -ENOMEM;

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 5c538a43f263..bd4cb3800943 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -248,7 +248,7 @@ static struct hist_entry *perf_evsel__add_hist_entry(struct perf_evsel *evsel,
pthread_mutex_lock(&evsel->hists.lock);
he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL,
sample->period, sample->weight,
- sample->transaction);
+ sample->transaction, true);
pthread_mutex_unlock(&evsel->hists.lock);
if (he == NULL)
return NULL;
diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
index 76992aba00a7..4ea3000e7663 100644
--- a/tools/perf/tests/hists_link.c
+++ b/tools/perf/tests/hists_link.c
@@ -223,7 +223,7 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
goto out;

he = __hists__add_entry(&evsel->hists, &al, NULL,
- NULL, NULL, 1, 1, 0);
+ NULL, NULL, 1, 1, 0, true);
if (he == NULL)
goto out;

@@ -246,7 +246,7 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
goto out;

he = __hists__add_entry(&evsel->hists, &al, NULL,
- NULL, NULL, 1, 1, 0);
+ NULL, NULL, 1, 1, 0, true);
if (he == NULL)
goto out;

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index c38655dedad3..229329a20ba5 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -279,7 +279,8 @@ void hists__decay_entries(struct hists *hists, bool zap_user, bool zap_kernel)
* histogram, sorted on item, collects periods
*/

-static struct hist_entry *hist_entry__new(struct hist_entry *template)
+static struct hist_entry *hist_entry__new(struct hist_entry *template,
+ bool sample_self)
{
size_t callchain_size = symbol_conf.use_callchain ? sizeof(struct callchain_root) : 0;
struct hist_entry *he = zalloc(sizeof(*he) + callchain_size);
@@ -294,6 +295,8 @@ static struct hist_entry *hist_entry__new(struct hist_entry *template)
return NULL;
}
memcpy(he->stat_acc, &he->stat, sizeof(he->stat));
+ if (!sample_self)
+ memset(&he->stat, 0, sizeof(he->stat));
}

if (he->ms.map)
@@ -356,8 +359,7 @@ static u8 symbol__parent_filter(const struct symbol *parent)
static struct hist_entry *add_hist_entry(struct hists *hists,
struct hist_entry *entry,
struct addr_location *al,
- u64 period,
- u64 weight)
+ u64 period, u64 weight, bool sample_self)
{
struct rb_node **p;
struct rb_node *parent = NULL;
@@ -379,7 +381,8 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
cmp = hist_entry__cmp(he, entry);

if (!cmp) {
- he_stat__add_period(&he->stat, period, weight);
+ if (sample_self)
+ he_stat__add_period(&he->stat, period, weight);
if (callchain_param.mode == CHAIN_CUMULATIVE)
he_stat__add_period(he->stat_acc, period, weight);

@@ -409,7 +412,7 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
p = &(*p)->rb_right;
}

- he = hist_entry__new(entry);
+ he = hist_entry__new(entry, sample_self);
if (!he)
return NULL;

@@ -417,7 +420,8 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
rb_link_node(&he->rb_node_in, parent, p);
rb_insert_color(&he->rb_node_in, hists->entries_in);
out:
- hist_entry__add_cpumode_period(&he->stat, al->cpumode, period);
+ if (sample_self)
+ hist_entry__add_cpumode_period(&he->stat, al->cpumode, period);
if (callchain_param.mode == CHAIN_CUMULATIVE)
hist_entry__add_cpumode_period(he->stat_acc, al->cpumode, period);
return he;
@@ -428,7 +432,8 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
struct symbol *sym_parent,
struct branch_info *bi,
struct mem_info *mi,
- u64 period, u64 weight, u64 transaction)
+ u64 period, u64 weight, u64 transaction,
+ bool sample_self)
{
struct hist_entry entry = {
.thread = al->thread,
@@ -452,7 +457,7 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
.transaction = transaction,
};

- return add_hist_entry(hists, &entry, al, period, weight);
+ return add_hist_entry(hists, &entry, al, period, weight, sample_self);
}

int64_t
@@ -876,7 +881,7 @@ static struct hist_entry *hists__add_dummy_entry(struct hists *hists,
p = &(*p)->rb_right;
}

- he = hist_entry__new(pair);
+ he = hist_entry__new(pair, true);
if (he) {
memset(&he->stat, 0, sizeof(he->stat));
he->hists = hists;
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 307f1c742563..3a31f5e72cf7 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -89,7 +89,8 @@ struct hist_entry *__hists__add_entry(struct hists *self,
struct symbol *parent,
struct branch_info *bi,
struct mem_info *mi, u64 period,
- u64 weight, u64 transaction);
+ u64 weight, u64 transaction,
+ bool sample_self);
int64_t hist_entry__cmp(struct hist_entry *left, struct hist_entry *right);
int64_t hist_entry__collapse(struct hist_entry *left, struct hist_entry *right);
int hist_entry__transaction_len(void);
--
1.7.11.7

2013-10-31 06:57:41

by Namhyung Kim

Subject: [PATCH 10/14] perf ui/hist: Add support to accumulated hist stat

From: Namhyung Kim <[email protected]>

Print accumulated stat of a hist entry if requested.

Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/ui/hist.c | 41 +++++++++++++++++++++++++++++++++++++++++
tools/perf/util/hist.h | 1 +
2 files changed, 42 insertions(+)

diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index 0a193281eba8..e32a954c5c97 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -129,6 +129,28 @@ static int hpp__entry_##_type(struct perf_hpp_fmt *_fmt __maybe_unused, \
scnprintf, true); \
}

+#define __HPP_COLOR_ACC_PERCENT_FN(_type, _field) \
+static u64 he_get_acc_##_field(struct hist_entry *he) \
+{ \
+ return he->stat_acc->_field; \
+} \
+ \
+static int hpp__color_acc_##_type(struct perf_hpp_fmt *fmt __maybe_unused, \
+ struct perf_hpp *hpp, struct hist_entry *he) \
+{ \
+ return __hpp__fmt(hpp, he, he_get_acc_##_field, " %12.2f%%", \
+ (hpp_snprint_fn)percent_color_snprintf, true); \
+}
+
+#define __HPP_ENTRY_ACC_PERCENT_FN(_type, _field) \
+static int hpp__entry_acc_##_type(struct perf_hpp_fmt *_fmt __maybe_unused, \
+ struct perf_hpp *hpp, struct hist_entry *he) \
+{ \
+ const char *fmt = symbol_conf.field_sep ? " %.2f" : " %12.2f%%"; \
+ return __hpp__fmt(hpp, he, he_get_acc_##_field, fmt, \
+ scnprintf, true); \
+}
+
#define __HPP_ENTRY_RAW_FN(_type, _field) \
static u64 he_get_raw_##_field(struct hist_entry *he) \
{ \
@@ -148,6 +170,12 @@ __HPP_WIDTH_FN(_type, _min_width, _unit_width) \
__HPP_COLOR_PERCENT_FN(_type, _field) \
__HPP_ENTRY_PERCENT_FN(_type, _field)

+#define HPP_PERCENT_ACC_FNS(_type, _str, _field, _min_width, _unit_width)\
+__HPP_HEADER_FN(_type, _str, _min_width, _unit_width) \
+__HPP_WIDTH_FN(_type, _min_width, _unit_width) \
+__HPP_COLOR_ACC_PERCENT_FN(_type, _field) \
+__HPP_ENTRY_ACC_PERCENT_FN(_type, _field)
+
#define HPP_RAW_FNS(_type, _str, _field, _min_width, _unit_width) \
__HPP_HEADER_FN(_type, _str, _min_width, _unit_width) \
__HPP_WIDTH_FN(_type, _min_width, _unit_width) \
@@ -159,6 +187,7 @@ HPP_PERCENT_FNS(overhead_sys, "sys", period_sys, 8, 8)
HPP_PERCENT_FNS(overhead_us, "usr", period_us, 8, 8)
HPP_PERCENT_FNS(overhead_guest_sys, "guest sys", period_guest_sys, 9, 8)
HPP_PERCENT_FNS(overhead_guest_us, "guest usr", period_guest_us, 9, 8)
+HPP_PERCENT_ACC_FNS(overhead_acc, "Overhead (Acc)", period, 14, 8)

HPP_RAW_FNS(samples, "Samples", nr_events, 12, 12)
HPP_RAW_FNS(period, "Period", period, 12, 12)
@@ -171,6 +200,14 @@ HPP_RAW_FNS(period, "Period", period, 12, 12)
.entry = hpp__entry_ ## _name \
}

+#define HPP__COLOR_ACC_PRINT_FNS(_name) \
+ { \
+ .header = hpp__header_ ## _name, \
+ .width = hpp__width_ ## _name, \
+ .color = hpp__color_acc_ ## _name, \
+ .entry = hpp__entry_acc_ ## _name \
+ }
+
#define HPP__PRINT_FNS(_name) \
{ \
.header = hpp__header_ ## _name, \
@@ -184,6 +221,7 @@ struct perf_hpp_fmt perf_hpp__format[] = {
HPP__COLOR_PRINT_FNS(overhead_us),
HPP__COLOR_PRINT_FNS(overhead_guest_sys),
HPP__COLOR_PRINT_FNS(overhead_guest_us),
+ HPP__COLOR_ACC_PRINT_FNS(overhead_acc),
HPP__PRINT_FNS(samples),
HPP__PRINT_FNS(period)
};
@@ -208,6 +246,9 @@ void perf_hpp__init(void)
{
perf_hpp__column_enable(PERF_HPP__OVERHEAD);

+ if (callchain_param.mode == CHAIN_CUMULATIVE)
+ perf_hpp__column_enable(PERF_HPP__OVERHEAD_ACC);
+
if (symbol_conf.show_cpu_utilization) {
perf_hpp__column_enable(PERF_HPP__OVERHEAD_SYS);
perf_hpp__column_enable(PERF_HPP__OVERHEAD_US);
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 3a31f5e72cf7..707442fe5c4f 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -160,6 +160,7 @@ enum {
PERF_HPP__OVERHEAD_US,
PERF_HPP__OVERHEAD_GUEST_SYS,
PERF_HPP__OVERHEAD_GUEST_US,
+ PERF_HPP__OVERHEAD_ACC,
PERF_HPP__SAMPLES,
PERF_HPP__PERIOD,

--
1.7.11.7

2013-10-31 06:57:40

by Namhyung Kim

Subject: [PATCH 11/14] perf ui/browser: Add support to accumulated hist stat

From: Namhyung Kim <[email protected]>

Print accumulated stat of a hist entry if requested.

Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/ui/browsers/hists.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 7ef36c360471..2cea6cd9824e 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -693,11 +693,26 @@ hist_browser__hpp_color_##_type(struct perf_hpp_fmt *fmt __maybe_unused,\
return __hpp__color_fmt(hpp, he, __hpp_get_##_field, _cb); \
}

+#define __HPP_COLOR_ACC_PERCENT_FN(_type, _field, _cb) \
+static u64 __hpp_get_acc_##_field(struct hist_entry *he) \
+{ \
+ return he->stat_acc->_field; \
+} \
+ \
+static int \
+hist_browser__hpp_color_##_type(struct perf_hpp_fmt *fmt __maybe_unused,\
+ struct perf_hpp *hpp, \
+ struct hist_entry *he) \
+{ \
+ return __hpp__color_fmt(hpp, he, __hpp_get_acc_##_field, _cb); \
+}
+
__HPP_COLOR_PERCENT_FN(overhead, period, __hpp__color_callchain)
__HPP_COLOR_PERCENT_FN(overhead_sys, period_sys, NULL)
__HPP_COLOR_PERCENT_FN(overhead_us, period_us, NULL)
__HPP_COLOR_PERCENT_FN(overhead_guest_sys, period_guest_sys, NULL)
__HPP_COLOR_PERCENT_FN(overhead_guest_us, period_guest_us, NULL)
+__HPP_COLOR_ACC_PERCENT_FN(overhead_acc, period, NULL)

#undef __HPP_COLOR_PERCENT_FN

@@ -715,6 +730,8 @@ void hist_browser__init_hpp(void)
hist_browser__hpp_color_overhead_guest_sys;
perf_hpp__format[PERF_HPP__OVERHEAD_GUEST_US].color =
hist_browser__hpp_color_overhead_guest_us;
+ perf_hpp__format[PERF_HPP__OVERHEAD_ACC].color =
+ hist_browser__hpp_color_overhead_acc;
}

static int hist_browser__show_entry(struct hist_browser *browser,
--
1.7.11.7

2013-10-31 06:58:39

by Namhyung Kim

Subject: [PATCH 06/14] perf hists: Accumulate hist entry stat based on the callchain

From: Namhyung Kim <[email protected]>

Call __hists__add_entry() for each callchain node to get an
accumulated stat for an entry. However, skip nodes which do not have
symbol info, as they caused subtle problems.

AFAICS the current sort methods cannot distinguish entries with a
NULL dso/sym well, so processing a callchain for an entry that
doesn't have symbol info might add a period to the same entry
multiple times. That ended up with entries having more than 100% of
accumulated period value, which is not good. So just stop processing
when such entries are met.

Introduce new cumulative_iter ops to process them properly.
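
Per sample, the new ops boil down to the following flow (a condensed
sketch of the functions added below):

  callchain_cursor_commit(&callchain_cursor);
  callchain_cursor_advance(&callchain_cursor);  /* skip the self node */

  /* the self entry: sample_self = true */
  __hists__add_entry(..., true);

  while ((node = callchain_cursor_current(&callchain_cursor)) != NULL) {
          /* point al at the node's map/sym/ip; stop if sym is NULL */
          __hists__add_entry(..., false);  /* accumulate only */
          callchain_cursor_advance(&callchain_cursor);
  }

  evsel->hists.stats.total_period += sample->period;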

Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-report.c | 142 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 142 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index d171f4d18b67..92cbd5cd1ab1 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -386,6 +386,138 @@ iter_finish_normal_entry(struct add_entry_iter *iter, struct addr_location *al)
return err;
}

+static int
+iter_prepare_cumulative_entry(struct add_entry_iter *iter,
+ struct machine *machine __maybe_unused,
+ struct perf_evsel *evsel,
+ struct addr_location *al __maybe_unused,
+ struct perf_sample *sample)
+{
+ callchain_cursor_commit(&callchain_cursor);
+
+ /*
+ * The first callchain node always contains same information
+ * as a hist entry itself. So skip it in order to prevent
+ * double accounting.
+ */
+ callchain_cursor_advance(&callchain_cursor);
+
+ iter->evsel = evsel;
+ iter->sample = sample;
+ return 0;
+}
+
+static int
+iter_add_single_cumulative_entry(struct add_entry_iter *iter,
+ struct addr_location *al)
+{
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+ struct hist_entry *he;
+ int err = 0;
+
+ he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
+ sample->period, sample->weight,
+ sample->transaction, true);
+ if (he == NULL)
+ return -ENOMEM;
+
+ /*
+ * This is for putting parents upward during output resort iff
+ * only a child gets sampled. See hist_entry__sort_on_period().
+ */
+ he->callchain->max_depth = PERF_MAX_STACK_DEPTH + 1;
+
+ /*
+ * Only in the TUI browser we are doing integrated annotation,
+ * so we don't allocated the extra space needed because the stdio
+ * code will not use it.
+ */
+ if (he->ms.sym != NULL && use_browser == 1 && sort__has_sym) {
+ struct annotation *notes = symbol__annotation(he->ms.sym);
+
+ assert(evsel != NULL);
+
+ if (notes->src == NULL && symbol__alloc_hist(he->ms.sym) < 0)
+ return -ENOMEM;
+
+ err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+ }
+
+ return err;
+}
+
+static int
+iter_next_cumulative_entry(struct add_entry_iter *iter __maybe_unused,
+ struct addr_location *al)
+{
+ struct callchain_cursor_node *node;
+
+ node = callchain_cursor_current(&callchain_cursor);
+ if (node == NULL)
+ return 0;
+
+ al->map = node->map;
+ al->sym = node->sym;
+ al->addr = node->ip;
+
+ /*
+ * XXX: Adding an entry without symbol info caused subtle
+ * problems. Stop it.
+ */
+ if (al->sym == NULL)
+ return 0;
+
+ callchain_cursor_advance(&callchain_cursor);
+ return 1;
+}
+
+static int
+iter_add_next_cumulative_entry(struct add_entry_iter *iter,
+ struct addr_location *al)
+{
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+ struct hist_entry *he;
+ int err = 0;
+
+ he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
+ sample->period, sample->weight,
+ sample->transaction, false);
+ if (he == NULL)
+ return -ENOMEM;
+
+ /*
+ * Only in the TUI browser we are doing integrated annotation,
+ * so we don't allocated the extra space needed because the stdio
+ * code will not use it.
+ */
+ if (he->ms.sym != NULL && use_browser == 1 && sort__has_sym) {
+ struct annotation *notes = symbol__annotation(he->ms.sym);
+
+ assert(evsel != NULL);
+
+ if (notes->src == NULL && symbol__alloc_hist(he->ms.sym) < 0)
+ return -ENOMEM;
+
+ err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+ }
+ return err;
+}
+
+static int
+iter_finish_cumulative_entry(struct add_entry_iter *iter,
+ struct addr_location *al __maybe_unused)
+{
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+
+ evsel->hists.stats.total_period += sample->period;
+ hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+ return 0;
+}
+
static struct add_entry_iter mem_iter = {
.prepare_entry = iter_prepare_mem_entry,
.add_single_entry = iter_add_single_mem_entry,
@@ -410,6 +542,14 @@ static struct add_entry_iter normal_iter = {
.finish_entry = iter_finish_normal_entry,
};

+static struct add_entry_iter cumulative_iter = {
+ .prepare_entry = iter_prepare_cumulative_entry,
+ .add_single_entry = iter_add_single_cumulative_entry,
+ .next_entry = iter_next_cumulative_entry,
+ .add_next_entry = iter_add_next_cumulative_entry,
+ .finish_entry = iter_finish_cumulative_entry,
+};
+
static int
perf_evsel__add_entry(struct perf_evsel *evsel, struct addr_location *al,
struct perf_sample *sample, struct machine *machine,
@@ -471,6 +611,8 @@ static int process_sample_event(struct perf_tool *tool,
iter = &branch_iter;
else if (rep->mem_mode == 1)
iter = &mem_iter;
+ else if (callchain_param.mode == CHAIN_CUMULATIVE)
+ iter = &cumulative_iter;
else
iter = &normal_iter;

--
1.7.11.7

2013-10-31 06:58:59

by Namhyung Kim

Subject: [PATCH 02/14] perf tools: Introduce struct add_entry_iter

From: Namhyung Kim <[email protected]>

There is some duplicate code when adding hist entries. The variants
differ in that some have branch info or mem info, but they generally
do the same thing. So introduce a new struct add_entry_iter with
callbacks to customize each case in a general way.
The new perf_evsel__add_entry() function will look like:

	iter->prepare_entry();
	iter->add_single_entry();

	while (iter->next_entry())
		iter->add_next_entry();

	iter->finish_entry();

This will help further work like the cumulative callchain patchset.

Cc: Jiri Olsa <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-report.c | 435 +++++++++++++++++++++++++++++---------------
1 file changed, 284 insertions(+), 151 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index a7a8f7769629..f18cd43687d9 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -73,38 +73,74 @@ static int perf_report_config(const char *var, const char *value, void *cb)
return perf_default_config(var, value, cb);
}

-static int perf_report__add_mem_hist_entry(struct perf_tool *tool,
- struct addr_location *al,
- struct perf_sample *sample,
- struct perf_evsel *evsel,
- struct machine *machine,
- union perf_event *event)
-{
- struct perf_report *rep = container_of(tool, struct perf_report, tool);
- struct symbol *parent = NULL;
- u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
- int err = 0;
+struct add_entry_iter {
+ int total;
+ int curr;
+
+ struct perf_report *rep;
+ struct perf_evsel *evsel;
+ struct perf_sample *sample;
struct hist_entry *he;
- struct mem_info *mi, *mx;
- uint64_t cost;
+ struct symbol *parent;
+ void *priv;
+
+ int (*prepare_entry)(struct add_entry_iter *, struct machine *,
+ struct perf_evsel *, struct addr_location *,
+ struct perf_sample *);
+ int (*add_single_entry)(struct add_entry_iter *, struct addr_location *);
+ int (*next_entry)(struct add_entry_iter *, struct addr_location *);
+ int (*add_next_entry)(struct add_entry_iter *, struct addr_location *);
+ int (*finish_entry)(struct add_entry_iter *, struct addr_location *);
+};

- if ((sort__has_parent || symbol_conf.use_callchain) &&
- sample->callchain) {
- err = machine__resolve_callchain(machine, evsel, al->thread,
- sample, &parent, al,
- rep->max_stack);
- if (err)
- return err;
- }
+static int
+iter_next_nop_entry(struct add_entry_iter *iter __maybe_unused,
+ struct addr_location *al __maybe_unused)
+{
+ return 0;
+}
+
+static int
+iter_add_next_nop_entry(struct add_entry_iter *iter __maybe_unused,
+ struct addr_location *al __maybe_unused)
+{
+ return 0;
+}
+
+static int
+iter_prepare_mem_entry(struct add_entry_iter *iter, struct machine *machine,
+ struct perf_evsel *evsel, struct addr_location *al,
+ struct perf_sample *sample)
+{
+ union perf_event *event = iter->priv;
+ struct mem_info *mi;
+ u8 cpumode;
+
+ BUG_ON(event == NULL);
+
+ cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;

mi = machine__resolve_mem(machine, al->thread, sample, cpumode);
- if (!mi)
+ if (mi == NULL)
return -ENOMEM;

- if (rep->hide_unresolved && !al->sym)
+ iter->evsel = evsel;
+ iter->sample = sample;
+ iter->priv = mi;
+ return 0;
+}
+
+static int
+iter_add_single_mem_entry(struct add_entry_iter *iter, struct addr_location *al)
+{
+ u64 cost;
+ struct mem_info *mi = iter->priv;
+ struct hist_entry *he;
+
+ if (iter->rep->hide_unresolved && !al->sym)
return 0;

- cost = sample->weight;
+ cost = iter->sample->weight;
if (!cost)
cost = 1;

@@ -115,11 +151,24 @@ static int perf_report__add_mem_hist_entry(struct perf_tool *tool,
* and this is indirectly achieved by passing period=weight here
* and the he_stat__add_period() function.
*/
- he = __hists__add_entry(&evsel->hists, al, parent, NULL, mi,
+ he = __hists__add_entry(&iter->evsel->hists, al, iter->parent, NULL, mi,
cost, cost, 0);
if (!he)
return -ENOMEM;

+ iter->he = he;
+ return 0;
+}
+
+static int
+iter_finish_mem_entry(struct add_entry_iter *iter, struct addr_location *al)
+{
+ struct perf_evsel *evsel = iter->evsel;
+ struct hist_entry *he = iter->he;
+ struct mem_info *mi = iter->priv;
+ int err = -ENOMEM;
+ u64 cost;
+
/*
* In the TUI browser, we are doing integrated annotation,
* so we don't allocate the extra space needed because the stdio
@@ -138,152 +187,179 @@ static int perf_report__add_mem_hist_entry(struct perf_tool *tool,
goto out;
}

- if (sort__has_sym && he->mem_info->daddr.sym && use_browser > 0) {
+ if (sort__has_sym && mi->daddr.sym && use_browser > 0) {
struct annotation *notes;

- mx = he->mem_info;
-
- notes = symbol__annotation(mx->daddr.sym);
- if (notes->src == NULL && symbol__alloc_hist(mx->daddr.sym) < 0)
+ notes = symbol__annotation(mi->daddr.sym);
+ if (notes->src == NULL && symbol__alloc_hist(mi->daddr.sym) < 0)
goto out;

- err = symbol__inc_addr_samples(mx->daddr.sym,
- mx->daddr.map,
- evsel->idx,
- mx->daddr.al_addr);
+ err = symbol__inc_addr_samples(mi->daddr.sym, mi->daddr.map,
+ evsel->idx, mi->daddr.al_addr);
if (err)
goto out;
}

+ cost = iter->sample->weight;
+ if (!cost)
+ cost = 1;
+
evsel->hists.stats.total_period += cost;
hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
err = 0;

if (symbol_conf.use_callchain) {
- err = callchain_append(he->callchain,
- &callchain_cursor,
- sample->period);
+ err = callchain_append(he->callchain, &callchain_cursor,
+ iter->sample->period);
}
+
out:
return err;
}

-static int perf_report__add_branch_hist_entry(struct perf_tool *tool,
- struct addr_location *al,
- struct perf_sample *sample,
- struct perf_evsel *evsel,
- struct machine *machine)
+static int
+iter_prepare_branch_entry(struct add_entry_iter *iter, struct machine *machine,
+ struct perf_evsel *evsel, struct addr_location *al,
+ struct perf_sample *sample)
{
- struct perf_report *rep = container_of(tool, struct perf_report, tool);
- struct symbol *parent = NULL;
- int err = 0;
- unsigned i;
- struct hist_entry *he;
- struct branch_info *bi, *bx;
-
- if ((sort__has_parent || symbol_conf.use_callchain)
- && sample->callchain) {
- err = machine__resolve_callchain(machine, evsel, al->thread,
- sample, &parent, al,
- rep->max_stack);
- if (err)
- return err;
- }
+ struct branch_info *bi;

bi = machine__resolve_bstack(machine, al->thread,
sample->branch_stack);
if (!bi)
return -ENOMEM;

- for (i = 0; i < sample->branch_stack->nr; i++) {
- if (rep->hide_unresolved && !(bi[i].from.sym && bi[i].to.sym))
- continue;
+ iter->curr = 0;
+ iter->total = sample->branch_stack->nr;

- err = -ENOMEM;
+ iter->evsel = evsel;
+ iter->sample = sample;
+ iter->priv = bi;
+ return 0;
+}

- /* overwrite the 'al' to branch-to info */
- al->map = bi[i].to.map;
- al->sym = bi[i].to.sym;
- al->addr = bi[i].to.addr;
- /*
- * The report shows the percentage of total branches captured
- * and not events sampled. Thus we use a pseudo period of 1.
- */
- he = __hists__add_entry(&evsel->hists, al, parent, &bi[i], NULL,
- 1, 1, 0);
- if (he) {
- struct annotation *notes;
- bx = he->branch_info;
- if (bx->from.sym && use_browser == 1 && sort__has_sym) {
- notes = symbol__annotation(bx->from.sym);
- if (!notes->src
- && symbol__alloc_hist(bx->from.sym) < 0)
- goto out;
-
- err = symbol__inc_addr_samples(bx->from.sym,
- bx->from.map,
- evsel->idx,
- bx->from.al_addr);
- if (err)
- goto out;
- }
+static int
+iter_add_single_branch_entry(struct add_entry_iter *iter __maybe_unused,
+ struct addr_location *al __maybe_unused)
+{
+ return 0;
+}

- if (bx->to.sym && use_browser == 1 && sort__has_sym) {
- notes = symbol__annotation(bx->to.sym);
- if (!notes->src
- && symbol__alloc_hist(bx->to.sym) < 0)
- goto out;
-
- err = symbol__inc_addr_samples(bx->to.sym,
- bx->to.map,
- evsel->idx,
- bx->to.al_addr);
- if (err)
- goto out;
- }
- evsel->hists.stats.total_period += 1;
- hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
- } else
- goto out;
- }
- err = 0;
-out:
- free(bi);
- return err;
+static int
+iter_next_branch_entry(struct add_entry_iter *iter, struct addr_location *al)
+{
+ struct branch_info *bi = iter->priv;
+ int i = iter->curr;
+
+ if (iter->curr >= iter->total)
+ return 0;
+
+ al->map = bi[i].to.map;
+ al->sym = bi[i].to.sym;
+ al->addr = bi[i].to.addr;
+ return 1;
}

-static int perf_evsel__add_hist_entry(struct perf_tool *tool,
- struct perf_evsel *evsel,
- struct addr_location *al,
- struct perf_sample *sample,
- struct machine *machine)
+static int
+iter_add_next_branch_entry(struct add_entry_iter *iter, struct addr_location *al)
{
- struct perf_report *rep = container_of(tool, struct perf_report, tool);
- struct symbol *parent = NULL;
- int err = 0;
+ struct branch_info *bi, *bx;
+ struct annotation *notes;
+ struct perf_evsel *evsel = iter->evsel;
struct hist_entry *he;
+ int i = iter->curr;
+ int err;

- if ((sort__has_parent || symbol_conf.use_callchain) && sample->callchain) {
- err = machine__resolve_callchain(machine, evsel, al->thread,
- sample, &parent, al,
- rep->max_stack);
+ bi = iter->priv;
+
+ if (iter->rep->hide_unresolved && !(bi[i].from.sym && bi[i].to.sym))
+ goto out;
+
+ /*
+ * The report shows the percentage of total branches captured
+ * and not events sampled. Thus we use a pseudo period of 1.
+ */
+ he = __hists__add_entry(&evsel->hists, al, iter->parent, &bi[i], NULL,
+ 1, 1, 0);
+ if (he == NULL)
+ return -ENOMEM;
+
+ bx = he->branch_info;
+ if (bx->from.sym && use_browser == 1 && sort__has_sym) {
+ notes = symbol__annotation(bx->from.sym);
+ if (!notes->src && symbol__alloc_hist(bx->from.sym) < 0)
+ return -ENOMEM;
+
+ err = symbol__inc_addr_samples(bx->from.sym, bx->from.map,
+ evsel->idx, bx->from.al_addr);
+ if (err)
+ return err;
+ }
+
+ if (bx->to.sym && use_browser == 1 && sort__has_sym) {
+ notes = symbol__annotation(bx->to.sym);
+ if (!notes->src && symbol__alloc_hist(bx->to.sym) < 0)
+ return -ENOMEM;
+
+ err = symbol__inc_addr_samples(bx->to.sym, bx->to.map,
+ evsel->idx, bx->to.al_addr);
if (err)
return err;
}
+ evsel->hists.stats.total_period += 1;
+ hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+out:
+ iter->curr++;
+ return 0;
+}
+
+static int
+iter_finish_branch_entry(struct add_entry_iter *iter,
+ struct addr_location *al __maybe_unused)
+{
+ free(iter->priv);
+
+ return iter->curr >= iter->total ? 0 : -1;
+}
+
+static int
+iter_prepare_normal_entry(struct add_entry_iter *iter,
+ struct machine *machine __maybe_unused,
+ struct perf_evsel *evsel,
+ struct addr_location *al __maybe_unused,
+ struct perf_sample *sample)
+{
+ iter->evsel = evsel;
+ iter->sample = sample;
+ return 0;
+}
+
+static int
+iter_add_single_normal_entry(struct add_entry_iter *iter, struct addr_location *al)
+{
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+ struct hist_entry *he;

- he = __hists__add_entry(&evsel->hists, al, parent, NULL, NULL,
+ he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
sample->period, sample->weight,
sample->transaction);
if (he == NULL)
return -ENOMEM;

- if (symbol_conf.use_callchain) {
- err = callchain_append(he->callchain,
- &callchain_cursor,
- sample->period);
- if (err)
- return err;
- }
+ iter->he = he;
+ return 0;
+}
+
+static int
+iter_finish_normal_entry(struct add_entry_iter *iter, struct addr_location *al)
+{
+ int err = 0;
+ struct hist_entry *he = iter->he;
+ struct perf_evsel *evsel = iter->evsel;
+ struct perf_sample *sample = iter->sample;
+
/*
* Only in the TUI browser we are doing integrated annotation,
* so we don't allocated the extra space needed because the stdio
@@ -294,19 +370,79 @@ static int perf_evsel__add_hist_entry(struct perf_tool *tool,

assert(evsel != NULL);

- err = -ENOMEM;
if (notes->src == NULL && symbol__alloc_hist(he->ms.sym) < 0)
- goto out;
+ return -ENOMEM;

err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
}

evsel->hists.stats.total_period += sample->period;
hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
-out:
+
+ if (symbol_conf.use_callchain) {
+ err = callchain_append(he->callchain, &callchain_cursor,
+ sample->period);
+ }
return err;
}

+static struct add_entry_iter mem_iter = {
+ .prepare_entry = iter_prepare_mem_entry,
+ .add_single_entry = iter_add_single_mem_entry,
+ .next_entry = iter_next_nop_entry,
+ .add_next_entry = iter_add_next_nop_entry,
+ .finish_entry = iter_finish_mem_entry,
+};
+
+static struct add_entry_iter branch_iter = {
+ .prepare_entry = iter_prepare_branch_entry,
+ .add_single_entry = iter_add_single_branch_entry,
+ .next_entry = iter_next_branch_entry,
+ .add_next_entry = iter_add_next_branch_entry,
+ .finish_entry = iter_finish_branch_entry,
+};
+
+static struct add_entry_iter normal_iter = {
+ .prepare_entry = iter_prepare_normal_entry,
+ .add_single_entry = iter_add_single_normal_entry,
+ .next_entry = iter_next_nop_entry,
+ .add_next_entry = iter_add_next_nop_entry,
+ .finish_entry = iter_finish_normal_entry,
+};
+
+static int
+perf_evsel__add_entry(struct perf_evsel *evsel, struct addr_location *al,
+ struct perf_sample *sample, struct machine *machine,
+ struct add_entry_iter *iter)
+{
+ int err;
+
+ if ((sort__has_parent || symbol_conf.use_callchain) && sample->callchain) {
+ err = machine__resolve_callchain(machine, evsel, al->thread,
+ sample, &iter->parent, al,
+ iter->rep->max_stack);
+ if (err)
+ return err;
+ }
+
+ err = iter->prepare_entry(iter, machine, evsel, al, sample);
+ if (err)
+ return err;
+
+ err = iter->add_single_entry(iter, al);
+ if (err)
+ return err;
+
+ while (iter->next_entry(iter, al)) {
+ err = iter->add_next_entry(iter, al);
+ if (err)
+ break;
+ }
+
+ err = iter->finish_entry(iter, al);
+
+ return err;
+}

static int process_sample_event(struct perf_tool *tool,
union perf_event *event,
@@ -316,6 +452,7 @@ static int process_sample_event(struct perf_tool *tool,
{
struct perf_report *rep = container_of(tool, struct perf_report, tool);
struct addr_location al;
+ struct add_entry_iter *iter;
int ret;

if (perf_event__preprocess_sample(event, machine, &al, sample) < 0) {
@@ -330,25 +467,21 @@ static int process_sample_event(struct perf_tool *tool,
if (rep->cpu_list && !test_bit(sample->cpu, rep->cpu_bitmap))
return 0;

- if (sort__mode == SORT_MODE__BRANCH) {
- ret = perf_report__add_branch_hist_entry(tool, &al, sample,
- evsel, machine);
- if (ret < 0)
- pr_debug("problem adding lbr entry, skipping event\n");
- } else if (rep->mem_mode == 1) {
- ret = perf_report__add_mem_hist_entry(tool, &al, sample,
- evsel, machine, event);
- if (ret < 0)
- pr_debug("problem adding mem entry, skipping event\n");
- } else {
- if (al.map != NULL)
- al.map->dso->hit = 1;
-
- ret = perf_evsel__add_hist_entry(tool, evsel, &al, sample,
- machine);
- if (ret < 0)
- pr_debug("problem incrementing symbol period, skipping event\n");
- }
+ if (sort__mode == SORT_MODE__BRANCH)
+ iter = &branch_iter;
+ else if (rep->mem_mode == 1)
+ iter = &mem_iter;
+ else
+ iter = &normal_iter;
+
+ if (al.map != NULL)
+ al.map->dso->hit = 1;
+
+ iter->rep = rep;
+ ret = perf_evsel__add_entry(evsel, &al, sample, machine, iter);
+ if (ret < 0)
+ pr_debug("problem adding hist entry, skipping event\n");
+
return ret;
}

--
1.7.11.7
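
The nop callbacks referenced by mem_iter and normal_iter above are not
quoted in this hunk; judging by their names they are presumably trivial,
along the lines of this sketch (not the patch's actual code):

/*
 * Sketch of the nop iterator callbacks: for samples that map to a
 * single hist entry there is nothing to iterate, so next_entry()
 * returns 0 and the driver loop in perf_evsel__add_entry() exits
 * immediately.
 */
static int
iter_next_nop_entry(struct add_entry_iter *iter __maybe_unused,
		    struct addr_location *al __maybe_unused)
{
	return 0;
}

static int
iter_add_next_nop_entry(struct add_entry_iter *iter __maybe_unused,
			struct addr_location *al __maybe_unused)
{
	/* never reached while next_entry always returns 0 */
	return 0;
}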

2013-10-31 06:58:57

by Namhyung Kim


From: Namhyung Kim <[email protected]>

Maintain accumulated stat information in hist_entry->stat_acc if
symbol_conf.cumulate_callchain is set. Fields in ->stat_acc have the
same values initially, and will be updated as the callchain is
processed later.
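
For illustration only, an accumulated-overhead column can be derived
from ->stat_acc roughly as follows; hist_entry__acc_percent() is a
hypothetical helper, not a function in this series:

/*
 * Hypothetical sketch: compute an "Overhead (Acc)" percentage from
 * the accumulated stat maintained by this patch.
 */
static double hist_entry__acc_percent(struct hist_entry *he,
				      struct hists *hists)
{
	u64 total = hists->stats.total_period;

	if (total == 0 || he->stat_acc == NULL)
		return 0.0;

	return 100.0 * he->stat_acc->period / total;
}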

Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/ui/stdio/hist.c | 1 +
tools/perf/util/callchain.c | 2 ++
tools/perf/util/callchain.h | 3 ++-
tools/perf/util/hist.c | 18 ++++++++++++++++++
tools/perf/util/sort.h | 1 +
5 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 6c152686e837..302f08afd6a2 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -283,6 +283,7 @@ static size_t hist_entry_callchain__fprintf(struct hist_entry *he,
return callchain__fprintf_flat(fp, &he->sorted_chain, total_samples);
break;
case CHAIN_NONE:
+ case CHAIN_CUMULATIVE:
break;
default:
pr_err("Bad callchain mode\n");
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index e3970e3eaacf..c10052c6e73c 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -52,6 +52,7 @@ rb_insert_callchain(struct rb_root *root, struct callchain_node *chain,
p = &(*p)->rb_right;
break;
case CHAIN_NONE:
+ case CHAIN_CUMULATIVE:
default:
break;
}
@@ -162,6 +163,7 @@ int callchain_register_param(struct callchain_param *param)
param->sort = sort_chain_flat;
break;
case CHAIN_NONE:
+ case CHAIN_CUMULATIVE:
default:
return -1;
}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 7bb36022377f..08cf367c2357 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -11,7 +11,8 @@ enum chain_mode {
CHAIN_NONE,
CHAIN_FLAT,
CHAIN_GRAPH_ABS,
- CHAIN_GRAPH_REL
+ CHAIN_GRAPH_REL,
+ CHAIN_CUMULATIVE,
};

enum chain_order {
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 3a06b3326f12..c38655dedad3 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -238,6 +238,8 @@ static bool hists__decay_entry(struct hists *hists, struct hist_entry *he)
return true;

hist_entry__decay(&he->stat);
+ if (callchain_param.mode == CHAIN_CUMULATIVE)
+ hist_entry__decay(he->stat_acc);

if (!he->filtered)
hists->stats.total_period -= prev_period - he->stat.period;
@@ -285,6 +287,15 @@ static struct hist_entry *hist_entry__new(struct hist_entry *template)
if (he != NULL) {
*he = *template;

+ if (callchain_param.mode == CHAIN_CUMULATIVE) {
+ he->stat_acc = malloc(sizeof(he->stat));
+ if (he->stat_acc == NULL) {
+ free(he);
+ return NULL;
+ }
+ memcpy(he->stat_acc, &he->stat, sizeof(he->stat));
+ }
+
if (he->ms.map)
he->ms.map->referenced = true;

@@ -296,6 +307,7 @@ static struct hist_entry *hist_entry__new(struct hist_entry *template)
*/
he->branch_info = malloc(sizeof(*he->branch_info));
if (he->branch_info == NULL) {
+ free(he->stat_acc);
free(he);
return NULL;
}
@@ -368,6 +380,8 @@ static struct hist_entry *add_hist_entry(struct hists *hists,

if (!cmp) {
he_stat__add_period(&he->stat, period, weight);
+ if (callchain_param.mode == CHAIN_CUMULATIVE)
+ he_stat__add_period(he->stat_acc, period, weight);

/*
* This mem info was allocated from machine__resolve_mem
@@ -404,6 +418,8 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
rb_insert_color(&he->rb_node_in, hists->entries_in);
out:
hist_entry__add_cpumode_period(&he->stat, al->cpumode, period);
+ if (callchain_param.mode == CHAIN_CUMULATIVE)
+ hist_entry__add_cpumode_period(he->stat_acc, al->cpumode, period);
return he;
}

@@ -502,6 +518,8 @@ static bool hists__collapse_insert_entry(struct hists *hists __maybe_unused,

if (!cmp) {
he_stat__add_stat(&iter->stat, &he->stat);
+ if (callchain_param.mode == CHAIN_CUMULATIVE)
+ he_stat__add_stat(iter->stat_acc, he->stat_acc);

if (symbol_conf.use_callchain) {
callchain_cursor_reset(&callchain_cursor);
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index bf4333694d3a..2e2f0f5a1502 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -82,6 +82,7 @@ struct hist_entry {
struct list_head head;
} pairs;
struct he_stat stat;
+ struct he_stat *stat_acc;
struct map_symbol ms;
struct thread *thread;
u64 ip;
--
1.7.11.7

2013-10-31 06:59:35

by Namhyung Kim


From: Namhyung Kim <[email protected]>

hist_entry__add_cpumode_period() and hist_entry__decay() deal with
hist_entry's stat fields only, so make them take the struct he_stat
directly.
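
Passing the he_stat directly also lets the same helpers be reused for
other stat instances; patch 04 above, for example, applies them to the
accumulated stat like this:

	hist_entry__decay(&he->stat);
	if (callchain_param.mode == CHAIN_CUMULATIVE)
		hist_entry__decay(he->stat_acc);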

Cc: Arun Sharma <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/hist.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index e6f953190443..3a06b3326f12 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -182,21 +182,21 @@ void hists__output_recalc_col_len(struct hists *hists, int max_rows)
}
}

-static void hist_entry__add_cpumode_period(struct hist_entry *he,
+static void hist_entry__add_cpumode_period(struct he_stat *he_stat,
unsigned int cpumode, u64 period)
{
switch (cpumode) {
case PERF_RECORD_MISC_KERNEL:
- he->stat.period_sys += period;
+ he_stat->period_sys += period;
break;
case PERF_RECORD_MISC_USER:
- he->stat.period_us += period;
+ he_stat->period_us += period;
break;
case PERF_RECORD_MISC_GUEST_KERNEL:
- he->stat.period_guest_sys += period;
+ he_stat->period_guest_sys += period;
break;
case PERF_RECORD_MISC_GUEST_USER:
- he->stat.period_guest_us += period;
+ he_stat->period_guest_us += period;
break;
default:
break;
@@ -223,10 +223,10 @@ static void he_stat__add_stat(struct he_stat *dest, struct he_stat *src)
dest->weight += src->weight;
}

-static void hist_entry__decay(struct hist_entry *he)
+static void hist_entry__decay(struct he_stat *he_stat)
{
- he->stat.period = (he->stat.period * 7) / 8;
- he->stat.nr_events = (he->stat.nr_events * 7) / 8;
+ he_stat->period = (he_stat->period * 7) / 8;
+ he_stat->nr_events = (he_stat->nr_events * 7) / 8;
/* XXX need decay for weight too? */
}

@@ -237,7 +237,7 @@ static bool hists__decay_entry(struct hists *hists, struct hist_entry *he)
if (prev_period == 0)
return true;

- hist_entry__decay(he);
+ hist_entry__decay(&he->stat);

if (!he->filtered)
hists->stats.total_period -= prev_period - he->stat.period;
@@ -403,7 +403,7 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
rb_link_node(&he->rb_node_in, parent, p);
rb_insert_color(&he->rb_node_in, hists->entries_in);
out:
- hist_entry__add_cpumode_period(he, al->cpumode, period);
+ hist_entry__add_cpumode_period(&he->stat, al->cpumode, period);
return he;
}

--
1.7.11.7

2013-10-31 08:09:40

by Ingo Molnar




* Namhyung Kim <[email protected]> wrote:

> When the -g cumulative option is given, it'll be shown like this:
>
> $ perf report -g cumulative --stdio
>
> # Overhead Overhead (Acc) Command Shared Object Symbol
> # ........ .............. ....... ................. .......................
> #
> 0.00% 88.29% abc libc-2.17.so [.] __libc_start_main
> 0.00% 88.29% abc abc [.] main
> 0.00% 88.29% abc abc [.] c
> 0.00% 88.29% abc abc [.] b
> 88.29% 88.29% abc abc [.] a
> 0.00% 11.61% abc ld-2.17.so [k] _dl_sysdep_start
> 0.00% 9.43% abc ld-2.17.so [.] dl_main
> 9.43% 9.43% abc ld-2.17.so [.] _dl_relocate_object
> 2.27% 2.27% abc [kernel.kallsyms] [k] page_fault
> 0.00% 2.18% abc ld-2.17.so [k] _dl_start_user
> 0.00% 0.10% abc ld-2.17.so [.] _start
>
> As you can see __libc_start_main -> main -> c -> b -> a callchain
> show up in the output.

This looks really useful!

A couple of details:

1)

This is pretty close to SysProf output, right? So why not use the
well-known SysProf naming and call the first column 'self' and the
second column 'total'? I think those names are pretty intuitive and
it would help people who come from SysProf over to perf.

2)

Is it possible to configure the default 'report -g' style, so that
people who'd like to use it all the time don't have to type
'-g cumulative' every time?

3)

I'd even argue that we enable this reporting feature by default, if
a data file includes call-chain data: the first column will still
show the well-known percentage that perf report produces today, the
second column will be a new feature in essence.

The only open question would be, by which column should we sort:
'sysprof style' sorts by 'total', 'perf style' sorts by 'self'.
Agreed?
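
Sorting by the accumulated column would presumably just compare the
stat_acc periods from the earlier patch; a hypothetical sketch, not
code from this series:

/*
 * Hypothetical comparator keyed on the accumulated period
 * (sysprof-style 'total'); falls back to the self period when no
 * accumulated stat exists.
 */
static int64_t hist_entry__cmp_total(struct hist_entry *a,
				     struct hist_entry *b)
{
	u64 pa = a->stat_acc ? a->stat_acc->period : a->stat.period;
	u64 pb = b->stat_acc ? b->stat_acc->period : b->stat.period;

	if (pa != pb)
		return pa > pb ? 1 : -1;
	return 0;
}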

4)

This is not directly related to the new feature you added:
call-graph profiling still takes quite a bit of time. It might make
sense to save the ordered histogram to a perf.data.ordered file, so
that repeat invocations of 'perf report' don't have to recalculate
everything again and again?

This file would be maintained transparently and would only be
re-created when the perf.data file changes, or something like that.

5)

I realize that this is an early RFC; still, there are some usability
complaints I have about call-graph recording/reporting which should
be addressed before adding new features.

For example, I tried to get a list of the -g output modes via:

$ perf report -g help

Which produced a lot of options - I think it should produce only a
list of -g options. It also doesn't list cumulative:

-g, --call-graph <output_type,min_percent[,print_limit],call_order>
Display callchains using output_type
(graph, flat, fractal, or none) , min percent threshold, optional
print limit, callchain order, key (function or address). Default:
fractal,0.5,callee,function

Also, the list is very long and not very readable - I think there
should be more newlines.

Then I tried to do:

$ perf report -g

which, somewhat surprisingly, was accepted. Given that call-graph
perf.data is recognized automatically by 'perf report', the -g
option should only accept -g <type> syntax and provide a list of
options when '-g' or '-g help' is provided.

6)

A similar UI problem exists on the 'perf record' side: 'perf record
--call-graph help' should produce a specific list of call-graph
possibilities, not the two screens of output it does today.

> I know it have some rough edges or even bugs, but I really want to
> release it and get reviews. It does not handle event groups and
> annotations and it has a bug on TUI.
>
> You can also get this series on 'perf/cumulate-v2' branch in my tree at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

So I tried it out on top of tip:master, with your testcase, and in
the --stdio case it works very well:

# Overhead Overhead (Acc) Command Shared Object Symbol
# ........ .............. ....... ................. ..........................................
#
0.00% 100.00% abc abc [.] _start
0.00% 100.00% abc libc-2.17.so [.] __libc_start_main
0.00% 100.00% abc abc [.] main
0.00% 100.00% abc abc [.] c
0.00% 100.00% abc abc [.] b
99.79% 100.00% abc abc [.] a
0.01% 0.21% abc [kernel.kallsyms] [k] apic_timer_interrupt

In the TUI output the 'c' entry is not visible:

99.79% 100.00% abc abc [.] a
0.00% 99.79% abc abc [.] b
0.01% 0.21% abc [kernel.kallsyms] [k] apic_timer_interrupt
0.00% 0.19% abc [kernel.kallsyms] [k] smp_apic_timer_interrupt

I suspect this is the 'TUI bug' you mentioned?

> Any comments are welcome, thanks.

Cool stuff, let's fix & merge it ASAP! :-)

Thanks,

Ingo

2013-10-31 11:14:11

by Rodrigo Campos


On Thu, Oct 31, 2013 at 03:56:10PM +0900, Namhyung Kim wrote:
> From: Namhyung Kim <[email protected]>
>
> It is possible that a callchain has cycles or recursive calls. In that
> case it'll end up having entries more than 100% overhead in the
> output. In order to prevent such entries, cache each callchain node
> and skip if same entry already cumulated.
>
> Cc: Arun Sharma <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Signed-off-by: Namhyung Kim <[email protected]>
> ---
> tools/perf/builtin-report.c | 49 +++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 49 insertions(+)
>
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index 1b152a8b7f51..1a0de9a4a568 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -387,6 +387,11 @@ iter_finish_normal_entry(struct add_entry_iter *iter, struct addr_location *al)
> return err;
> }
>
> +struct cumulative_cache {
> + struct dso *dso;
> + struct symbol *sym;
> +};
> +
> static int
> iter_prepare_cumulative_entry(struct add_entry_iter *iter,
> struct machine *machine,
> @@ -394,9 +399,31 @@ iter_prepare_cumulative_entry(struct add_entry_iter *iter,
> struct addr_location *al __maybe_unused,
> struct perf_sample *sample)
> {
> + struct callchain_cursor_node *node;
> + struct cumulative_cache *ccache;
> +
> callchain_cursor_commit(&callchain_cursor);
>
> /*
> + * This is for detecting cycles or recursions so that they're
> + * cumulated only one time to prevent entries more than 100%
> + * overhead.
> + */
> + ccache = malloc(sizeof(*ccache) * PERF_MAX_STACK_DEPTH);
> + if (ccache == NULL)
> + return -ENOMEM;
> +
> + node = callchain_cursor_current(&callchain_cursor);
> + if (node == NULL)
> + return 0;

Here you return without assigning iter->priv, iter->priv->dso, or iter->priv->sym.

> +
> + ccache[0].dso = node->map->dso;
> + ccache[0].sym = node->sym;
> +
> + iter->priv = ccache;
> + iter->curr = 1;

Because the assignment is done here.

> +
> + /*
> * The first callchain node always contains same information
> * as a hist entry itself. So skip it in order to prevent
> * double accounting.
> @@ -501,8 +528,29 @@ iter_add_next_cumulative_entry(struct add_entry_iter *iter,
> {
> struct perf_evsel *evsel = iter->evsel;
> struct perf_sample *sample = iter->sample;
> + struct cumulative_cache *ccache = iter->priv;
> struct hist_entry *he;
> int err = 0;
> + int i;
> +
> + /*
> + * Check if there's duplicate entries in the callchain.
> + * It's possible that it has cycles or recursive calls.
> + */
> + for (i = 0; i < iter->curr; i++) {
> + if (sort__has_sym) {
> + if (ccache[i].sym == al->sym)
> + return 0;
> + } else {
> + /* Not much we can do - just compare the dso. */
> + if (ccache[i].dso == al->map->dso)

sym and dso are used here

> + return 0;
> + }
> + }
> +
> + ccache[i].dso = al->map->dso;
> + ccache[i].sym = al->sym;
> + iter->curr++;
>
> he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
> sample->period, sample->weight,
> @@ -538,6 +586,7 @@ iter_finish_cumulative_entry(struct add_entry_iter *iter,
> evsel->hists.stats.total_period += sample->period;
> hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
>
> + free(iter->priv);

And here I'm seeing a double free when trying the patchset with other examples.
I added a printf to the "if (node == NULL)" case and I'm hitting it. So it seems
that, when the entry is reused, every user frees it, which leads to the double
free.

This is my first time looking at perf code, so I might be missing a LOT of
things, sorry in advance :)

I tried copying the dso and sym to the newly allocated memory (and assigning
iter->priv = ccache before the return when "node == NULL"), as shown in the
attached patch, but running under valgrind also showed some invalid reads and
segfaults (without valgrind it didn't segfault, but I must have been "lucky").

So if there is no node (node == NULL) and we cannot read the dso and sym from
the current values of iter->priv (they show up as invalid reads in valgrind),
I'm not sure where we can read them from. And, IIUC, we should initialize them
because they are used later. So maybe there are only some cases where we can
read iter->priv, and in the other cases we should just initialize it to
something (although that doesn't feel possible, because it's the dso and sym)?
Or should we read/copy them from some other place (maybe before some other
thing is freed)? Or maybe forget about the malloc when node == NULL and just
reuse iter->priv, so that the free isn't executed until iter->curr == 1? I
added that check to the free, but it didn't help, although I didn't really
check how iter->curr is used. What am I missing?

I'm not really sure what the right fix is. Also, just in case, I tried
assigning "iter->priv = NULL" after it's freed and it """fixes""" it.

Just reverting the patch (reverts without conflict) also solves the double free
problem for me (although it probably introduces the problem the patch tries to
fix =) and seems to make valgrind happy too.




Thanks a lot and sorry again if I'm completely missing some "rules/invariants",
I'm really new to perf :)

Rodrigo


Attachments:
broken-patch.diff (820.00 B)