2009-06-26 14:28:17

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 0/2] perfcounter: callchains with perf report

Hi,

Here is a first shot for the sorted callchains per entries handling
with per report.

I'll continue to improve it:

- symbol resolution
- profit we have a tree to display a better graph hierarchy
- let the user provide a limit for hit percentage, depth, number of
backtraces, etc...
- better output
- colors
- and so on....

Thanks.

Frederic Weisbecker (2):
perfcounter: prepare a small callchain framework
perfcounter: print sorted callchains per histogram entries

tools/perf/Makefile | 1 +
tools/perf/builtin-report.c | 87 +++++++++++++++++----
tools/perf/perf.h | 5 +
tools/perf/util/callchain.c | 174 +++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/callchain.h | 33 ++++++++
5 files changed, 284 insertions(+), 16 deletions(-)
create mode 100644 tools/perf/util/callchain.c
create mode 100644 tools/perf/util/callchain.h


2009-06-26 14:28:28

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 1/2] perfcounter: prepare a small callchain framework

We plan to display the callchains depending on some user-configurable
parameters.
To gather the callchains stats from the recorded stream in a fast way,
this patch introduces an ad hoc radix tree adapted for callchains and also
a rbtree to sort these callchains once we have gathered every events
from the stream.

Signed-off-by: Frederic Weisbecker <[email protected]>
---
tools/perf/Makefile | 1 +
tools/perf/builtin-report.c | 5 -
tools/perf/perf.h | 5 +
tools/perf/util/callchain.c | 174 +++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/callchain.h | 33 ++++++++
5 files changed, 213 insertions(+), 5 deletions(-)
create mode 100644 tools/perf/util/callchain.c
create mode 100644 tools/perf/util/callchain.h

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index d3887ed..1c1296d 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -329,6 +329,7 @@ LIB_OBJS += util/symbol.o
LIB_OBJS += util/color.o
LIB_OBJS += util/pager.o
LIB_OBJS += util/header.o
+LIB_OBJS += util/callchain.o

BUILTIN_OBJS += builtin-annotate.o
BUILTIN_OBJS += builtin-help.o
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 681c223..28d1cb2 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -62,11 +62,6 @@ struct ip_event {
unsigned char __more_data[];
};

-struct ip_callchain {
- u64 nr;
- u64 ips[0];
-};
-
struct mmap_event {
struct perf_event_header header;
u32 pid, tid;
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 0f32a2e..ce39419 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -72,4 +72,9 @@ sys_perf_counter_open(struct perf_counter_attr *attr,
#define MAX_COUNTERS 256
#define MAX_NR_CPUS 256

+struct ip_callchain {
+ u64 nr;
+ u64 ips[0];
+};
+
#endif
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
new file mode 100644
index 0000000..ad3c285
--- /dev/null
+++ b/tools/perf/util/callchain.c
@@ -0,0 +1,174 @@
+/*
+ * Copyright (C) 2009, Frederic Weisbecker <[email protected]>
+ *
+ * Handle the callchains from the stream in an ad-hoc radix tree and then
+ * sort them in an rbtree.
+ *
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <stdbool.h>
+#include <errno.h>
+
+#include "callchain.h"
+
+
+static void rb_insert_callchain(struct rb_root *root, struct callchain_node *chain)
+{
+ struct rb_node **p = &root->rb_node;
+ struct rb_node *parent = NULL;
+ struct callchain_node *rnode;
+
+ while (*p) {
+ parent = *p;
+ rnode = rb_entry(parent, struct callchain_node, rb_node);
+
+ if (rnode->hit < chain->hit)
+ p = &(*p)->rb_left;
+ else
+ p = &(*p)->rb_right;
+ }
+
+ rb_link_node(&chain->rb_node, parent, p);
+ rb_insert_color(&chain->rb_node, root);
+}
+
+/*
+ * Once we get every callchains from the stream, we can now
+ * sort them by hit
+ */
+void sort_chain_to_rbtree(struct rb_root *rb_root, struct callchain_node *node)
+{
+ struct callchain_node *child;
+
+ list_for_each_entry(child, &node->children, brothers)
+ sort_chain_to_rbtree(rb_root, child);
+
+ if (node->hit)
+ rb_insert_callchain(rb_root, node);
+}
+
+static struct callchain_node *create_child(struct callchain_node *parent)
+{
+ struct callchain_node *new;
+
+ new = malloc(sizeof(*new));
+ if (!new) {
+ perror("not enough memory to create child for code path tree");
+ return NULL;
+ }
+ new->parent = parent;
+ INIT_LIST_HEAD(&new->children);
+ INIT_LIST_HEAD(&new->val);
+ list_add_tail(&new->brothers, &parent->children);
+
+ return new;
+}
+
+static void
+fill_node(struct callchain_node *node, struct ip_callchain *chain, int start)
+{
+ int i;
+
+ for (i = start; i < chain->nr; i++) {
+ struct callchain_list *call;
+
+ call = malloc(sizeof(*chain));
+ if (!call) {
+ perror("not enough memory for the code path tree");
+ return;
+ }
+ call->ip = chain->ips[i];
+ list_add_tail(&call->list, &node->val);
+ }
+ node->val_nr = i - start;
+}
+
+static void add_child(struct callchain_node *parent, struct ip_callchain *chain)
+{
+ struct callchain_node *new;
+
+ new = create_child(parent);
+ fill_node(new, chain, parent->val_nr);
+
+ new->hit = 1;
+}
+
+static void
+split_add_child(struct callchain_node *parent, struct ip_callchain *chain,
+ struct callchain_list *to_split, int idx)
+{
+ struct callchain_node *new;
+
+ /* split */
+ new = create_child(parent);
+ list_move_tail(&to_split->list, &new->val);
+ new->hit = parent->hit;
+ parent->hit = 0;
+ parent->val_nr = idx;
+
+ /* create the new one */
+ add_child(parent, chain);
+}
+
+static int
+__append_chain(struct callchain_node *root, struct ip_callchain *chain,
+ int start);
+
+static int
+__append_chain_children(struct callchain_node *root, struct ip_callchain *chain)
+{
+ struct callchain_node *rnode;
+
+ /* lookup in childrens */
+ list_for_each_entry(rnode, &root->children, brothers) {
+ int ret = __append_chain(rnode, chain, root->val_nr);
+ if (!ret)
+ return 0;
+ }
+ return -1;
+}
+
+static int
+__append_chain(struct callchain_node *root, struct ip_callchain *chain,
+ int start)
+{
+ struct callchain_list *cnode;
+ int i = start;
+ bool found = false;
+
+ /* lookup in the current node */
+ list_for_each_entry(cnode, &root->val, list) {
+ if (cnode->ip != chain->ips[i++])
+ break;
+ if (!found)
+ found = true;
+ if (i == chain->nr)
+ break;
+ }
+
+ /* matches not, relay on the parent */
+ if (!found)
+ return -1;
+
+ /* we match only a part of the node. Split it and add the new chain */
+ if (i < root->val_nr) {
+ split_add_child(root, chain, cnode, i);
+ return 0;
+ }
+
+ /* we match 100% of the path, increment the hit */
+ if (i == root->val_nr) {
+ root->hit++;
+ return 0;
+ }
+
+ return __append_chain_children(root, chain);
+}
+
+void append_chain(struct callchain_node *root, struct ip_callchain *chain)
+{
+ if (__append_chain_children(root, chain) == -1)
+ add_child(root, chain);
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
new file mode 100644
index 0000000..fa1cd2f
--- /dev/null
+++ b/tools/perf/util/callchain.h
@@ -0,0 +1,33 @@
+#ifndef __PERF_CALLCHAIN_H
+#define __PERF_CALLCHAIN_H
+
+#include "../perf.h"
+#include "list.h"
+#include "rbtree.h"
+
+
+struct callchain_node {
+ struct callchain_node *parent;
+ struct list_head brothers;
+ struct list_head children;
+ struct list_head val;
+ struct rb_node rb_node;
+ int val_nr;
+ int hit;
+};
+
+struct callchain_list {
+ unsigned long ip;
+ struct list_head list;
+};
+
+static inline void callchain_init(struct callchain_node *node)
+{
+ INIT_LIST_HEAD(&node->brothers);
+ INIT_LIST_HEAD(&node->children);
+ INIT_LIST_HEAD(&node->val);
+}
+
+void append_chain(struct callchain_node *root, struct ip_callchain *chain);
+void sort_chain_to_rbtree(struct rb_root *rb_root, struct callchain_node *node);
+#endif
--
1.6.2.3

2009-06-26 14:28:40

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 2/2] perfcounter: print sorted callchains per histogram entries

Use the newly created callchains radix tree to gather the chains stats
from the recorded events and then print the callchains for all of them,
sorted by hits, using the "-c" parameter with perf report.

Example:

66.15% [k] atm_clip_exit
63.08%
0xffffffffffffff80
0xffffffff810196a8
0xffffffff810c14c8
0xffffffff8101a79c
0xffffffff810194f3
0xffffffff8106ab7f
0xffffffff8106abe5
0xffffffff8106acde
0xffffffff8100d94b
0xffffffff8153e7ea
[...]

1.54%
0xffffffffffffff80
0xffffffff810196a8
0xffffffff810c14c8
0xffffffff8101a79c
[...]

Signed-off-by: Frederic Weisbecker <[email protected]>
---
tools/perf/builtin-report.c | 82 +++++++++++++++++++++++++++++++++++++------
1 files changed, 71 insertions(+), 11 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 28d1cb2..ed391db 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -15,6 +15,7 @@
#include "util/rbtree.h"
#include "util/symbol.h"
#include "util/string.h"
+#include "util/callchain.h"

#include "perf.h"
#include "util/header.h"
@@ -52,6 +53,7 @@ static char *parent_pattern = default_parent_pattern;
static regex_t parent_regex;

static int exclude_other = 1;
+static int callchain;

static u64 sample_type;

@@ -488,17 +490,19 @@ static size_t threads__fprintf(FILE *fp)
static struct rb_root hist;

struct hist_entry {
- struct rb_node rb_node;
-
- struct thread *thread;
- struct map *map;
- struct dso *dso;
- struct symbol *sym;
- struct symbol *parent;
- u64 ip;
- char level;
-
- u64 count;
+ struct rb_node rb_node;
+
+ struct thread *thread;
+ struct map *map;
+ struct dso *dso;
+ struct symbol *sym;
+ struct symbol *parent;
+ u64 ip;
+ char level;
+ struct callchain_node callchain;
+ struct rb_root sorted_chain;
+
+ u64 count;
};

/*
@@ -769,6 +773,48 @@ hist_entry__collapse(struct hist_entry *left, struct hist_entry *right)
}

static size_t
+callchain__fprintf(FILE *fp, struct callchain_node *self, u64 total_samples)
+{
+ struct callchain_list *chain;
+ size_t ret = 0;
+
+ if (!self)
+ return 0;
+
+ ret += callchain__fprintf(fp, self->parent, total_samples);
+
+
+ list_for_each_entry(chain, &self->val, list)
+ ret += fprintf(fp, " %p\n", (void *)chain->ip);
+
+ return ret;
+}
+
+static size_t
+hist_entry_callchain__fprintf(FILE *fp, struct hist_entry *self,
+ u64 total_samples)
+{
+ struct rb_node *rb_node;
+ struct callchain_node *chain;
+ size_t ret = 0;
+
+ rb_node = rb_first(&self->sorted_chain);
+ while (rb_node) {
+ double percent;
+
+ chain = rb_entry(rb_node, struct callchain_node, rb_node);
+ percent = chain->hit * 100.0 / total_samples;
+ ret += fprintf(fp, " %6.2f%%\n", percent);
+ ret += callchain__fprintf(fp, chain, total_samples);
+ ret += fprintf(fp, "\n");
+ rb_node = rb_next(rb_node);
+ }
+
+ return ret;
+}
+
+
+static size_t
hist_entry__fprintf(FILE *fp, struct hist_entry *self, u64 total_samples)
{
struct sort_entry *se;
@@ -808,6 +854,9 @@ hist_entry__fprintf(FILE *fp, struct hist_entry *self, u64 total_samples)

ret += fprintf(fp, "\n");

+ if (callchain)
+ hist_entry_callchain__fprintf(fp, self, total_samples);
+
return ret;
}

@@ -892,6 +941,7 @@ hist_entry__add(struct thread *thread, struct map *map, struct dso *dso,
.level = level,
.count = count,
.parent = NULL,
+ .sorted_chain = RB_ROOT
};
int cmp;

@@ -934,6 +984,8 @@ hist_entry__add(struct thread *thread, struct map *map, struct dso *dso,

if (!cmp) {
he->count += count;
+ if (callchain)
+ append_chain(&he->callchain, chain);
return 0;
}

@@ -947,6 +999,10 @@ hist_entry__add(struct thread *thread, struct map *map, struct dso *dso,
if (!he)
return -ENOMEM;
*he = entry;
+ if (callchain) {
+ callchain_init(&he->callchain);
+ append_chain(&he->callchain, chain);
+ }
rb_link_node(&he->rb_node, parent, p);
rb_insert_color(&he->rb_node, &hist);

@@ -1023,6 +1079,9 @@ static void output__insert_entry(struct hist_entry *he)
struct rb_node *parent = NULL;
struct hist_entry *iter;

+ if (callchain)
+ sort_chain_to_rbtree(&he->sorted_chain, &he->callchain);
+
while (*p != NULL) {
parent = *p;
iter = rb_entry(parent, struct hist_entry, rb_node);
@@ -1599,6 +1658,7 @@ static const struct option options[] = {
"regex filter to identify parent, see: '--sort parent'"),
OPT_BOOLEAN('x', "exclude-other", &exclude_other,
"Only display entries with parent-match"),
+ OPT_BOOLEAN('c', "callchain", &callchain, "Display callchains"),
OPT_END()
};

--
1.6.2.3

2009-06-26 14:48:39

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/2] perfcounter: callchains with perf report


* Frederic Weisbecker <[email protected]> wrote:

> Hi,
>
> Here is a first shot for the sorted callchains per entries handling
> with per report.

ok, this is already useful at displaying the raw data.

> I'll continue to improve it:
>
> - symbol resolution
> - profit we have a tree to display a better graph hierarchy
> - let the user provide a limit for hit percentage, depth, number of
> backtraces, etc...
> - better output
> - colors
> - and so on....

nice plans! :)

Ingo

2009-06-26 15:52:40

by Frederic Weisbecker

[permalink] [raw]
Subject: [tip:perfcounters/urgent] perf_counter tools: Prepare a small callchain framework

Commit-ID: 8cb76d99d715741637b6d0884f389e17e9cb05d2
Gitweb: http://git.kernel.org/tip/8cb76d99d715741637b6d0884f389e17e9cb05d2
Author: Frederic Weisbecker <[email protected]>
AuthorDate: Fri, 26 Jun 2009 16:28:00 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 26 Jun 2009 16:47:00 +0200

perf_counter tools: Prepare a small callchain framework

We plan to display the callchains depending on some user-configurable
parameters.

To gather the callchains stats from the recorded stream in a fast way,
this patch introduces an ad hoc radix tree adapted for callchains and also
a rbtree to sort these callchains once we have gathered every events
from the stream.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
tools/perf/Makefile | 1 +
tools/perf/builtin-report.c | 5 -
tools/perf/perf.h | 5 +
tools/perf/util/callchain.c | 174 +++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/callchain.h | 33 ++++++++
5 files changed, 213 insertions(+), 5 deletions(-)

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index d3887ed..1c1296d 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -329,6 +329,7 @@ LIB_OBJS += util/symbol.o
LIB_OBJS += util/color.o
LIB_OBJS += util/pager.o
LIB_OBJS += util/header.o
+LIB_OBJS += util/callchain.o

BUILTIN_OBJS += builtin-annotate.o
BUILTIN_OBJS += builtin-help.o
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 681c223..28d1cb2 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -62,11 +62,6 @@ struct ip_event {
unsigned char __more_data[];
};

-struct ip_callchain {
- u64 nr;
- u64 ips[0];
-};
-
struct mmap_event {
struct perf_event_header header;
u32 pid, tid;
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 16c84fd..a49842b 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -66,4 +66,9 @@ sys_perf_counter_open(struct perf_counter_attr *attr,
#define MAX_COUNTERS 256
#define MAX_NR_CPUS 256

+struct ip_callchain {
+ u64 nr;
+ u64 ips[0];
+};
+
#endif
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
new file mode 100644
index 0000000..ad3c285
--- /dev/null
+++ b/tools/perf/util/callchain.c
@@ -0,0 +1,174 @@
+/*
+ * Copyright (C) 2009, Frederic Weisbecker <[email protected]>
+ *
+ * Handle the callchains from the stream in an ad-hoc radix tree and then
+ * sort them in an rbtree.
+ *
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <stdbool.h>
+#include <errno.h>
+
+#include "callchain.h"
+
+
+static void rb_insert_callchain(struct rb_root *root, struct callchain_node *chain)
+{
+ struct rb_node **p = &root->rb_node;
+ struct rb_node *parent = NULL;
+ struct callchain_node *rnode;
+
+ while (*p) {
+ parent = *p;
+ rnode = rb_entry(parent, struct callchain_node, rb_node);
+
+ if (rnode->hit < chain->hit)
+ p = &(*p)->rb_left;
+ else
+ p = &(*p)->rb_right;
+ }
+
+ rb_link_node(&chain->rb_node, parent, p);
+ rb_insert_color(&chain->rb_node, root);
+}
+
+/*
+ * Once we get every callchains from the stream, we can now
+ * sort them by hit
+ */
+void sort_chain_to_rbtree(struct rb_root *rb_root, struct callchain_node *node)
+{
+ struct callchain_node *child;
+
+ list_for_each_entry(child, &node->children, brothers)
+ sort_chain_to_rbtree(rb_root, child);
+
+ if (node->hit)
+ rb_insert_callchain(rb_root, node);
+}
+
+static struct callchain_node *create_child(struct callchain_node *parent)
+{
+ struct callchain_node *new;
+
+ new = malloc(sizeof(*new));
+ if (!new) {
+ perror("not enough memory to create child for code path tree");
+ return NULL;
+ }
+ new->parent = parent;
+ INIT_LIST_HEAD(&new->children);
+ INIT_LIST_HEAD(&new->val);
+ list_add_tail(&new->brothers, &parent->children);
+
+ return new;
+}
+
+static void
+fill_node(struct callchain_node *node, struct ip_callchain *chain, int start)
+{
+ int i;
+
+ for (i = start; i < chain->nr; i++) {
+ struct callchain_list *call;
+
+ call = malloc(sizeof(*chain));
+ if (!call) {
+ perror("not enough memory for the code path tree");
+ return;
+ }
+ call->ip = chain->ips[i];
+ list_add_tail(&call->list, &node->val);
+ }
+ node->val_nr = i - start;
+}
+
+static void add_child(struct callchain_node *parent, struct ip_callchain *chain)
+{
+ struct callchain_node *new;
+
+ new = create_child(parent);
+ fill_node(new, chain, parent->val_nr);
+
+ new->hit = 1;
+}
+
+static void
+split_add_child(struct callchain_node *parent, struct ip_callchain *chain,
+ struct callchain_list *to_split, int idx)
+{
+ struct callchain_node *new;
+
+ /* split */
+ new = create_child(parent);
+ list_move_tail(&to_split->list, &new->val);
+ new->hit = parent->hit;
+ parent->hit = 0;
+ parent->val_nr = idx;
+
+ /* create the new one */
+ add_child(parent, chain);
+}
+
+static int
+__append_chain(struct callchain_node *root, struct ip_callchain *chain,
+ int start);
+
+static int
+__append_chain_children(struct callchain_node *root, struct ip_callchain *chain)
+{
+ struct callchain_node *rnode;
+
+ /* lookup in childrens */
+ list_for_each_entry(rnode, &root->children, brothers) {
+ int ret = __append_chain(rnode, chain, root->val_nr);
+ if (!ret)
+ return 0;
+ }
+ return -1;
+}
+
+static int
+__append_chain(struct callchain_node *root, struct ip_callchain *chain,
+ int start)
+{
+ struct callchain_list *cnode;
+ int i = start;
+ bool found = false;
+
+ /* lookup in the current node */
+ list_for_each_entry(cnode, &root->val, list) {
+ if (cnode->ip != chain->ips[i++])
+ break;
+ if (!found)
+ found = true;
+ if (i == chain->nr)
+ break;
+ }
+
+ /* matches not, relay on the parent */
+ if (!found)
+ return -1;
+
+ /* we match only a part of the node. Split it and add the new chain */
+ if (i < root->val_nr) {
+ split_add_child(root, chain, cnode, i);
+ return 0;
+ }
+
+ /* we match 100% of the path, increment the hit */
+ if (i == root->val_nr) {
+ root->hit++;
+ return 0;
+ }
+
+ return __append_chain_children(root, chain);
+}
+
+void append_chain(struct callchain_node *root, struct ip_callchain *chain)
+{
+ if (__append_chain_children(root, chain) == -1)
+ add_child(root, chain);
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
new file mode 100644
index 0000000..fa1cd2f
--- /dev/null
+++ b/tools/perf/util/callchain.h
@@ -0,0 +1,33 @@
+#ifndef __PERF_CALLCHAIN_H
+#define __PERF_CALLCHAIN_H
+
+#include "../perf.h"
+#include "list.h"
+#include "rbtree.h"
+
+
+struct callchain_node {
+ struct callchain_node *parent;
+ struct list_head brothers;
+ struct list_head children;
+ struct list_head val;
+ struct rb_node rb_node;
+ int val_nr;
+ int hit;
+};
+
+struct callchain_list {
+ unsigned long ip;
+ struct list_head list;
+};
+
+static inline void callchain_init(struct callchain_node *node)
+{
+ INIT_LIST_HEAD(&node->brothers);
+ INIT_LIST_HEAD(&node->children);
+ INIT_LIST_HEAD(&node->val);
+}
+
+void append_chain(struct callchain_node *root, struct ip_callchain *chain);
+void sort_chain_to_rbtree(struct rb_root *rb_root, struct callchain_node *node);
+#endif

2009-06-26 15:53:15

by Frederic Weisbecker

[permalink] [raw]
Subject: [tip:perfcounters/urgent] perf report: Print sorted callchains per histogram entries

Commit-ID: f55c555226b1010b249730ec6b232e5470286950
Gitweb: http://git.kernel.org/tip/f55c555226b1010b249730ec6b232e5470286950
Author: Frederic Weisbecker <[email protected]>
AuthorDate: Fri, 26 Jun 2009 16:28:01 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 26 Jun 2009 16:47:01 +0200

perf report: Print sorted callchains per histogram entries

Use the newly created callchains radix tree to gather the chains stats
from the recorded events and then print the callchains for all of them,
sorted by hits, using the "-c" parameter with perf report.

Example:

66.15% [k] atm_clip_exit
63.08%
0xffffffffffffff80
0xffffffff810196a8
0xffffffff810c14c8
0xffffffff8101a79c
0xffffffff810194f3
0xffffffff8106ab7f
0xffffffff8106abe5
0xffffffff8106acde
0xffffffff8100d94b
0xffffffff8153e7ea
[...]

1.54%
0xffffffffffffff80
0xffffffff810196a8
0xffffffff810c14c8
0xffffffff8101a79c
[...]

Symbols are not yet resolved.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
tools/perf/builtin-report.c | 82 +++++++++++++++++++++++++++++++++++++------
1 files changed, 71 insertions(+), 11 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 28d1cb2..ed391db 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -15,6 +15,7 @@
#include "util/rbtree.h"
#include "util/symbol.h"
#include "util/string.h"
+#include "util/callchain.h"

#include "perf.h"
#include "util/header.h"
@@ -52,6 +53,7 @@ static char *parent_pattern = default_parent_pattern;
static regex_t parent_regex;

static int exclude_other = 1;
+static int callchain;

static u64 sample_type;

@@ -488,17 +490,19 @@ static size_t threads__fprintf(FILE *fp)
static struct rb_root hist;

struct hist_entry {
- struct rb_node rb_node;
-
- struct thread *thread;
- struct map *map;
- struct dso *dso;
- struct symbol *sym;
- struct symbol *parent;
- u64 ip;
- char level;
-
- u64 count;
+ struct rb_node rb_node;
+
+ struct thread *thread;
+ struct map *map;
+ struct dso *dso;
+ struct symbol *sym;
+ struct symbol *parent;
+ u64 ip;
+ char level;
+ struct callchain_node callchain;
+ struct rb_root sorted_chain;
+
+ u64 count;
};

/*
@@ -769,6 +773,48 @@ hist_entry__collapse(struct hist_entry *left, struct hist_entry *right)
}

static size_t
+callchain__fprintf(FILE *fp, struct callchain_node *self, u64 total_samples)
+{
+ struct callchain_list *chain;
+ size_t ret = 0;
+
+ if (!self)
+ return 0;
+
+ ret += callchain__fprintf(fp, self->parent, total_samples);
+
+
+ list_for_each_entry(chain, &self->val, list)
+ ret += fprintf(fp, " %p\n", (void *)chain->ip);
+
+ return ret;
+}
+
+static size_t
+hist_entry_callchain__fprintf(FILE *fp, struct hist_entry *self,
+ u64 total_samples)
+{
+ struct rb_node *rb_node;
+ struct callchain_node *chain;
+ size_t ret = 0;
+
+ rb_node = rb_first(&self->sorted_chain);
+ while (rb_node) {
+ double percent;
+
+ chain = rb_entry(rb_node, struct callchain_node, rb_node);
+ percent = chain->hit * 100.0 / total_samples;
+ ret += fprintf(fp, " %6.2f%%\n", percent);
+ ret += callchain__fprintf(fp, chain, total_samples);
+ ret += fprintf(fp, "\n");
+ rb_node = rb_next(rb_node);
+ }
+
+ return ret;
+}
+
+
+static size_t
hist_entry__fprintf(FILE *fp, struct hist_entry *self, u64 total_samples)
{
struct sort_entry *se;
@@ -808,6 +854,9 @@ hist_entry__fprintf(FILE *fp, struct hist_entry *self, u64 total_samples)

ret += fprintf(fp, "\n");

+ if (callchain)
+ hist_entry_callchain__fprintf(fp, self, total_samples);
+
return ret;
}

@@ -892,6 +941,7 @@ hist_entry__add(struct thread *thread, struct map *map, struct dso *dso,
.level = level,
.count = count,
.parent = NULL,
+ .sorted_chain = RB_ROOT
};
int cmp;

@@ -934,6 +984,8 @@ hist_entry__add(struct thread *thread, struct map *map, struct dso *dso,

if (!cmp) {
he->count += count;
+ if (callchain)
+ append_chain(&he->callchain, chain);
return 0;
}

@@ -947,6 +999,10 @@ hist_entry__add(struct thread *thread, struct map *map, struct dso *dso,
if (!he)
return -ENOMEM;
*he = entry;
+ if (callchain) {
+ callchain_init(&he->callchain);
+ append_chain(&he->callchain, chain);
+ }
rb_link_node(&he->rb_node, parent, p);
rb_insert_color(&he->rb_node, &hist);

@@ -1023,6 +1079,9 @@ static void output__insert_entry(struct hist_entry *he)
struct rb_node *parent = NULL;
struct hist_entry *iter;

+ if (callchain)
+ sort_chain_to_rbtree(&he->sorted_chain, &he->callchain);
+
while (*p != NULL) {
parent = *p;
iter = rb_entry(parent, struct hist_entry, rb_node);
@@ -1599,6 +1658,7 @@ static const struct option options[] = {
"regex filter to identify parent, see: '--sort parent'"),
OPT_BOOLEAN('x', "exclude-other", &exclude_other,
"Only display entries with parent-match"),
+ OPT_BOOLEAN('c', "callchain", &callchain, "Display callchains"),
OPT_END()
};

2009-06-27 01:13:07

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH 0/2] perfcounter: callchains with perf report

Frederic Weisbecker writes:

> Here is a first shot for the sorted callchains per entries handling
> with per report.
>
> I'll continue to improve it:
>
> - symbol resolution
> - profit we have a tree to display a better graph hierarchy
> - let the user provide a limit for hit percentage, depth, number of
> backtraces, etc...
> - better output
> - colors
> - and so on....

Nice!

I have just about finished doing the kernel piece of callchain support
on powerpc. Because of the way function calls and returns work on
powerpc, working out the first one or two return addresses can be
tricky. We potentially have a valid return address in the link
register (LR), or in the LR save area in the second stack frame, or
both, and you need extra information such as DWARF unwind tables to
work out which of those three possibilities you have, in general.
This is the case at each point where an interrupt or signal has
occurred.

Because I didn't want to go trawling through CFI tables at interrupt
time, particularly for user code, I made the kernel save both possible
return addresses in the callchain. For the kernel part of the
callchain, I check those two addresses to see if they're valid kernel
addresses and set them to 0 if not, or if they're equal.

That means I need to make some changes to builtin-report.c to ignore
zero addresses. I may need to add stuff to look for and use unwind
tables as well, if we want completely accurate call chains.

The other thing I did is to put PERF_CONTEXT_KERNEL markers in the
callchain every time we find an interrupt frame, and PERF_CONTEXT_USER
markers every time we find a signal frame, so that userspace knows
when it needs to do the unwinding.

Oh, and a third point is that on powerpc the sampled IP recorded if
you ask for PERF_SAMPLE_IP won't in general be the same as the first
IP in the callchain. The reason is that the PERF_SAMPLE_IP value
points to the instruction that caused the counter overflow whereas the
first IP in the callchain tells you where the CPU took the interrupt.
That is almost always a few instructions further on, and can be quite
a way further on if interrupts were disabled when the counter overflow
occur. In fact we regularly see the PERF_SAMPLE_IP value being in the
hypervisor but the first IP in the callchain being in the kernel.

Paul.

2009-06-28 21:06:08

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 0/2] perfcounter: callchains with perf report

On Sat, Jun 27, 2009 at 11:12:47AM +1000, Paul Mackerras wrote:
> Frederic Weisbecker writes:
>
> > Here is a first shot for the sorted callchains per entries handling
> > with per report.
> >
> > I'll continue to improve it:
> >
> > - symbol resolution
> > - profit we have a tree to display a better graph hierarchy
> > - let the user provide a limit for hit percentage, depth, number of
> > backtraces, etc...
> > - better output
> > - colors
> > - and so on....
>
> Nice!
>
> I have just about finished doing the kernel piece of callchain support
> on powerpc. Because of the way function calls and returns work on
> powerpc, working out the first one or two return addresses can be
> tricky. We potentially have a valid return address in the link
> register (LR), or in the LR save area in the second stack frame, or
> both, and you need extra information such as DWARF unwind tables to
> work out which of those three possibilities you have, in general.
> This is the case at each point where an interrupt or signal has
> occurred.
>
> Because I didn't want to go trawling through CFI tables at interrupt
> time, particularly for user code, I made the kernel save both possible
> return addresses in the callchain. For the kernel part of the
> callchain, I check those two addresses to see if they're valid kernel
> addresses and set them to 0 if not, or if they're equal.
>
> That means I need to make some changes to builtin-report.c to ignore
> zero addresses. I may need to add stuff to look for and use unwind
> tables as well, if we want completely accurate call chains.


Well, I guess I can ignore them in my further patches.
But wouldn't it be better to discard them from the kernel?
Unless it's somewhat useful to know we had an unknown entry?


> The other thing I did is to put PERF_CONTEXT_KERNEL markers in the
> callchain every time we find an interrupt frame, and PERF_CONTEXT_USER
> markers every time we find a signal frame, so that userspace knows
> when it needs to do the unwinding.
>
> Oh, and a third point is that on powerpc the sampled IP recorded if
> you ask for PERF_SAMPLE_IP won't in general be the same as the first
> IP in the callchain. The reason is that the PERF_SAMPLE_IP value
> points to the instruction that caused the counter overflow whereas the
> first IP in the callchain tells you where the CPU took the interrupt.
> That is almost always a few instructions further on, and can be quite
> a way further on if interrupts were disabled when the counter overflow
> occur. In fact we regularly see the PERF_SAMPLE_IP value being in the
> hypervisor but the first IP in the callchain being in the kernel.


Ok.

>
> Paul.

2009-06-28 23:41:14

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH 0/2] perfcounter: callchains with perf report

Frederic Weisbecker writes:

> On Sat, Jun 27, 2009 at 11:12:47AM +1000, Paul Mackerras wrote:
>
> > That means I need to make some changes to builtin-report.c to ignore
> > zero addresses. I may need to add stuff to look for and use unwind
> > tables as well, if we want completely accurate call chains.
>
>
> Well, I guess I can ignore them in my further patches.
> But wouldn't it be better to discard them from the kernel?
> Unless it's somewhat useful to know we had an unknown entry?

If we discard the entries then userspace doesn't know which entries
were discarded, or whether any were discarded. As it is, userspace
can know that the second value after PERF_CONTEXT_KERNEL/USER is a LR
value or zero, and the third value is from the second stack frame (or
zero). If we discarded the entries then userspace wouldn't know
exactly where the second and third values came from, which would make
it harder to use unwind or traceback tables to work out more
accurately what the call chain was.

I would be open to replacing the bogus entries with some other
distinguishable value rather than zero if you think that would be
better.

Paul.