2014-01-10 12:32:55

by Andi Kleen

Subject: [PATCH 1/4] perf, tools: Add support for prepending LBRs to the callstack

From: Andi Kleen <[email protected]>

I never found the default LBR display mode which generates histograms
of individual branches particularly useful.

This implements an alternative mode that creates histograms over complete
branch traces, instead of individual branches, similar to how normal
callgraphs are handled. This is done by putting the LBR entries in
front of the normal callgraph and then using the normal callgraph
histogram infrastructure to unify them.

This way, in complex functions, we can understand the control flow
that led to a particular sample.

The default output is unchanged.

This is only implemented in perf report, no change to record
or anywhere else.

This adds the basic code to report:
- add a new "branch" option to the -g option parser to enable this mode
- when the flag is set, include the LBR in the callstack in machine.c.
The rest of the history code is unchanged and doesn't know the difference
between LBR entry and normal call entry.
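
As a rough illustration of the ordering described above (not the code in
this patch), the sketch below places each branch's target and source
address in front of the ordinary callchain addresses. The names
struct branch_pair and build_combined_chain are hypothetical stand-ins,
and the sketch assumes callee order, i.e. the newest branch comes first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for perf's struct branch_entry (flags omitted). */
struct branch_pair {
	uint64_t from;
	uint64_t to;
};

/*
 * Fill out[] with the branch addresses ("to", then "from", for each
 * entry) followed by the ordinary callchain addresses.
 * Returns the number of addresses written, or 0 if out[] is too small.
 */
static size_t build_combined_chain(const struct branch_pair *br, size_t nbr,
				   const uint64_t *ips, size_t nips,
				   uint64_t *out, size_t cap)
{
	size_t n = 0, i;

	if (2 * nbr + nips > cap)
		return 0;

	/* Branch records go first, so they appear nearest the sample. */
	for (i = 0; i < nbr; i++) {
		out[n++] = br[i].to;
		out[n++] = br[i].from;
	}

	/* The real callers follow, unchanged. */
	for (i = 0; i < nips; i++)
		out[n++] = ips[i];

	return n;
}
```

Because the result is just one longer list of addresses, the existing
callchain histogram code can consume it without knowing which entries
came from the LBR.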

Current limitations:
- There is no attempt to cut off the LBR at the beginning of the function,
so there may be small overlaps between the callstack and the LBR.
- The LBR flags (mispredict etc.) are not shown in the history

Signed-off-by: Andi Kleen <[email protected]>
---
tools/perf/builtin-report.c | 15 ++++--
tools/perf/util/callchain.h | 1 +
tools/perf/util/machine.c | 113 ++++++++++++++++++++++++++++++++++++--------
tools/perf/util/symbol.h | 3 +-
4 files changed, 106 insertions(+), 26 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 8cf8e66..c2e6e43 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -706,7 +706,7 @@ parse_callchain_opt(const struct option *opt, const char *arg, int unset)
callchain_param.order = ORDER_CALLER;
else if (!strncmp(tok2, "callee", strlen("callee")))
callchain_param.order = ORDER_CALLEE;
- else
+ else if (tok2[0] != 0)
return -1;

/* Get the sort key */
@@ -717,8 +717,15 @@ parse_callchain_opt(const struct option *opt, const char *arg, int unset)
callchain_param.key = CCKEY_FUNCTION;
else if (!strncmp(tok2, "address", strlen("address")))
callchain_param.key = CCKEY_ADDRESS;
- else
+ else if (tok2[0] != 0)
return -1;
+
+ tok2 = strtok(NULL, ",");
+ if (!tok2)
+ goto setup;
+ if (!strncmp(tok2, "branch", 6))
+ callchain_param.branch_callstack = 1;
+
setup:
if (callchain_register_param(&callchain_param) < 0) {
fprintf(stderr, "Can't register callchain params\n");
@@ -831,8 +838,8 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
"regex filter to identify parent, see: '--sort parent'"),
OPT_BOOLEAN('x', "exclude-other", &symbol_conf.exclude_other,
"Only display entries with parent-match"),
- OPT_CALLBACK_DEFAULT('g', "call-graph", &report, "output_type,min_percent[,print_limit],call_order",
- "Display callchains using output_type (graph, flat, fractal, or none) , min percent threshold, optional print limit, callchain order, key (function or address). "
+ OPT_CALLBACK_DEFAULT('g', "call-graph", &report, "output_type,min_percent[,print_limit],call_order[,branch]",
+ "Display callchains using output_type (graph, flat, fractal, or none) , min percent threshold, optional print limit, callchain order, key (function or address), add branches. "
"Default: fractal,0.5,callee,function", &parse_callchain_opt, callchain_default_opt),
OPT_INTEGER(0, "max-stack", &report.max_stack,
"Set the maximum stack depth when parsing the callchain, "
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 4f7f989..3d799f2 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -53,6 +53,7 @@ struct callchain_param {
sort_chain_func_t sort;
enum chain_order order;
enum chain_key key;
+ bool branch_callstack;
};

struct callchain_list {
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 84cdb07..a7e538b 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1254,9 +1254,58 @@ struct branch_info *machine__resolve_bstack(struct machine *machine,
return bi;
}

+static int add_callchain_ip(struct machine *machine,
+ struct thread *thread,
+ struct symbol **parent,
+ struct addr_location *root_al,
+ int cpumode,
+ u64 ip)
+{
+ struct addr_location al;
+
+ al.filtered = false;
+ al.sym = NULL;
+ if (cpumode == -1) {
+ int i;
+
+ for (i = 0; i < (int)NCPUMODES && !al.sym; i++) {
+ /*
+ * We cannot use the header.misc hint to determine whether a
+ * branch stack address is user, kernel, guest, hypervisor.
+ * Branches may straddle the kernel/user/hypervisor boundaries.
+ * Thus, we have to try consecutively until we find a match
+ * or else, the symbol is unknown
+ */
+ thread__find_addr_location(thread, machine, cpumodes[i],
+ MAP__FUNCTION,
+ ip, &al);
+ }
+ } else {
+ thread__find_addr_location(thread, machine, cpumode,
+ MAP__FUNCTION, ip, &al);
+ }
+ if (al.sym != NULL) {
+ if (sort__has_parent && !*parent &&
+ symbol__match_regex(al.sym, &parent_regex))
+ *parent = al.sym;
+ else if (have_ignore_callees && root_al &&
+ symbol__match_regex(al.sym, &ignore_callees_regex)) {
+ /* Treat this symbol as the root,
+ forgetting its callees. */
+ *root_al = al;
+ callchain_cursor_reset(&callchain_cursor);
+ }
+ if (!symbol_conf.use_callchain)
+ return -EINVAL;
+ }
+
+ return callchain_cursor_append(&callchain_cursor, ip, al.map, al.sym);
+}
+
static int machine__resolve_callchain_sample(struct machine *machine,
struct thread *thread,
struct ip_callchain *chain,
+ struct branch_stack *branch,
struct symbol **parent,
struct addr_location *root_al,
int max_stack)
@@ -1268,6 +1317,43 @@ static int machine__resolve_callchain_sample(struct machine *machine,

callchain_cursor_reset(&callchain_cursor);

+ /*
+ * Add branches to call stack for easier browsing. This gives
+ * more context for a sample than just the callers.
+ *
+ * This uses individual histograms of paths compared to the
+ * aggregated histograms the normal LBR mode uses.
+ *
+ * Limitations for now:
+ * - No extra filters
+ * - No annotations (should annotate somehow)
+ * - When the sample is near the beginning of the function
+ * we may overlap with the real callstack. Could handle this
+ * case later, by checking against the last ip.
+ */
+
+ if (callchain_param.branch_callstack) {
+ for (i = 0; i < branch->nr; i++) {
+ struct branch_entry *b;
+
+ if (callchain_param.order == ORDER_CALLEE)
+ b = &branch->entries[i];
+ else
+ b = &branch->entries[branch->nr - i - 1];
+
+ err = add_callchain_ip(machine, thread, parent, root_al,
+ -1, b->to);
+ if (!err)
+ err = add_callchain_ip(machine, thread, parent, root_al,
+ -1, b->from);
+ if (err == -EINVAL)
+ break;
+ if (err)
+ return err;
+
+ }
+ }
+
if (chain->nr > PERF_MAX_STACK_DEPTH) {
pr_warning("corrupted callchain. skipping...\n");
return 0;
@@ -1275,7 +1361,6 @@ static int machine__resolve_callchain_sample(struct machine *machine,

for (i = 0; i < chain_nr; i++) {
u64 ip;
- struct addr_location al;

if (callchain_param.order == ORDER_CALLEE)
ip = chain->ips[i];
@@ -1306,26 +1391,10 @@ static int machine__resolve_callchain_sample(struct machine *machine,
continue;
}

- al.filtered = false;
- thread__find_addr_location(thread, machine, cpumode,
- MAP__FUNCTION, ip, &al);
- if (al.sym != NULL) {
- if (sort__has_parent && !*parent &&
- symbol__match_regex(al.sym, &parent_regex))
- *parent = al.sym;
- else if (have_ignore_callees && root_al &&
- symbol__match_regex(al.sym, &ignore_callees_regex)) {
- /* Treat this symbol as the root,
- forgetting its callees. */
- *root_al = al;
- callchain_cursor_reset(&callchain_cursor);
- }
- if (!symbol_conf.use_callchain)
- break;
- }

- err = callchain_cursor_append(&callchain_cursor,
- ip, al.map, al.sym);
+ err = add_callchain_ip(machine, thread, parent, root_al, cpumode, ip);
+ if (err == -EINVAL)
+ break;
if (err)
return err;
}
@@ -1351,7 +1420,9 @@ int machine__resolve_callchain(struct machine *machine,
int ret;

ret = machine__resolve_callchain_sample(machine, thread,
- sample->callchain, parent,
+ sample->callchain,
+ sample->branch_stack,
+ parent,
root_al, max_stack);
if (ret)
return ret;
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 07de8fe..a21436e 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -99,7 +99,8 @@ struct symbol_conf {
annotate_asm_raw,
annotate_src,
event_group,
- demangle;
+ demangle,
+ branch_callstack;
const char *vmlinux_name,
*kallsyms_name,
*source_prefix,
--
1.8.3.1


2014-01-10 12:32:23

by Andi Kleen

Subject: [PATCH 2/4] perf, tools: Add --branch-call-stack option to report

From: Andi Kleen <[email protected]>

Add a --branch-call-stack option to perf report that changes all
the settings necessary for using the branches in callstacks.

This is just a shortcut to make this nicer to use.

Signed-off-by: Andi Kleen <[email protected]>
---
tools/perf/Documentation/perf-report.txt | 5 +++++
tools/perf/builtin-report.c | 25 ++++++++++++++++++++++---
2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 10a2798..77ec0b9 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -223,6 +223,11 @@ OPTIONS
branch stacks and it will automatically switch to the branch view mode,
unless --no-branch-stack is used.

+--branch-call-stack::
+ Add the addresses of sampled taken branches to the callstack.
+ This allows to examine the path the program took to each sample.
+ The data collection must have used -b or -j.
+
--objdump=<path>::
Path to objdump binary.

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index c2e6e43..c39d1ac 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -763,6 +763,16 @@ parse_branch_mode(const struct option *opt __maybe_unused,
}

static int
+parse_branch_call_mode(const struct option *opt __maybe_unused,
+ const char *str __maybe_unused, int unset)
+{
+ int *branch_mode = opt->value;
+
+ *branch_mode = !unset;
+ return 0;
+}
+
+static int
parse_percent_limit(const struct option *opt, const char *str,
int unset __maybe_unused)
{
@@ -777,7 +787,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
struct perf_session *session;
struct stat st;
bool has_br_stack = false;
- int branch_mode = -1;
+ int branch_mode = -1, branch_call_mode = -1;
int ret = -1;
char callchain_default_opt[] = "fractal,0.5,callee";
const char * const report_usage[] = {
@@ -883,7 +893,10 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
OPT_BOOLEAN(0, "group", &symbol_conf.event_group,
"Show event group information together"),
OPT_CALLBACK_NOOPT('b', "branch-stack", &branch_mode, "",
- "use branch records for histogram filling", parse_branch_mode),
+ "use branch records for per branch histogram filling", parse_branch_mode),
+ OPT_CALLBACK_NOOPT(0, "branch-call-stack", &branch_call_mode, "",
+ "add last branch records to call stack",
+ parse_branch_call_mode),
OPT_STRING(0, "objdump", &objdump_path, "path",
"objdump binary to use for disassembly and annotations"),
OPT_BOOLEAN(0, "demangle", &symbol_conf.demangle,
@@ -931,8 +944,14 @@ repeat:
has_br_stack = perf_header__has_feat(&session->header,
HEADER_BRANCH_STACK);

- if (branch_mode == -1 && has_br_stack)
+ if (branch_mode == -1 && has_br_stack && branch_call_mode == -1)
sort__mode = SORT_MODE__BRANCH;
+ if (branch_call_mode != -1) {
+ callchain_param.branch_callstack = 1;
+ callchain_param.key = CCKEY_ADDRESS;
+ symbol_conf.use_callchain = true;
+ callchain_register_param(&callchain_param);
+ }

/* sort__mode could be NORMAL if --no-branch-stack */
if (sort__mode == SORT_MODE__BRANCH) {
--
1.8.3.1

2014-01-10 12:32:26

by Andi Kleen

Subject: [PATCH 3/4] perf, tools: Filter out small loops from LBR-as-call-stack

From: Andi Kleen <[email protected]>

Small loops can cause unnecessary duplication in the LBR-as-callstack,
because the loop body appears multiple times. Filter out duplications
from the LBR before unifying it into the histories. This way the
same loop body only appears once.

This uses a simple hash-based cycle detector. It takes some
shortcuts (not handling hash collisions), so in rare cases duplicates may
be missed.
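
For readers who want the idea without the diff, here is a simplified,
self-contained sketch of such a detector. It is not the code in this
patch: struct branch_pair, slot_for and remove_loops_sketch are
hypothetical names, the hash is a plain multiplicative hash rather than
hash_64(), and unlike the patch it only removes complete repeats and
re-checks the current position after each removal.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 128	/* 2^7 slots, indexed by a 7-bit hash */
#define EMPTY_SLOT (-1)

/* Hypothetical stand-in for perf's struct branch_entry. */
struct branch_pair {
	uint64_t from;
	uint64_t to;
};

/* Multiplicative hash: keep the top 7 bits of the product (0..127). */
static unsigned slot_for(uint64_t addr)
{
	return (unsigned)((addr * 0x9E3779B97F4A7C15ULL) >> 57);
}

/*
 * Remove immediately repeated runs of branches (loop iterations) from
 * l[0..nr) in place and return the new length. Hash collisions are not
 * resolved, so a colliding address can hide a repeat.
 */
static int remove_loops_sketch(struct branch_pair *l, int nr)
{
	int first_seen[TABLE_SIZE];
	int i = 0, k;

	for (k = 0; k < TABLE_SIZE; k++)
		first_seen[k] = EMPTY_SLOT;

	while (i < nr) {
		unsigned h = slot_for(l[i].from);
		int j = first_seen[h];
		int len, same = 0;

		if (j == EMPTY_SLOT) {
			/* First time this source address shows up. */
			first_seen[h] = i;
			i++;
			continue;
		}

		/* Candidate loop body is l[j..i); does l[i..i+len) repeat it? */
		len = i - j;
		if (l[j].from == l[i].from && i + len <= nr) {
			same = 1;
			for (k = 0; k < len; k++) {
				if (l[j + k].from != l[i + k].from) {
					same = 0;
					break;
				}
			}
		}

		if (same) {
			/* Drop the repeated body and look at position i again. */
			memmove(l + i, l + i + len,
				(size_t)(nr - i - len) * sizeof(*l));
			nr -= len;
		} else {
			i++;
		}
	}
	return nr;
}
```

With source addresses A B A B A B C the sketch leaves A B C, so the loop
body shows up once in the resulting history.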

Signed-off-by: Andi Kleen <[email protected]>
---
tools/perf/util/machine.c | 73 ++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 62 insertions(+), 11 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index a7e538b..0fb4e9a 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -10,6 +10,7 @@
#include "thread.h"
#include <stdbool.h>
#include "unwind.h"
+#include "linux/hash.h"

int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
{
@@ -1302,6 +1303,46 @@ static int add_callchain_ip(struct machine *machine,
return callchain_cursor_append(&callchain_cursor, ip, al.map, al.sym);
}

+#define CHASHSZ 127
+#define CHASHBITS 7
+#define NO_ENTRY 0xff
+
+#define PERF_MAX_BRANCH_DEPTH 127
+
+/* Remove loops. */
+static int remove_loops(struct branch_entry *l, int nr)
+{
+ int i, j, off;
+ unsigned char chash[CHASHSZ];
+ memset(chash, -1, sizeof(chash));
+
+ BUG_ON(nr >= 256);
+ for (i = 0; i < nr; i++) {
+ int h = hash_64(l[i].from, CHASHBITS) % CHASHSZ;
+
+ /* no collision handling for now */
+ if (chash[h] == NO_ENTRY) {
+ chash[h] = i;
+ } else if (l[chash[h]].from == l[i].from) {
+ bool is_loop = true;
+ /* check if it is a real loop */
+ off = 0;
+ for (j = chash[h]; j < i && i + off < nr; j++, off++)
+ if (l[j].from != l[i + off].from) {
+ is_loop = false;
+ break;
+ }
+ if (is_loop) {
+ memmove(l + i, l + i + off,
+ (nr - (i + off))
+ * sizeof(struct branch_entry));
+ nr -= off;
+ }
+ }
+ }
+ return nr;
+}
+
static int machine__resolve_callchain_sample(struct machine *machine,
struct thread *thread,
struct ip_callchain *chain,
@@ -1328,29 +1369,39 @@ static int machine__resolve_callchain_sample(struct machine *machine,
* - No extra filters
* - No annotations (should annotate somehow)
* - When the sample is near the beginning of the function
- * we may overlap with the real callstack. Could handle this
- * case later, by checking against the last ip.
+ * we may overlap with the real callstack.
*/

+ if (branch->nr > PERF_MAX_BRANCH_DEPTH) {
+ pr_warning("corrupted branch chain. skipping...\n");
+ return 0;
+ }
+
if (callchain_param.branch_callstack) {
- for (i = 0; i < branch->nr; i++) {
- struct branch_entry *b;
+ int nr = branch->nr;
+ struct branch_entry be[nr];

+ for (i = 0; i < nr; i++) {
if (callchain_param.order == ORDER_CALLEE)
- b = &branch->entries[i];
+ be[i] = branch->entries[i];
else
- b = &branch->entries[branch->nr - i - 1];
+ be[i] = branch->entries[branch->nr - i - 1];
+ }

- err = add_callchain_ip(machine, thread, parent, root_al,
- -1, b->to);
+ nr = remove_loops(be, nr);
+
+ for (i = 0; i < nr; i++) {
+ err = add_callchain_ip(machine, thread, parent,
+ root_al,
+ -1, be[i].to);
if (!err)
- err = add_callchain_ip(machine, thread, parent, root_al,
- -1, b->from);
+ err = add_callchain_ip(machine, thread,
+ parent, root_al,
+ -1, be[i].from);
if (err == -EINVAL)
break;
if (err)
return err;
-
}
}

--
1.8.3.1

2014-01-10 12:32:40

by Andi Kleen

Subject: [PATCH 4/4] perf, tools: Enable printing the srcline in the history

From: Andi Kleen <[email protected]>

For lbr-as-callgraph we need to see the line number in the history,
because many LBR entries can be in a single function, and just
showing the same function name many times is not useful.

When the history code is configured to sort by address, also try to
resolve the address to a file:srcline and display this in the browser.
If that doesn't work, still display the address.

This can also be useful without LBRs for understanding which call in a large
function (or in which inlined function) called something else.

Contains fixes from Namhyung Kim.
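
The address fallback can be sketched as below. This is only an
illustration of the "name[address]" formatting with a measured buffer
size; format_addr_fallback is a hypothetical helper name, and the caller
is assumed to free the returned string.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Build "short_name[hexaddr]" in a freshly allocated buffer.
 * Returns NULL on formatting or allocation failure.
 */
static char *format_addr_fallback(const char *short_name, unsigned long addr)
{
	/* First pass: ask snprintf how many characters are needed. */
	int len = snprintf(NULL, 0, "%s[%lx]", short_name, addr);
	char *buf;

	if (len < 0)
		return NULL;

	buf = malloc((size_t)len + 1);	/* +1 for the terminating NUL */
	if (buf)
		snprintf(buf, (size_t)len + 1, "%s[%lx]", short_name, addr);
	return buf;
}
```

Measuring first with a NULL buffer avoids guessing a fixed size for DSO
names of arbitrary length.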

Signed-off-by: Andi Kleen <[email protected]>
---
tools/perf/ui/browsers/hists.c | 15 ++++++++++++---
tools/perf/ui/stdio/hist.c | 16 +++++++++++++---
tools/perf/util/callchain.h | 1 +
tools/perf/util/machine.c | 2 +-
tools/perf/util/srcline.c | 8 +++++++-
5 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index a440e03..5e0688b 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -399,9 +399,18 @@ static char *callchain_list__sym_name(struct callchain_list *cl,
{
int printed;

- if (cl->ms.sym)
- printed = scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
- else
+ if (cl->ms.sym) {
+ if (callchain_param.key == CCKEY_ADDRESS &&
+ cl->ms.map && !cl->srcline)
+ cl->srcline = get_srcline(cl->ms.map->dso,
+ map__rip_2objdump(cl->ms.map,
+ cl->ip));
+ if (cl->srcline)
+ printed = scnprintf(bf, bfsize, "%s %s",
+ cl->ms.sym->name, cl->srcline);
+ else
+ printed = scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
+ } else
printed = scnprintf(bf, bfsize, "%#" PRIx64, cl->ip);

if (show_dso)
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index c244cb5..eea2af2 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -56,9 +56,19 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_list *chain,
} else
ret += fprintf(fp, "%s", " ");
}
- if (chain->ms.sym)
- ret += fprintf(fp, "%s\n", chain->ms.sym->name);
- else
+ if (chain->ms.sym) {
+ if (callchain_param.key == CCKEY_ADDRESS &&
+ chain->ms.map)
+ chain->srcline = get_srcline(chain->ms.map->dso,
+ map__rip_2objdump(
+ chain->ms.map,
+ chain->ip));
+ if (chain->srcline)
+ ret += fprintf(fp, "%s %s\n",
+ chain->ms.sym->name, chain->srcline);
+ else
+ ret += fprintf(fp, "%s\n", chain->ms.sym->name);
+ } else
ret += fprintf(fp, "0x%0" PRIx64 "\n", chain->ip);

return ret;
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 3d799f2..70bb29b 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -59,6 +59,7 @@ struct callchain_param {
struct callchain_list {
u64 ip;
struct map_symbol ms;
+ char *srcline;
struct list_head list;
};

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 0fb4e9a..14437af 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1300,7 +1300,7 @@ static int add_callchain_ip(struct machine *machine,
return -EINVAL;
}

- return callchain_cursor_append(&callchain_cursor, ip, al.map, al.sym);
+ return callchain_cursor_append(&callchain_cursor, al.addr, al.map, al.sym);
}

#define CHASHSZ 127
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index d11aefb..65c402d 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -230,7 +230,7 @@ char *get_srcline(struct dso *dso, unsigned long addr)
size_t size;

if (!dso->has_srcline)
- return SRCLINE_UNKNOWN;
+ goto out;

if (dso_name[0] == '[')
goto out;
@@ -255,6 +255,12 @@ char *get_srcline(struct dso *dso, unsigned long addr)

out:
dso->has_srcline = 0;
+ size = snprintf(NULL, 0, "%s[%lx]", dso->short_name, addr) + 1;
+ srcline = malloc(size);
+ if (srcline) {
+ snprintf(srcline, size, "%s[%lx]", dso->short_name, addr);
+ return srcline;
+ }
return SRCLINE_UNKNOWN;
}

--
1.8.3.1

2014-01-11 15:37:00

by Jiri Olsa

Subject: Re: [PATCH 1/4] perf, tools: Add support for prepending LBRs to the callstack

On Fri, Jan 10, 2014 at 04:32:03AM -0800, Andi Kleen wrote:
> From: Andi Kleen <[email protected]>
>
> I never found the default LBR display mode which generates histograms
> of individual branches particularly useful.
>
> This implements an alternative mode that creates histograms over complete
> branch traces, instead of individual branches, similar to how normal
> callgraphs are handled. This is done by putting it in
> front of the normal callgraph and then using the normal callgraph
> histogram infrastructure to unify them.
>
> This way in complex functions we can understand the control flow
> that lead to a particular sample.
>
> The default output is unchanged.
>
> This is only implemented in perf report, no change to record
> or anywhere else.
>
> This adds the basic code to report:
> - add a new "branch" option to the -g option parser to enable this mode
> - when the flag is set include the LBR into the callstack in machine.c.
> The rest of the history code is unchanged and doesn't know the difference
> between LBR entry and normal call entry.

sounds like nice idea, but I could not get the patchset applied
on acme's perf/core

jirka

2014-01-11 17:58:21

by Andi Kleen

Subject: Re: [PATCH 1/4] perf, tools: Add support for prepending LBRs to the callstack

On Sat, Jan 11, 2014 at 04:36:14PM +0100, Jiri Olsa wrote:
> On Fri, Jan 10, 2014 at 04:32:03AM -0800, Andi Kleen wrote:
> > From: Andi Kleen <[email protected]>
> >
> > I never found the default LBR display mode which generates histograms
> > of individual branches particularly useful.
> >
> > This implements an alternative mode that creates histograms over complete
> > branch traces, instead of individual branches, similar to how normal
> > callgraphs are handled. This is done by putting it in
> > front of the normal callgraph and then using the normal callgraph
> > histogram infrastructure to unify them.
> >
> > This way in complex functions we can understand the control flow
> > that lead to a particular sample.
> >
> > The default output is unchanged.
> >
> > This is only implemented in perf report, no change to record
> > or anywhere else.
> >
> > This adds the basic code to report:
> > - add a new "branch" option to the -g option parser to enable this mode
> > - when the flag is set include the LBR into the callstack in machine.c.
> > The rest of the history code is unchanged and doesn't know the difference
> > between LBR entry and normal call entry.
>
> sounds like nice idea, but I could not get the patchset applied
> on acme's perf/core

It was on Linus master.

I tried to rebase on perf/core, but it seems to be totally broken by
itself. All the config tests fail on my opensuse system.

Arnaldo?

Auto-detecting system features:
... backtrace: [ OFF ]
... dwarf: [ OFF ]
... fortify-source: [ OFF ]
... glibc: [ OFF ]
... gtk2: [ OFF ]
... gtk2-infobar: [ OFF ]
... libaudit: [ OFF ]
... libbfd: [ OFF ]
... libelf: [ OFF ]
... libelf-getphdrnum: [ OFF ]
... libelf-mmap: [ OFF ]
... libnuma: [ OFF ]
... libperl: [ OFF ]
... libpython: [ OFF ]
... libpython-version: [ OFF ]
... libslang: [ OFF ]
... libunwind: [ OFF ]
... on-exit: [ OFF ]
... stackprotector-all: [ OFF ]
... timerfd: [ OFF ]

config/Makefile:282: *** No gnu/libc-version.h found, please install
glibc-dev[el]/glibc-static. Stop.
make: *** [all] Error 2

-Andi

2014-01-11 19:19:13

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 1/4] perf, tools: Add support for prepending LBRs to the callstack

Em Sat, Jan 11, 2014 at 04:16:57PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Sat, Jan 11, 2014 at 06:58:16PM +0100, Andi Kleen escreveu:
> > On Sat, Jan 11, 2014 at 04:36:14PM +0100, Jiri Olsa wrote:
> > > On Fri, Jan 10, 2014 at 04:32:03AM -0800, Andi Kleen wrote:
> > > > From: Andi Kleen <[email protected]>
> > > >
> > > > I never found the default LBR display mode which generates histograms
> > > > of individual branches particularly useful.
> > > >
> > > > This implements an alternative mode that creates histograms over complete
> > > > branch traces, instead of individual branches, similar to how normal
> > > > callgraphs are handled. This is done by putting it in
> > > > front of the normal callgraph and then using the normal callgraph
> > > > histogram infrastructure to unify them.
> > > >
> > > > This way in complex functions we can understand the control flow
> > > > that lead to a particular sample.
> > > >
> > > > The default output is unchanged.
> > > >
> > > > This is only implemented in perf report, no change to record
> > > > or anywhere else.
> > > >
> > > > This adds the basic code to report:
> > > > - add a new "branch" option to the -g option parser to enable this mode
> > > > - when the flag is set include the LBR into the callstack in machine.c.
> > > > The rest of the history code is unchanged and doesn't know the difference
> > > > between LBR entry and normal call entry.
> > >
> > > sounds like nice idea, but I could not get the patchset applied
> > > on acme's perf/core
> >
> > It was on Linus master.
> >
> > I tried to rebase on perf/core, but it seems to be totally broken by
> > itself. All the config tests fail on my opensuse system.
> >
> > Arnaldo?
>
> Oops, checking on some systems...

What was your build command line?

Here, on a f18 system it works with these:

$ make -C tools/perf O=/tmp/build/perf install

$ cd tools/perf ; make

Trying on another system...

- Arnaldo

> > Auto-detecting system features:
> > ... backtrace: [ OFF ]
> > ... dwarf: [ OFF ]
> > ... fortify-source: [ OFF ]
> > ... glibc: [ OFF ]
> > ... gtk2: [ OFF ]
> > ... gtk2-infobar: [ OFF ]
> > ... libaudit: [ OFF ]
> > ... libbfd: [ OFF ]
> > ... libelf: [ OFF ]
> > ... libelf-getphdrnum: [ OFF ]
> > ... libelf-mmap: [ OFF ]
> > ... libnuma: [ OFF ]
> > ... libperl: [ OFF ]
> > ... libpython: [ OFF ]
> > ... libpython-version: [ OFF ]
> > ... libslang: [ OFF ]
> > ... libunwind: [ OFF ]
> > ... on-exit: [ OFF ]
> > ... stackprotector-all: [ OFF ]
> > ... timerfd: [ OFF ]
> >
> > config/Makefile:282: *** No gnu/libc-version.h found, please install
> > glibc-dev[el]/glibc-static. Stop.
> > make: *** [all] Error 2
> >
> > -Andi

2014-01-11 19:23:53

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 1/4] perf, tools: Add support for prepending LBRs to the callstack

Em Sat, Jan 11, 2014 at 06:58:16PM +0100, Andi Kleen escreveu:
> On Sat, Jan 11, 2014 at 04:36:14PM +0100, Jiri Olsa wrote:
> > On Fri, Jan 10, 2014 at 04:32:03AM -0800, Andi Kleen wrote:
> > > From: Andi Kleen <[email protected]>
> > >
> > > I never found the default LBR display mode which generates histograms
> > > of individual branches particularly useful.
> > >
> > > This implements an alternative mode that creates histograms over complete
> > > branch traces, instead of individual branches, similar to how normal
> > > callgraphs are handled. This is done by putting it in
> > > front of the normal callgraph and then using the normal callgraph
> > > histogram infrastructure to unify them.
> > >
> > > This way in complex functions we can understand the control flow
> > > that lead to a particular sample.
> > >
> > > The default output is unchanged.
> > >
> > > This is only implemented in perf report, no change to record
> > > or anywhere else.
> > >
> > > This adds the basic code to report:
> > > - add a new "branch" option to the -g option parser to enable this mode
> > > - when the flag is set include the LBR into the callstack in machine.c.
> > > The rest of the history code is unchanged and doesn't know the difference
> > > between LBR entry and normal call entry.
> >
> > sounds like nice idea, but I could not get the patchset applied
> > on acme's perf/core
>
> It was on Linus master.
>
> I tried to rebase on perf/core, but it seems to be totally broken by
> itself. All the config tests fail on my opensuse system.
>
> Arnaldo?

Oops, checking on some systems...

> Auto-detecting system features:
> ... backtrace: [ OFF ]
> ... dwarf: [ OFF ]
> ... fortify-source: [ OFF ]
> ... glibc: [ OFF ]
> ... gtk2: [ OFF ]
> ... gtk2-infobar: [ OFF ]
> ... libaudit: [ OFF ]
> ... libbfd: [ OFF ]
> ... libelf: [ OFF ]
> ... libelf-getphdrnum: [ OFF ]
> ... libelf-mmap: [ OFF ]
> ... libnuma: [ OFF ]
> ... libperl: [ OFF ]
> ... libpython: [ OFF ]
> ... libpython-version: [ OFF ]
> ... libslang: [ OFF ]
> ... libunwind: [ OFF ]
> ... on-exit: [ OFF ]
> ... stackprotector-all: [ OFF ]
> ... timerfd: [ OFF ]
>
> config/Makefile:282: *** No gnu/libc-version.h found, please install
> glibc-dev[el]/glibc-static. Stop.
> make: *** [all] Error 2
>
> -Andi

2014-01-11 19:30:59

by Andi Kleen

Subject: Re: [PATCH 1/4] perf, tools: Add support for prepending LBRs to the callstack

> What was your build command line?
>
> Here, on a f18 system it works with these:
>
> $ make -C tools/perf O=/tmp/build/perf install
>
> $ cd tools/perf ; make
>
> Trying on another system...

Sorry for the false alarm. It looks like it was a problem on my side.
Works now.

-andi