2016-10-19 16:23:13

by Jin Yao

Subject: [PATCH v2 0/6] perf report: Show branch flags/cycles in --branch-history callgraph view

v2: Just a rebase to Arnaldo's perf/core branch, no functional changes.

Initial post

perf record -g -b ...
perf report --branch-history

Currently, the callgraph view only shows the branches from the LBR. It
would be useful to also annotate branch predictions, TSX aborts and
timed LBR cycles in the callgraph view.

This would give a quick overview of where branches are mispredicted
and how costly basic blocks are.

For example:

Overhead  Source:Line   Symbol    Shared Object  Predicted  Abort  Cycles
........  ............  ........  .............  .........  .....  ......

  38.25%  div.c:45      [.] main  div            97.6%      0.0%   3
          |
          ---main div.c:42 (cycles:2)
             compute_flag div.c:28 (cycles:2)
             compute_flag div.c:27 (cycles:1)
             rand rand.c:28 (cycles:1)
             rand rand.c:28 (cycles:1)
             __random random.c:298 (cycles:1)
             __random random.c:297 (cycles:1)
             __random random.c:295 (cycles:1)
             __random random.c:295 (cycles:1)
             __random random.c:295 (cycles:1)
             __random random.c:295 (cycles:9)
             |
             |--36.73%--__random_r random_r.c:392 (cycles:9)
             |          __random_r random_r.c:357 (cycles:1)
             |          __random random.c:293 (cycles:1)
             |          __random random.c:293 (cycles:1)
             |          __random random.c:291 (cycles:1)
             |          __random random.c:291 (cycles:1)
             |          __random random.c:291 (cycles:1)
             |          __random random.c:288 (cycles:1)
             |          rand rand.c:27 (cycles:1)
             |          rand rand.c:26 (cycles:1)
             |          rand@plt +4194304 (cycles:1)
             |          rand@plt +4194304 (cycles:1)
             |          compute_flag div.c:25 (cycles:1)
             |          compute_flag div.c:22 (cycles:1)
             |          main div.c:40 (cycles:1)
             |          main div.c:40 (cycles:16)
             |          main div.c:39 (cycles:16)
             |          |
             |          |--29.93%--main div.c:39 (predicted:50.6%, cycles:1)
             |          |          main div.c:44 (predicted:50.6%, cycles:1)
             |          |          |
             |          |           --22.69%--main div.c:42 (cycles:2)

The "predicted" annotation is hidden in a callchain entry if the branch
is 100% predicted, and the "abort" annotation is hidden if the branch
was never aborted.
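The hiding rule above can be modeled with a small formatting helper. This is a minimal sketch, not the code from this series: format_branch_info() is a hypothetical name, and it assumes the four counters (branches, predicted, aborted, cycles) have already been summed for one callchain entry. Cycles are averaged with integer division, so fractions are truncated.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the branch annotation for one callchain entry.
 * "predicted" is omitted when every branch was predicted, and "abort" is
 * omitted when no branch was aborted. Returns the snprintf() result.
 */
static int format_branch_info(char *bf, size_t size, uint64_t branch_count,
			      uint64_t predicted_count, uint64_t abort_count,
			      uint64_t cycles_count)
{
	double predicted, aborted;
	uint64_t cycles;

	/* No LBR data for this entry: it is a plain call-stack frame. */
	if (branch_count == 0)
		return snprintf(bf, size, " (calltrace)");

	assert(predicted_count <= branch_count && abort_count <= branch_count);

	predicted = predicted_count * 100.0 / branch_count;
	aborted = abort_count * 100.0 / branch_count;
	cycles = cycles_count / branch_count;	/* truncating average */

	if (predicted_count == branch_count && abort_count == 0)
		return snprintf(bf, size, " (cycles:%" PRIu64 ")", cycles);

	if (abort_count == 0)
		return snprintf(bf, size,
				" (predicted:%.1f%%, cycles:%" PRIu64 ")",
				predicted, cycles);

	return snprintf(bf, size,
			" (predicted:%.1f%%, abort:%.1f%%, cycles:%" PRIu64 ")",
			predicted, aborted, cycles);
}
```

For example, 1000 branches with 506 predicted, no aborts and 1000 total cycles would produce " (predicted:50.6%, cycles:1)". The sketch compares the raw counts rather than the floating-point percentages when deciding what to hide, which avoids rounding questions at exactly 100%.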

Now stdio and browser modes are both supported.

Jin Yao (6):
perf report: Add branch flag to callchain cursor node
perf report: Calculate and return the branch counting in callchain
perf report: Create a symbol_conf flag for showing branch flag
counting
perf report: Show branch info in callchain entry for stdio mode
perf report: Show branch info in callchain entry for browser mode
perf report: Display columns Predicted/Abort/Cycles in
--branch-history

tools/perf/Documentation/perf-report.txt | 8 ++
tools/perf/builtin-report.c | 9 +-
tools/perf/ui/browsers/hists.c | 15 ++-
tools/perf/ui/stdio/hist.c | 30 +++++-
tools/perf/util/callchain.c | 176 ++++++++++++++++++++++++++++++-
tools/perf/util/callchain.h | 16 ++-
tools/perf/util/hist.c | 3 +
tools/perf/util/hist.h | 3 +
tools/perf/util/machine.c | 56 +++++++---
tools/perf/util/sort.c | 117 +++++++++++++++++++-
tools/perf/util/sort.h | 3 +
tools/perf/util/symbol.h | 1 +
12 files changed, 411 insertions(+), 26 deletions(-)

--
2.7.4


2016-10-19 14:14:57

by Jin Yao

Subject: [PATCH v2 6/6] perf report: Display columns Predicted/Abort/Cycles in --branch-history

Reuse the existing sort mechanism, but have each new .se_cmp() simply
return 0, so that the new "Predicted", "Abort" and "Cycles" columns are
created in the display without actually acting as sort keys.

For example:

Overhead  Source:Line   Symbol    Shared Object  Predicted  Abort  Cycles
........  ............  ........  .............  .........  .....  ......

  38.25%  div.c:45      [.] main  div            97.6%      0.0%   3
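To see why a comparator that always returns 0 yields a display-only column, consider a simplified model of chained sort keys: entries are compared key by key, and the first non-zero result decides. The sketch below is hypothetical (struct entry, srcline_cmp(), predicted_cmp() and entry_cmp() are illustrative names, not perf's API); it shows that two entries with the same source line still compare equal, and so would be merged into one histogram entry, even when their predicted counts differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for perf's struct hist_entry. */
struct entry {
	const char *srcline;
	uint64_t predicted_count;
};

typedef int64_t (*cmp_fn)(const struct entry *l, const struct entry *r);

/* A real sort key: entries are grouped and ordered by source line. */
static int64_t srcline_cmp(const struct entry *l, const struct entry *r)
{
	return strcmp(l->srcline, r->srcline);
}

/* A display-only key: always "equal", so it never splits or reorders entries. */
static int64_t predicted_cmp(const struct entry *l, const struct entry *r)
{
	(void)l;
	(void)r;
	return 0;
}

/* Apply the keys in order; the first non-zero result decides. */
static int64_t entry_cmp(const struct entry *l, const struct entry *r,
			 const cmp_fn *keys, size_t nr_keys)
{
	for (size_t i = 0; i < nr_keys; i++) {
		int64_t ret = keys[i](l, r);

		if (ret)
			return ret;
	}
	return 0;
}
```

With keys { srcline_cmp, predicted_cmp }, two div.c:45 entries compare equal regardless of predicted_count; the value shown in the column is then computed at print time from the merged entry, much as the *_snprintf() callbacks in this patch do.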

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/Documentation/perf-report.txt | 8 +++
tools/perf/builtin-report.c | 6 +-
tools/perf/util/hist.c | 3 +
tools/perf/util/hist.h | 3 +
tools/perf/util/sort.c | 117 ++++++++++++++++++++++++++++++-
tools/perf/util/sort.h | 3 +
6 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 2d17462..bb927cb 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -335,6 +335,14 @@ OPTIONS
--branch-history::
Add the addresses of sampled taken branches to the callstack.
This allows to examine the path the program took to each sample.
+
+ Also show with some branch flags that can be:
+ - Predicted: display the average percentage of predicated branches.
+ (predicated number / total number)
+ - Abort: display the average percentage of abort branches.
+ (abort number /total number)
+ - Cycles: cycles in basic block.
+
The data collection must have used -b (or -j) and -g.

--objdump=<path>::
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index c406393..df83ea4 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -664,6 +664,10 @@ const char report_callchain_help[] = "Display call graph (stack chain/backtrace)
CALLCHAIN_REPORT_HELP
"\n\t\t\t\tDefault: " CALLCHAIN_DEFAULT_OPT;

+#define CALLCHAIN_BRANCH_SORT_ORDER \
+ "srcline,symbol,dso,callchain_branch_predicted," \
+ "callchain_branch_abort,callchain_branch_cycles"
+
int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
{
struct perf_session *session;
@@ -924,7 +928,7 @@ repeat:
symbol_conf.use_callchain = true;
callchain_register_param(&callchain_param);
if (sort_order == NULL)
- sort_order = "srcline,symbol,dso";
+ sort_order = CALLCHAIN_BRANCH_SORT_ORDER;
}

if (report.mem_mode) {
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index e1be413..2470fff 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -176,6 +176,9 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
hists__new_col_len(hists, HISTC_MEM_LVL, 21 + 3);
hists__new_col_len(hists, HISTC_LOCAL_WEIGHT, 12);
hists__new_col_len(hists, HISTC_GLOBAL_WEIGHT, 12);
+ hists__new_col_len(hists, HISTC_CALLCHAIN_BRANCH_PREDICTED, 9);
+ hists__new_col_len(hists, HISTC_CALLCHAIN_BRANCH_ABORT, 5);
+ hists__new_col_len(hists, HISTC_CALLCHAIN_BRANCH_CYCLES, 6);

if (h->srcline) {
len = MAX(strlen(h->srcline), strlen(sort_srcline.se_header));
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index d4b6514..74e1dd4 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -57,6 +57,9 @@ enum hist_column {
HISTC_SRCLINE_FROM,
HISTC_SRCLINE_TO,
HISTC_TRACE,
+ HISTC_CALLCHAIN_BRANCH_PREDICTED,
+ HISTC_CALLCHAIN_BRANCH_ABORT,
+ HISTC_CALLCHAIN_BRANCH_CYCLES,
HISTC_NR_COLS, /* Last entry */
};

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index df622f4..e47a984 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -435,6 +435,106 @@ struct sort_entry sort_srcline_to = {
.se_width_idx = HISTC_SRCLINE_TO,
};

+/* --sort callchain_branch_predicted */
+
+static int64_t
+sort__callchain_branch_predicted_cmp(struct hist_entry *left __maybe_unused,
+ struct hist_entry *right __maybe_unused)
+{
+ return 0;
+}
+
+static int hist_entry__callchain_branch_predicted_snprintf(
+ struct hist_entry *he, char *bf, size_t size, unsigned int width)
+{
+ u64 branch_count, predicted_count;
+ double percent = 0.0;
+ char str[32];
+
+ callchain_branch_counts(he->callchain, &branch_count,
+ &predicted_count, NULL, NULL);
+
+ if (branch_count)
+ percent = predicted_count * 100.0 / branch_count;
+
+ snprintf(str, sizeof(str), "%.1f%%", percent);
+ return repsep_snprintf(bf, size, "%-*.*s", width, width, str);
+}
+
+struct sort_entry sort_callchain_branch_predicted = {
+ .se_header = "Predicted",
+ .se_cmp = sort__callchain_branch_predicted_cmp,
+ .se_snprintf = hist_entry__callchain_branch_predicted_snprintf,
+ .se_width_idx = HISTC_CALLCHAIN_BRANCH_PREDICTED,
+};
+
+/* --sort callchain_branch_abort */
+
+static int64_t
+sort__callchain_branch_abort_cmp(struct hist_entry *left __maybe_unused,
+ struct hist_entry *right __maybe_unused)
+{
+ return 0;
+}
+
+static int hist_entry__callchain_branch_abort_snprintf(struct hist_entry *he,
+ char *bf, size_t size,
+ unsigned int width)
+{
+ u64 branch_count, abort_count;
+ double percent = 0.0;
+ char str[32];
+
+ callchain_branch_counts(he->callchain, &branch_count,
+ NULL, &abort_count, NULL);
+
+ if (branch_count)
+ percent = abort_count * 100.0 / branch_count;
+
+ snprintf(str, sizeof(str), "%.1f%%", percent);
+ return repsep_snprintf(bf, size, "%-*.*s", width, width, str);
+}
+
+struct sort_entry sort_callchain_branch_abort = {
+ .se_header = "Abort",
+ .se_cmp = sort__callchain_branch_abort_cmp,
+ .se_snprintf = hist_entry__callchain_branch_abort_snprintf,
+ .se_width_idx = HISTC_CALLCHAIN_BRANCH_ABORT,
+};
+
+/* --sort callchain_branch_cycles */
+
+static int64_t
+sort__callchain_branch_cycles_cmp(struct hist_entry *left __maybe_unused,
+ struct hist_entry *right __maybe_unused)
+{
+ return 0;
+}
+
+static int hist_entry__callchain_branch_cycles_snprintf(struct hist_entry *he,
+ char *bf, size_t size,
+ unsigned int width)
+{
+ u64 branch_count, cycles_count, cycles = 0;
+ char str[32];
+
+ callchain_branch_counts(he->callchain, &branch_count,
+ NULL, NULL, &cycles_count);
+
+ if (branch_count)
+ cycles = cycles_count / branch_count;
+
+ snprintf(str, sizeof(str), "%" PRId64 "", cycles);
+ return repsep_snprintf(bf, size, "%-*.*s", width, width, str);
+}
+
+struct sort_entry sort_callchain_branch_cycles = {
+ .se_header = "Cycles",
+ .se_cmp = sort__callchain_branch_cycles_cmp,
+ .se_snprintf = hist_entry__callchain_branch_cycles_snprintf,
+ .se_width_idx = HISTC_CALLCHAIN_BRANCH_CYCLES,
+};
+
/* --sort srcfile */

static char no_srcfile[1];
@@ -1435,6 +1535,15 @@ static struct sort_dimension bstack_sort_dimensions[] = {
DIM(SORT_CYCLES, "cycles", sort_cycles),
DIM(SORT_SRCLINE_FROM, "srcline_from", sort_srcline_from),
DIM(SORT_SRCLINE_TO, "srcline_to", sort_srcline_to),
+ DIM(SORT_CALLCHAIN_BRANCH_PREDICTED,
+ "callchain_branch_predicted",
+ sort_callchain_branch_predicted),
+ DIM(SORT_CALLCHAIN_BRANCH_ABORT,
+ "callchain_branch_abort",
+ sort_callchain_branch_abort),
+ DIM(SORT_CALLCHAIN_BRANCH_CYCLES,
+ "callchain_branch_cycles",
+ sort_callchain_branch_cycles),
};

#undef DIM
@@ -2369,7 +2478,13 @@ int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
if (strncasecmp(tok, sd->name, strlen(tok)))
continue;

- if (sort__mode != SORT_MODE__BRANCH)
+ if ((sort__mode != SORT_MODE__BRANCH) &&
+ strncasecmp(tok, "callchain_branch_predicted",
+ strlen(tok)) &&
+ strncasecmp(tok, "callchain_branch_abort",
+ strlen(tok)) &&
+ strncasecmp(tok, "callchain_branch_cycles",
+ strlen(tok)))
return -EINVAL;

if (sd->entry == &sort_sym_from || sd->entry == &sort_sym_to)
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 7aff317..30c6e97 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -224,6 +224,9 @@ enum sort_type {
SORT_CYCLES,
SORT_SRCLINE_FROM,
SORT_SRCLINE_TO,
+ SORT_CALLCHAIN_BRANCH_PREDICTED,
+ SORT_CALLCHAIN_BRANCH_ABORT,
+ SORT_CALLCHAIN_BRANCH_CYCLES,

/* memory mode specific sort keys */
__SORT_MEMORY_MODE,
--
2.7.4

2016-10-19 16:24:00

by Jin Yao

Subject: [PATCH v2 1/6] perf report: Add branch flag to callchain cursor node

Since branch IPs have already been added to the call stack for easier
browsing, this patch adds more branch information: a flag indicating
whether the IP is a branch, and the branch flags themselves.

This lets us tell whether a cursor node represents a branch and, if so,
which branch flags it carries.
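A minimal sketch of the idea, under simplifying assumptions: struct branch_flags here keeps only three of the LBR flag fields, and cursor_node_set() is a hypothetical helper, not perf's callchain_cursor_append(). Ordinary call-stack frames pass branch = false and a NULL flags pointer; LBR-derived frames pass the flags of their LBR entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of the per-branch LBR flags. */
struct branch_flags {
	unsigned predicted : 1;
	unsigned abort : 1;
	unsigned cycles : 16;
};

/* Hypothetical, simplified cursor node. */
struct cursor_node {
	uint64_t ip;
	bool branch;               /* true if ip came from an LBR entry */
	struct branch_flags flags; /* meaningful only when branch is true */
};

/* Fill one node; flags may be NULL for ordinary call-stack frames. */
static void cursor_node_set(struct cursor_node *node, uint64_t ip,
			    bool branch, const struct branch_flags *flags)
{
	node->ip = ip;
	node->branch = branch;
	if (flags)
		node->flags = *flags;
	else
		memset(&node->flags, 0, sizeof(node->flags));
}
```

Unlike the patch, which only copies the flags when the pointer is non-NULL, this sketch zeroes them otherwise, so a reused node never carries leftover flags. Since the counters only look at the flags when branch is true, this is purely defensive.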

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/util/callchain.c | 11 +++++++--
tools/perf/util/callchain.h | 5 +++-
tools/perf/util/machine.c | 56 +++++++++++++++++++++++++++++++++------------
3 files changed, 55 insertions(+), 17 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 07fd30b..342ef20 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -730,7 +730,8 @@ merge_chain_branch(struct callchain_cursor *cursor,

list_for_each_entry_safe(list, next_list, &src->val, list) {
callchain_cursor_append(cursor, list->ip,
- list->ms.map, list->ms.sym);
+ list->ms.map, list->ms.sym,
+ false, NULL);
list_del(&list->list);
free(list);
}
@@ -767,7 +768,8 @@ int callchain_merge(struct callchain_cursor *cursor,
}

int callchain_cursor_append(struct callchain_cursor *cursor,
- u64 ip, struct map *map, struct symbol *sym)
+ u64 ip, struct map *map, struct symbol *sym,
+ bool branch, struct branch_flags *flags)
{
struct callchain_cursor_node *node = *cursor->last;

@@ -782,6 +784,11 @@ int callchain_cursor_append(struct callchain_cursor *cursor,
node->ip = ip;
node->map = map;
node->sym = sym;
+ node->branch = branch;
+
+ if (flags)
+ memcpy(&node->branch_flags, flags,
+ sizeof(struct branch_flags));

cursor->nr++;

diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 13e7554..40ecf25 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -129,6 +129,8 @@ struct callchain_cursor_node {
u64 ip;
struct map *map;
struct symbol *sym;
+ bool branch;
+ struct branch_flags branch_flags;
struct callchain_cursor_node *next;
};

@@ -183,7 +185,8 @@ static inline void callchain_cursor_reset(struct callchain_cursor *cursor)
}

int callchain_cursor_append(struct callchain_cursor *cursor, u64 ip,
- struct map *map, struct symbol *sym);
+ struct map *map, struct symbol *sym,
+ bool branch, struct branch_flags *flags);

/* Close a cursor writing session. Initialize for the reader */
static inline void callchain_cursor_commit(struct callchain_cursor *cursor)
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index df85b9e..c2d9d9f 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1616,7 +1616,9 @@ static int add_callchain_ip(struct thread *thread,
struct symbol **parent,
struct addr_location *root_al,
u8 *cpumode,
- u64 ip)
+ u64 ip,
+ bool branch,
+ struct branch_flags *flags)
{
struct addr_location al;

@@ -1668,7 +1670,8 @@ static int add_callchain_ip(struct thread *thread,

if (symbol_conf.hide_unresolved && al.sym == NULL)
return 0;
- return callchain_cursor_append(cursor, al.addr, al.map, al.sym);
+ return callchain_cursor_append(cursor, al.addr, al.map, al.sym,
+ branch, flags);
}

struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
@@ -1757,7 +1760,9 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
/* LBR only affects the user callchain */
if (i != chain_nr) {
struct branch_stack *lbr_stack = sample->branch_stack;
- int lbr_nr = lbr_stack->nr, j;
+ int lbr_nr = lbr_stack->nr, j, k;
+ bool branch;
+ struct branch_flags *flags;
/*
* LBR callstack can only get user call chain.
* The mix_chain_nr is kernel call chain
@@ -1772,23 +1777,41 @@ static int resolve_lbr_callchain_sample(struct thread *thread,

for (j = 0; j < mix_chain_nr; j++) {
int err;
+ branch = false;
+ flags = NULL;
+
if (callchain_param.order == ORDER_CALLEE) {
if (j < i + 1)
ip = chain->ips[j];
- else if (j > i + 1)
- ip = lbr_stack->entries[j - i - 2].from;
- else
+ else if (j > i + 1) {
+ k = j - i - 2;
+ ip = lbr_stack->entries[k].from;
+ branch = true;
+ flags = &lbr_stack->entries[k].flags;
+ } else {
ip = lbr_stack->entries[0].to;
+ branch = true;
+ flags = &lbr_stack->entries[0].flags;
+ }
} else {
- if (j < lbr_nr)
- ip = lbr_stack->entries[lbr_nr - j - 1].from;
+ if (j < lbr_nr) {
+ k = lbr_nr - j - 1;
+ ip = lbr_stack->entries[k].from;
+ branch = true;
+ flags = &lbr_stack->entries[k].flags;
+ }
else if (j > lbr_nr)
ip = chain->ips[i + 1 - (j - lbr_nr)];
- else
+ else {
ip = lbr_stack->entries[0].to;
+ branch = true;
+ flags = &lbr_stack->entries[0].flags;
+ }
}

- err = add_callchain_ip(thread, cursor, parent, root_al, &cpumode, ip);
+ err = add_callchain_ip(thread, cursor, parent,
+ root_al, &cpumode, ip,
+ branch, flags);
if (err)
return (err < 0) ? err : 0;
}
@@ -1872,10 +1895,12 @@ static int thread__resolve_callchain_sample(struct thread *thread,

for (i = 0; i < nr; i++) {
err = add_callchain_ip(thread, cursor, parent, root_al,
- NULL, be[i].to);
+ NULL, be[i].to,
+ true, &be[i].flags);
if (!err)
err = add_callchain_ip(thread, cursor, parent, root_al,
- NULL, be[i].from);
+ NULL, be[i].from,
+ true, &be[i].flags);
if (err == -EINVAL)
break;
if (err)
@@ -1903,7 +1928,9 @@ check_calls:
if (ip < PERF_CONTEXT_MAX)
++nr_entries;

- err = add_callchain_ip(thread, cursor, parent, root_al, &cpumode, ip);
+ err = add_callchain_ip(thread, cursor, parent,
+ root_al, &cpumode, ip,
+ false, NULL);

if (err)
return (err < 0) ? err : 0;
@@ -1919,7 +1946,8 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
if (symbol_conf.hide_unresolved && entry->sym == NULL)
return 0;
return callchain_cursor_append(cursor, entry->ip,
- entry->map, entry->sym);
+ entry->map, entry->sym,
+ false, NULL);
}

static int thread__resolve_callchain_unwind(struct thread *thread,
--
2.7.4

2016-10-19 16:23:58

by Jin Yao

Subject: [PATCH v2 3/6] perf report: Create a symbol_conf flag for showing branch flag counting

Create a new flag, show_branchflag_count, in symbol_conf. It controls
whether the branch flag counting information is shown, and is set only
when perf.data contains branch data and the user passes the
--branch-history option on the perf report command line.

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/builtin-report.c | 3 +++
tools/perf/util/symbol.h | 1 +
2 files changed, 4 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 6e88460..c406393 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -905,6 +905,9 @@ repeat:
if (itrace_synth_opts.last_branch)
has_br_stack = true;

+ if (has_br_stack && branch_call_mode)
+ symbol_conf.show_branchflag_count = true;
+
/*
* Branch mode is a tristate:
* -1 means default, so decide based on the file having branch data.
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index d964844..2d0a905 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -100,6 +100,7 @@ struct symbol_conf {
show_total_period,
use_callchain,
cumulate_callchain,
+ show_branchflag_count,
exclude_other,
show_cpu_utilization,
initialized,
--
2.7.4

2016-10-19 16:23:54

by Jin Yao

Subject: [PATCH v2 2/6] perf report: Calculate and return the branch counting in callchain

Add branch counters to each callchain list entry, one per branch flag.
For example, predicted_count counts all the *predicted* branches. The
counters are updated as the callchain cursor nodes are processed.

This patch also provides functions to retrieve or print the counter
values of a callchain list.
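The counting scheme can be sketched as follows. This is a simplified model, not the patch's code: struct branch_counts and the branch_counts_*() helpers are hypothetical names. Each time a sample's cursor node lands on an entry, the counters are bumped only if that node came from the LBR; the derived values are a percentage for predicted branches and a truncated per-branch average for cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-entry counters, mirroring the fields this patch adds. */
struct branch_counts {
	uint64_t branch_count;
	uint64_t predicted_count;
	uint64_t abort_count;
	uint64_t cycles_count;
};

/* Called once per cursor node that hits this entry (cf. fill_node()/match_chain()). */
static void branch_counts_add(struct branch_counts *c, bool is_branch,
			      bool predicted, bool aborted, uint64_t cycles)
{
	if (!is_branch)	/* plain call-stack frames carry no branch flags */
		return;
	c->branch_count++;
	if (predicted)
		c->predicted_count++;
	if (aborted)
		c->abort_count++;
	c->cycles_count += cycles;
}

/* Predicted ratio in percent; 0.0 when no branch was recorded. */
static double branch_counts_predicted_pct(const struct branch_counts *c)
{
	return c->branch_count ?
	       c->predicted_count * 100.0 / c->branch_count : 0.0;
}

/* Average cycles per branch, truncated by integer division. */
static uint64_t branch_counts_avg_cycles(const struct branch_counts *c)
{
	return c->branch_count ? c->cycles_count / c->branch_count : 0;
}
```

For instance, one predicted branch with 2 cycles and one mispredicted branch with 3 cycles give 50.0% predicted and an average of 2 cycles; a non-branch node hitting the same entry leaves all counters unchanged.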

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/util/callchain.c | 165 +++++++++++++++++++++++++++++++++++++++++++-
tools/perf/util/callchain.h | 11 +++
2 files changed, 175 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 342ef20..8937a2c 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -440,6 +440,19 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor)
call->ip = cursor_node->ip;
call->ms.sym = cursor_node->sym;
call->ms.map = cursor_node->map;
+
+ if (cursor_node->branch) {
+ call->branch_count = 1;
+
+ if (cursor_node->branch_flags.predicted)
+ call->predicted_count = 1;
+
+ if (cursor_node->branch_flags.abort)
+ call->abort_count = 1;
+
+ call->cycles_count = cursor_node->branch_flags.cycles;
+ }
+
list_add_tail(&call->list, &node->val);

callchain_cursor_advance(cursor);
@@ -499,8 +512,21 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
right = node->ip;
}

- if (left == right)
+ if (left == right) {
+ if (node->branch) {
+ cnode->branch_count++;
+
+ if (node->branch_flags.predicted)
+ cnode->predicted_count++;
+
+ if (node->branch_flags.abort)
+ cnode->abort_count++;
+
+ cnode->cycles_count += node->branch_flags.cycles;
+ }
+
return MATCH_EQ;
+ }

return left > right ? MATCH_GT : MATCH_LT;
}
@@ -946,6 +972,143 @@ int callchain_node__fprintf_value(struct callchain_node *node,
return 0;
}

+static void callchain_counts_value(struct callchain_node *node,
+ u64 *branch_count, u64 *predicted_count,
+ u64 *abort_count, u64 *cycles_count)
+{
+ struct callchain_list *clist;
+
+ list_for_each_entry(clist, &node->val, list) {
+ if (branch_count)
+ *branch_count += clist->branch_count;
+
+ if (predicted_count)
+ *predicted_count += clist->predicted_count;
+
+ if (abort_count)
+ *abort_count += clist->abort_count;
+
+ if (cycles_count)
+ *cycles_count += clist->cycles_count;
+ }
+}
+
+static int callchain_node_branch_counts_cumul(struct callchain_node *node,
+ u64 *branch_count,
+ u64 *predicted_count,
+ u64 *abort_count,
+ u64 *cycles_count)
+{
+ struct callchain_node *child;
+ struct rb_node *n;
+
+ n = rb_first(&node->rb_root_in);
+ while (n) {
+ child = rb_entry(n, struct callchain_node, rb_node_in);
+ n = rb_next(n);
+
+ callchain_node_branch_counts_cumul(child, branch_count,
+ predicted_count,
+ abort_count,
+ cycles_count);
+
+ callchain_counts_value(child, branch_count,
+ predicted_count, abort_count,
+ cycles_count);
+ }
+
+ return 0;
+}
+
+int callchain_branch_counts(struct callchain_root *root,
+ u64 *branch_count, u64 *predicted_count,
+ u64 *abort_count, u64 *cycles_count)
+{
+ if (branch_count)
+ *branch_count = 0;
+
+ if (predicted_count)
+ *predicted_count = 0;
+
+ if (abort_count)
+ *abort_count = 0;
+
+ if (cycles_count)
+ *cycles_count = 0;
+
+ return callchain_node_branch_counts_cumul(&root->node,
+ branch_count,
+ predicted_count,
+ abort_count,
+ cycles_count);
+}
+
+static int callchain_counts_printf(FILE *fp, char *bf, int bfsize,
+ u64 branch_count, u64 predicted_count,
+ u64 abort_count, u64 cycles_count,
+ const char *cumul_str)
+{
+ double predicted_percent = 0.0;
+ double abort_percent = 0.0;
+ u64 cycles = 0;
+
+ if (branch_count == 0) {
+ if (fp)
+ return fprintf(fp, " (calltrace)");
+
+ return scnprintf(bf, bfsize, " (calltrace)");
+ }
+
+ predicted_percent = predicted_count * 100.0 / branch_count;
+ abort_percent = abort_count * 100.0 / branch_count;
+ cycles = cycles_count / branch_count;
+
+ if ((predicted_percent >= 100.0) && (abort_percent <= 0.0)) {
+ if (fp)
+ return fprintf(fp, " (%scycles:%" PRId64 ")",
+ cumul_str, cycles);
+
+ return scnprintf(bf, bfsize, " (%scycles:%" PRId64 ")",
+ cumul_str, cycles);
+ }
+
+ if ((predicted_percent < 100.0) && (abort_percent <= 0.0)) {
+ if (fp)
+ return fprintf(fp,
+ " (%spredicted:%.1f%%, cycles:%" PRId64 ")",
+ cumul_str, predicted_percent, cycles);
+
+ return scnprintf(bf, bfsize,
+ " (%spredicted:%.1f%%, cycles:%" PRId64 ")",
+ cumul_str, predicted_percent, cycles);
+ }
+
+ if (fp)
+ return fprintf(fp,
+ " (%spredicted:%.1f%%, abort:%.1f%%, cycles:%" PRId64 ")",
+ cumul_str, predicted_percent, abort_percent, cycles);
+
+ return scnprintf(bf, bfsize,
+ " (%spredicted:%.1f%%, abort:%.1f%%, cycles:%" PRId64 ")",
+ cumul_str, predicted_percent, abort_percent, cycles);
+}
+
+int callchain_list_counts__printf_value(struct callchain_list *clist,
+ FILE *fp, char *bf, int bfsize)
+{
+ u64 branch_count, predicted_count;
+ u64 abort_count, cycles_count;
+
+ branch_count = clist->branch_count;
+ predicted_count = clist->predicted_count;
+ abort_count = clist->abort_count;
+ cycles_count = clist->cycles_count;
+
+ return callchain_counts_printf(fp, bf, bfsize, branch_count,
+ predicted_count, abort_count,
+ cycles_count, "");
+}
+
static void free_callchain_node(struct callchain_node *node)
{
struct callchain_list *list, *tmp;
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 40ecf25..4f6bf6c 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -115,6 +115,10 @@ struct callchain_list {
bool unfolded;
bool has_children;
};
+ u64 branch_count;
+ u64 predicted_count;
+ u64 abort_count;
+ u64 cycles_count;
char *srcline;
struct list_head list;
};
@@ -264,8 +268,15 @@ char *callchain_node__scnprintf_value(struct callchain_node *node,
int callchain_node__fprintf_value(struct callchain_node *node,
FILE *fp, u64 total);

+int callchain_list_counts__printf_value(struct callchain_list *clist,
+ FILE *fp, char *bf, int bfsize);
+
void free_callchain(struct callchain_root *root);
void decay_callchain(struct callchain_root *root);
int callchain_node__make_parent_list(struct callchain_node *node);

+int callchain_branch_counts(struct callchain_root *root,
+ u64 *branch_count, u64 *predicted_count,
+ u64 *abort_count, u64 *cycles_count);
+
#endif /* __PERF_CALLCHAIN_H */
--
2.7.4

2016-10-19 16:25:17

by Jin Yao

Subject: [PATCH v2 4/6] perf report: Show branch info in callchain entry for stdio mode

If the branch is 100% predicted, the "predicted" field is hidden.
Similarly, if the branch has no TSX aborts, the "abort" field is hidden.
In that case only the cycles are shown (cycle counts are supported on
Skylake; older platforms report 0).

One example is:

|--36.73%--__random_r random_r.c:392 (cycles:9)
|          __random_r random_r.c:357 (cycles:1)
|          __random random.c:293 (cycles:1)
|          __random random.c:293 (cycles:1)
|          __random random.c:291 (cycles:1)
|          __random random.c:291 (cycles:1)
|          __random random.c:291 (cycles:1)
|          __random random.c:288 (cycles:1)
|          rand rand.c:27 (cycles:1)
|          rand rand.c:26 (cycles:1)
|          rand@plt +4194304 (cycles:1)
|          rand@plt +4194304 (cycles:1)
|          compute_flag div.c:25 (cycles:1)
|          compute_flag div.c:22 (cycles:1)
|          main div.c:40 (cycles:1)
|          main div.c:40 (cycles:16)
|          main div.c:39 (cycles:16)
|          |
|          |--29.93%--main div.c:39 (predicted:50.6%, cycles:1)
|          |          main div.c:44 (predicted:50.6%, cycles:1)

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/ui/stdio/hist.c | 30 ++++++++++++++++++++++++++----
1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 89d8441..57e1f6f 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -41,7 +41,9 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_node *node,
{
int i;
size_t ret = 0;
- char bf[1024];
+ char bf[1024], *alloc_str = NULL;
+ char buf[64];
+ const char *str;

ret += callchain__fprintf_left_margin(fp, left_margin);
for (i = 0; i < depth; i++) {
@@ -56,8 +58,21 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_node *node,
} else
ret += fprintf(fp, "%s", " ");
}
- fputs(callchain_list__sym_name(chain, bf, sizeof(bf), false), fp);
+
+ str = callchain_list__sym_name(chain, bf, sizeof(bf), false);
+
+ if (symbol_conf.show_branchflag_count) {
+ callchain_list_counts__printf_value(chain, NULL,
+ buf, sizeof(buf));
+ if (asprintf(&alloc_str, "%s%s", str, buf) < 0)
+ str = "Not enough memory!";
+ else
+ str = alloc_str;
+ }
+
+ fputs(str, fp);
fputc('\n', fp);
+ free(alloc_str);
return ret;
}

@@ -219,8 +234,15 @@ static size_t callchain__fprintf_graph(FILE *fp, struct rb_root *root,
} else
ret += callchain__fprintf_left_margin(fp, left_margin);

- ret += fprintf(fp, "%s\n", callchain_list__sym_name(chain, bf, sizeof(bf),
- false));
+ ret += fprintf(fp, "%s",
+ callchain_list__sym_name(chain, bf,
+ sizeof(bf),
+ false));
+
+ if (symbol_conf.show_branchflag_count)
+ ret += callchain_list_counts__printf_value(
+ chain, fp, NULL, 0);
+ ret += fprintf(fp, "\n");

if (++entries_printed == callchain_param.print_limit)
break;
--
2.7.4

2016-10-19 16:25:15

by Jin Yao

Subject: [PATCH v2 5/6] perf report: Show branch info in callchain entry for browser mode

If the branch is 100% predicted, the "predicted" field is hidden.
Similarly, if the branch has no TSX aborts, the "abort" field is hidden.
In that case only the cycles are shown (cycle counts are supported on
Skylake; older platforms report 0).

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/ui/browsers/hists.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index ddc4c3e..24d27c2 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -738,6 +738,7 @@ static int hist_browser__show_callchain_list(struct hist_browser *browser,
struct callchain_print_arg *arg)
{
char bf[1024], *alloc_str;
+ char buf[64], *alloc_str2;
const char *str;

if (arg->row_offset != 0) {
@@ -746,12 +747,21 @@ static int hist_browser__show_callchain_list(struct hist_browser *browser,
}

alloc_str = NULL;
+ alloc_str2 = NULL;
+
str = callchain_list__sym_name(chain, bf, sizeof(bf),
browser->show_dso);

- if (need_percent) {
- char buf[64];
+ if (symbol_conf.show_branchflag_count) {
+ callchain_list_counts__printf_value(chain, NULL, buf,
+ sizeof(buf));
+ if (asprintf(&alloc_str2, "%s%s", str, buf) < 0)
+ str = "Not enough memory!";
+ else
+ str = alloc_str2;
+ }

+ if (need_percent) {
callchain_node__scnprintf_value(node, buf, sizeof(buf),
total);

@@ -764,6 +774,7 @@ static int hist_browser__show_callchain_list(struct hist_browser *browser,
print(browser, chain, str, offset, row, arg);

free(alloc_str);
+ free(alloc_str2);
return 1;
}

--
2.7.4

2016-10-20 16:41:55

by Nilay Vaish

Subject: Re: [PATCH v2 2/6] perf report: Calculate and return the branch counting in callchain

On 19 October 2016 at 17:01, Jin Yao <[email protected]> wrote:
> diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
> index 40ecf25..4f6bf6c 100644
> --- a/tools/perf/util/callchain.h
> +++ b/tools/perf/util/callchain.h
> @@ -115,6 +115,10 @@ struct callchain_list {
> bool unfolded;
> bool has_children;
> };
> + u64 branch_count;
> + u64 predicted_count;
> + u64 abort_count;

Can you explain what abort count is? It seems you are referring to
miss-speculated branches. If that is the case, I would prefer that we
replace abort by miss_speculated or miss_predicted.

--
Nilay

2016-10-20 16:49:22

by Andi Kleen

Subject: Re: [PATCH v2 2/6] perf report: Calculate and return the branch counting in callchain

On Thu, Oct 20, 2016 at 11:41:11AM -0500, Nilay Vaish wrote:
> On 19 October 2016 at 17:01, Jin Yao <[email protected]> wrote:
> > diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
> > index 40ecf25..4f6bf6c 100644
> > --- a/tools/perf/util/callchain.h
> > +++ b/tools/perf/util/callchain.h
> > @@ -115,6 +115,10 @@ struct callchain_list {
> > bool unfolded;
> > bool has_children;
> > };
> > + u64 branch_count;
> > + u64 predicted_count;
> > + u64 abort_count;
>
> Can you explain what abort count is? It seems you are referring to
> miss-speculated branches. If that is the case, I would prefer that we
> replace abort by miss_speculated or miss_predicted.

abort refers to TSX aborts. It has nothing to do with branch
mispredictions.

-Andi

2016-10-20 17:07:06

by Nilay Vaish

Subject: Re: [PATCH v2 2/6] perf report: Calculate and return the branch counting in callchain

On 20 October 2016 at 11:48, Andi Kleen <[email protected]> wrote:
> On Thu, Oct 20, 2016 at 11:41:11AM -0500, Nilay Vaish wrote:
>> On 19 October 2016 at 17:01, Jin Yao <[email protected]> wrote:
>> > diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
>> > index 40ecf25..4f6bf6c 100644
>> > --- a/tools/perf/util/callchain.h
>> > +++ b/tools/perf/util/callchain.h
>> > @@ -115,6 +115,10 @@ struct callchain_list {
>> > bool unfolded;
>> > bool has_children;
>> > };
>> > + u64 branch_count;
>> > + u64 predicted_count;
>> > + u64 abort_count;
>>
>> Can you explain what abort count is? It seems you are referring to
>> miss-speculated branches. If that is the case, I would prefer that we
>> replace abort by miss_speculated or miss_predicted.
>
> abort refers to TSX aborts. It has nothing to do with branch
> mispredictions.

OK, I am more confused now. Are you predicting some quantity related
to transactions? Why would you divide abort count by branch count?
Further, I just looked at patch 6/6. It has the following text:

+ Also show with some branch flags that can be:
+ - Predicted: display the average percentage of predicated branches.
+ (predicated number / total number)
+ - Abort: display the average percentage of abort branches.
+ (abort number /total number)
+ - Cycles: cycles in basic block.


I think there is inconsistency between what you are suggesting and
what the patch has.

--
Nilay

2016-10-20 18:20:39

by Andi Kleen

Subject: Re: [PATCH v2 2/6] perf report: Caculate and return the branch counting in callchain

> OK, I am more confused now. Are you predicting some quantity related
> to transactions? Why would you divide abort count by branch count?
> Further, I just looked at patch 6/6. It has the following text:
>
> + Also show with some branch flags that can be:
> + - Predicted: display the average percentage of predicated branches.
> + (predicated number / total number)
> + - Abort: display the average percentage of abort branches.
> + (abort number /total number)
> + - Cycles: cycles in basic block.
>
>
> I think there is inconsistency between what you are suggesting and
> what the patch has.

An abort is a unique branch. But yes, there is no total number,
so the formula will always give 100%. So yes, it would probably be
better to just display a count for abort.

-Andi

2016-10-21 00:23:45

by Jin Yao

Subject: Re: [PATCH v2 2/6] perf report: Caculate and return the branch counting in callchain

Hi Andi, Hi Nilay,

Thanks so much for your comments!

I will update the patch to just display the count for abort.

Thanks

Jin Yao

2016-10-23 14:10:19

by Jiri Olsa

Subject: Re: [PATCH v2 0/6] perf report: Show branch flags/cycles in --branch-history callgraph view

On Thu, Oct 20, 2016 at 06:01:11AM +0800, Jin Yao wrote:
> v2: Just a rebase to Arnaldo's perf/core branch, no functional changes.
>

Reviewed-by: Jiri Olsa <[email protected]>

thanks,
jirka

2016-10-25 18:11:13

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH v2 2/6] perf report: Caculate and return the branch counting in callchain

On Fri, Oct 21, 2016 at 08:23:41AM +0800, Jin, Yao wrote:
> Hi Andi, Hi Nilay,
>
> Thanks so much for your comments!
>
> I will upgrade the patch to just display the count for abort.

Ok, waiting for that then,

- Arnaldo