2014-12-02 15:26:33

by Liang, Kan

Subject: [PATCH V5 0/3] perf tool: Haswell LBR call stack support (user)

From: Kan Liang <[email protected]>

This is the user space patch for Haswell LBR call stack support.
For many profiling tasks we need the callgraph. For example we often
need to see the caller of a lock or the caller of a memcpy or other
library function to actually tune the program. Frame pointer unwinding
is efficient and works well. But frame pointers are off by default on
64bit code (and on modern 32bit gccs), so there are many binaries around
that do not use frame pointers. Profiling unchanged production code is
very useful in practice. On some CPUs, maintaining a frame pointer also
has a high cost. DWARF2 unwinding does not always work either and is
extremely slow (up to 20% overhead).

Haswell has a new feature that uses the existing Last Branch Record
facility to record call chains. When the feature is enabled, function
calls are collected as normal, but as return instructions are executed
the last captured branch record is popped from the on-chip LBR
registers. The LBR call stack facility provides an alternative way to
get the callgraph. It has some limitations too, but should work in most
cases and is significantly faster than DWARF unwinding. Frame pointer
unwinding is still the best default, but LBR call stack is a good
alternative when nothing else works.
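
The push-on-call, pop-on-return behaviour described above can be pictured
with a small software model. This is only an illustrative sketch, not how
the hardware or the kernel implements it: the names lbr_model, lbr_on_call
and lbr_on_return are hypothetical, the depth of 16 is an assumption, and
the model stores the oldest call first for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LBR_DEPTH 16 /* assumed depth for this model; real depth depends on the CPU */

struct lbr_entry {
	uint64_t from; /* address of the call instruction (caller side) */
	uint64_t to;   /* address of the call target (callee entry) */
};

/* Hypothetical software model: entries[0] is the oldest live call. */
struct lbr_model {
	struct lbr_entry entries[LBR_DEPTH];
	size_t nr;
};

/* A call pushes a record; when the model is full, the oldest record is lost. */
static void lbr_on_call(struct lbr_model *lbr, uint64_t from, uint64_t to)
{
	assert(lbr != NULL);

	if (lbr->nr == LBR_DEPTH) {
		for (size_t i = 1; i < LBR_DEPTH; i++)
			lbr->entries[i - 1] = lbr->entries[i];
		lbr->nr--;
	}
	lbr->entries[lbr->nr].from = from;
	lbr->entries[lbr->nr].to = to;
	lbr->nr++;
}

/* A return pops the most recently pushed record, if there is one. */
static void lbr_on_return(struct lbr_model *lbr)
{
	assert(lbr != NULL);

	if (lbr->nr > 0)
		lbr->nr--;
}
```

In this model, a call chain deeper than LBR_DEPTH keeps only the most
recent calls, which corresponds to the depth limitation listed in patch 3/3.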

Please find the kernel part patch at https://lkml.org/lkml/2014/11/6/432

Changes since v1
- Update help document
- Force exclude_user to 0 with warning in LBR call stack
- Dump both lbr and fp info when report -D
- Reconstruct thread__resolve_callchain_sample and split it into two patches
- Use has_branch_callstack function to check LBR call stack available

Changes since v2
- Rebase to 025ce5d33373

Changes since v3
- Rebase to cc502c23aadf
- Separated function for lbr call stack sample resolve and print
- Some minor changes according to comments

Changes since V4
- Rebase to 09a6a1b
- Falling back to framepointers if LBR not available, and warning user

Kan Liang (3):
perf tools: enable LBR call stack support
perf tool: Move cpumode resolve code to add_callchain_ip
perf tools: Construct LBR call chain

tools/perf/Documentation/perf-record.txt | 8 +-
tools/perf/builtin-record.c | 6 +-
tools/perf/builtin-report.c | 2 +
tools/perf/util/callchain.c | 10 +-
tools/perf/util/callchain.h | 1 +
tools/perf/util/evsel.c | 21 +++-
tools/perf/util/evsel.h | 4 +
tools/perf/util/machine.c | 174 ++++++++++++++++++++++---------
tools/perf/util/session.c | 64 ++++++++++--
9 files changed, 229 insertions(+), 61 deletions(-)

--
1.8.3.2


2014-12-02 15:26:36

by Liang, Kan

Subject: [PATCH V5 3/3] perf tools: Construct LBR call chain

From: Kan Liang <[email protected]>

LBR call stack only records the user callchain, which is output in the
PERF_SAMPLE_BRANCH_STACK data format. The kernel callchain still comes
from PERF_SAMPLE_CALLCHAIN.
The perf tool has to handle both data sources to construct a complete
callstack.
With the perf report -D option, both LBR and FP information is
displayed.
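
As a rough illustration of how the two data sources combine, the sketch
below builds one chain from a callchain array and a list of LBR records,
in the same order that the -D dump in this patch prints them: the kernel
entries up to and including the PERF_CONTEXT_USER marker, then the first
"to" address, then every "from" address. The names merge_lbr_chain and
struct branch are hypothetical stand-ins, not the perf tool's own types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CONTEXT_USER ((uint64_t)-512) /* same value as PERF_CONTEXT_USER */

/* Hypothetical stand-in for one LBR record. */
struct branch {
	uint64_t from; /* caller side */
	uint64_t to;   /* callee side */
};

/*
 * Write the combined chain into out and return its length.
 * Returns 0 when there is no user marker, no LBR data, or out is too small.
 */
static size_t merge_lbr_chain(const uint64_t *ips, size_t nr_ips,
			      const struct branch *lbr, size_t nr_lbr,
			      uint64_t *out, size_t out_cap)
{
	size_t i, k, n = 0;

	assert(ips != NULL && lbr != NULL && out != NULL);

	/* i ends up as the number of entries before the user marker. */
	for (i = 0; i < nr_ips; i++) {
		if (ips[i] == CONTEXT_USER)
			break;
	}
	if (i == nr_ips || nr_lbr == 0)
		return 0;

	/* i kernel entries + 1 marker + nr_lbr "from" addresses + 1 "to". */
	if (i + 1 + nr_lbr + 1 > out_cap)
		return 0;

	for (k = 0; k <= i; k++)
		out[n++] = ips[k];
	out[n++] = lbr[0].to;
	for (k = 0; k < nr_lbr; k++)
		out[n++] = lbr[k].from;

	return n;
}
```

Any frame-pointer entries that follow the marker in the callchain array
are ignored here, since the LBR records replace the user part.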

A new call chain recording option "lbr" is introduced into perf tool for
LBR call stack. The user can use --call-graph lbr to get the call stack
information from hardware.

Here are some examples.
When profiling bc(1) on Fedora 19:
echo 'scale=2000; 4*a(1)' > cmd; perf record --call-graph lbr bc -l < cmd
With LBR enabled, perf report output looks like:
50.36% bc bc [.] bc_divide
|
--- bc_divide
execute
run_code
yyparse
main
__libc_start_main
_start
33.66% bc bc [.] _one_mult
|
--- _one_mult
bc_divide
execute
run_code
yyparse
main
__libc_start_main
_start
7.62% bc bc [.] _bc_do_add
|
--- _bc_do_add
|
|--99.89%-- 0x2000186a8
--0.11%-- [...]
6.83% bc bc [.] _bc_do_sub
|
--- _bc_do_sub
|
|--99.94%-- bc_add
| execute
| run_code
| yyparse
| main
| __libc_start_main
| _start
--0.06%-- [...]
0.46% bc libc-2.17.so [.] __memset_sse2
|
--- __memset_sse2
|
|--54.13%-- bc_new_num
| |
| |--51.00%-- bc_divide
| | execute
| | run_code
| | yyparse
| | main
| | __libc_start_main
| | _start
| |
| |--30.46%-- _bc_do_sub
| | bc_add
| | execute
| | run_code
| | yyparse
| | main
| | __libc_start_main
| | _start
| |
| --18.55%-- _bc_do_add
| bc_add
| execute
| run_code
| yyparse
| main
| __libc_start_main
| _start
|
--45.87%-- bc_divide
execute
run_code
yyparse
main
__libc_start_main
_start

If using FP, perf report output looks like:
echo 'scale=2000; 4*a(1)' > cmd; perf record --call-graph fp bc -l < cmd
50.49% bc bc [.] bc_divide
|
--- bc_divide
33.57% bc bc [.] _one_mult
|
--- _one_mult
7.61% bc bc [.] _bc_do_add
|
--- _bc_do_add
0x2000186a8
6.88% bc bc [.] _bc_do_sub
|
--- _bc_do_sub
0.42% bc libc-2.17.so [.] __memcpy_ssse3_back
|
--- __memcpy_ssse3_back

If using LBR, perf report -D output looks like:
3458145275743 0x2fd750 [0xd8]: PERF_RECORD_SAMPLE(IP, 0x2): 9748/9748:
0x408ea8 period: 609644 addr: 0
... LBR call chain: nr:8
..... 0: fffffffffffffe00
..... 1: 0000000000408e50
..... 2: 000000000040a458
..... 3: 000000000040562e
..... 4: 0000000000408590
..... 5: 00000000004022c0
..... 6: 00000000004015dd
..... 7: 0000003d1cc21b43
... FP chain: nr:2
..... 0: fffffffffffffe00
..... 1: 0000000000408ea8
... thread: bc:9748
...... dso: /usr/bin/bc

The LBR call stack has the following known limitations:
- Zero-length calls are not filtered out by hardware
- Exception handling such as setjmp/longjmp will leave calls and returns
  unmatched
- Pushing a different return address onto the stack will leave calls and
  returns unmatched
- If the call stack is deeper than the LBR, only the last entries are
  captured

Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/util/evsel.h | 4 ++
tools/perf/util/machine.c | 102 +++++++++++++++++++++++++++++++++++++++++-----
tools/perf/util/session.c | 64 ++++++++++++++++++++++++++---
3 files changed, 153 insertions(+), 17 deletions(-)

diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 3862274..dcf202a 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -355,4 +355,8 @@ for ((_evsel) = list_entry((_leader)->node.next, struct perf_evsel, node); \
(_evsel) && (_evsel)->leader == (_leader); \
(_evsel) = list_entry((_evsel)->node.next, struct perf_evsel, node))

+static inline bool has_branch_callstack(struct perf_evsel *evsel)
+{
+ return evsel->attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK;
+}
#endif /* __PERF_EVSEL_H */
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 94de3e4..36759fa 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1500,18 +1500,100 @@ static int remove_loops(struct branch_entry *l, int nr)
return nr;
}

-static int thread__resolve_callchain_sample(struct thread *thread,
- struct ip_callchain *chain,
- struct branch_stack *branch,
- struct symbol **parent,
- struct addr_location *root_al,
- int max_stack)
+/*
+ * Resolve LBR callstack chain sample
+ * Return:
+ * 1 on success get LBR callchain information
+ * 0 no available LBR callchain information, should try fp
+ * negative error code on other errors.
+ */
+static int resolve_lbr_callchain_sample(struct thread *thread,
+ struct perf_sample *sample,
+ struct symbol **parent,
+ struct addr_location *root_al,
+ int max_stack)
{
+ struct ip_callchain *chain = sample->callchain;
+ int chain_nr = min(max_stack, (int)chain->nr);
+ int i, j, err;
+ u64 ip;
+
+ for (i = 0; i < chain_nr; i++) {
+ if (chain->ips[i] == PERF_CONTEXT_USER)
+ break;
+ }
+
+ /* LBR only affects the user callchain */
+ if (i != chain_nr) {
+ struct branch_stack *lbr_stack = sample->branch_stack;
+ int lbr_nr = lbr_stack->nr;
+ /*
+ * LBR callstack can only get user call chain.
+ * The mix_chain_nr is kernel call chain
+ * number plus LBR user call chain number.
+ * i is kernel call chain number,
+ * 1 is PERF_CONTEXT_USER,
+ * lbr_nr + 1 is the user call chain number.
+ * For details, please refer to the comments
+ * in callchain__printf
+ */
+ int mix_chain_nr = i + 1 + lbr_nr + 1;
+
+ if (mix_chain_nr > PERF_MAX_STACK_DEPTH + PERF_MAX_BRANCH_DEPTH) {
+ pr_warning("corrupted callchain. skipping...\n");
+ return 0;
+ }
+
+ for (j = 0; j < mix_chain_nr; j++) {
+ if (callchain_param.order == ORDER_CALLEE) {
+ if (j < i + 1)
+ ip = chain->ips[j];
+ else if (j > i + 1)
+ ip = lbr_stack->entries[j - i - 2].from;
+ else
+ ip = lbr_stack->entries[0].to;
+ } else {
+ if (j < lbr_nr)
+ ip = lbr_stack->entries[lbr_nr - j - 1].from;
+ else if (j > lbr_nr)
+ ip = chain->ips[i + 1 - (j - lbr_nr)];
+ else
+ ip = lbr_stack->entries[0].to;
+ }
+
+ err = add_callchain_ip(thread, parent, root_al, false, ip);
+ if (err)
+ return (err < 0) ? err : 0;
+ }
+ return 1;
+ }
+
+ return 0;
+}
+
+static int thread__resolve_callchain_sample(struct thread *thread,
+ struct perf_evsel *evsel,
+ struct perf_sample *sample,
+ struct symbol **parent,
+ struct addr_location *root_al,
+ int max_stack)
+{
+ struct branch_stack *branch = sample->branch_stack;
+ struct ip_callchain *chain = sample->callchain;
int chain_nr = min(max_stack, (int)chain->nr);
int i, j, err;
int skip_idx = -1;
int first_call = 0;

+ callchain_cursor_reset(&callchain_cursor);
+
+ if (has_branch_callstack(evsel)) {
+ err = resolve_lbr_callchain_sample(thread, sample, parent,
+ root_al, max_stack);
+ if (err)
+ return (err < 0) ? err : 0;
+ }
+
/*
* Based on DWARF debug information, some architectures skip
* a callchain entry saved by the kernel.
@@ -1519,8 +1601,6 @@ static int thread__resolve_callchain_sample(struct thread *thread,
if (chain->nr < PERF_MAX_STACK_DEPTH)
skip_idx = arch_skip_callchain_idx(thread, chain);

- callchain_cursor_reset(&callchain_cursor);
-
/*
* Add branches to call stack for easier browsing. This gives
* more context for a sample than just the callers.
@@ -1621,9 +1701,9 @@ int thread__resolve_callchain(struct thread *thread,
struct addr_location *root_al,
int max_stack)
{
- int ret = thread__resolve_callchain_sample(thread, sample->callchain,
- sample->branch_stack,
- parent, root_al, max_stack);
+ int ret = thread__resolve_callchain_sample(thread, evsel,
+ sample, parent,
+ root_al, max_stack);
if (ret)
return ret;

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 6ac62ae..900b228 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -557,15 +557,67 @@ int perf_session_queue_event(struct perf_session *s, union perf_event *event,
return 0;
}

-static void callchain__printf(struct perf_sample *sample)
+static void callchain__lbr_callstack_printf(struct perf_sample *sample)
{
+ struct ip_callchain *callchain = sample->callchain;
+ struct branch_stack *lbr_stack = sample->branch_stack;
+ u64 kernel_callchain_nr = callchain->nr;
unsigned int i;

- printf("... chain: nr:%" PRIu64 "\n", sample->callchain->nr);
+ for (i = 0; i < kernel_callchain_nr; i++) {
+ if (callchain->ips[i] == PERF_CONTEXT_USER)
+ break;
+ }
+
+ if ((i != kernel_callchain_nr) && lbr_stack->nr) {
+ u64 total_nr;
+ /*
+ * LBR callstack can only get user call chain,
+ * i is kernel call chain number,
+ * 1 is PERF_CONTEXT_USER.
+ *
+ * The user call chain is stored in LBR registers.
+ * LBR are pair registers. The caller is stored
+ * in "from" register, while the callee is stored
+ * in "to" register.
+ * For example, there is a call stack
+ * "A"->"B"->"C"->"D".
+ * The LBR registers will record this as
+ * "C"->"D", "B"->"C", "A"->"B".
+ * So only the first "to" register and all "from"
+ * registers are needed to construct the whole stack.
+ */
+ total_nr = i + 1 + lbr_stack->nr + 1;
+ kernel_callchain_nr = i + 1;
+
+ printf("... LBR call chain: nr:%" PRIu64 "\n", total_nr);
+
+ for (i = 0; i < kernel_callchain_nr; i++)
+ printf("..... %2d: %016" PRIx64 "\n",
+ i, callchain->ips[i]);
+
+ printf("..... %2d: %016" PRIx64 "\n",
+ (int)(kernel_callchain_nr), lbr_stack->entries[0].to);
+ for (i = 0; i < lbr_stack->nr; i++)
+ printf("..... %2d: %016" PRIx64 "\n",
+ (int)(i + kernel_callchain_nr + 1), lbr_stack->entries[i].from);
+ }
+}
+
+static void callchain__printf(struct perf_evsel *evsel,
+ struct perf_sample *sample)
+{
+ unsigned int i;
+ struct ip_callchain *callchain = sample->callchain;
+
+ if (has_branch_callstack(evsel))
+ callchain__lbr_callstack_printf(sample);
+
+ printf("... FP chain: nr:%" PRIu64 "\n", callchain->nr);

- for (i = 0; i < sample->callchain->nr; i++)
+ for (i = 0; i < callchain->nr; i++)
printf("..... %2d: %016" PRIx64 "\n",
- i, sample->callchain->ips[i]);
+ i, callchain->ips[i]);
}

static void branch_stack__printf(struct perf_sample *sample)
@@ -722,9 +774,9 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event,
sample_type = evsel->attr.sample_type;

if (sample_type & PERF_SAMPLE_CALLCHAIN)
- callchain__printf(sample);
+ callchain__printf(evsel, sample);

- if (sample_type & PERF_SAMPLE_BRANCH_STACK)
+ if ((sample_type & PERF_SAMPLE_BRANCH_STACK) && !has_branch_callstack(evsel))
branch_stack__printf(sample);

if (sample_type & PERF_SAMPLE_REGS_USER)
--
1.8.3.2

2014-12-02 15:27:11

by Liang, Kan

Subject: [PATCH V5 2/3] perf tool: Move cpumode resolve code to add_callchain_ip

From: Kan Liang <[email protected]>

Use a flag to distinguish between branch_history and normal callchain
entries. Move the cpumode resolution into the add_callchain_ip function.
No change in behavior.

Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/util/machine.c | 72 +++++++++++++++++++++++------------------------
1 file changed, 35 insertions(+), 37 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 15dd0a9..94de3e4 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1385,19 +1385,46 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
static int add_callchain_ip(struct thread *thread,
struct symbol **parent,
struct addr_location *root_al,
- int cpumode,
+ bool branch_history,
u64 ip)
{
struct addr_location al;

al.filtered = 0;
al.sym = NULL;
- if (cpumode == -1)
+ if (branch_history)
thread__find_cpumode_addr_location(thread, MAP__FUNCTION,
ip, &al);
- else
+ else {
+ u8 cpumode = PERF_RECORD_MISC_USER;
+
+ if (ip >= PERF_CONTEXT_MAX) {
+ switch (ip) {
+ case PERF_CONTEXT_HV:
+ cpumode = PERF_RECORD_MISC_HYPERVISOR;
+ break;
+ case PERF_CONTEXT_KERNEL:
+ cpumode = PERF_RECORD_MISC_KERNEL;
+ break;
+ case PERF_CONTEXT_USER:
+ cpumode = PERF_RECORD_MISC_USER;
+ break;
+ default:
+ pr_debug("invalid callchain context: "
+ "%"PRId64"\n", (s64) ip);
+ /*
+ * It seems the callchain is corrupted.
+ * Discard all.
+ */
+ callchain_cursor_reset(&callchain_cursor);
+ return 1;
+ }
+ return 0;
+ }
thread__find_addr_location(thread, cpumode, MAP__FUNCTION,
ip, &al);
+ }
+
if (al.sym != NULL) {
if (sort__has_parent && !*parent &&
symbol__match_regex(al.sym, &parent_regex))
@@ -1480,11 +1507,8 @@ static int thread__resolve_callchain_sample(struct thread *thread,
struct addr_location *root_al,
int max_stack)
{
- u8 cpumode = PERF_RECORD_MISC_USER;
int chain_nr = min(max_stack, (int)chain->nr);
- int i;
- int j;
- int err;
+ int i, j, err;
int skip_idx = -1;
int first_call = 0;

@@ -1542,10 +1566,10 @@ static int thread__resolve_callchain_sample(struct thread *thread,

for (i = 0; i < nr; i++) {
err = add_callchain_ip(thread, parent, root_al,
- -1, be[i].to);
+ true, be[i].to);
if (!err)
err = add_callchain_ip(thread, parent, root_al,
- -1, be[i].from);
+ true, be[i].from);
if (err == -EINVAL)
break;
if (err)
@@ -1574,36 +1598,10 @@ check_calls:
#endif
ip = chain->ips[j];

- if (ip >= PERF_CONTEXT_MAX) {
- switch (ip) {
- case PERF_CONTEXT_HV:
- cpumode = PERF_RECORD_MISC_HYPERVISOR;
- break;
- case PERF_CONTEXT_KERNEL:
- cpumode = PERF_RECORD_MISC_KERNEL;
- break;
- case PERF_CONTEXT_USER:
- cpumode = PERF_RECORD_MISC_USER;
- break;
- default:
- pr_debug("invalid callchain context: "
- "%"PRId64"\n", (s64) ip);
- /*
- * It seems the callchain is corrupted.
- * Discard all.
- */
- callchain_cursor_reset(&callchain_cursor);
- return 0;
- }
- continue;
- }
+ err = add_callchain_ip(thread, parent, root_al, false, ip);

- err = add_callchain_ip(thread, parent, root_al,
- cpumode, ip);
- if (err == -EINVAL)
- break;
if (err)
- return err;
+ return (err < 0) ? err : 0;
}

return 0;
--
1.8.3.2

2014-12-02 15:27:31

by Liang, Kan

Subject: [PATCH V5 1/3] perf tools: enable LBR call stack support

From: Kan Liang <[email protected]>

Currently, there are two call chain recording options, fp and dwarf.
Haswell has a new feature that uses the existing LBR facility to
record call chains, which provides a third option for recording call
chains. This patch enables the LBR call stack support.

LBR call stack has some limitations. It reuses the current LBR facility,
so LBR call stack and branch record cannot be enabled at the same time.
It is only available for the user callchain.
However, LBR call stack works on user applications that were compiled
without frame pointers or DWARF debug info. It is a good alternative
when nothing else works.
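
The resulting configuration rule can be summarised as a small decision
function. This is a sketch that follows the checks in the evsel.c hunk of
this patch. The names lbr_callstack_decide and lbr_decision_message are
hypothetical, and the two booleans stand for opts->branch_stack and
attr->exclude_user.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum lbr_decision {
	LBR_ENABLE,                /* set BRANCH_STACK with USER | CALL_STACK */
	LBR_FALLBACK_EXCLUDE_USER, /* user space is excluded, nothing to record */
	LBR_FALLBACK_BRANCH_STACK  /* LBR is already used for branch sampling */
};

/* Decide whether LBR call stack can be used for an event. */
static enum lbr_decision lbr_callstack_decide(bool branch_stack_requested,
					      bool exclude_user)
{
	if (branch_stack_requested)
		return LBR_FALLBACK_BRANCH_STACK;
	if (exclude_user)
		return LBR_FALLBACK_EXCLUDE_USER;
	return LBR_ENABLE;
}

/* Warning text for each fallback, or NULL when LBR call stack is enabled. */
static const char *lbr_decision_message(enum lbr_decision d)
{
	switch (d) {
	case LBR_ENABLE:
		return NULL;
	case LBR_FALLBACK_EXCLUDE_USER:
		return "LBR callstack option is only available to get user "
		       "callchain information. Falling back to framepointers.";
	case LBR_FALLBACK_BRANCH_STACK:
		return "Cannot use LBR callstack with branch stack. "
		       "Falling back to framepointers.";
	}
	assert(!"unknown lbr_decision value");
	return NULL;
}
```

In both fallback cases the callchain sample bit stays set, so the event
still records frame-pointer callchains.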

Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/Documentation/perf-record.txt | 8 +++++++-
tools/perf/builtin-record.c | 6 +++---
tools/perf/builtin-report.c | 2 ++
tools/perf/util/callchain.c | 10 +++++++++-
tools/perf/util/callchain.h | 1 +
tools/perf/util/evsel.c | 21 +++++++++++++++++++--
6 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index af9a54e..d10cb2c 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -100,13 +100,19 @@ OPTIONS
implies -g.

Allows specifying "fp" (frame pointer) or "dwarf"
- (DWARF's CFI - Call Frame Information) as the method to collect
+ (DWARF's CFI - Call Frame Information) or "lbr"
+ (Hardware Last Branch Record facility) as the method to collect
the information used to show the call graphs.

In some systems, where binaries are build with gcc
--fomit-frame-pointer, using the "fp" method will produce bogus
call graphs, using "dwarf", if available (perf tools linked to
the libunwind library) should be used instead.
+ Using the "lbr" method doesn't require any compiler options. It
+ will produce call graphs from the hardware LBR registers. The
+ main limitation is that it is only available on new Intel
+ platforms, such as Haswell. It can only get user call chain. It
+ doesn't work with branch stack sampling at the same time.

-q::
--quiet::
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8648c6d..6a68c85 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -639,7 +639,7 @@ error:

static void callchain_debug(void)
{
- static const char *str[CALLCHAIN_MAX] = { "NONE", "FP", "DWARF" };
+ static const char *str[CALLCHAIN_MAX] = { "NONE", "FP", "DWARF", "LBR" };

pr_debug("callchain: type %s\n", str[callchain_param.record_mode]);

@@ -725,9 +725,9 @@ static struct record record = {
#define CALLCHAIN_HELP "setup and enables call-graph (stack chain/backtrace) recording: "

#ifdef HAVE_DWARF_UNWIND_SUPPORT
-const char record_callchain_help[] = CALLCHAIN_HELP "fp dwarf";
+const char record_callchain_help[] = CALLCHAIN_HELP "fp dwarf lbr";
#else
-const char record_callchain_help[] = CALLCHAIN_HELP "fp";
+const char record_callchain_help[] = CALLCHAIN_HELP "fp lbr";
#endif

/*
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 3936760..635bf65 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -262,6 +262,8 @@ static int report__setup_sample_type(struct report *rep)
if ((sample_type & PERF_SAMPLE_REGS_USER) &&
(sample_type & PERF_SAMPLE_STACK_USER))
callchain_param.record_mode = CALLCHAIN_DWARF;
+ else if (sample_type & PERF_SAMPLE_BRANCH_STACK)
+ callchain_param.record_mode = CALLCHAIN_LBR;
else
callchain_param.record_mode = CALLCHAIN_FP;
}
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index cf524a3..64c8913 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -77,7 +77,7 @@ int parse_callchain_record_opt(const char *arg)
ret = 0;
} else
pr_err("callchain: No more arguments "
- "needed for -g fp\n");
+ "needed for --call-graph fp\n");
break;

#ifdef HAVE_DWARF_UNWIND_SUPPORT
@@ -97,6 +97,14 @@ int parse_callchain_record_opt(const char *arg)
callchain_param.dump_size = size;
}
#endif /* HAVE_DWARF_UNWIND_SUPPORT */
+ } else if (!strncmp(name, "lbr", sizeof("lbr"))) {
+ if (!strtok_r(NULL, ",", &saveptr)) {
+ callchain_param.record_mode = CALLCHAIN_LBR;
+ ret = 0;
+ } else
+ pr_err("callchain: No more arguments "
+ "needed for --call-graph lbr\n");
+ break;
} else {
pr_err("callchain: Unknown --call-graph option "
"value: %s\n", arg);
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index dbc08cf..b4b61d1 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -11,6 +11,7 @@ enum perf_call_graph_mode {
CALLCHAIN_NONE,
CALLCHAIN_FP,
CALLCHAIN_DWARF,
+ CALLCHAIN_LBR,
CALLCHAIN_MAX
};

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 1e90c85..3430bdf 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -537,13 +537,30 @@ int perf_evsel__group_desc(struct perf_evsel *evsel, char *buf, size_t size)
}

static void
-perf_evsel__config_callgraph(struct perf_evsel *evsel)
+perf_evsel__config_callgraph(struct perf_evsel *evsel,
+ struct record_opts *opts)
{
bool function = perf_evsel__is_function_event(evsel);
struct perf_event_attr *attr = &evsel->attr;

perf_evsel__set_sample_bit(evsel, CALLCHAIN);

+ if (callchain_param.record_mode == CALLCHAIN_LBR) {
+ if (!opts->branch_stack) {
+ if (attr->exclude_user) {
+ pr_warning("LBR callstack option is only available "
+ "to get user callchain information. "
+ "Falling back to framepointers.\n");
+ } else {
+ perf_evsel__set_sample_bit(evsel, BRANCH_STACK);
+ attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER |
+ PERF_SAMPLE_BRANCH_CALL_STACK;
+ }
+ } else
+ pr_warning("Cannot use LBR callstack with branch stack. "
+ "Falling back to framepointers.\n");
+ }
+
if (callchain_param.record_mode == CALLCHAIN_DWARF) {
if (!function) {
perf_evsel__set_sample_bit(evsel, REGS_USER);
@@ -667,7 +684,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts)
evsel->attr.exclude_callchain_user = 1;

if (callchain_param.enabled && !evsel->no_aux_samples)
- perf_evsel__config_callgraph(evsel);
+ perf_evsel__config_callgraph(evsel, opts);

if (opts->sample_intr_regs) {
attr->sample_regs_intr = PERF_REGS_MASK;
--
1.8.3.2

2014-12-04 14:28:52

by Jiri Olsa

Subject: Re: [PATCH V5 0/3] perf tool: Haswell LBR call stack support (user)

On Tue, Dec 02, 2014 at 10:06:51AM -0500, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> This is the user space patch for Haswell LBR call stack support.
> For many profiling tasks we need the callgraph. For example we often
> need to see the caller of a lock or the caller of a memcpy or other
> library function to actually tune the program. Frame pointer unwinding
> is efficient and works well. But frame pointers are off by default on
> 64bit code (and on modern 32bit gccs), so there are many binaries around
> that do not use frame pointers. Profiling unchanged production code is
> very useful in practice. On some CPUs frame pointer also has a high
> cost. Dwarf2 unwinding also does not always work and is extremely slow
> (up to 20% overhead).
>
> Haswell has a new feature that utilizes the existing Last Branch Record
> facility to record call chains. When the feature is enabled, function
> call will be collected as normal, but as return instructions are
> executed the last captured branch record is popped from the on-chip LBR
> registers. The LBR call stack facility provides an alternative to get
> callgraph. It has some limitations too, but should work in most cases
> and is significantly faster than dwarf. Frame pointer unwinding is still
> the best default, but LBR call stack is a good alternative when nothing
> else works.
>
> Please find the kernel part patch at https://lkml.org/lkml/2014/11/6/432
>
> Changes since v1
> - Update help document
> - Force exclude_user to 0 with warning in LBR call stack
> - Dump both lbr and fp info when report -D
> - Reconstruct thread__resolve_callchain_sample and split it into two patches
> - Use has_branch_callstack function to check LBR call stack available
>
> Changes since v2
> - Rebase to 025ce5d33373
>
> Changes since v3
> - Rebase to cc502c23aadf
> - Separated function for lbr call stack sample resolve and print
> - Some minor changes according to comments
>
> Changes since V4
> - Rebase to 09a6a1b
> - Falling back to framepointers if LBR not available, and warning user

looks ok to me..

I'll test it once I get hands on Haswell server again, I guess we
wait for the kernel change to go in first anyway, right?

thanks,
jirka

2014-12-04 14:49:59

by Liang, Kan

Subject: RE: [PATCH V5 0/3] perf tool: Haswell LBR call stack support (user)



> On Tue, Dec 02, 2014 at 10:06:51AM -0500, [email protected] wrote:
> > From: Kan Liang <[email protected]>
> >
> > This is the user space patch for Haswell LBR call stack support.
> > For many profiling tasks we need the callgraph. For example we often
> > need to see the caller of a lock or the caller of a memcpy or other
> > library function to actually tune the program. Frame pointer unwinding
> > is efficient and works well. But frame pointers are off by default on
> > 64bit code (and on modern 32bit gccs), so there are many binaries
> > around that do not use frame pointers. Profiling unchanged production
> > code is very useful in practice. On some CPUs frame pointer also has a
> > high cost. Dwarf2 unwinding also does not always work and is extremely
> > slow (up to 20% overhead).
> >
> > Haswell has a new feature that utilizes the existing Last Branch
> > Record facility to record call chains. When the feature is enabled,
> > function call will be collected as normal, but as return instructions
> > are executed the last captured branch record is popped from the
> > on-chip LBR registers. The LBR call stack facility provides an
> > alternative to get callgraph. It has some limitations too, but should
> > work in most cases and is significantly faster than dwarf. Frame
> > pointer unwinding is still the best default, but LBR call stack is a
> > good alternative when nothing else works.
> >
> > Please find the kernel part patch at
> > https://lkml.org/lkml/2014/11/6/432
> >
> > Changes since v1
> > - Update help document
> > - Force exclude_user to 0 with warning in LBR call stack
> > - Dump both lbr and fp info when report -D
> > - Reconstruct thread__resolve_callchain_sample and split it into two
> > patches
> > - Use has_branch_callstack function to check LBR call stack available
> >
> > Changes since v2
> > - Rebase to 025ce5d33373
> >
> > Changes since v3
> > - Rebase to cc502c23aadf
> > - Separated function for lbr call stack sample resolve and print
> > - Some minor changes according to comments
> >
> > Changes since V4
> > - Rebase to 09a6a1b
> > - Falling back to framepointers if LBR not available, and warning
> > user
>
> looks ok to me..
>

Thanks for the review.

> I'll test it once I get hands on Haswell server again, I guess we wait for the
> kernel change to go in first anyway, right?
>

I'm not sure, let's ask Peter.

Peter?

Thanks,
Kan

> thanks,
> jirka

2014-12-04 15:51:50

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH V5 0/3] perf tool: Haswell LBR call stack support (user)

Em Thu, Dec 04, 2014 at 02:49:52PM +0000, Liang, Kan escreveu:
> Jiri Wrote:
> > looks ok to me..

> Thanks for the review.

> > I'll test it once I get hands on Haswell server again, I guess we wait for the
> > kernel change to go in first anyway, right?

> I'm not sure, let's ask Peter.

> Peter?

Would be good to go in one pull request, so that whoever pulls it has
the chance to test the kernel feature with the accompanying tooling bits.

- Arnaldo

2014-12-04 16:02:30

by Jiri Olsa

Subject: Re: [PATCH V5 0/3] perf tool: Haswell LBR call stack support (user)

On Thu, Dec 04, 2014 at 12:51:42PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Dec 04, 2014 at 02:49:52PM +0000, Liang, Kan escreveu:
> > Jiri Wrote:
> > > looks ok to me..
>
> > Thanks for the review.
>
> > > I'll test it once I get hands on Haswell server again, I guess we wait for the
> > > kernel change to go in first anyway, right?
>
> > I'm not sure, let's ask Peter.
>
> > Peter?
>
> Would be good to go in one pull request, so that whoever pulls it has
> the chance to test the kernel feature with the accompanying tooling bits.

also there's user part dependency on kernel.. some new define IIRC

jirka

2014-12-04 16:23:06

by Liang, Kan

Subject: RE: [PATCH V5 0/3] perf tool: Haswell LBR call stack support (user)



>
> On Thu, Dec 04, 2014 at 12:51:42PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Thu, Dec 04, 2014 at 02:49:52PM +0000, Liang, Kan escreveu:
> > > Jiri Wrote:
> > > > looks ok to me..
> >
> > > Thanks for the review.
> >
> > > > I'll test it once I get hands on Haswell server again, I guess we
> > > > wait for the kernel change to go in first anyway, right?
> >
> > > I'm not sure, let's ask Peter.
> >
> > > Peter?
> >
> > Would be good to go in one pull request, so that whoever pulls it has
> > the chance to test the kernel feature with the accompanying tooling bits.
>
> also there's user part dependency on kernel.. some new define IIRC
>

Oh, right. We have to let them go in together.

Hi Peter,

I have verified that the V8 kernel part patch is compatible with both latest
tip and latest perf/core. We don't need to rebase.
The user perf tool for tip is not updated yet. We need to merge latest
perf/core to tip before applying user part patch.

Do you have more comments for the code?

The latest kernel and user part codes are here.
https://lkml.org/lkml/2014/11/6/432
https://lkml.org/lkml/2014/12/2/396

Thanks,
Kan

2014-12-09 12:27:17

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH V5 0/3] perf tool: Haswell LBR call stack support (user)

Em Thu, Dec 04, 2014 at 04:18:56PM +0000, Liang, Kan escreveu:
>
>
> >
> > On Thu, Dec 04, 2014 at 12:51:42PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Thu, Dec 04, 2014 at 02:49:52PM +0000, Liang, Kan escreveu:
> > > > Jiri Wrote:
> > > > > looks ok to me..
> > >
> > > > Thanks for the review.
> > >
> > > > > I'll test it once I get hands on Haswell server again, I guess we
> > > > > wait for the kernel change to go in first anyway, right?
> > >
> > > > I'm not sure, let's ask Peter.
> > >
> > > > Peter?
> > >
> > > Would be good to go in one pull request, so that whoever pulls it has
> > > the chance to test the kernel feature with the accompanying tooling bits.
> >
> > also there's user part dependency on kernel.. some new define IIRC
> >
>
> Oh, right. We have to let them go in together.
>
> Hi Peter,

The ones that are just prep patches I am merging now, Jiri, can I stick
an Acked-by to the non-LBR related ones?

> I have verified that the V8 kernel part patch is compatible with both latest
> tip and latest perf/core. We don't need to rebase.
> The user perf tool for tip is not updated yet. We need to merge latest
> perf/core to tip before applying user part patch.
>
> Do you have more comments for the code?
>
> The latest kernel and user part codes are here.
> https://lkml.org/lkml/2014/11/6/432
> https://lkml.org/lkml/2014/12/2/396
>
> Thanks,
> Kan

2014-12-09 12:53:25

by Jiri Olsa

Subject: Re: [PATCH V5 0/3] perf tool: Haswell LBR call stack support (user)

On Tue, Dec 09, 2014 at 09:27:08AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Dec 04, 2014 at 04:18:56PM +0000, Liang, Kan escreveu:
> >
> >
> > >
> > > On Thu, Dec 04, 2014 at 12:51:42PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > Em Thu, Dec 04, 2014 at 02:49:52PM +0000, Liang, Kan escreveu:
> > > > > Jiri Wrote:
> > > > > > looks ok to me..
> > > >
> > > > > Thanks for the review.
> > > >
> > > > > > I'll test it once I get hands on Haswel server again, I guess we
> > > > > > wait for the kernel change to go in first anyway, right?
> > > >
> > > > > I'm not sure, let's ask Peter.
> > > >
> > > > > Peter?
> > > >
> > > > Would be good to go in one pull request, so that whoever pulls it has
> > > > the chance to test the kernel feature with the accompanying tooling bits.
> > >
> > > also there's user part dependency on kernel.. soem new define IIRC
> > >
> >
> > Oh, right. We have to let them go in together.
> >
> > Hi Peter,
>
> The ones that are just prep patches I am merging now, Jiri, can I stick
> an Acked-by to the non-LBR related ones?

I guess you mean just this one?
2803 T Dec 02 kan.liang@intel (3.4K) ├─>[PATCH V5 2/3] perf tool: Move cpumode resolve code to add_callchain_ip

yep, ack

jirka

>
> > I have verified that the V8 kernel part patch is compatible with both latest
> > tip and latest perf/core. We don't need to rebase.
> > The user perf tool for tip is not updated yet. We need to merge latest
> > perf/core to tip before applying user part patch.
> >
> > Do you have more comments for the code?
> >
> > The latest kernel and user part codes are here.
> > https://lkml.org/lkml/2014/11/6/432
> > https://lkml.org/lkml/2014/12/2/396
> >
> > Thanks,
> > Kan

2014-12-09 13:11:15

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH V5 0/3] perf tool: Haswell LBR call stack support (user)

On Tue, Dec 09, 2014 at 01:53:05PM +0100, Jiri Olsa wrote:
> On Tue, Dec 09, 2014 at 09:27:08AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Thu, Dec 04, 2014 at 04:18:56PM +0000, Liang, Kan escreveu:
> > > > On Thu, Dec 04, 2014 at 12:51:42PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > Em Thu, Dec 04, 2014 at 02:49:52PM +0000, Liang, Kan escreveu:
> > > > > > Jiri Wrote:
> > > > > > > looks ok to me..
> > > > > > Thanks for the review.
> > > > > > > I'll test it once I get hands on Haswel server again, I guess we
> > > > > > > wait for the kernel change to go in first anyway, right?
> > > > > > I'm not sure, let's ask Peter.
> > > > > > Peter?
> > > > > Would be good to go in one pull request, so that whoever pulls it has
> > > > > the chance to test the kernel feature with the accompanying tooling bits.
> > > > also there's user part dependency on kernel.. soem new define IIRC
> > > Oh, right. We have to let them go in together.
> > The ones that are just prep patches I am merging now, Jiri, can I stick
> > an Acked-by to the non-LBR related ones?

> I guess u mean just this one?
> 2803 T Dec 02 kan.liang@intel (3.4K) ├─>[PATCH V5 2/3] perf tool: Move cpumode resolve code to add_callchain_ip

There is another one that I split out of, IIRC, 1/3. It is unrelated to
that patch and fixes the error message for the '-g fp' usage, which
became invalid after a patch from you:

https://git.kernel.org/cgit/linux/kernel/git/acme/linux.git/commit/?h=perf/core&id=f70b4e39de4ef25aade966c0dfc69cfb97091be9

> yep, ack
>
> jirka
>
> >
> > > I have verified that the V8 kernel part patch is compatible with both latest
> > > tip and latest perf/core. We don't need to rebase.
> > > The user perf tool for tip is not updated yet. We need to merge latest
> > > perf/core to tip before applying user part patch.
> > >
> > > Do you have more comments for the code?
> > >
> > > The latest kernel and user part codes are here.
> > > https://lkml.org/lkml/2014/11/6/432
> > > https://lkml.org/lkml/2014/12/2/396
> > >
> > > Thanks,
> > > Kan

2014-12-09 13:22:24

by Jiri Olsa

Subject: Re: [PATCH V5 0/3] perf tool: Haswell LBR call stack support (user)

On Tue, Dec 09, 2014 at 10:11:04AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Dec 09, 2014 at 01:53:05PM +0100, Jiri Olsa escreveu:
> > On Tue, Dec 09, 2014 at 09:27:08AM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Thu, Dec 04, 2014 at 04:18:56PM +0000, Liang, Kan escreveu:
> > > > > On Thu, Dec 04, 2014 at 12:51:42PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > > Em Thu, Dec 04, 2014 at 02:49:52PM +0000, Liang, Kan escreveu:
> > > > > > > Jiri Wrote:
> > > > > > > > looks ok to me..
> > > > > > > Thanks for the review.
> > > > > > > > I'll test it once I get hands on Haswel server again, I guess we
> > > > > > > > wait for the kernel change to go in first anyway, right?
> > > > > > > I'm not sure, let's ask Peter.
> > > > > > > Peter?
> > > > > > Would be good to go in one pull request, so that whoever pulls it has
> > > > > > the chance to test the kernel feature with the accompanying tooling bits.
> > > > > also there's user part dependency on kernel.. soem new define IIRC
> > > > Oh, right. We have to let them go in together.
> > > The ones that are just prep patches I am merging now, Jiri, can I stick
> > > an Acked-by to the non-LBR related ones?
>
> > I guess u mean just this one?
> > 2803 T Dec 02 kan.liang@intel (3.4K) ├─>[PATCH V5 2/3] perf tool: Move cpumode resolve code to add_callchain_ip
>
> There is another I split from, iirc 1/3, that is unrelated to that
> patch, fixing '-g fp' usage that became invalid after a patch from you:
>
> https://git.kernel.org/cgit/linux/kernel/git/acme/linux.git/commit/?h=perf/core&id=f70b4e39de4ef25aade966c0dfc69cfb97091be9

this one-liner is ok, but I don't recall seeing this change separated..

and its 'Link' points to the whole 1/3 patch, which seems weird

what am I missing?

jirka

2014-12-09 13:27:19

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH V5 0/3] perf tool: Haswell LBR call stack support (user)

On Tue, Dec 09, 2014 at 02:22:06PM +0100, Jiri Olsa wrote:
> On Tue, Dec 09, 2014 at 10:11:04AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Dec 09, 2014 at 01:53:05PM +0100, Jiri Olsa escreveu:
> > > On Tue, Dec 09, 2014 at 09:27:08AM -0300, Arnaldo Carvalho de Melo wrote:
> > > > Em Thu, Dec 04, 2014 at 04:18:56PM +0000, Liang, Kan escreveu:
> > > > > > On Thu, Dec 04, 2014 at 12:51:42PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > > > Em Thu, Dec 04, 2014 at 02:49:52PM +0000, Liang, Kan escreveu:
> > > > > > > > Jiri Wrote:
> > > > > > > > > looks ok to me..
> > > > > > > > Thanks for the review.
> > > > > > > > > I'll test it once I get hands on Haswel server again, I guess we
> > > > > > > > > wait for the kernel change to go in first anyway, right?
> > > > > > > > I'm not sure, let's ask Peter.
> > > > > > > > Peter?
> > > > > > > Would be good to go in one pull request, so that whoever pulls it has
> > > > > > > the chance to test the kernel feature with the accompanying tooling bits.
> > > > > > also there's user part dependency on kernel.. soem new define IIRC
> > > > > Oh, right. We have to let them go in together.
> > > > The ones that are just prep patches I am merging now, Jiri, can I stick
> > > > an Acked-by to the non-LBR related ones?
> >
> > > I guess u mean just this one?
> > > 2803 T Dec 02 kan.liang@intel (3.4K) ├─>[PATCH V5 2/3] perf tool: Move cpumode resolve code to add_callchain_ip
> >
> > There is another I split from, iirc 1/3, that is unrelated to that
> > patch, fixing '-g fp' usage that became invalid after a patch from you:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/acme/linux.git/commit/?h=perf/core&id=f70b4e39de4ef25aade966c0dfc69cfb97091be9
>
> this onliner is ok, but I dont recall seeing this change separated..

I did the separation here. I thought I could keep your ack on it since
you said you were ok with the whole patchkit, no?

> and its 'Link' points to the whole 1/3 patch, which seems weird
>
> what do I miss?

Well, that is why I added the comment just before my S-o-B :-) We could
have gone through the whole process of me submitting a patchkit so that
we could have a proper Link:, but I thought it was too straightforward
to warrant that :-\

- Arnaldo

2014-12-09 13:33:53

by Jiri Olsa

Subject: Re: [PATCH V5 0/3] perf tool: Haswell LBR call stack support (user)

On Tue, Dec 09, 2014 at 10:27:12AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Dec 09, 2014 at 02:22:06PM +0100, Jiri Olsa escreveu:
> > On Tue, Dec 09, 2014 at 10:11:04AM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Tue, Dec 09, 2014 at 01:53:05PM +0100, Jiri Olsa escreveu:
> > > > On Tue, Dec 09, 2014 at 09:27:08AM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > Em Thu, Dec 04, 2014 at 04:18:56PM +0000, Liang, Kan escreveu:
> > > > > > > On Thu, Dec 04, 2014 at 12:51:42PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > > > > Em Thu, Dec 04, 2014 at 02:49:52PM +0000, Liang, Kan escreveu:
> > > > > > > > > Jiri Wrote:
> > > > > > > > > > looks ok to me..
> > > > > > > > > Thanks for the review.
> > > > > > > > > > I'll test it once I get hands on Haswel server again, I guess we
> > > > > > > > > > wait for the kernel change to go in first anyway, right?
> > > > > > > > > I'm not sure, let's ask Peter.
> > > > > > > > > Peter?
> > > > > > > > Would be good to go in one pull request, so that whoever pulls it has
> > > > > > > > the chance to test the kernel feature with the accompanying tooling bits.
> > > > > > > also there's user part dependency on kernel.. soem new define IIRC
> > > > > > Oh, right. We have to let them go in together.
> > > > > The ones that are just prep patches I am merging now, Jiri, can I stick
> > > > > an Acked-by to the non-LBR related ones?
> > >
> > > > I guess u mean just this one?
> > > > 2803 T Dec 02 kan.liang@intel (3.4K) ├─>[PATCH V5 2/3] perf tool: Move cpumode resolve code to add_callchain_ip
> > >
> > > There is another I split from, iirc 1/3, that is unrelated to that
> > > patch, fixing '-g fp' usage that became invalid after a patch from you:
> > >
> > > https://git.kernel.org/cgit/linux/kernel/git/acme/linux.git/commit/?h=perf/core&id=f70b4e39de4ef25aade966c0dfc69cfb97091be9
> >
> > this onliner is ok, but I dont recall seeing this change separated..
>
> I did the separation here, I thought I could have your ack there as you
> said you was ok with the whole patchkit, no?
>
> > and its 'Link' points to the whole 1/3 patch, which seems weird
> >
> > what do I miss?
>
> Well, that is why I added the comment just before my S-o-B :-) We could
> have gone thru the whole process of me submitting a patchkit so that we
> could have a proper Link:, but I thought it was too straightforward to
> warrant that :-\

ah ok.. you can keep the ack on the one-liner of course

jirka

2014-12-11 22:22:04

by Jiri Olsa

Subject: Re: [PATCH V5 0/3] perf tool: Haswell LBR call stack support (user)

On Tue, Dec 02, 2014 at 10:06:51AM -0500, [email protected] wrote:

SNIP

>
> Please find the kernel part patch at https://lkml.org/lkml/2014/11/6/432
>
> Changes since v1
> - Update help document
> - Force exclude_user to 0 with warning in LBR call stack
> - Dump both lbr and fp info when report -D
> - Reconstruct thread__resolve_callchain_sample and split it into two patches
> - Use has_branch_callstack function to check LBR call stack available
>
> Changes since v2
> - Rebase to 025ce5d33373
>
> Changes since v3
> - Rebase to cc502c23aadf
> - Separated function for lbr call stack sample resolve and print
> - Some minor changes according to comments
>
> Changes since V4
> - Rebase to 09a6a1b
> - Falling back to framepointers if LBR not available, and warning user
>
> Kan Liang (3):
> perf tools: enable LBR call stack support
> perf tool: Move cpumode resolve code to add_callchain_ip
> perf tools: Construct LBR call chain

haven't reviewed the kernel part, but the user part
seems to work properly..

Tested-by: Jiri Olsa <[email protected]>

jirka

Subject: [tip:perf/urgent] perf callchain: Fixup parameter handling error message

Commit-ID: f70b4e39de4ef25aade966c0dfc69cfb97091be9
Gitweb: http://git.kernel.org/tip/f70b4e39de4ef25aade966c0dfc69cfb97091be9
Author: Kan Liang <[email protected]>
AuthorDate: Tue, 2 Dec 2014 10:06:52 -0500
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Tue, 9 Dec 2014 10:06:11 -0300

perf callchain: Fixup parameter handling error message

Fix up the parse_callchain_record_opt error message for 'fp'. In the past,
using '-g fp' was a valid alternative to '--call-graph fp', which is no
longer the case since:

commit 09b0fd45ff63413df94cbd832a765076b201edbb
Author: Jiri Olsa <[email protected]>
Date: Sat Oct 26 16:25:33 2013 +0200

perf record: Split -g and --call-graph

I.e. -g means "use the configured unwind data collection method" which has as
default 'fp', while --call-graph requires passing the method to use.

Signed-off-by: Kan Liang <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ split this from a larger patch related to LBR based unwinding ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/callchain.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index cf524a3..64b377e 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -77,7 +77,7 @@ int parse_callchain_record_opt(const char *arg)
 				ret = 0;
 			} else
 				pr_err("callchain: No more arguments "
-				       "needed for -g fp\n");
+				       "needed for --call-graph fp\n");
 			break;

 #ifdef HAVE_DWARF_UNWIND_SUPPORT
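
The distinction the commit message draws can be illustrated with a minimal standalone sketch. This is not the perf source: the names sketch_parse_call_graph and enum sketch_callchain_mode are hypothetical, only the 'fp' method is handled, and the name matching is simplified. It assumes a `--call-graph <method>[,<args>]` style argument and shows that 'fp' accepts nothing after the method name, which is the case the corrected error message reports.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for perf's callchain record modes. */
enum sketch_callchain_mode {
	SKETCH_CALLCHAIN_NONE,
	SKETCH_CALLCHAIN_FP,
};

/*
 * Parse the argument given to a "--call-graph <method>[,<args>]" option.
 * Only "fp" is recognized in this sketch, and it takes no extra arguments.
 * Returns 0 on success and -1 on error; *mode is left untouched on error.
 */
static int sketch_parse_call_graph(const char *arg,
				   enum sketch_callchain_mode *mode)
{
	const char *comma;
	size_t name_len;

	assert(arg != NULL && mode != NULL);

	/* The method name is everything up to the first comma, if any. */
	comma = strchr(arg, ',');
	name_len = comma ? (size_t)(comma - arg) : strlen(arg);

	if (name_len == sizeof("fp") - 1 && memcmp(arg, "fp", name_len) == 0) {
		if (comma != NULL) {
			fprintf(stderr, "callchain: No more arguments "
					"needed for --call-graph fp\n");
			return -1;
		}
		*mode = SKETCH_CALLCHAIN_FP;
		return 0;
	}

	fprintf(stderr, "callchain: Unknown --call-graph option value: %.*s\n",
		(int)name_len, arg);
	return -1;
}
```

With this sketch, "fp" selects frame pointer mode, while "fp,8192" is rejected with the message that names --call-graph rather than -g, since -g itself no longer takes a method argument.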

Subject: [tip:perf/urgent] perf callchain: Move cpumode resolve code to add_callchain_ip

Commit-ID: 2e77784bb7d882647c33d8e75a650625e6df0f8b
Gitweb: http://git.kernel.org/tip/2e77784bb7d882647c33d8e75a650625e6df0f8b
Author: Kan Liang <[email protected]>
AuthorDate: Tue, 2 Dec 2014 10:06:53 -0500
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Tue, 9 Dec 2014 10:06:29 -0300

perf callchain: Move cpumode resolve code to add_callchain_ip

Using flag to distinguish between branch_history and normal callchain.

Move the cpumode to add_callchain_ip function.

No change in behavior.

Signed-off-by: Kan Liang <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/machine.c | 72 +++++++++++++++++++++++------------------------
1 file changed, 35 insertions(+), 37 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 15dd0a9..94de3e4 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1385,19 +1385,46 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
 static int add_callchain_ip(struct thread *thread,
 			    struct symbol **parent,
 			    struct addr_location *root_al,
-			    int cpumode,
+			    bool branch_history,
 			    u64 ip)
 {
 	struct addr_location al;

 	al.filtered = 0;
 	al.sym = NULL;
-	if (cpumode == -1)
+	if (branch_history)
 		thread__find_cpumode_addr_location(thread, MAP__FUNCTION,
 						   ip, &al);
-	else
+	else {
+		u8 cpumode = PERF_RECORD_MISC_USER;
+
+		if (ip >= PERF_CONTEXT_MAX) {
+			switch (ip) {
+			case PERF_CONTEXT_HV:
+				cpumode = PERF_RECORD_MISC_HYPERVISOR;
+				break;
+			case PERF_CONTEXT_KERNEL:
+				cpumode = PERF_RECORD_MISC_KERNEL;
+				break;
+			case PERF_CONTEXT_USER:
+				cpumode = PERF_RECORD_MISC_USER;
+				break;
+			default:
+				pr_debug("invalid callchain context: "
+					 "%"PRId64"\n", (s64) ip);
+				/*
+				 * It seems the callchain is corrupted.
+				 * Discard all.
+				 */
+				callchain_cursor_reset(&callchain_cursor);
+				return 1;
+			}
+			return 0;
+		}
 		thread__find_addr_location(thread, cpumode, MAP__FUNCTION,
 					   ip, &al);
+	}
+
 	if (al.sym != NULL) {
 		if (sort__has_parent && !*parent &&
 		    symbol__match_regex(al.sym, &parent_regex))
@@ -1480,11 +1507,8 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 					    struct addr_location *root_al,
 					    int max_stack)
 {
-	u8 cpumode = PERF_RECORD_MISC_USER;
 	int chain_nr = min(max_stack, (int)chain->nr);
-	int i;
-	int j;
-	int err;
+	int i, j, err;
 	int skip_idx = -1;
 	int first_call = 0;

@@ -1542,10 +1566,10 @@ static int thread__resolve_callchain_sample(struct thread *thread,

 		for (i = 0; i < nr; i++) {
 			err = add_callchain_ip(thread, parent, root_al,
-					       -1, be[i].to);
+					       true, be[i].to);
 			if (!err)
 				err = add_callchain_ip(thread, parent, root_al,
-						       -1, be[i].from);
+						       true, be[i].from);
 			if (err == -EINVAL)
 				break;
 			if (err)
@@ -1574,36 +1598,10 @@ check_calls:
 #endif
 		ip = chain->ips[j];

-		if (ip >= PERF_CONTEXT_MAX) {
-			switch (ip) {
-			case PERF_CONTEXT_HV:
-				cpumode = PERF_RECORD_MISC_HYPERVISOR;
-				break;
-			case PERF_CONTEXT_KERNEL:
-				cpumode = PERF_RECORD_MISC_KERNEL;
-				break;
-			case PERF_CONTEXT_USER:
-				cpumode = PERF_RECORD_MISC_USER;
-				break;
-			default:
-				pr_debug("invalid callchain context: "
-					 "%"PRId64"\n", (s64) ip);
-				/*
-				 * It seems the callchain is corrupted.
-				 * Discard all.
-				 */
-				callchain_cursor_reset(&callchain_cursor);
-				return 0;
-			}
-			continue;
-		}
+		err = add_callchain_ip(thread, parent, root_al, false, ip);

-		err = add_callchain_ip(thread, parent, root_al,
-				       cpumode, ip);
-		if (err == -EINVAL)
-			break;
 		if (err)
-			return err;
+			return (err < 0) ? err : 0;
 	}

 	return 0;
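
For readers unfamiliar with the context markers handled by the code being moved, here is a minimal standalone sketch of how a sampled callchain interleaves marker values with real addresses. It is not the perf implementation: the SK_ prefixed names and sk_classify_chain are hypothetical, and the constants are copied by hand from include/uapi/linux/perf_event.h rather than taken from the header. The sketch keeps the current cpumode in the caller's loop so that a marker applies to all entries that follow it until the next marker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hand-copied mirrors of PERF_CONTEXT_* from the uapi perf_event.h. */
#define SK_PERF_CONTEXT_HV	((uint64_t)-32)
#define SK_PERF_CONTEXT_KERNEL	((uint64_t)-128)
#define SK_PERF_CONTEXT_USER	((uint64_t)-512)
#define SK_PERF_CONTEXT_MAX	((uint64_t)-4095)

/* Hand-copied mirrors of the PERF_RECORD_MISC_* cpumode values. */
#define SK_MISC_KERNEL		1
#define SK_MISC_USER		2
#define SK_MISC_HYPERVISOR	3

/*
 * Walk 'nr' callchain entries. Marker entries switch the current cpumode
 * and are not counted; every other entry is treated as an address and the
 * cpumode in effect is stored in modes_out (which must hold 'nr' elements).
 * Returns the number of addresses seen, or -1 on an unknown marker, which
 * the code in the patch treats as a corrupted callchain.
 */
static int sk_classify_chain(const uint64_t *ips, size_t nr,
			     uint8_t *modes_out)
{
	uint8_t cpumode = SK_MISC_USER;	/* default until a marker is seen */
	int n = 0;
	size_t i;

	assert(ips != NULL && modes_out != NULL);

	for (i = 0; i < nr; i++) {
		uint64_t ip = ips[i];

		if (ip >= SK_PERF_CONTEXT_MAX) {
			switch (ip) {
			case SK_PERF_CONTEXT_HV:
				cpumode = SK_MISC_HYPERVISOR;
				break;
			case SK_PERF_CONTEXT_KERNEL:
				cpumode = SK_MISC_KERNEL;
				break;
			case SK_PERF_CONTEXT_USER:
				cpumode = SK_MISC_USER;
				break;
			default:
				return -1;	/* unknown context marker */
			}
			continue;	/* markers are not addresses */
		}
		modes_out[n++] = cpumode;
	}
	return n;
}
```

A chain of the form { KERNEL marker, kernel address, kernel address, USER marker, user address } therefore yields three addresses, the first two classified as kernel and the last as user.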