2015-11-06 06:45:12

by AKASHI Takahiro

Subject: [PATCH v5 0/6] arm64: ftrace: fix incorrect output from stack tracer

This is the fifth version of the patch series for fixing the stack tracer
on arm64. The original issue was reported by Jungseok[1], and then I found
more issues[2].

We no longer have to care about the original issue because its root cause
(patch "ARM64: unwind: Fix PC calculation") has been reverted in v4.3.

Here I address all the issues and implement the fixes described in [2],
except for the interrupt-triggered problems (II-3) and leaf functions (II-5).
Recent discussions[3] about introducing a dedicated interrupt stack suggest
that we may avoid walking from an interrupt stack to a process stack.
(So the interrupt-stack patch is a prerequisite.)

Basically,
patch1 is a proactive improvement of the function_graph tracer.
patch2 corresponds to II-4 (functions under the function_graph tracer).
patch3, 4 and 5 correspond to II-1 (slurping the stack) and II-2 (differences
between x86 and arm64).
patch6 is a test of the function prologue analyzer. It does not prove
the correctness of the analyzer, but it can suggest that all
the traced functions are handled properly by it.
(Please note that patch3 has already been queued in Steven's for-next.)

I tested the code with v4.3 + Jungseok's patch v5[4].

Changes from v4:
- removed a patch ("arm64: ftrace: adjust callsite addresses examined
by stack tracer")
- added a function prologue analyzer test (patch 6)

Changes from v3:
- fixed build errors/warnings reported by the kbuild test robot
- addressed Steven's comments around check_stack()
- removed a patch ("arm64: ftrace: allow for tracing leaf functions")
I don't remember why I thought this was necessary, but in any case "-pg"
seems to disable the omission of leaf stack frames.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/354126.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355920.html
[3] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-September/368003.html
[4] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-October/378699.html

AKASHI Takahiro (6):
arm64: ftrace: modify a stack frame in a safe way
arm64: ftrace: fix a stack tracer's output under function graph
tracer
ftrace: allow arch-specific stack tracer
arm64: insn: add instruction decoders for ldp/stp and add/sub
arm64: ftrace: add arch-specific stack tracer
arm64: ftrace: add a test of function prologue analyzer

arch/arm64/include/asm/ftrace.h | 2 +
arch/arm64/include/asm/insn.h | 18 +++
arch/arm64/include/asm/stacktrace.h | 4 +
arch/arm64/kernel/ftrace.c | 75 +++++++++-
arch/arm64/kernel/insn.c | 102 ++++++++++++++
arch/arm64/kernel/stacktrace.c | 258 ++++++++++++++++++++++++++++++++++-
include/linux/ftrace.h | 10 ++
kernel/trace/trace_stack.c | 80 ++++++-----
8 files changed, 502 insertions(+), 47 deletions(-)

--
1.7.9.5


2015-11-06 06:45:28

by AKASHI Takahiro

Subject: [PATCH v5 1/6] arm64: ftrace: modify a stack frame in a safe way

The function graph tracer modifies a return address (LR) in a stack frame
by calling prepare_ftrace_return() in a traced function's prologue.
The current code makes this modification before preserving the original
address via ftrace_push_return_trace(), so there is always a small window
of inconsistency if an interrupt occurs in between.

This won't matter once a dedicated interrupt stack is introduced, because
the stack tracer will then not be invoked in an interrupt context. But it
is better to proactively minimize such a window by moving the LR
modification to after ftrace_push_return_trace().
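
Schematically, the resulting prepare_ftrace_return() looks like this (a
condensed sketch; the complete change is in the diff below):

	old = *parent;			/* LR is still untouched here */

	trace.func = self_addr;
	trace.depth = current->curr_ret_stack + 1;

	/* Only trace if the calling function expects to */
	if (!ftrace_graph_entry(&trace))
		return;			/* nothing modified, nothing to undo */

	if (ftrace_push_return_trace(old, self_addr, &trace.depth,
				     frame_pointer) == -EBUSY)
		return;			/* nothing modified, nothing to undo */

	/* the original LR is preserved now; it is safe to modify it */
	*parent = return_hooker;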

Signed-off-by: AKASHI Takahiro <[email protected]>
---
arch/arm64/kernel/ftrace.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index c851be7..314f82d 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -125,23 +125,20 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
* on other archs. It's unlikely on AArch64.
*/
old = *parent;
- *parent = return_hooker;

trace.func = self_addr;
trace.depth = current->curr_ret_stack + 1;

/* Only trace if the calling function expects to */
- if (!ftrace_graph_entry(&trace)) {
- *parent = old;
+ if (!ftrace_graph_entry(&trace))
return;
- }

err = ftrace_push_return_trace(old, self_addr, &trace.depth,
frame_pointer);
- if (err == -EBUSY) {
- *parent = old;
+ if (err == -EBUSY)
return;
- }
+ else
+ *parent = return_hooker;
}

#ifdef CONFIG_DYNAMIC_FTRACE
--
1.7.9.5

2015-11-06 06:45:40

by AKASHI Takahiro

Subject: [PATCH v5 2/6] arm64: ftrace: fix a stack tracer's output under function graph tracer

The function graph tracer modifies a return address (LR) in a stack frame
to hook a function return. This results in many useless entries
(return_to_handler) showing up in the stack tracer's output.

This patch replaces such entries with the original values preserved in
current->ret_stack[].
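
Conceptually, while unwinding (innermost frame first), we walk
current->ret_stack[] down from curr_ret_stack and restore each hooked LR
(a condensed sketch of the change to save_trace() below):

	/* initialized once per trace */
	unsigned int index = current->curr_ret_stack;

	/* then, for each visited frame: */
	if (frame->pc == (unsigned long)return_to_handler - AARCH64_INSN_SIZE)
		/* LR was redirected by the graph tracer; restore the original */
		frame->pc = current->ret_stack[index--].ret - AARCH64_INSN_SIZE;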

Signed-off-by: AKASHI Takahiro <[email protected]>
---
arch/arm64/include/asm/ftrace.h | 2 ++
arch/arm64/kernel/stacktrace.c | 21 +++++++++++++++++++++
2 files changed, 23 insertions(+)

diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index c5534fa..3c60f37 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -28,6 +28,8 @@ struct dyn_arch_ftrace {

extern unsigned long ftrace_graph_call;

+extern void return_to_handler(void);
+
static inline unsigned long ftrace_call_adjust(unsigned long addr)
{
/*
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index ccb6078..5fd3477 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -17,6 +17,7 @@
*/
#include <linux/kernel.h>
#include <linux/export.h>
+#include <linux/ftrace.h>
#include <linux/sched.h>
#include <linux/stacktrace.h>

@@ -73,6 +74,9 @@ struct stack_trace_data {
struct stack_trace *trace;
unsigned int no_sched_functions;
unsigned int skip;
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+ unsigned int ret_stack_index;
+#endif
};

static int save_trace(struct stackframe *frame, void *d)
@@ -81,6 +85,20 @@ static int save_trace(struct stackframe *frame, void *d)
struct stack_trace *trace = data->trace;
unsigned long addr = frame->pc;

+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+ if (addr == (unsigned long)return_to_handler - AARCH64_INSN_SIZE) {
+ /*
+ * This is a case where function graph tracer has
+ * modified a return address (LR) in a stack frame
+ * to hook a function return.
+ * So replace it with the original value.
+ */
+ frame->pc = addr =
+ current->ret_stack[data->ret_stack_index--].ret
+ - AARCH64_INSN_SIZE;
+ }
+#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
+
if (data->no_sched_functions && in_sched_functions(addr))
return 0;
if (data->skip) {
@@ -100,6 +118,9 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)

data.trace = trace;
data.skip = trace->skip;
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+ data.ret_stack_index = current->curr_ret_stack;
+#endif

if (tsk != current) {
data.no_sched_functions = 1;
--
1.7.9.5

2015-11-06 06:45:49

by AKASHI Takahiro

Subject: [PATCH v5 3/6] ftrace: allow arch-specific stack tracer

A stack frame may be used in different ways depending on the cpu architecture.
Thus it is not always appropriate to slurp the stack contents, as the current
check_stack() does, in order to calculate a stack index (height) at a given
function call. At least not on arm64.
In addition, there is a possibility that we will mistakenly detect a stale
stack frame which has not yet been overwritten.

This patch makes check_stack() a weak function so that an arch-specific
version can be implemented later.
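
As a reminder of the mechanism (a minimal sketch, not the actual code):
a __weak definition serves as the default and is overridden at link time
by any regular definition an architecture provides:

	/* kernel/trace/trace_stack.c: generic default */
	void __weak check_stack(unsigned long ip, unsigned long *stack)
	{
		/* slurp the stack contents to estimate per-callsite usage */
	}

	/* arch/arm64/kernel/ftrace.c (patch 5): strong definition wins */
	void check_stack(unsigned long ip, unsigned long *stack)
	{
		/* walk stack frames using the prologue analyzer instead */
	}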

Signed-off-by: AKASHI Takahiro <[email protected]>
---
include/linux/ftrace.h | 10 ++++++
kernel/trace/trace_stack.c | 80 ++++++++++++++++++++++++--------------------
2 files changed, 53 insertions(+), 37 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 6cd8c0e..fdc2ca5 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -263,7 +263,17 @@ static inline void ftrace_kill(void) { }
#endif /* CONFIG_FUNCTION_TRACER */

#ifdef CONFIG_STACK_TRACER
+#define STACK_TRACE_ENTRIES 500
+
+struct stack_trace;
+
+extern unsigned stack_trace_index[];
+extern struct stack_trace stack_trace_max;
+extern unsigned long stack_trace_max_size;
+extern arch_spinlock_t max_stack_lock;
+
extern int stack_tracer_enabled;
+void stack_trace_print(void);
int
stack_trace_sysctl(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp,
diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index 8abf1ba..e34392e 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -16,24 +16,22 @@

#include "trace.h"

-#define STACK_TRACE_ENTRIES 500
-
static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES+1] =
{ [0 ... (STACK_TRACE_ENTRIES)] = ULONG_MAX };
-static unsigned stack_dump_index[STACK_TRACE_ENTRIES];
+unsigned stack_trace_index[STACK_TRACE_ENTRIES];

/*
* Reserve one entry for the passed in ip. This will allow
* us to remove most or all of the stack size overhead
* added by the stack tracer itself.
*/
-static struct stack_trace max_stack_trace = {
+struct stack_trace stack_trace_max = {
.max_entries = STACK_TRACE_ENTRIES - 1,
.entries = &stack_dump_trace[0],
};

-static unsigned long max_stack_size;
-static arch_spinlock_t max_stack_lock =
+unsigned long stack_trace_max_size;
+arch_spinlock_t max_stack_lock =
(arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;

static DEFINE_PER_CPU(int, trace_active);
@@ -42,30 +40,38 @@ static DEFINE_MUTEX(stack_sysctl_mutex);
int stack_tracer_enabled;
static int last_stack_tracer_enabled;

-static inline void print_max_stack(void)
+void stack_trace_print(void)
{
long i;
int size;

pr_emerg(" Depth Size Location (%d entries)\n"
" ----- ---- --------\n",
- max_stack_trace.nr_entries);
+ stack_trace_max.nr_entries);

- for (i = 0; i < max_stack_trace.nr_entries; i++) {
+ for (i = 0; i < stack_trace_max.nr_entries; i++) {
if (stack_dump_trace[i] == ULONG_MAX)
break;
- if (i+1 == max_stack_trace.nr_entries ||
+ if (i+1 == stack_trace_max.nr_entries ||
stack_dump_trace[i+1] == ULONG_MAX)
- size = stack_dump_index[i];
+ size = stack_trace_index[i];
else
- size = stack_dump_index[i] - stack_dump_index[i+1];
+ size = stack_trace_index[i] - stack_trace_index[i+1];

- pr_emerg("%3ld) %8d %5d %pS\n", i, stack_dump_index[i],
+ pr_emerg("%3ld) %8d %5d %pS\n", i, stack_trace_index[i],
size, (void *)stack_dump_trace[i]);
}
}

-static inline void
+/*
+ * When arch-specific code overrides this function, the following
+ * data should be filled in, assuming max_stack_lock is held to
+ * prevent concurrent updates.
+ * stack_trace_index[]
+ * stack_trace_max
+ * stack_trace_max_size
+ */
+void __weak
check_stack(unsigned long ip, unsigned long *stack)
{
unsigned long this_size, flags; unsigned long *p, *top, *start;
@@ -78,7 +84,7 @@ check_stack(unsigned long ip, unsigned long *stack)
/* Remove the frame of the tracer */
this_size -= frame_size;

- if (this_size <= max_stack_size)
+ if (this_size <= stack_trace_max_size)
return;

/* we do not handle interrupt stacks yet */
@@ -103,18 +109,18 @@ check_stack(unsigned long ip, unsigned long *stack)
this_size -= tracer_frame;

/* a race could have already updated it */
- if (this_size <= max_stack_size)
+ if (this_size <= stack_trace_max_size)
goto out;

- max_stack_size = this_size;
+ stack_trace_max_size = this_size;

- max_stack_trace.nr_entries = 0;
- max_stack_trace.skip = 3;
+ stack_trace_max.nr_entries = 0;
+ stack_trace_max.skip = 3;

- save_stack_trace(&max_stack_trace);
+ save_stack_trace(&stack_trace_max);

/* Skip over the overhead of the stack tracer itself */
- for (i = 0; i < max_stack_trace.nr_entries; i++) {
+ for (i = 0; i < stack_trace_max.nr_entries; i++) {
if (stack_dump_trace[i] == ip)
break;
}
@@ -134,18 +140,18 @@ check_stack(unsigned long ip, unsigned long *stack)
* loop will only happen once. This code only takes place
* on a new max, so it is far from a fast path.
*/
- while (i < max_stack_trace.nr_entries) {
+ while (i < stack_trace_max.nr_entries) {
int found = 0;

- stack_dump_index[x] = this_size;
+ stack_trace_index[x] = this_size;
p = start;

- for (; p < top && i < max_stack_trace.nr_entries; p++) {
+ for (; p < top && i < stack_trace_max.nr_entries; p++) {
if (stack_dump_trace[i] == ULONG_MAX)
break;
if (*p == stack_dump_trace[i]) {
stack_dump_trace[x] = stack_dump_trace[i++];
- this_size = stack_dump_index[x++] =
+ this_size = stack_trace_index[x++] =
(top - p) * sizeof(unsigned long);
found = 1;
/* Start the search from here */
@@ -160,7 +166,7 @@ check_stack(unsigned long ip, unsigned long *stack)
if (unlikely(!tracer_frame)) {
tracer_frame = (p - stack) *
sizeof(unsigned long);
- max_stack_size -= tracer_frame;
+ stack_trace_max_size -= tracer_frame;
}
}
}
@@ -169,12 +175,12 @@ check_stack(unsigned long ip, unsigned long *stack)
i++;
}

- max_stack_trace.nr_entries = x;
+ stack_trace_max.nr_entries = x;
for (; x < i; x++)
stack_dump_trace[x] = ULONG_MAX;

if (task_stack_end_corrupted(current)) {
- print_max_stack();
+ stack_trace_print();
BUG();
}

@@ -273,7 +279,7 @@ __next(struct seq_file *m, loff_t *pos)
{
long n = *pos - 1;

- if (n > max_stack_trace.nr_entries || stack_dump_trace[n] == ULONG_MAX)
+ if (n > stack_trace_max.nr_entries || stack_dump_trace[n] == ULONG_MAX)
return NULL;

m->private = (void *)n;
@@ -343,9 +349,9 @@ static int t_show(struct seq_file *m, void *v)
seq_printf(m, " Depth Size Location"
" (%d entries)\n"
" ----- ---- --------\n",
- max_stack_trace.nr_entries);
+ stack_trace_max.nr_entries);

- if (!stack_tracer_enabled && !max_stack_size)
+ if (!stack_tracer_enabled && !stack_trace_max_size)
print_disabled(m);

return 0;
@@ -353,17 +359,17 @@ static int t_show(struct seq_file *m, void *v)

i = *(long *)v;

- if (i >= max_stack_trace.nr_entries ||
+ if (i >= stack_trace_max.nr_entries ||
stack_dump_trace[i] == ULONG_MAX)
return 0;

- if (i+1 == max_stack_trace.nr_entries ||
+ if (i+1 == stack_trace_max.nr_entries ||
stack_dump_trace[i+1] == ULONG_MAX)
- size = stack_dump_index[i];
+ size = stack_trace_index[i];
else
- size = stack_dump_index[i] - stack_dump_index[i+1];
+ size = stack_trace_index[i] - stack_trace_index[i+1];

- seq_printf(m, "%3ld) %8d %5d ", i, stack_dump_index[i], size);
+ seq_printf(m, "%3ld) %8d %5d ", i, stack_trace_index[i], size);

trace_lookup_stack(m, i);

@@ -453,7 +459,7 @@ static __init int stack_trace_init(void)
return 0;

trace_create_file("stack_max_size", 0644, d_tracer,
- &max_stack_size, &stack_max_size_fops);
+ &stack_trace_max_size, &stack_max_size_fops);

trace_create_file("stack_trace", 0444, d_tracer,
NULL, &stack_trace_fops);
--
1.7.9.5

2015-11-06 06:45:56

by AKASHI Takahiro

Subject: [PATCH v5 4/6] arm64: insn: add instruction decoders for ldp/stp and add/sub

A function prologue analyzer is a prerequisite for implementing the stack
tracer and getting better views of stack usage on arm64.
To implement a function prologue analyzer, we have to be able to decode,
at least, stp, add, sub and mov instructions.

This patch adds decoders for those instructions; they are used solely
by the stack tracer for now, but are generic enough for other uses.
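
As a usage sketch (a hypothetical caller; only the signatures added below
are assumed), recognizing the 'stp x29, x30, [sp, #-XX]!' at the start of
a prologue might look like:

	u32 insn = *(u32 *)addr;	/* addr: start of the traced function */
	enum aarch64_insn_register reg1, reg2, base;
	enum aarch64_insn_variant variant;
	enum aarch64_insn_ldst_type type;
	int offset;

	if (aarch64_insn_decode_load_store_pair(insn, &reg1, &reg2, &base,
						&offset, &variant, &type) &&
	    type == AARCH64_INSN_LDST_STORE_PAIR_PRE_INDEX &&
	    reg1 == AARCH64_INSN_REG_29 &&
	    reg2 == AARCH64_INSN_REG_30 &&
	    base == AARCH64_INSN_REG_SP)
		pr_info("prologue allocates %d bytes\n", -offset);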

Signed-off-by: AKASHI Takahiro <[email protected]>
---
arch/arm64/include/asm/insn.h | 18 ++++++++
arch/arm64/kernel/insn.c | 102 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 120 insertions(+)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 30e50eb..8d5c538 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -165,6 +165,8 @@ enum aarch64_insn_ldst_type {
AARCH64_INSN_LDST_STORE_PAIR_PRE_INDEX,
AARCH64_INSN_LDST_LOAD_PAIR_POST_INDEX,
AARCH64_INSN_LDST_STORE_PAIR_POST_INDEX,
+ AARCH64_INSN_LDST_LOAD_PAIR,
+ AARCH64_INSN_LDST_STORE_PAIR,
};

enum aarch64_insn_adsb_type {
@@ -225,6 +227,8 @@ static __always_inline u32 aarch64_insn_get_##abbr##_value(void) \

__AARCH64_INSN_FUNCS(str_reg, 0x3FE0EC00, 0x38206800)
__AARCH64_INSN_FUNCS(ldr_reg, 0x3FE0EC00, 0x38606800)
+__AARCH64_INSN_FUNCS(stp, 0x7FC00000, 0x29000000)
+__AARCH64_INSN_FUNCS(ldp, 0x7FC00000, 0x29400000)
__AARCH64_INSN_FUNCS(stp_post, 0x7FC00000, 0x28800000)
__AARCH64_INSN_FUNCS(ldp_post, 0x7FC00000, 0x28C00000)
__AARCH64_INSN_FUNCS(stp_pre, 0x7FC00000, 0x29800000)
@@ -277,6 +281,7 @@ __AARCH64_INSN_FUNCS(hint, 0xFFFFF01F, 0xD503201F)
__AARCH64_INSN_FUNCS(br, 0xFFFFFC1F, 0xD61F0000)
__AARCH64_INSN_FUNCS(blr, 0xFFFFFC1F, 0xD63F0000)
__AARCH64_INSN_FUNCS(ret, 0xFFFFFC1F, 0xD65F0000)
+__AARCH64_INSN_FUNCS(eret, 0xFFFFFFFF, 0xD69F00E0)

#undef __AARCH64_INSN_FUNCS

@@ -370,6 +375,19 @@ bool aarch32_insn_is_wide(u32 insn);
u32 aarch32_insn_extract_reg_num(u32 insn, int offset);
u32 aarch32_insn_mcr_extract_opc2(u32 insn);
u32 aarch32_insn_mcr_extract_crm(u32 insn);
+int aarch64_insn_decode_add_sub_imm(u32 insn,
+ enum aarch64_insn_register *dst,
+ enum aarch64_insn_register *src,
+ int *imm,
+ enum aarch64_insn_variant *variant,
+ enum aarch64_insn_adsb_type *type);
+int aarch64_insn_decode_load_store_pair(u32 insn,
+ enum aarch64_insn_register *reg1,
+ enum aarch64_insn_register *reg2,
+ enum aarch64_insn_register *base,
+ int *offset,
+ enum aarch64_insn_variant *variant,
+ enum aarch64_insn_ldst_type *type);
#endif /* __ASSEMBLY__ */

#endif /* __ASM_INSN_H */
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index c08b9ad..b56a66c 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -33,6 +33,7 @@
#include <asm/insn.h>

#define AARCH64_INSN_SF_BIT BIT(31)
+#define AARCH64_INSN_S_BIT BIT(29)
#define AARCH64_INSN_N_BIT BIT(22)

static int aarch64_insn_encoding_class[] = {
@@ -1141,3 +1142,104 @@ u32 aarch32_insn_mcr_extract_crm(u32 insn)
{
return insn & CRM_MASK;
}
+
+enum aarch64_insn_register aarch64_insn_extract_reg_num(u32 insn,
+ enum aarch64_insn_register_type type)
+{
+ int shift;
+
+ switch (type) {
+ case AARCH64_INSN_REGTYPE_RT:
+ case AARCH64_INSN_REGTYPE_RD:
+ shift = 0;
+ break;
+ case AARCH64_INSN_REGTYPE_RN:
+ shift = 5;
+ break;
+ case AARCH64_INSN_REGTYPE_RT2:
+ case AARCH64_INSN_REGTYPE_RA:
+ shift = 10;
+ break;
+ case AARCH64_INSN_REGTYPE_RM:
+ shift = 16;
+ break;
+ default:
+ pr_err("%s: unknown register type decoding %d\n", __func__,
+ type);
+ return ~0L;
+ }
+
+ return (insn & (GENMASK(4, 0) << shift)) >> shift;
+}
+
+int aarch64_insn_decode_add_sub_imm(u32 insn,
+ enum aarch64_insn_register *dst,
+ enum aarch64_insn_register *src,
+ int *imm,
+ enum aarch64_insn_variant *variant,
+ enum aarch64_insn_adsb_type *type)
+{
+ if (aarch64_insn_is_add_imm(insn))
+ *type = ((insn) & AARCH64_INSN_S_BIT) ?
+ AARCH64_INSN_ADSB_ADD_SETFLAGS :
+ AARCH64_INSN_ADSB_ADD;
+ else if (aarch64_insn_is_sub_imm(insn))
+ *type = ((insn) & AARCH64_INSN_S_BIT) ?
+ AARCH64_INSN_ADSB_SUB_SETFLAGS :
+ AARCH64_INSN_ADSB_SUB;
+ else
+ return 0;
+
+ *variant = (insn & AARCH64_INSN_SF_BIT) ? AARCH64_INSN_VARIANT_64BIT :
+ AARCH64_INSN_VARIANT_32BIT;
+
+ *dst = aarch64_insn_extract_reg_num(insn, AARCH64_INSN_REGTYPE_RD);
+
+ *src = aarch64_insn_extract_reg_num(insn, AARCH64_INSN_REGTYPE_RN);
+
+ /* TODO: the shift field [23:22] is currently ignored */
+ *imm = (int)aarch64_insn_decode_immediate(AARCH64_INSN_IMM_12, insn);
+
+ return 1;
+}
+
+int aarch64_insn_decode_load_store_pair(u32 insn,
+ enum aarch64_insn_register *reg1,
+ enum aarch64_insn_register *reg2,
+ enum aarch64_insn_register *base,
+ int *offset,
+ enum aarch64_insn_variant *variant,
+ enum aarch64_insn_ldst_type *type)
+{
+ int imm;
+
+ if (aarch64_insn_is_stp(insn))
+ *type = AARCH64_INSN_LDST_STORE_PAIR;
+ else if (aarch64_insn_is_stp_post(insn))
+ *type = AARCH64_INSN_LDST_STORE_PAIR_POST_INDEX;
+ else if (aarch64_insn_is_stp_pre(insn))
+ *type = AARCH64_INSN_LDST_STORE_PAIR_PRE_INDEX;
+ else if (aarch64_insn_is_ldp(insn))
+ *type = AARCH64_INSN_LDST_LOAD_PAIR;
+ else if (aarch64_insn_is_ldp_post(insn))
+ *type = AARCH64_INSN_LDST_LOAD_PAIR_POST_INDEX;
+ else if (aarch64_insn_is_ldp_pre(insn))
+ *type = AARCH64_INSN_LDST_LOAD_PAIR_PRE_INDEX;
+ else
+ return 0;
+
+ *variant = (insn & AARCH64_INSN_S_BIT) ? AARCH64_INSN_VARIANT_64BIT :
+ AARCH64_INSN_VARIANT_32BIT;
+
+ *reg1 = aarch64_insn_extract_reg_num(insn, AARCH64_INSN_REGTYPE_RT);
+
+ *reg2 = aarch64_insn_extract_reg_num(insn, AARCH64_INSN_REGTYPE_RT2);
+
+ *base = aarch64_insn_extract_reg_num(insn, AARCH64_INSN_REGTYPE_RN);
+
+ imm = (int)aarch64_insn_decode_immediate(AARCH64_INSN_IMM_7, insn);
+ asm("sbfm %0, %0, 0, 6" : "+r" (imm));
+ *offset = imm * 8;
+
+ return 1;
+}
--
1.7.9.5

2015-11-06 06:46:04

by AKASHI Takahiro

Subject: [PATCH v5 5/6] arm64: ftrace: add arch-specific stack tracer

The stack tracer on arm64, check_stack(), is unique in the following
points:
* analyze a traced function's prologue to estimate a more accurate stack
pointer value, replacing the naive '<child's fp> + 0x10.'
* use walk_stackframe(), instead of slurping the stack contents as the
original check_stack() does, to identify a stack frame and a stack index
(height) for every callsite.

Regarding the function prologue analyzer, there is no guarantee that we can
handle all the possible patterns of function prologue, as gcc does not use
any fixed templates to generate them. 'Instruction scheduling' is another
issue here.
Nevertheless, the current version will surely cover almost all the cases
in the kernel image and give us useful information on stack pointers.

So this patch utilizes the function prologue analyzer only for the stack
tracer, and does not affect the behavior of the existing stacktrace
functions.
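
For example (a hypothetical prologue of the <case 1> shape handled by the
analyzer below), the analyzer returns sp_off = 0x40 and fp_off = 0x10:

	sub	sp, sp, #0x40		/* sp_off = 0x40 */
	stp	x29, x30, [sp, #0x10]
	add	x29, sp, #0x10		/* fp_off = 0x10 */

so the caller's stack pointer at the callsite is recovered as
<child's fp> + sp_off - fp_off = fp + 0x30, rather than the naive fp + 0x10.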

Signed-off-by: AKASHI Takahiro <[email protected]>
---
arch/arm64/include/asm/stacktrace.h | 4 +
arch/arm64/kernel/ftrace.c | 64 ++++++++++++
arch/arm64/kernel/stacktrace.c | 185 ++++++++++++++++++++++++++++++++++-
3 files changed, 250 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 7318f6d..47b4832 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -25,5 +25,9 @@ struct stackframe {
extern int unwind_frame(struct stackframe *frame);
extern void walk_stackframe(struct stackframe *frame,
int (*fn)(struct stackframe *, void *), void *data);
+#ifdef CONFIG_STACK_TRACER
+struct stack_trace;
+extern void save_stack_trace_sp(struct stack_trace *trace, unsigned long *sp);
+#endif

#endif /* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index 314f82d..46bfe37 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -9,6 +9,7 @@
* published by the Free Software Foundation.
*/

+#include <linux/bug.h>
#include <linux/ftrace.h>
#include <linux/swab.h>
#include <linux/uaccess.h>
@@ -16,6 +17,7 @@
#include <asm/cacheflush.h>
#include <asm/ftrace.h>
#include <asm/insn.h>
+#include <asm/stacktrace.h>

#ifdef CONFIG_DYNAMIC_FTRACE
/*
@@ -173,3 +175,65 @@ int ftrace_disable_ftrace_graph_caller(void)
}
#endif /* CONFIG_DYNAMIC_FTRACE */
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
+
+#ifdef CONFIG_STACK_TRACER
+static unsigned long stack_trace_sp[STACK_TRACE_ENTRIES];
+static unsigned long raw_stack_trace_max_size;
+
+void check_stack(unsigned long ip, unsigned long *stack)
+{
+ unsigned long this_size, flags;
+ unsigned long top;
+ int i, j;
+
+ this_size = ((unsigned long)stack) & (THREAD_SIZE-1);
+ this_size = THREAD_SIZE - this_size;
+
+ if (this_size <= raw_stack_trace_max_size)
+ return;
+
+ /* we do not handle an interrupt stack yet */
+ if (!object_is_on_stack(stack))
+ return;
+
+ local_irq_save(flags);
+ arch_spin_lock(&max_stack_lock);
+
+ /* check again */
+ if (this_size <= raw_stack_trace_max_size)
+ goto out;
+
+ /* find out stack frames */
+ stack_trace_max.nr_entries = 0;
+ stack_trace_max.skip = 0;
+ save_stack_trace_sp(&stack_trace_max, stack_trace_sp);
+ stack_trace_max.nr_entries--; /* for the last entry ('-1') */
+
+ /* calculate a stack index for each function */
+ top = ((unsigned long)stack & ~(THREAD_SIZE-1)) + THREAD_SIZE;
+ for (i = 0; i < stack_trace_max.nr_entries; i++)
+ stack_trace_index[i] = top - stack_trace_sp[i];
+ raw_stack_trace_max_size = this_size;
+
+ /* Skip over the overhead of the stack tracer itself */
+ for (i = 0; i < stack_trace_max.nr_entries; i++)
+ if (stack_trace_max.entries[i] == ip)
+ break;
+
+ stack_trace_max.nr_entries -= i;
+ for (j = 0; j < stack_trace_max.nr_entries; j++) {
+ stack_trace_index[j] = stack_trace_index[j + i];
+ stack_trace_max.entries[j] = stack_trace_max.entries[j + i];
+ }
+ stack_trace_max_size = stack_trace_index[0];
+
+ if (task_stack_end_corrupted(current)) {
+ WARN(1, "task stack is corrupted.\n");
+ stack_trace_print();
+ }
+
+ out:
+ arch_spin_unlock(&max_stack_lock);
+ local_irq_restore(flags);
+}
+#endif /* CONFIG_STACK_TRACER */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 5fd3477..4ee052a 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -23,6 +23,144 @@

#include <asm/stacktrace.h>

+#ifdef CONFIG_STACK_TRACER
+/*
+ * This function parses a function prologue of a traced function and
+ * determines its stack size.
+ * A return value indicates a location of @pc in a function prologue.
+ * @return value:
+ * <case 1> <case 1'>
+ * 1:
+ * sub sp, sp, #XX sub sp, sp, #XX
+ * 2:
+ * stp x29, x30, [sp, #YY] stp x29, x30, [sp, #--ZZ]!
+ * 3:
+ * add x29, sp, #YY mov x29, sp
+ * 0:
+ *
+ * <case 2>
+ * 1:
+ * stp x29, x30, [sp, #-XX]!
+ * 3:
+ * mov x29, sp
+ * 0:
+ *
+ * @size: sp offset from caller's sp (XX or XX + ZZ)
+ * @size2: fp offset from new sp (YY or 0)
+ */
+static int analyze_function_prologue(unsigned long pc,
+ unsigned long *size, unsigned long *size2)
+{
+ unsigned long offset;
+ u32 *addr, insn;
+ int pos = -1;
+ enum aarch64_insn_register src, dst, reg1, reg2, base;
+ int imm;
+ enum aarch64_insn_variant variant;
+ enum aarch64_insn_adsb_type adsb_type;
+ enum aarch64_insn_ldst_type ldst_type;
+
+ *size = *size2 = 0;
+
+ if (!pc)
+ goto out;
+
+ if (unlikely(!kallsyms_lookup_size_offset(pc, NULL, &offset)))
+ goto out;
+
+ addr = (u32 *)(pc - offset);
+#ifdef CONFIG_DYNAMIC_FTRACE
+ if (addr == (u32 *)ftrace_call)
+ addr = (u32 *)ftrace_caller;
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+ else if (addr == (u32 *)ftrace_graph_caller)
+#ifdef CONFIG_DYNAMIC_FTRACE
+ addr = (u32 *)ftrace_caller;
+#else
+ addr = (u32 *)_mcount;
+#endif
+#endif
+#endif
+
+ insn = *addr;
+ pos = 1;
+
+ /* analyze a function prologue */
+ while ((unsigned long)addr < pc) {
+ if (aarch64_insn_is_branch_imm(insn) ||
+ aarch64_insn_is_br(insn) ||
+ aarch64_insn_is_blr(insn) ||
+ aarch64_insn_is_ret(insn) ||
+ aarch64_insn_is_eret(insn))
+ /* exiting a basic block */
+ goto out;
+
+ if (aarch64_insn_decode_add_sub_imm(insn, &dst, &src,
+ &imm, &variant, &adsb_type)) {
+ if ((adsb_type == AARCH64_INSN_ADSB_SUB) &&
+ (dst == AARCH64_INSN_REG_SP) &&
+ (src == AARCH64_INSN_REG_SP)) {
+ /*
+ * Starting the following sequence:
+ * sub sp, sp, #xx
+ * stp x29, x30, [sp, #yy]
+ * add x29, sp, #yy
+ */
+ WARN_ON(pos != 1);
+ pos = 2;
+ *size += imm;
+ } else if ((adsb_type == AARCH64_INSN_ADSB_ADD) &&
+ (dst == AARCH64_INSN_REG_29) &&
+ (src == AARCH64_INSN_REG_SP)) {
+ /*
+ * add x29, sp, #yy
+ * or
+ * mov x29, sp
+ */
+ WARN_ON(pos != 3);
+ pos = 0;
+ *size2 = imm;
+
+ break;
+ }
+ } else if (aarch64_insn_decode_load_store_pair(insn,
+ &reg1, &reg2, &base, &imm,
+ &variant, &ldst_type)) {
+ if ((ldst_type ==
+ AARCH64_INSN_LDST_STORE_PAIR_PRE_INDEX) &&
+ (reg1 == AARCH64_INSN_REG_29) &&
+ (reg2 == AARCH64_INSN_REG_30) &&
+ (base == AARCH64_INSN_REG_SP)) {
+ /*
+ * Starting the following sequence:
+ * stp x29, x30, [sp, #-xx]!
+ * mov x29, sp
+ */
+ WARN_ON(!((pos == 1) || (pos == 2)));
+ pos = 3;
+ *size += -imm;
+ } else if ((ldst_type ==
+ AARCH64_INSN_LDST_STORE_PAIR) &&
+ (reg1 == AARCH64_INSN_REG_29) &&
+ (reg2 == AARCH64_INSN_REG_30) &&
+ (base == AARCH64_INSN_REG_SP)) {
+ /*
+ * stp x29, x30, [sp, #yy]
+ */
+ WARN_ON(pos != 2);
+ pos = 3;
+ }
+ }
+
+ addr++;
+ insn = *addr;
+ }
+
+out:
+ return pos;
+}
+#endif
+
/*
* AArch64 PCS assigns the frame pointer to x29.
*
@@ -77,6 +215,9 @@ struct stack_trace_data {
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
unsigned int ret_stack_index;
#endif
+#ifdef CONFIG_STACK_TRACER
+ unsigned long *sp;
+#endif
};

static int save_trace(struct stackframe *frame, void *d)
@@ -106,12 +247,33 @@ static int save_trace(struct stackframe *frame, void *d)
return 0;
}

+#ifdef CONFIG_STACK_TRACER
+ if (data->sp) {
+ if (trace->nr_entries) {
+ unsigned long child_pc, sp_off, fp_off;
+ int pos;
+
+ child_pc = trace->entries[trace->nr_entries - 1];
+ pos = analyze_function_prologue(child_pc,
+ &sp_off, &fp_off);
+ /*
+ * frame->sp - 0x10 is actually a child's fp.
+ * See above.
+ */
+ data->sp[trace->nr_entries] = (pos < 0 ? frame->sp :
+ (frame->sp - 0x10) + sp_off - fp_off);
+ } else {
+ data->sp[0] = frame->sp;
+ }
+ }
+#endif
trace->entries[trace->nr_entries++] = addr;

return trace->nr_entries >= trace->max_entries;
}

-void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
+static void __save_stack_trace_tsk(struct task_struct *tsk,
+ struct stack_trace *trace, unsigned long *stack_dump_sp)
{
struct stack_trace_data data;
struct stackframe frame;
@@ -121,6 +283,9 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
data.ret_stack_index = current->curr_ret_stack;
#endif
+#ifdef CONFIG_STACK_TRACER
+ data.sp = stack_dump_sp;
+#endif

if (tsk != current) {
data.no_sched_functions = 1;
@@ -131,7 +296,8 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
data.no_sched_functions = 0;
frame.fp = (unsigned long)__builtin_frame_address(0);
frame.sp = current_stack_pointer;
- frame.pc = (unsigned long)save_stack_trace_tsk;
+ asm("1:");
+ asm("ldr %0, =1b" : "=r" (frame.pc));
}

walk_stackframe(&frame, save_trace, &data);
@@ -139,9 +305,22 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
trace->entries[trace->nr_entries++] = ULONG_MAX;
}

+void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
+{
+ __save_stack_trace_tsk(tsk, trace, NULL);
+}
+
void save_stack_trace(struct stack_trace *trace)
{
- save_stack_trace_tsk(current, trace);
+ __save_stack_trace_tsk(current, trace, NULL);
}
EXPORT_SYMBOL_GPL(save_stack_trace);
+
+#ifdef CONFIG_STACK_TRACER
+void save_stack_trace_sp(struct stack_trace *trace,
+ unsigned long *stack_dump_sp)
+{
+ __save_stack_trace_tsk(current, trace, stack_dump_sp);
+}
+#endif /* CONFIG_STACK_TRACER */
#endif
--
1.7.9.5

2015-11-06 06:46:15

by AKASHI Takahiro

Subject: [PATCH v5 6/6] arm64: ftrace: add a test of function prologue analyzer

The patch ("arm64: ftrace: add arch-specific stack tracer") introduced
a function prologue analyzer.

Given that there is no fixed template for a function prologue, at least
with gcc for aarch64, a function prologue analyzer may be rather heuristic.
So this patch adds a kernel command line option,
function_prologue_analyzer_test, in order to run a basic test at startup
by executing the analyzer against all the *traceable* functions.

For function_prologue_analyzer_test=2, the output looks like:

po sp fp symbol
== == == ======
0: 0 0x040 0x000 gic_handle_irq+0x20/0xa4
1: 0 0x040 0x000 gic_handle_irq+0x34/0x114
2: 0 0x030 0x000 run_init_process+0x14/0x48
3: 0 0x020 0x000 try_to_run_init_process+0x14/0x58
4: 0 0x080 0x000 do_one_initcall+0x1c/0x194
...
22959: 0 0x020 0x000 tty_lock_slave+0x14/0x3c
22960: 0 0x020 0x000 tty_unlock_slave+0x14/0x3c
function prologue analyzer test: 0 errors

"po" indicates a position of callsite of mcount(), and should be zero
if an analyzer has parsed a function prologue successfully and reached
a location where fp is properly updated.
"sp" is a final offset to a parent's fp at the exit of function prologue.
"fp" is also an ofset against sp at the exit of function prologue.
So normally,
<new sp> = <old fp> + <"sp">
<new fp> = <new sp> - <"fp">

And the last line shows the number of possible errors in the result.

For function_prologue_analyzer_test=1, only the last line will be shown.
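
Taking line 2 above as a worked example (run_init_process, "sp" = 0x030,
"fp" = 0x000):

<new sp> = <old fp> + 0x030
<new fp> = <new sp> - 0x000 = <new sp>

i.e. the frame pointer ends up equal to the new stack pointer, matching the
'mov x29, sp' prologue pattern, whereas a nonzero "fp" would indicate the
'add x29, sp, #YY' pattern.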

Signed-off-by: AKASHI Takahiro <[email protected]>
---
arch/arm64/kernel/stacktrace.c | 52 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 4ee052a..d1a9e5b 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -18,6 +18,7 @@
#include <linux/kernel.h>
#include <linux/export.h>
#include <linux/ftrace.h>
+#include <linux/kallsyms.h>
#include <linux/sched.h>
#include <linux/stacktrace.h>

@@ -322,5 +323,56 @@ void save_stack_trace_sp(struct stack_trace *trace,
{
__save_stack_trace_tsk(current, trace, stack_dump_sp);
}
+
+static int start_analyzer_test __initdata;
+
+static int __init enable_analyzer_test(char *str)
+{
+ get_option(&str, &start_analyzer_test);
+ return 0;
+}
+early_param("function_prologue_analyzer_test", enable_analyzer_test);
+
+static void __init do_test_function_prologue_analyzer(void)
+{
+ extern unsigned long __start_mcount_loc[];
+ extern unsigned long __stop_mcount_loc[];
+ unsigned long count, i, errors;
+ int print_once;
+
+ count = __stop_mcount_loc - __start_mcount_loc;
+ errors = print_once = 0;
+ for (i = 0; i < count; i++) {
+ unsigned long addr, sp_off, fp_off;
+ int pos;
+ bool check;
+ char buf[60];
+
+ addr = __start_mcount_loc[i];
+ pos = analyze_function_prologue(addr, &sp_off, &fp_off);
+ check = ((pos != 0) || !sp_off || (sp_off <= fp_off));
+ if (check)
+ errors++;
+ if (check || (start_analyzer_test > 1)) {
+ if (!print_once) {
+ pr_debug(" po sp fp symbol\n");
+ pr_debug(" == == == ======\n");
+ print_once++;
+ }
+ sprint_symbol(buf, addr);
+ pr_debug("%5ld: %d 0x%03lx 0x%03lx %s\n",
+ i, pos, sp_off, fp_off, buf);
+ }
+ }
+ pr_debug("function prologue analyzer test: %ld errors\n", errors);
+}
+
+static int __init test_function_prologue_analyzer(void)
+{
+ if (start_analyzer_test)
+ do_test_function_prologue_analyzer();
+ return 0;
+}
+late_initcall(test_function_prologue_analyzer);
#endif /* CONFIG_STACK_TRACER */
#endif
--
1.7.9.5

2015-11-06 13:39:49

by Steven Rostedt

Subject: Re: [PATCH v5 3/6] ftrace: allow arch-specific stack tracer

On Fri, 6 Nov 2015 15:44:42 +0900
AKASHI Takahiro <[email protected]> wrote:

> A stack frame may be used in different ways depending on the cpu architecture.
> Thus it is not always appropriate to slurp the stack contents, as the current
> check_stack() does, in order to calculate a stack index (height) at a given
> function call. At least not on arm64.
> In addition, there is a possibility that we will mistakenly detect a stale
> stack frame which has not yet been overwritten.
>
> This patch makes check_stack() a weak function so that an arch-specific
> version can be implemented later.
>
> Signed-off-by: AKASHI Takahiro <[email protected]>

Note, I already accepted this patch. I'll be pushing it to Linus today.

I also added a patch on top of it to rename max_stack_lock to
stack_trace_max_lock to stay consistent with the other global variables
used in this file. You may need to update your code to handle that.

-- Steve

2015-11-09 14:04:19

by Jungseok Lee

Subject: Re: [PATCH v5 2/6] arm64: ftrace: fix a stack tracer's output under function graph tracer

On Nov 6, 2015, at 3:44 PM, AKASHI Takahiro wrote:

Hi Akashi,

> The function graph tracer modifies a return address (LR) in a stack frame
> to hook a function return. This results in many useless entries
> (return_to_handler) showing up in the stack tracer's output.
>
> This patch replaces such entries with the original values preserved in
> current->ret_stack[].
>
> Signed-off-by: AKASHI Takahiro <[email protected]>
> ---
> arch/arm64/include/asm/ftrace.h | 2 ++
> arch/arm64/kernel/stacktrace.c | 21 +++++++++++++++++++++
> 2 files changed, 23 insertions(+)
>
> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
> index c5534fa..3c60f37 100644
> --- a/arch/arm64/include/asm/ftrace.h
> +++ b/arch/arm64/include/asm/ftrace.h
> @@ -28,6 +28,8 @@ struct dyn_arch_ftrace {
>
> extern unsigned long ftrace_graph_call;
>
> +extern void return_to_handler(void);
> +
> static inline unsigned long ftrace_call_adjust(unsigned long addr)
> {
> /*
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index ccb6078..5fd3477 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -17,6 +17,7 @@
> */
> #include <linux/kernel.h>
> #include <linux/export.h>
> +#include <linux/ftrace.h>
> #include <linux/sched.h>
> #include <linux/stacktrace.h>
>
> @@ -73,6 +74,9 @@ struct stack_trace_data {
> struct stack_trace *trace;
> unsigned int no_sched_functions;
> unsigned int skip;
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> + unsigned int ret_stack_index;
> +#endif
> };
>
> static int save_trace(struct stackframe *frame, void *d)
> @@ -81,6 +85,20 @@ static int save_trace(struct stackframe *frame, void *d)
> struct stack_trace *trace = data->trace;
> unsigned long addr = frame->pc;
>
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> + if (addr == (unsigned long)return_to_handler - AARCH64_INSN_SIZE) {

not if (addr == (unsigned long)return_to_handler)?

> + /*
> + * This is a case where function graph tracer has
> + * modified a return address (LR) in a stack frame
> + * to hook a function return.
> + * So replace it with the original value.
> + */
> + frame->pc = addr =
> + current->ret_stack[data->ret_stack_index--].ret
> + - AARCH64_INSN_SIZE;

Ditto. Not without AARCH64_INSN_SIZE?

I've observed many return_to_handler entries without these changes.
Am I missing something?

> + }
> +#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> +
> if (data->no_sched_functions && in_sched_functions(addr))
> return 0;
> if (data->skip) {
> @@ -100,6 +118,9 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
>
> data.trace = trace;
> data.skip = trace->skip;
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> + data.ret_stack_index = current->curr_ret_stack;

Can I get an idea of why current->curr_ret_stack is used instead of
tsk->curr_ret_stack?

Best Regards
Jungseok Lee

2015-11-09 14:24:19

by Jungseok Lee

Subject: Re: [PATCH v5 0/6] arm64: ftrace: fix incorrect output from stack tracer

On Nov 6, 2015, at 3:44 PM, AKASHI Takahiro wrote:

Hi Akashi,

> This is the fifth version of the patch series for fixing the stack tracer
> on arm64. The original issue was reported by Jungseok[1], and then I found
> more issues[2].
>
> We no longer have to care about the original issue because its root cause
> (patch "ARM64: unwind: Fix PC calculation") has been reverted in v4.3.
>
> Here I address all the issues and implement the fixes described in [2],
> except for the interrupt-triggered problems (II-3) and leaf functions (II-5).
> Recent discussions[3] about introducing a dedicated interrupt stack suggest
> that we may avoid walking from an interrupt stack to a process stack.
> (So the interrupt-stack patch is a prerequisite.)
>
> Basically,
> patch1 is a proactive improvement of the function_graph tracer.
> patch2 corresponds to II-4 (functions under the function_graph tracer).
> patch3, 4 and 5 correspond to II-1 (slurping the stack) and II-2 (differences
> between x86 and arm64).
> patch6 is a test of the function prologue analyzer. It does not prove
> the correctness of the analyzer, but it can suggest that all
> the traced functions are handled properly by it.
> (Please note that patch3 has already been queued in Steven's for-next.)
>
> I tested the code with v4.3 + Jungseok's patch v5[4].

I've played with this series on top of the IRQ stack patch and it works well,
at least on my system! In addition, I've run these changes without the IRQ
stack, since that work is still in progress. There I observed a single strange
behavior: a negative stack size around elX_irq. Am I missing something?

My reproduction scenario is simple.

$ sudo echo 1 > /proc/sys/kernel/stack_trace_enabled
$ sudo echo function_graph > /sys/kernel/debug/tracing/current_tracer
$ [ Run any workload ]
$ sudo cat /sys/kernel/debug/tracing/stack_trace

Best Regards
Jungseok Lee

2015-11-10 02:42:40

by AKASHI Takahiro

Subject: Re: [PATCH v5 2/6] arm64: ftrace: fix a stack tracer's output under function graph tracer

On 11/09/2015 11:04 PM, Jungseok Lee wrote:
> On Nov 6, 2015, at 3:44 PM, AKASHI Takahiro wrote:
>
> Hi Akashi,
>
>> The function graph tracer modifies a return address (LR) in a stack frame
>> to hook a function return. This results in many useless entries
>> (return_to_handler) showing up in the stack tracer's output.
>>
>> This patch replaces such entries with the original values preserved in
>> current->ret_stack[].
>>
>> Signed-off-by: AKASHI Takahiro <[email protected]>
>> ---
>> arch/arm64/include/asm/ftrace.h | 2 ++
>> arch/arm64/kernel/stacktrace.c | 21 +++++++++++++++++++++
>> 2 files changed, 23 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
>> index c5534fa..3c60f37 100644
>> --- a/arch/arm64/include/asm/ftrace.h
>> +++ b/arch/arm64/include/asm/ftrace.h
>> @@ -28,6 +28,8 @@ struct dyn_arch_ftrace {
>>
>> extern unsigned long ftrace_graph_call;
>>
>> +extern void return_to_handler(void);
>> +
>> static inline unsigned long ftrace_call_adjust(unsigned long addr)
>> {
>> /*
>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
>> index ccb6078..5fd3477 100644
>> --- a/arch/arm64/kernel/stacktrace.c
>> +++ b/arch/arm64/kernel/stacktrace.c
>> @@ -17,6 +17,7 @@
>> */
>> #include <linux/kernel.h>
>> #include <linux/export.h>
>> +#include <linux/ftrace.h>
>> #include <linux/sched.h>
>> #include <linux/stacktrace.h>
>>
>> @@ -73,6 +74,9 @@ struct stack_trace_data {
>> struct stack_trace *trace;
>> unsigned int no_sched_functions;
>> unsigned int skip;
>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
>> + unsigned int ret_stack_index;
>> +#endif
>> };
>>
>> static int save_trace(struct stackframe *frame, void *d)
>> @@ -81,6 +85,20 @@ static int save_trace(struct stackframe *frame, void *d)
>> struct stack_trace *trace = data->trace;
>> unsigned long addr = frame->pc;
>>
>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
>> + if (addr == (unsigned long)return_to_handler - AARCH64_INSN_SIZE) {
>
> not if (addr == (unsigned long)return_to_handler)?
>
>> + /*
>> + * This is a case where function graph tracer has
>> + * modified a return address (LR) in a stack frame
>> + * to hook a function return.
>> + * So replace it with the original value.
>> + */
>> + frame->pc = addr =
>> + current->ret_stack[data->ret_stack_index--].ret
>> + - AARCH64_INSN_SIZE;
>
> Ditto. Not without AARCH64_INSN_SIZE?
>
> I've observed many return_to_handler entries without these changes.
> Am I missing something?

You're right!
I thought I had tested the patches, but...

>> + }
>> +#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
>> +
>> if (data->no_sched_functions && in_sched_functions(addr))
>> return 0;
>> if (data->skip) {
>> @@ -100,6 +118,9 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
>>
>> data.trace = trace;
>> data.skip = trace->skip;
>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
>> + data.ret_stack_index = current->curr_ret_stack;
>
> Can I get an idea of why current->curr_ret_stack is used instead of
> tsk->curr_ret_stack?

Thanks for pointing this out.
Will fix it, although it works even without the change since
save_stack_trace_sp() is called only in a 'current task' context.

-Takahiro AKASHI

> Best Regards
> Jungseok Lee
>

2015-11-10 02:58:32

by AKASHI Takahiro

Subject: Re: [PATCH v5 0/6] arm64: ftrace: fix incorrect output from stack tracer

On 11/09/2015 11:24 PM, Jungseok Lee wrote:
> On Nov 6, 2015, at 3:44 PM, AKASHI Takahiro wrote:
>
> Hi Akashi,
>
>> This is the fifth version of the patch series for fixing the stack tracer
>> on arm64. The original issue was reported by Jungseok[1], and then I found
>> more issues[2].
>>
>> We no longer have to care about the original issue because its root cause
>> (patch "ARM64: unwind: Fix PC calculation") has been reverted in v4.3.
>>
>> Here I address all the issues and implement the fixes described in [2],
>> except for the interrupt-triggered problems (II-3) and leaf functions (II-5).
>> Recent discussions[3] about introducing a dedicated interrupt stack suggest
>> that we may avoid walking from an interrupt stack to a process stack.
>> (So the interrupt-stack patch is a prerequisite.)
>>
>> Basically,
>> patch1 is a proactive improvement of the function_graph tracer.
>> patch2 corresponds to II-4 (functions under the function_graph tracer).
>> patch3, 4 and 5 correspond to II-1 (slurping the stack) and II-2 (differences
>> between x86 and arm64).
>> patch6 is a test of the function prologue analyzer. It does not prove
>> the correctness of the analyzer, but it can suggest that all
>> the traced functions are handled properly by it.
>> (Please note that patch3 has already been queued in Steven's for-next.)
>>
>> I tested the code with v4.3 + Jungseok's patch v5[4].
>
> I've played with this series on top of the IRQ stack patch and it works well,
> at least on my system! In addition, I've run these changes without the IRQ
> stack, since that work is still in progress. There I observed a single strange
> behavior: a negative stack size around elX_irq. Am I missing something?

You saw a result like:
...
13) 4336 64 gic_handle_irq+0x5c/0xa4
14) 4272 576 el1_irq+0x68/0xd8
15) 3696 -160 smc_hardware_send_pkt+0x278/0x42c

This is the most difficult problem, which I mentioned in II-3 of [1] and
tried to fix. For example, smc_hardware_send_pkt is NOT the function that
was interrupted; _raw_spin_unlock_irqrestore, which is called at
'+0x278/0x42c', is.
Giving a *perfect* solution to this is quite tough (and complicated).
Since you have introduced an interrupt stack, and since an interrupt stack
is not supported even on x86, I removed the related patches.

-Takahiro AKASHI

> My reproduction scenario is simple.
>
> $ sudo echo 1 > /proc/sys/kernel/stack_trace_enabled
> $ sudo echo function_graph > /sys/kernel/debug/tracing/current_tracer
> $ [ Run any workload ]
> $ sudo cat /sys/kernel/debug/tracing/stack_trace
>
> Best Regards
> Jungseok Lee
>

2015-11-10 13:32:53

by Jungseok Lee

Subject: Re: [PATCH v5 0/6] arm64: ftrace: fix incorrect output from stack tracer

On Nov 10, 2015, at 11:58 AM, AKASHI Takahiro wrote:

Hi Akashi,

> On 11/09/2015 11:24 PM, Jungseok Lee wrote:
>> On Nov 6, 2015, at 3:44 PM, AKASHI Takahiro wrote:
>>
>> Hi Akashi,
>>
>>> This is the fifth version of the patch series for fixing the stack tracer
>>> on arm64. The original issue was reported by Jungseok[1], and then I found
>>> more issues[2].
>>>
>>> We no longer have to care about the original issue because its root cause
>>> (patch "ARM64: unwind: Fix PC calculation") has been reverted in v4.3.
>>>
>>> Here I address all the issues and implement the fixes described in [2],
>>> except for the interrupt-triggered problems (II-3) and leaf functions (II-5).
>>> Recent discussions[3] about introducing a dedicated interrupt stack suggest
>>> that we may avoid walking from an interrupt stack to a process stack.
>>> (So the interrupt-stack patch is a prerequisite.)
>>>
>>> Basically,
>>> patch1 is a proactive improvement of the function_graph tracer.
>>> patch2 corresponds to II-4 (functions under the function_graph tracer).
>>> patch3, 4 and 5 correspond to II-1 (slurping the stack) and II-2 (differences
>>> between x86 and arm64).
>>> patch6 is a test of the function prologue analyzer. It does not prove
>>> the correctness of the analyzer, but it can suggest that all
>>> the traced functions are handled properly by it.
>>> (Please note that patch3 has already been queued in Steven's for-next.)
>>>
>>> I tested the code with v4.3 + Jungseok's patch v5[4].
>>
>> I've played with this series on top of the IRQ stack patch and it works well,
>> at least on my system! In addition, I've run these changes without the IRQ
>> stack, since that work is still in progress. There I observed a single strange
>> behavior: a negative stack size around elX_irq. Am I missing something?
>
> You saw the result like:
> ...
> 13) 4336 64 gic_handle_irq+0x5c/0xa4
> 14) 4272 576 el1_irq+0x68/0xd8
> 15) 3696 -160 smc_hardware_send_pkt+0x278/0x42c
>
> This is the most difficult problem that I mentioned in II-3 of [1] and tried to fix.
> For example, smc_hardware_send_pkt is NOT the function interrupted, but
> _raw_spin_unlock_irqstore which is called at '+0x278/0x42c' is.
> Giving a *perfect* solution against it is quite tough (and complicated).
> Since you have introduced interrupt stack and even on x86 an interrupt stack is
> not supported, I removed related patches.

Yes, that is what I've observed. I was not sure whether the behavior was
related to II-3, the interrupted frame, or not. Thanks for the clarification!

Best Regards
Jungseok Lee

2015-11-10 13:40:50

by Jungseok Lee

Subject: Re: [PATCH v5 4/6] arm64: insn: add instruction decoders for ldp/stp and add/sub

On Nov 6, 2015, at 3:44 PM, AKASHI Takahiro wrote:
> A function prologue analyzer is a prerequisite for implementing the stack
> tracer and getting better views of stack usage on arm64.
> To implement a function prologue analyzer, we have to be able to decode,
> at least, stp, add, sub and mov instructions.
>
> This patch adds decoders for those instructions; they are used solely
> by the stack tracer for now, but are generic enough for other uses.
>
> Signed-off-by: AKASHI Takahiro <[email protected]>
> ---
> arch/arm64/include/asm/insn.h | 18 ++++++++
> arch/arm64/kernel/insn.c | 102 +++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 120 insertions(+)
>
> diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
> index 30e50eb..8d5c538 100644
> --- a/arch/arm64/include/asm/insn.h
> +++ b/arch/arm64/include/asm/insn.h
> @@ -165,6 +165,8 @@ enum aarch64_insn_ldst_type {
> AARCH64_INSN_LDST_STORE_PAIR_PRE_INDEX,
> AARCH64_INSN_LDST_LOAD_PAIR_POST_INDEX,
> AARCH64_INSN_LDST_STORE_PAIR_POST_INDEX,
> + AARCH64_INSN_LDST_LOAD_PAIR,
> + AARCH64_INSN_LDST_STORE_PAIR,
> };
>
> enum aarch64_insn_adsb_type {
> @@ -225,6 +227,8 @@ static __always_inline u32 aarch64_insn_get_##abbr##_value(void) \
>
> __AARCH64_INSN_FUNCS(str_reg, 0x3FE0EC00, 0x38206800)
> __AARCH64_INSN_FUNCS(ldr_reg, 0x3FE0EC00, 0x38606800)
> +__AARCH64_INSN_FUNCS(stp, 0x7FC00000, 0x29000000)
> +__AARCH64_INSN_FUNCS(ldp, 0x7FC00000, 0x29400000)
> __AARCH64_INSN_FUNCS(stp_post, 0x7FC00000, 0x28800000)
> __AARCH64_INSN_FUNCS(ldp_post, 0x7FC00000, 0x28C00000)
> __AARCH64_INSN_FUNCS(stp_pre, 0x7FC00000, 0x29800000)
> @@ -277,6 +281,7 @@ __AARCH64_INSN_FUNCS(hint, 0xFFFFF01F, 0xD503201F)
> __AARCH64_INSN_FUNCS(br, 0xFFFFFC1F, 0xD61F0000)
> __AARCH64_INSN_FUNCS(blr, 0xFFFFFC1F, 0xD63F0000)
> __AARCH64_INSN_FUNCS(ret, 0xFFFFFC1F, 0xD65F0000)
> +__AARCH64_INSN_FUNCS(eret, 0xFFFFFFFF, 0xD69F00E0)

According to C4.2.7, the third argument should be 0xD69F03E0; the Rn field
is 0b11111 in the case of eret.

Best Regards
Jungseok Lee

> #undef __AARCH64_INSN_FUNCS
>
> @@ -370,6 +375,19 @@ bool aarch32_insn_is_wide(u32 insn);
> u32 aarch32_insn_extract_reg_num(u32 insn, int offset);
> u32 aarch32_insn_mcr_extract_opc2(u32 insn);
> u32 aarch32_insn_mcr_extract_crm(u32 insn);
> +int aarch64_insn_decode_add_sub_imm(u32 insn,
> + enum aarch64_insn_register *dst,
> + enum aarch64_insn_register *src,
> + int *imm,
> + enum aarch64_insn_variant *variant,
> + enum aarch64_insn_adsb_type *type);
> +int aarch64_insn_decode_load_store_pair(u32 insn,
> + enum aarch64_insn_register *reg1,
> + enum aarch64_insn_register *reg2,
> + enum aarch64_insn_register *base,
> + int *offset,
> + enum aarch64_insn_variant *variant,
> + enum aarch64_insn_ldst_type *type);
> #endif /* __ASSEMBLY__ */
>
> #endif /* __ASM_INSN_H */
> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> index c08b9ad..b56a66c 100644
> --- a/arch/arm64/kernel/insn.c
> +++ b/arch/arm64/kernel/insn.c
> @@ -33,6 +33,7 @@
> #include <asm/insn.h>
>
> #define AARCH64_INSN_SF_BIT BIT(31)
> +#define AARCH64_INSN_S_BIT BIT(29)
> #define AARCH64_INSN_N_BIT BIT(22)
>
> static int aarch64_insn_encoding_class[] = {
> @@ -1141,3 +1142,104 @@ u32 aarch32_insn_mcr_extract_crm(u32 insn)
> {
> return insn & CRM_MASK;
> }
> +
> +enum aarch64_insn_register aarch64_insn_extract_reg_num(u32 insn,
> + enum aarch64_insn_register_type type)
> +{
> + int shift;
> +
> + switch (type) {
> + case AARCH64_INSN_REGTYPE_RT:
> + case AARCH64_INSN_REGTYPE_RD:
> + shift = 0;
> + break;
> + case AARCH64_INSN_REGTYPE_RN:
> + shift = 5;
> + break;
> + case AARCH64_INSN_REGTYPE_RT2:
> + case AARCH64_INSN_REGTYPE_RA:
> + shift = 10;
> + break;
> + case AARCH64_INSN_REGTYPE_RM:
> + shift = 16;
> + break;
> + default:
> + pr_err("%s: unknown register type decoding %d\n", __func__,
> + type);
> + return ~0L;
> + }
> +
> + return (insn & (GENMASK(4, 0) << shift)) >> shift;
> +}
> +
> +int aarch64_insn_decode_add_sub_imm(u32 insn,
> + enum aarch64_insn_register *dst,
> + enum aarch64_insn_register *src,
> + int *imm,
> + enum aarch64_insn_variant *variant,
> + enum aarch64_insn_adsb_type *type)
> +{
> + if (aarch64_insn_is_add_imm(insn))
> + *type = ((insn) & AARCH64_INSN_S_BIT) ?
> + AARCH64_INSN_ADSB_ADD_SETFLAGS :
> + AARCH64_INSN_ADSB_ADD;
> + else if (aarch64_insn_is_sub_imm(insn))
> + *type = ((insn) & AARCH64_INSN_S_BIT) ?
> + AARCH64_INSN_ADSB_SUB_SETFLAGS :
> + AARCH64_INSN_ADSB_SUB;
> + else
> + return 0;
> +
> + *variant = (insn & AARCH64_INSN_SF_BIT) ? AARCH64_INSN_VARIANT_64BIT :
> + AARCH64_INSN_VARIANT_32BIT;
> +
> + *dst = aarch64_insn_extract_reg_num(insn, AARCH64_INSN_REGTYPE_RD);
> +
> + *src = aarch64_insn_extract_reg_num(insn, AARCH64_INSN_REGTYPE_RN);
> +
> + /* TODO: the shift field [23:22] is currently ignored */
> + *imm = (int)aarch64_insn_decode_immediate(AARCH64_INSN_IMM_12, insn);
> +
> + return 1;
> +}
> +
> +int aarch64_insn_decode_load_store_pair(u32 insn,
> + enum aarch64_insn_register *reg1,
> + enum aarch64_insn_register *reg2,
> + enum aarch64_insn_register *base,
> + int *offset,
> + enum aarch64_insn_variant *variant,
> + enum aarch64_insn_ldst_type *type)
> +{
> + int imm;
> +
> + if (aarch64_insn_is_stp(insn))
> + *type = AARCH64_INSN_LDST_STORE_PAIR;
> + else if (aarch64_insn_is_stp_post(insn))
> + *type = AARCH64_INSN_LDST_STORE_PAIR_POST_INDEX;
> + else if (aarch64_insn_is_stp_pre(insn))
> + *type = AARCH64_INSN_LDST_STORE_PAIR_PRE_INDEX;
> + else if (aarch64_insn_is_ldp(insn))
> + *type = AARCH64_INSN_LDST_LOAD_PAIR;
> + else if (aarch64_insn_is_ldp_post(insn))
> + *type = AARCH64_INSN_LDST_LOAD_PAIR_POST_INDEX;
> + else if (aarch64_insn_is_ldp_pre(insn))
> + *type = AARCH64_INSN_LDST_LOAD_PAIR_PRE_INDEX;
> + else
> + return 0;
> +
> + *variant = (insn & AARCH64_INSN_S_BIT) ? AARCH64_INSN_VARIANT_64BIT :
> + AARCH64_INSN_VARIANT_32BIT;
> +
> + *reg1 = aarch64_insn_extract_reg_num(insn, AARCH64_INSN_REGTYPE_RT);
> +
> + *reg2 = aarch64_insn_extract_reg_num(insn, AARCH64_INSN_REGTYPE_RT2);
> +
> + *base = aarch64_insn_extract_reg_num(insn, AARCH64_INSN_REGTYPE_RN);
> +
> + imm = (int)aarch64_insn_decode_immediate(AARCH64_INSN_IMM_7, insn);
> + asm("sbfm %0, %0, 0, 6" : "+r" (imm));
> + *offset = imm * 8;
> +
> + return 1;
> +}
> --
> 1.7.9.5
>

2015-11-10 14:05:11

by Jungseok Lee

Subject: Re: [PATCH v5 5/6] arm64: ftrace: add arch-specific stack tracer

On Nov 6, 2015, at 3:44 PM, AKASHI Takahiro wrote:

Hi Akashi,

> The stack tracer on arm64, check_stack(), is unique in the following
> points:
> * analyze a traced function's prologue to estimate a more accurate stack
> pointer value, replacing the naive '<child's fp> + 0x10.'
> * use walk_stackframe(), instead of slurping the stack contents as the
> original check_stack() does, to identify a stack frame and a stack index
> (height) for every callsite.
>
> Regarding the function prologue analyzer, there is no guarantee that we can
> handle all the possible patterns of function prologue, as gcc does not use
> any fixed templates to generate them. 'Instruction scheduling' is another
> issue here.
> Nevertheless, the current version will surely cover almost all the cases
> in the kernel image and give us useful information on stack pointers.
>
> So this patch utilizes the function prologue analyzer only for the stack
> tracer, and does not affect the behavior of the existing stacktrace
> functions.
>
> Signed-off-by: AKASHI Takahiro <[email protected]>
> ---
> arch/arm64/include/asm/stacktrace.h | 4 +
> arch/arm64/kernel/ftrace.c | 64 ++++++++++++
> arch/arm64/kernel/stacktrace.c | 185 ++++++++++++++++++++++++++++++++++-
> 3 files changed, 250 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
> index 7318f6d..47b4832 100644
> --- a/arch/arm64/include/asm/stacktrace.h
> +++ b/arch/arm64/include/asm/stacktrace.h
> @@ -25,5 +25,9 @@ struct stackframe {
> extern int unwind_frame(struct stackframe *frame);
> extern void walk_stackframe(struct stackframe *frame,
> int (*fn)(struct stackframe *, void *), void *data);
> +#ifdef CONFIG_STACK_TRACER
> +struct stack_trace;
> +extern void save_stack_trace_sp(struct stack_trace *trace, unsigned long *sp);
> +#endif
>
> #endif /* __ASM_STACKTRACE_H */
> diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
> index 314f82d..46bfe37 100644
> --- a/arch/arm64/kernel/ftrace.c
> +++ b/arch/arm64/kernel/ftrace.c
> @@ -9,6 +9,7 @@
> * published by the Free Software Foundation.
> */
>
> +#include <linux/bug.h>
> #include <linux/ftrace.h>
> #include <linux/swab.h>
> #include <linux/uaccess.h>
> @@ -16,6 +17,7 @@
> #include <asm/cacheflush.h>
> #include <asm/ftrace.h>
> #include <asm/insn.h>
> +#include <asm/stacktrace.h>
>
> #ifdef CONFIG_DYNAMIC_FTRACE
> /*
> @@ -173,3 +175,65 @@ int ftrace_disable_ftrace_graph_caller(void)
> }
> #endif /* CONFIG_DYNAMIC_FTRACE */
> #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> +
> +#ifdef CONFIG_STACK_TRACER
> +static unsigned long stack_trace_sp[STACK_TRACE_ENTRIES];
> +static unsigned long raw_stack_trace_max_size;
> +
> +void check_stack(unsigned long ip, unsigned long *stack)
> +{
> + unsigned long this_size, flags;
> + unsigned long top;
> + int i, j;
> +
> + this_size = ((unsigned long)stack) & (THREAD_SIZE-1);
> + this_size = THREAD_SIZE - this_size;
> +
> + if (this_size <= raw_stack_trace_max_size)
> + return;
> +
> + /* we do not handle an interrupt stack yet */
> + if (!object_is_on_stack(stack))
> + return;
> +
> + local_irq_save(flags);
> + arch_spin_lock(&max_stack_lock);
> +
> + /* check again */
> + if (this_size <= raw_stack_trace_max_size)
> + goto out;
> +
> + /* find out stack frames */
> + stack_trace_max.nr_entries = 0;
> + stack_trace_max.skip = 0;
> + save_stack_trace_sp(&stack_trace_max, stack_trace_sp);
> + stack_trace_max.nr_entries--; /* drop the trailing '-1' end marker */
> +
> + /* calculate a stack index for each function */
> + top = ((unsigned long)stack & ~(THREAD_SIZE-1)) + THREAD_SIZE;
> + for (i = 0; i < stack_trace_max.nr_entries; i++)
> + stack_trace_index[i] = top - stack_trace_sp[i];
> + raw_stack_trace_max_size = this_size;
> +
> + /* Skip over the overhead of the stack tracer itself */
> + for (i = 0; i < stack_trace_max.nr_entries; i++)
> + if (stack_trace_max.entries[i] == ip)
> + break;
> +
> + stack_trace_max.nr_entries -= i;
> + for (j = 0; j < stack_trace_max.nr_entries; j++) {
> + stack_trace_index[j] = stack_trace_index[j + i];
> + stack_trace_max.entries[j] = stack_trace_max.entries[j + i];
> + }
> + stack_trace_max_size = stack_trace_index[0];
> +
> + if (task_stack_end_corrupted(current)) {
> + WARN(1, "task stack is corrupted.\n");
> + stack_trace_print();
> + }
> +
> + out:
> + arch_spin_unlock(&max_stack_lock);
> + local_irq_restore(flags);
> +}
> +#endif /* CONFIG_STACK_TRACER */
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 5fd3477..4ee052a 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -23,6 +23,144 @@
>
> #include <asm/stacktrace.h>
>
> +#ifdef CONFIG_STACK_TRACER
> +/*
> + * This function parses a function prologue of a traced function and
> + * determines its stack size.
> + * A return value indicates a location of @pc in a function prologue.
> + * @return value:
> + * <case 1> <case 1'>
> + * 1:
> + * sub sp, sp, #XX sub sp, sp, #XX
> + * 2:
> + * stp x29, x30, [sp, #YY] stp x29, x30, [sp, #-ZZ]!
> + * 3:
> + * add x29, sp, #YY mov x29, sp
> + * 0:
> + *
> + * <case 2>
> + * 1:
> + * stp x29, x30, [sp, #-XX]!
> + * 3:
> + * mov x29, sp
> + * 0:
> + *
> + * @size: sp offset from caller's sp (XX or XX + ZZ)
> + * @size2: fp offset from new sp (YY or 0)
> + */
> +static int analyze_function_prologue(unsigned long pc,
> + unsigned long *size, unsigned long *size2)
> +{
> + unsigned long offset;
> + u32 *addr, insn;
> + int pos = -1;
> + enum aarch64_insn_register src, dst, reg1, reg2, base;
> + int imm;
> + enum aarch64_insn_variant variant;
> + enum aarch64_insn_adsb_type adsb_type;
> + enum aarch64_insn_ldst_type ldst_type;
> +
> + *size = *size2 = 0;
> +
> + if (!pc)
> + goto out;
> +
> + if (unlikely(!kallsyms_lookup_size_offset(pc, NULL, &offset)))
> + goto out;
> +
> + addr = (u32 *)(pc - offset);
> +#ifdef CONFIG_DYNAMIC_FTRACE
> + if (addr == (u32 *)ftrace_call)
> + addr = (u32 *)ftrace_caller;
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> + else if (addr == (u32 *)ftrace_graph_caller)
> +#ifdef CONFIG_DYNAMIC_FTRACE
> + addr = (u32 *)ftrace_caller;
> +#else
> + addr = (u32 *)_mcount;
> +#endif
> +#endif
> +#endif
> +
> + insn = *addr;
> + pos = 1;
> +
> + /* analyze a function prologue */
> + while ((unsigned long)addr < pc) {
> + if (aarch64_insn_is_branch_imm(insn) ||
> + aarch64_insn_is_br(insn) ||
> + aarch64_insn_is_blr(insn) ||
> + aarch64_insn_is_ret(insn) ||
> + aarch64_insn_is_eret(insn))
> + /* exiting a basic block */
> + goto out;
> +
> + if (aarch64_insn_decode_add_sub_imm(insn, &dst, &src,
> + &imm, &variant, &adsb_type)) {
> + if ((adsb_type == AARCH64_INSN_ADSB_SUB) &&
> + (dst == AARCH64_INSN_REG_SP) &&
> + (src == AARCH64_INSN_REG_SP)) {
> + /*
> + * Starting the following sequence:
> + * sub sp, sp, #xx
> + * stp x29, x30, [sp, #yy]
> + * add x29, sp, #yy
> + */
> + WARN_ON(pos != 1);
> + pos = 2;
> + *size += imm;
> + } else if ((adsb_type == AARCH64_INSN_ADSB_ADD) &&
> + (dst == AARCH64_INSN_REG_29) &&
> + (src == AARCH64_INSN_REG_SP)) {
> + /*
> + * add x29, sp, #yy
> + * or
> + * mov x29, sp
> + */
> + WARN_ON(pos != 3);
> + pos = 0;
> + *size2 = imm;
> +
> + break;
> + }
> + } else if (aarch64_insn_decode_load_store_pair(insn,
> + &reg1, &reg2, &base, &imm,
> + &variant, &ldst_type)) {
> + if ((ldst_type ==
> + AARCH64_INSN_LDST_STORE_PAIR_PRE_INDEX) &&
> + (reg1 == AARCH64_INSN_REG_29) &&
> + (reg2 == AARCH64_INSN_REG_30) &&
> + (base == AARCH64_INSN_REG_SP)) {
> + /*
> + * Starting the following sequence:
> + * stp x29, x30, [sp, #-xx]!
> + * mov x29, sp
> + */
> + WARN_ON(!((pos == 1) || (pos == 2)));
> + pos = 3;
> + *size += -imm;
> + } else if ((ldst_type ==
> + AARCH64_INSN_LDST_STORE_PAIR) &&
> + (reg1 == AARCH64_INSN_REG_29) &&
> + (reg2 == AARCH64_INSN_REG_30) &&
> + (base == AARCH64_INSN_REG_SP)) {
> + /*
> + * stp x29, x30, [sp, #yy]
> + */
> + WARN_ON(pos != 2);
> + pos = 3;
> + }
> + }
> +
> + addr++;
> + insn = *addr;
> + }
> +
> +out:
> + return pos;
> +}
> +#endif
> +

A small instruction decoder in software!

From a high level point of view, it makes sense to deal with only these three
main cases. Since it is tough to handle every possible function prologue, given
the computational overhead and implementation complexity involved, the above
change looks like a pretty good tradeoff. This is what I can say for now. I'm
also interested in feedback from other folks.

Best Regards
Jungseok Lee
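
To make the analyzer's bookkeeping concrete, consider a hypothetical <case 1>
prologue (the constants 0x40/0x30 and the names fp/caller_sp are invented for
illustration, not taken from the patch):

	foo:
		sub	sp, sp, #0x40		// pos 1 -> 2, *size += 0x40
		stp	x29, x30, [sp, #0x30]	// pos 2 -> 3
		add	x29, sp, #0x30		// pos 3 -> 0, *size2 = 0x30

With @pc pointing past the "add", analyze_function_prologue() returns 0 with
*size = 0x40 and *size2 = 0x30, so the caller's stack pointer can be recovered
from the frame pointer:

	unsigned long size, size2, caller_sp;

	if (!analyze_function_prologue(pc, &size, &size2))
		caller_sp = fp - size2 + size;	/* fp == new sp + *size2 */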

2015-11-11 04:54:47

by AKASHI Takahiro

[permalink] [raw]
Subject: Re: [PATCH v5 4/6] arm64: insn: add instruction decoders for ldp/stp and add/sub

On 11/10/2015 10:40 PM, Jungseok Lee wrote:
> On Nov 6, 2015, at 3:44 PM, AKASHI Takahiro wrote:
>> A function prologue analyzer is a requisite for implementing stack tracer
>> and getting better views of stack usages on arm64.
>> To implement a function prologue analyzer, we have to be able to decode,
>> at least, stp, add, sub and mov instructions.
>>
>> This patch adds decoders for those instructions, that are used solely
>> by stack tracer for now, but generic enough for other uses.
>>
>> Signed-off-by: AKASHI Takahiro <[email protected]>
>> ---
>> arch/arm64/include/asm/insn.h | 18 ++++++++
>> arch/arm64/kernel/insn.c | 102 +++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 120 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
>> index 30e50eb..8d5c538 100644
>> --- a/arch/arm64/include/asm/insn.h
>> +++ b/arch/arm64/include/asm/insn.h
>> @@ -165,6 +165,8 @@ enum aarch64_insn_ldst_type {
>> AARCH64_INSN_LDST_STORE_PAIR_PRE_INDEX,
>> AARCH64_INSN_LDST_LOAD_PAIR_POST_INDEX,
>> AARCH64_INSN_LDST_STORE_PAIR_POST_INDEX,
>> + AARCH64_INSN_LDST_LOAD_PAIR,
>> + AARCH64_INSN_LDST_STORE_PAIR,
>> };
>>
>> enum aarch64_insn_adsb_type {
>> @@ -225,6 +227,8 @@ static __always_inline u32 aarch64_insn_get_##abbr##_value(void) \
>>
>> __AARCH64_INSN_FUNCS(str_reg, 0x3FE0EC00, 0x38206800)
>> __AARCH64_INSN_FUNCS(ldr_reg, 0x3FE0EC00, 0x38606800)
>> +__AARCH64_INSN_FUNCS(stp, 0x7FC00000, 0x29000000)
>> +__AARCH64_INSN_FUNCS(ldp, 0x7FC00000, 0x29400000)
>> __AARCH64_INSN_FUNCS(stp_post, 0x7FC00000, 0x28800000)
>> __AARCH64_INSN_FUNCS(ldp_post, 0x7FC00000, 0x28C00000)
>> __AARCH64_INSN_FUNCS(stp_pre, 0x7FC00000, 0x29800000)
>> @@ -277,6 +281,7 @@ __AARCH64_INSN_FUNCS(hint, 0xFFFFF01F, 0xD503201F)
>> __AARCH64_INSN_FUNCS(br, 0xFFFFFC1F, 0xD61F0000)
>> __AARCH64_INSN_FUNCS(blr, 0xFFFFFC1F, 0xD63F0000)
>> __AARCH64_INSN_FUNCS(ret, 0xFFFFFC1F, 0xD65F0000)
>> +__AARCH64_INSN_FUNCS(eret, 0xFFFFFFFF, 0xD69F00E0)
>
> According to C4.2.7, the third argument looks like 0xD69F03E0. Rn field is 11111
> in case of eret.

Thanks. Will fix it (section C3.2.7, though).

-Takahiro AKASHI
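
For clarity, with the correction agreed above (the Rn field of ERET must be
0b11111), the encoding line would read:

	__AARCH64_INSN_FUNCS(eret,	0xFFFFFFFF, 0xD69F03E0)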


2015-11-11 05:03:51

by AKASHI Takahiro

[permalink] [raw]
Subject: Re: [PATCH v5 5/6] arm64: ftrace: add arch-specific stack tracer

Jungseok,

On 11/10/2015 11:05 PM, Jungseok Lee wrote:
> On Nov 6, 2015, at 3:44 PM, AKASHI Takahiro wrote:
>
> Hi Akashi,
>
>> [...]
>
> A small instruction decoder in software!
>
> From a high level point of view, it makes sense to deal with only these three
> main cases. Since it is tough to handle every possible function prologue, given
> the computational overhead and implementation complexity involved, the above
> change looks like a pretty good tradeoff. This is what I can say for now. I'm
> also interested in feedback from other folks.

I really appreciate your reviews and comments on the whole series of my patches
(#2 to #5).
May I add Reviewed-by (and Tested-by?) with your name if you like?

-Takahiro AKASHI


2015-11-11 22:57:09

by Jungseok Lee

[permalink] [raw]
Subject: Re: [PATCH v5 5/6] arm64: ftrace: add arch-specific stack tracer

On Nov 11, 2015, at 2:03 PM, AKASHI Takahiro wrote:
> Jungseok,
>
> On 11/10/2015 11:05 PM, Jungseok Lee wrote:
>> On Nov 6, 2015, at 3:44 PM, AKASHI Takahiro wrote:
>>
>> Hi Akashi,
>>
>>> [...]
>>
>> A small instruction decoder in software!
>>
>> From a high level point of view, it makes sense to deal with only these three
>> main cases. Since it is tough to handle every possible function prologue, given
>> the computational overhead and implementation complexity involved, the above
>> change looks like a pretty good tradeoff. This is what I can say for now. I'm
>> also interested in feedback from other folks.
>
> I really appreciate your reviews and comments on the whole series of my patches
> (#2 to #5).
> May I add Reviewed-by (and Tested-by?) with your name if you like?

Sure. Tested-by is also okay.

Regarding patch 6, the option would be useful if this analyzer is accepted. So,
like patch 5, I'm waiting for other folks' responses.

How about adding your document [1] to this series? I think it explains all the
background of this series, such as the motivation, issues, approach, and TODOs.

[1] http://www.spinics.net/lists/arm-kernel/msg444300.html

Best Regards
Jungseok Lee

2015-11-13 15:02:04

by Jungseok Lee

[permalink] [raw]
Subject: Re: [PATCH v5 2/6] arm64: ftrace: fix a stack tracer's output under function graph tracer

(+ Li Bin in CC)

On Nov 10, 2015, at 11:42 AM, AKASHI Takahiro wrote:
> On 11/09/2015 11:04 PM, Jungseok Lee wrote:
>> On Nov 6, 2015, at 3:44 PM, AKASHI Takahiro wrote:
>>
>> Hi Akashi,
>>
>>> Function graph tracer modifies a return address (LR) in a stack frame
>>> to hook a function return. This will result in many useless entries
>>> (return_to_handler) showing up in a stack tracer's output.
>>>
> This patch replaces such entries with the original values preserved in
>>> current->ret_stack[].
>>>
>>> Signed-off-by: AKASHI Takahiro <[email protected]>
>>> ---
>>> arch/arm64/include/asm/ftrace.h | 2 ++
>>> arch/arm64/kernel/stacktrace.c | 21 +++++++++++++++++++++
>>> 2 files changed, 23 insertions(+)
>>>
>>> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
>>> index c5534fa..3c60f37 100644
>>> --- a/arch/arm64/include/asm/ftrace.h
>>> +++ b/arch/arm64/include/asm/ftrace.h
>>> @@ -28,6 +28,8 @@ struct dyn_arch_ftrace {
>>>
>>> extern unsigned long ftrace_graph_call;
>>>
>>> +extern void return_to_handler(void);
>>> +
>>> static inline unsigned long ftrace_call_adjust(unsigned long addr)
>>> {
>>> /*
>>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
>>> index ccb6078..5fd3477 100644
>>> --- a/arch/arm64/kernel/stacktrace.c
>>> +++ b/arch/arm64/kernel/stacktrace.c
>>> @@ -17,6 +17,7 @@
>>> */
>>> #include <linux/kernel.h>
>>> #include <linux/export.h>
>>> +#include <linux/ftrace.h>
>>> #include <linux/sched.h>
>>> #include <linux/stacktrace.h>
>>>
>>> @@ -73,6 +74,9 @@ struct stack_trace_data {
>>> struct stack_trace *trace;
>>> unsigned int no_sched_functions;
>>> unsigned int skip;
>>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>> + unsigned int ret_stack_index;
>>> +#endif
>>> };
>>>
>>> static int save_trace(struct stackframe *frame, void *d)
>>> @@ -81,6 +85,20 @@ static int save_trace(struct stackframe *frame, void *d)
>>> struct stack_trace *trace = data->trace;
>>> unsigned long addr = frame->pc;
>>>
>>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>> + if (addr == (unsigned long)return_to_handler - AARCH64_INSN_SIZE) {
>>
>> not if (addr == (unsigned long)return_to_handler)?
>>
>>> + /*
>>> + * This is a case where function graph tracer has
>>> + * modified a return address (LR) in a stack frame
>>> + * to hook a function return.
>>> + * So replace it to an original value.
>>> + */
>>> + frame->pc = addr =
>>> + current->ret_stack[data->ret_stack_index--].ret
>>> + - AARCH64_INSN_SIZE;
>>
>> Ditto. not without AARCH64_INSN_SIZE?
>>
>> I've observed many return_to_handler without the changes.
>> Am I missing something?
>
> You're right!
> I thought I had tested the patches, but...
>
>>> + }
>>> +#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
>>> +
>>> if (data->no_sched_functions && in_sched_functions(addr))
>>> return 0;
>>> if (data->skip) {
>>> @@ -100,6 +118,9 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
>>>
>>> data.trace = trace;
>>> data.skip = trace->skip;
>>> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>> + data.ret_stack_index = current->curr_ret_stack;
>>
>> Can I get an idea on why current->curr_ret_stack is used instead of
>> tsk->curr_ret_stack?
>
> Thanks for pointing this out.
> Will fix it although it works without a change since save_stack_trace_sp() is
> called only in a 'current task' context.
>
> -Takahiro AKASHI

While reading the function_graph related code on arm64, I've realized that this
issue can be observed from three different sources.

(A) stack tracer of ftrace
(B) perf call trace (perf record with '-g' option)
(C) dump_backtrace

The issue is orthogonal to the commit e306dfd06f and its revert. It seems that
Steve's approach, 7ee991fbc6, would be valid on arm64 and cover all three cases,
as it does on x86. Li Bin posted a patch [1] to solve the issue for case (C)
in Steve's way. This hunk deals with case (A) with its own implementation. But
case (B) is not covered yet. It can be reproduced easily with the following steps.

# echo function_graph > /sys/kernel/debug/tracing/current_tracer
# perf record -g sleep 2
# perf report --call-graph

So, how about adopting Steve's approach on arm64 and then covering all three
cases with it? It would be good from a code consolidation perspective. Note that
the same idea is applied to arch/sh.

Best Regards
Jungseok Lee

[1] https://lkml.org/lkml/2015/10/15/368
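
Folding the corrections agreed earlier in this thread into one place (comparing
against return_to_handler directly, dropping the AARCH64_INSN_SIZE adjustment,
and initializing the index from tsk rather than current), the hunks would look
roughly like the sketch below. This only summarizes where the discussion ended
up; it is not the final patch:

	/* in save_trace() */
	#ifdef CONFIG_FUNCTION_GRAPH_TRACER
		if (addr == (unsigned long)return_to_handler) {
			/*
			 * Function graph tracer replaced this return
			 * address (LR) with return_to_handler; restore
			 * the original value saved in ret_stack[].
			 */
			frame->pc = addr =
				current->ret_stack[data->ret_stack_index--].ret;
		}
	#endif

	/* in save_stack_trace_tsk() */
	#ifdef CONFIG_FUNCTION_GRAPH_TRACER
		data.ret_stack_index = tsk->curr_ret_stack;
	#endif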

2015-11-16 09:24:07

by AKASHI Takahiro

[permalink] [raw]
Subject: Re: [PATCH v5 2/6] arm64: ftrace: fix a stack tracer's output under function graph tracer

Jungseok,

On 11/14/2015 12:01 AM, Jungseok Lee wrote:
> (+ Li Bin in CC)
>
> On Nov 10, 2015, at 11:42 AM, AKASHI Takahiro wrote:
>> On 11/09/2015 11:04 PM, Jungseok Lee wrote:
>>> On Nov 6, 2015, at 3:44 PM, AKASHI Takahiro wrote:
>> [...]
>
> While reading the function_graph related code on arm64, I've realized that this
> issue can be observed from three different sources.
>
> (A) stack tracer of ftrace
> (B) perf call trace (perf record with '-g' option)
> (C) dump_backtrace
>
> The issue is orthogonal to the commit e306dfd06f and its revert. It seems that
> Steve's approach, 7ee991fbc6, would be valid on arm64 and cover all three cases,
> as it does on x86. Li Bin posted a patch [1] to solve the issue for case (C)
> in Steve's way. This hunk deals with case (A) with its own implementation. But
> case (B) is not covered yet. It can be reproduced easily with the following steps.
>
> # echo function_graph > /sys/kernel/debug/tracing/current_tracer
> # perf record -g sleep 2
> # perf report --call-graph
>
> So, how about adopting Steve's approach on arm64 and then covering all three
> cases with it? It would be good from a code consolidation perspective. Note that
> the same idea is applied to arch/sh.

Thank you for pointing this out.
I've already fixed all the cases, (A), (B) and (C), but in a different way.
I think the point is that we should take care of frame->pc under the function
graph tracer in one place, that is, unwind_frame().

After a bit more testing, I will submit a new version.
Then please review it again.

Thanks,
-Takahiro AKASHI


> Best Regards
> Jungseok Lee
>
> [1] https://lkml.org/lkml/2015/10/15/368
>
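
A minimal sketch of the single-place approach AKASHI describes above, assuming
a hypothetical graph_idx field is added to struct stackframe to carry the
ret_stack index; the actual v6 patch may differ in detail:

	int unwind_frame(struct stackframe *frame)
	{
		unsigned long fp = frame->fp;

		/* ... existing stack bounds checks elided ... */

		frame->sp = fp + 0x10;
		frame->fp = *(unsigned long *)(fp);
		frame->pc = *(unsigned long *)(fp + 8);

	#ifdef CONFIG_FUNCTION_GRAPH_TRACER
		if (frame->pc == (unsigned long)return_to_handler)
			/*
			 * Every unwinder (stack tracer, perf call trace,
			 * and dump_backtrace) passes through here, so
			 * restoring the hooked return address once covers
			 * cases (A), (B) and (C).
			 */
			frame->pc = current->ret_stack[frame->graph_idx--].ret;
	#endif

		return 0;
	}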