2015-08-04 07:44:45

by AKASHI Takahiro

[permalink] [raw]
Subject: [RFC v2 0/4] arm64: ftrace: fix incorrect output from stack tracer

See the following threads [1],[2] for the background.

With this patch series, I'm trying to fix several problems I noticed
with stack tracer on arm64. But it is rather experimental, and there still
remained are several issues.

Patch #1 modifies ftrace framework so that we can provide arm64-specific
stack tracer later.
(Well, I know there is some room for further refactoring, but this is
just a prototype code.)
Patch #2 implements this arch_check_stack() using unwind_stackframe() to
traverse stack frames and identify a stack index for each traced function.
Patch #3 addresses an issue that stack tracer doesn't work well in
conjuction with function graph tracer.
Patch #4 addresses an issue that unwind_stackframe() misses a function
that is the one when an exception happens by setting up a stack frame
for exception handler.

See below for the result with those patches.
Issues include:
a) Size of gic_handle_irq() is 48 (#13), but should be 64.
b) We cannot identify a location of function which is being returned
and hooked temporarily by return_to_handler() (#18)
c) A meaningless entry (#62) as well as a bogus size for the first
function, el0_svc (#61)
d) Size doesn't contain a function's "dynamically allocated local
variables," if any, but instead is sumed up in child's size.
(See details in [3].)

I'm afraid that we cannot fix issue b) unless we can do *atomically*
push/pop a return adress in current->ret_stack[], increment/decrement
current->curr_ret_stack and restore lr register.

We will be able to fix issue d) once we know all the locations in
the list, including b).

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/354126.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355920.html
[3] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/358034.html


=== stack tracer with function graph tracer
# cat /sys/kernel/debug/tracing/stack_max_size /sys/kernel/debug/tracing/stack_t
race
5376
Depth Size Location (63 entries)
----- ---- --------
0) 5376 64 notifier_call_chain+0x2c/0x94
1) 5312 48 raw_notifier_call_chain+0x34/0x44
2) 5264 64 timekeeping_update+0xe4/0x150
3) 5200 128 update_wall_time+0x400/0x6e4
4) 5072 80 tick_do_update_jiffies64+0xd8/0x154
5) 4992 32 tick_sched_do_timer+0x50/0x60
6) 4960 48 tick_sched_timer+0x34/0x90
7) 4912 112 __hrtimer_run_queues+0xbc/0x278
8) 4800 112 hrtimer_interrupt+0xa0/0x214
9) 4688 32 arch_timer_handler_phys+0x38/0x48
10) 4656 64 handle_percpu_devid_irq+0x84/0x188
11) 4592 32 generic_handle_irq+0x38/0x54
12) 4560 64 __handle_domain_irq+0x68/0xbc
13) 4496 48 gic_handle_irq+0x38/0x88
14) 4448 288 el1_irq+0x68/0xdc
15) 4160 64 sched_clock+0x38/0x88
16) 4096 32 trace_clock_local+0x1c/0x54
17) 4064 96 ftrace_return_to_handler+0x64/0x120
18) 3968 80 (null)
19) 3888 96 xprt_complete_rqst+0xfc/0x1b0
20) 3792 48 xs_udp_data_ready+0x168/0x170
21) 3744 48 sock_queue_rcv_skb+0x1fc/0x270
22) 3696 48 __udp_queue_rcv_skb+0x58/0x19c
23) 3648 96 udp_queue_rcv_skb+0x258/0x3e4
24) 3552 32 __udp4_lib_rcv+0x424/0x684
25) 3520 48 udp_rcv+0x2c/0x3c
26) 3472 64 ip_local_deliver+0xa4/0x224
27) 3408 128 ip_rcv+0x360/0x580
28) 3280 32 __netif_receive_skb_core+0x4d0/0x80c
29) 3248 80 __netif_receive_skb+0x24/0x84
30) 3168 160 process_backlog+0x9c/0x15c
31) 3008 128 net_rx_action+0x1ec/0x32c
32) 2880 32 __do_softirq+0x10c/0x2e8
33) 2848 32 do_softirq+0x60/0x68
34) 2816 96 __local_bh_enable_ip+0xb0/0xd4
35) 2720 64 ip_finish_output2+0x1b8/0x4b8
36) 2656 64 ip_finish_output+0x124/0x1cc
37) 2592 32 ip_output+0xf0/0x120
38) 2560 48 ip_local_out_sk+0x40/0x50
39) 2512 80 ip_send_skb+0x24/0xbc
40) 2432 272 udp_send_skb+0xfc/0x2f4
41) 2160 48 udp_sendmsg+0x2a8/0x7a0
42) 2112 32 inet_sendmsg+0xa0/0xd0
43) 2080 64 sock_sendmsg+0x30/0x78
44) 2016 176 kernel_sendmsg+0x48/0x58
45) 1840 112 xs_send_kvec+0xdc/0xec
46) 1728 64 xs_sendpages+0x88/0x254
47) 1664 64 xs_udp_send_request+0x4c/0xf0
48) 1600 48 xprt_transmit+0x54/0x264
49) 1552 112 call_transmit+0x168/0x200
50) 1440 48 __rpc_execute+0x68/0x37c
51) 1392 32 rpc_execute+0x6c/0xec
52) 1360 112 rpc_run_task+0x90/0xb4
53) 1248 80 rpc_call_sync+0x60/0xdc
54) 1168 48 nfs_proc_getattr+0x54/0x64
55) 1120 96 __nfs_revalidate_inode+0xd0/0x25c
56) 1024 80 nfs_lookup_revalidate+0x468/0x4a0
57) 944 192 lookup_dcache+0x74/0xc8
58) 752 272 path_openat+0x3f4/0xe8c
59) 480 112 do_filp_open+0x70/0xec
60) 368 64 SyS_openat+0x38/0x48
61) 304 -7600 el0_svc_naked+0x20/0x28
62) 7904 7904 0x471e04

AKASHI Takahiro (4):
ftrace: allow arch-specific check_stack()
arm64: ftrace: add arch-specific stack tracer
arm64: ftrace: fix a stack trace result under function graph tracer
arm64: ftrace: add a stack frame for exception handler

arch/arm64/include/asm/ftrace.h | 2 +
arch/arm64/kernel/entry.S | 4 ++
arch/arm64/kernel/ftrace.c | 37 +++++++++++++++
arch/arm64/kernel/stacktrace.c | 97 ++++++++++++++++++++++++++++++++++++++-
include/linux/stacktrace.h | 4 ++
kernel/trace/trace_stack.c | 88 ++++++++++++++++++++---------------
6 files changed, 194 insertions(+), 38 deletions(-)

--
1.7.9.5


2015-08-04 07:44:54

by AKASHI Takahiro

[permalink] [raw]
Subject: [RFC v2 1/4] ftrace: allow arch-specific check_stack()

A stack frame pointer may be used in a different way depending on
cpu architecture. Thus it is not always appropriate to slurp the stack
contents, as currently done in check_stack(), in order to calcurate
a stack index (height) at a given function call. At least not on arm64.

This patch extract potentially arch-specific code from check_stack()
and puts it into a new arch_check_stack(), which is declared as weak.
So we will be able to add arch-specific and most efficient way of
stack traversing Later.

Signed-off-by: AKASHI Takahiro <[email protected]>
---
include/linux/stacktrace.h | 4 ++
kernel/trace/trace_stack.c | 88 ++++++++++++++++++++++++++------------------
2 files changed, 56 insertions(+), 36 deletions(-)

diff --git a/include/linux/stacktrace.h b/include/linux/stacktrace.h
index 0a34489..bfae605 100644
--- a/include/linux/stacktrace.h
+++ b/include/linux/stacktrace.h
@@ -10,6 +10,10 @@ struct pt_regs;
struct stack_trace {
unsigned int nr_entries, max_entries;
unsigned long *entries;
+#ifdef CONFIG_STACK_TRACER
+ unsigned *index;
+ unsigned long *sp;
+#endif
int skip; /* input argument: How many entries to skip */
};

diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index 3d9356b..021b8c3 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -27,9 +27,10 @@ static unsigned stack_dump_index[STACK_TRACE_ENTRIES];
* us to remove most or all of the stack size overhead
* added by the stack tracer itself.
*/
-static struct stack_trace max_stack_trace = {
+ struct stack_trace max_stack_trace = {
.max_entries = STACK_TRACE_ENTRIES - 1,
.entries = &stack_dump_trace[0],
+ .index = &stack_dump_index[0],
};

static unsigned long max_stack_size;
@@ -65,42 +66,15 @@ static inline void print_max_stack(void)
}
}

-static inline void
-check_stack(unsigned long ip, unsigned long *stack)
+void __weak
+arch_check_stack(unsigned long ip, unsigned long *stack,
+ unsigned long *max_size, unsigned int *tracer_size)
{
- unsigned long this_size, flags; unsigned long *p, *top, *start;
- static int tracer_frame;
- int frame_size = ACCESS_ONCE(tracer_frame);
+ unsigned long *p, *top, *start;
+ unsigned long this_size;
int i, x;

- this_size = ((unsigned long)stack) & (THREAD_SIZE-1);
- this_size = THREAD_SIZE - this_size;
- /* Remove the frame of the tracer */
- this_size -= frame_size;
-
- if (this_size <= max_stack_size)
- return;
-
- /* we do not handle interrupt stacks yet */
- if (!object_is_on_stack(stack))
- return;
-
- local_irq_save(flags);
- arch_spin_lock(&max_stack_lock);
-
- /* In case another CPU set the tracer_frame on us */
- if (unlikely(!frame_size))
- this_size -= tracer_frame;
-
- /* a race could have already updated it */
- if (this_size <= max_stack_size)
- goto out;
-
- max_stack_size = this_size;
-
- max_stack_trace.nr_entries = 0;
max_stack_trace.skip = 3;
-
save_stack_trace(&max_stack_trace);

/* Skip over the overhead of the stack tracer itself */
@@ -116,6 +90,7 @@ check_stack(unsigned long ip, unsigned long *stack)
start = stack;
top = (unsigned long *)
(((unsigned long)start & ~(THREAD_SIZE-1)) + THREAD_SIZE);
+ this_size = *max_size;

/*
* Loop through all the entries. One of the entries may
@@ -146,10 +121,10 @@ check_stack(unsigned long ip, unsigned long *stack)
* out what that is, then figure it out
* now.
*/
- if (unlikely(!tracer_frame)) {
- tracer_frame = (p - stack) *
+ if (unlikely(!*tracer_size)) {
+ *tracer_size = (p - stack) *
sizeof(unsigned long);
- max_stack_size -= tracer_frame;
+ *max_size -= *tracer_size;
}
}
}
@@ -161,6 +136,47 @@ check_stack(unsigned long ip, unsigned long *stack)
max_stack_trace.nr_entries = x;
for (; x < i; x++)
stack_dump_trace[x] = ULONG_MAX;
+}
+
+static inline void
+check_stack(unsigned long ip, unsigned long *stack)
+{
+ unsigned long this_size, flags;
+ static int tracer_frame;
+ int frame_size = ACCESS_ONCE(tracer_frame);
+
+ this_size = ((unsigned long)stack) & (THREAD_SIZE-1);
+ this_size = THREAD_SIZE - this_size;
+ /* for safety, depending on arch_check_stack() */
+ if (this_size < frame_size)
+ return;
+
+ /* Remove the frame of the tracer */
+ this_size -= frame_size;
+
+ if (this_size <= max_stack_size)
+ return;
+
+ /* we do not handle interrupt stacks yet */
+ if (!object_is_on_stack(stack))
+ return;
+
+ local_irq_save(flags);
+ arch_spin_lock(&max_stack_lock);
+
+ /* In case another CPU set the tracer_frame on us */
+ if (unlikely(!frame_size))
+ this_size -= tracer_frame;
+
+ /* a race could have already updated it */
+ if (this_size <= max_stack_size)
+ goto out;
+
+ max_stack_size = this_size;
+
+ max_stack_trace.nr_entries = 0;
+
+ arch_check_stack(ip, stack, &max_stack_size, &tracer_frame);

if (task_stack_end_corrupted(current)) {
print_max_stack();
--
1.7.9.5

2015-08-04 07:45:03

by AKASHI Takahiro

[permalink] [raw]
Subject: [RFC v2 2/4] arm64: ftrace: add arch-specific stack tracer

This patch uses walk_stackframe(), instead of slurping stack contents
as orignal check_stack() does, to identify each stack frame.
return_to_handler() is handled in a special way because it is not
a function, but invoked via function graph tracer by faking a saved lr
register on stack.

Signed-off-by: AKASHI Takahiro <[email protected]>
---
arch/arm64/include/asm/ftrace.h | 2 ++
arch/arm64/kernel/ftrace.c | 37 +++++++++++++++++++++++++
arch/arm64/kernel/stacktrace.c | 58 +++++++++++++++++++++++++++++++++++++--
3 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index 2b43e20..b7d597c 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -29,6 +29,8 @@ struct dyn_arch_ftrace {

extern unsigned long ftrace_graph_call;

+extern void return_to_handler(void);
+
static inline unsigned long ftrace_call_adjust(unsigned long addr)
{
/*
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index c851be7..d812870 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -176,3 +176,40 @@ int ftrace_disable_ftrace_graph_caller(void)
}
#endif /* CONFIG_DYNAMIC_FTRACE */
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
+
+#ifdef CONFIG_STACK_TRACER
+#define stack_top(fp) (((fp) & ~(THREAD_SIZE-1)) + THREAD_SIZE)
+#define stack_index(fp) (stack_top((fp)) - (fp))
+
+extern struct stack_trace max_stack_trace;
+extern void save_stack_trace_index(struct stack_trace *trace);
+
+void arch_check_stack(unsigned long ip, unsigned long *stack,
+ unsigned long *max_size, int *tracer_size)
+{
+ int i, j;
+
+ max_stack_trace.skip = 0;
+ save_stack_trace_index(&max_stack_trace);
+ max_stack_trace.nr_entries--; /* for '-1' entry */
+
+ /* Skip over the overhead of the stack tracer itself */
+ for (i = 0; i < max_stack_trace.nr_entries; i++) {
+ if ((max_stack_trace.entries[i] + FTRACE_STACK_FRAME_OFFSET)
+ == ip)
+ break;
+ }
+
+ if (unlikely(!*tracer_size)) {
+ *tracer_size = stack_index((unsigned long)stack)
+ - max_stack_trace.index[i];
+ *max_size -= *tracer_size;
+ }
+
+ max_stack_trace.nr_entries -= i;
+ for (j = 0; j < max_stack_trace.nr_entries; j++) {
+ max_stack_trace.index[j] = max_stack_trace.index[j + i];
+ max_stack_trace.entries[j] = max_stack_trace.entries[j + i];
+ }
+}
+#endif /* CONFIG_STACK_TRACER */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index bc0689a..496ab0f 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -17,12 +17,15 @@
*/
#include <linux/kernel.h>
#include <linux/export.h>
+#include <linux/ftrace.h>
#include <linux/sched.h>
#include <linux/stacktrace.h>

#include <asm/insn.h>
#include <asm/stacktrace.h>

+#define S_FRAME_SIZE sizeof(struct pt_regs) /* asm-offsets.h */
+
/*
* AArch64 PCS assigns the frame pointer to x29.
*
@@ -78,9 +81,29 @@ struct stack_trace_data {
struct stack_trace *trace;
unsigned int no_sched_functions;
unsigned int skip;
+#ifdef CONFIG_STACK_TRACER
+ int ftracer;
+ int ret_stack_index;
+#endif
};

-static int save_trace(struct stackframe *frame, void *d)
+#ifdef CONFIG_STACK_TRACER
+static void notrace arm64_stack_index(struct stackframe *frame,
+ struct stack_trace_data *data)
+{
+ struct stack_trace *trace = data->trace;
+ unsigned long top;
+ unsigned int x = trace->nr_entries;
+
+ top = (frame->fp & ~(THREAD_SIZE-1)) + THREAD_SIZE;
+ trace->index[x] = top - frame->fp;
+ /* should not go beyond this frame */
+ if (trace->index[x] == THREAD_SIZE)
+ trace->index[x] = 0;
+}
+#endif /* CONFIG_STACK_TRACER */
+
+static int notrace save_trace(struct stackframe *frame, void *d)
{
struct stack_trace_data *data = d;
struct stack_trace *trace = data->trace;
@@ -93,7 +116,13 @@ static int save_trace(struct stackframe *frame, void *d)
return 0;
}

- trace->entries[trace->nr_entries++] = addr;
+ trace->entries[trace->nr_entries] = addr;
+#ifdef CONFIG_STACK_TRACER
+ if (data->ftracer) {
+ arm64_stack_index(frame, data);
+ }
+#endif
+ trace->nr_entries++;

return trace->nr_entries >= trace->max_entries;
}
@@ -105,6 +134,9 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)

data.trace = trace;
data.skip = trace->skip;
+#ifdef CONFIG_STACK_TRACER
+ data.ftracer = 0;
+#endif

if (tsk != current) {
data.no_sched_functions = 1;
@@ -128,4 +160,26 @@ void save_stack_trace(struct stack_trace *trace)
save_stack_trace_tsk(current, trace);
}
EXPORT_SYMBOL_GPL(save_stack_trace);
+
+#ifdef CONFIG_STACK_TRACER
+void notrace save_stack_trace_index(struct stack_trace *trace)
+{
+ struct stack_trace_data data;
+ struct stackframe frame;
+
+ data.trace = trace;
+ data.skip = trace->skip;
+ data.ftracer = 1;
+ data.ret_stack_index = current->curr_ret_stack;
+
+ data.no_sched_functions = 0;
+ frame.fp = (unsigned long)__builtin_frame_address(0);
+ frame.sp = current_stack_pointer;
+ frame.pc = (unsigned long)save_stack_trace_index;
+
+ walk_stackframe(&frame, save_trace, &data);
+ if (trace->nr_entries < trace->max_entries)
+ trace->entries[trace->nr_entries++] = ULONG_MAX;
+}
+#endif /* CONFIG_STACK_TRACER */
#endif
--
1.7.9.5

2015-08-04 07:45:12

by AKASHI Takahiro

[permalink] [raw]
Subject: [RFC v2 3/4] arm64: ftrace: fix a stack trace result under function graph tracer

Function graph tracer modifies a saved lr register on stack in order
to hook a function return. This results in finding many bogus entries
in a stack trace list.

This patch replaces such entries with originals values stored in
current->ret_stack[].

Signed-off-by: AKASHI Takahiro <[email protected]>
---
arch/arm64/kernel/stacktrace.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 496ab0f..d1790eb 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -100,6 +100,28 @@ static void notrace arm64_stack_index(struct stackframe *frame,
/* should not go beyond this frame */
if (trace->index[x] == THREAD_SIZE)
trace->index[x] = 0;
+
+ if (trace->entries[x] ==
+ ((unsigned long)return_to_handler + 0x8)) {
+ /*
+ * This is a case where return_to_handler() is calling
+ * ftrace_return_to_handler(). As we are already on
+ * an original function's stack, we have no way to fetch
+ * a correct pc value, just skip it.
+ */
+ trace->entries[x] = 0x0;
+ } else if (trace->entries[x] ==
+ (unsigned long)return_to_handler - AARCH64_INSN_SIZE) {
+ /*
+ * This is a case where function graph tracer has
+ * modified lr register on a stack to hook a function
+ * return.
+ * So replace it to original value.
+ */
+ trace->entries[x] =
+ current->ret_stack[data->ret_stack_index--].ret
+ - AARCH64_INSN_SIZE;
+ }
}
#endif /* CONFIG_STACK_TRACER */

--
1.7.9.5

2015-08-04 07:45:20

by AKASHI Takahiro

[permalink] [raw]
Subject: [RFC v2 4/4] arm64: ftrace: add a stack frame for exception handler

On arm64, an exception handler use the same stack as in non-exception
contexts, but doesn't create a stack frame for elx_xx entry, only updating
sp register. This behavior results in save_stace_trace() missing a function
that is the one when an exception happens.

This patch creates a stack frame for this case, and puts an additional
entry for the function in a stack trace list.

Signed-off-by: AKASHI Takahiro <[email protected]>
---
arch/arm64/kernel/entry.S | 4 ++++
arch/arm64/kernel/stacktrace.c | 17 +++++++++++++++++
2 files changed, 21 insertions(+)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index f860bfd..aacb6c6 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -107,6 +107,8 @@
str x21, [sp, #S_SYSCALLNO]
.endif

+ /* create a stack frame for stack tracer */
+ mov x29, sp
/*
* Registers that may be useful after this macro is invoked:
*
@@ -737,3 +739,5 @@ ENTRY(sys_rt_sigreturn_wrapper)
mov x0, sp
b sys_rt_sigreturn
ENDPROC(sys_rt_sigreturn_wrapper)
+
+ENTRY(end_of_vectors)
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index d1790eb..22ce7c9 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -25,6 +25,10 @@
#include <asm/stacktrace.h>

#define S_FRAME_SIZE sizeof(struct pt_regs) /* asm-offsets.h */
+#define S_FP offsetof(struct pt_regs, regs[29])
+#define S_LR offsetof(struct pt_regs, regs[30])
+
+extern unsigned int *vectors, *end_of_vectors;

/*
* AArch64 PCS assigns the frame pointer to x29.
@@ -50,6 +54,19 @@ int notrace unwind_frame(struct stackframe *frame)
if (fp < low || fp > high - 0x18 || fp & 0xf)
return -EINVAL;

+ if ((frame->pc >= (unsigned long)&vectors) &&
+ (frame->pc < (unsigned long)&end_of_vectors)) {
+ /*
+ * Expection handler does not use a normal format of
+ * stack frame, but allocates struct pt_regs.
+ */
+ frame->sp = frame->sp + S_FRAME_SIZE;
+ frame->fp = *(unsigned long *)(fp + S_FP);
+ frame->pc = *(unsigned long *)(fp + S_LR);
+
+ return 0;
+ }
+
frame->sp = fp + 0x10;
frame->fp = *(unsigned long *)(fp);
/*
--
1.7.9.5

2015-08-11 14:52:04

by Jungseok Lee

[permalink] [raw]
Subject: Re: [RFC v2 0/4] arm64: ftrace: fix incorrect output from stack tracer

On Aug 4, 2015, at 4:44 PM, AKASHI Takahiro wrote:

Hi Akashi,

> See the following threads [1],[2] for the background.
>
> With this patch series, I'm trying to fix several problems I noticed
> with stack tracer on arm64. But it is rather experimental, and there still
> remained are several issues.
>
> Patch #1 modifies ftrace framework so that we can provide arm64-specific
> stack tracer later.
> (Well, I know there is some room for further refactoring, but this is
> just a prototype code.)
> Patch #2 implements this arch_check_stack() using unwind_stackframe() to
> traverse stack frames and identify a stack index for each traced function.
> Patch #3 addresses an issue that stack tracer doesn't work well in
> conjuction with function graph tracer.
> Patch #4 addresses an issue that unwind_stackframe() misses a function
> that is the one when an exception happens by setting up a stack frame
> for exception handler.
>
> See below for the result with those patches.
> Issues include:
> a) Size of gic_handle_irq() is 48 (#13), but should be 64.
> b) We cannot identify a location of function which is being returned
> and hooked temporarily by return_to_handler() (#18)
> c) A meaningless entry (#62) as well as a bogus size for the first
> function, el0_svc (#61)
> d) Size doesn't contain a function's "dynamically allocated local
> variables," if any, but instead is sumed up in child's size.
> (See details in [3].)
>
> I'm afraid that we cannot fix issue b) unless we can do *atomically*
> push/pop a return adress in current->ret_stack[], increment/decrement
> current->curr_ret_stack and restore lr register.
>
> We will be able to fix issue d) once we know all the locations in
> the list, including b).
>
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/354126.html
> [2] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355920.html
> [3] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/358034.html

I hope I'm not too late..

This series looks written on top of the hunk in the end of this reply.

I've strongly agreed with your opinion as digging out this issue. We need to analyse
the first instruction of each function, "stp x29, x30, [sp, #val]!", in order to
solve this problem clearly.

How about decoding the store-pair instruction? If so, we could record a correct position
into stack_dump_index. That is, in your ascii art, top-sp0 and top-sp1 are written into
stack_dump_index[i+1] and stack_dump_index[i] respectively.

sp2 +-------+ <--------- func-2's stackframe
| |
| |
fp2 +-------+
| fp1 |
+-------+ <-- p1 (*p1 == stack_dump_trace[i] == lr1)
| lr1 |
+-------+
| |
| func-2's local variables
| |
sp1 +-------+ <--------- func-1(lr1)'s stackframe
| | (stack_dump_index[i] = top - p1)
| func-1's dynamic local variables
| |
fp1 +-------+
| fp0 |
+-------+ <-- p0 (*p0 == stack_dump_trace[i+1] == lr0)
| lr0 |
+-------+
| |
| func-1's local variables
| |
sp0 +-------+ <--------- func-0(lr0)'s stackframe
| | (stack_dump_index[i+1] = top - p0)
| |
*-------+ top

Best Regards
Jungseok Lee

----8<----
diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index c5534fa..2b43e20 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -13,8 +13,9 @@

#include <asm/insn.h>

-#define MCOUNT_ADDR ((unsigned long)_mcount)
-#define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
+#define MCOUNT_ADDR ((unsigned long)_mcount)
+#define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
+#define FTRACE_STACK_FRAME_OFFSET AARCH64_INSN_SIZE

#ifndef __ASSEMBLY__
#include <linux/compat.h>
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 407991b..9ab67af 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -20,6 +20,7 @@
#include <linux/sched.h>
#include <linux/stacktrace.h>

+#include <asm/insn.h>
#include <asm/stacktrace.h>

/*
@@ -52,7 +53,7 @@ int notrace unwind_frame(struct stackframe *frame)
* -4 here because we care about the PC at time of bl,
* not where the return will go.
*/
- frame->pc = *(unsigned long *)(fp + 8) - 4;
+ frame->pc = *(unsigned long *)(fp + 8) - AARCH64_INSN_SIZE;

return 0;
}
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 1da6029..6566201 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -260,6 +260,9 @@ static inline void ftrace_kill(void) { }
#endif /* CONFIG_FUNCTION_TRACER */

#ifdef CONFIG_STACK_TRACER
+#ifndef FTRACE_STACK_FRAME_OFFSET
+#define FTRACE_STACK_FRAME_OFFSET 0
+#endif
extern int stack_tracer_enabled;
int
stack_trace_sysctl(struct ctl_table *table, int write,
diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index b746399..30521ea 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -105,7 +105,7 @@ check_stack(unsigned long ip, unsigned long *stack)

/* Skip over the overhead of the stack tracer itself */
for (i = 0; i < max_stack_trace.nr_entries; i++) {
- if (stack_dump_trace[i] == ip)
+ if ((stack_dump_trace[i] + FTRACE_STACK_FRAME_OFFSET) == ip)
break;
}

@@ -133,7 +133,8 @@ check_stack(unsigned long ip, unsigned long *stack)
for (; p < top && i < max_stack_trace.nr_entries; p++) {
if (stack_dump_trace[i] == ULONG_MAX)
break;
- if (*p == stack_dump_trace[i]) {
+ if (*p == (stack_dump_trace[i]
+ + FTRACE_STACK_FRAME_OFFSET)) {
stack_dump_trace[x] = stack_dump_trace[i++];
this_size = stack_dump_index[x++] =
(top - p) * sizeof(unsigned long);
----8<----

2015-08-11 14:57:58

by Jungseok Lee

[permalink] [raw]
Subject: Re: [RFC v2 4/4] arm64: ftrace: add a stack frame for exception handler

On Aug 4, 2015, at 4:44 PM, AKASHI Takahiro wrote:

Hi Akashi,

> On arm64, an exception handler use the same stack as in non-exception
> contexts, but doesn't create a stack frame for elx_xx entry, only updating
> sp register. This behavior results in save_stace_trace() missing a function
> that is the one when an exception happens.
>
> This patch creates a stack frame for this case, and puts an additional
> entry for the function in a stack trace list.
>
> Signed-off-by: AKASHI Takahiro <[email protected]>
> ---
> arch/arm64/kernel/entry.S | 4 ++++
> arch/arm64/kernel/stacktrace.c | 17 +++++++++++++++++
> 2 files changed, 21 insertions(+)
>
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index f860bfd..aacb6c6 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -107,6 +107,8 @@
> str x21, [sp, #S_SYSCALLNO]
> .endif
>
> + /* create a stack frame for stack tracer */
> + mov x29, sp
> /*
> * Registers that may be useful after this macro is invoked:
> *
> @@ -737,3 +739,5 @@ ENTRY(sys_rt_sigreturn_wrapper)
> mov x0, sp
> b sys_rt_sigreturn
> ENDPROC(sys_rt_sigreturn_wrapper)
> +
> +ENTRY(end_of_vectors)
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index d1790eb..22ce7c9 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -25,6 +25,10 @@
> #include <asm/stacktrace.h>
>
> #define S_FRAME_SIZE sizeof(struct pt_regs) /* asm-offsets.h */
> +#define S_FP offsetof(struct pt_regs, regs[29])
> +#define S_LR offsetof(struct pt_regs, regs[30])
> +
> +extern unsigned int *vectors, *end_of_vectors;
>
> /*
> * AArch64 PCS assigns the frame pointer to x29.
> @@ -50,6 +54,19 @@ int notrace unwind_frame(struct stackframe *frame)
> if (fp < low || fp > high - 0x18 || fp & 0xf)
> return -EINVAL;
>
> + if ((frame->pc >= (unsigned long)&vectors) &&
> + (frame->pc < (unsigned long)&end_of_vectors)) {
> + /*
> + * Expection handler does not use a normal format of
> + * stack frame, but allocates struct pt_regs.
> + */
> + frame->sp = frame->sp + S_FRAME_SIZE;
> + frame->fp = *(unsigned long *)(fp + S_FP);
> + frame->pc = *(unsigned long *)(fp + S_LR);

Not frame->pc = *(unsigned long *)(fp + S_PC)? Don't we need to look up elr_el1
since this is an exception?

> +
> + return 0;
> + }
> +
> frame->sp = fp + 0x10;

I'm just curious about this constant, 0x10. Do you have an idea on this value?
As reviewing objdump of vmlinux, it looks needed to analyze the first store-pair
instruction of each function.

Please correct me if I'm wrong.

Best Regards
Jungseok Lee-

2015-08-11 17:03:27

by Will Deacon

[permalink] [raw]
Subject: Re: [RFC v2 1/4] ftrace: allow arch-specific check_stack()

On Tue, Aug 04, 2015 at 08:44:06AM +0100, AKASHI Takahiro wrote:
> A stack frame pointer may be used in a different way depending on
> cpu architecture. Thus it is not always appropriate to slurp the stack
> contents, as currently done in check_stack(), in order to calcurate
> a stack index (height) at a given function call. At least not on arm64.
>
> This patch extract potentially arch-specific code from check_stack()
> and puts it into a new arch_check_stack(), which is declared as weak.
> So we will be able to add arch-specific and most efficient way of
> stack traversing Later.
>
> Signed-off-by: AKASHI Takahiro <[email protected]>

If arm64 is the only architecture behaving differently, then I'm happy
to reconsider the fix to unwind_frame that we merged in e306dfd06fcb
("ARM64: unwind: Fix PC calculation"). I'd have thought any architecture
with a branch-and-link instruction would potentially have the same issue,
so we could just be fixing things in the wrong place if ftrace works
everywhere else.

Will

2015-08-17 04:51:07

by AKASHI Takahiro

[permalink] [raw]
Subject: Re: [RFC v2 0/4] arm64: ftrace: fix incorrect output from stack tracer

Hi

On 08/11/2015 11:52 PM, Jungseok Lee wrote:
> On Aug 4, 2015, at 4:44 PM, AKASHI Takahiro wrote:
>
> Hi Akashi,
>
>> See the following threads [1],[2] for the background.
>>
>> With this patch series, I'm trying to fix several problems I noticed
>> with stack tracer on arm64. But it is rather experimental, and there still
>> remained are several issues.
>>
>> Patch #1 modifies ftrace framework so that we can provide arm64-specific
>> stack tracer later.
>> (Well, I know there is some room for further refactoring, but this is
>> just a prototype code.)
>> Patch #2 implements this arch_check_stack() using unwind_stackframe() to
>> traverse stack frames and identify a stack index for each traced function.
>> Patch #3 addresses an issue that stack tracer doesn't work well in
>> conjuction with function graph tracer.
>> Patch #4 addresses an issue that unwind_stackframe() misses a function
>> that is the one when an exception happens by setting up a stack frame
>> for exception handler.
>>
>> See below for the result with those patches.
>> Issues include:
>> a) Size of gic_handle_irq() is 48 (#13), but should be 64.
>> b) We cannot identify a location of function which is being returned
>> and hooked temporarily by return_to_handler() (#18)
>> c) A meaningless entry (#62) as well as a bogus size for the first
>> function, el0_svc (#61)
>> d) Size doesn't contain a function's "dynamically allocated local
>> variables," if any, but instead is sumed up in child's size.
>> (See details in [3].)
>>
>> I'm afraid that we cannot fix issue b) unless we can do *atomically*
>> push/pop a return adress in current->ret_stack[], increment/decrement
>> current->curr_ret_stack and restore lr register.
>>
>> We will be able to fix issue d) once we know all the locations in
>> the list, including b).
>>
>> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/354126.html
>> [2] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355920.html
>> [3] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/358034.html
>
> I hope I'm not too late..

I was on vacation last week.

> This series looks written on top of the hunk in the end of this reply.
>
> I've strongly agreed with your opinion as digging out this issue. We need to analyse
> the first instruction of each function, "stp x29, x30, [sp, #val]!", in order to
> solve this problem clearly.

As far as I notice, the following is not the only prologue:
stp x29,x30, [sp,#-xx]!
mov x29,sp
but some functions may have another one like:
sub sp, sp, #xx
stp x29,x30, [sp, #16]
add x29,sp, #16
Even worse, I see some variant, for example in trace_graph_entry(),
adrp x2, 0x...
stp x29,x30,[sp,#-xx]!
add x4, x2, #..
mov x29,x30

So parsing the function prologue perfectly is a bit more complicated than imagined.
I'm now asking some gcc guy for more information.

Thanks,
-Takahiro AKASHI

> How about decoding the store-pair instruction? If so, we could record a correct position
> into stack_dump_index. That is, in your ascii art, top-sp0 and top-sp1 are written into
> stack_dump_index[i+1] and stack_dump_index[i] respectively.
>
> sp2 +-------+ <--------- func-2's stackframe
> | |
> | |
> fp2 +-------+
> | fp1 |
> +-------+ <-- p1 (*p1 == stack_dump_trace[i] == lr1)
> | lr1 |
> +-------+
> | |
> | func-2's local variables
> | |
> sp1 +-------+ <--------- func-1(lr1)'s stackframe
> | | (stack_dump_index[i] = top - p1)
> | func-1's dynamic local variables
> | |
> fp1 +-------+
> | fp0 |
> +-------+ <-- p0 (*p0 == stack_dump_trace[i+1] == lr0)
> | lr0 |
> +-------+
> | |
> | func-1's local variables
> | |
> sp0 +-------+ <--------- func-0(lr0)'s stackframe
> | | (stack_dump_index[i+1] = top - p0)
> | |
> *-------+ top
>
> Best Regards
> Jungseok Lee
>
> ----8<----
> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
> index c5534fa..2b43e20 100644
> --- a/arch/arm64/include/asm/ftrace.h
> +++ b/arch/arm64/include/asm/ftrace.h
> @@ -13,8 +13,9 @@
>
> #include <asm/insn.h>
>
> -#define MCOUNT_ADDR ((unsigned long)_mcount)
> -#define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
> +#define MCOUNT_ADDR ((unsigned long)_mcount)
> +#define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
> +#define FTRACE_STACK_FRAME_OFFSET AARCH64_INSN_SIZE
>
> #ifndef __ASSEMBLY__
> #include <linux/compat.h>
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 407991b..9ab67af 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -20,6 +20,7 @@
> #include <linux/sched.h>
> #include <linux/stacktrace.h>
>
> +#include <asm/insn.h>
> #include <asm/stacktrace.h>
>
> /*
> @@ -52,7 +53,7 @@ int notrace unwind_frame(struct stackframe *frame)
> * -4 here because we care about the PC at time of bl,
> * not where the return will go.
> */
> - frame->pc = *(unsigned long *)(fp + 8) - 4;
> + frame->pc = *(unsigned long *)(fp + 8) - AARCH64_INSN_SIZE;
>
> return 0;
> }
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 1da6029..6566201 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -260,6 +260,9 @@ static inline void ftrace_kill(void) { }
> #endif /* CONFIG_FUNCTION_TRACER */
>
> #ifdef CONFIG_STACK_TRACER
> +#ifndef FTRACE_STACK_FRAME_OFFSET
> +#define FTRACE_STACK_FRAME_OFFSET 0
> +#endif
> extern int stack_tracer_enabled;
> int
> stack_trace_sysctl(struct ctl_table *table, int write,
> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
> index b746399..30521ea 100644
> --- a/kernel/trace/trace_stack.c
> +++ b/kernel/trace/trace_stack.c
> @@ -105,7 +105,7 @@ check_stack(unsigned long ip, unsigned long *stack)
>
> /* Skip over the overhead of the stack tracer itself */
> for (i = 0; i < max_stack_trace.nr_entries; i++) {
> - if (stack_dump_trace[i] == ip)
> + if ((stack_dump_trace[i] + FTRACE_STACK_FRAME_OFFSET) == ip)
> break;
> }
>
> @@ -133,7 +133,8 @@ check_stack(unsigned long ip, unsigned long *stack)
> for (; p < top && i < max_stack_trace.nr_entries; p++) {
> if (stack_dump_trace[i] == ULONG_MAX)
> break;
> - if (*p == stack_dump_trace[i]) {
> + if (*p == (stack_dump_trace[i]
> + + FTRACE_STACK_FRAME_OFFSET)) {
> stack_dump_trace[x] = stack_dump_trace[i++];
> this_size = stack_dump_index[x++] =
> (top - p) * sizeof(unsigned long);
> ----8<----
>

2015-08-17 05:21:29

by AKASHI Takahiro

[permalink] [raw]
Subject: Re: [RFC v2 4/4] arm64: ftrace: add a stack frame for exception handler

On 08/11/2015 11:57 PM, Jungseok Lee wrote:
> On Aug 4, 2015, at 4:44 PM, AKASHI Takahiro wrote:
>
> Hi Akashi,
>
>> On arm64, an exception handler use the same stack as in non-exception
>> contexts, but doesn't create a stack frame for elx_xx entry, only updating
>> sp register. This behavior results in save_stace_trace() missing a function
>> that is the one when an exception happens.
>>
>> This patch creates a stack frame for this case, and puts an additional
>> entry for the function in a stack trace list.
>>
>> Signed-off-by: AKASHI Takahiro <[email protected]>
>> ---
>> arch/arm64/kernel/entry.S | 4 ++++
>> arch/arm64/kernel/stacktrace.c | 17 +++++++++++++++++
>> 2 files changed, 21 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
>> index f860bfd..aacb6c6 100644
>> --- a/arch/arm64/kernel/entry.S
>> +++ b/arch/arm64/kernel/entry.S
>> @@ -107,6 +107,8 @@
>> str x21, [sp, #S_SYSCALLNO]
>> .endif
>>
>> + /* create a stack frame for stack tracer */
>> + mov x29, sp
>> /*
>> * Registers that may be useful after this macro is invoked:
>> *
>> @@ -737,3 +739,5 @@ ENTRY(sys_rt_sigreturn_wrapper)
>> mov x0, sp
>> b sys_rt_sigreturn
>> ENDPROC(sys_rt_sigreturn_wrapper)
>> +
>> +ENTRY(end_of_vectors)
>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
>> index d1790eb..22ce7c9 100644
>> --- a/arch/arm64/kernel/stacktrace.c
>> +++ b/arch/arm64/kernel/stacktrace.c
>> @@ -25,6 +25,10 @@
>> #include <asm/stacktrace.h>
>>
>> #define S_FRAME_SIZE sizeof(struct pt_regs) /* asm-offsets.h */
>> +#define S_FP offsetof(struct pt_regs, regs[29])
>> +#define S_LR offsetof(struct pt_regs, regs[30])
>> +
>> +extern unsigned int *vectors, *end_of_vectors;
>>
>> /*
>> * AArch64 PCS assigns the frame pointer to x29.
>> @@ -50,6 +54,19 @@ int notrace unwind_frame(struct stackframe *frame)
>> if (fp < low || fp > high - 0x18 || fp & 0xf)
>> return -EINVAL;
>>
>> + if ((frame->pc >= (unsigned long)&vectors) &&
>> + (frame->pc < (unsigned long)&end_of_vectors)) {
>> + /*
>> + * Expection handler does not use a normal format of
>> + * stack frame, but allocates struct pt_regs.
>> + */
>> + frame->sp = frame->sp + S_FRAME_SIZE;
>> + frame->fp = *(unsigned long *)(fp + S_FP);
>> + frame->pc = *(unsigned long *)(fp + S_LR);
>
> Not frame->pc = *(unsigned long *)(fp + S_PC)? Don't we need to look up elr_el1
> since this is an exception?

You are right. Will fix it if I submit the next version.

>> +
>> + return 0;
>> + }
>> +
>> frame->sp = fp + 0x10;
>
> I'm just curious about this constant, 0x10. Do you have an idea on this value?
> As reviewing objdump of vmlinux, it looks needed to analyze the first store-pair
> instruction of each function.
>
> Please correct me if I'm wrong.

I don't know Catalin's intention here, but fp always points to saved pair of
<fp, lr> and so, in general, "fp + 0x10" is the address of succeeding local variables
in callee function. (Remember my acsii art :)
This can be the easily-approximated (but not accurate) stack pointer of caller unless
we decode function prologues.

Thanks,
-Takahiro AKASHI

> Best Regards
> Jungseok Lee
>

2015-08-17 06:07:09

by AKASHI Takahiro

[permalink] [raw]
Subject: Re: [RFC v2 1/4] ftrace: allow arch-specific check_stack()

Will,

On 08/12/2015 02:03 AM, Will Deacon wrote:
> On Tue, Aug 04, 2015 at 08:44:06AM +0100, AKASHI Takahiro wrote:
>> A stack frame pointer may be used in a different way depending on
>> cpu architecture. Thus it is not always appropriate to slurp the stack
>> contents, as currently done in check_stack(), in order to calcurate
>> a stack index (height) at a given function call. At least not on arm64.
>>
>> This patch extract potentially arch-specific code from check_stack()
>> and puts it into a new arch_check_stack(), which is declared as weak.
>> So we will be able to add arch-specific and most efficient way of
>> stack traversing Later.
>>
>> Signed-off-by: AKASHI Takahiro <[email protected]>
>
> If arm64 is the only architecture behaving differently, then I'm happy
> to reconsider the fix to unwind_frame that we merged in e306dfd06fcb
> ("ARM64: unwind: Fix PC calculation"). I'd have thought any architecture
> with a branch-and-link instruction would potentially have the same issue,
> so we could just be fixing things in the wrong place if ftrace works
> everywhere else.

I'm not the right person to answer for other architectures (and ftrace
behavior on them.) The only thing I know is that current ftrace stack tracer
works correctly only if the addresses stored and found on stack match to
the ones returned by save_stack_trace().

Anyway, the fix above is not the only reason that I want to introduce arch-specific
arch_check_stack(). Other issues to fix include
- combined case of stack tracer and function graph tracer (common across arch's)
- exception entries (as I'm trying to address in RFC 4/4)
- in-accurate stack size (for each function, my current fix is not perfect though.)

Thanks,
-Takahiro AKASHI

> Will
>

2015-08-17 15:29:57

by Jungseok Lee

[permalink] [raw]
Subject: Re: [RFC v2 0/4] arm64: ftrace: fix incorrect output from stack tracer

On Aug 17, 2015, at 1:50 PM, AKASHI Takahiro wrote:
> Hi

Hi Akashi,

> On 08/11/2015 11:52 PM, Jungseok Lee wrote:
>> On Aug 4, 2015, at 4:44 PM, AKASHI Takahiro wrote:
>>
>> Hi Akashi,
>>
>>> See the following threads [1],[2] for the background.
>>>
>>> With this patch series, I'm trying to fix several problems I noticed
>>> with stack tracer on arm64. But it is rather experimental, and there still
>>> remained are several issues.
>>>
>>> Patch #1 modifies ftrace framework so that we can provide arm64-specific
>>> stack tracer later.
>>> (Well, I know there is some room for further refactoring, but this is
>>> just a prototype code.)
>>> Patch #2 implements this arch_check_stack() using unwind_stackframe() to
>>> traverse stack frames and identify a stack index for each traced function.
>>> Patch #3 addresses an issue that stack tracer doesn't work well in
>>> conjuction with function graph tracer.
>>> Patch #4 addresses an issue that unwind_stackframe() misses a function
>>> that is the one when an exception happens by setting up a stack frame
>>> for exception handler.
>>>
>>> See below for the result with those patches.
>>> Issues include:
>>> a) Size of gic_handle_irq() is 48 (#13), but should be 64.
>>> b) We cannot identify a location of function which is being returned
>>> and hooked temporarily by return_to_handler() (#18)
>>> c) A meaningless entry (#62) as well as a bogus size for the first
>>> function, el0_svc (#61)
>>> d) Size doesn't contain a function's "dynamically allocated local
>>> variables," if any, but instead is sumed up in child's size.
>>> (See details in [3].)
>>>
>>> I'm afraid that we cannot fix issue b) unless we can do *atomically*
>>> push/pop a return adress in current->ret_stack[], increment/decrement
>>> current->curr_ret_stack and restore lr register.
>>>
>>> We will be able to fix issue d) once we know all the locations in
>>> the list, including b).
>>>
>>> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/354126.html
>>> [2] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355920.html
>>> [3] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/358034.html
>>
>> I hope I'm not too late..
>
> I was on vacation last week.
>
>> This series looks written on top of the hunk in the end of this reply.
>>
>> I've strongly agreed with your opinion as digging out this issue. We need to analyse
>> the first instruction of each function, "stp x29, x30, [sp, #val]!", in order to
>> solve this problem clearly.
>
> As far as I notice, the following is not the only prologue:
> stp x29,x30, [sp,#-xx]!
> mov x29,sp
> but some functions may have another one like:
> sub sp, sp, #xx
> stp x29,x30, [sp, #16]
> add x29,sp, #16
> Even worse, I see some variant, for example in trace_graph_entry(),
> adrp x2, 0x...
> stp x29,x30,[sp,#-xx]!
> add x4, x2, #..
> mov x29,x30
>
> So parsing the function prologue perfectly is a bit more complicated than imagined.
> I'm now asking some gcc guy for more information.

I've also observed the last two variants as playing with kallsyms to parse function
prologues. The work is accompanied by a pretty expensive computational overhead which
might be unnecessary originally. IMHO, one of PCS's objectives is to handle this kind
of work gracefully. Isn't it?

It is a great approach to talk with gcc guys. It looks inevitable that a complex decoding
logic is needed without it.

I have two drafts on this issue.

1) AARCH64 PCS
I borrow the ascii art from your description :)
If the stack frame is constructed as follows, it would be possible to record
an accurate stack size without decoding store-pair instruction or its variants.
Additionally, a code, frame->sp = fp + 0x10, is correct except exceptions.

+-------+
| |
| |
+-------+
| |
| func-1's dynamic variable
| |
+-------+
| |
| func-1's local variable
| |
fp1 +-------+
| fp0 |
+-------+
| lr0 |
sp1 +-------+ <------ func-1's stackframe
| |
| func-0's dynamic variable
| |
+-------+
| |
| func-0's local variable
| |
+-------+
| |
+-------+ top

However, it's a pretty big challenge to update PCS..

2) v1 approach + function graph fix.
I use a stack tracer to check max stack size and its context. It would
be perfect if it shows an accurate size of every entry, but it would be
enough to find dominant entries in real practice, at least in my case.
(I agree caller's dynamic + callee's local is technically unacceptable.)
The same approach could be applied to exception. It would be no issue
anymore if someone try to introduce a separate IRQ stack like x86.

Best Regards
Jungseok Lee-

2015-08-18 08:21:46

by Will Deacon

[permalink] [raw]
Subject: Re: [RFC v2 1/4] ftrace: allow arch-specific check_stack()

On Mon, Aug 17, 2015 at 07:07:00AM +0100, AKASHI Takahiro wrote:
> On 08/12/2015 02:03 AM, Will Deacon wrote:
> > On Tue, Aug 04, 2015 at 08:44:06AM +0100, AKASHI Takahiro wrote:
> >> A stack frame pointer may be used in a different way depending on
> >> cpu architecture. Thus it is not always appropriate to slurp the stack
> >> contents, as currently done in check_stack(), in order to calcurate
> >> a stack index (height) at a given function call. At least not on arm64.
> >>
> >> This patch extract potentially arch-specific code from check_stack()
> >> and puts it into a new arch_check_stack(), which is declared as weak.
> >> So we will be able to add arch-specific and most efficient way of
> >> stack traversing Later.
> >>
> >> Signed-off-by: AKASHI Takahiro <[email protected]>
> >
> > If arm64 is the only architecture behaving differently, then I'm happy
> > to reconsider the fix to unwind_frame that we merged in e306dfd06fcb
> > ("ARM64: unwind: Fix PC calculation"). I'd have thought any architecture
> > with a branch-and-link instruction would potentially have the same issue,
> > so we could just be fixing things in the wrong place if ftrace works
> > everywhere else.
>
> I'm not the right person to answer for other architectures (and ftrace
> behavior on them.) The only thing I know is that current ftrace stack tracer
> works correctly only if the addresses stored and found on stack match to
> the ones returned by save_stack_trace().
>
> Anyway, the fix above is not the only reason that I want to introduce arch-specific
> arch_check_stack(). Other issues to fix include
> - combined case of stack tracer and function graph tracer (common across arch's)
> - exception entries (as I'm trying to address in RFC 4/4)
> - in-accurate stack size (for each function, my current fix is not perfect though.)

Ok, if you have other reasons for the callback, then fine. I just didn't
want you to think that you had to work around e306dfd06fcb at all costs.

Will