2015-07-13 05:30:23

by AKASHI Takahiro

[permalink] [raw]
Subject: [RFC 0/3] arm64: ftrace: fix incorrect output from stack tracer

As reported in the thread below[1], the output from stack tracer using
ftrace on arm64 seems to be incorrect due to different reasons. Each
problem is described and fixed repsectively in the following patches.
Please see the commit messages for the details.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/354126.html

If the patch[1/3], which adds "#ifdef CONFIG_ARM64" to generic ftrace code,
is not acceptable, we will have to introduce an arch-dependent function,
ie. arch_check_stack().

Even with those patches, we see another issue that the values in 'Size'
field are *inaccurate*. This is simply because we have no way to estimate
the value of stack pointer at each function from the content of stack.
Thus the function's reported stack size does not include its own local
variables, but includes *its child's* ones.
See more details below.

In my opinion, we cannot fix this issue unless analyzing the function's
first instruction, ie. "stp x29, x30, [sp, #xx]!".

* How does stack tracer identify each function's stack size?

Take an example, func-0 calls func-1 and func-1 calls func-2.
The stack layout looks like the below:
("p" is a temp variable in check_stack().)

sp2 +-------+ <--------- func-2's stackframe
| |
| |
fp2 +-------+
| fp1 |
+-------+ <-- p1 (*p1 == stack_dump_trace[i] == lr1)
| lr1 |
+-------+
| |
| func-2's local variables
| |
sp1 +-------+ <--------- func-1(lr1)'s stackframe
| | (stack_dump_index[i] = top - p1)
| func-1's dynamic local variables
| |
fp1 +-------+
| fp0 |
+-------+ <-- p0 (*p0 == stack_dump_trace[i+1] == lr0)
| lr0 |
+-------+
| |
| func-1's local variables
| |
sp0 +-------+ <--------- func-0(lr0)'s stackframe
| | (stack_dump_index[i+1] = top - p0)
| |
*-------+ top

Stack tracer records the stack height of func-1 (== stack_dump_trace[i]):
stack_dump_index[i] = <stack top> - <estimated stack pointer>
in check_stack() by searching for func-1's return address (lr1)
and eventually calculates func-1's stack size by:
stack_dump_index[i] - stack_dump_index[i+1]
=> (top - p1) - (top - p0)
=> p1 - p0

On x86, this calculation is correct because x86's call instruction pushes
the return address to the stack and jump into the child(func-2) function,
thus the func-1's stack pointer is "p1" where *p1 is equal to
stack_dump_trace[i]. But this is not true on arm64.

AKASHI Takahiro (3):
ftrace: adjust a function's pc to search for in check_stack() for
arm64
arm64: refactor save_stack_trace()
arm64: ftrace: mcount() should not create a stack frame

arch/arm64/kernel/entry-ftrace.S | 15 ++++++++-------
arch/arm64/kernel/stacktrace.c | 31 +++++++++++++++++++++++--------
kernel/trace/trace_stack.c | 4 ++++
3 files changed, 35 insertions(+), 15 deletions(-)

--
1.7.9.5


2015-07-13 05:30:33

by AKASHI Takahiro

[permalink] [raw]
Subject: [RFC 1/3] ftrace: adjust a function's pc to search for in check_stack() for arm64

Ftace's stack tracer on arm64 returns wrong information about call stacks:

Depth Size Location (50 entries)
----- ---- --------
0) 5256 0 notifier_call_chain+0x30/0x94
1) 5256 0 ftrace_call+0x0/0x4
2) 5256 0 notifier_call_chain+0x2c/0x94
3) 5256 0 raw_notifier_call_chain+0x34/0x44
4) 5256 0 timekeeping_update.constprop.9+0xb8/0x114
5) 5256 0 update_wall_time+0x408/0x6dc

Most of 'Size' fields are unexpectedly zero.

This is because stack tracer fails to recognize each function's stack frame
in check_stack(). Stack tracer searches for a function's pc in the stack
based on the list returned by save_stack_trace(), but save_stack_trace() on
arm64 does not return the exact return address saved in a stack frame, but
a value decrmented by 4 (which means a branch instruction's address).
This behavior was introduced by
commit e306dfd06fcb ("ARM64: unwind: Fix PC calculation")

So the matching doesn't succeed in most cases.

This problem can be fixed either by
a) reverting the commit above
b) adding an arm64-specific hack to check_patch()

This patch does b).

Signed-off-by: AKASHI Takahiro <[email protected]>
---
kernel/trace/trace_stack.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index 3f34496..7086fc3 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -143,7 +143,11 @@ check_stack(unsigned long ip, unsigned long *stack)
p = start;

for (; p < top && i < max_stack_trace.nr_entries; p++) {
+#ifdef CONFIG_ARM64
+ if (*p == (stack_dump_trace[i] + 4)) {
+#else
if (*p == stack_dump_trace[i]) {
+#endif
this_size = stack_dump_index[i++] =
(top - p) * sizeof(unsigned long);
found = 1;
--
1.7.9.5

2015-07-13 05:30:43

by AKASHI Takahiro

[permalink] [raw]
Subject: [RFC 2/3] arm64: refactor save_stack_trace()

Ftrace's stack tracer on arm64 returns wrong information about call stacks:

Depth Size Location (50 entries)
----- ---- --------
0) 5256 0 notifier_call_chain+0x30/0x94
1) 5256 0 ftrace_call+0x0/0x4
2) 5256 0 notifier_call_chain+0x2c/0x94
3) 5256 0 raw_notifier_call_chain+0x34/0x44
4) 5256 0 timekeeping_update.constprop.9+0xb8/0x114
5) 5256 0 update_wall_time+0x408/0x6dc

One of tracer functions, ftrace_call (or mcount), is unexpectedly listed.
The *bare* stack dump returned by save_stack_trace() is:
save_stack_trace_tsk()
save_stack_trace()
stack_trace_call()
ftrace_ops_no_ops()
ftrace_call()
notifier_call_chain()
raw_notifier_call_chain()
...

On arm64, save_stack_trace() calls save_stack_trace_tsk() and this will
result in putting additional stack frame in the returned list.
This behavior, however, conflicts with stack stracer's assumption
that the number of functions to be skiped as part of tracer is 4, from
save_stack_trace() to mcount(), if ftrace_ops_list_func() is used.
The value is hard coded in check_patch().

This patch refactors save_stack_trace() and save_stack_trace_tsk() in order
to reduce the stack depth by making the common code inlined.

Signed-off-by: AKASHI Takahiro <[email protected]>
---
arch/arm64/kernel/stacktrace.c | 31 +++++++++++++++++++++++--------
1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 407991b..978c923 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -97,34 +97,49 @@ static int save_trace(struct stackframe *frame, void *d)
return trace->nr_entries >= trace->max_entries;
}

-void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
+static inline void __save_stack_trace(struct stackframe *frame,
+ struct stack_trace *trace, int no_sched)
{
struct stack_trace_data data;
- struct stackframe frame;

data.trace = trace;
data.skip = trace->skip;
+ data.no_sched_functions = no_sched;
+
+ walk_stackframe(frame, save_trace, &data);
+ if (trace->nr_entries < trace->max_entries)
+ trace->entries[trace->nr_entries++] = ULONG_MAX;
+}
+
+void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
+{
+ struct stackframe frame;
+ int no_sched;

if (tsk != current) {
- data.no_sched_functions = 1;
+ no_sched = 1;
frame.fp = thread_saved_fp(tsk);
frame.sp = thread_saved_sp(tsk);
frame.pc = thread_saved_pc(tsk);
} else {
- data.no_sched_functions = 0;
+ no_sched = 0;
frame.fp = (unsigned long)__builtin_frame_address(0);
frame.sp = current_stack_pointer;
frame.pc = (unsigned long)save_stack_trace_tsk;
}

- walk_stackframe(&frame, save_trace, &data);
- if (trace->nr_entries < trace->max_entries)
- trace->entries[trace->nr_entries++] = ULONG_MAX;
+ __save_stack_trace(&frame, trace, no_sched);
}

void save_stack_trace(struct stack_trace *trace)
{
- save_stack_trace_tsk(current, trace);
+ struct stackframe frame;
+
+ frame.fp = (unsigned long)__builtin_frame_address(0);
+ frame.sp = current_stack_pointer;
+ frame.pc = (unsigned long)save_stack_trace_tsk;
+
+ __save_stack_trace(&frame, trace, 0);
}
EXPORT_SYMBOL_GPL(save_stack_trace);
#endif
--
1.7.9.5

2015-07-13 05:31:05

by AKASHI Takahiro

[permalink] [raw]
Subject: [RFC 3/3] arm64: ftrace: mcount() should not create a stack frame

Ftrace's stack tracer on arm64 returns wrong information about call stacks:

Depth Size Location (50 entries)
----- ---- --------
0) 5256 0 notifier_call_chain+0x30/0x94
1) 5256 0 ftrace_call+0x0/0x4
2) 5256 0 notifier_call_chain+0x2c/0x94
3) 5256 0 raw_notifier_call_chain+0x34/0x44
4) 5256 0 timekeeping_update.constprop.9+0xb8/0x114
5) 5256 0 update_wall_time+0x408/0x6dc

The instrumented function, notifier_call_chain(), appears twice.

On x86 (and other arch's), mcount (or ftrace_call) does not create a new
stack frame. This will eventually result in not listing the instrumented
function in save_stack_call() because the function's returned address does
not appear as saved lr in the stack. Stack tracer, instead, explicitly adds
this skipped function at the top of the list later in check_stack().
That is why it is listed twice on arm64 as shown above.

This patch modifies arm64 function tracer not to create a stack frame at
mcount() in order to fix this issue.

Signed-off-by: AKASHI Takahiro <[email protected]>
---
arch/arm64/kernel/entry-ftrace.S | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
index 08cafc5..c74fa12 100644
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -26,9 +26,11 @@
* as long as the kernel is compiled without -fomit-frame-pointer.
* (or CONFIG_FRAME_POINTER, this is forced on arm64)
*
+ * We don't update a frame pointer here as ftrace, in special stack tracer,
+ * assumes not. If we did, the instrumented function would be listed twice.
* stack layout after mcount_enter in _mcount():
*
- * current sp/fp => 0:+-----+
+ * current sp => 0:+-----+
* in _mcount() | x29 | -> instrumented function's fp
* +-----+
* | x30 | -> _mcount()'s lr (= instrumented function's pc)
@@ -47,7 +49,6 @@

.macro mcount_enter
stp x29, x30, [sp, #-16]!
- mov x29, sp
.endm

.macro mcount_exit
@@ -61,7 +62,7 @@

/* for instrumented function's parent */
.macro mcount_get_parent_fp reg
- ldr \reg, [x29]
+ ldr \reg, [sp]
ldr \reg, [\reg]
.endm

@@ -71,18 +72,18 @@
.endm

.macro mcount_get_pc reg
- ldr \reg, [x29, #8]
+ ldr \reg, [sp, #8]
mcount_adjust_addr \reg, \reg
.endm

.macro mcount_get_lr reg
- ldr \reg, [x29]
+ ldr \reg, [sp]
ldr \reg, [\reg, #8]
mcount_adjust_addr \reg, \reg
.endm

.macro mcount_get_lr_addr reg
- ldr \reg, [x29]
+ ldr \reg, [sp]
add \reg, \reg, #8
.endm

@@ -205,7 +206,7 @@ ENDPROC(ftrace_graph_caller)
*/
ENTRY(return_to_handler)
str x0, [sp, #-16]!
- mov x0, x29 // parent's fp
+ mov x0, sp // parent's fp
bl ftrace_return_to_handler// addr = ftrace_return_to_hander(fp);
mov x30, x0 // restore the original return address
ldr x0, [sp], #16
--
1.7.9.5

2015-07-13 15:02:11

by Jungseok Lee

[permalink] [raw]
Subject: Re: [RFC 0/3] arm64: ftrace: fix incorrect output from stack tracer

On Jul 13, 2015, at 2:29 PM, AKASHI Takahiro wrote:

Hi, AKASHI

> As reported in the thread below[1], the output from stack tracer using
> ftrace on arm64 seems to be incorrect due to different reasons. Each
> problem is described and fixed repsectively in the following patches.
> Please see the commit messages for the details.
>
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/354126.html
>
> If the patch[1/3], which adds "#ifdef CONFIG_ARM64" to generic ftrace code,
> is not acceptable, we will have to introduce an arch-dependent function,
> ie. arch_check_stack().
>
> Even with those patches, we see another issue that the values in 'Size'
> field are *inaccurate*. This is simply because we have no way to estimate
> the value of stack pointer at each function from the content of stack.
> Thus the function's reported stack size does not include its own local
> variables, but includes *its child's* ones.
> See more details below.
>
> In my opinion, we cannot fix this issue unless analyzing the function's
> first instruction, ie. "stp x29, x30, [sp, #xx]!".
>
> * How does stack tracer identify each function's stack size?
>
> Take an example, func-0 calls func-1 and func-1 calls func-2.
> The stack layout looks like the below:
> ("p" is a temp variable in check_stack().)
>
> sp2 +-------+ <--------- func-2's stackframe
> | |
> | |
> fp2 +-------+
> | fp1 |
> +-------+ <-- p1 (*p1 == stack_dump_trace[i] == lr1)
> | lr1 |
> +-------+
> | |
> | func-2's local variables
> | |
> sp1 +-------+ <--------- func-1(lr1)'s stackframe
> | | (stack_dump_index[i] = top - p1)
> | func-1's dynamic local variables
> | |
> fp1 +-------+
> | fp0 |
> +-------+ <-- p0 (*p0 == stack_dump_trace[i+1] == lr0)
> | lr0 |
> +-------+
> | |
> | func-1's local variables
> | |
> sp0 +-------+ <--------- func-0(lr0)'s stackframe
> | | (stack_dump_index[i+1] = top - p0)
> | |
> *-------+ top
>
> Stack tracer records the stack height of func-1 (== stack_dump_trace[i]):
> stack_dump_index[i] = <stack top> - <estimated stack pointer>
> in check_stack() by searching for func-1's return address (lr1)
> and eventually calculates func-1's stack size by:
> stack_dump_index[i] - stack_dump_index[i+1]
> => (top - p1) - (top - p0)
> => p1 - p0
>
> On x86, this calculation is correct because x86's call instruction pushes
> the return address to the stack and jump into the child(func-2) function,
> thus the func-1's stack pointer is "p1" where *p1 is equal to
> stack_dump_trace[i]. But this is not true on arm64.
>
> AKASHI Takahiro (3):
> ftrace: adjust a function's pc to search for in check_stack() for
> arm64
> arm64: refactor save_stack_trace()
> arm64: ftrace: mcount() should not create a stack frame

One of stack_trace examples with this patch set is as follows.
*0* size does not appear any more now.

Depth Size Location (36 entries)
----- ---- --------
0) 4504 96 put_cpu_partial+0x2c/0x1d0
1) 4408 80 get_partial_node.isra.64+0x13c/0x344
2) 4328 256 __slab_alloc.isra.65.constprop.67+0xd8/0x37c
3) 4072 32 kmem_cache_alloc+0x258/0x294
4) 4040 304 __alloc_skb+0x48/0x180
5) 3736 96 alloc_skb_with_frags+0x74/0x234
6) 3640 112 sock_alloc_send_pskb+0x1d0/0x294
7) 3528 160 sock_alloc_send_skb+0x44/0x54
8) 3368 64 __ip_append_data.isra.40+0x78c/0xb48
9) 3304 224 ip_append_data.part.42+0x98/0xe8
10) 3080 112 ip_append_data+0x68/0x7c
11) 2968 96 icmp_push_reply+0x7c/0x144
12) 2872 96 icmp_send+0x3c0/0x3c8
13) 2776 192 __udp4_lib_rcv+0x5b8/0x684
14) 2584 96 udp_rcv+0x2c/0x3c
15) 2488 32 ip_local_deliver+0xa0/0x224
16) 2456 48 ip_rcv+0x360/0x57c
17) 2408 64 __netif_receive_skb_core+0x4d0/0x80c
18) 2344 128 __netif_receive_skb+0x24/0x84
19) 2216 32 process_backlog+0x9c/0x15c
20) 2184 80 net_rx_action+0x1ec/0x32c
21) 2104 160 __do_softirq+0x114/0x2f0
22) 1944 128 do_softirq+0x60/0x68
23) 1816 32 __local_bh_enable_ip+0xb0/0xd4
24) 1784 32 ip_finish_output+0x1f4/0xabc
25) 1752 96 ip_output+0xf0/0x120
26) 1656 64 ip_local_out_sk+0x44/0x54
27) 1592 32 ip_send_skb+0x24/0xbc
28) 1560 48 udp_send_skb+0x1b4/0x2f4
29) 1512 80 udp_sendmsg+0x2a8/0x7a0
30) 1432 272 inet_sendmsg+0xa0/0xd0
31) 1160 48 sock_sendmsg+0x30/0x78
32) 1112 32 ___sys_sendmsg+0x15c/0x26c
33) 1080 400 __sys_sendmmsg+0x94/0x180
34) 680 320 SyS_sendmmsg+0x38/0x54
35) 360 360 el0_svc_naked+0x20/0x28

I will leave more comments after reviewing and testing them carefully.

Best Regards
Jungseok Lee-

2015-07-13 15:24:44

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC 1/3] ftrace: adjust a function's pc to search for in check_stack() for arm64

On Mon, 13 Jul 2015 14:29:33 +0900
AKASHI Takahiro <[email protected]> wrote:

> Ftace's stack tracer on arm64 returns wrong information about call stacks:
>
> Depth Size Location (50 entries)
> ----- ---- --------
> 0) 5256 0 notifier_call_chain+0x30/0x94
> 1) 5256 0 ftrace_call+0x0/0x4
> 2) 5256 0 notifier_call_chain+0x2c/0x94
> 3) 5256 0 raw_notifier_call_chain+0x34/0x44
> 4) 5256 0 timekeeping_update.constprop.9+0xb8/0x114
> 5) 5256 0 update_wall_time+0x408/0x6dc
>
> Most of 'Size' fields are unexpectedly zero.
>
> This is because stack tracer fails to recognize each function's stack frame
> in check_stack(). Stack tracer searches for a function's pc in the stack
> based on the list returned by save_stack_trace(), but save_stack_trace() on
> arm64 does not return the exact return address saved in a stack frame, but
> a value decrmented by 4 (which means a branch instruction's address).
> This behavior was introduced by
> commit e306dfd06fcb ("ARM64: unwind: Fix PC calculation")
>
> So the matching doesn't succeed in most cases.
>
> This problem can be fixed either by
> a) reverting the commit above
> b) adding an arm64-specific hack to check_patch()
>
> This patch does b).
>
> Signed-off-by: AKASHI Takahiro <[email protected]>
> ---
> kernel/trace/trace_stack.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
> index 3f34496..7086fc3 100644
> --- a/kernel/trace/trace_stack.c
> +++ b/kernel/trace/trace_stack.c
> @@ -143,7 +143,11 @@ check_stack(unsigned long ip, unsigned long *stack)
> p = start;
>
> for (; p < top && i < max_stack_trace.nr_entries; p++) {
> +#ifdef CONFIG_ARM64
> + if (*p == (stack_dump_trace[i] + 4)) {
> +#else
> if (*p == stack_dump_trace[i]) {
> +#endif

Instead of the ugly #ifdef in this code, please add a macro
FTRACE_STACK_FRAME_OFFSET

Then in include/linux/ftrace.h have:

#ifndef FTRACE_STACK_FRAME_OFFSET
# define FTRACE_STACK_FRAME_OFFSET 0
#endif

And in arch/arm64/include/asm/ftrace.h

#define FTRACE_STACK_FRAME_OFFSET 4

And then just do:

if (*p == (stack_dump_trace[i] + FTRACE_STACK_FRAME_OFFSET)) {

-- Steve

> this_size = stack_dump_index[i++] =
> (top - p) * sizeof(unsigned long);
> found = 1;

2015-07-14 12:47:15

by Jungseok Lee

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Jul 13, 2015, at 2:29 PM, AKASHI Takahiro wrote:

Hi, AKASHI

> Ftrace's stack tracer on arm64 returns wrong information about call stacks:
>
> Depth Size Location (50 entries)
> ----- ---- --------
> 0) 5256 0 notifier_call_chain+0x30/0x94
> 1) 5256 0 ftrace_call+0x0/0x4
> 2) 5256 0 notifier_call_chain+0x2c/0x94
> 3) 5256 0 raw_notifier_call_chain+0x34/0x44
> 4) 5256 0 timekeeping_update.constprop.9+0xb8/0x114
> 5) 5256 0 update_wall_time+0x408/0x6dc
>
> One of tracer functions, ftrace_call (or mcount), is unexpectedly listed.
> The *bare* stack dump returned by save_stack_trace() is:
> save_stack_trace_tsk()
> save_stack_trace()
> stack_trace_call()
> ftrace_ops_no_ops()
> ftrace_call()
> notifier_call_chain()
> raw_notifier_call_chain()
> ...

Is the below example an unexpected result?
Entry 17 and 18 are ftrace_call and ftrace_ops_no_ops, respectively.

Depth Size Location (51 entries)
----- ---- --------
0) 5352 96 _raw_spin_unlock_irqrestore+0x1c/0x60
1) 5256 48 gic_raise_softirq+0xa0/0xbc
2) 5208 80 smp_cross_call+0x40/0xbc
3) 5128 48 smp_send_reschedule+0x38/0x48
4) 5080 32 trigger_load_balance+0x184/0x29c
5) 5048 112 scheduler_tick+0xac/0x104
6) 4936 64 update_process_times+0x5c/0x74
7) 4872 32 tick_sched_handle.isra.15+0x38/0x7c
8) 4840 48 tick_sched_timer+0x48/0x90
9) 4792 48 __run_hrtimer+0x60/0x258
10) 4744 64 hrtimer_interrupt+0xe8/0x260
11) 4680 128 arch_timer_handler_virt+0x38/0x48
12) 4552 32 handle_percpu_devid_irq+0x84/0x188
13) 4520 64 generic_handle_irq+0x38/0x54
14) 4456 32 __handle_domain_irq+0x68/0xbc
15) 4424 64 gic_handle_irq+0x38/0x88
16) 4360 280 el1_irq+0x64/0xd8
17) 4080 168 ftrace_ops_no_ops+0xb4/0x16c
18) 3912 32 ftrace_call+0x0/0x4
19) 3880 144 __alloc_skb+0x48/0x180
20) 3736 96 alloc_skb_with_frags+0x74/0x234
21) 3640 112 sock_alloc_send_pskb+0x1d0/0x294
22) 3528 160 sock_alloc_send_skb+0x44/0x54
23) 3368 64 __ip_append_data.isra.40+0x78c/0xb48
24) 3304 224 ip_append_data.part.42+0x98/0xe8
25) 3080 112 ip_append_data+0x68/0x7c
26) 2968 96 icmp_push_reply+0x7c/0x144
27) 2872 96 icmp_send+0x3c0/0x3c8
28) 2776 192 __udp4_lib_rcv+0x5b8/0x684
29) 2584 96 udp_rcv+0x2c/0x3c
30) 2488 32 ip_local_deliver+0xa0/0x224
31) 2456 48 ip_rcv+0x360/0x57c
32) 2408 64 __netif_receive_skb_core+0x4d0/0x80c
33) 2344 128 __netif_receive_skb+0x24/0x84
34) 2216 32 process_backlog+0x9c/0x15c
35) 2184 80 net_rx_action+0x1ec/0x32c
36) 2104 160 __do_softirq+0x114/0x2f0
37) 1944 128 do_softirq+0x60/0x68
38) 1816 32 __local_bh_enable_ip+0xb0/0xd4
39) 1784 32 ip_finish_output+0x1f4/0xabc
40) 1752 96 ip_output+0xf0/0x120
41) 1656 64 ip_local_out_sk+0x44/0x54
42) 1592 32 ip_send_skb+0x24/0xbc
43) 1560 48 udp_send_skb+0x1b4/0x2f4
44) 1512 80 udp_sendmsg+0x2a8/0x7a0
45) 1432 272 inet_sendmsg+0xa0/0xd0
46) 1160 48 sock_sendmsg+0x30/0x78
47) 1112 32 ___sys_sendmsg+0x15c/0x26c
48) 1080 400 __sys_sendmmsg+0x94/0x180
49) 680 320 SyS_sendmmsg+0x38/0x54
50) 360 360 el0_svc_naked+0x20/0x28

Best Regards
Jungseok Lee

2015-07-14 13:31:58

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Tue, 14 Jul 2015 21:47:10 +0900
Jungseok Lee <[email protected]> wrote:

> Is the below example an unexpected result?
> Entry 17 and 18 are ftrace_call and ftrace_ops_no_ops, respectively.
>
> Depth Size Location (51 entries)
> ----- ---- --------
> 0) 5352 96 _raw_spin_unlock_irqrestore+0x1c/0x60
> 1) 5256 48 gic_raise_softirq+0xa0/0xbc
> 2) 5208 80 smp_cross_call+0x40/0xbc
> 3) 5128 48 smp_send_reschedule+0x38/0x48
> 4) 5080 32 trigger_load_balance+0x184/0x29c
> 5) 5048 112 scheduler_tick+0xac/0x104
> 6) 4936 64 update_process_times+0x5c/0x74
> 7) 4872 32 tick_sched_handle.isra.15+0x38/0x7c
> 8) 4840 48 tick_sched_timer+0x48/0x90
> 9) 4792 48 __run_hrtimer+0x60/0x258
> 10) 4744 64 hrtimer_interrupt+0xe8/0x260
> 11) 4680 128 arch_timer_handler_virt+0x38/0x48
> 12) 4552 32 handle_percpu_devid_irq+0x84/0x188
> 13) 4520 64 generic_handle_irq+0x38/0x54
> 14) 4456 32 __handle_domain_irq+0x68/0xbc
> 15) 4424 64 gic_handle_irq+0x38/0x88
> 16) 4360 280 el1_irq+0x64/0xd8

Note, function tracing does not disable interrupts. This looks to be
that an interrupt came in while __aloc_skb() was being traced.

-- Steve

> 17) 4080 168 ftrace_ops_no_ops+0xb4/0x16c
> 18) 3912 32 ftrace_call+0x0/0x4
> 19) 3880 144 __alloc_skb+0x48/0x180
> 20) 3736 96 alloc_skb_with_frags+0x74/0x234
> 21) 3640 112 sock_alloc_send_pskb+0x1d0/0x294
> 22) 3528 160 sock_alloc_send_skb+0x44/0x54
> 23) 3368 64 __ip_append_data.isra.40+0x78c/0xb48
> 24) 3304 224 ip_append_data.part.42+0x98/0xe8
> 25) 3080 112 ip_append_data+0x68/0x7c
> 26) 2968 96 icmp_push_reply+0x7c/0x144
> 27) 2872 96 icmp_send+0x3c0/0x3c8
> 28) 2776 192 __udp4_lib_rcv+0x5b8/0x684
> 29) 2584 96 udp_rcv+0x2c/0x3c
> 30) 2488 32 ip_local_deliver+0xa0/0x224
> 31) 2456 48 ip_rcv+0x360/0x57c
> 32) 2408 64 __netif_receive_skb_core+0x4d0/0x80c
> 33) 2344 128 __netif_receive_skb+0x24/0x84
> 34) 2216 32 process_backlog+0x9c/0x15c
> 35) 2184 80 net_rx_action+0x1ec/0x32c
> 36) 2104 160 __do_softirq+0x114/0x2f0
> 37) 1944 128 do_softirq+0x60/0x68
> 38) 1816 32 __local_bh_enable_ip+0xb0/0xd4
> 39) 1784 32 ip_finish_output+0x1f4/0xabc
> 40) 1752 96 ip_output+0xf0/0x120
> 41) 1656 64 ip_local_out_sk+0x44/0x54
> 42) 1592 32 ip_send_skb+0x24/0xbc
> 43) 1560 48 udp_send_skb+0x1b4/0x2f4
> 44) 1512 80 udp_sendmsg+0x2a8/0x7a0
> 45) 1432 272 inet_sendmsg+0xa0/0xd0
> 46) 1160 48 sock_sendmsg+0x30/0x78
> 47) 1112 32 ___sys_sendmsg+0x15c/0x26c
> 48) 1080 400 __sys_sendmmsg+0x94/0x180
> 49) 680 320 SyS_sendmmsg+0x38/0x54
> 50) 360 360 el0_svc_naked+0x20/0x28
>
> Best Regards
> Jungseok Lee

2015-07-15 00:20:56

by AKASHI Takahiro

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On 07/14/2015 10:31 PM, Steven Rostedt wrote:
> On Tue, 14 Jul 2015 21:47:10 +0900
> Jungseok Lee <[email protected]> wrote:
>
>> Is the below example an unexpected result?
>> Entry 17 and 18 are ftrace_call and ftrace_ops_no_ops, respectively.

[snip]

> Note, function tracing does not disable interrupts. This looks to be
> that an interrupt came in while __aloc_skb() was being traced.

Yeah, I think so, too. But if my insight is correct, it's not __alloc_skb()
but one of functions that it calls. As I said in the commit log message
of patch[1/3], the exact traced function will not be listed by
save_stack_trace() because we don't create a stack frame at mcount().
I think this is a flaw in the current implementation (on x86).

what do you think, Steve?

-Takahiro AKASHI

>
> -- Steve
>
>> 17) 4080 168 ftrace_ops_no_ops+0xb4/0x16c
>> 18) 3912 32 ftrace_call+0x0/0x4
>> 19) 3880 144 __alloc_skb+0x48/0x180
>> 20) 3736 96 alloc_skb_with_frags+0x74/0x234
>> 21) 3640 112 sock_alloc_send_pskb+0x1d0/0x294
>> 22) 3528 160 sock_alloc_send_skb+0x44/0x54
>> 23) 3368 64 __ip_append_data.isra.40+0x78c/0xb48
>> 24) 3304 224 ip_append_data.part.42+0x98/0xe8
>> 25) 3080 112 ip_append_data+0x68/0x7c
>> 26) 2968 96 icmp_push_reply+0x7c/0x144
>> 27) 2872 96 icmp_send+0x3c0/0x3c8
>> 28) 2776 192 __udp4_lib_rcv+0x5b8/0x684
>> 29) 2584 96 udp_rcv+0x2c/0x3c
>> 30) 2488 32 ip_local_deliver+0xa0/0x224
>> 31) 2456 48 ip_rcv+0x360/0x57c
>> 32) 2408 64 __netif_receive_skb_core+0x4d0/0x80c
>> 33) 2344 128 __netif_receive_skb+0x24/0x84
>> 34) 2216 32 process_backlog+0x9c/0x15c
>> 35) 2184 80 net_rx_action+0x1ec/0x32c
>> 36) 2104 160 __do_softirq+0x114/0x2f0
>> 37) 1944 128 do_softirq+0x60/0x68
>> 38) 1816 32 __local_bh_enable_ip+0xb0/0xd4
>> 39) 1784 32 ip_finish_output+0x1f4/0xabc
>> 40) 1752 96 ip_output+0xf0/0x120
>> 41) 1656 64 ip_local_out_sk+0x44/0x54
>> 42) 1592 32 ip_send_skb+0x24/0xbc
>> 43) 1560 48 udp_send_skb+0x1b4/0x2f4
>> 44) 1512 80 udp_sendmsg+0x2a8/0x7a0
>> 45) 1432 272 inet_sendmsg+0xa0/0xd0
>> 46) 1160 48 sock_sendmsg+0x30/0x78
>> 47) 1112 32 ___sys_sendmsg+0x15c/0x26c
>> 48) 1080 400 __sys_sendmmsg+0x94/0x180
>> 49) 680 320 SyS_sendmmsg+0x38/0x54
>> 50) 360 360 el0_svc_naked+0x20/0x28
>>
>> Best Regards
>> Jungseok Lee
>

2015-07-15 00:22:45

by AKASHI Takahiro

[permalink] [raw]
Subject: Re: [RFC 1/3] ftrace: adjust a function's pc to search for in check_stack() for arm64

On 07/14/2015 12:24 AM, Steven Rostedt wrote:
> On Mon, 13 Jul 2015 14:29:33 +0900
> AKASHI Takahiro <[email protected]> wrote:
>
>> Ftace's stack tracer on arm64 returns wrong information about call stacks:
>>
>> Depth Size Location (50 entries)
>> ----- ---- --------
>> 0) 5256 0 notifier_call_chain+0x30/0x94
>> 1) 5256 0 ftrace_call+0x0/0x4
>> 2) 5256 0 notifier_call_chain+0x2c/0x94
>> 3) 5256 0 raw_notifier_call_chain+0x34/0x44
>> 4) 5256 0 timekeeping_update.constprop.9+0xb8/0x114
>> 5) 5256 0 update_wall_time+0x408/0x6dc
>>
>> Most of 'Size' fields are unexpectedly zero.
>>
>> This is because stack tracer fails to recognize each function's stack frame
>> in check_stack(). Stack tracer searches for a function's pc in the stack
>> based on the list returned by save_stack_trace(), but save_stack_trace() on
>> arm64 does not return the exact return address saved in a stack frame, but
>> a value decrmented by 4 (which means a branch instruction's address).
>> This behavior was introduced by
>> commit e306dfd06fcb ("ARM64: unwind: Fix PC calculation")
>>
>> So the matching doesn't succeed in most cases.
>>
>> This problem can be fixed either by
>> a) reverting the commit above
>> b) adding an arm64-specific hack to check_patch()
>>
>> This patch does b).
>>
>> Signed-off-by: AKASHI Takahiro <[email protected]>
>> ---
>> kernel/trace/trace_stack.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
>> index 3f34496..7086fc3 100644
>> --- a/kernel/trace/trace_stack.c
>> +++ b/kernel/trace/trace_stack.c
>> @@ -143,7 +143,11 @@ check_stack(unsigned long ip, unsigned long *stack)
>> p = start;
>>
>> for (; p < top && i < max_stack_trace.nr_entries; p++) {
>> +#ifdef CONFIG_ARM64
>> + if (*p == (stack_dump_trace[i] + 4)) {
>> +#else
>> if (*p == stack_dump_trace[i]) {
>> +#endif
>
> Instead of the ugly #ifdef in this code, please add a macro
> FTRACE_STACK_FRAME_OFFSET
>
> Then in include/linux/ftrace.h have:
>
> #ifndef FTRACE_STACK_FRAME_OFFSET
> # define FTRACE_STACK_FRAME_OFFSET 0
> #endif
>
> And in arch/arm64/include/asm/ftrace.h
>
> #define FTRACE_STACK_FRAME_OFFSET 4
>
> And then just do:
>
> if (*p == (stack_dump_trace[i] + FTRACE_STACK_FRAME_OFFSET)) {

Yes.

Thanks,
-Takahiro AKASHI

> -- Steve
>
>> this_size = stack_dump_index[i++] =
>> (top - p) * sizeof(unsigned long);
>> found = 1;
>

2015-07-15 02:51:11

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Wed, 15 Jul 2015 09:20:42 +0900
AKASHI Takahiro <[email protected]> wrote:

> On 07/14/2015 10:31 PM, Steven Rostedt wrote:
> > On Tue, 14 Jul 2015 21:47:10 +0900
> > Jungseok Lee <[email protected]> wrote:
> >
> >> Is the below example an unexpected result?
> >> Entry 17 and 18 are ftrace_call and ftrace_ops_no_ops, respectively.
>
> [snip]
>
> > Note, function tracing does not disable interrupts. This looks to be
> > that an interrupt came in while __aloc_skb() was being traced.
>
> Yeah, I think so, too. But if my insight is correct, it's not __alloc_skb()
> but one of functions that it calls. As I said in the commit log message
> of patch[1/3], the exact traced function will not be listed by
> save_stack_trace() because we don't create a stack frame at mcount().
> I think this is a flaw in the current implementation (on x86).
>
> what do you think, Steve?
>

mcount (well ftrace_call actually) does indeed create a stack frame for
itself *and* for what called it. At least on x86_64. See mcount_64.S.

With -pg -mfentry, it creates a stack frame. Without -mfentry, mcount
is called after the current function's frame is made so we don't need
to do much.

Here's what the -mfentry version does:

pushq %rbp
pushq 8*2(%rsp) /* this is the parent pointer */
pushq %rbp
movq %rsp, %rbp
pushq 8*3(%rsp) /* Return address to ftrace_call */
pushq %rbp
movq %rsp, %rbp


Thus the stack looks like this:

<---+
| | |
+------------------------------+ |
| return address for func | |
| return address for func_call | |
| original %rbp | |
+------------------------------+ |
| return address for func | |
| ptr to parent frame (%rbp) | ----+
+------------------------------| <-----+
| return address for func_call | |
| ptr to next frame (%rbp) | ------+
+------------------------------+ <---+
|
|
Current %rbp points to func_call frame -----+

The first box isn't used as a frame, but is used by ftrace_call to save
information to restore everything properly.

Thus, __alloc_skb() is what is currently being traced.


-- Steve

2015-07-15 11:41:44

by AKASHI Takahiro

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

Steve,

On 07/15/2015 11:51 AM, Steven Rostedt wrote:
> On Wed, 15 Jul 2015 09:20:42 +0900
> AKASHI Takahiro <[email protected]> wrote:
>
>> On 07/14/2015 10:31 PM, Steven Rostedt wrote:
>>> On Tue, 14 Jul 2015 21:47:10 +0900
>>> Jungseok Lee <[email protected]> wrote:
>>>
>>>> Is the below example an unexpected result?
>>>> Entry 17 and 18 are ftrace_call and ftrace_ops_no_ops, respectively.
>>
>> [snip]
>>
>>> Note, function tracing does not disable interrupts. This looks to be
>>> that an interrupt came in while __aloc_skb() was being traced.
>>
>> Yeah, I think so, too. But if my insight is correct, it's not __alloc_skb()
>> but one of functions that it calls. As I said in the commit log message
>> of patch[1/3], the exact traced function will not be listed by

not patch[1/3], but patch[3/3]

>> save_stack_trace() because we don't create a stack frame at mcount().
>> I think this is a flaw in the current implementation (on x86).
>>
>> what do you think, Steve?
>>
>
> mcount (well ftrace_call actually) does indeed create a stack frame for
> itself *and* for what called it. At least on x86_64. See mcount_64.S.
>
> With -pg -mfentry, it creates a stack frame. Without -mfentry, mcount
> is called after the current function's frame is made so we don't need
> to do much.

Thank you for the explanation. But what I don't really understand here
is why we need to add the "current function" to the stack dump list
returned by save_stack_trace():

In check_stack(),
> /*
> * Add the passed in ip from the function tracer.
> * Searching for this on the stack will skip over
> * most of the overhead from the stack tracer itself.
> */
> stack_dump_trace[0] = ip;
> max_stack_trace.nr_entries++;

I think that "ip" here is the "return address for func" in your
ascii art, and it should be already in the list if a frame is made
by mcount (or func_call).

In fact, stack tracer on arm64 works OK even without the patch[3/3]
if the code quoted above is commented out.
Even on x86, the code is conditional and not activated if the kernel
is compiled without -mfentry before the following commit:
commit 4df297129f62 ("tracing: Remove most or all of stack tracer stack size from stack_max_size")

So what do I misunderstand here?

Thanks,
-Takahiro AKASHI

> Here's what the -mfentry version does:
>
> pushq %rbp
> pushq 8*2(%rsp) /* this is the parent pointer */
> pushq %rbp
> movq %rsp, %rbp
> pushq 8*3(%rsp) /* Return address to ftrace_call */
> pushq %rbp
> movq %rsp, %rbp
>
>
> Thus the stack looks like this:
>
> <---+
> | | |
> +------------------------------+ |
> | return address for func | |
> | return address for func_call | |
> | original %rbp | |
> +------------------------------+ |
> | return address for func | |
> | ptr to parent frame (%rbp) | ----+
> +------------------------------| <-----+
> | return address for func_call | |
> | ptr to next frame (%rbp) | ------+
> +------------------------------+ <---+
> |
> |
> Current %rbp points to func_call frame -----+
>
> The first box isn't used as a frame, but is used by ftrace_call to save
> information to restore everything properly.
>
> Thus, __alloc_skb() is what is currently being traced.
>
>
> -- Steve
>

2015-07-15 14:55:41

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Wed, 15 Jul 2015 20:41:34 +0900
AKASHI Takahiro <[email protected]> wrote:


> Thank you for the explanation. But what I don't really understand here
> is why we need to add the "current function" to the stack dump list
> returned by save_stack_trace():
>
> In check_stack(),
> > /*
> > * Add the passed in ip from the function tracer.
> > * Searching for this on the stack will skip over
> > * most of the overhead from the stack tracer itself.
> > */
> > stack_dump_trace[0] = ip;
> > max_stack_trace.nr_entries++;
>
> I think that "ip" here is the "return address for func" in your

Ah, you are correct (for fentry).

> ascii art, and it should be already in the list if a frame is made
> by mcount (or func_call).
>
> In fact, stack tracer on arm64 works OK even without the patch[3/3]
> if the code quoted above is commented out.
> Even on x86, the code is conditional and not activated if the kernel
> is compiled without -mfentry before the following commit:
> commit 4df297129f62 ("tracing: Remove most or all of stack tracer stack size from stack_max_size")
>
> So what do I misunderstand here?
>

Hmm, I haven't touched the stack trace code in a while. With the added
stack frame for fentry, the hacks there are probably not needed.

I'll take a look at it and try to clean up the code.

Thanks,

-- Steve

2015-07-15 16:13:42

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Wed, 15 Jul 2015 10:55:36 -0400
Steven Rostedt <[email protected]> wrote:


> I'll take a look at it and try to clean up the code.

Does the following patch make sense for you?

-- Steve

diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index 3f34496244e9..9384647d07c3 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -18,12 +18,6 @@

#define STACK_TRACE_ENTRIES 500

-#ifdef CC_USING_FENTRY
-# define fentry 1
-#else
-# define fentry 0
-#endif
-
static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES+1] =
{ [0 ... (STACK_TRACE_ENTRIES)] = ULONG_MAX };
static unsigned stack_dump_index[STACK_TRACE_ENTRIES];
@@ -35,7 +29,7 @@ static unsigned stack_dump_index[STACK_TRACE_ENTRIES];
*/
static struct stack_trace max_stack_trace = {
.max_entries = STACK_TRACE_ENTRIES - 1,
- .entries = &stack_dump_trace[1],
+ .entries = &stack_dump_trace[0],
};

static unsigned long max_stack_size;
@@ -77,7 +71,7 @@ check_stack(unsigned long ip, unsigned long *stack)
unsigned long this_size, flags; unsigned long *p, *top, *start;
static int tracer_frame;
int frame_size = ACCESS_ONCE(tracer_frame);
- int i;
+ int i, x;

this_size = ((unsigned long)stack) & (THREAD_SIZE-1);
this_size = THREAD_SIZE - this_size;
@@ -105,26 +99,20 @@ check_stack(unsigned long ip, unsigned long *stack)
max_stack_size = this_size;

max_stack_trace.nr_entries = 0;
-
- if (using_ftrace_ops_list_func())
- max_stack_trace.skip = 4;
- else
- max_stack_trace.skip = 3;
+ max_stack_trace.skip = 0;

save_stack_trace(&max_stack_trace);

- /*
- * Add the passed in ip from the function tracer.
- * Searching for this on the stack will skip over
- * most of the overhead from the stack tracer itself.
- */
- stack_dump_trace[0] = ip;
- max_stack_trace.nr_entries++;
+ /* Skip over the overhead of the stack tracer itself */
+ for (i = 0; i < max_stack_trace.nr_entries; i++) {
+ if (stack_dump_trace[i] == ip)
+ break;
+ }

/*
* Now find where in the stack these are.
*/
- i = 0;
+ x = 0;
start = stack;
top = (unsigned long *)
(((unsigned long)start & ~(THREAD_SIZE-1)) + THREAD_SIZE);
@@ -139,12 +127,13 @@ check_stack(unsigned long ip, unsigned long *stack)
while (i < max_stack_trace.nr_entries) {
int found = 0;

- stack_dump_index[i] = this_size;
+ stack_dump_index[x] = this_size;
p = start;

for (; p < top && i < max_stack_trace.nr_entries; p++) {
if (*p == stack_dump_trace[i]) {
- this_size = stack_dump_index[i++] =
+ stack_dump_trace[x] = stack_dump_trace[i++];
+ this_size = stack_dump_index[x++] =
(top - p) * sizeof(unsigned long);
found = 1;
/* Start the search from here */
@@ -168,6 +157,9 @@ check_stack(unsigned long ip, unsigned long *stack)
i++;
}

+ for (; x < max_stack_trace.nr_entries; x++)
+ stack_dump_trace[x] = ULONG_MAX;
+
if (task_stack_end_corrupted(current)) {
print_max_stack();
BUG();
@@ -192,24 +184,7 @@ stack_trace_call(unsigned long ip, unsigned long parent_ip,
if (per_cpu(trace_active, cpu)++ != 0)
goto out;

- /*
- * When fentry is used, the traced function does not get
- * its stack frame set up, and we lose the parent.
- * The ip is pretty useless because the function tracer
- * was called before that function set up its stack frame.
- * In this case, we use the parent ip.
- *
- * By adding the return address of either the parent ip
- * or the current ip we can disregard most of the stack usage
- * caused by the stack tracer itself.
- *
- * The function tracer always reports the address of where the
- * mcount call was, but the stack will hold the return address.
- */
- if (fentry)
- ip = parent_ip;
- else
- ip += MCOUNT_INSN_SIZE;
+ ip += MCOUNT_INSN_SIZE;

check_stack(ip, &stack);

2015-07-16 00:27:56

by AKASHI Takahiro

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On 07/16/2015 01:13 AM, Steven Rostedt wrote:
> On Wed, 15 Jul 2015 10:55:36 -0400
> Steven Rostedt <[email protected]> wrote:
>
>
>> I'll take a look at it and try to clean up the code.
>
> Does the following patch make sense for you?

Looks nice. The patch greatly simplifies changes on arm64 side.

- Takahiro AKASHI

> -- Steve
>
> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
> index 3f34496244e9..9384647d07c3 100644
> --- a/kernel/trace/trace_stack.c
> +++ b/kernel/trace/trace_stack.c
> @@ -18,12 +18,6 @@
>
> #define STACK_TRACE_ENTRIES 500
>
> -#ifdef CC_USING_FENTRY
> -# define fentry 1
> -#else
> -# define fentry 0
> -#endif
> -
> static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES+1] =
> { [0 ... (STACK_TRACE_ENTRIES)] = ULONG_MAX };
> static unsigned stack_dump_index[STACK_TRACE_ENTRIES];
> @@ -35,7 +29,7 @@ static unsigned stack_dump_index[STACK_TRACE_ENTRIES];
> */
> static struct stack_trace max_stack_trace = {
> .max_entries = STACK_TRACE_ENTRIES - 1,
> - .entries = &stack_dump_trace[1],
> + .entries = &stack_dump_trace[0],
> };
>
> static unsigned long max_stack_size;
> @@ -77,7 +71,7 @@ check_stack(unsigned long ip, unsigned long *stack)
> unsigned long this_size, flags; unsigned long *p, *top, *start;
> static int tracer_frame;
> int frame_size = ACCESS_ONCE(tracer_frame);
> - int i;
> + int i, x;
>
> this_size = ((unsigned long)stack) & (THREAD_SIZE-1);
> this_size = THREAD_SIZE - this_size;
> @@ -105,26 +99,20 @@ check_stack(unsigned long ip, unsigned long *stack)
> max_stack_size = this_size;
>
> max_stack_trace.nr_entries = 0;
> -
> - if (using_ftrace_ops_list_func())
> - max_stack_trace.skip = 4;
> - else
> - max_stack_trace.skip = 3;
> + max_stack_trace.skip = 0;
>
> save_stack_trace(&max_stack_trace);
>
> - /*
> - * Add the passed in ip from the function tracer.
> - * Searching for this on the stack will skip over
> - * most of the overhead from the stack tracer itself.
> - */
> - stack_dump_trace[0] = ip;
> - max_stack_trace.nr_entries++;
> + /* Skip over the overhead of the stack tracer itself */
> + for (i = 0; i < max_stack_trace.nr_entries; i++) {
> + if (stack_dump_trace[i] == ip)
> + break;
> + }
>
> /*
> * Now find where in the stack these are.
> */
> - i = 0;
> + x = 0;
> start = stack;
> top = (unsigned long *)
> (((unsigned long)start & ~(THREAD_SIZE-1)) + THREAD_SIZE);
> @@ -139,12 +127,13 @@ check_stack(unsigned long ip, unsigned long *stack)
> while (i < max_stack_trace.nr_entries) {
> int found = 0;
>
> - stack_dump_index[i] = this_size;
> + stack_dump_index[x] = this_size;
> p = start;
>
> for (; p < top && i < max_stack_trace.nr_entries; p++) {
> if (*p == stack_dump_trace[i]) {
> - this_size = stack_dump_index[i++] =
> + stack_dump_trace[x] = stack_dump_trace[i++];
> + this_size = stack_dump_index[x++] =
> (top - p) * sizeof(unsigned long);
> found = 1;
> /* Start the search from here */
> @@ -168,6 +157,9 @@ check_stack(unsigned long ip, unsigned long *stack)
> i++;
> }
>
> + for (; x < max_stack_trace.nr_entries; x++)
> + stack_dump_trace[x] = ULONG_MAX;
> +
> if (task_stack_end_corrupted(current)) {
> print_max_stack();
> BUG();
> @@ -192,24 +184,7 @@ stack_trace_call(unsigned long ip, unsigned long parent_ip,
> if (per_cpu(trace_active, cpu)++ != 0)
> goto out;
>
> - /*
> - * When fentry is used, the traced function does not get
> - * its stack frame set up, and we lose the parent.
> - * The ip is pretty useless because the function tracer
> - * was called before that function set up its stack frame.
> - * In this case, we use the parent ip.
> - *
> - * By adding the return address of either the parent ip
> - * or the current ip we can disregard most of the stack usage
> - * caused by the stack tracer itself.
> - *
> - * The function tracer always reports the address of where the
> - * mcount call was, but the stack will hold the return address.
> - */
> - if (fentry)
> - ip = parent_ip;
> - else
> - ip += MCOUNT_INSN_SIZE;
> + ip += MCOUNT_INSN_SIZE;
>
> check_stack(ip, &stack);
>
>

2015-07-16 01:08:13

by AKASHI Takahiro

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On 07/16/2015 09:27 AM, AKASHI Takahiro wrote:
> On 07/16/2015 01:13 AM, Steven Rostedt wrote:
>> On Wed, 15 Jul 2015 10:55:36 -0400
>> Steven Rostedt <[email protected]> wrote:
>>
>>
>>> I'll take a look at it and try to clean up the code.
>>
>> Does the following patch make sense for you?
>
> Looks nice. The patch greatly simplifies changes on arm64 side.

As follows:

- Takahiro AKASHI

diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index c5534fa..868d6f1 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -15,6 +15,7 @@

#define MCOUNT_ADDR ((unsigned long)_mcount)
#define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
+#define FTRACE_STACK_FRAME_OFFSET 4 /* sync it up with stacktrace.c */

#ifndef __ASSEMBLY__
#include <linux/compat.h>
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 1da6029..2c1bf7d 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -260,6 +260,13 @@ static inline void ftrace_kill(void) { }
#endif /* CONFIG_FUNCTION_TRACER */

#ifdef CONFIG_STACK_TRACER
+/*
+ * the offset value to add to return address from save_stack_trace()
+ */
+#ifndef FTRACE_STACK_FRAME_OFFSET
+#define FTRACE_STACK_FRAME_OFFSET 0
+#endif
+
extern int stack_tracer_enabled;
int
stack_trace_sysctl(struct ctl_table *table, int write,
diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index 9384647..c5b9748 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -105,7 +105,7 @@ check_stack(unsigned long ip, unsigned long *stack)

/* Skip over the overhead of the stack tracer itself */
for (i = 0; i < max_stack_trace.nr_entries; i++) {
- if (stack_dump_trace[i] == ip)
+ if ((stack_dump_trace[i] + FTRACE_STACK_FRAME_OFFSET) == ip)
break;
}

@@ -131,7 +131,8 @@ check_stack(unsigned long ip, unsigned long *stack)
p = start;

for (; p < top && i < max_stack_trace.nr_entries; p++) {
- if (*p == stack_dump_trace[i]) {
+ if (*p == (stack_dump_trace[i]
+ + FTRACE_STACK_FRAME_OFFSET)) {
stack_dump_trace[x] = stack_dump_trace[i++];
this_size = stack_dump_index[x++] =
(top - p) * sizeof(unsigned long);
--
1.7.9.5

2015-07-16 01:38:26

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Thu, 16 Jul 2015 10:08:03 +0900
AKASHI Takahiro <[email protected]> wrote:

> On 07/16/2015 09:27 AM, AKASHI Takahiro wrote:
> > On 07/16/2015 01:13 AM, Steven Rostedt wrote:
> >> On Wed, 15 Jul 2015 10:55:36 -0400
> >> Steven Rostedt <[email protected]> wrote:
> >>
> >>
> >>> I'll take a look at it and try to clean up the code.
> >>
> >> Does the following patch make sense for you?
> >
> > Looks nice. The patch greatly simplifies changes on arm64 side.
>
> As follows:
>
> - Takahiro AKASHI
>

How should we go forward with this. Want me to make my change to my
tree, and pull this patch in with it? I can set up for a 4.3 release.

Then if I can get an Acked-by from one of the arm64 maintainers, I can
have this go through my tree?

-- Steve


> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
> index c5534fa..868d6f1 100644
> --- a/arch/arm64/include/asm/ftrace.h
> +++ b/arch/arm64/include/asm/ftrace.h
> @@ -15,6 +15,7 @@
>
> #define MCOUNT_ADDR ((unsigned long)_mcount)
> #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
> +#define FTRACE_STACK_FRAME_OFFSET 4 /* sync it up with stacktrace.c */
>
> #ifndef __ASSEMBLY__
> #include <linux/compat.h>
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 1da6029..2c1bf7d 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -260,6 +260,13 @@ static inline void ftrace_kill(void) { }
> #endif /* CONFIG_FUNCTION_TRACER */
>
> #ifdef CONFIG_STACK_TRACER
> +/*
> + * the offset value to add to return address from save_stack_trace()
> + */
> +#ifndef FTRACE_STACK_FRAME_OFFSET
> +#define FTRACE_STACK_FRAME_OFFSET 0
> +#endif
> +
> extern int stack_tracer_enabled;
> int
> stack_trace_sysctl(struct ctl_table *table, int write,
> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
> index 9384647..c5b9748 100644
> --- a/kernel/trace/trace_stack.c
> +++ b/kernel/trace/trace_stack.c
> @@ -105,7 +105,7 @@ check_stack(unsigned long ip, unsigned long *stack)
>
> /* Skip over the overhead of the stack tracer itself */
> for (i = 0; i < max_stack_trace.nr_entries; i++) {
> - if (stack_dump_trace[i] == ip)
> + if ((stack_dump_trace[i] + FTRACE_STACK_FRAME_OFFSET) == ip)
> break;
> }
>
> @@ -131,7 +131,8 @@ check_stack(unsigned long ip, unsigned long *stack)
> p = start;
>
> for (; p < top && i < max_stack_trace.nr_entries; p++) {
> - if (*p == stack_dump_trace[i]) {
> + if (*p == (stack_dump_trace[i]
> + + FTRACE_STACK_FRAME_OFFSET)) {
> stack_dump_trace[x] = stack_dump_trace[i++];
> this_size = stack_dump_index[x++] =
> (top - p) * sizeof(unsigned long);

2015-07-16 13:29:09

by Jungseok Lee

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Jul 16, 2015, at 10:08 AM, AKASHI Takahiro wrote:

Hi, AKASHI

> On 07/16/2015 09:27 AM, AKASHI Takahiro wrote:
>> On 07/16/2015 01:13 AM, Steven Rostedt wrote:
>>> On Wed, 15 Jul 2015 10:55:36 -0400
>>> Steven Rostedt <[email protected]> wrote:
>>>
>>>
>>>> I'll take a look at it and try to clean up the code.
>>>
>>> Does the following patch make sense for you?
>>
>> Looks nice. The patch greatly simplifies changes on arm64 side.
>
> As follows:
>
> - Takahiro AKASHI
>
> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
> index c5534fa..868d6f1 100644
> --- a/arch/arm64/include/asm/ftrace.h
> +++ b/arch/arm64/include/asm/ftrace.h
> @@ -15,6 +15,7 @@
>
> #define MCOUNT_ADDR ((unsigned long)_mcount)
> #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
> +#define FTRACE_STACK_FRAME_OFFSET 4 /* sync it up with stacktrace.c */

How about binding it to -4 in unwind_frame function?
IMHO, it would help other developers trying to change stack trace code
be aware of this stack tracer feature.

> #ifndef __ASSEMBLY__
> #include <linux/compat.h>
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 1da6029..2c1bf7d 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -260,6 +260,13 @@ static inline void ftrace_kill(void) { }
> #endif /* CONFIG_FUNCTION_TRACER */
>
> #ifdef CONFIG_STACK_TRACER
> +/*
> + * the offset value to add to return address from save_stack_trace()
> + */
> +#ifndef FTRACE_STACK_FRAME_OFFSET
> +#define FTRACE_STACK_FRAME_OFFSET 0
> +#endif
> +
> extern int stack_tracer_enabled;
> int
> stack_trace_sysctl(struct ctl_table *table, int write,
> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
> index 9384647..c5b9748 100644
> --- a/kernel/trace/trace_stack.c
> +++ b/kernel/trace/trace_stack.c
> @@ -105,7 +105,7 @@ check_stack(unsigned long ip, unsigned long *stack)
>
> /* Skip over the overhead of the stack tracer itself */
> for (i = 0; i < max_stack_trace.nr_entries; i++) {
> - if (stack_dump_trace[i] == ip)
> + if ((stack_dump_trace[i] + FTRACE_STACK_FRAME_OFFSET) == ip)
> break;
> }
>
> @@ -131,7 +131,8 @@ check_stack(unsigned long ip, unsigned long *stack)
> p = start;
>
> for (; p < top && i < max_stack_trace.nr_entries; p++) {
> - if (*p == stack_dump_trace[i]) {
> + if (*p == (stack_dump_trace[i]
> + + FTRACE_STACK_FRAME_OFFSET)) {
> stack_dump_trace[x] = stack_dump_trace[i++];
> this_size = stack_dump_index[x++] =
> (top - p) * sizeof(unsigned long);
> --

I've prepared a kernel with the following patches and reviewed them.

1) Steve's clean up patch
2) This patch
3) [RFC 2/3]

AFAIU, [RFC 3/3] is not needed any more thanks to Steve's patch.

First of all, let's look at the following data.

1) stack_trace data
Depth Size Location (55 entries)
----- ---- --------
0) 4808 16 notifier_call_chain+0x2c/0x94
1) 4792 64 raw_notifier_call_chain+0x34/0x44
2) 4728 48 timekeeping_update.constprop.9+0xb8/0x114
3) 4680 48 update_wall_time+0x408/0x6dc
4) 4632 128 tick_do_update_jiffies64+0xd8/0x154
5) 4504 80 tick_sched_do_timer+0x50/0x60
6) 4424 32 tick_sched_timer+0x34/0x90
7) 4392 48 __run_hrtimer+0x60/0x258
8) 4344 64 hrtimer_interrupt+0xe8/0x260
9) 4280 128 arch_timer_handler_virt+0x38/0x48
10) 4152 32 handle_percpu_devid_irq+0x84/0x188
11) 4120 64 generic_handle_irq+0x38/0x54
12) 4056 32 __handle_domain_irq+0x68/0xbc
13) 4024 64 gic_handle_irq+0x38/0x88
14) 3960 336 el1_irq+0x64/0xd8
15) 3624 16 netif_rx_internal+0x14/0x198
16) 3608 64 netif_rx+0x20/0xa4
17) 3544 32 loopback_xmit+0x64/0xf4
18) 3512 48 dev_hard_start_xmit+0x25c/0x3f8
19) 3464 160 __dev_queue_xmit+0x440/0x4dc
20) 3304 96 dev_queue_xmit_sk+0x20/0x30
21) 3208 32 ip_finish_output+0x1e0/0xabc
22) 3176 96 ip_output+0xf0/0x120
23) 3080 64 ip_local_out_sk+0x44/0x54
24) 3016 32 ip_send_skb+0x24/0xbc
25) 2984 48 ip_push_pending_frames+0x40/0x60
26) 2936 64 icmp_push_reply+0x104/0x144
27) 2872 96 icmp_send+0x3c0/0x3c8
28) 2776 192 __udp4_lib_rcv+0x5b8/0x684
29) 2584 96 udp_rcv+0x2c/0x3c
30) 2488 32 ip_local_deliver+0xa0/0x224
31) 2456 48 ip_rcv+0x360/0x57c
32) 2408 64 __netif_receive_skb_core+0x4d0/0x80c
33) 2344 128 __netif_receive_skb+0x24/0x84
34) 2216 32 process_backlog+0x9c/0x15c
35) 2184 80 net_rx_action+0x1ec/0x32c
36) 2104 160 __do_softirq+0x114/0x2f0
37) 1944 128 do_softirq+0x60/0x68
38) 1816 32 __local_bh_enable_ip+0xb0/0xd4
39) 1784 32 ip_finish_output+0x1f4/0xabc
40) 1752 96 ip_output+0xf0/0x120
41) 1656 64 ip_local_out_sk+0x44/0x54
42) 1592 32 ip_send_skb+0x24/0xbc
43) 1560 48 udp_send_skb+0x1b4/0x2f4
44) 1512 80 udp_sendmsg+0x2a8/0x7a0
45) 1432 272 inet_sendmsg+0xa0/0xd0
46) 1160 48 sock_sendmsg+0x30/0x78
47) 1112 32 ___sys_sendmsg+0x15c/0x26c
48) 1080 400 __sys_sendmmsg+0x94/0x180
49) 680 320 SyS_sendmmsg+0x38/0x54
50) 360 360 el0_svc_naked+0x20/0x28

2) stack_max_size data
4888

The data looks odd in two points.
1) the number of entry
There is a mismatch between start token and real data

2) 80-byte gap
stack_max_size is not aligned with "Depth" field of the first entry of stack_trace.

IMHO, two items are not considered in this series as digging them out.

1) skipped entries
As x variable is introduced in Steve's patch, it is needed to track down
how many entries are recorded in both stack_dump_trace and stack_dump_index.

2) max_stack_trace.skip value
max_stack_trace.skip is 0 as applying Steve's patch. The above gap could be
observed unless the value is not considered in arch code. In the above example,
80-byte gap is save_stack_trace function in arch/arm64/kernel/stacktrace.c.

As applying the following fix, stack_trace and stack_max_size are okay.
However, I'm not sure which code, arch or generic ftrace, should handle trace->skip.
The latter one is responsible for it under current implementation, not Steve's change.

Please correct me if I am wrong.

Best Regards
Jungseok Lee

----8<----

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 978c923..5f6f4df 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -137,7 +137,9 @@ void save_stack_trace(struct stack_trace *trace)

frame.fp = (unsigned long)__builtin_frame_address(0);
frame.sp = current_stack_pointer;
- frame.pc = (unsigned long)save_stack_trace_tsk;
+ frame.pc = (unsigned long)save_stack_trace;
+
+ trace->skip += 4;

__save_stack_trace(&frame, trace, 0);
}
diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index c5b9748..5635683 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -158,6 +158,9 @@ check_stack(unsigned long ip, unsigned long *stack)
i++;
}

+ /* We do not want to include skipped entries */
+ max_stack_trace.nr_entries -= (i - x);
+
for (; x < max_stack_trace.nr_entries; x++)
stack_dump_trace[x] = ULONG_MAX;

@@ -330,7 +333,7 @@ static int t_show(struct seq_file *m, void *v)
seq_printf(m, " Depth Size Location"
" (%d entries)\n"
" ----- ---- --------\n",
- max_stack_trace.nr_entries - 1);
+ max_stack_trace.nr_entries);

if (!stack_tracer_enabled && !max_stack_size)
print_disabled(m);

----8<----

1) stack_trace data
Depth Size Location (47 entries)
----- ---- --------
0) 4648 16 notifier_call_chain+0x2c/0x94
1) 4632 64 raw_notifier_call_chain+0x34/0x44
2) 4568 48 timekeeping_update.constprop.9+0xb8/0x114
3) 4520 48 update_wall_time+0x408/0x6dc
4) 4472 128 tick_do_update_jiffies64+0xd8/0x154
5) 4344 80 tick_sched_do_timer+0x50/0x60
6) 4264 32 tick_sched_timer+0x34/0x90
7) 4232 48 __run_hrtimer+0x60/0x258
8) 4184 64 hrtimer_interrupt+0xe8/0x260
9) 4120 128 arch_timer_handler_virt+0x38/0x48
10) 3992 32 handle_percpu_devid_irq+0x84/0x188
11) 3960 64 generic_handle_irq+0x38/0x54
12) 3896 32 __handle_domain_irq+0x68/0xbc
13) 3864 64 gic_handle_irq+0x38/0x88
14) 3800 336 el1_irq+0x64/0xd8
15) 3464 48 __kmalloc_track_caller+0x40/0x2f8
16) 3416 16 __kmalloc_reserve.isra.46+0x44/0x9c
17) 3400 144 __alloc_skb+0x70/0x180
18) 3256 96 alloc_skb_with_frags+0x74/0x234
19) 3160 112 sock_alloc_send_pskb+0x1d0/0x294
20) 3048 160 sock_alloc_send_skb+0x44/0x54
21) 2888 64 __ip_append_data.isra.40+0x78c/0xb48
22) 2824 224 ip_append_data.part.42+0x98/0xe8
23) 2600 112 ip_append_data+0x68/0x7c
24) 2488 96 icmp_push_reply+0x7c/0x144
25) 2392 96 icmp_send+0x3c0/0x3c8
26) 2296 192 __udp4_lib_rcv+0x5b8/0x684
27) 2104 96 udp_rcv+0x2c/0x3c
28) 2008 32 ip_local_deliver+0xa0/0x224
29) 1976 48 ip_rcv+0x360/0x57c
30) 1928 64 __netif_receive_skb_core+0x4d0/0x80c
31) 1864 128 __netif_receive_skb+0x24/0x84
32) 1736 32 process_backlog+0x9c/0x15c
33) 1704 80 net_rx_action+0x1ec/0x32c
34) 1624 160 __do_softirq+0x114/0x2f0
35) 1464 128 do_softirq+0x60/0x68
36) 1336 32 __local_bh_enable_ip+0xb0/0xd4
37) 1304 32 ip_finish_output+0x1f4/0xabc
38) 1272 96 ip_output+0xf0/0x120
39) 1176 64 ip_local_out_sk+0x44/0x54
40) 1112 32 ip_send_skb+0x24/0xbc
41) 1080 48 udp_send_skb+0x1b4/0x2f4
42) 1032 80 udp_sendmsg+0x2a8/0x7a0
43) 952 272 inet_sendmsg+0xa0/0xd0
44) 680 48 sock_sendmsg+0x30/0x78
45) 632 32 SyS_sendto+0xc4/0x108
46) 600 600 el0_svc_naked+0x20/0x28

2) stack_max_size data
4648

2015-07-16 13:54:52

by Jungseok Lee

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Jul 16, 2015, at 10:29 PM, Jungseok Lee wrote:
> On Jul 16, 2015, at 10:08 AM, AKASHI Takahiro wrote:
>
> Hi, AKASHI
>
>> On 07/16/2015 09:27 AM, AKASHI Takahiro wrote:
>>> On 07/16/2015 01:13 AM, Steven Rostedt wrote:
>>>> On Wed, 15 Jul 2015 10:55:36 -0400
>>>> Steven Rostedt <[email protected]> wrote:
>>>>
>>>>
>>>>> I'll take a look at it and try to clean up the code.
>>>>
>>>> Does the following patch make sense for you?
>>>
>>> Looks nice. The patch greatly simplifies changes on arm64 side.
>>
>> As follows:
>>
>> - Takahiro AKASHI
>>
>> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
>> index c5534fa..868d6f1 100644
>> --- a/arch/arm64/include/asm/ftrace.h
>> +++ b/arch/arm64/include/asm/ftrace.h
>> @@ -15,6 +15,7 @@
>>
>> #define MCOUNT_ADDR ((unsigned long)_mcount)
>> #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
>> +#define FTRACE_STACK_FRAME_OFFSET 4 /* sync it up with stacktrace.c */
>
> How about binding it to -4 in unwind_frame function?
> IMHO, it would help other developers trying to change stack trace code
> be aware of this stack tracer feature.
>
>> #ifndef __ASSEMBLY__
>> #include <linux/compat.h>
>> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
>> index 1da6029..2c1bf7d 100644
>> --- a/include/linux/ftrace.h
>> +++ b/include/linux/ftrace.h
>> @@ -260,6 +260,13 @@ static inline void ftrace_kill(void) { }
>> #endif /* CONFIG_FUNCTION_TRACER */
>>
>> #ifdef CONFIG_STACK_TRACER
>> +/*
>> + * the offset value to add to return address from save_stack_trace()
>> + */
>> +#ifndef FTRACE_STACK_FRAME_OFFSET
>> +#define FTRACE_STACK_FRAME_OFFSET 0
>> +#endif
>> +
>> extern int stack_tracer_enabled;
>> int
>> stack_trace_sysctl(struct ctl_table *table, int write,
>> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
>> index 9384647..c5b9748 100644
>> --- a/kernel/trace/trace_stack.c
>> +++ b/kernel/trace/trace_stack.c
>> @@ -105,7 +105,7 @@ check_stack(unsigned long ip, unsigned long *stack)
>>
>> /* Skip over the overhead of the stack tracer itself */
>> for (i = 0; i < max_stack_trace.nr_entries; i++) {
>> - if (stack_dump_trace[i] == ip)
>> + if ((stack_dump_trace[i] + FTRACE_STACK_FRAME_OFFSET) == ip)
>> break;
>> }
>>
>> @@ -131,7 +131,8 @@ check_stack(unsigned long ip, unsigned long *stack)
>> p = start;
>>
>> for (; p < top && i < max_stack_trace.nr_entries; p++) {
>> - if (*p == stack_dump_trace[i]) {
>> + if (*p == (stack_dump_trace[i]
>> + + FTRACE_STACK_FRAME_OFFSET)) {
>> stack_dump_trace[x] = stack_dump_trace[i++];
>> this_size = stack_dump_index[x++] =
>> (top - p) * sizeof(unsigned long);
>> --
>
> I've prepared a kernel with the following patches and reviewed them.
>
> 1) Steve's clean up patch
> 2) This patch
> 3) [RFC 2/3]
>
> AFAIU, [RFC 3/3] is not needed any more thanks to Steve's patch.
>
> First of all, let's look at the following data.
>
> 1) stack_trace data
> Depth Size Location (55 entries)
> ----- ---- --------
> 0) 4808 16 notifier_call_chain+0x2c/0x94
> 1) 4792 64 raw_notifier_call_chain+0x34/0x44
> 2) 4728 48 timekeeping_update.constprop.9+0xb8/0x114
> 3) 4680 48 update_wall_time+0x408/0x6dc
> 4) 4632 128 tick_do_update_jiffies64+0xd8/0x154
> 5) 4504 80 tick_sched_do_timer+0x50/0x60
> 6) 4424 32 tick_sched_timer+0x34/0x90
> 7) 4392 48 __run_hrtimer+0x60/0x258
> 8) 4344 64 hrtimer_interrupt+0xe8/0x260
> 9) 4280 128 arch_timer_handler_virt+0x38/0x48
> 10) 4152 32 handle_percpu_devid_irq+0x84/0x188
> 11) 4120 64 generic_handle_irq+0x38/0x54
> 12) 4056 32 __handle_domain_irq+0x68/0xbc
> 13) 4024 64 gic_handle_irq+0x38/0x88
> 14) 3960 336 el1_irq+0x64/0xd8
> 15) 3624 16 netif_rx_internal+0x14/0x198
> 16) 3608 64 netif_rx+0x20/0xa4
> 17) 3544 32 loopback_xmit+0x64/0xf4
> 18) 3512 48 dev_hard_start_xmit+0x25c/0x3f8
> 19) 3464 160 __dev_queue_xmit+0x440/0x4dc
> 20) 3304 96 dev_queue_xmit_sk+0x20/0x30
> 21) 3208 32 ip_finish_output+0x1e0/0xabc
> 22) 3176 96 ip_output+0xf0/0x120
> 23) 3080 64 ip_local_out_sk+0x44/0x54
> 24) 3016 32 ip_send_skb+0x24/0xbc
> 25) 2984 48 ip_push_pending_frames+0x40/0x60
> 26) 2936 64 icmp_push_reply+0x104/0x144
> 27) 2872 96 icmp_send+0x3c0/0x3c8
> 28) 2776 192 __udp4_lib_rcv+0x5b8/0x684
> 29) 2584 96 udp_rcv+0x2c/0x3c
> 30) 2488 32 ip_local_deliver+0xa0/0x224
> 31) 2456 48 ip_rcv+0x360/0x57c
> 32) 2408 64 __netif_receive_skb_core+0x4d0/0x80c
> 33) 2344 128 __netif_receive_skb+0x24/0x84
> 34) 2216 32 process_backlog+0x9c/0x15c
> 35) 2184 80 net_rx_action+0x1ec/0x32c
> 36) 2104 160 __do_softirq+0x114/0x2f0
> 37) 1944 128 do_softirq+0x60/0x68
> 38) 1816 32 __local_bh_enable_ip+0xb0/0xd4
> 39) 1784 32 ip_finish_output+0x1f4/0xabc
> 40) 1752 96 ip_output+0xf0/0x120
> 41) 1656 64 ip_local_out_sk+0x44/0x54
> 42) 1592 32 ip_send_skb+0x24/0xbc
> 43) 1560 48 udp_send_skb+0x1b4/0x2f4
> 44) 1512 80 udp_sendmsg+0x2a8/0x7a0
> 45) 1432 272 inet_sendmsg+0xa0/0xd0
> 46) 1160 48 sock_sendmsg+0x30/0x78
> 47) 1112 32 ___sys_sendmsg+0x15c/0x26c
> 48) 1080 400 __sys_sendmmsg+0x94/0x180
> 49) 680 320 SyS_sendmmsg+0x38/0x54
> 50) 360 360 el0_svc_naked+0x20/0x28
>
> 2) stack_max_size data
> 4888
>
> The data looks odd in two points.
> 1) the number of entry
> There is a mismatch between start token and real data
>
> 2) 80-byte gap
> stack_max_size is not aligned with "Depth" field of the first entry of stack_trace.
>
> IMHO, two items are not considered in this series as digging them out.
>
> 1) skipped entries
> As x variable is introduced in Steve's patch, it is needed to track down
> how many entries are recorded in both stack_dump_trace and stack_dump_index.
>
> 2) max_stack_trace.skip value
> max_stack_trace.skip is 0 as applying Steve's patch. The above gap could be
> observed unless the value is not considered in arch code. In the above example,
> 80-byte gap is save_stack_trace function in arch/arm64/kernel/stacktrace.c.
>
> As applying the following fix, stack_trace and stack_max_size are okay.
> However, I'm not sure which code, arch or generic ftrace, should handle trace->skip.
> The latter one is responsible for it under current implementation, not Steve's change.
>
> Please correct me if I am wrong.
>
> Best Regards
> Jungseok Lee
>
> ----8<----
>
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 978c923..5f6f4df 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -137,7 +137,9 @@ void save_stack_trace(struct stack_trace *trace)
>
> frame.fp = (unsigned long)__builtin_frame_address(0);
> frame.sp = current_stack_pointer;
> - frame.pc = (unsigned long)save_stack_trace_tsk;
> + frame.pc = (unsigned long)save_stack_trace;
> +
> + trace->skip += 4;
>
> __save_stack_trace(&frame, trace, 0);
> }
> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
> index c5b9748..5635683 100644
> --- a/kernel/trace/trace_stack.c
> +++ b/kernel/trace/trace_stack.c
> @@ -158,6 +158,9 @@ check_stack(unsigned long ip, unsigned long *stack)
> i++;
> }
>
> + /* We do not want to include skipped entries */
> + max_stack_trace.nr_entries -= (i - x);
> +
> for (; x < max_stack_trace.nr_entries; x++)

This line should be changed as follows.

for (; x < i; x++)

Sorry for confusion.

Best Regards
Jungseok Lee-

2015-07-16 14:24:11

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Thu, 16 Jul 2015 22:29:05 +0900
Jungseok Lee <[email protected]> wrote:


> First of all, let's look at the following data.
>
> 1) stack_trace data
> Depth Size Location (55 entries)
> ----- ---- --------
> 0) 4808 16 notifier_call_chain+0x2c/0x94
> 1) 4792 64 raw_notifier_call_chain+0x34/0x44
> 2) 4728 48 timekeeping_update.constprop.9+0xb8/0x114
> 3) 4680 48 update_wall_time+0x408/0x6dc
> 4) 4632 128 tick_do_update_jiffies64+0xd8/0x154
> 5) 4504 80 tick_sched_do_timer+0x50/0x60
> 6) 4424 32 tick_sched_timer+0x34/0x90
> 7) 4392 48 __run_hrtimer+0x60/0x258
> 8) 4344 64 hrtimer_interrupt+0xe8/0x260
> 9) 4280 128 arch_timer_handler_virt+0x38/0x48
> 10) 4152 32 handle_percpu_devid_irq+0x84/0x188
> 11) 4120 64 generic_handle_irq+0x38/0x54
> 12) 4056 32 __handle_domain_irq+0x68/0xbc
> 13) 4024 64 gic_handle_irq+0x38/0x88
> 14) 3960 336 el1_irq+0x64/0xd8
> 15) 3624 16 netif_rx_internal+0x14/0x198
> 16) 3608 64 netif_rx+0x20/0xa4
> 17) 3544 32 loopback_xmit+0x64/0xf4
> 18) 3512 48 dev_hard_start_xmit+0x25c/0x3f8
> 19) 3464 160 __dev_queue_xmit+0x440/0x4dc
> 20) 3304 96 dev_queue_xmit_sk+0x20/0x30
> 21) 3208 32 ip_finish_output+0x1e0/0xabc
> 22) 3176 96 ip_output+0xf0/0x120
> 23) 3080 64 ip_local_out_sk+0x44/0x54
> 24) 3016 32 ip_send_skb+0x24/0xbc
> 25) 2984 48 ip_push_pending_frames+0x40/0x60
> 26) 2936 64 icmp_push_reply+0x104/0x144
> 27) 2872 96 icmp_send+0x3c0/0x3c8
> 28) 2776 192 __udp4_lib_rcv+0x5b8/0x684
> 29) 2584 96 udp_rcv+0x2c/0x3c
> 30) 2488 32 ip_local_deliver+0xa0/0x224
> 31) 2456 48 ip_rcv+0x360/0x57c
> 32) 2408 64 __netif_receive_skb_core+0x4d0/0x80c
> 33) 2344 128 __netif_receive_skb+0x24/0x84
> 34) 2216 32 process_backlog+0x9c/0x15c
> 35) 2184 80 net_rx_action+0x1ec/0x32c
> 36) 2104 160 __do_softirq+0x114/0x2f0
> 37) 1944 128 do_softirq+0x60/0x68
> 38) 1816 32 __local_bh_enable_ip+0xb0/0xd4
> 39) 1784 32 ip_finish_output+0x1f4/0xabc
> 40) 1752 96 ip_output+0xf0/0x120
> 41) 1656 64 ip_local_out_sk+0x44/0x54
> 42) 1592 32 ip_send_skb+0x24/0xbc
> 43) 1560 48 udp_send_skb+0x1b4/0x2f4
> 44) 1512 80 udp_sendmsg+0x2a8/0x7a0
> 45) 1432 272 inet_sendmsg+0xa0/0xd0
> 46) 1160 48 sock_sendmsg+0x30/0x78
> 47) 1112 32 ___sys_sendmsg+0x15c/0x26c
> 48) 1080 400 __sys_sendmmsg+0x94/0x180
> 49) 680 320 SyS_sendmmsg+0x38/0x54
> 50) 360 360 el0_svc_naked+0x20/0x28
>
> 2) stack_max_size data
> 4888
>
> The data looks odd in two points.
> 1) the number of entry
> There is a mismatch between start token and real data

Yep, good catch. As soon as I read that, I realized exactly what the
issue was ;-)

>
> 2) 80-byte gap
> stack_max_size is not aligned with "Depth" field of the first entry of stack_trace.
>
> IMHO, two items are not considered in this series as digging them out.
>
> 1) skipped entries
> As x variable is introduced in Steve's patch, it is needed to track down
> how many entries are recorded in both stack_dump_trace and stack_dump_index.

Yep.

>
> 2) max_stack_trace.skip value
> max_stack_trace.skip is 0 as applying Steve's patch. The above gap could be
> observed unless the value is not considered in arch code. In the above example,
> 80-byte gap is save_stack_trace function in arch/arm64/kernel/stacktrace.c.
>
> As applying the following fix, stack_trace and stack_max_size are okay.
> However, I'm not sure which code, arch or generic ftrace, should handle trace->skip.
> The latter one is responsible for it under current implementation, not Steve's change.
>
> Please correct me if I am wrong.

No, it's a bug in my patch. I'll make an update.

Does this new patch fix it for you?

-- Steve

---
kernel/trace/trace_stack.c | 58 +++++++++++++--------------------------------
1 file changed, 17 insertions(+), 41 deletions(-)

Index: linux-trace.git/kernel/trace/trace_stack.c
===================================================================
--- linux-trace.git.orig/kernel/trace/trace_stack.c 2015-07-15 12:49:59.290063597 -0400
+++ linux-trace.git/kernel/trace/trace_stack.c 2015-07-16 09:55:37.936188197 -0400
@@ -18,12 +18,6 @@

#define STACK_TRACE_ENTRIES 500

-#ifdef CC_USING_FENTRY
-# define fentry 1
-#else
-# define fentry 0
-#endif
-
static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES+1] =
{ [0 ... (STACK_TRACE_ENTRIES)] = ULONG_MAX };
static unsigned stack_dump_index[STACK_TRACE_ENTRIES];
@@ -35,7 +29,7 @@ static unsigned stack_dump_index[STACK_T
*/
static struct stack_trace max_stack_trace = {
.max_entries = STACK_TRACE_ENTRIES - 1,
- .entries = &stack_dump_trace[1],
+ .entries = &stack_dump_trace[0],
};

static unsigned long max_stack_size;
@@ -77,7 +71,7 @@ check_stack(unsigned long ip, unsigned l
unsigned long this_size, flags; unsigned long *p, *top, *start;
static int tracer_frame;
int frame_size = ACCESS_ONCE(tracer_frame);
- int i;
+ int i, x;

this_size = ((unsigned long)stack) & (THREAD_SIZE-1);
this_size = THREAD_SIZE - this_size;
@@ -105,26 +99,20 @@ check_stack(unsigned long ip, unsigned l
max_stack_size = this_size;

max_stack_trace.nr_entries = 0;
-
- if (using_ftrace_ops_list_func())
- max_stack_trace.skip = 4;
- else
- max_stack_trace.skip = 3;
+ max_stack_trace.skip = 3;

save_stack_trace(&max_stack_trace);

- /*
- * Add the passed in ip from the function tracer.
- * Searching for this on the stack will skip over
- * most of the overhead from the stack tracer itself.
- */
- stack_dump_trace[0] = ip;
- max_stack_trace.nr_entries++;
+ /* Skip over the overhead of the stack tracer itself */
+ for (i = 0; i < max_stack_trace.nr_entries; i++) {
+ if (stack_dump_trace[i] == ip)
+ break;
+ }

/*
* Now find where in the stack these are.
*/
- i = 0;
+ x = 0;
start = stack;
top = (unsigned long *)
(((unsigned long)start & ~(THREAD_SIZE-1)) + THREAD_SIZE);
@@ -139,12 +127,13 @@ check_stack(unsigned long ip, unsigned l
while (i < max_stack_trace.nr_entries) {
int found = 0;

- stack_dump_index[i] = this_size;
+ stack_dump_index[x] = this_size;
p = start;

for (; p < top && i < max_stack_trace.nr_entries; p++) {
if (*p == stack_dump_trace[i]) {
- this_size = stack_dump_index[i++] =
+ stack_dump_trace[x] = stack_dump_trace[i++];
+ this_size = stack_dump_index[x++] =
(top - p) * sizeof(unsigned long);
found = 1;
/* Start the search from here */
@@ -168,6 +157,10 @@ check_stack(unsigned long ip, unsigned l
i++;
}

+ max_stack_trace.nr_entries = x;
+ for (; x < i; x++)
+ stack_dump_trace[x] = ULONG_MAX;
+
if (task_stack_end_corrupted(current)) {
print_max_stack();
BUG();
@@ -192,24 +185,7 @@ stack_trace_call(unsigned long ip, unsig
if (per_cpu(trace_active, cpu)++ != 0)
goto out;

- /*
- * When fentry is used, the traced function does not get
- * its stack frame set up, and we lose the parent.
- * The ip is pretty useless because the function tracer
- * was called before that function set up its stack frame.
- * In this case, we use the parent ip.
- *
- * By adding the return address of either the parent ip
- * or the current ip we can disregard most of the stack usage
- * caused by the stack tracer itself.
- *
- * The function tracer always reports the address of where the
- * mcount call was, but the stack will hold the return address.
- */
- if (fentry)
- ip = parent_ip;
- else
- ip += MCOUNT_INSN_SIZE;
+ ip += MCOUNT_INSN_SIZE;

check_stack(ip, &stack);

2015-07-16 14:29:01

by Mark Rutland

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Thu, Jul 16, 2015 at 02:08:03AM +0100, AKASHI Takahiro wrote:
> On 07/16/2015 09:27 AM, AKASHI Takahiro wrote:
> > On 07/16/2015 01:13 AM, Steven Rostedt wrote:
> >> On Wed, 15 Jul 2015 10:55:36 -0400
> >> Steven Rostedt <[email protected]> wrote:
> >>
> >>
> >>> I'll take a look at it and try to clean up the code.
> >>
> >> Does the following patch make sense for you?
> >
> > Looks nice. The patch greatly simplifies changes on arm64 side.
>
> As follows:
>
> - Takahiro AKASHI
>
> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
> index c5534fa..868d6f1 100644
> --- a/arch/arm64/include/asm/ftrace.h
> +++ b/arch/arm64/include/asm/ftrace.h
> @@ -15,6 +15,7 @@
>
> #define MCOUNT_ADDR ((unsigned long)_mcount)
> #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
> +#define FTRACE_STACK_FRAME_OFFSET 4 /* sync it up with stacktrace.c */

Is there any reason we couldn't have the arch code dump the stack depth
for each function when it walks the stack to generate the stack trace?

That means we can provide a more precise result (because we know the
layour of our own stackframes), and we only need walk the stack once to
do so.

The downside is that we need a new function per-arch to do so.

Mark.

>
> #ifndef __ASSEMBLY__
> #include <linux/compat.h>
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 1da6029..2c1bf7d 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -260,6 +260,13 @@ static inline void ftrace_kill(void) { }
> #endif /* CONFIG_FUNCTION_TRACER */
>
> #ifdef CONFIG_STACK_TRACER
> +/*
> + * the offset value to add to return address from save_stack_trace()
> + */
> +#ifndef FTRACE_STACK_FRAME_OFFSET
> +#define FTRACE_STACK_FRAME_OFFSET 0
> +#endif
> +
> extern int stack_tracer_enabled;
> int
> stack_trace_sysctl(struct ctl_table *table, int write,
> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
> index 9384647..c5b9748 100644
> --- a/kernel/trace/trace_stack.c
> +++ b/kernel/trace/trace_stack.c
> @@ -105,7 +105,7 @@ check_stack(unsigned long ip, unsigned long *stack)
>
> /* Skip over the overhead of the stack tracer itself */
> for (i = 0; i < max_stack_trace.nr_entries; i++) {
> - if (stack_dump_trace[i] == ip)
> + if ((stack_dump_trace[i] + FTRACE_STACK_FRAME_OFFSET) == ip)
> break;
> }
>
> @@ -131,7 +131,8 @@ check_stack(unsigned long ip, unsigned long *stack)
> p = start;
>
> for (; p < top && i < max_stack_trace.nr_entries; p++) {
> - if (*p == stack_dump_trace[i]) {
> + if (*p == (stack_dump_trace[i]
> + + FTRACE_STACK_FRAME_OFFSET)) {
> stack_dump_trace[x] = stack_dump_trace[i++];
> this_size = stack_dump_index[x++] =
> (top - p) * sizeof(unsigned long);
> --
> 1.7.9.5
>
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>

2015-07-16 14:34:50

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Thu, 16 Jul 2015 15:28:34 +0100
Mark Rutland <[email protected]> wrote:


> > diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
> > index c5534fa..868d6f1 100644
> > --- a/arch/arm64/include/asm/ftrace.h
> > +++ b/arch/arm64/include/asm/ftrace.h
> > @@ -15,6 +15,7 @@
> >
> > #define MCOUNT_ADDR ((unsigned long)_mcount)
> > #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
> > +#define FTRACE_STACK_FRAME_OFFSET 4 /* sync it up with stacktrace.c */
>
> Is there any reason we couldn't have the arch code dump the stack depth
> for each function when it walks the stack to generate the stack trace?
>
> That means we can provide a more precise result (because we know the
> layour of our own stackframes), and we only need walk the stack once to
> do so.
>
> The downside is that we need a new function per-arch to do so.

Or we make check_patch() a weak function, and let archs override it. I
can possibly break up the code a bit to have helper functions where
things are the same.

I want x86 to be able to track irq stacks as well, but there's no way
to do that generically yet, so having arch specific functions has been
on my todo list.

-- Steve

2015-07-16 15:01:33

by Jungseok Lee

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Jul 16, 2015, at 11:24 PM, Steven Rostedt wrote:

Hi, Steve

> On Thu, 16 Jul 2015 22:29:05 +0900
> Jungseok Lee <[email protected]> wrote:

[ snip ]

>> The data looks odd in two points.
>> 1) the number of entry
>> There is a mismatch between start token and real data
>
> Yep, good catch. As soon as I read that, I realized exactly what the
> issue was ;-)
>
>>
>> 2) 80-byte gap
>> stack_max_size is not aligned with "Depth" field of the first entry of stack_trace.
>>
>> IMHO, two items are not considered in this series as digging them out.
>>
>> 1) skipped entries
>> As x variable is introduced in Steve's patch, it is needed to track down
>> how many entries are recorded in both stack_dump_trace and stack_dump_index.
>
> Yep.
>
>>
>> 2) max_stack_trace.skip value
>> max_stack_trace.skip is 0 as applying Steve's patch. The above gap could be
>> observed unless the value is not considered in arch code. In the above example,
>> 80-byte gap is save_stack_trace function in arch/arm64/kernel/stacktrace.c.
>>
>> As applying the following fix, stack_trace and stack_max_size are okay.
>> However, I'm not sure which code, arch or generic ftrace, should handle trace->skip.
>> The latter one is responsible for it under current implementation, not Steve's change.
>>
>> Please correct me if I am wrong.
>
> No, it's a bug in my patch. I'll make an update.
>
> Does this new patch fix it for you?

I've gathered stack tracer data with your update.

1) stack_trace
Depth Size Location (35 entries)
----- ---- --------
0) 4424 16 put_cpu_partial+0x28/0x1d0
1) 4408 80 get_partial_node.isra.64+0x13c/0x344
2) 4328 256 __slab_alloc.isra.65.constprop.67+0xd8/0x37c
3) 4072 32 kmem_cache_alloc+0x258/0x294
4) 4040 304 __alloc_skb+0x48/0x180
5) 3736 96 alloc_skb_with_frags+0x74/0x234
6) 3640 112 sock_alloc_send_pskb+0x1d0/0x294
7) 3528 160 sock_alloc_send_skb+0x44/0x54
8) 3368 64 __ip_append_data.isra.40+0x78c/0xb48
9) 3304 224 ip_append_data.part.42+0x98/0xe8
10) 3080 112 ip_append_data+0x68/0x7c
11) 2968 96 icmp_push_reply+0x7c/0x144
12) 2872 96 icmp_send+0x3c0/0x3c8
13) 2776 192 __udp4_lib_rcv+0x5b8/0x684
14) 2584 96 udp_rcv+0x2c/0x3c
15) 2488 32 ip_local_deliver+0xa0/0x224
16) 2456 48 ip_rcv+0x360/0x57c
17) 2408 64 __netif_receive_skb_core+0x4d0/0x80c
18) 2344 128 __netif_receive_skb+0x24/0x84
19) 2216 32 process_backlog+0x9c/0x15c
20) 2184 80 net_rx_action+0x1ec/0x32c
21) 2104 160 __do_softirq+0x114/0x2f0
22) 1944 128 do_softirq+0x60/0x68
23) 1816 32 __local_bh_enable_ip+0xb0/0xd4
24) 1784 32 ip_finish_output+0x1f4/0xabc
25) 1752 96 ip_output+0xf0/0x120
26) 1656 64 ip_local_out_sk+0x44/0x54
27) 1592 32 ip_send_skb+0x24/0xbc
28) 1560 48 udp_send_skb+0x1b4/0x2f4
29) 1512 80 udp_sendmsg+0x2a8/0x7a0
30) 1432 272 inet_sendmsg+0xa0/0xd0
31) 1160 48 sock_sendmsg+0x30/0x78
32) 1112 32 ___sys_sendmsg+0x15c/0x26c
33) 1080 400 __sys_sendmmsg+0x94/0x180
34) 680 320 SyS_sendmmsg+0x38/0x54
35) 360 360 el0_svc_naked+0x20/0x28

2) stack_max_size
4504

In case of the number of entries, the following diff might be needed
as I suggested in the previous reply. ;)

----8<----

@@ -330,7 +333,7 @@ static int t_show(struct seq_file *m, void *v)
seq_printf(m, " Depth Size Location"
" (%d entries)\n"
" ----- ---- --------\n",
- max_stack_trace.nr_entries - 1);
+ max_stack_trace.nr_entries);

if (!stack_tracer_enabled && !max_stack_size)
print_disabled(m);

----8<----

However, 80-byte gap still appears.

Since max_stack_trace.skip is 3 in your update, save_stack_trace in arm64
should be refactored to align with this value.

max_stack_trace.skip should be set to 4 if AKASHI's [RFC 2/3] patch is merged.
However, arch code is supposed to follow generic framework's rule in this case.
Isn't it?

Best Regards
Jungseok Lee-

2015-07-16 15:31:21

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Fri, 17 Jul 2015 00:01:25 +0900
Jungseok Lee <[email protected]> wrote:

> I've gathered stack tracer data with your update.
>
> 1) stack_trace
> Depth Size Location (35 entries)
> ----- ---- --------
> 0) 4424 16 put_cpu_partial+0x28/0x1d0
> 1) 4408 80 get_partial_node.isra.64+0x13c/0x344
> 2) 4328 256 __slab_alloc.isra.65.constprop.67+0xd8/0x37c
> 3) 4072 32 kmem_cache_alloc+0x258/0x294
> 4) 4040 304 __alloc_skb+0x48/0x180
> 5) 3736 96 alloc_skb_with_frags+0x74/0x234
> 6) 3640 112 sock_alloc_send_pskb+0x1d0/0x294
> 7) 3528 160 sock_alloc_send_skb+0x44/0x54
> 8) 3368 64 __ip_append_data.isra.40+0x78c/0xb48
> 9) 3304 224 ip_append_data.part.42+0x98/0xe8
> 10) 3080 112 ip_append_data+0x68/0x7c
> 11) 2968 96 icmp_push_reply+0x7c/0x144
> 12) 2872 96 icmp_send+0x3c0/0x3c8
> 13) 2776 192 __udp4_lib_rcv+0x5b8/0x684
> 14) 2584 96 udp_rcv+0x2c/0x3c
> 15) 2488 32 ip_local_deliver+0xa0/0x224
> 16) 2456 48 ip_rcv+0x360/0x57c
> 17) 2408 64 __netif_receive_skb_core+0x4d0/0x80c
> 18) 2344 128 __netif_receive_skb+0x24/0x84
> 19) 2216 32 process_backlog+0x9c/0x15c
> 20) 2184 80 net_rx_action+0x1ec/0x32c
> 21) 2104 160 __do_softirq+0x114/0x2f0
> 22) 1944 128 do_softirq+0x60/0x68
> 23) 1816 32 __local_bh_enable_ip+0xb0/0xd4
> 24) 1784 32 ip_finish_output+0x1f4/0xabc
> 25) 1752 96 ip_output+0xf0/0x120
> 26) 1656 64 ip_local_out_sk+0x44/0x54
> 27) 1592 32 ip_send_skb+0x24/0xbc
> 28) 1560 48 udp_send_skb+0x1b4/0x2f4
> 29) 1512 80 udp_sendmsg+0x2a8/0x7a0
> 30) 1432 272 inet_sendmsg+0xa0/0xd0
> 31) 1160 48 sock_sendmsg+0x30/0x78
> 32) 1112 32 ___sys_sendmsg+0x15c/0x26c
> 33) 1080 400 __sys_sendmmsg+0x94/0x180
> 34) 680 320 SyS_sendmmsg+0x38/0x54
> 35) 360 360 el0_svc_naked+0x20/0x28
>
> 2) stack_max_size
> 4504

Strange, on x86 I have this (with my patch applied):

Depth Size Location (39 entries)
----- ---- --------
0) 3704 64 _raw_spin_lock+0x5/0x30
1) 3640 200 get_partial_node.isra.80+0x54/0x1da
2) 3440 208 __slab_alloc.isra.82+0x199/0x3f7
3) 3232 80 kmem_cache_alloc+0x151/0x160
4) 3152 16 mempool_alloc_slab+0x15/0x20
5) 3136 128 mempool_alloc+0x58/0x150
6) 3008 16 scsi_sg_alloc+0x42/0x50
7) 2992 112 __sg_alloc_table+0x10b/0x150
8) 2880 48 scsi_alloc_sgtable+0x43/0x80
9) 2832 32 scsi_init_sgtable+0x2b/0x70
10) 2800 80 scsi_init_io+0x59/0x1e0
11) 2720 128 sd_init_command+0x66/0xd80
12) 2592 24 scsi_setup_cmnd+0xa9/0x160
13) 2568 88 scsi_prep_fn+0x7d/0x160
14) 2480 48 blk_peek_request+0x168/0x2a0
15) 2432 112 scsi_request_fn+0x3f/0x610
16) 2320 8 __blk_run_queue+0x37/0x50
17) 2312 104 queue_unplugged+0x41/0xe0
18) 2208 112 blk_flush_plug_list+0x1b7/0x1e0
19) 2096 80 blk_queue_bio+0x257/0x340
20) 2016 48 generic_make_request+0xb1/0xf0
21) 1968 96 submit_bio+0x76/0x130
22) 1872 48 submit_bh_wbc.isra.35+0x10b/0x140
23) 1824 112 __block_write_full_page.constprop.40+0x188/0x310
24) 1712 64 block_write_full_page+0xdd/0x130
25) 1648 16 blkdev_writepage+0x18/0x20
26) 1632 8 __writepage+0x17/0x40
27) 1624 312 write_cache_pages+0x21e/0x480
28) 1312 96 generic_writepages+0x4a/0x70
29) 1216 16 do_writepages+0x20/0x30
30) 1200 96 __writeback_single_inode+0x45/0x350
31) 1104 176 writeback_sb_inodes+0x218/0x3d0
32) 928 80 __writeback_inodes_wb+0x8c/0xc0
33) 848 128 wb_writeback+0x239/0x2c0
34) 720 192 wb_workfn+0x24b/0x460
35) 528 80 process_one_work+0x14b/0x430
36) 448 128 worker_thread+0x117/0x460
37) 320 144 kthread+0xc9/0xe0
38) 176 176 ret_from_fork+0x3f/0x70

# cat /debug/tracing/stack_max_size
3704


>
> In case of the number of entries, the following diff might be needed
> as I suggested in the previous reply. ;)
>
> ----8<----
>
> @@ -330,7 +333,7 @@ static int t_show(struct seq_file *m, void *v)
> seq_printf(m, " Depth Size Location"
> " (%d entries)\n"
> " ----- ---- --------\n",
> - max_stack_trace.nr_entries - 1);
> + max_stack_trace.nr_entries);

This would break x86.

>
> if (!stack_tracer_enabled && !max_stack_size)
> print_disabled(m);
>
> ----8<----
>
> However, 80-byte gap still appears.

This seems to be specific to your arch.

>
> Since max_stack_trace.skip is 3 in your update, save_stack_trace in arm64
> should be refactored to align with this value.
>
> max_stack_trace.skip should be set to 4 if AKASHI's [RFC 2/3] patch is merged.
> However, arch code is supposed to follow generic framework's rule in this case.
> Isn't it?
>

yeah, you don't want to update the skip level. It should just work.

I'll run this on my powerpc box and see if it shows something
different. If I have to, I'll even boot up my arm (not 64) board and
try it there.


-- Steve

2015-07-16 15:52:16

by Jungseok Lee

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Jul 17, 2015, at 12:31 AM, Steven Rostedt wrote:
> On Fri, 17 Jul 2015 00:01:25 +0900
> Jungseok Lee <[email protected]> wrote:
>
>> I've gathered stack tracer data with your update.
>>
>> 1) stack_trace
>> Depth Size Location (35 entries)
>> ----- ---- --------
>> 0) 4424 16 put_cpu_partial+0x28/0x1d0
>> 1) 4408 80 get_partial_node.isra.64+0x13c/0x344
>> 2) 4328 256 __slab_alloc.isra.65.constprop.67+0xd8/0x37c
>> 3) 4072 32 kmem_cache_alloc+0x258/0x294
>> 4) 4040 304 __alloc_skb+0x48/0x180
>> 5) 3736 96 alloc_skb_with_frags+0x74/0x234
>> 6) 3640 112 sock_alloc_send_pskb+0x1d0/0x294
>> 7) 3528 160 sock_alloc_send_skb+0x44/0x54
>> 8) 3368 64 __ip_append_data.isra.40+0x78c/0xb48
>> 9) 3304 224 ip_append_data.part.42+0x98/0xe8
>> 10) 3080 112 ip_append_data+0x68/0x7c
>> 11) 2968 96 icmp_push_reply+0x7c/0x144
>> 12) 2872 96 icmp_send+0x3c0/0x3c8
>> 13) 2776 192 __udp4_lib_rcv+0x5b8/0x684
>> 14) 2584 96 udp_rcv+0x2c/0x3c
>> 15) 2488 32 ip_local_deliver+0xa0/0x224
>> 16) 2456 48 ip_rcv+0x360/0x57c
>> 17) 2408 64 __netif_receive_skb_core+0x4d0/0x80c
>> 18) 2344 128 __netif_receive_skb+0x24/0x84
>> 19) 2216 32 process_backlog+0x9c/0x15c
>> 20) 2184 80 net_rx_action+0x1ec/0x32c
>> 21) 2104 160 __do_softirq+0x114/0x2f0
>> 22) 1944 128 do_softirq+0x60/0x68
>> 23) 1816 32 __local_bh_enable_ip+0xb0/0xd4
>> 24) 1784 32 ip_finish_output+0x1f4/0xabc
>> 25) 1752 96 ip_output+0xf0/0x120
>> 26) 1656 64 ip_local_out_sk+0x44/0x54
>> 27) 1592 32 ip_send_skb+0x24/0xbc
>> 28) 1560 48 udp_send_skb+0x1b4/0x2f4
>> 29) 1512 80 udp_sendmsg+0x2a8/0x7a0
>> 30) 1432 272 inet_sendmsg+0xa0/0xd0
>> 31) 1160 48 sock_sendmsg+0x30/0x78
>> 32) 1112 32 ___sys_sendmsg+0x15c/0x26c
>> 33) 1080 400 __sys_sendmmsg+0x94/0x180
>> 34) 680 320 SyS_sendmmsg+0x38/0x54
>> 35) 360 360 el0_svc_naked+0x20/0x28
>>
>> 2) stack_max_size
>> 4504
>
> Strange, on x86 I have this (with my patch applied):
>
> Depth Size Location (39 entries)
> ----- ---- --------
> 0) 3704 64 _raw_spin_lock+0x5/0x30
> 1) 3640 200 get_partial_node.isra.80+0x54/0x1da
> 2) 3440 208 __slab_alloc.isra.82+0x199/0x3f7
> 3) 3232 80 kmem_cache_alloc+0x151/0x160
> 4) 3152 16 mempool_alloc_slab+0x15/0x20
> 5) 3136 128 mempool_alloc+0x58/0x150
> 6) 3008 16 scsi_sg_alloc+0x42/0x50
> 7) 2992 112 __sg_alloc_table+0x10b/0x150
> 8) 2880 48 scsi_alloc_sgtable+0x43/0x80
> 9) 2832 32 scsi_init_sgtable+0x2b/0x70
> 10) 2800 80 scsi_init_io+0x59/0x1e0
> 11) 2720 128 sd_init_command+0x66/0xd80
> 12) 2592 24 scsi_setup_cmnd+0xa9/0x160
> 13) 2568 88 scsi_prep_fn+0x7d/0x160
> 14) 2480 48 blk_peek_request+0x168/0x2a0
> 15) 2432 112 scsi_request_fn+0x3f/0x610
> 16) 2320 8 __blk_run_queue+0x37/0x50
> 17) 2312 104 queue_unplugged+0x41/0xe0
> 18) 2208 112 blk_flush_plug_list+0x1b7/0x1e0
> 19) 2096 80 blk_queue_bio+0x257/0x340
> 20) 2016 48 generic_make_request+0xb1/0xf0
> 21) 1968 96 submit_bio+0x76/0x130
> 22) 1872 48 submit_bh_wbc.isra.35+0x10b/0x140
> 23) 1824 112 __block_write_full_page.constprop.40+0x188/0x310
> 24) 1712 64 block_write_full_page+0xdd/0x130
> 25) 1648 16 blkdev_writepage+0x18/0x20
> 26) 1632 8 __writepage+0x17/0x40
> 27) 1624 312 write_cache_pages+0x21e/0x480
> 28) 1312 96 generic_writepages+0x4a/0x70
> 29) 1216 16 do_writepages+0x20/0x30
> 30) 1200 96 __writeback_single_inode+0x45/0x350
> 31) 1104 176 writeback_sb_inodes+0x218/0x3d0
> 32) 928 80 __writeback_inodes_wb+0x8c/0xc0
> 33) 848 128 wb_writeback+0x239/0x2c0
> 34) 720 192 wb_workfn+0x24b/0x460
> 35) 528 80 process_one_work+0x14b/0x430
> 36) 448 128 worker_thread+0x117/0x460
> 37) 320 144 kthread+0xc9/0xe0
> 38) 176 176 ret_from_fork+0x3f/0x70
>
> # cat /debug/tracing/stack_max_size
> 3704
>
>
>>
>> In case of the number of entries, the following diff might be needed
>> as I suggested in the previous reply. ;)
>>
>> ----8<----
>>
>> @@ -330,7 +333,7 @@ static int t_show(struct seq_file *m, void *v)
>> seq_printf(m, " Depth Size Location"
>> " (%d entries)\n"
>> " ----- ---- --------\n",
>> - max_stack_trace.nr_entries - 1);
>> + max_stack_trace.nr_entries);
>
> This would break x86.

Thanks for x86 data. It's really helpful!

>
>>
>> if (!stack_tracer_enabled && !max_stack_size)
>> print_disabled(m);
>>
>> ----8<----
>>
>> However, 80-byte gap still appears.
>
> This seems to be specific to your arch.

Totally agree.

Best Regards
Jungseok Lee

2015-07-16 16:17:05

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Thu, 16 Jul 2015 11:31:15 -0400
Steven Rostedt <[email protected]> wrote:

> >
> > In case of the number of entries, the following diff might be needed
> > as I suggested in the previous reply. ;)
> >
> > ----8<----
> >
> > @@ -330,7 +333,7 @@ static int t_show(struct seq_file *m, void *v)
> > seq_printf(m, " Depth Size Location"
> > " (%d entries)\n"
> > " ----- ---- --------\n",
> > - max_stack_trace.nr_entries - 1);
> > + max_stack_trace.nr_entries);
>
> This would break x86.

Nope, you were correct. But it was also missing another change:

-- Steve

Index: linux-trace.git/kernel/trace/trace_stack.c
===================================================================
--- linux-trace.git.orig/kernel/trace/trace_stack.c 2015-07-16 11:45:17.816249113 -0400
+++ linux-trace.git/kernel/trace/trace_stack.c 2015-07-16 12:01:51.051403333 -0400
@@ -284,7 +260,7 @@ __next(struct seq_file *m, loff_t *pos)
{
long n = *pos - 1;

- if (n >= max_stack_trace.nr_entries || stack_dump_trace[n] == ULONG_MAX)
+ if (n > max_stack_trace.nr_entries || stack_dump_trace[n] == ULONG_MAX)
return NULL;

m->private = (void *)n;
@@ -354,7 +330,7 @@ static int t_show(struct seq_file *m, vo
seq_printf(m, " Depth Size Location"
" (%d entries)\n"
" ----- ---- --------\n",
- max_stack_trace.nr_entries - 1);
+ max_stack_trace.nr_entries);

if (!stack_tracer_enabled && !max_stack_size)
print_disabled(m);

2015-07-16 20:23:00

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()


Here's the patch I now have in my local repo, and plan on pushing to my
repo on korg.

-- Steve

>From d21f02a45fa367beaf97b153aa29849c06ac5609 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Red Hat)" <[email protected]>
Date: Thu, 16 Jul 2015 13:24:54 -0400
Subject: [PATCH] tracing: Clean up stack tracing and fix fentry updates

Akashi Takahiro was porting the stack tracer to arm64 and found some
issues with it. One was that it repeats the top function, due to the
stack frame added by the mcount caller and added by itself. This
was added when fentry came in, and before fentry created its own stack
frame. But x86's fentry now creates its own stack frame, and there's
no need to insert the function again.

This also cleans up the code a bit, where it doesn't need to do something
special for fentry, and doesn't include insertion of a duplicate
entry for the called function being traced.

Link: http://lkml.kernel.org/r/[email protected]

Some-Suggestions-by: Jungseok Lee <[email protected]>
Reported-by: AKASHI Takahiro <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
kernel/trace/trace_stack.c | 62 ++++++++++++++--------------------------------
1 file changed, 19 insertions(+), 43 deletions(-)

diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index 3f34496244e9..e615cdc9e38a 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -18,12 +18,6 @@

#define STACK_TRACE_ENTRIES 500

-#ifdef CC_USING_FENTRY
-# define fentry 1
-#else
-# define fentry 0
-#endif
-
static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES+1] =
{ [0 ... (STACK_TRACE_ENTRIES)] = ULONG_MAX };
static unsigned stack_dump_index[STACK_TRACE_ENTRIES];
@@ -35,7 +29,7 @@ static unsigned stack_dump_index[STACK_TRACE_ENTRIES];
*/
static struct stack_trace max_stack_trace = {
.max_entries = STACK_TRACE_ENTRIES - 1,
- .entries = &stack_dump_trace[1],
+ .entries = &stack_dump_trace[0],
};

static unsigned long max_stack_size;
@@ -77,7 +71,7 @@ check_stack(unsigned long ip, unsigned long *stack)
unsigned long this_size, flags; unsigned long *p, *top, *start;
static int tracer_frame;
int frame_size = ACCESS_ONCE(tracer_frame);
- int i;
+ int i, x;

this_size = ((unsigned long)stack) & (THREAD_SIZE-1);
this_size = THREAD_SIZE - this_size;
@@ -105,26 +99,20 @@ check_stack(unsigned long ip, unsigned long *stack)
max_stack_size = this_size;

max_stack_trace.nr_entries = 0;
-
- if (using_ftrace_ops_list_func())
- max_stack_trace.skip = 4;
- else
- max_stack_trace.skip = 3;
+ max_stack_trace.skip = 3;

save_stack_trace(&max_stack_trace);

- /*
- * Add the passed in ip from the function tracer.
- * Searching for this on the stack will skip over
- * most of the overhead from the stack tracer itself.
- */
- stack_dump_trace[0] = ip;
- max_stack_trace.nr_entries++;
+ /* Skip over the overhead of the stack tracer itself */
+ for (i = 0; i < max_stack_trace.nr_entries; i++) {
+ if (stack_dump_trace[i] == ip)
+ break;
+ }

/*
* Now find where in the stack these are.
*/
- i = 0;
+ x = 0;
start = stack;
top = (unsigned long *)
(((unsigned long)start & ~(THREAD_SIZE-1)) + THREAD_SIZE);
@@ -139,12 +127,13 @@ check_stack(unsigned long ip, unsigned long *stack)
while (i < max_stack_trace.nr_entries) {
int found = 0;

- stack_dump_index[i] = this_size;
+ stack_dump_index[x] = this_size;
p = start;

for (; p < top && i < max_stack_trace.nr_entries; p++) {
if (*p == stack_dump_trace[i]) {
- this_size = stack_dump_index[i++] =
+ stack_dump_trace[x] = stack_dump_trace[i++];
+ this_size = stack_dump_index[x++] =
(top - p) * sizeof(unsigned long);
found = 1;
/* Start the search from here */
@@ -168,6 +157,10 @@ check_stack(unsigned long ip, unsigned long *stack)
i++;
}

+ max_stack_trace.nr_entries = x;
+ for (; x < i; x++)
+ stack_dump_trace[x] = ULONG_MAX;
+
if (task_stack_end_corrupted(current)) {
print_max_stack();
BUG();
@@ -192,24 +185,7 @@ stack_trace_call(unsigned long ip, unsigned long parent_ip,
if (per_cpu(trace_active, cpu)++ != 0)
goto out;

- /*
- * When fentry is used, the traced function does not get
- * its stack frame set up, and we lose the parent.
- * The ip is pretty useless because the function tracer
- * was called before that function set up its stack frame.
- * In this case, we use the parent ip.
- *
- * By adding the return address of either the parent ip
- * or the current ip we can disregard most of the stack usage
- * caused by the stack tracer itself.
- *
- * The function tracer always reports the address of where the
- * mcount call was, but the stack will hold the return address.
- */
- if (fentry)
- ip = parent_ip;
- else
- ip += MCOUNT_INSN_SIZE;
+ ip += MCOUNT_INSN_SIZE;

check_stack(ip, &stack);

@@ -284,7 +260,7 @@ __next(struct seq_file *m, loff_t *pos)
{
long n = *pos - 1;

- if (n >= max_stack_trace.nr_entries || stack_dump_trace[n] == ULONG_MAX)
+ if (n > max_stack_trace.nr_entries || stack_dump_trace[n] == ULONG_MAX)
return NULL;

m->private = (void *)n;
@@ -354,7 +330,7 @@ static int t_show(struct seq_file *m, void *v)
seq_printf(m, " Depth Size Location"
" (%d entries)\n"
" ----- ---- --------\n",
- max_stack_trace.nr_entries - 1);
+ max_stack_trace.nr_entries);

if (!stack_tracer_enabled && !max_stack_size)
print_disabled(m);
--
1.8.3.1

2015-07-17 02:05:03

by AKASHI Takahiro

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

Jungseok,

Thank you for your testing and reviews.

On 07/16/2015 10:29 PM, Jungseok Lee wrote:
> On Jul 16, 2015, at 10:08 AM, AKASHI Takahiro wrote:
>
> Hi, AKASHI
>
>> On 07/16/2015 09:27 AM, AKASHI Takahiro wrote:
>>> On 07/16/2015 01:13 AM, Steven Rostedt wrote:
>>>> On Wed, 15 Jul 2015 10:55:36 -0400
>>>> Steven Rostedt <[email protected]> wrote:
>>>>
>>>>
>>>>> I'll take a look at it and try to clean up the code.
>>>>
>>>> Does the following patch make sense for you?
>>>
>>> Looks nice. The patch greatly simplifies changes on arm64 side.
>>
>> As follows:
>>
>> - Takahiro AKASHI
>>
>> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
>> index c5534fa..868d6f1 100644
>> --- a/arch/arm64/include/asm/ftrace.h
>> +++ b/arch/arm64/include/asm/ftrace.h
>> @@ -15,6 +15,7 @@
>>
>> #define MCOUNT_ADDR ((unsigned long)_mcount)
>> #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
>> +#define FTRACE_STACK_FRAME_OFFSET 4 /* sync it up with stacktrace.c */

Well,
#define FTRACE_STACK_FRAME_OFFSET AARCH64_INSN_SIZE
might be better.

> How about binding it to -4 in unwind_frame function?

Do you mean like this?
In unwind_frame(),
frame->pc = *(unsigned long*)(fp + 8) - AARCH64_INSN_SIZE;

> IMHO, it would help other developers trying to change stack trace code
> be aware of this stack tracer feature.
>
>> #ifndef __ASSEMBLY__
>> #include <linux/compat.h>
>> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
>> index 1da6029..2c1bf7d 100644
>> --- a/include/linux/ftrace.h
>> +++ b/include/linux/ftrace.h
>> @@ -260,6 +260,13 @@ static inline void ftrace_kill(void) { }
>> #endif /* CONFIG_FUNCTION_TRACER */
>>
>> #ifdef CONFIG_STACK_TRACER
>> +/*
>> + * the offset value to add to return address from save_stack_trace()
>> + */
>> +#ifndef FTRACE_STACK_FRAME_OFFSET
>> +#define FTRACE_STACK_FRAME_OFFSET 0
>> +#endif
>> +
>> extern int stack_tracer_enabled;
>> int
>> stack_trace_sysctl(struct ctl_table *table, int write,
>> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
>> index 9384647..c5b9748 100644
>> --- a/kernel/trace/trace_stack.c
>> +++ b/kernel/trace/trace_stack.c
>> @@ -105,7 +105,7 @@ check_stack(unsigned long ip, unsigned long *stack)
>>
>> /* Skip over the overhead of the stack tracer itself */
>> for (i = 0; i < max_stack_trace.nr_entries; i++) {
>> - if (stack_dump_trace[i] == ip)
>> + if ((stack_dump_trace[i] + FTRACE_STACK_FRAME_OFFSET) == ip)
>> break;
>> }
>>
>> @@ -131,7 +131,8 @@ check_stack(unsigned long ip, unsigned long *stack)
>> p = start;
>>
>> for (; p < top && i < max_stack_trace.nr_entries; p++) {
>> - if (*p == stack_dump_trace[i]) {
>> + if (*p == (stack_dump_trace[i]
>> + + FTRACE_STACK_FRAME_OFFSET)) {
>> stack_dump_trace[x] = stack_dump_trace[i++];
>> this_size = stack_dump_index[x++] =
>> (top - p) * sizeof(unsigned long);
>> --
>
> I've prepared a kernel with the following patches and reviewed them.
>
> 1) Steve's clean up patch
> 2) This patch
> 3) [RFC 2/3]
>
> AFAIU, [RFC 3/3] is not needed any more thanks to Steve's patch.

We don't need [2/3].

> First of all, let's look at the following data.
>
> 1) stack_trace data
> Depth Size Location (55 entries)
> ----- ---- --------
> 0) 4808 16 notifier_call_chain+0x2c/0x94
> 1) 4792 64 raw_notifier_call_chain+0x34/0x44
> 2) 4728 48 timekeeping_update.constprop.9+0xb8/0x114
> 3) 4680 48 update_wall_time+0x408/0x6dc
> 4) 4632 128 tick_do_update_jiffies64+0xd8/0x154
> 5) 4504 80 tick_sched_do_timer+0x50/0x60
> 6) 4424 32 tick_sched_timer+0x34/0x90
> 7) 4392 48 __run_hrtimer+0x60/0x258
> 8) 4344 64 hrtimer_interrupt+0xe8/0x260
> 9) 4280 128 arch_timer_handler_virt+0x38/0x48
> 10) 4152 32 handle_percpu_devid_irq+0x84/0x188
> 11) 4120 64 generic_handle_irq+0x38/0x54
> 12) 4056 32 __handle_domain_irq+0x68/0xbc
> 13) 4024 64 gic_handle_irq+0x38/0x88
> 14) 3960 336 el1_irq+0x64/0xd8
> 15) 3624 16 netif_rx_internal+0x14/0x198
> 16) 3608 64 netif_rx+0x20/0xa4
> 17) 3544 32 loopback_xmit+0x64/0xf4
> 18) 3512 48 dev_hard_start_xmit+0x25c/0x3f8
> 19) 3464 160 __dev_queue_xmit+0x440/0x4dc
> 20) 3304 96 dev_queue_xmit_sk+0x20/0x30
> 21) 3208 32 ip_finish_output+0x1e0/0xabc
> 22) 3176 96 ip_output+0xf0/0x120
> 23) 3080 64 ip_local_out_sk+0x44/0x54
> 24) 3016 32 ip_send_skb+0x24/0xbc
> 25) 2984 48 ip_push_pending_frames+0x40/0x60
> 26) 2936 64 icmp_push_reply+0x104/0x144
> 27) 2872 96 icmp_send+0x3c0/0x3c8
> 28) 2776 192 __udp4_lib_rcv+0x5b8/0x684
> 29) 2584 96 udp_rcv+0x2c/0x3c
> 30) 2488 32 ip_local_deliver+0xa0/0x224
> 31) 2456 48 ip_rcv+0x360/0x57c
> 32) 2408 64 __netif_receive_skb_core+0x4d0/0x80c
> 33) 2344 128 __netif_receive_skb+0x24/0x84
> 34) 2216 32 process_backlog+0x9c/0x15c
> 35) 2184 80 net_rx_action+0x1ec/0x32c
> 36) 2104 160 __do_softirq+0x114/0x2f0
> 37) 1944 128 do_softirq+0x60/0x68
> 38) 1816 32 __local_bh_enable_ip+0xb0/0xd4
> 39) 1784 32 ip_finish_output+0x1f4/0xabc
> 40) 1752 96 ip_output+0xf0/0x120
> 41) 1656 64 ip_local_out_sk+0x44/0x54
> 42) 1592 32 ip_send_skb+0x24/0xbc
> 43) 1560 48 udp_send_skb+0x1b4/0x2f4
> 44) 1512 80 udp_sendmsg+0x2a8/0x7a0
> 45) 1432 272 inet_sendmsg+0xa0/0xd0
> 46) 1160 48 sock_sendmsg+0x30/0x78
> 47) 1112 32 ___sys_sendmsg+0x15c/0x26c
> 48) 1080 400 __sys_sendmmsg+0x94/0x180
> 49) 680 320 SyS_sendmmsg+0x38/0x54
> 50) 360 360 el0_svc_naked+0x20/0x28
>
> 2) stack_max_size data
> 4888
>
> The data looks odd in two points.
> 1) the number of entry
> There is a mismatch between start token and real data
>
> 2) 80-byte gap
> stack_max_size is not aligned with "Depth" field of the first entry of stack_trace.
>
> IMHO, two items are not considered in this series as digging them out.
>
> 1) skipped entries
> As x variable is introduced in Steve's patch, it is needed to track down
> how many entries are recorded in both stack_dump_trace and stack_dump_index.
>
> 2) max_stack_trace.skip value
> max_stack_trace.skip is 0 as applying Steve's patch. The above gap could be
> observed unless the value is not considered in arch code. In the above example,
> 80-byte gap is save_stack_trace function in arch/arm64/kernel/stacktrace.c.
>
> As applying the following fix, stack_trace and stack_max_size are okay.
> However, I'm not sure which code, arch or generic ftrace, should handle trace->skip.
> The latter one is responsible for it under current implementation, not Steve's change.
>
> Please correct me if I am wrong.

I will address these issues in my later replies.

Thanks,
-Takahiro AKASHI

> Best Regards
> Jungseok Lee
>
> ----8<----
>
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 978c923..5f6f4df 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -137,7 +137,9 @@ void save_stack_trace(struct stack_trace *trace)
>
> frame.fp = (unsigned long)__builtin_frame_address(0);
> frame.sp = current_stack_pointer;
> - frame.pc = (unsigned long)save_stack_trace_tsk;
> + frame.pc = (unsigned long)save_stack_trace;
> +
> + trace->skip += 4;
>
> __save_stack_trace(&frame, trace, 0);
> }
> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
> index c5b9748..5635683 100644
> --- a/kernel/trace/trace_stack.c
> +++ b/kernel/trace/trace_stack.c
> @@ -158,6 +158,9 @@ check_stack(unsigned long ip, unsigned long *stack)
> i++;
> }
>
> + /* We do not want to include skipped entries */
> + max_stack_trace.nr_entries -= (i - x);
> +
> for (; x < max_stack_trace.nr_entries; x++)
> stack_dump_trace[x] = ULONG_MAX;
>
> @@ -330,7 +333,7 @@ static int t_show(struct seq_file *m, void *v)
> seq_printf(m, " Depth Size Location"
> " (%d entries)\n"
> " ----- ---- --------\n",
> - max_stack_trace.nr_entries - 1);
> + max_stack_trace.nr_entries);
>
> if (!stack_tracer_enabled && !max_stack_size)
> print_disabled(m);
>
> ----8<----
>
> 1) stack_trace data
> Depth Size Location (47 entries)
> ----- ---- --------
> 0) 4648 16 notifier_call_chain+0x2c/0x94
> 1) 4632 64 raw_notifier_call_chain+0x34/0x44
> 2) 4568 48 timekeeping_update.constprop.9+0xb8/0x114
> 3) 4520 48 update_wall_time+0x408/0x6dc
> 4) 4472 128 tick_do_update_jiffies64+0xd8/0x154
> 5) 4344 80 tick_sched_do_timer+0x50/0x60
> 6) 4264 32 tick_sched_timer+0x34/0x90
> 7) 4232 48 __run_hrtimer+0x60/0x258
> 8) 4184 64 hrtimer_interrupt+0xe8/0x260
> 9) 4120 128 arch_timer_handler_virt+0x38/0x48
> 10) 3992 32 handle_percpu_devid_irq+0x84/0x188
> 11) 3960 64 generic_handle_irq+0x38/0x54
> 12) 3896 32 __handle_domain_irq+0x68/0xbc
> 13) 3864 64 gic_handle_irq+0x38/0x88
> 14) 3800 336 el1_irq+0x64/0xd8
> 15) 3464 48 __kmalloc_track_caller+0x40/0x2f8
> 16) 3416 16 __kmalloc_reserve.isra.46+0x44/0x9c
> 17) 3400 144 __alloc_skb+0x70/0x180
> 18) 3256 96 alloc_skb_with_frags+0x74/0x234
> 19) 3160 112 sock_alloc_send_pskb+0x1d0/0x294
> 20) 3048 160 sock_alloc_send_skb+0x44/0x54
> 21) 2888 64 __ip_append_data.isra.40+0x78c/0xb48
> 22) 2824 224 ip_append_data.part.42+0x98/0xe8
> 23) 2600 112 ip_append_data+0x68/0x7c
> 24) 2488 96 icmp_push_reply+0x7c/0x144
> 25) 2392 96 icmp_send+0x3c0/0x3c8
> 26) 2296 192 __udp4_lib_rcv+0x5b8/0x684
> 27) 2104 96 udp_rcv+0x2c/0x3c
> 28) 2008 32 ip_local_deliver+0xa0/0x224
> 29) 1976 48 ip_rcv+0x360/0x57c
> 30) 1928 64 __netif_receive_skb_core+0x4d0/0x80c
> 31) 1864 128 __netif_receive_skb+0x24/0x84
> 32) 1736 32 process_backlog+0x9c/0x15c
> 33) 1704 80 net_rx_action+0x1ec/0x32c
> 34) 1624 160 __do_softirq+0x114/0x2f0
> 35) 1464 128 do_softirq+0x60/0x68
> 36) 1336 32 __local_bh_enable_ip+0xb0/0xd4
> 37) 1304 32 ip_finish_output+0x1f4/0xabc
> 38) 1272 96 ip_output+0xf0/0x120
> 39) 1176 64 ip_local_out_sk+0x44/0x54
> 40) 1112 32 ip_send_skb+0x24/0xbc
> 41) 1080 48 udp_send_skb+0x1b4/0x2f4
> 42) 1032 80 udp_sendmsg+0x2a8/0x7a0
> 43) 952 272 inet_sendmsg+0xa0/0xd0
> 44) 680 48 sock_sendmsg+0x30/0x78
> 45) 632 32 SyS_sendto+0xc4/0x108
> 46) 600 600 el0_svc_naked+0x20/0x28
>
> 2) stack_max_size data
> 4648
>

2015-07-17 02:09:40

by AKASHI Takahiro

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On 07/16/2015 11:34 PM, Steven Rostedt wrote:
> On Thu, 16 Jul 2015 15:28:34 +0100
> Mark Rutland <[email protected]> wrote:
>
>
>>> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
>>> index c5534fa..868d6f1 100644
>>> --- a/arch/arm64/include/asm/ftrace.h
>>> +++ b/arch/arm64/include/asm/ftrace.h
>>> @@ -15,6 +15,7 @@
>>>
>>> #define MCOUNT_ADDR ((unsigned long)_mcount)
>>> #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
>>> +#define FTRACE_STACK_FRAME_OFFSET 4 /* sync it up with stacktrace.c */
>>
>> Is there any reason we couldn't have the arch code dump the stack depth
>> for each function when it walks the stack to generate the stack trace?
>>
>> That means we can provide a more precise result (because we know the
>> layour of our own stackframes), and we only need walk the stack once to
>> do so.
>>
>> The downside is that we need a new function per-arch to do so.
>
> Or we make check_patch() a weak function, and let archs override it. I
> can possibly break up the code a bit to have helper functions where
> things are the same.

Yeah, that is exactly what I meant in my cover letter[0/3] if the series
of patches are not acceptable.

-Takahiro AKASHI

> I want x86 to be able to track irq stacks as well, but there's no way
> to do that generically yet, so having arch specific functions has been
> on my todo list.
>
> -- Steve
>

2015-07-17 02:50:02

by AKASHI Takahiro

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

Steve,

On 07/17/2015 05:22 AM, Steven Rostedt wrote:
>
> Here's the patch I now have in my local repo, and plan on pushing to my
> repo on korg.
>
> -- Steve
>
> From d21f02a45fa367beaf97b153aa29849c06ac5609 Mon Sep 17 00:00:00 2001
> From: "Steven Rostedt (Red Hat)" <[email protected]>
> Date: Thu, 16 Jul 2015 13:24:54 -0400
> Subject: [PATCH] tracing: Clean up stack tracing and fix fentry updates
>
> Akashi Takahiro was porting the stack tracer to arm64 and found some
> issues with it. One was that it repeats the top function, due to the
> stack frame added by the mcount caller and added by itself. This
> was added when fentry came in, and before fentry created its own stack
> frame. But x86's fentry now creates its own stack frame, and there's
> no need to insert the function again.
>
> This also cleans up the code a bit, where it doesn't need to do something
> special for fentry, and doesn't include insertion of a duplicate
> entry for the called function being traced.
>
> Link: http://lkml.kernel.org/r/[email protected]
>
> Some-Suggestions-by: Jungseok Lee <[email protected]>
> Reported-by: AKASHI Takahiro <[email protected]>
> Signed-off-by: Steven Rostedt <[email protected]>
> ---
> kernel/trace/trace_stack.c | 62 ++++++++++++++--------------------------------
> 1 file changed, 19 insertions(+), 43 deletions(-)
>
> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
> index 3f34496244e9..e615cdc9e38a 100644
> --- a/kernel/trace/trace_stack.c
> +++ b/kernel/trace/trace_stack.c
> @@ -18,12 +18,6 @@
>
> #define STACK_TRACE_ENTRIES 500
>
> -#ifdef CC_USING_FENTRY
> -# define fentry 1
> -#else
> -# define fentry 0
> -#endif
> -
> static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES+1] =
> { [0 ... (STACK_TRACE_ENTRIES)] = ULONG_MAX };
> static unsigned stack_dump_index[STACK_TRACE_ENTRIES];
> @@ -35,7 +29,7 @@ static unsigned stack_dump_index[STACK_TRACE_ENTRIES];
> */
> static struct stack_trace max_stack_trace = {
> .max_entries = STACK_TRACE_ENTRIES - 1,
> - .entries = &stack_dump_trace[1],
> + .entries = &stack_dump_trace[0],
> };
>
> static unsigned long max_stack_size;
> @@ -77,7 +71,7 @@ check_stack(unsigned long ip, unsigned long *stack)
> unsigned long this_size, flags; unsigned long *p, *top, *start;
> static int tracer_frame;
> int frame_size = ACCESS_ONCE(tracer_frame);
> - int i;
> + int i, x;
>
> this_size = ((unsigned long)stack) & (THREAD_SIZE-1);
> this_size = THREAD_SIZE - this_size;
> @@ -105,26 +99,20 @@ check_stack(unsigned long ip, unsigned long *stack)
> max_stack_size = this_size;
>
> max_stack_trace.nr_entries = 0;
> -
> - if (using_ftrace_ops_list_func())
> - max_stack_trace.skip = 4;
> - else
> - max_stack_trace.skip = 3;
> + max_stack_trace.skip = 3;

I don't think this last line is necessary because we will skip all
the functions anyway below:

> save_stack_trace(&max_stack_trace);
>
> - /*
> - * Add the passed in ip from the function tracer.
> - * Searching for this on the stack will skip over
> - * most of the overhead from the stack tracer itself.
> - */
> - stack_dump_trace[0] = ip;
> - max_stack_trace.nr_entries++;
> + /* Skip over the overhead of the stack tracer itself */
> + for (i = 0; i < max_stack_trace.nr_entries; i++) {
> + if (stack_dump_trace[i] == ip)
> + break;
> + }

here. Now "i" indicates the start point, excepting tracer functions,
and "x" will eventually represent the exact number of functions
that we are interested in after searching the stack.

To calc "stack_max_size" correctly, we should change the line:
if (unlikely(tracer_frame) && i == 1) {
to
if (unlikely(tracer_frame)) {

With these two changes applied, the issues Jungseok mentioned will be
fixed.

Thanks,
-Takahiro AKASHI

> /*
> * Now find where in the stack these are.
> */
> - i = 0;
> + x = 0;
> start = stack;
> top = (unsigned long *)
> (((unsigned long)start & ~(THREAD_SIZE-1)) + THREAD_SIZE);
> @@ -139,12 +127,13 @@ check_stack(unsigned long ip, unsigned long *stack)
> while (i < max_stack_trace.nr_entries) {
> int found = 0;
>
> - stack_dump_index[i] = this_size;
> + stack_dump_index[x] = this_size;
> p = start;
>
> for (; p < top && i < max_stack_trace.nr_entries; p++) {
> if (*p == stack_dump_trace[i]) {
> - this_size = stack_dump_index[i++] =
> + stack_dump_trace[x] = stack_dump_trace[i++];
> + this_size = stack_dump_index[x++] =
> (top - p) * sizeof(unsigned long);
> found = 1;
> /* Start the search from here */
> @@ -168,6 +157,10 @@ check_stack(unsigned long ip, unsigned long *stack)
> i++;
> }
>
> + max_stack_trace.nr_entries = x;
> + for (; x < i; x++)
> + stack_dump_trace[x] = ULONG_MAX;
> +
> if (task_stack_end_corrupted(current)) {
> print_max_stack();
> BUG();
> @@ -192,24 +185,7 @@ stack_trace_call(unsigned long ip, unsigned long parent_ip,
> if (per_cpu(trace_active, cpu)++ != 0)
> goto out;
>
> - /*
> - * When fentry is used, the traced function does not get
> - * its stack frame set up, and we lose the parent.
> - * The ip is pretty useless because the function tracer
> - * was called before that function set up its stack frame.
> - * In this case, we use the parent ip.
> - *
> - * By adding the return address of either the parent ip
> - * or the current ip we can disregard most of the stack usage
> - * caused by the stack tracer itself.
> - *
> - * The function tracer always reports the address of where the
> - * mcount call was, but the stack will hold the return address.
> - */
> - if (fentry)
> - ip = parent_ip;
> - else
> - ip += MCOUNT_INSN_SIZE;
> + ip += MCOUNT_INSN_SIZE;
>
> check_stack(ip, &stack);
>
> @@ -284,7 +260,7 @@ __next(struct seq_file *m, loff_t *pos)
> {
> long n = *pos - 1;
>
> - if (n >= max_stack_trace.nr_entries || stack_dump_trace[n] == ULONG_MAX)
> + if (n > max_stack_trace.nr_entries || stack_dump_trace[n] == ULONG_MAX)
> return NULL;
>
> m->private = (void *)n;
> @@ -354,7 +330,7 @@ static int t_show(struct seq_file *m, void *v)
> seq_printf(m, " Depth Size Location"
> " (%d entries)\n"
> " ----- ---- --------\n",
> - max_stack_trace.nr_entries - 1);
> + max_stack_trace.nr_entries);
>
> if (!stack_tracer_enabled && !max_stack_size)
> print_disabled(m);
>

2015-07-17 03:20:50

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Fri, 17 Jul 2015 11:49:52 +0900
AKASHI Takahiro <[email protected]> wrote:

> > -
> > - if (using_ftrace_ops_list_func())
> > - max_stack_trace.skip = 4;
> > - else
> > - max_stack_trace.skip = 3;
> > + max_stack_trace.skip = 3;
>
> I don't think this last line is necessary because we will skip all
> the functions anyway below:

I put this back more as an optimization as it is already known that it
takes at least three calls to get to this point. Unless of course gcc
decides to inline them. But currently I don't see that.

>
> > save_stack_trace(&max_stack_trace);
> >
> > - /*
> > - * Add the passed in ip from the function tracer.
> > - * Searching for this on the stack will skip over
> > - * most of the overhead from the stack tracer itself.
> > - */
> > - stack_dump_trace[0] = ip;
> > - max_stack_trace.nr_entries++;
> > + /* Skip over the overhead of the stack tracer itself */
> > + for (i = 0; i < max_stack_trace.nr_entries; i++) {
> > + if (stack_dump_trace[i] == ip)
> > + break;
> > + }
>
> here. Now "i" indicates the start point, excepting tracer functions,
> and "x" will eventually represent the exact number of functions
> that we are interested in after searching the stack.
>
> To calc "stack_max_size" correctly, we should change the line:
> if (unlikely(tracer_frame) && i == 1) {
> to
> if (unlikely(tracer_frame)) {

Good catch, I'll fix that!

-- Steve

>
> With these two changes applied, the issues Jungseok mentioned will be
> fixed.
>

2015-07-17 10:46:11

by Will Deacon

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

Hi Steve,

On Thu, Jul 16, 2015 at 02:38:18AM +0100, Steven Rostedt wrote:
> On Thu, 16 Jul 2015 10:08:03 +0900
> AKASHI Takahiro <[email protected]> wrote:
>
> > On 07/16/2015 09:27 AM, AKASHI Takahiro wrote:
> > > On 07/16/2015 01:13 AM, Steven Rostedt wrote:
> > >> On Wed, 15 Jul 2015 10:55:36 -0400
> > >> Steven Rostedt <[email protected]> wrote:
> > >>
> > >>
> > >>> I'll take a look at it and try to clean up the code.
> > >>
> > >> Does the following patch make sense for you?
> > >
> > > Looks nice. The patch greatly simplifies changes on arm64 side.
> >
> > As follows:
> >
> > - Takahiro AKASHI
> >
>
> How should we go forward with this. Want me to make my change to my
> tree, and pull this patch in with it? I can set up for a 4.3 release.
>
> Then if I can get an Acked-by from one of the arm64 maintainers, I can
> have this go through my tree?

To be honest, I've completely lost track of this thread!

Once you and Akashi have settled on a way forward, I'm happy to take a look
at anything touching arch/arm64/, but it seems like you're still discussing
some aspects of the series atm.

Will

2015-07-17 12:41:21

by Mark Rutland

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

Hi,

> @@ -354,7 +330,7 @@ static int t_show(struct seq_file *m, vo
> seq_printf(m, " Depth Size Location"
> " (%d entries)\n"
> " ----- ---- --------\n",
> - max_stack_trace.nr_entries - 1);
> + max_stack_trace.nr_entries);

I believe you'll want the same fix in print_max_stack, given it's a
carbon copy.

Mark

2015-07-17 12:51:53

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Fri, 17 Jul 2015 13:40:54 +0100
Mark Rutland <[email protected]> wrote:

> Hi,
>
> > @@ -354,7 +330,7 @@ static int t_show(struct seq_file *m, vo
> > seq_printf(m, " Depth Size Location"
> > " (%d entries)\n"
> > " ----- ---- --------\n",
> > - max_stack_trace.nr_entries - 1);
> > + max_stack_trace.nr_entries);
>
> I believe you'll want the same fix in print_max_stack, given it's a
> carbon copy.

Yep, thanks for pointing that out.

-- Steve

2015-07-17 13:00:15

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

Here's my latest version of the patch. I also added a fix that made
entries off from the real number of entries. That was to stop the loop
on ULONG_MAX in stack_dump_trace[i], otherwise if for some reason
nr_entries is one off and points to ULONG_MAX, and there is a -1 in the
stack, the trace will include it in the count. The output of the stack
tests against both nr_entries and ULONG_MAX and will stop with either
case, making the dump and the count different.

-- Steve

>From 1c3697c3c4ce1f237466f76e40c91f66e2030bac Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Red Hat)" <[email protected]>
Date: Thu, 16 Jul 2015 13:24:54 -0400
Subject: [PATCH] tracing: Clean up stack tracing and fix fentry updates

Akashi Takahiro was porting the stack tracer to arm64 and found some
issues with it. One was that it repeats the top function, due to the
stack frame added by the mcount caller and added by itself. This
was added when fentry came in, and before fentry created its own stack
frame. But x86's fentry now creates its own stack frame, and there's
no need to insert the function again.

This also cleans up the code a bit, where it doesn't need to do something
special for fentry, and doesn't include insertion of a duplicate
entry for the called function being traced.

Link: http://lkml.kernel.org/r/[email protected]

Some-suggestions-by: Jungseok Lee <[email protected]>
Some-suggestions-by: Mark Rutland <[email protected]>
Reported-by: AKASHI Takahiro <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
kernel/trace/trace_stack.c | 68 ++++++++++++++++------------------------------
1 file changed, 23 insertions(+), 45 deletions(-)

diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index 3f34496244e9..b746399ab59c 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -18,12 +18,6 @@

#define STACK_TRACE_ENTRIES 500

-#ifdef CC_USING_FENTRY
-# define fentry 1
-#else
-# define fentry 0
-#endif
-
static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES+1] =
{ [0 ... (STACK_TRACE_ENTRIES)] = ULONG_MAX };
static unsigned stack_dump_index[STACK_TRACE_ENTRIES];
@@ -35,7 +29,7 @@ static unsigned stack_dump_index[STACK_TRACE_ENTRIES];
*/
static struct stack_trace max_stack_trace = {
.max_entries = STACK_TRACE_ENTRIES - 1,
- .entries = &stack_dump_trace[1],
+ .entries = &stack_dump_trace[0],
};

static unsigned long max_stack_size;
@@ -55,7 +49,7 @@ static inline void print_max_stack(void)

pr_emerg(" Depth Size Location (%d entries)\n"
" ----- ---- --------\n",
- max_stack_trace.nr_entries - 1);
+ max_stack_trace.nr_entries);

for (i = 0; i < max_stack_trace.nr_entries; i++) {
if (stack_dump_trace[i] == ULONG_MAX)
@@ -77,7 +71,7 @@ check_stack(unsigned long ip, unsigned long *stack)
unsigned long this_size, flags; unsigned long *p, *top, *start;
static int tracer_frame;
int frame_size = ACCESS_ONCE(tracer_frame);
- int i;
+ int i, x;

this_size = ((unsigned long)stack) & (THREAD_SIZE-1);
this_size = THREAD_SIZE - this_size;
@@ -105,26 +99,20 @@ check_stack(unsigned long ip, unsigned long *stack)
max_stack_size = this_size;

max_stack_trace.nr_entries = 0;
-
- if (using_ftrace_ops_list_func())
- max_stack_trace.skip = 4;
- else
- max_stack_trace.skip = 3;
+ max_stack_trace.skip = 3;

save_stack_trace(&max_stack_trace);

- /*
- * Add the passed in ip from the function tracer.
- * Searching for this on the stack will skip over
- * most of the overhead from the stack tracer itself.
- */
- stack_dump_trace[0] = ip;
- max_stack_trace.nr_entries++;
+ /* Skip over the overhead of the stack tracer itself */
+ for (i = 0; i < max_stack_trace.nr_entries; i++) {
+ if (stack_dump_trace[i] == ip)
+ break;
+ }

/*
* Now find where in the stack these are.
*/
- i = 0;
+ x = 0;
start = stack;
top = (unsigned long *)
(((unsigned long)start & ~(THREAD_SIZE-1)) + THREAD_SIZE);
@@ -139,12 +127,15 @@ check_stack(unsigned long ip, unsigned long *stack)
while (i < max_stack_trace.nr_entries) {
int found = 0;

- stack_dump_index[i] = this_size;
+ stack_dump_index[x] = this_size;
p = start;

for (; p < top && i < max_stack_trace.nr_entries; p++) {
+ if (stack_dump_trace[i] == ULONG_MAX)
+ break;
if (*p == stack_dump_trace[i]) {
- this_size = stack_dump_index[i++] =
+ stack_dump_trace[x] = stack_dump_trace[i++];
+ this_size = stack_dump_index[x++] =
(top - p) * sizeof(unsigned long);
found = 1;
/* Start the search from here */
@@ -156,7 +147,7 @@ check_stack(unsigned long ip, unsigned long *stack)
* out what that is, then figure it out
* now.
*/
- if (unlikely(!tracer_frame) && i == 1) {
+ if (unlikely(!tracer_frame)) {
tracer_frame = (p - stack) *
sizeof(unsigned long);
max_stack_size -= tracer_frame;
@@ -168,6 +159,10 @@ check_stack(unsigned long ip, unsigned long *stack)
i++;
}

+ max_stack_trace.nr_entries = x;
+ for (; x < i; x++)
+ stack_dump_trace[x] = ULONG_MAX;
+
if (task_stack_end_corrupted(current)) {
print_max_stack();
BUG();
@@ -192,24 +187,7 @@ stack_trace_call(unsigned long ip, unsigned long parent_ip,
if (per_cpu(trace_active, cpu)++ != 0)
goto out;

- /*
- * When fentry is used, the traced function does not get
- * its stack frame set up, and we lose the parent.
- * The ip is pretty useless because the function tracer
- * was called before that function set up its stack frame.
- * In this case, we use the parent ip.
- *
- * By adding the return address of either the parent ip
- * or the current ip we can disregard most of the stack usage
- * caused by the stack tracer itself.
- *
- * The function tracer always reports the address of where the
- * mcount call was, but the stack will hold the return address.
- */
- if (fentry)
- ip = parent_ip;
- else
- ip += MCOUNT_INSN_SIZE;
+ ip += MCOUNT_INSN_SIZE;

check_stack(ip, &stack);

@@ -284,7 +262,7 @@ __next(struct seq_file *m, loff_t *pos)
{
long n = *pos - 1;

- if (n >= max_stack_trace.nr_entries || stack_dump_trace[n] == ULONG_MAX)
+ if (n > max_stack_trace.nr_entries || stack_dump_trace[n] == ULONG_MAX)
return NULL;

m->private = (void *)n;
@@ -354,7 +332,7 @@ static int t_show(struct seq_file *m, void *v)
seq_printf(m, " Depth Size Location"
" (%d entries)\n"
" ----- ---- --------\n",
- max_stack_trace.nr_entries - 1);
+ max_stack_trace.nr_entries);

if (!stack_tracer_enabled && !max_stack_size)
print_disabled(m);
--
1.8.3.1

2015-07-17 14:28:16

by Jungseok Lee

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Jul 17, 2015, at 10:00 PM, Steven Rostedt wrote:

Hi Steve,

> Here's my latest version of the patch. I also added a fix that made
> entries off from the real number of entries. That was to stop the loop
> on ULONG_MAX in stack_dump_trace[i], otherwise if for some reason
> nr_entries is one off and points to ULONG_MAX, and there is a -1 in the
> stack, the trace will include it in the count. The output of the stack
> tests against both nr_entries and ULONG_MAX and will stop with either
> case, making the dump and the count different.
>
> -- Steve
>
> From 1c3697c3c4ce1f237466f76e40c91f66e2030bac Mon Sep 17 00:00:00 2001
> From: "Steven Rostedt (Red Hat)" <[email protected]>
> Date: Thu, 16 Jul 2015 13:24:54 -0400
> Subject: [PATCH] tracing: Clean up stack tracing and fix fentry updates
>
> Akashi Takahiro was porting the stack tracer to arm64 and found some
> issues with it. One was that it repeats the top function, due to the
> stack frame added by the mcount caller and added by itself. This
> was added when fentry came in, and before fentry created its own stack
> frame. But x86's fentry now creates its own stack frame, and there's
> no need to insert the function again.
>
> This also cleans up the code a bit, where it doesn't need to do something
> special for fentry, and doesn't include insertion of a duplicate
> entry for the called function being traced.
>
> Link: http://lkml.kernel.org/r/[email protected]
>
> Some-suggestions-by: Jungseok Lee <[email protected]>
> Some-suggestions-by: Mark Rutland <[email protected]>
> Reported-by: AKASHI Takahiro <[email protected]>
> Signed-off-by: Steven Rostedt <[email protected]>
> ---
> kernel/trace/trace_stack.c | 68 ++++++++++++++++------------------------------
> 1 file changed, 23 insertions(+), 45 deletions(-)
>
> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
> index 3f34496244e9..b746399ab59c 100644
> --- a/kernel/trace/trace_stack.c
> +++ b/kernel/trace/trace_stack.c
> @@ -18,12 +18,6 @@
>
> #define STACK_TRACE_ENTRIES 500
>
> -#ifdef CC_USING_FENTRY
> -# define fentry 1
> -#else
> -# define fentry 0
> -#endif
> -
> static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES+1] =
> { [0 ... (STACK_TRACE_ENTRIES)] = ULONG_MAX };
> static unsigned stack_dump_index[STACK_TRACE_ENTRIES];
> @@ -35,7 +29,7 @@ static unsigned stack_dump_index[STACK_TRACE_ENTRIES];
> */
> static struct stack_trace max_stack_trace = {
> .max_entries = STACK_TRACE_ENTRIES - 1,
> - .entries = &stack_dump_trace[1],
> + .entries = &stack_dump_trace[0],
> };
>
> static unsigned long max_stack_size;
> @@ -55,7 +49,7 @@ static inline void print_max_stack(void)
>
> pr_emerg(" Depth Size Location (%d entries)\n"
> " ----- ---- --------\n",
> - max_stack_trace.nr_entries - 1);
> + max_stack_trace.nr_entries);
>
> for (i = 0; i < max_stack_trace.nr_entries; i++) {
> if (stack_dump_trace[i] == ULONG_MAX)
> @@ -77,7 +71,7 @@ check_stack(unsigned long ip, unsigned long *stack)
> unsigned long this_size, flags; unsigned long *p, *top, *start;
> static int tracer_frame;
> int frame_size = ACCESS_ONCE(tracer_frame);
> - int i;
> + int i, x;
>
> this_size = ((unsigned long)stack) & (THREAD_SIZE-1);
> this_size = THREAD_SIZE - this_size;
> @@ -105,26 +99,20 @@ check_stack(unsigned long ip, unsigned long *stack)
> max_stack_size = this_size;
>
> max_stack_trace.nr_entries = 0;
> -
> - if (using_ftrace_ops_list_func())
> - max_stack_trace.skip = 4;
> - else
> - max_stack_trace.skip = 3;
> + max_stack_trace.skip = 3;
>
> save_stack_trace(&max_stack_trace);
>
> - /*
> - * Add the passed in ip from the function tracer.
> - * Searching for this on the stack will skip over
> - * most of the overhead from the stack tracer itself.
> - */
> - stack_dump_trace[0] = ip;
> - max_stack_trace.nr_entries++;
> + /* Skip over the overhead of the stack tracer itself */
> + for (i = 0; i < max_stack_trace.nr_entries; i++) {
> + if (stack_dump_trace[i] == ip)
> + break;
> + }
>
> /*
> * Now find where in the stack these are.
> */
> - i = 0;
> + x = 0;
> start = stack;
> top = (unsigned long *)
> (((unsigned long)start & ~(THREAD_SIZE-1)) + THREAD_SIZE);
> @@ -139,12 +127,15 @@ check_stack(unsigned long ip, unsigned long *stack)
> while (i < max_stack_trace.nr_entries) {
> int found = 0;
>
> - stack_dump_index[i] = this_size;
> + stack_dump_index[x] = this_size;
> p = start;
>
> for (; p < top && i < max_stack_trace.nr_entries; p++) {
> + if (stack_dump_trace[i] == ULONG_MAX)
> + break;
> if (*p == stack_dump_trace[i]) {
> - this_size = stack_dump_index[i++] =
> + stack_dump_trace[x] = stack_dump_trace[i++];
> + this_size = stack_dump_index[x++] =
> (top - p) * sizeof(unsigned long);
> found = 1;
> /* Start the search from here */
> @@ -156,7 +147,7 @@ check_stack(unsigned long ip, unsigned long *stack)
> * out what that is, then figure it out
> * now.
> */
> - if (unlikely(!tracer_frame) && i == 1) {
> + if (unlikely(!tracer_frame)) {
> tracer_frame = (p - stack) *
> sizeof(unsigned long);
> max_stack_size -= tracer_frame;
> @@ -168,6 +159,10 @@ check_stack(unsigned long ip, unsigned long *stack)
> i++;
> }
>
> + max_stack_trace.nr_entries = x;
> + for (; x < i; x++)
> + stack_dump_trace[x] = ULONG_MAX;
> +
> if (task_stack_end_corrupted(current)) {
> print_max_stack();
> BUG();
> @@ -192,24 +187,7 @@ stack_trace_call(unsigned long ip, unsigned long parent_ip,
> if (per_cpu(trace_active, cpu)++ != 0)
> goto out;
>
> - /*
> - * When fentry is used, the traced function does not get
> - * its stack frame set up, and we lose the parent.
> - * The ip is pretty useless because the function tracer
> - * was called before that function set up its stack frame.
> - * In this case, we use the parent ip.
> - *
> - * By adding the return address of either the parent ip
> - * or the current ip we can disregard most of the stack usage
> - * caused by the stack tracer itself.
> - *
> - * The function tracer always reports the address of where the
> - * mcount call was, but the stack will hold the return address.
> - */
> - if (fentry)
> - ip = parent_ip;
> - else
> - ip += MCOUNT_INSN_SIZE;
> + ip += MCOUNT_INSN_SIZE;
>
> check_stack(ip, &stack);
>
> @@ -284,7 +262,7 @@ __next(struct seq_file *m, loff_t *pos)
> {
> long n = *pos - 1;
>
> - if (n >= max_stack_trace.nr_entries || stack_dump_trace[n] == ULONG_MAX)
> + if (n > max_stack_trace.nr_entries || stack_dump_trace[n] == ULONG_MAX)
> return NULL;
>
> m->private = (void *)n;
> @@ -354,7 +332,7 @@ static int t_show(struct seq_file *m, void *v)
> seq_printf(m, " Depth Size Location"
> " (%d entries)\n"
> " ----- ---- --------\n",
> - max_stack_trace.nr_entries - 1);
> + max_stack_trace.nr_entries);
>
> if (!stack_tracer_enabled && !max_stack_size)
> print_disabled(m);
> --


I have reviewed and tested the kernel including this patch and only [RFC 1/3].

Now, the number of entries and max_stack_size are always okay, but unexpected functions,
such as ftrace_ops_no_ops and ftrace_call, are *sometimes* listed as follows.

$ cat /sys/kernel/debug/tracing/stack_trace

Depth Size Location (49 entries)
----- ---- --------
0) 4456 16 arch_counter_read+0xc/0x24
1) 4440 16 ktime_get+0x44/0xb4
2) 4424 48 get_drm_timestamp+0x30/0x40
3) 4376 16 drm_get_last_vbltimestamp+0x94/0xb4
4) 4360 80 drm_handle_vblank+0x84/0x3c0
5) 4280 144 mdp5_irq+0x118/0x130
6) 4136 80 msm_irq+0x2c/0x68
7) 4056 32 handle_irq_event_percpu+0x60/0x210
8) 4024 96 handle_irq_event+0x50/0x80
9) 3928 64 handle_fasteoi_irq+0xb0/0x178
10) 3864 48 generic_handle_irq+0x38/0x54
11) 3816 32 __handle_domain_irq+0x68/0xbc
12) 3784 64 gic_handle_irq+0x38/0x88
13) 3720 280 el1_irq+0x64/0xd8
14) 3440 168 ftrace_ops_no_ops+0xb4/0x16c
15) 3272 64 ftrace_call+0x0/0x4
16) 3208 16 _raw_spin_lock_irqsave+0x14/0x70
17) 3192 32 msm_gpio_set+0x44/0xb4
18) 3160 48 _gpiod_set_raw_value+0x68/0x148
19) 3112 64 gpiod_set_value+0x40/0x70
20) 3048 32 gpio_led_set+0x3c/0x94
21) 3016 48 led_set_brightness+0x50/0xa4
22) 2968 32 led_trigger_event+0x4c/0x78
23) 2936 48 mmc_request_done+0x38/0x84
24) 2888 32 sdhci_tasklet_finish+0xcc/0x12c
25) 2856 48 tasklet_action+0x64/0x120
26) 2808 48 __do_softirq+0x114/0x2f0
27) 2760 128 irq_exit+0x98/0xd8
28) 2632 32 __handle_domain_irq+0x6c/0xbc
29) 2600 64 gic_handle_irq+0x38/0x88
30) 2536 280 el1_irq+0x64/0xd8
31) 2256 168 ftrace_ops_no_ops+0xb4/0x16c
32) 2088 64 ftrace_call+0x0/0x4
33) 2024 16 __schedule+0x1c/0x748
34) 2008 80 schedule+0x38/0x94
35) 1928 32 schedule_timeout+0x1a8/0x200
36) 1896 128 wait_for_common+0xa8/0x150
37) 1768 128 wait_for_completion+0x24/0x34
38) 1640 32 mmc_wait_for_req_done+0x3c/0x104
39) 1608 64 mmc_wait_for_cmd+0x68/0x94
40) 1544 128 get_card_status.isra.25+0x6c/0x88
41) 1416 112 card_busy_detect.isra.31+0x7c/0x154
42) 1304 128 mmc_blk_err_check+0xd0/0x4f8
43) 1176 192 mmc_start_req+0xe4/0x3a8
44) 984 160 mmc_blk_issue_rw_rq+0xc4/0x9c0
45) 824 176 mmc_blk_issue_rq+0x19c/0x450
46) 648 112 mmc_queue_thread+0x134/0x17c
47) 536 80 kthread+0xe0/0xf8
48) 456 456 ret_from_fork+0xc/0x50

$ cat /sys/kernel/debug/tracing/stack_max_size
4456

This issue might be related to arch code, not this change.
IMHO, this patch look settled now.

Best Regards
Jungseok Lee-

2015-07-17 14:38:31

by Jungseok Lee

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Jul 17, 2015, at 11:04 AM, AKASHI Takahiro wrote:
> Jungseok,
>
> Thank you for your testing and reviews.

You're welcome.

> On 07/16/2015 10:29 PM, Jungseok Lee wrote:
>> On Jul 16, 2015, at 10:08 AM, AKASHI Takahiro wrote:
>>
>> Hi, AKASHI
>>
>>> On 07/16/2015 09:27 AM, AKASHI Takahiro wrote:
>>>> On 07/16/2015 01:13 AM, Steven Rostedt wrote:
>>>>> On Wed, 15 Jul 2015 10:55:36 -0400
>>>>> Steven Rostedt <[email protected]> wrote:
>>>>>
>>>>>
>>>>>> I'll take a look at it and try to clean up the code.
>>>>>
>>>>> Does the following patch make sense for you?
>>>>
>>>> Looks nice. The patch greatly simplifies changes on arm64 side.
>>>
>>> As follows:
>>>
>>> - Takahiro AKASHI
>>>
>>> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
>>> index c5534fa..868d6f1 100644
>>> --- a/arch/arm64/include/asm/ftrace.h
>>> +++ b/arch/arm64/include/asm/ftrace.h
>>> @@ -15,6 +15,7 @@
>>>
>>> #define MCOUNT_ADDR ((unsigned long)_mcount)
>>> #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
>>> +#define FTRACE_STACK_FRAME_OFFSET 4 /* sync it up with stacktrace.c */
>
> Well,
> #define FTRACE_STACK_FRAME_OFFSET AARCH64_INSN_SIZE
> might be better.

Agree.

>
>> How about binding it to -4 in unwind_frame function?
>
> Do you mean like this?
> In unwind_frame(),
> frame->pc = *(unsigned long*)(fp + 8) - AARCH64_INSN_SIZE;

Exactly.

>
>> IMHO, it would help other developers trying to change stack trace code
>> be aware of this stack tracer feature.
>>
>>> #ifndef __ASSEMBLY__
>>> #include <linux/compat.h>
>>> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
>>> index 1da6029..2c1bf7d 100644
>>> --- a/include/linux/ftrace.h
>>> +++ b/include/linux/ftrace.h
>>> @@ -260,6 +260,13 @@ static inline void ftrace_kill(void) { }
>>> #endif /* CONFIG_FUNCTION_TRACER */
>>>
>>> #ifdef CONFIG_STACK_TRACER
>>> +/*
>>> + * the offset value to add to return address from save_stack_trace()
>>> + */
>>> +#ifndef FTRACE_STACK_FRAME_OFFSET
>>> +#define FTRACE_STACK_FRAME_OFFSET 0
>>> +#endif
>>> +
>>> extern int stack_tracer_enabled;
>>> int
>>> stack_trace_sysctl(struct ctl_table *table, int write,
>>> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
>>> index 9384647..c5b9748 100644
>>> --- a/kernel/trace/trace_stack.c
>>> +++ b/kernel/trace/trace_stack.c
>>> @@ -105,7 +105,7 @@ check_stack(unsigned long ip, unsigned long *stack)
>>>
>>> /* Skip over the overhead of the stack tracer itself */
>>> for (i = 0; i < max_stack_trace.nr_entries; i++) {
>>> - if (stack_dump_trace[i] == ip)
>>> + if ((stack_dump_trace[i] + FTRACE_STACK_FRAME_OFFSET) == ip)
>>> break;
>>> }
>>>
>>> @@ -131,7 +131,8 @@ check_stack(unsigned long ip, unsigned long *stack)
>>> p = start;
>>>
>>> for (; p < top && i < max_stack_trace.nr_entries; p++) {
>>> - if (*p == stack_dump_trace[i]) {
>>> + if (*p == (stack_dump_trace[i]
>>> + + FTRACE_STACK_FRAME_OFFSET)) {
>>> stack_dump_trace[x] = stack_dump_trace[i++];
>>> this_size = stack_dump_index[x++] =
>>> (top - p) * sizeof(unsigned long);
>>> --
>>
>> I've prepared a kernel with the following patches and reviewed them.
>>
>> 1) Steve's clean up patch
>> 2) This patch
>> 3) [RFC 2/3]
>>
>> AFAIU, [RFC 3/3] is not needed any more thanks to Steve's patch.
>
> We don't need [2/3].

Okay. So, I've played a kernel including Steve's latest patch and only [1/3].

Best Regards
Jungseok Lee-

2015-07-17 14:41:49

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Fri, 17 Jul 2015 23:28:13 +0900
Jungseok Lee <[email protected]> wrote:


>
> I have reviewed and tested the kernel including this patch and only [RFC 1/3].

Thanks! Can you repost patch 1 with the changes I recommended, so that
I can get an Acked-by from the arm64 maintainers and pull all the
changes in together. This is fine for a 4.3 release, right? That is, it
doesn't need to go into 4.2-rcs.

>
> Now, the number of entries and max_stack_size are always okay, but unexpected functions,
> such as ftrace_ops_no_ops and ftrace_call, are *sometimes* listed as follows.
>
> $ cat /sys/kernel/debug/tracing/stack_trace
>
> Depth Size Location (49 entries)
> ----- ---- --------
> 0) 4456 16 arch_counter_read+0xc/0x24
> 1) 4440 16 ktime_get+0x44/0xb4
> 2) 4424 48 get_drm_timestamp+0x30/0x40
> 3) 4376 16 drm_get_last_vbltimestamp+0x94/0xb4
> 4) 4360 80 drm_handle_vblank+0x84/0x3c0
> 5) 4280 144 mdp5_irq+0x118/0x130
> 6) 4136 80 msm_irq+0x2c/0x68
> 7) 4056 32 handle_irq_event_percpu+0x60/0x210
> 8) 4024 96 handle_irq_event+0x50/0x80
> 9) 3928 64 handle_fasteoi_irq+0xb0/0x178
> 10) 3864 48 generic_handle_irq+0x38/0x54
> 11) 3816 32 __handle_domain_irq+0x68/0xbc
> 12) 3784 64 gic_handle_irq+0x38/0x88
> 13) 3720 280 el1_irq+0x64/0xd8
> 14) 3440 168 ftrace_ops_no_ops+0xb4/0x16c
> 15) 3272 64 ftrace_call+0x0/0x4
> 16) 3208 16 _raw_spin_lock_irqsave+0x14/0x70
> 17) 3192 32 msm_gpio_set+0x44/0xb4
> 18) 3160 48 _gpiod_set_raw_value+0x68/0x148
> 19) 3112 64 gpiod_set_value+0x40/0x70
> 20) 3048 32 gpio_led_set+0x3c/0x94
> 21) 3016 48 led_set_brightness+0x50/0xa4
> 22) 2968 32 led_trigger_event+0x4c/0x78
> 23) 2936 48 mmc_request_done+0x38/0x84
> 24) 2888 32 sdhci_tasklet_finish+0xcc/0x12c
> 25) 2856 48 tasklet_action+0x64/0x120
> 26) 2808 48 __do_softirq+0x114/0x2f0
> 27) 2760 128 irq_exit+0x98/0xd8
> 28) 2632 32 __handle_domain_irq+0x6c/0xbc
> 29) 2600 64 gic_handle_irq+0x38/0x88
> 30) 2536 280 el1_irq+0x64/0xd8
> 31) 2256 168 ftrace_ops_no_ops+0xb4/0x16c
> 32) 2088 64 ftrace_call+0x0/0x4

Like I stated before, the above looks to be an interrupt coming in
while the tracing was happening. This looks legitimate to me. I'm
guessing that arm64 uses one stack for normal context and interrupt
context, where as x86 uses a separate stack for interrupt context.

-- Steve


> 33) 2024 16 __schedule+0x1c/0x748
> 34) 2008 80 schedule+0x38/0x94
> 35) 1928 32 schedule_timeout+0x1a8/0x200
> 36) 1896 128 wait_for_common+0xa8/0x150
> 37) 1768 128 wait_for_completion+0x24/0x34
> 38) 1640 32 mmc_wait_for_req_done+0x3c/0x104
> 39) 1608 64 mmc_wait_for_cmd+0x68/0x94
> 40) 1544 128 get_card_status.isra.25+0x6c/0x88
> 41) 1416 112 card_busy_detect.isra.31+0x7c/0x154
> 42) 1304 128 mmc_blk_err_check+0xd0/0x4f8
> 43) 1176 192 mmc_start_req+0xe4/0x3a8
> 44) 984 160 mmc_blk_issue_rw_rq+0xc4/0x9c0
> 45) 824 176 mmc_blk_issue_rq+0x19c/0x450
> 46) 648 112 mmc_queue_thread+0x134/0x17c
> 47) 536 80 kthread+0xe0/0xf8
> 48) 456 456 ret_from_fork+0xc/0x50
>
> $ cat /sys/kernel/debug/tracing/stack_max_size
> 4456
>
> This issue might be related to arch code, not this change.
> IMHO, this patch look settled now.
>
> Best Regards
> Jungseok Lee

2015-07-17 14:59:15

by Jungseok Lee

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Jul 17, 2015, at 11:41 PM, Steven Rostedt wrote:
> On Fri, 17 Jul 2015 23:28:13 +0900
> Jungseok Lee <[email protected]> wrote:
>
>
>>
>> I have reviewed and tested the kernel including this patch and only [RFC 1/3].
>
> Thanks! Can you repost patch 1 with the changes I recommended, so that
> I can get an Acked-by from the arm64 maintainers and pull all the
> changes in together. This is fine for a 4.3 release, right? That is, it
> doesn't need to go into 4.2-rcs.
>

Welcome!

It's not hard to repost a patch, but I feel like we have to wait for Akashi's response.
Also, it might be needed to consider Mark's comment on arch part.

If they are okay, I will proceed.

>>
>> Now, the number of entries and max_stack_size are always okay, but unexpected functions,
>> such as ftrace_ops_no_ops and ftrace_call, are *sometimes* listed as follows.
>>
>> $ cat /sys/kernel/debug/tracing/stack_trace
>>
>> Depth Size Location (49 entries)
>> ----- ---- --------
>> 0) 4456 16 arch_counter_read+0xc/0x24
>> 1) 4440 16 ktime_get+0x44/0xb4
>> 2) 4424 48 get_drm_timestamp+0x30/0x40
>> 3) 4376 16 drm_get_last_vbltimestamp+0x94/0xb4
>> 4) 4360 80 drm_handle_vblank+0x84/0x3c0
>> 5) 4280 144 mdp5_irq+0x118/0x130
>> 6) 4136 80 msm_irq+0x2c/0x68
>> 7) 4056 32 handle_irq_event_percpu+0x60/0x210
>> 8) 4024 96 handle_irq_event+0x50/0x80
>> 9) 3928 64 handle_fasteoi_irq+0xb0/0x178
>> 10) 3864 48 generic_handle_irq+0x38/0x54
>> 11) 3816 32 __handle_domain_irq+0x68/0xbc
>> 12) 3784 64 gic_handle_irq+0x38/0x88
>> 13) 3720 280 el1_irq+0x64/0xd8
>> 14) 3440 168 ftrace_ops_no_ops+0xb4/0x16c
>> 15) 3272 64 ftrace_call+0x0/0x4
>> 16) 3208 16 _raw_spin_lock_irqsave+0x14/0x70
>> 17) 3192 32 msm_gpio_set+0x44/0xb4
>> 18) 3160 48 _gpiod_set_raw_value+0x68/0x148
>> 19) 3112 64 gpiod_set_value+0x40/0x70
>> 20) 3048 32 gpio_led_set+0x3c/0x94
>> 21) 3016 48 led_set_brightness+0x50/0xa4
>> 22) 2968 32 led_trigger_event+0x4c/0x78
>> 23) 2936 48 mmc_request_done+0x38/0x84
>> 24) 2888 32 sdhci_tasklet_finish+0xcc/0x12c
>> 25) 2856 48 tasklet_action+0x64/0x120
>> 26) 2808 48 __do_softirq+0x114/0x2f0
>> 27) 2760 128 irq_exit+0x98/0xd8
>> 28) 2632 32 __handle_domain_irq+0x6c/0xbc
>> 29) 2600 64 gic_handle_irq+0x38/0x88
>> 30) 2536 280 el1_irq+0x64/0xd8
>> 31) 2256 168 ftrace_ops_no_ops+0xb4/0x16c
>> 32) 2088 64 ftrace_call+0x0/0x4
>
> Like I stated before, the above looks to be an interrupt coming in
> while the tracing was happening. This looks legitimate to me. I'm
> guessing that arm64 uses one stack for normal context and interrupt
> context, where as x86 uses a separate stack for interrupt context.

Okay. Thanks for a kind explanation.

Best Regards
Jungseok Lee-

2015-07-17 15:34:43

by Jungseok Lee

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Jul 17, 2015, at 11:59 PM, Jungseok Lee wrote:
> On Jul 17, 2015, at 11:41 PM, Steven Rostedt wrote:
>> On Fri, 17 Jul 2015 23:28:13 +0900
>> Jungseok Lee <[email protected]> wrote:
>>
>>
>>>
>>> I have reviewed and tested the kernel including this patch and only [RFC 1/3].
>>
>> Thanks! Can you repost patch 1 with the changes I recommended, so that
>> I can get an Acked-by from the arm64 maintainers and pull all the
>> changes in together. This is fine for a 4.3 release, right? That is, it
>> doesn't need to go into 4.2-rcs.
>>
>
> Welcome!
>
> It's not hard to repost a patch, but I feel like we have to wait for Akashi's response.
> Also, it might be needed to consider Mark's comment on arch part.
>
> If they are okay, I will proceed.

Akashi and Mark,

The [RFC 1/3] patch used in my environment is shaped as follows.
I leave the hunk for *only* clear synchronization. This is why I choose this format
instead of reposting a patch. I hope it would help to track down this thread.

Steve,

I think this is my best at this point.

Best Regards
Jungseok Lee

----8<----
diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index c5534fa..2b43e20 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -13,8 +13,9 @@

#include <asm/insn.h>

-#define MCOUNT_ADDR ((unsigned long)_mcount)
-#define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
+#define MCOUNT_ADDR ((unsigned long)_mcount)
+#define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
+#define FTRACE_STACK_FRAME_OFFSET AARCH64_INSN_SIZE

#ifndef __ASSEMBLY__
#include <linux/compat.h>
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 407991b..9ab67af 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -20,6 +20,7 @@
#include <linux/sched.h>
#include <linux/stacktrace.h>

+#include <asm/insn.h>
#include <asm/stacktrace.h>

/*
@@ -52,7 +53,7 @@ int notrace unwind_frame(struct stackframe *frame)
* -4 here because we care about the PC at time of bl,
* not where the return will go.
*/
- frame->pc = *(unsigned long *)(fp + 8) - 4;
+ frame->pc = *(unsigned long *)(fp + 8) - AARCH64_INSN_SIZE;

return 0;
}
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 1da6029..6566201 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -260,6 +260,9 @@ static inline void ftrace_kill(void) { }
#endif /* CONFIG_FUNCTION_TRACER */

#ifdef CONFIG_STACK_TRACER
+#ifndef FTRACE_STACK_FRAME_OFFSET
+#define FTRACE_STACK_FRAME_OFFSET 0
+#endif
extern int stack_tracer_enabled;
int
stack_trace_sysctl(struct ctl_table *table, int write,
diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index b746399..30521ea 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -105,7 +105,7 @@ check_stack(unsigned long ip, unsigned long *stack)

/* Skip over the overhead of the stack tracer itself */
for (i = 0; i < max_stack_trace.nr_entries; i++) {
- if (stack_dump_trace[i] == ip)
+ if ((stack_dump_trace[i] + FTRACE_STACK_FRAME_OFFSET) == ip)
break;
}

@@ -133,7 +133,8 @@ check_stack(unsigned long ip, unsigned long *stack)
for (; p < top && i < max_stack_trace.nr_entries; p++) {
if (stack_dump_trace[i] == ULONG_MAX)
break;
- if (*p == stack_dump_trace[i]) {
+ if (*p == (stack_dump_trace[i]
+ + FTRACE_STACK_FRAME_OFFSET)) {
stack_dump_trace[x] = stack_dump_trace[i++];
this_size = stack_dump_index[x++] =
(top - p) * sizeof(unsigned long);
----8<-----

2015-07-17 16:01:16

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Sat, 18 Jul 2015 00:34:21 +0900
Jungseok Lee <[email protected]> wrote:


>
> The [RFC 1/3] patch used in my environment is shaped as follows.
> I leave the hunk for *only* clear synchronization. This is why I choose this format
> instead of reposting a patch. I hope it would help to track down this thread.

You could just add:

Link: http://lkml.kernel.org/r/[email protected]

to your change log, and that will point to this thread. That's what the
Link tag was made for.

>
> Steve,
>
> I think this is my best at this point.
>

OK, I'll just wait for the others to give their ack.

-- Steve

2015-07-20 16:20:08

by Will Deacon

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

Hi all,

On Fri, Jul 17, 2015 at 04:34:21PM +0100, Jungseok Lee wrote:
> On Jul 17, 2015, at 11:59 PM, Jungseok Lee wrote:
> > On Jul 17, 2015, at 11:41 PM, Steven Rostedt wrote:
> >> Thanks! Can you repost patch 1 with the changes I recommended, so that
> >> I can get an Acked-by from the arm64 maintainers and pull all the
> >> changes in together. This is fine for a 4.3 release, right? That is, it
> >> doesn't need to go into 4.2-rcs.
> >>
> >
> > It's not hard to repost a patch, but I feel like we have to wait for Akashi's response.
> > Also, it might be needed to consider Mark's comment on arch part.
> >
> > If they are okay, I will proceed.
>
> The [RFC 1/3] patch used in my environment is shaped as follows.
> I leave the hunk for *only* clear synchronization. This is why I choose this format
> instead of reposting a patch. I hope it would help to track down this thread.
>
> I think this is my best at this point.
>
> ----8<----
> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
> index c5534fa..2b43e20 100644
> --- a/arch/arm64/include/asm/ftrace.h
> +++ b/arch/arm64/include/asm/ftrace.h
> @@ -13,8 +13,9 @@
>
> #include <asm/insn.h>
>
> -#define MCOUNT_ADDR ((unsigned long)_mcount)
> -#define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
> +#define MCOUNT_ADDR ((unsigned long)_mcount)
> +#define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
> +#define FTRACE_STACK_FRAME_OFFSET AARCH64_INSN_SIZE
>
> #ifndef __ASSEMBLY__
> #include <linux/compat.h>
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 407991b..9ab67af 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -20,6 +20,7 @@
> #include <linux/sched.h>
> #include <linux/stacktrace.h>
>
> +#include <asm/insn.h>
> #include <asm/stacktrace.h>
>
> /*
> @@ -52,7 +53,7 @@ int notrace unwind_frame(struct stackframe *frame)
> * -4 here because we care about the PC at time of bl,
> * not where the return will go.
> */
> - frame->pc = *(unsigned long *)(fp + 8) - 4;
> + frame->pc = *(unsigned long *)(fp + 8) - AARCH64_INSN_SIZE;
>
> return 0;
> }

The arm64 bits look fine to me:

Acked-by: Will Deacon <[email protected]>

Steve: feel free to take this along with the other ftrace changes. I don't
anticipate any conflicts, but if anything crops up in -next we can sort
it out then.

Thanks!

Will

2015-07-20 23:53:44

by AKASHI Takahiro

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

Hi

So i don't have to repost my patch here. Please use the original
commit log message[1/3] as is.
But please keep in mind that there is still an issue that I mentioned
in the cover letter; 'Size' field is inaccurate.
<reported size> = <its own dynamic local variables>
+ <child's local variables>
and
<real size> = <reported size> + <its local variables>
- <child's local variables>
where "dynamic" means, for example, a variable allocated like the below:
int foo(int num) {
int array[num];
...
}
(See more details in my ascii art.)

Such usage is seldom seen in the kernel, and <reported size> is
likely equal to <child's local variables>. In other words, we will
see one-line *displacement* in most cases.

(We'd better mention it explicitly in the commmit?)

Thanks,
-Takahiro AKASHI


On 07/21/2015 01:20 AM, Will Deacon wrote:
> Hi all,
>
> On Fri, Jul 17, 2015 at 04:34:21PM +0100, Jungseok Lee wrote:
>> On Jul 17, 2015, at 11:59 PM, Jungseok Lee wrote:
>>> On Jul 17, 2015, at 11:41 PM, Steven Rostedt wrote:
>>>> Thanks! Can you repost patch 1 with the changes I recommended, so that
>>>> I can get an Acked-by from the arm64 maintainers and pull all the
>>>> changes in together. This is fine for a 4.3 release, right? That is, it
>>>> doesn't need to go into 4.2-rcs.
>>>>
>>>
>>> It's not hard to repost a patch, but I feel like we have to wait for Akashi's response.
>>> Also, it might be needed to consider Mark's comment on arch part.
>>>
>>> If they are okay, I will proceed.
>>
>> The [RFC 1/3] patch used in my environment is shaped as follows.
>> I leave the hunk for *only* clear synchronization. This is why I choose this format
>> instead of reposting a patch. I hope it would help to track down this thread.
>>
>> I think this is my best at this point.
>>
>> ----8<----
>> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
>> index c5534fa..2b43e20 100644
>> --- a/arch/arm64/include/asm/ftrace.h
>> +++ b/arch/arm64/include/asm/ftrace.h
>> @@ -13,8 +13,9 @@
>>
>> #include <asm/insn.h>
>>
>> -#define MCOUNT_ADDR ((unsigned long)_mcount)
>> -#define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
>> +#define MCOUNT_ADDR ((unsigned long)_mcount)
>> +#define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
>> +#define FTRACE_STACK_FRAME_OFFSET AARCH64_INSN_SIZE
>>
>> #ifndef __ASSEMBLY__
>> #include <linux/compat.h>
>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
>> index 407991b..9ab67af 100644
>> --- a/arch/arm64/kernel/stacktrace.c
>> +++ b/arch/arm64/kernel/stacktrace.c
>> @@ -20,6 +20,7 @@
>> #include <linux/sched.h>
>> #include <linux/stacktrace.h>
>>
>> +#include <asm/insn.h>
>> #include <asm/stacktrace.h>
>>
>> /*
>> @@ -52,7 +53,7 @@ int notrace unwind_frame(struct stackframe *frame)
>> * -4 here because we care about the PC at time of bl,
>> * not where the return will go.
>> */
>> - frame->pc = *(unsigned long *)(fp + 8) - 4;
>> + frame->pc = *(unsigned long *)(fp + 8) - AARCH64_INSN_SIZE;
>>
>> return 0;
>> }
>
> The arm64 bits look fine to me:
>
> Acked-by: Will Deacon <[email protected]>
>
> Steve: feel free to take this along with the other ftrace changes. I don't
> anticipate any conflicts, but if anything crops up in -next we can sort
> it out then.
>
> Thanks!
>
> Will
>

2015-07-21 10:26:49

by AKASHI Takahiro

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On 07/21/2015 08:53 AM, AKASHI Takahiro wrote:
> Hi
>
> So i don't have to repost my patch here. Please use the original
> commit log message[1/3] as is.
> But please keep in mind that there is still an issue that I mentioned
> in the cover letter; 'Size' field is inaccurate.
> <reported size> = <its own dynamic local variables>
> + <child's local variables>
> and
> <real size> = <reported size> + <its local variables>
> - <child's local variables>
> where "dynamic" means, for example, a variable allocated like the below:
> int foo(int num) {
> int array[num];
> ...
> }
> (See more details in my ascii art.)
>
> Such usage is seldom seen in the kernel, and <reported size> is
> likely equal to <child's local variables>. In other words, we will
> see one-line *displacement* in most cases.

Well, I have a quick fix now, but it looks ugly.

In addition, I found another issue; With function_graph tracer,
the output is like:
# cat /sys/kernel/tracing/stack_trace
Depth Size Location (78 entries)
----- ---- --------
0) 6184 32 update_min_vruntime+0x14/0x74
1) 6152 48 update_curr+0x6c/0x150
2) 6104 128 enqueue_task_fair+0x2f4/0xb9c
3) 5976 48 enqueue_task+0x48/0x90
...
18) 5160 112 hrtimer_interrupt+0xa0/0x214
19) 5048 32 arch_timer_handler_phys+0x38/0x48
20) 5016 0 ftrace_graph_caller+0x2c/0x30
21) 5016 64 ftrace_graph_caller+0x2c/0x30
22) 4952 32 ftrace_graph_caller+0x2c/0x30
23) 4920 64 ftrace_graph_caller+0x2c/0x30
...

Since, with function_graph tracer, we modify LR register in a stack frame
when we enter into a function, we have to manage such special cases
in save_stack_trace() or check_stack() as x86 does in
print_ftrace_graph_addr().

Thanks,
-Takahiro AKASHI


> (We'd better mention it explicitly in the commmit?)
>
> Thanks,
> -Takahiro AKASHI
>
>
> On 07/21/2015 01:20 AM, Will Deacon wrote:
>> Hi all,
>>
>> On Fri, Jul 17, 2015 at 04:34:21PM +0100, Jungseok Lee wrote:
>>> On Jul 17, 2015, at 11:59 PM, Jungseok Lee wrote:
>>>> On Jul 17, 2015, at 11:41 PM, Steven Rostedt wrote:
>>>>> Thanks! Can you repost patch 1 with the changes I recommended, so that
>>>>> I can get an Acked-by from the arm64 maintainers and pull all the
>>>>> changes in together. This is fine for a 4.3 release, right? That is, it
>>>>> doesn't need to go into 4.2-rcs.
>>>>>
>>>>
>>>> It's not hard to repost a patch, but I feel like we have to wait for Akashi's response.
>>>> Also, it might be needed to consider Mark's comment on arch part.
>>>>
>>>> If they are okay, I will proceed.
>>>
>>> The [RFC 1/3] patch used in my environment is shaped as follows.
>>> I leave the hunk for *only* clear synchronization. This is why I choose this format
>>> instead of reposting a patch. I hope it would help to track down this thread.
>>>
>>> I think this is my best at this point.
>>>
>>> ----8<----
>>> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
>>> index c5534fa..2b43e20 100644
>>> --- a/arch/arm64/include/asm/ftrace.h
>>> +++ b/arch/arm64/include/asm/ftrace.h
>>> @@ -13,8 +13,9 @@
>>>
>>> #include <asm/insn.h>
>>>
>>> -#define MCOUNT_ADDR ((unsigned long)_mcount)
>>> -#define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
>>> +#define MCOUNT_ADDR ((unsigned long)_mcount)
>>> +#define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
>>> +#define FTRACE_STACK_FRAME_OFFSET AARCH64_INSN_SIZE
>>>
>>> #ifndef __ASSEMBLY__
>>> #include <linux/compat.h>
>>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
>>> index 407991b..9ab67af 100644
>>> --- a/arch/arm64/kernel/stacktrace.c
>>> +++ b/arch/arm64/kernel/stacktrace.c
>>> @@ -20,6 +20,7 @@
>>> #include <linux/sched.h>
>>> #include <linux/stacktrace.h>
>>>
>>> +#include <asm/insn.h>
>>> #include <asm/stacktrace.h>
>>>
>>> /*
>>> @@ -52,7 +53,7 @@ int notrace unwind_frame(struct stackframe *frame)
>>> * -4 here because we care about the PC at time of bl,
>>> * not where the return will go.
>>> */
>>> - frame->pc = *(unsigned long *)(fp + 8) - 4;
>>> + frame->pc = *(unsigned long *)(fp + 8) - AARCH64_INSN_SIZE;
>>>
>>> return 0;
>>> }
>>
>> The arm64 bits look fine to me:
>>
>> Acked-by: Will Deacon <[email protected]>
>>
>> Steve: feel free to take this along with the other ftrace changes. I don't
>> anticipate any conflicts, but if anything crops up in -next we can sort
>> it out then.
>>
>> Thanks!
>>
>> Will
>>

2015-07-21 14:34:30

by Jungseok Lee

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Jul 21, 2015, at 7:26 PM, AKASHI Takahiro wrote:
> On 07/21/2015 08:53 AM, AKASHI Takahiro wrote:
>> Hi
>>
>> So i don't have to repost my patch here. Please use the original
>> commit log message[1/3] as is.
>> But please keep in mind that there is still an issue that I mentioned
>> in the cover letter; 'Size' field is inaccurate.
>> <reported size> = <its own dynamic local variables>
>> + <child's local variables>
>> and
>> <real size> = <reported size> + <its local variables>
>> - <child's local variables>
>> where "dynamic" means, for example, a variable allocated like the below:
>> int foo(int num) {
>> int array[num];
>> ...
>> }
>> (See more details in my ascii art.)
>>
>> Such usage is seldom seen in the kernel, and <reported size> is
>> likely equal to <child's local variables>. In other words, we will
>> see one-line *displacement* in most cases.
>
> Well, I have a quick fix now, but it looks ugly.

AFAIU, stack_max_size would be more accurate if a separate stack
is introduced for interrupt context. However, it might be unnecessary
at this point due to complexity.

> In addition, I found another issue; With function_graph tracer,
> the output is like:
> # cat /sys/kernel/tracing/stack_trace
> Depth Size Location (78 entries)
> ----- ---- --------
> 0) 6184 32 update_min_vruntime+0x14/0x74
> 1) 6152 48 update_curr+0x6c/0x150
> 2) 6104 128 enqueue_task_fair+0x2f4/0xb9c
> 3) 5976 48 enqueue_task+0x48/0x90
> ...
> 18) 5160 112 hrtimer_interrupt+0xa0/0x214
> 19) 5048 32 arch_timer_handler_phys+0x38/0x48
> 20) 5016 0 ftrace_graph_caller+0x2c/0x30
> 21) 5016 64 ftrace_graph_caller+0x2c/0x30
> 22) 4952 32 ftrace_graph_caller+0x2c/0x30
> 23) 4920 64 ftrace_graph_caller+0x2c/0x30
> ...
>
> Since, with function_graph tracer, we modify LR register in a stack frame
> when we enter into a function, we have to manage such special cases
> in save_stack_trace() or check_stack() as x86 does in
> print_ftrace_graph_addr().

I should have run it with function_graph. The issue is reproduced easily
on my environment. I don't see other issues yet when enabling other tracers.

Best Regards
Jungseok Lee

2015-08-03 09:09:57

by Will Deacon

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

Hi guys,

On Mon, Jul 20, 2015 at 05:20:04PM +0100, Will Deacon wrote:
> On Fri, Jul 17, 2015 at 04:34:21PM +0100, Jungseok Lee wrote:
> > On Jul 17, 2015, at 11:59 PM, Jungseok Lee wrote:
> > > On Jul 17, 2015, at 11:41 PM, Steven Rostedt wrote:
> > >> Thanks! Can you repost patch 1 with the changes I recommended, so that
> > >> I can get an Acked-by from the arm64 maintainers and pull all the
> > >> changes in together. This is fine for a 4.3 release, right? That is, it
> > >> doesn't need to go into 4.2-rcs.
> > >>
> > >
> > > It's not hard to repost a patch, but I feel like we have to wait for Akashi's response.
> > > Also, it might be needed to consider Mark's comment on arch part.
> > >
> > > If they are okay, I will proceed.
> >
> > The [RFC 1/3] patch used in my environment is shaped as follows.
> > I leave the hunk for *only* clear synchronization. This is why I choose this format
> > instead of reposting a patch. I hope it would help to track down this thread.
> >
> > I think this is my best at this point.

[...]

> The arm64 bits look fine to me:
>
> Acked-by: Will Deacon <[email protected]>

What happened to this? Is it queued someplace, or are we waiting for a
new version?

Will

2015-08-03 14:01:55

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Mon, 3 Aug 2015 10:09:51 +0100
Will Deacon <[email protected]> wrote:


> > The arm64 bits look fine to me:
> >
> > Acked-by: Will Deacon <[email protected]>
>
> What happened to this? Is it queued someplace, or are we waiting for a
> new version?

I went on vacation :-) I'm back and trying to catch up. I'll be
pulling in a bunch of things today. Hopefully I can get to this.

-- Steve

2015-08-03 14:04:41

by Will Deacon

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Mon, Aug 03, 2015 at 03:01:40PM +0100, Steven Rostedt wrote:
> On Mon, 3 Aug 2015 10:09:51 +0100
> Will Deacon <[email protected]> wrote:
>
>
> > > The arm64 bits look fine to me:
> > >
> > > Acked-by: Will Deacon <[email protected]>
> >
> > What happened to this? Is it queued someplace, or are we waiting for a
> > new version?
>
> I went on vacation :-) I'm back and trying to catch up. I'll be
> pulling in a bunch of things today. Hopefully I can get to this.

Cheers, Steve. Just wanted to make sure it didn't got lost (I'd briefly
forgotten about it myself!).

I hope you had a good holiday.

Will

2015-08-03 16:30:57

by Jungseok Lee

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Aug 3, 2015, at 6:09 PM, Will Deacon wrote:
> Hi guys,

Hi Will,

> On Mon, Jul 20, 2015 at 05:20:04PM +0100, Will Deacon wrote:
>> On Fri, Jul 17, 2015 at 04:34:21PM +0100, Jungseok Lee wrote:
>>> On Jul 17, 2015, at 11:59 PM, Jungseok Lee wrote:
>>>> On Jul 17, 2015, at 11:41 PM, Steven Rostedt wrote:
>>>>> Thanks! Can you repost patch 1 with the changes I recommended, so that
>>>>> I can get an Acked-by from the arm64 maintainers and pull all the
>>>>> changes in together. This is fine for a 4.3 release, right? That is, it
>>>>> doesn't need to go into 4.2-rcs.
>>>>>
>>>>
>>>> It's not hard to repost a patch, but I feel like we have to wait for Akashi's response.
>>>> Also, it might be needed to consider Mark's comment on arch part.
>>>>
>>>> If they are okay, I will proceed.
>>>
>>> The [RFC 1/3] patch used in my environment is shaped as follows.
>>> I leave the hunk for *only* clear synchronization. This is why I choose this format
>>> instead of reposting a patch. I hope it would help to track down this thread.
>>>
>>> I think this is my best at this point.
>
> [...]
>
>> The arm64 bits look fine to me:
>>
>> Acked-by: Will Deacon <[email protected]>
>
> What happened to this? Is it queued someplace, or are we waiting for a
> new version?

There are two issues in the current version.
1) The change does not work correctly when function_graph feature is enabled.
2) Akashi have raised an issue that size field of stack tracer is inaccurate.

So, I think this patch set is not ready yet.

Best Regards
Jungseok Lee-

2015-08-03 16:57:19

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Tue, 4 Aug 2015 01:30:50 +0900
Jungseok Lee <[email protected]> wrote:


> There are two issues in the current version.
> 1) The change does not work correctly when function_graph feature is enabled.
> 2) Akashi have raised an issue that size field of stack tracer is inaccurate.
>
> So, I think this patch set is not ready yet.

Do you still want me to add code that does:

if (*p == (stack_dump_trace[i] + FTRACE_STACK_FRAME_OFFSET)) {

?

If you expect to need that, I can get it into the next merge window and
you can base the code of that in the merge window after that.

-- Steve

2015-08-03 17:22:20

by Jungseok Lee

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Aug 4, 2015, at 1:57 AM, Steven Rostedt wrote:
> On Tue, 4 Aug 2015 01:30:50 +0900
> Jungseok Lee <[email protected]> wrote:
>
>
>> There are two issues in the current version.
>> 1) The change does not work correctly when function_graph feature is enabled.
>> 2) Akashi have raised an issue that size field of stack tracer is inaccurate.
>>
>> So, I think this patch set is not ready yet.
>
> Do you still want me to add code that does:
>
> if (*p == (stack_dump_trace[i] + FTRACE_STACK_FRAME_OFFSET)) {
>
> ?
>
> If you expect to need that, I can get it into the next merge window and
> you can base the code of that in the merge window after that.

It would be better to add the snippet when a new version is ready.
That way might help to figure out easily why the macro is introduced and how
it can be used in architecture code.

Best Regards
Jungseok Lee-

2015-08-03 17:32:26

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On Tue, 4 Aug 2015 02:22:16 +0900
Jungseok Lee <[email protected]> wrote:


> It would be better to add the snippet when a new version is ready.
> That way might help to figure out easily why the macro is introduced and how
> it can be used in architecture code.


OK, which ever.

-- Steve

2015-08-04 07:42:07

by AKASHI Takahiro

[permalink] [raw]
Subject: Re: [RFC 2/3] arm64: refactor save_stack_trace()

On 08/04/2015 02:32 AM, Steven Rostedt wrote:
> On Tue, 4 Aug 2015 02:22:16 +0900
> Jungseok Lee <[email protected]> wrote:
>
>
>> It would be better to add the snippet when a new version is ready.
>> That way might help to figure out easily why the macro is introduced and how
>> it can be used in architecture code.
>
>
> OK, which ever.

I'm making best efforts to fix the problem, but it seems to me that the current
check_stack() doesn't fit to arm64 due to the differences in stack frame usage
on architectures.
Although my current patch doesn't solve all the issues I've noticed so far,
I'd like to get any comments about whether I'm doing it the right way.
Please take a look at my new vesion (v2) of RFC.

Thanks,
-Takahiro AKASHI


> -- Steve
>