2017-09-01 05:49:24

by Pratyush Anand

[permalink] [raw]
Subject: [PATCH v2] arm64: fix unwind_frame() for filtered out fn for function graph tracing

do_task_stat() calls get_wchan(), which further does unbind_frame().
unbind_frame() restores frame->pc to original value in case function
graph tracer has modified a return address (LR) in a stack frame to hook
a function return. However, if function graph tracer has hit a filtered
function, then we can't unwind it as ftrace_push_return_trace() has
biased the index(frame->graph) with a 'huge negative'
offset(-FTRACE_NOTRACE_DEPTH).

Moreover, arm64 stack walker defines index(frame->graph) as unsigned
int, which can not compare a -ve number.

Similar problem we can have with calling of walk_stackframe() from
save_stack_trace_tsk() or dump_backtrace().

This patch fixes unwind_frame() to test the index for -ve value and
restore index accordingly before we can restore frame->pc.

Reproducer:

cd /sys/kernel/debug/tracing/
echo schedule > set_graph_notrace
echo 1 > options/display-graph
echo wakeup > current_tracer
ps -ef | grep -i agent

Above commands result in:
Unable to handle kernel paging request at virtual address ffff801bd3d1e000
pgd = ffff8003cbe97c00
[ffff801bd3d1e000] *pgd=0000000000000000, *pud=0000000000000000
Internal error: Oops: 96000006 [#1] SMP
[...]
CPU: 5 PID: 11696 Comm: ps Not tainted 4.11.0+ #33
[...]
task: ffff8003c21ba000 task.stack: ffff8003cc6c0000
PC is at unwind_frame+0x12c/0x180
LR is at get_wchan+0xd4/0x134
pc : [<ffff00000808892c>] lr : [<ffff0000080860b8>] pstate: 60000145
sp : ffff8003cc6c3ab0
x29: ffff8003cc6c3ab0 x28: 0000000000000001
x27: 0000000000000026 x26: 0000000000000026
x25: 00000000000012d8 x24: 0000000000000000
x23: ffff8003c1c04000 x22: ffff000008c83000
x21: ffff8003c1c00000 x20: 000000000000000f
x19: ffff8003c1bc0000 x18: 0000fffffc593690
x17: 0000000000000000 x16: 0000000000000001
x15: 0000b855670e2b60 x14: 0003e97f22cf1d0f
x13: 0000000000000001 x12: 0000000000000000
x11: 00000000e8f4883e x10: 0000000154f47ec8
x9 : 0000000070f367c0 x8 : 0000000000000000
x7 : 00008003f7290000 x6 : 0000000000000018
x5 : 0000000000000000 x4 : ffff8003c1c03cb0
x3 : ffff8003c1c03ca0 x2 : 00000017ffe80000
x1 : ffff8003cc6c3af8 x0 : ffff8003d3e9e000

Process ps (pid: 11696, stack limit = 0xffff8003cc6c0000)
Stack: (0xffff8003cc6c3ab0 to 0xffff8003cc6c4000)
[...]
[<ffff00000808892c>] unwind_frame+0x12c/0x180
[<ffff000008305008>] do_task_stat+0x864/0x870
[<ffff000008305c44>] proc_tgid_stat+0x3c/0x48
[<ffff0000082fde0c>] proc_single_show+0x5c/0xb8
[<ffff0000082b27e0>] seq_read+0x160/0x414
[<ffff000008289e6c>] __vfs_read+0x58/0x164
[<ffff00000828b164>] vfs_read+0x88/0x144
[<ffff00000828c2e8>] SyS_read+0x60/0xc0
[<ffff0000080834a0>] __sys_trace_return+0x0/0x4

fixes: 20380bb390a4 (arm64: ftrace: fix a stack tracer's output under function graph tracer)
Signed-off-by: Pratyush Anand <[email protected]>
---
v1 -> v2:
- improved commit log
- now index is restored and thereafter frame->pc as well.

arch/arm64/include/asm/stacktrace.h | 2 +-
arch/arm64/kernel/stacktrace.c | 3 +++
2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 5b6eafccc5d8..816db5b9b874 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -23,7 +23,7 @@ struct stackframe {
unsigned long sp;
unsigned long pc;
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
- unsigned int graph;
+ int graph;
#endif
};

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 09d37d66b630..4c47147d0554 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -75,6 +75,9 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
if (tsk->ret_stack &&
(frame->pc == (unsigned long)return_to_handler)) {
+ if (frame->graph < 0)
+ frame->graph += FTRACE_NOTRACE_DEPTH;
+
/*
* This is a case where function graph tracer has
* modified a return address (LR) in a stack frame
--
2.9.4


2017-09-12 09:55:57

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v2] arm64: fix unwind_frame() for filtered out fn for function graph tracing

Hi Pratyush,

On 01/09/17 06:48, Pratyush Anand wrote:
> do_task_stat() calls get_wchan(), which further does unbind_frame().
> unbind_frame() restores frame->pc to original value in case function
> graph tracer has modified a return address (LR) in a stack frame to hook
> a function return. However, if function graph tracer has hit a filtered
> function, then we can't unwind it as ftrace_push_return_trace() has
> biased the index(frame->graph) with a 'huge negative'
> offset(-FTRACE_NOTRACE_DEPTH).
>
> Moreover, arm64 stack walker defines index(frame->graph) as unsigned
> int, which can not compare a -ve number.
>
> Similar problem we can have with calling of walk_stackframe() from
> save_stack_trace_tsk() or dump_backtrace().
>
> This patch fixes unwind_frame() to test the index for -ve value and
> restore index accordingly before we can restore frame->pc.

I've just spotted arm64's profile_pc, which does this:
>From arch/arm64/kernel/time.c:profile_pc():
> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> frame.graph = -1; /* no task info */
> #endif

Is this another elaborate way of hitting this problem?

I guess the options are skip any return-address restore in the unwinder if
frame.graph is -1. (and profile_pc may have a bug here). Or, put
current->curr_ret_stack in there.

profile_pc() always passes tsk=NULL, so the unwinder assumes its current...
kernel/profile.c pulls the pt_regs from a per-cpu irq_regs variable, that is
updated by handle_IPI ... so it looks like this should always be current...


Thanks,

James


> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 09d37d66b630..4c47147d0554 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -75,6 +75,9 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> if (tsk->ret_stack &&
> (frame->pc == (unsigned long)return_to_handler)) {
> + if (frame->graph < 0)
> + frame->graph += FTRACE_NOTRACE_DEPTH;
> +
> /*
> * This is a case where function graph tracer has
> * modified a return address (LR) in a stack frame
>




2017-09-13 02:42:12

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v2] arm64: fix unwind_frame() for filtered out fn for function graph tracing

On Tue, Sep 12, 2017 at 10:54:28AM +0100, James Morse wrote:
> Hi Pratyush,
>
> On 01/09/17 06:48, Pratyush Anand wrote:
> > do_task_stat() calls get_wchan(), which further does unbind_frame().
> > unbind_frame() restores frame->pc to original value in case function
> > graph tracer has modified a return address (LR) in a stack frame to hook
> > a function return. However, if function graph tracer has hit a filtered
> > function, then we can't unwind it as ftrace_push_return_trace() has
> > biased the index(frame->graph) with a 'huge negative'
> > offset(-FTRACE_NOTRACE_DEPTH).
> >
> > Moreover, arm64 stack walker defines index(frame->graph) as unsigned
> > int, which can not compare a -ve number.
> >
> > Similar problem we can have with calling of walk_stackframe() from
> > save_stack_trace_tsk() or dump_backtrace().
> >
> > This patch fixes unwind_frame() to test the index for -ve value and
> > restore index accordingly before we can restore frame->pc.
>
> I've just spotted arm64's profile_pc, which does this:
> From arch/arm64/kernel/time.c:profile_pc():
> > #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > frame.graph = -1; /* no task info */
> > #endif
>
> Is this another elaborate way of hitting this problem?
>
> I guess the options are skip any return-address restore in the unwinder if
> frame.graph is -1. (and profile_pc may have a bug here). Or, put
> current->curr_ret_stack in there.
>
> profile_pc() always passes tsk=NULL, so the unwinder assumes its current...
> kernel/profile.c pulls the pt_regs from a per-cpu irq_regs variable, that is
> updated by handle_IPI ... so it looks like this should always be current...

Hmmm... is profile_pc the *only* case where frame->graph isn't equal to
tsk->curr_ret_stack in unwind_frame? If so, maybe unwind_frame should just
use that, and we could kill the graph member of struct stackframe completely?

Will

2017-09-13 04:59:27

by Pratyush Anand

[permalink] [raw]
Subject: Re: [PATCH v2] arm64: fix unwind_frame() for filtered out fn for function graph tracing



On Wednesday 13 September 2017 08:12 AM, Will Deacon wrote:
> On Tue, Sep 12, 2017 at 10:54:28AM +0100, James Morse wrote:
>> Hi Pratyush,
>>
>> On 01/09/17 06:48, Pratyush Anand wrote:
>>> do_task_stat() calls get_wchan(), which further does unbind_frame().
>>> unbind_frame() restores frame->pc to original value in case function
>>> graph tracer has modified a return address (LR) in a stack frame to hook
>>> a function return. However, if function graph tracer has hit a filtered
>>> function, then we can't unwind it as ftrace_push_return_trace() has
>>> biased the index(frame->graph) with a 'huge negative'
>>> offset(-FTRACE_NOTRACE_DEPTH).
>>>
>>> Moreover, arm64 stack walker defines index(frame->graph) as unsigned
>>> int, which can not compare a -ve number.
>>>
>>> Similar problem we can have with calling of walk_stackframe() from
>>> save_stack_trace_tsk() or dump_backtrace().
>>>
>>> This patch fixes unwind_frame() to test the index for -ve value and
>>> restore index accordingly before we can restore frame->pc.
>>
>> I've just spotted arm64's profile_pc, which does this:
>> From arch/arm64/kernel/time.c:profile_pc():
>>> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>> frame.graph = -1; /* no task info */
>>> #endif
>>
>> Is this another elaborate way of hitting this problem?
>>
>> I guess the options are skip any return-address restore in the unwinder if
>> frame.graph is -1. (and profile_pc may have a bug here). Or, put
>> current->curr_ret_stack in there.

I think we should go with latter, ie assign frame.graph =
current->curr_ret_stack in profile_pc().

>>
>> profile_pc() always passes tsk=NULL, so the unwinder assumes its current...
>> kernel/profile.c pulls the pt_regs from a per-cpu irq_regs variable, that is
>> updated by handle_IPI ... so it looks like this should always be current...
>
> Hmmm... is profile_pc the *only* case where frame->graph isn't equal to
> tsk->curr_ret_stack in unwind_frame? If so, maybe unwind_frame should just

Yes, it is the only place.

> use that, and we could kill the graph member of struct stackframe completely?
>

Humm, not sure, we initialize frame->graph out of the while loop in
unwind_frame()'s caller and then keep in decrementing it in looped function.

--
Regards
Pratyush