A couple of reliable unwinder fixes for newly forked tasks, which were
reported by Wang ShaoBo.
Josh Poimboeuf (2):
x86/unwind/orc: Fix ORC for newly forked tasks
x86/stacktrace: Fix reliable check for empty user task stacks
arch/x86/kernel/stacktrace.c | 5 -----
arch/x86/kernel/unwind_orc.c | 8 ++++++--
2 files changed, 6 insertions(+), 7 deletions(-)
--
2.25.4
The ORC unwinder fails to unwind newly forked tasks which haven't yet
run on the CPU. It correctly reads the 'ret_from_fork' instruction
pointer from the stack, but it incorrectly interprets that value as a
call stack address rather than a "signal" one, so the address gets
incorrectly decremented in the call to orc_find(), resulting in bad ORC
data.
Fix it by forcing 'ret_from_fork' frames to be signal frames.
Reported-by: Wang ShaoBo <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
---
arch/x86/kernel/unwind_orc.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 7f969b2d240f..ec88bbe08a32 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -440,8 +440,11 @@ bool unwind_next_frame(struct unwind_state *state)
/*
* Find the orc_entry associated with the text address.
*
- * Decrement call return addresses by one so they work for sibling
- * calls and calls to noreturn functions.
+ * For a call frame (as opposed to a signal frame), state->ip points to
+ * the instruction after the call. That instruction's stack layout
+ * could be different from the call instruction's layout, for example
+ * if the call was to a noreturn function. So get the ORC data for the
+ * call instruction itself.
*/
orc = orc_find(state->signal ? state->ip : state->ip - 1);
if (!orc) {
@@ -662,6 +665,7 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
state->sp = task->thread.sp;
state->bp = READ_ONCE_NOCHECK(frame->bp);
state->ip = READ_ONCE_NOCHECK(frame->ret_addr);
+ state->signal = (void *)state->ip == ret_from_fork;
}
if (get_stack_info((unsigned long *)state->sp, state->task,
--
2.25.4
If a user task's stack is empty, or if it only has user regs, ORC
reports it as a reliable empty stack. But arch_stack_walk_reliable()
incorrectly treats it as unreliable.
That happens because the only success path for user tasks is inside the
loop, which only iterates on non-empty stacks. Generally, a user task
must end in a user regs frame, but an empty stack is an exception to
that rule.
Thanks to commit 71c95825289f ("x86/unwind/orc: Fix error handling in
__unwind_start()"), unwind_start() now sets state->error appropriately.
So now for both ORC and FP unwinders, unwind_done() and !unwind_error()
always means the end of the stack was successfully reached. So the
success path for kthreads is no longer needed -- it can also be used for
empty user tasks.
Reported-by: Wang ShaoBo <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
---
arch/x86/kernel/stacktrace.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 6ad43fc44556..2fd698e28e4d 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -58,7 +58,6 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
* or a page fault), which can make frame pointers
* unreliable.
*/
-
if (IS_ENABLED(CONFIG_FRAME_POINTER))
return -EINVAL;
}
@@ -81,10 +80,6 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
if (unwind_error(&state))
return -EINVAL;
- /* Success path for non-user tasks, i.e. kthreads and idle tasks */
- if (!(task->flags & (PF_KTHREAD | PF_IDLE)))
- return -EINVAL;
-
return 0;
}
--
2.25.4
Tested-by: Wang ShaoBo <[email protected]>
?? 2020/7/17 22:04, Josh Poimboeuf ะด??:
> A couple of reliable unwinder fixes for newly forked tasks, which were
> reported by Wang ShaoBo.
>
> Josh Poimboeuf (2):
> x86/unwind/orc: Fix ORC for newly forked tasks
> x86/stacktrace: Fix reliable check for empty user task stacks
>
> arch/x86/kernel/stacktrace.c | 5 -----
> arch/x86/kernel/unwind_orc.c | 8 ++++++--
> 2 files changed, 6 insertions(+), 7 deletions(-)
>
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 039a7a30ec102ec866d382a66f87f6f7654f8140
Gitweb: https://git.kernel.org/tip/039a7a30ec102ec866d382a66f87f6f7654f8140
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Fri, 17 Jul 2020 09:04:26 -05:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Wed, 22 Jul 2020 23:47:47 +02:00
x86/stacktrace: Fix reliable check for empty user task stacks
If a user task's stack is empty, or if it only has user regs, ORC
reports it as a reliable empty stack. But arch_stack_walk_reliable()
incorrectly treats it as unreliable.
That happens because the only success path for user tasks is inside the
loop, which only iterates on non-empty stacks. Generally, a user task
must end in a user regs frame, but an empty stack is an exception to
that rule.
Thanks to commit 71c95825289f ("x86/unwind/orc: Fix error handling in
__unwind_start()"), unwind_start() now sets state->error appropriately.
So now for both ORC and FP unwinders, unwind_done() and !unwind_error()
always means the end of the stack was successfully reached. So the
success path for kthreads is no longer needed -- it can also be used for
empty user tasks.
Reported-by: Wang ShaoBo <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Wang ShaoBo <[email protected]>
Link: https://lkml.kernel.org/r/f136a4e5f019219cbc4f4da33b30c2f44fa65b84.1594994374.git.jpoimboe@redhat.com
---
arch/x86/kernel/stacktrace.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 6ad43fc..2fd698e 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -58,7 +58,6 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
* or a page fault), which can make frame pointers
* unreliable.
*/
-
if (IS_ENABLED(CONFIG_FRAME_POINTER))
return -EINVAL;
}
@@ -81,10 +80,6 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
if (unwind_error(&state))
return -EINVAL;
- /* Success path for non-user tasks, i.e. kthreads and idle tasks */
- if (!(task->flags & (PF_KTHREAD | PF_IDLE)))
- return -EINVAL;
-
return 0;
}
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 372a8eaa05998cd45b3417d0e0ffd3a70978211a
Gitweb: https://git.kernel.org/tip/372a8eaa05998cd45b3417d0e0ffd3a70978211a
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Fri, 17 Jul 2020 09:04:25 -05:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Wed, 22 Jul 2020 23:47:47 +02:00
x86/unwind/orc: Fix ORC for newly forked tasks
The ORC unwinder fails to unwind newly forked tasks which haven't yet
run on the CPU. It correctly reads the 'ret_from_fork' instruction
pointer from the stack, but it incorrectly interprets that value as a
call stack address rather than a "signal" one, so the address gets
incorrectly decremented in the call to orc_find(), resulting in bad ORC
data.
Fix it by forcing 'ret_from_fork' frames to be signal frames.
Reported-by: Wang ShaoBo <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Wang ShaoBo <[email protected]>
Link: https://lkml.kernel.org/r/f91a8778dde8aae7f71884b5df2b16d552040441.1594994374.git.jpoimboe@redhat.com
---
arch/x86/kernel/unwind_orc.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 7f969b2..ec88bbe 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -440,8 +440,11 @@ bool unwind_next_frame(struct unwind_state *state)
/*
* Find the orc_entry associated with the text address.
*
- * Decrement call return addresses by one so they work for sibling
- * calls and calls to noreturn functions.
+ * For a call frame (as opposed to a signal frame), state->ip points to
+ * the instruction after the call. That instruction's stack layout
+ * could be different from the call instruction's layout, for example
+ * if the call was to a noreturn function. So get the ORC data for the
+ * call instruction itself.
*/
orc = orc_find(state->signal ? state->ip : state->ip - 1);
if (!orc) {
@@ -662,6 +665,7 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
state->sp = task->thread.sp;
state->bp = READ_ONCE_NOCHECK(frame->bp);
state->ip = READ_ONCE_NOCHECK(frame->ret_addr);
+ state->signal = (void *)state->ip == ret_from_fork;
}
if (get_stack_info((unsigned long *)state->sp, state->task,