The livepatch consistency model requires reliable stack tracing support
from the architecture in order to work properly. To achieve this, two
main issues have to be solved. First, reliable and consistent call chain
backtracing has to be ensured. Second, the unwinder needs to be able to
detect stack corruption and return errors.
The former should be guaranteed (see 3/3 for details); the latter is
implemented by this patch set (mainly in 3/3).
It passes livepatch kselftests (with timeout disabled, see
https://lore.kernel.org/live-patching/[email protected]/)
and tests from https://github.com/lpechacek/qa_test_klp.
Changes since v1 and questions follow:
- rebased on top of v5.4-rc5. This also meant rebasing on top of
ARCH_STACKWALK, which made the outcome nicer and addressed some of
Joe's remarks.
- sp alignment is checked in both _start and _next_frame cases
- ftrace_graph_ret_addr() calling cleanup
- I tried to use the existing infrastructure as much as possible, with
one exception. I kept unwind_next_frame_reliable() next to the
ordinary unwind_next_frame(). I did not come up with a nice solution
for integrating the two. The reliable unwinding is executed on a task
stack only, which leads to a nice simplification. My integration
attempts only obfuscated the existing unwind_next_frame(), which is
already not easy to read. Ideas are definitely welcome.
- although it is not nice (Josh mentioned it in his review), I kept the
check for kretprobe_trampoline in the main loop. It could go into the
_start and _next_frame callbacks, but more changes would be required
(changing a function's return type, which ordinary unwinding does not
need), so I did not think it was worth it. I could change it in v3, of
course.
Miroslav Benes (3):
s390/unwind: drop unnecessary code around calling
ftrace_graph_ret_addr()
s390/unwind: prepare the unwinding interface for reliable stack traces
s390/livepatch: Implement reliable stack tracing for the consistency
model
arch/s390/Kconfig | 1 +
arch/s390/include/asm/stacktrace.h | 3 +-
arch/s390/include/asm/unwind.h | 15 +++---
arch/s390/kernel/dumpstack.c | 16 +++++-
arch/s390/kernel/perf_event.c | 2 +-
arch/s390/kernel/stacktrace.c | 48 +++++++++++++++++-
arch/s390/kernel/unwind_bc.c | 79 ++++++++++++++++++++++++------
arch/s390/oprofile/init.c | 2 +-
8 files changed, 139 insertions(+), 27 deletions(-)
--
2.23.0
Reliable stack trace support will require performing additional actions,
and performing some tasks differently. Add a new parameter to the
respective functions to distinguish the two cases.
Signed-off-by: Miroslav Benes <[email protected]>
---
arch/s390/include/asm/stacktrace.h | 3 ++-
arch/s390/include/asm/unwind.h | 14 ++++++++------
arch/s390/kernel/dumpstack.c | 5 +++--
arch/s390/kernel/perf_event.c | 2 +-
arch/s390/kernel/stacktrace.c | 2 +-
arch/s390/kernel/unwind_bc.c | 7 ++++---
arch/s390/oprofile/init.c | 2 +-
7 files changed, 20 insertions(+), 15 deletions(-)
diff --git a/arch/s390/include/asm/stacktrace.h b/arch/s390/include/asm/stacktrace.h
index 0ae4bbf7779c..f033bb70a8db 100644
--- a/arch/s390/include/asm/stacktrace.h
+++ b/arch/s390/include/asm/stacktrace.h
@@ -21,7 +21,8 @@ struct stack_info {
const char *stack_type_name(enum stack_type type);
int get_stack_info(unsigned long sp, struct task_struct *task,
- struct stack_info *info, unsigned long *visit_mask);
+ struct stack_info *info, unsigned long *visit_mask,
+ bool unwind_reliable);
static inline bool on_stack(struct stack_info *info,
unsigned long addr, size_t len)
diff --git a/arch/s390/include/asm/unwind.h b/arch/s390/include/asm/unwind.h
index d827b5b9a32c..1f4de78c3ef9 100644
--- a/arch/s390/include/asm/unwind.h
+++ b/arch/s390/include/asm/unwind.h
@@ -41,7 +41,8 @@ struct unwind_state {
};
void __unwind_start(struct unwind_state *state, struct task_struct *task,
- struct pt_regs *regs, unsigned long first_frame);
+ struct pt_regs *regs, unsigned long first_frame,
+ bool unwind_reliable);
bool unwind_next_frame(struct unwind_state *state);
unsigned long unwind_get_return_address(struct unwind_state *state);
@@ -58,10 +59,11 @@ static inline bool unwind_error(struct unwind_state *state)
static inline void unwind_start(struct unwind_state *state,
struct task_struct *task,
struct pt_regs *regs,
- unsigned long sp)
+ unsigned long sp,
+ bool unwind_reliable)
{
sp = sp ? : get_stack_pointer(task, regs);
- __unwind_start(state, task, regs, sp);
+ __unwind_start(state, task, regs, sp, unwind_reliable);
}
static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
@@ -69,9 +71,9 @@ static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
return unwind_done(state) ? NULL : state->regs;
}
-#define unwind_for_each_frame(state, task, regs, first_frame) \
- for (unwind_start(state, task, regs, first_frame); \
- !unwind_done(state); \
+#define unwind_for_each_frame(state, task, regs, first_frame, unwind_reliable) \
+ for (unwind_start(state, task, regs, first_frame, unwind_reliable); \
+ !unwind_done(state); \
unwind_next_frame(state))
static inline void unwind_init(void) {}
diff --git a/arch/s390/kernel/dumpstack.c b/arch/s390/kernel/dumpstack.c
index 34bdc60c0b11..1ee19e6336cd 100644
--- a/arch/s390/kernel/dumpstack.c
+++ b/arch/s390/kernel/dumpstack.c
@@ -88,7 +88,8 @@ static bool in_restart_stack(unsigned long sp, struct stack_info *info)
}
int get_stack_info(unsigned long sp, struct task_struct *task,
- struct stack_info *info, unsigned long *visit_mask)
+ struct stack_info *info, unsigned long *visit_mask,
+ bool unwind_reliable)
{
if (!sp)
goto unknown;
@@ -130,7 +131,7 @@ void show_stack(struct task_struct *task, unsigned long *stack)
printk("Call Trace:\n");
if (!task)
task = current;
- unwind_for_each_frame(&state, task, NULL, (unsigned long) stack)
+ unwind_for_each_frame(&state, task, NULL, (unsigned long) stack, false)
printk(state.reliable ? " [<%016lx>] %pSR \n" :
"([<%016lx>] %pSR)\n",
state.ip, (void *) state.ip);
diff --git a/arch/s390/kernel/perf_event.c b/arch/s390/kernel/perf_event.c
index fcb6c2e92b07..7e81db0883e1 100644
--- a/arch/s390/kernel/perf_event.c
+++ b/arch/s390/kernel/perf_event.c
@@ -225,7 +225,7 @@ void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
{
struct unwind_state state;
- unwind_for_each_frame(&state, current, regs, 0)
+ unwind_for_each_frame(&state, current, regs, 0, false)
perf_callchain_store(entry, state.ip);
}
diff --git a/arch/s390/kernel/stacktrace.c b/arch/s390/kernel/stacktrace.c
index f8fc4f8aef9b..751c136172f7 100644
--- a/arch/s390/kernel/stacktrace.c
+++ b/arch/s390/kernel/stacktrace.c
@@ -16,7 +16,7 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
struct unwind_state state;
unsigned long addr;
- unwind_for_each_frame(&state, task, regs, 0) {
+ unwind_for_each_frame(&state, task, regs, 0, false) {
addr = unwind_get_return_address(&state);
if (!addr || !consume_entry(cookie, addr, false))
break;
diff --git a/arch/s390/kernel/unwind_bc.c b/arch/s390/kernel/unwind_bc.c
index b407b5531f11..564eef63668b 100644
--- a/arch/s390/kernel/unwind_bc.c
+++ b/arch/s390/kernel/unwind_bc.c
@@ -29,7 +29,7 @@ static bool update_stack_info(struct unwind_state *state, unsigned long sp)
unsigned long *mask = &state->stack_mask;
/* New stack pointer leaves the current stack */
- if (get_stack_info(sp, state->task, info, mask) != 0 ||
+ if (get_stack_info(sp, state->task, info, mask, false) != 0 ||
!on_stack(info, sp, sizeof(struct stack_frame)))
/* 'sp' does not point to a valid stack */
return false;
@@ -99,7 +99,8 @@ bool unwind_next_frame(struct unwind_state *state)
EXPORT_SYMBOL_GPL(unwind_next_frame);
void __unwind_start(struct unwind_state *state, struct task_struct *task,
- struct pt_regs *regs, unsigned long sp)
+ struct pt_regs *regs, unsigned long sp,
+ bool unwind_reliable)
{
struct stack_info *info = &state->stack_info;
unsigned long *mask = &state->stack_mask;
@@ -118,7 +119,7 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
}
/* Get current stack pointer and initialize stack info */
- if (get_stack_info(sp, task, info, mask) != 0 ||
+ if (get_stack_info(sp, task, info, mask, unwind_reliable) != 0 ||
!on_stack(info, sp, sizeof(struct stack_frame))) {
/* Something is wrong with the stack pointer */
info->type = STACK_TYPE_UNKNOWN;
diff --git a/arch/s390/oprofile/init.c b/arch/s390/oprofile/init.c
index 7441857df51b..59d736f46cbc 100644
--- a/arch/s390/oprofile/init.c
+++ b/arch/s390/oprofile/init.c
@@ -19,7 +19,7 @@ static void s390_backtrace(struct pt_regs *regs, unsigned int depth)
{
struct unwind_state state;
- unwind_for_each_frame(&state, current, regs, 0) {
+ unwind_for_each_frame(&state, current, regs, 0, false) {
if (depth-- == 0)
break;
oprofile_add_trace(state.ip);
--
2.23.0
On Tue, Oct 29, 2019 at 03:39:01PM +0100, Miroslav Benes wrote:
> - I tried to use the existing infrastructure as much as possible with
> one exception. I kept unwind_next_frame_reliable() next to the
> ordinary unwind_next_frame(). I did not come up with a nice solution
> how to integrate it. The reliable unwinding is executed on a task
> stack only, which leads to a nice simplification. My integration
> attempts only obfuscated the existing unwind_next_frame() which is
> already not easy to read. Ideas are definitely welcome.
Ah, now I see. So patch 2 seems to be leftover(?). Could you just send
how the result would look like?
I'd really like to have only one function, since some of the sanity
checks you added also make sense for what we already have - so code
would diverge from the beginning.
The current code around calling ftrace_graph_ret_addr() is ifdeffed and
also tests whether ftrace redirection is present on the stack.
ftrace_graph_ret_addr(), however, performs this test internally, and a
stub version exists for !CONFIG_FUNCTION_GRAPH_TRACER as well. The
unnecessary code can thus be dropped.
Signed-off-by: Miroslav Benes <[email protected]>
---
arch/s390/kernel/unwind_bc.c | 16 ++++------------
1 file changed, 4 insertions(+), 12 deletions(-)
diff --git a/arch/s390/kernel/unwind_bc.c b/arch/s390/kernel/unwind_bc.c
index 8fc9daae47a2..b407b5531f11 100644
--- a/arch/s390/kernel/unwind_bc.c
+++ b/arch/s390/kernel/unwind_bc.c
@@ -80,12 +80,8 @@ bool unwind_next_frame(struct unwind_state *state)
}
}
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
- /* Decode any ftrace redirection */
- if (ip == (unsigned long) return_to_handler)
- ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
- ip, (void *) sp);
-#endif
+ ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
+ ip, (void *) sp);
/* Update unwind state */
state->sp = sp;
@@ -140,12 +136,8 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
reliable = false;
}
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
- /* Decode any ftrace redirection */
- if (ip == (unsigned long) return_to_handler)
- ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
- ip, NULL);
-#endif
+ ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
+ ip, NULL);
/* Update unwind state */
state->sp = sp;
--
2.23.0
The livepatch consistency model requires reliable stack tracing
architecture support in order to work properly. In order to achieve
this, two main issues have to be solved. First, reliable and consistent
call chain backtracing has to be ensured. Second, the unwinder needs to
be able to detect stack corruptions and return errors.
The "zSeries ELF Application Binary Interface Supplement" says:
"The stack pointer points to the first word of the lowest allocated
stack frame. If the "back chain" is implemented this word will point to
the previously allocated stack frame (towards higher addresses), except
for the first stack frame, which shall have a back chain of zero (NULL).
The stack shall grow downwards, in other words towards lower addresses."
The "back chain" is optional; the GCC option -mbackchain enables it. Quoting
Martin Schwidefsky [1]:
"The compiler is called with the -mbackchain option, all normal C
function will store the backchain in the function prologue. All
functions written in assembler code should do the same, if you find one
that does not we should fix that. The end result is that a task that
*voluntarily* called schedule() should have a proper backchain at all
times.
Dependent on the use case this may or may not be enough. Asynchronous
interrupts may stop the CPU at the beginning of a function, if kernel
preemption is enabled we can end up with a broken backchain. The
production kernels for IBM Z are all compiled *without* kernel
preemption. So yes, we might get away without the objtool support.
On a side-note, we do have a line item to implement the ORC unwinder for
the kernel, that includes the objtool support. Once we have that we can
drop the -mbackchain option for the kernel build. That gives us a nice
little performance benefit. I hope that the change from backchain to the
ORC unwinder will not be too hard to implement in the livepatch tools."
Since the kernel is currently compiled with -mbackchain enabled by
default, call chain backtracing should be ensured, and objtool should
not be necessary for livepatch purposes.
Regarding the second issue, the unwinder has to recognize stack
corruption and non-reliable states. Mainly this means detecting
preemption or page faults, verifying that the end of the task stack is
reached, that return addresses are valid text addresses, and that hacks
like function graph tracing and kretprobes are properly detected.
Unwinding a running task's stack is not a problem, because there is a
livepatch requirement that every checked task is blocked, except for the
current task. Due to that, the implementation can be much simpler
compared to the existing non-reliable infrastructure. We can consider a
task's kernel/thread stack only and skip the other stacks.
Idle tasks are a bit special. Their final back chains point to nodat
stacks. See CALL_ON_STACK() in the smp_start_secondary() callback used
by __cpu_up() for reference. Unwinding is stopped there, and it is not
considered a stack corruption.
[1] 20180912121106.31ffa97c@mschwideX1 [not archived on lore.kernel.org]
Signed-off-by: Miroslav Benes <[email protected]>
---
arch/s390/Kconfig | 1 +
arch/s390/include/asm/unwind.h | 1 +
arch/s390/kernel/dumpstack.c | 11 +++++++
arch/s390/kernel/stacktrace.c | 46 ++++++++++++++++++++++++++++
arch/s390/kernel/unwind_bc.c | 56 ++++++++++++++++++++++++++++++++++
5 files changed, 115 insertions(+)
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 43a81d0ad507..9cfec1000abd 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -170,6 +170,7 @@ config S390
select HAVE_PERF_EVENTS
select HAVE_RCU_TABLE_FREE
select HAVE_REGS_AND_STACK_ACCESS_API
+ select HAVE_RELIABLE_STACKTRACE
select HAVE_RSEQ
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_VIRT_CPU_ACCOUNTING
diff --git a/arch/s390/include/asm/unwind.h b/arch/s390/include/asm/unwind.h
index 1f4de78c3ef9..87d1850d195a 100644
--- a/arch/s390/include/asm/unwind.h
+++ b/arch/s390/include/asm/unwind.h
@@ -44,6 +44,7 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
struct pt_regs *regs, unsigned long first_frame,
bool unwind_reliable);
bool unwind_next_frame(struct unwind_state *state);
+bool unwind_next_frame_reliable(struct unwind_state *state);
unsigned long unwind_get_return_address(struct unwind_state *state);
static inline bool unwind_done(struct unwind_state *state)
diff --git a/arch/s390/kernel/dumpstack.c b/arch/s390/kernel/dumpstack.c
index 1ee19e6336cd..0081e75b957c 100644
--- a/arch/s390/kernel/dumpstack.c
+++ b/arch/s390/kernel/dumpstack.c
@@ -94,12 +94,23 @@ int get_stack_info(unsigned long sp, struct task_struct *task,
if (!sp)
goto unknown;
+ /* Sanity check: ABI requires SP to be aligned 8 bytes. */
+ if (sp & 0x7)
+ goto unknown;
+
task = task ? : current;
/* Check per-task stack */
if (in_task_stack(sp, task, info))
goto recursion_check;
+ /*
+ * The reliable unwinding should not start on nodat_stack, async_stack
+ * or restart_stack. The task is either current or must be inactive.
+ */
+ if (unwind_reliable)
+ goto unknown;
+
if (task != current)
goto unknown;
diff --git a/arch/s390/kernel/stacktrace.c b/arch/s390/kernel/stacktrace.c
index 751c136172f7..cff9ba0715e6 100644
--- a/arch/s390/kernel/stacktrace.c
+++ b/arch/s390/kernel/stacktrace.c
@@ -9,6 +9,7 @@
#include <linux/stacktrace.h>
#include <asm/stacktrace.h>
#include <asm/unwind.h>
+#include <asm/kprobes.h>
void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
struct task_struct *task, struct pt_regs *regs)
@@ -22,3 +23,48 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
break;
}
}
+
+/*
+ * This function returns an error if it detects any unreliable features of the
+ * stack. Otherwise it guarantees that the stack trace is reliable.
+ *
+ * If the task is not 'current', the caller *must* ensure the task is inactive.
+ */
+int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
+ void *cookie, struct task_struct *task)
+{
+ struct unwind_state state;
+ unsigned long addr;
+
+ for (unwind_start(&state, task, NULL, 0, true);
+ !unwind_done(&state) && !unwind_error(&state);
+ unwind_next_frame_reliable(&state)) {
+
+ addr = unwind_get_return_address(&state);
+ if (!addr)
+ return -EINVAL;
+
+#ifdef CONFIG_KPROBES
+ /*
+ * Mark stacktraces with kretprobed functions on them
+ * as unreliable.
+ */
+ if (state.ip == (unsigned long)kretprobe_trampoline)
+ return -EINVAL;
+#endif
+
+ if (!consume_entry(cookie, addr, false))
+ return -EINVAL;
+ }
+
+ /* Check for stack corruption */
+ if (unwind_error(&state))
+ return -EINVAL;
+
+ /* Store kernel_thread_starter, null for swapper/0 */
+ if ((task->flags & (PF_KTHREAD | PF_IDLE)) &&
+ !consume_entry(cookie, state.regs->psw.addr, false))
+ return -EINVAL;
+
+ return 0;
+}
diff --git a/arch/s390/kernel/unwind_bc.c b/arch/s390/kernel/unwind_bc.c
index 564eef63668b..8d3a1d137ad0 100644
--- a/arch/s390/kernel/unwind_bc.c
+++ b/arch/s390/kernel/unwind_bc.c
@@ -98,6 +98,62 @@ bool unwind_next_frame(struct unwind_state *state)
}
EXPORT_SYMBOL_GPL(unwind_next_frame);
+bool unwind_next_frame_reliable(struct unwind_state *state)
+{
+ struct stack_info *info = &state->stack_info;
+ struct stack_frame *sf;
+ struct pt_regs *regs;
+ unsigned long sp, ip;
+
+ sf = (struct stack_frame *) state->sp;
+ sp = READ_ONCE_NOCHECK(sf->back_chain);
+ /*
+ * Idle tasks are special. The final back-chain points to nodat_stack.
+ * See CALL_ON_STACK() in smp_start_secondary() callback used in
+ * __cpu_up(). We just accept it, go to else branch and look for
+ * pt_regs.
+ */
+ if (likely(sp && !(is_idle_task(state->task) &&
+ outside_of_stack(state, sp)))) {
+ /* Non-zero back-chain points to the previous frame */
+ if (unlikely(outside_of_stack(state, sp)))
+ goto out_err;
+
+ sf = (struct stack_frame *) sp;
+ ip = READ_ONCE_NOCHECK(sf->gprs[8]);
+ } else {
+ /* No back-chain, look for a pt_regs structure */
+ sp = state->sp + STACK_FRAME_OVERHEAD;
+ regs = (struct pt_regs *) sp;
+ if ((unsigned long)regs != info->end - sizeof(struct pt_regs))
+ goto out_err;
+ if (!(state->task->flags & (PF_KTHREAD | PF_IDLE)) &&
+ !user_mode(regs))
+ goto out_err;
+
+ state->regs = regs;
+ goto out_stop;
+ }
+
+ /* Sanity check: ABI requires SP to be aligned 8 bytes. */
+ if (sp & 0x7)
+ goto out_err;
+
+ ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
+ ip, (void *) sp);
+
+ /* Update unwind state */
+ state->sp = sp;
+ state->ip = ip;
+ return true;
+
+out_err:
+ state->error = true;
+out_stop:
+ state->stack_info.type = STACK_TYPE_UNKNOWN;
+ return false;
+}
+
void __unwind_start(struct unwind_state *state, struct task_struct *task,
struct pt_regs *regs, unsigned long sp,
bool unwind_reliable)
--
2.23.0
On Tue, 29 Oct 2019, Heiko Carstens wrote:
> On Tue, Oct 29, 2019 at 03:39:01PM +0100, Miroslav Benes wrote:
> > - I tried to use the existing infrastructure as much as possible with
> > one exception. I kept unwind_next_frame_reliable() next to the
> > ordinary unwind_next_frame(). I did not come up with a nice solution
> > how to integrate it. The reliable unwinding is executed on a task
> > stack only, which leads to a nice simplification. My integration
> > attempts only obfuscated the existing unwind_next_frame() which is
> > already not easy to read. Ideas are definitely welcome.
>
> Ah, now I see. So patch 2 seems to be leftover(?). Could you just send
> how the result would look like?
>
> I'd really like to have only one function, since some of the sanity
> checks you added also make sense for what we already have - so code
> would diverge from the beginning.
Ok, that is understandable. I tried a bit harder and the outcome does not
look as bad as my previous attempts (read: I gave up too early).
I deliberately split the unwind_reliable/!unwind_reliable cases in the "No
back-chain, look for a pt_regs structure" branch, because the purpose is
different there. In the !unwind_reliable case we can continue on a
different stack (if I understood the code correctly when I analyzed it in
the past; I haven't found good documentation, unfortunately :(). While in
the unwind_reliable case we just check whether there are pt_regs in the
right place on a task stack and stop. If there are not, we error out.
It applies on top of the patch set. Only compile tested, though. If it
looks ok-ish to you, I'll work on it.
Thanks
Miroslav
---
diff --git a/arch/s390/include/asm/unwind.h b/arch/s390/include/asm/unwind.h
index 87d1850d195a..282c158a3c2a 100644
--- a/arch/s390/include/asm/unwind.h
+++ b/arch/s390/include/asm/unwind.h
@@ -43,7 +43,7 @@ struct unwind_state {
void __unwind_start(struct unwind_state *state, struct task_struct *task,
struct pt_regs *regs, unsigned long first_frame,
bool unwind_reliable);
-bool unwind_next_frame(struct unwind_state *state);
+bool unwind_next_frame(struct unwind_state *state, bool unwind_reliable);
bool unwind_next_frame_reliable(struct unwind_state *state);
unsigned long unwind_get_return_address(struct unwind_state *state);
@@ -75,7 +75,7 @@ static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
#define unwind_for_each_frame(state, task, regs, first_frame, unwind_reliable) \
for (unwind_start(state, task, regs, first_frame, unwind_reliable); \
!unwind_done(state); \
- unwind_next_frame(state))
+ unwind_next_frame(state, unwind_reliable))
static inline void unwind_init(void) {}
static inline void unwind_module_init(struct module *mod, void *orc_ip,
diff --git a/arch/s390/kernel/stacktrace.c b/arch/s390/kernel/stacktrace.c
index cff9ba0715e6..c5e3a37763f7 100644
--- a/arch/s390/kernel/stacktrace.c
+++ b/arch/s390/kernel/stacktrace.c
@@ -38,7 +38,7 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
for (unwind_start(&state, task, NULL, 0, true);
!unwind_done(&state) && !unwind_error(&state);
- unwind_next_frame_reliable(&state)) {
+ unwind_next_frame(&state, true)) {
addr = unwind_get_return_address(&state);
if (!addr)
diff --git a/arch/s390/kernel/unwind_bc.c b/arch/s390/kernel/unwind_bc.c
index 8d3a1d137ad0..2a7c88b58089 100644
--- a/arch/s390/kernel/unwind_bc.c
+++ b/arch/s390/kernel/unwind_bc.c
@@ -36,7 +36,7 @@ static bool update_stack_info(struct unwind_state *state, unsigned long sp)
return true;
}
-bool unwind_next_frame(struct unwind_state *state)
+bool unwind_next_frame(struct unwind_state *state, bool unwind_reliable)
{
struct stack_info *info = &state->stack_info;
struct stack_frame *sf;
@@ -58,28 +58,59 @@ bool unwind_next_frame(struct unwind_state *state)
} else {
sf = (struct stack_frame *) state->sp;
sp = READ_ONCE_NOCHECK(sf->back_chain);
- if (likely(sp)) {
- /* Non-zero back-chain points to the previous frame */
- if (unlikely(outside_of_stack(state, sp))) {
- if (!update_stack_info(state, sp))
- goto out_err;
- }
+ /*
+ * unwind_reliable case: Idle tasks are special. The final
+ * back-chain points to nodat_stack. See CALL_ON_STACK() in
+ * smp_start_secondary() callback used in __cpu_up(). We just
+ * accept it, go to else branch and look for pt_regs.
+ */
+ if (likely(sp) &&
+ (!unwind_reliable || !(is_idle_task(state->task) &&
+ outside_of_stack(state, sp)))) {
+
+ /*
+ * Non-zero back-chain points to the previous frame. No
+ * need to update stack info when unwind_reliable is
+ * true. We should be on a task stack and everything
+ * else is an error.
+ */
+ if (unlikely(outside_of_stack(state, sp)) &&
+ ((!unwind_reliable && !update_stack_info(state, sp)) ||
+ unwind_reliable))
+ goto out_err;
+
sf = (struct stack_frame *) sp;
ip = READ_ONCE_NOCHECK(sf->gprs[8]);
reliable = true;
} else {
/* No back-chain, look for a pt_regs structure */
sp = state->sp + STACK_FRAME_OVERHEAD;
- if (!on_stack(info, sp, sizeof(struct pt_regs)))
- goto out_stop;
regs = (struct pt_regs *) sp;
- if (READ_ONCE_NOCHECK(regs->psw.mask) & PSW_MASK_PSTATE)
+
+ if (!unwind_reliable) {
+ if (!on_stack(info, sp, sizeof(struct pt_regs)))
+ goto out_stop;
+ if (READ_ONCE_NOCHECK(regs->psw.mask) & PSW_MASK_PSTATE)
+ goto out_stop;
+ ip = READ_ONCE_NOCHECK(regs->psw.addr);
+ reliable = true;
+ } else {
+ if ((unsigned long)regs != info->end - sizeof(struct pt_regs))
+ goto out_err;
+ if (!(state->task->flags & (PF_KTHREAD | PF_IDLE)) &&
+ !user_mode(regs))
+ goto out_err;
+
+ state->regs = regs;
goto out_stop;
- ip = READ_ONCE_NOCHECK(regs->psw.addr);
- reliable = true;
+ }
}
}
+ /* Sanity check: ABI requires SP to be aligned 8 bytes. */
+ if (sp & 0x7)
+ goto out_err;
+
ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
ip, (void *) sp);
@@ -98,62 +129,6 @@ bool unwind_next_frame(struct unwind_state *state)
}
EXPORT_SYMBOL_GPL(unwind_next_frame);
-bool unwind_next_frame_reliable(struct unwind_state *state)
-{
- struct stack_info *info = &state->stack_info;
- struct stack_frame *sf;
- struct pt_regs *regs;
- unsigned long sp, ip;
-
- sf = (struct stack_frame *) state->sp;
- sp = READ_ONCE_NOCHECK(sf->back_chain);
- /*
- * Idle tasks are special. The final back-chain points to nodat_stack.
- * See CALL_ON_STACK() in smp_start_secondary() callback used in
- * __cpu_up(). We just accept it, go to else branch and look for
- * pt_regs.
- */
- if (likely(sp && !(is_idle_task(state->task) &&
- outside_of_stack(state, sp)))) {
- /* Non-zero back-chain points to the previous frame */
- if (unlikely(outside_of_stack(state, sp)))
- goto out_err;
-
- sf = (struct stack_frame *) sp;
- ip = READ_ONCE_NOCHECK(sf->gprs[8]);
- } else {
- /* No back-chain, look for a pt_regs structure */
- sp = state->sp + STACK_FRAME_OVERHEAD;
- regs = (struct pt_regs *) sp;
- if ((unsigned long)regs != info->end - sizeof(struct pt_regs))
- goto out_err;
- if (!(state->task->flags & (PF_KTHREAD | PF_IDLE)) &&
- !user_mode(regs))
- goto out_err;
-
- state->regs = regs;
- goto out_stop;
- }
-
- /* Sanity check: ABI requires SP to be aligned 8 bytes. */
- if (sp & 0x7)
- goto out_err;
-
- ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
- ip, (void *) sp);
-
- /* Update unwind state */
- state->sp = sp;
- state->ip = ip;
- return true;
-
-out_err:
- state->error = true;
-out_stop:
- state->stack_info.type = STACK_TYPE_UNKNOWN;
- return false;
-}
-
void __unwind_start(struct unwind_state *state, struct task_struct *task,
struct pt_regs *regs, unsigned long sp,
bool unwind_reliable)
On Wed, Oct 30, 2019 at 11:12:00AM +0100, Miroslav Benes wrote:
> On Tue, 29 Oct 2019, Heiko Carstens wrote:
>
> > On Tue, Oct 29, 2019 at 03:39:01PM +0100, Miroslav Benes wrote:
> > > - I tried to use the existing infrastructure as much as possible with
> > > one exception. I kept unwind_next_frame_reliable() next to the
> > > ordinary unwind_next_frame(). I did not come up with a nice solution
> > > how to integrate it. The reliable unwinding is executed on a task
> > > stack only, which leads to a nice simplification. My integration
> > > attempts only obfuscated the existing unwind_next_frame() which is
> > > already not easy to read. Ideas are definitely welcome.
> >
> > Ah, now I see. So patch 2 seems to be leftover(?). Could you just send
> > how the result would look like?
> >
> > I'd really like to have only one function, since some of the sanity
> > checks you added also make sense for what we already have - so code
> > would diverge from the beginning.
>
> Ok, that is understandable. I tried a bit harder and the outcome does not
> look as bad as my previous attempts (read, I gave up too early).
>
> I deliberately split unwind_reliable/!unwind_reliable case in "No
> back-chain, look for a pt_regs structure" branch, because the purpose is
> different there. In !unwind_reliable case we can continue on a different
> stack (if I understood the code correctly when I analyzed it in the past.
> I haven't found a good documentation unfortunately :(). While in
> unwind_realiable case we just check if there are pt_regs in the right
> place on a task stack and stop. If there are not, error out.
>
> It applies on top of the patch set. Only compile tested though. If it
> looks ok-ish to you, I'll work on it.
Yes, that looks much better. Note that, from a coding style perspective,
the 80-characters-per-line limit is _not_ enforced on s390 kernel code,
so exceeding it might be a possibility to make the code a bit more
readable.
Also it _might_ make sense to split the function into two or more
functions (without duplicating code). Not sure if that would really
increase readability though.
FWIW, I just applied your first patch, since it makes sense anyway.