The livepatch consistency model requires reliable stack tracing
architecture support in order to work properly. In order to achieve
this, two main issues have to be solved. First, reliable and consistent
call chain backtracing has to be ensured. Second, the unwinder needs to
be able to detect stack corruptions and return errors.
The "zSeries ELF Application Binary Interface Supplement" says:
"The stack pointer points to the first word of the lowest allocated
stack frame. If the "back chain" is implemented this word will point to
the previously allocated stack frame (towards higher addresses), except
for the first stack frame, which shall have a back chain of zero (NULL).
The stack shall grow downwards, in other words towards lower addresses."
"back chain" is optional. GCC option -mbackchain enables it. Quoting
Martin Schwidefsky [1]:
"The compiler is called with the -mbackchain option, all normal C
function will store the backchain in the function prologue. All
functions written in assembler code should do the same, if you find one
that does not we should fix that. The end result is that a task that
*voluntarily* called schedule() should have a proper backchain at all
times.
Dependent on the use case this may or may not be enough. Asynchronous
interrupts may stop the CPU at the beginning of a function, if kernel
preemption is enabled we can end up with a broken backchain. The
production kernels for IBM Z are all compiled *without* kernel
preemption. So yes, we might get away without the objtool support.
On a side-note, we do have a line item to implement the ORC unwinder for
the kernel, that includes the objtool support. Once we have that we can
drop the -mbackchain option for the kernel build. That gives us a nice
little performance benefit. I hope that the change from backchain to the
ORC unwinder will not be too hard to implement in the livepatch tools."
Thus, the call chain backtracing should be currently ensured and objtool
should not be necessary for livepatch purposes.
Regarding the second issue, stack corruptions and non-reliable states
have to be recognized by the unwinder. Mainly it means to detect
preemption or page faults, the end of the task stack must be reached,
return addresses must be valid text addresses and hacks like function
graph tracing and kretprobes must be properly detected.
Unwinding a running task's stack is not a problem, because there is a
livepatch requirement that every checked task is blocked, except for the
current task. Due to that, the implementation can be much simpler
compared to the existing non-reliable infrastructure. We can consider a
task's kernel/thread stack only and skip the other stacks.
Idle tasks are a bit special. Their final back chains point to no_dat
stacks. See for reference CALL_ON_STACK() in smp_start_secondary()
callback used in __cpu_up(). The unwinding is stopped there and it is
not considered to be a stack corruption.
Signed-off-by: Miroslav Benes <[email protected]>
---
- based on Linus' master
- passes livepatch kselftests
- passes tests from https://github.com/lpechacek/qa_test_klp, which
stress the consistency model and the unwinder a bit more
arch/s390/Kconfig | 1 +
arch/s390/include/asm/stacktrace.h | 5 ++
arch/s390/include/asm/unwind.h | 19 ++++++
arch/s390/kernel/dumpstack.c | 28 +++++++++
arch/s390/kernel/stacktrace.c | 78 +++++++++++++++++++++++++
arch/s390/kernel/unwind_bc.c | 93 ++++++++++++++++++++++++++++++
6 files changed, 224 insertions(+)
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index fdb4246265a5..ea73e555063d 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -170,6 +170,7 @@ config S390
select HAVE_PERF_EVENTS
select HAVE_RCU_TABLE_FREE
select HAVE_REGS_AND_STACK_ACCESS_API
+ select HAVE_RELIABLE_STACKTRACE
select HAVE_RSEQ
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_VIRT_CPU_ACCOUNTING
diff --git a/arch/s390/include/asm/stacktrace.h b/arch/s390/include/asm/stacktrace.h
index 0ae4bbf7779c..2b5c913c408f 100644
--- a/arch/s390/include/asm/stacktrace.h
+++ b/arch/s390/include/asm/stacktrace.h
@@ -23,6 +23,11 @@ const char *stack_type_name(enum stack_type type);
int get_stack_info(unsigned long sp, struct task_struct *task,
struct stack_info *info, unsigned long *visit_mask);
+#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
+int get_stack_info_reliable(unsigned long sp, struct task_struct *task,
+ struct stack_info *info);
+#endif
+
static inline bool on_stack(struct stack_info *info,
unsigned long addr, size_t len)
{
diff --git a/arch/s390/include/asm/unwind.h b/arch/s390/include/asm/unwind.h
index d827b5b9a32c..1cc96c54169c 100644
--- a/arch/s390/include/asm/unwind.h
+++ b/arch/s390/include/asm/unwind.h
@@ -45,6 +45,25 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
bool unwind_next_frame(struct unwind_state *state);
unsigned long unwind_get_return_address(struct unwind_state *state);
+#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
+void __unwind_start_reliable(struct unwind_state *state,
+ struct task_struct *task, unsigned long sp);
+bool unwind_next_frame_reliable(struct unwind_state *state);
+
+static inline void unwind_start_reliable(struct unwind_state *state,
+ struct task_struct *task)
+{
+ unsigned long sp;
+
+ if (task == current)
+ sp = current_stack_pointer();
+ else
+ sp = task->thread.ksp;
+
+ __unwind_start_reliable(state, task, sp);
+}
+#endif
+
static inline bool unwind_done(struct unwind_state *state)
{
return state->stack_info.type == STACK_TYPE_UNKNOWN;
diff --git a/arch/s390/kernel/dumpstack.c b/arch/s390/kernel/dumpstack.c
index ac06c3949ab3..b21ef2a766ff 100644
--- a/arch/s390/kernel/dumpstack.c
+++ b/arch/s390/kernel/dumpstack.c
@@ -127,6 +127,34 @@ int get_stack_info(unsigned long sp, struct task_struct *task,
return -EINVAL;
}
+#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
+int get_stack_info_reliable(unsigned long sp, struct task_struct *task,
+ struct stack_info *info)
+{
+ if (!sp)
+ goto error;
+
+ /* Sanity check: ABI requires SP to be aligned 8 bytes. */
+ if (sp & 0x7)
+ goto error;
+
+ if (!task)
+ goto error;
+
+ /*
+ * The unwinding should not start on nodat_stack, async_stack or
+ * restart_stack. The task is either current or must be inactive.
+ */
+ if (!in_task_stack(sp, task, info))
+ goto error;
+
+ return 0;
+error:
+ info->type = STACK_TYPE_UNKNOWN;
+ return -EINVAL;
+}
+#endif
+
void show_stack(struct task_struct *task, unsigned long *stack)
{
struct unwind_state state;
diff --git a/arch/s390/kernel/stacktrace.c b/arch/s390/kernel/stacktrace.c
index f6a620f854e1..7d774a325163 100644
--- a/arch/s390/kernel/stacktrace.c
+++ b/arch/s390/kernel/stacktrace.c
@@ -13,6 +13,7 @@
#include <linux/export.h>
#include <asm/stacktrace.h>
#include <asm/unwind.h>
+#include <asm/kprobes.h>
void save_stack_trace(struct stack_trace *trace)
{
@@ -60,3 +61,80 @@ void save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace)
}
}
EXPORT_SYMBOL_GPL(save_stack_trace_regs);
+
+#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
+/*
+ * This function returns an error if it detects any unreliable features of the
+ * stack. Otherwise it guarantees that the stack trace is reliable.
+ *
+ * If the task is not 'current', the caller *must* ensure the task is inactive.
+ */
+static __always_inline int
+__save_stack_trace_tsk_reliable(struct task_struct *tsk,
+ struct stack_trace *trace)
+{
+ struct unwind_state state;
+
+ for (unwind_start_reliable(&state, tsk);
+ !unwind_done(&state) && !unwind_error(&state);
+ unwind_next_frame_reliable(&state)) {
+
+ if (!__kernel_text_address(state.ip))
+ return -EINVAL;
+
+#ifdef CONFIG_KPROBES
+ /*
+ * Mark stacktraces with kretprobed functions on them
+ * as unreliable.
+ */
+ if (state.ip == (unsigned long)kretprobe_trampoline)
+ return -EINVAL;
+#endif
+
+ if (trace->nr_entries >= trace->max_entries)
+ return -E2BIG;
+
+ if (!trace->skip)
+ trace->entries[trace->nr_entries++] = state.ip;
+ else
+ trace->skip--;
+ }
+
+ /* Check for stack corruption */
+ if (unwind_error(&state))
+ return -EINVAL;
+
+ /* Store kernel_thread_starter, null for swapper/0 */
+ if (tsk->flags & (PF_KTHREAD | PF_IDLE)) {
+ if (trace->nr_entries >= trace->max_entries)
+ return -E2BIG;
+
+ if (!trace->skip)
+ trace->entries[trace->nr_entries++] =
+ state.regs->psw.addr;
+ else
+ trace->skip--;
+ }
+
+ return 0;
+}
+
+int save_stack_trace_tsk_reliable(struct task_struct *tsk,
+ struct stack_trace *trace)
+{
+ int ret;
+
+ /*
+ * If the task doesn't have a stack (e.g., a zombie), the stack is
+ * "reliably" empty.
+ */
+ if (!try_get_task_stack(tsk))
+ return 0;
+
+ ret = __save_stack_trace_tsk_reliable(tsk, trace);
+
+ put_task_stack(tsk);
+
+ return ret;
+}
+#endif
diff --git a/arch/s390/kernel/unwind_bc.c b/arch/s390/kernel/unwind_bc.c
index 3ce8a0808059..ada3a8538961 100644
--- a/arch/s390/kernel/unwind_bc.c
+++ b/arch/s390/kernel/unwind_bc.c
@@ -153,3 +153,96 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
state->reliable = reliable;
}
EXPORT_SYMBOL_GPL(__unwind_start);
+
+#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
+void __unwind_start_reliable(struct unwind_state *state,
+ struct task_struct *task, unsigned long sp)
+{
+ struct stack_info *info = &state->stack_info;
+ struct stack_frame *sf;
+ unsigned long ip;
+
+ memset(state, 0, sizeof(*state));
+ state->task = task;
+
+ /* Get current stack pointer and initialize stack info */
+ if (get_stack_info_reliable(sp, task, info) ||
+ !on_stack(info, sp, sizeof(struct stack_frame))) {
+ /* Something is wrong with the stack pointer */
+ info->type = STACK_TYPE_UNKNOWN;
+ state->error = true;
+ return;
+ }
+
+ /* Get the instruction pointer from the stack frame */
+ sf = (struct stack_frame *) sp;
+ ip = READ_ONCE_NOCHECK(sf->gprs[8]);
+
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+ /* Decode any ftrace redirection */
+ if (ip == (unsigned long) return_to_handler)
+ ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
+ ip, NULL);
+#endif
+
+ /* Update unwind state */
+ state->sp = sp;
+ state->ip = ip;
+}
+
+bool unwind_next_frame_reliable(struct unwind_state *state)
+{
+ struct stack_info *info = &state->stack_info;
+ struct stack_frame *sf;
+ struct pt_regs *regs;
+ unsigned long sp, ip;
+
+ sf = (struct stack_frame *) state->sp;
+ sp = READ_ONCE_NOCHECK(sf->back_chain);
+ /*
+ * Idle tasks are special. The final back-chain points to nodat_stack.
+ * See CALL_ON_STACK() in smp_start_secondary() callback used in
+ * __cpu_up(). We just accept it, go to else branch and look for
+ * pt_regs.
+ */
+ if (likely(sp && !(is_idle_task(state->task) &&
+ outside_of_stack(state, sp)))) {
+ /* Non-zero back-chain points to the previous frame */
+ if (unlikely(outside_of_stack(state, sp)))
+ goto out_err;
+
+ sf = (struct stack_frame *) sp;
+ ip = READ_ONCE_NOCHECK(sf->gprs[8]);
+ } else {
+ /* No back-chain, look for a pt_regs structure */
+ sp = state->sp + STACK_FRAME_OVERHEAD;
+ regs = (struct pt_regs *) sp;
+ if ((unsigned long)regs != info->end - sizeof(struct pt_regs))
+ goto out_err;
+ if (!(state->task->flags & (PF_KTHREAD | PF_IDLE)) &&
+ !user_mode(regs))
+ goto out_err;
+
+ state->regs = regs;
+ goto out_stop;
+ }
+
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+ /* Decode any ftrace redirection */
+ if (ip == (unsigned long) return_to_handler)
+ ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
+ ip, (void *) sp);
+#endif
+
+ /* Update unwind state */
+ state->sp = sp;
+ state->ip = ip;
+ return true;
+
+out_err:
+ state->error = true;
+out_stop:
+ state->stack_info.type = STACK_TYPE_UNKNOWN;
+ return false;
+}
+#endif
--
2.22.0
On Wed, Jul 10, 2019 at 12:59:18PM +0200, Miroslav Benes wrote:
> The livepatch consistency model requires reliable stack tracing
> architecture support in order to work properly. In order to achieve
> this, two main issues have to be solved. First, reliable and consistent
> call chain backtracing has to be ensured. Second, the unwinder needs to
> be able to detect stack corruptions and return errors.
>
> The "zSeries ELF Application Binary Interface Supplement" says:
>
> "The stack pointer points to the first word of the lowest allocated
> stack frame. If the "back chain" is implemented this word will point to
> the previously allocated stack frame (towards higher addresses), except
> for the first stack frame, which shall have a back chain of zero (NULL).
> The stack shall grow downwards, in other words towards lower addresses."
>
> "back chain" is optional. GCC option -mbackchain enables it. Quoting
> Martin Schwidefsky [1]:
>
> "The compiler is called with the -mbackchain option, all normal C
> function will store the backchain in the function prologue. All
> functions written in assembler code should do the same, if you find one
> that does not we should fix that. The end result is that a task that
> *voluntarily* called schedule() should have a proper backchain at all
> times.
>
> Dependent on the use case this may or may not be enough. Asynchronous
> interrupts may stop the CPU at the beginning of a function, if kernel
> preemption is enabled we can end up with a broken backchain. The
> production kernels for IBM Z are all compiled *without* kernel
> preemption. So yes, we might get away without the objtool support.
>
> On a side-note, we do have a line item to implement the ORC unwinder for
> the kernel, that includes the objtool support. Once we have that we can
> drop the -mbackchain option for the kernel build. That gives us a nice
> little performance benefit. I hope that the change from backchain to the
> ORC unwinder will not be too hard to implement in the livepatch tools."
>
> Thus, the call chain backtracing should be currently ensured and objtool
> should not be necessary for livepatch purposes.
Hi Miroslav,
Should there be a CONFIG? dependency on -mbackchain and/or kernel
preemption, or does the following ensure that we don't need a explicit
build time checks?
> Regarding the second issue, stack corruptions and non-reliable states
> have to be recognized by the unwinder. Mainly it means to detect
> preemption or page faults, the end of the task stack must be reached,
> return addresses must be valid text addresses and hacks like function
> graph tracing and kretprobes must be properly detected.
>
> Unwinding a running task's stack is not a problem, because there is a
> livepatch requirement that every checked task is blocked, except for the
> current task. Due to that, the implementation can be much simpler
> compared to the existing non-reliable infrastructure. We can consider a
> task's kernel/thread stack only and skip the other stacks.
>
> Idle tasks are a bit special. Their final back chains point to no_dat
> stacks. See for reference CALL_ON_STACK() in smp_start_secondary()
> callback used in __cpu_up(). The unwinding is stopped there and it is
> not considered to be a stack corruption.
>
> Signed-off-by: Miroslav Benes <[email protected]>
> ---
> - based on Linus' master
> - passes livepatch kselftests
> - passes tests from https://github.com/lpechacek/qa_test_klp, which
> stress the consistency model and the unwinder a bit more
>
> arch/s390/Kconfig | 1 +
> arch/s390/include/asm/stacktrace.h | 5 ++
> arch/s390/include/asm/unwind.h | 19 ++++++
> arch/s390/kernel/dumpstack.c | 28 +++++++++
> arch/s390/kernel/stacktrace.c | 78 +++++++++++++++++++++++++
> arch/s390/kernel/unwind_bc.c | 93 ++++++++++++++++++++++++++++++
> 6 files changed, 224 insertions(+)
>
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index fdb4246265a5..ea73e555063d 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -170,6 +170,7 @@ config S390
> select HAVE_PERF_EVENTS
> select HAVE_RCU_TABLE_FREE
> select HAVE_REGS_AND_STACK_ACCESS_API
> + select HAVE_RELIABLE_STACKTRACE
> select HAVE_RSEQ
> select HAVE_SYSCALL_TRACEPOINTS
> select HAVE_VIRT_CPU_ACCOUNTING
> diff --git a/arch/s390/include/asm/stacktrace.h b/arch/s390/include/asm/stacktrace.h
> index 0ae4bbf7779c..2b5c913c408f 100644
> --- a/arch/s390/include/asm/stacktrace.h
> +++ b/arch/s390/include/asm/stacktrace.h
> @@ -23,6 +23,11 @@ const char *stack_type_name(enum stack_type type);
> int get_stack_info(unsigned long sp, struct task_struct *task,
> struct stack_info *info, unsigned long *visit_mask);
>
> +#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
> +int get_stack_info_reliable(unsigned long sp, struct task_struct *task,
> + struct stack_info *info);
> +#endif
> +
> static inline bool on_stack(struct stack_info *info,
> unsigned long addr, size_t len)
> {
> diff --git a/arch/s390/include/asm/unwind.h b/arch/s390/include/asm/unwind.h
> index d827b5b9a32c..1cc96c54169c 100644
> --- a/arch/s390/include/asm/unwind.h
> +++ b/arch/s390/include/asm/unwind.h
> @@ -45,6 +45,25 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
> bool unwind_next_frame(struct unwind_state *state);
> unsigned long unwind_get_return_address(struct unwind_state *state);
>
> +#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
> +void __unwind_start_reliable(struct unwind_state *state,
> + struct task_struct *task, unsigned long sp);
> +bool unwind_next_frame_reliable(struct unwind_state *state);
> +
> +static inline void unwind_start_reliable(struct unwind_state *state,
> + struct task_struct *task)
> +{
> + unsigned long sp;
> +
> + if (task == current)
> + sp = current_stack_pointer();
> + else
> + sp = task->thread.ksp;
> +
> + __unwind_start_reliable(state, task, sp);
> +}
> +#endif
> +
> static inline bool unwind_done(struct unwind_state *state)
> {
> return state->stack_info.type == STACK_TYPE_UNKNOWN;
> diff --git a/arch/s390/kernel/dumpstack.c b/arch/s390/kernel/dumpstack.c
> index ac06c3949ab3..b21ef2a766ff 100644
> --- a/arch/s390/kernel/dumpstack.c
> +++ b/arch/s390/kernel/dumpstack.c
> @@ -127,6 +127,34 @@ int get_stack_info(unsigned long sp, struct task_struct *task,
> return -EINVAL;
> }
>
> +#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
> +int get_stack_info_reliable(unsigned long sp, struct task_struct *task,
> + struct stack_info *info)
> +{
> + if (!sp)
> + goto error;
> +
> + /* Sanity check: ABI requires SP to be aligned 8 bytes. */
> + if (sp & 0x7)
> + goto error;
> +
Does SP alignment only need to be checked for the initial frame, or
should it be verified everytime it's moved in
unwind_next_frame_reliable()?
> + if (!task)
> + goto error;
> +
> + /*
> + * The unwinding should not start on nodat_stack, async_stack or
> + * restart_stack. The task is either current or must be inactive.
> + */
> + if (!in_task_stack(sp, task, info))
> + goto error;
> +
> + return 0;
> +error:
> + info->type = STACK_TYPE_UNKNOWN;
> + return -EINVAL;
> +}
> +#endif
> +
> void show_stack(struct task_struct *task, unsigned long *stack)
> {
> struct unwind_state state;
> diff --git a/arch/s390/kernel/stacktrace.c b/arch/s390/kernel/stacktrace.c
> index f6a620f854e1..7d774a325163 100644
> --- a/arch/s390/kernel/stacktrace.c
> +++ b/arch/s390/kernel/stacktrace.c
> @@ -13,6 +13,7 @@
> #include <linux/export.h>
> #include <asm/stacktrace.h>
> #include <asm/unwind.h>
> +#include <asm/kprobes.h>
>
> void save_stack_trace(struct stack_trace *trace)
> {
> @@ -60,3 +61,80 @@ void save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace)
> }
> }
> EXPORT_SYMBOL_GPL(save_stack_trace_regs);
> +
> +#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
> +/*
> + * This function returns an error if it detects any unreliable features of the
> + * stack. Otherwise it guarantees that the stack trace is reliable.
> + *
> + * If the task is not 'current', the caller *must* ensure the task is inactive.
> + */
> +static __always_inline int
> +__save_stack_trace_tsk_reliable(struct task_struct *tsk,
> + struct stack_trace *trace)
> +{
> + struct unwind_state state;
> +
> + for (unwind_start_reliable(&state, tsk);
> + !unwind_done(&state) && !unwind_error(&state);
> + unwind_next_frame_reliable(&state)) {
> +
> + if (!__kernel_text_address(state.ip))
> + return -EINVAL;
> +
> +#ifdef CONFIG_KPROBES
> + /*
> + * Mark stacktraces with kretprobed functions on them
> + * as unreliable.
> + */
> + if (state.ip == (unsigned long)kretprobe_trampoline)
> + return -EINVAL;
> +#endif
> +
> + if (trace->nr_entries >= trace->max_entries)
> + return -E2BIG;
> +
> + if (!trace->skip)
> + trace->entries[trace->nr_entries++] = state.ip;
> + else
> + trace->skip--;
> + }
> +
> + /* Check for stack corruption */
> + if (unwind_error(&state))
> + return -EINVAL;
> +
> + /* Store kernel_thread_starter, null for swapper/0 */
> + if (tsk->flags & (PF_KTHREAD | PF_IDLE)) {
> + if (trace->nr_entries >= trace->max_entries)
> + return -E2BIG;
> +
> + if (!trace->skip)
> + trace->entries[trace->nr_entries++] =
> + state.regs->psw.addr;
> + else
> + trace->skip--;
An idea for a follow up patch: stuff this into a function like
int save_trace_entry(struct stack_trace *trace, unsigned long entry);
which could one day make the trace->entries[] code generic across arches.
> + }
> +
> + return 0;
> +}
> +
> +int save_stack_trace_tsk_reliable(struct task_struct *tsk,
> + struct stack_trace *trace)
> +{
> + int ret;
> +
> + /*
> + * If the task doesn't have a stack (e.g., a zombie), the stack is
> + * "reliably" empty.
> + */
> + if (!try_get_task_stack(tsk))
> + return 0;
> +
> + ret = __save_stack_trace_tsk_reliable(tsk, trace);
> +
> + put_task_stack(tsk);
> +
> + return ret;
> +}
> +#endif
> diff --git a/arch/s390/kernel/unwind_bc.c b/arch/s390/kernel/unwind_bc.c
> index 3ce8a0808059..ada3a8538961 100644
> --- a/arch/s390/kernel/unwind_bc.c
> +++ b/arch/s390/kernel/unwind_bc.c
> @@ -153,3 +153,96 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
> state->reliable = reliable;
> }
> EXPORT_SYMBOL_GPL(__unwind_start);
> +
> +#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
> +void __unwind_start_reliable(struct unwind_state *state,
> + struct task_struct *task, unsigned long sp)
> +{
> + struct stack_info *info = &state->stack_info;
> + struct stack_frame *sf;
> + unsigned long ip;
> +
> + memset(state, 0, sizeof(*state));
> + state->task = task;
> +
> + /* Get current stack pointer and initialize stack info */
> + if (get_stack_info_reliable(sp, task, info) ||
> + !on_stack(info, sp, sizeof(struct stack_frame))) {
> + /* Something is wrong with the stack pointer */
> + info->type = STACK_TYPE_UNKNOWN;
> + state->error = true;
> + return;
> + }
> +
> + /* Get the instruction pointer from the stack frame */
> + sf = (struct stack_frame *) sp;
> + ip = READ_ONCE_NOCHECK(sf->gprs[8]);
> +
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> + /* Decode any ftrace redirection */
> + if (ip == (unsigned long) return_to_handler)
> + ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
> + ip, NULL);
^^^^
double checking: we ignore the retp here and not in the next-frame case?
> +#endif
> +
> + /* Update unwind state */
> + state->sp = sp;
> + state->ip = ip;
> +}
> +
> +bool unwind_next_frame_reliable(struct unwind_state *state)
> +{
> + struct stack_info *info = &state->stack_info;
> + struct stack_frame *sf;
> + struct pt_regs *regs;
> + unsigned long sp, ip;
> +
> + sf = (struct stack_frame *) state->sp;
> + sp = READ_ONCE_NOCHECK(sf->back_chain);
> + /*
> + * Idle tasks are special. The final back-chain points to nodat_stack.
> + * See CALL_ON_STACK() in smp_start_secondary() callback used in
> + * __cpu_up(). We just accept it, go to else branch and look for
> + * pt_regs.
> + */
> + if (likely(sp && !(is_idle_task(state->task) &&
> + outside_of_stack(state, sp)))) {
> + /* Non-zero back-chain points to the previous frame */
> + if (unlikely(outside_of_stack(state, sp)))
> + goto out_err;
> +
> + sf = (struct stack_frame *) sp;
> + ip = READ_ONCE_NOCHECK(sf->gprs[8]);
> + } else {
> + /* No back-chain, look for a pt_regs structure */
> + sp = state->sp + STACK_FRAME_OVERHEAD;
> + regs = (struct pt_regs *) sp;
> + if ((unsigned long)regs != info->end - sizeof(struct pt_regs))
> + goto out_err;
> + if (!(state->task->flags & (PF_KTHREAD | PF_IDLE)) &&
> + !user_mode(regs))
> + goto out_err;
> +
> + state->regs = regs;
> + goto out_stop;
> + }
> +
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> + /* Decode any ftrace redirection */
> + if (ip == (unsigned long) return_to_handler)
> + ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
> + ip, (void *) sp);
> +#endif
> +
> + /* Update unwind state */
> + state->sp = sp;
> + state->ip = ip;
minor nit: maybe the CONFIG_FUNCTION_GRAPH_TRACER and "Update unwind
state" logic could be combined into a function? (Not a big deal either
way.)
> + return true;
> +
> +out_err:
> + state->error = true;
> +out_stop:
> + state->stack_info.type = STACK_TYPE_UNKNOWN;
> + return false;
> +}
> +#endif
> --
> 2.22.0
>
I've tested the patch with positive results, however I didn't stress it
very hard (basically only selftests). The code logic seems
straightforward and correct by inspection.
On a related note, do you think it would be feasible to extend (in
another patchset) the reliable stack unwinding code a bit so that we
could feed it pre-baked stacks ... then we could verify that the code
was finding interesting scenarios. That was a passing thought I had
back when Nicolai and I were debugging the ppc64le exception frame
marker bug, but didn't think it worth the time/effort at the time.
One more note: Using READ_ONCE_NOCHECK is probably correct here, but
s390 happens to define a READ_ONCE_TASK_STACK macro which calls
READ_ONCE_NOCHECK when task != current. According to the code comments,
this "disables KASAN checking when reading a value from another task's
stack". Is there any scenario here where we would want to use the that
wrapper macro?
-- Joe
On Tue, 16 Jul 2019, Joe Lawrence wrote:
> On Wed, Jul 10, 2019 at 12:59:18PM +0200, Miroslav Benes wrote:
> > The livepatch consistency model requires reliable stack tracing
> > architecture support in order to work properly. In order to achieve
> > this, two main issues have to be solved. First, reliable and consistent
> > call chain backtracing has to be ensured. Second, the unwinder needs to
> > be able to detect stack corruptions and return errors.
> >
> > The "zSeries ELF Application Binary Interface Supplement" says:
> >
> > "The stack pointer points to the first word of the lowest allocated
> > stack frame. If the "back chain" is implemented this word will point to
> > the previously allocated stack frame (towards higher addresses), except
> > for the first stack frame, which shall have a back chain of zero (NULL).
> > The stack shall grow downwards, in other words towards lower addresses."
> >
> > "back chain" is optional. GCC option -mbackchain enables it. Quoting
> > Martin Schwidefsky [1]:
> >
> > "The compiler is called with the -mbackchain option, all normal C
> > function will store the backchain in the function prologue. All
> > functions written in assembler code should do the same, if you find one
> > that does not we should fix that. The end result is that a task that
> > *voluntarily* called schedule() should have a proper backchain at all
> > times.
> >
> > Dependent on the use case this may or may not be enough. Asynchronous
> > interrupts may stop the CPU at the beginning of a function, if kernel
> > preemption is enabled we can end up with a broken backchain. The
> > production kernels for IBM Z are all compiled *without* kernel
> > preemption. So yes, we might get away without the objtool support.
> >
> > On a side-note, we do have a line item to implement the ORC unwinder for
> > the kernel, that includes the objtool support. Once we have that we can
> > drop the -mbackchain option for the kernel build. That gives us a nice
> > little performance benefit. I hope that the change from backchain to the
> > ORC unwinder will not be too hard to implement in the livepatch tools."
> >
> > Thus, the call chain backtracing should be currently ensured and objtool
> > should not be necessary for livepatch purposes.
>
> Hi Miroslav,
>
> Should there be a CONFIG? dependency on -mbackchain and/or kernel
> preemption, or does the following ensure that we don't need a explicit
> build time checks?
I don't think we have to do anything explicit. -mbackchain is enabled by
default (arch/s390/Makefile) and the following should ensure the rest.
I'll make it clearer in v2.
> > Regarding the second issue, stack corruptions and non-reliable states
> > have to be recognized by the unwinder. Mainly it means to detect
> > preemption or page faults, the end of the task stack must be reached,
> > return addresses must be valid text addresses and hacks like function
> > graph tracing and kretprobes must be properly detected.
> >
> > Unwinding a running task's stack is not a problem, because there is a
> > livepatch requirement that every checked task is blocked, except for the
> > current task. Due to that, the implementation can be much simpler
> > compared to the existing non-reliable infrastructure. We can consider a
> > task's kernel/thread stack only and skip the other stacks.
> >
> > Idle tasks are a bit special. Their final back chains point to no_dat
> > stacks. See for reference CALL_ON_STACK() in smp_start_secondary()
> > callback used in __cpu_up(). The unwinding is stopped there and it is
> > not considered to be a stack corruption.
> >
> > Signed-off-by: Miroslav Benes <[email protected]>
> > ---
> > - based on Linus' master
> > - passes livepatch kselftests
> > - passes tests from https://github.com/lpechacek/qa_test_klp, which
> > stress the consistency model and the unwinder a bit more
> >
> > arch/s390/Kconfig | 1 +
> > arch/s390/include/asm/stacktrace.h | 5 ++
> > arch/s390/include/asm/unwind.h | 19 ++++++
> > arch/s390/kernel/dumpstack.c | 28 +++++++++
> > arch/s390/kernel/stacktrace.c | 78 +++++++++++++++++++++++++
> > arch/s390/kernel/unwind_bc.c | 93 ++++++++++++++++++++++++++++++
> > 6 files changed, 224 insertions(+)
> >
> > diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> > index fdb4246265a5..ea73e555063d 100644
> > --- a/arch/s390/Kconfig
> > +++ b/arch/s390/Kconfig
> > @@ -170,6 +170,7 @@ config S390
> > select HAVE_PERF_EVENTS
> > select HAVE_RCU_TABLE_FREE
> > select HAVE_REGS_AND_STACK_ACCESS_API
> > + select HAVE_RELIABLE_STACKTRACE
> > select HAVE_RSEQ
> > select HAVE_SYSCALL_TRACEPOINTS
> > select HAVE_VIRT_CPU_ACCOUNTING
> > diff --git a/arch/s390/include/asm/stacktrace.h b/arch/s390/include/asm/stacktrace.h
> > index 0ae4bbf7779c..2b5c913c408f 100644
> > --- a/arch/s390/include/asm/stacktrace.h
> > +++ b/arch/s390/include/asm/stacktrace.h
> > @@ -23,6 +23,11 @@ const char *stack_type_name(enum stack_type type);
> > int get_stack_info(unsigned long sp, struct task_struct *task,
> > struct stack_info *info, unsigned long *visit_mask);
> >
> > +#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
> > +int get_stack_info_reliable(unsigned long sp, struct task_struct *task,
> > + struct stack_info *info);
> > +#endif
> > +
> > static inline bool on_stack(struct stack_info *info,
> > unsigned long addr, size_t len)
> > {
> > diff --git a/arch/s390/include/asm/unwind.h b/arch/s390/include/asm/unwind.h
> > index d827b5b9a32c..1cc96c54169c 100644
> > --- a/arch/s390/include/asm/unwind.h
> > +++ b/arch/s390/include/asm/unwind.h
> > @@ -45,6 +45,25 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
> > bool unwind_next_frame(struct unwind_state *state);
> > unsigned long unwind_get_return_address(struct unwind_state *state);
> >
> > +#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
> > +void __unwind_start_reliable(struct unwind_state *state,
> > + struct task_struct *task, unsigned long sp);
> > +bool unwind_next_frame_reliable(struct unwind_state *state);
> > +
> > +static inline void unwind_start_reliable(struct unwind_state *state,
> > + struct task_struct *task)
> > +{
> > + unsigned long sp;
> > +
> > + if (task == current)
> > + sp = current_stack_pointer();
> > + else
> > + sp = task->thread.ksp;
> > +
> > + __unwind_start_reliable(state, task, sp);
> > +}
> > +#endif
> > +
> > static inline bool unwind_done(struct unwind_state *state)
> > {
> > return state->stack_info.type == STACK_TYPE_UNKNOWN;
> > diff --git a/arch/s390/kernel/dumpstack.c b/arch/s390/kernel/dumpstack.c
> > index ac06c3949ab3..b21ef2a766ff 100644
> > --- a/arch/s390/kernel/dumpstack.c
> > +++ b/arch/s390/kernel/dumpstack.c
> > @@ -127,6 +127,34 @@ int get_stack_info(unsigned long sp, struct task_struct *task,
> > return -EINVAL;
> > }
> >
> > +#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
> > +int get_stack_info_reliable(unsigned long sp, struct task_struct *task,
> > + struct stack_info *info)
> > +{
> > + if (!sp)
> > + goto error;
> > +
> > + /* Sanity check: ABI requires SP to be aligned 8 bytes. */
> > + if (sp & 0x7)
> > + goto error;
> > +
>
> Does SP alignment only need to be checked for the initial frame, or
> should it be verified everytime it's moved in
> unwind_next_frame_reliable()?
Good spotting. It should have been verified everytime. It got lost during
rebasing onto the new unwinding framework.
> > + if (!task)
> > + goto error;
> > +
> > + /*
> > + * The unwinding should not start on nodat_stack, async_stack or
> > + * restart_stack. The task is either current or must be inactive.
> > + */
> > + if (!in_task_stack(sp, task, info))
> > + goto error;
> > +
> > + return 0;
> > +error:
> > + info->type = STACK_TYPE_UNKNOWN;
> > + return -EINVAL;
> > +}
> > +#endif
> > +
> > void show_stack(struct task_struct *task, unsigned long *stack)
> > {
> > struct unwind_state state;
> > diff --git a/arch/s390/kernel/stacktrace.c b/arch/s390/kernel/stacktrace.c
> > index f6a620f854e1..7d774a325163 100644
> > --- a/arch/s390/kernel/stacktrace.c
> > +++ b/arch/s390/kernel/stacktrace.c
> > @@ -13,6 +13,7 @@
> > #include <linux/export.h>
> > #include <asm/stacktrace.h>
> > #include <asm/unwind.h>
> > +#include <asm/kprobes.h>
> >
> > void save_stack_trace(struct stack_trace *trace)
> > {
> > @@ -60,3 +61,80 @@ void save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace)
> > }
> > }
> > EXPORT_SYMBOL_GPL(save_stack_trace_regs);
> > +
> > +#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
> > +/*
> > + * This function returns an error if it detects any unreliable features of the
> > + * stack. Otherwise it guarantees that the stack trace is reliable.
> > + *
> > + * If the task is not 'current', the caller *must* ensure the task is inactive.
> > + */
> > +static __always_inline int
> > +__save_stack_trace_tsk_reliable(struct task_struct *tsk,
> > + struct stack_trace *trace)
> > +{
> > + struct unwind_state state;
> > +
> > + for (unwind_start_reliable(&state, tsk);
> > + !unwind_done(&state) && !unwind_error(&state);
> > + unwind_next_frame_reliable(&state)) {
> > +
> > + if (!__kernel_text_address(state.ip))
> > + return -EINVAL;
> > +
> > +#ifdef CONFIG_KPROBES
> > + /*
> > + * Mark stacktraces with kretprobed functions on them
> > + * as unreliable.
> > + */
> > + if (state.ip == (unsigned long)kretprobe_trampoline)
> > + return -EINVAL;
> > +#endif
> > +
> > + if (trace->nr_entries >= trace->max_entries)
> > + return -E2BIG;
> > +
> > + if (!trace->skip)
> > + trace->entries[trace->nr_entries++] = state.ip;
> > + else
> > + trace->skip--;
> > + }
> > +
> > + /* Check for stack corruption */
> > + if (unwind_error(&state))
> > + return -EINVAL;
> > +
> > + /* Store kernel_thread_starter, null for swapper/0 */
> > + if (tsk->flags & (PF_KTHREAD | PF_IDLE)) {
> > + if (trace->nr_entries >= trace->max_entries)
> > + return -E2BIG;
> > +
> > + if (!trace->skip)
> > + trace->entries[trace->nr_entries++] =
> > + state.regs->psw.addr;
> > + else
> > + trace->skip--;
>
> An idea for a follow up patch: stuff this into a function like
> int save_trace_entry(struct stack_trace *trace, unsigned long entry);
> which could one day make the trace->entries[] code generic across arches.
Yes. I was thinking about it and then decided to postpone it a bit. Thomas
introduced more generic infrastructure with ARCH_STACKWALK. See x86
implementation. There is consume_entry() which is exactly what you are
proposing. So I thought it should be a part of a bigger rework in the
future.
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +int save_stack_trace_tsk_reliable(struct task_struct *tsk,
> > + struct stack_trace *trace)
> > +{
> > + int ret;
> > +
> > + /*
> > + * If the task doesn't have a stack (e.g., a zombie), the stack is
> > + * "reliably" empty.
> > + */
> > + if (!try_get_task_stack(tsk))
> > + return 0;
> > +
> > + ret = __save_stack_trace_tsk_reliable(tsk, trace);
> > +
> > + put_task_stack(tsk);
> > +
> > + return ret;
> > +}
> > +#endif
> > diff --git a/arch/s390/kernel/unwind_bc.c b/arch/s390/kernel/unwind_bc.c
> > index 3ce8a0808059..ada3a8538961 100644
> > --- a/arch/s390/kernel/unwind_bc.c
> > +++ b/arch/s390/kernel/unwind_bc.c
> > @@ -153,3 +153,96 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
> > state->reliable = reliable;
> > }
> > EXPORT_SYMBOL_GPL(__unwind_start);
> > +
> > +#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
> > +void __unwind_start_reliable(struct unwind_state *state,
> > + struct task_struct *task, unsigned long sp)
> > +{
> > + struct stack_info *info = &state->stack_info;
> > + struct stack_frame *sf;
> > + unsigned long ip;
> > +
> > + memset(state, 0, sizeof(*state));
> > + state->task = task;
> > +
> > + /* Get current stack pointer and initialize stack info */
> > + if (get_stack_info_reliable(sp, task, info) ||
> > + !on_stack(info, sp, sizeof(struct stack_frame))) {
> > + /* Something is wrong with the stack pointer */
> > + info->type = STACK_TYPE_UNKNOWN;
> > + state->error = true;
> > + return;
> > + }
> > +
> > + /* Get the instruction pointer from the stack frame */
> > + sf = (struct stack_frame *) sp;
> > + ip = READ_ONCE_NOCHECK(sf->gprs[8]);
> > +
> > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > + /* Decode any ftrace redirection */
> > + if (ip == (unsigned long) return_to_handler)
> > + ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
> > + ip, NULL);
> ^^^^
> double checking: we ignore the retp here and not in the next-frame case?
Frankly, I copy-pasted this from non-reliable versions and checked that
powerpc ignored it as well. I'll double check.
It also calls for another cleanup. #ifdef seems to be superfluous and
checking ip for return_to_handler too (because it is done in
ftrace_graph_ret_addr() itself).
> > +#endif
> > +
> > + /* Update unwind state */
> > + state->sp = sp;
> > + state->ip = ip;
> > +}
> > +
> > +bool unwind_next_frame_reliable(struct unwind_state *state)
> > +{
> > + struct stack_info *info = &state->stack_info;
> > + struct stack_frame *sf;
> > + struct pt_regs *regs;
> > + unsigned long sp, ip;
> > +
> > + sf = (struct stack_frame *) state->sp;
> > + sp = READ_ONCE_NOCHECK(sf->back_chain);
> > + /*
> > + * Idle tasks are special. The final back-chain points to nodat_stack.
> > + * See CALL_ON_STACK() in smp_start_secondary() callback used in
> > + * __cpu_up(). We just accept it, go to else branch and look for
> > + * pt_regs.
> > + */
> > + if (likely(sp && !(is_idle_task(state->task) &&
> > + outside_of_stack(state, sp)))) {
> > + /* Non-zero back-chain points to the previous frame */
> > + if (unlikely(outside_of_stack(state, sp)))
> > + goto out_err;
> > +
> > + sf = (struct stack_frame *) sp;
> > + ip = READ_ONCE_NOCHECK(sf->gprs[8]);
> > + } else {
> > + /* No back-chain, look for a pt_regs structure */
> > + sp = state->sp + STACK_FRAME_OVERHEAD;
> > + regs = (struct pt_regs *) sp;
> > + if ((unsigned long)regs != info->end - sizeof(struct pt_regs))
> > + goto out_err;
> > + if (!(state->task->flags & (PF_KTHREAD | PF_IDLE)) &&
> > + !user_mode(regs))
> > + goto out_err;
> > +
> > + state->regs = regs;
> > + goto out_stop;
> > + }
> > +
> > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > + /* Decode any ftrace redirection */
> > + if (ip == (unsigned long) return_to_handler)
> > + ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
> > + ip, (void *) sp);
> > +#endif
> > +
> > + /* Update unwind state */
> > + state->sp = sp;
> > + state->ip = ip;
>
> minor nit: maybe the CONFIG_FUNCTION_GRAPH_TRACER and "Update unwind
> state" logic could be combined into a function? (Not a big deal either
> way.)
I think it is better to open code it here, but it is a matter of taste for
sure.
> > + return true;
> > +
> > +out_err:
> > + state->error = true;
> > +out_stop:
> > + state->stack_info.type = STACK_TYPE_UNKNOWN;
> > + return false;
> > +}
> > +#endif
> > --
> > 2.22.0
> >
>
> I've tested the patch with positive results, however I didn't stress it
> very hard (basically only selftests). The code logic seems
> straightforward and correct by inspection.
>
> On a related note, do you think it would be feasible to extend (in
> another patchset) the reliable stack unwinding code a bit so that we
> could feed it pre-baked stacks ... then we could verify that the code
> was finding interesting scenarios. That was a passing thought I had
> back when Nicolai and I were debugging the ppc64le exception frame
> marker bug, but didn't think it worth the time/effort at the time.
That is an interesting thought. It would help the testing a lot. I will
make a note in my todo list.
> One more note: Using READ_ONCE_NOCHECK is probably correct here, but
> s390 happens to define a READ_ONCE_TASK_STACK macro which calls
> READ_ONCE_NOCHECK when task != current. According to the code comments,
> this "disables KASAN checking when reading a value from another task's
> stack". Is there any scenario here where we would want to use the that
> wrapper macro?
s/READ_ONCE_TASK_STACK/READ_ONCE_NOCHECK/ was a last minute change. s390
does not define it anymore. See 20955746320e ("s390/kasan: avoid false
positives during stack unwind") and da1776733617 ("s390/unwind: cleanup
unused READ_ONCE_TASK_STACK").
Thanks for the review and testing!
Miroslav
On Wed, Jul 17, 2019 at 01:01:27PM +0200, Miroslav Benes wrote:
> > On a related note, do you think it would be feasible to extend (in
> > another patchset) the reliable stack unwinding code a bit so that we
> > could feed it pre-baked stacks ... then we could verify that the code
> > was finding interesting scenarios. That was a passing thought I had
> > back when Nicolai and I were debugging the ppc64le exception frame
> > marker bug, but didn't think it worth the time/effort at the time.
>
> That is an interesting thought. It would help the testing a lot. I will
> make a note in my todo list.
Another idea I had for reliable unwinder testing: add a
CONFIG_RELIABLE_STACKTRACE_DEBUG option which does a periodic stack
trace and warns if it doesn't reach the end. It could triggered from a
periodic NMI, or from schedule().
--
Josh
On Wed, Jul 10, 2019 at 12:59:18PM +0200, Miroslav Benes wrote:
> The livepatch consistency model requires reliable stack tracing
> architecture support in order to work properly. In order to achieve
> this, two main issues have to be solved. First, reliable and consistent
> call chain backtracing has to be ensured. Second, the unwinder needs to
> be able to detect stack corruptions and return errors.
>
> The "zSeries ELF Application Binary Interface Supplement" says:
>
> "The stack pointer points to the first word of the lowest allocated
> stack frame. If the "back chain" is implemented this word will point to
> the previously allocated stack frame (towards higher addresses), except
> for the first stack frame, which shall have a back chain of zero (NULL).
> The stack shall grow downwards, in other words towards lower addresses."
>
> "back chain" is optional. GCC option -mbackchain enables it. Quoting
> Martin Schwidefsky [1]:
This reference footnote seems to be missing at the bottom of the patch
description.
> diff --git a/arch/s390/include/asm/unwind.h b/arch/s390/include/asm/unwind.h
> index d827b5b9a32c..1cc96c54169c 100644
> --- a/arch/s390/include/asm/unwind.h
> +++ b/arch/s390/include/asm/unwind.h
> @@ -45,6 +45,25 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
> bool unwind_next_frame(struct unwind_state *state);
> unsigned long unwind_get_return_address(struct unwind_state *state);
>
> +#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
> +void __unwind_start_reliable(struct unwind_state *state,
> + struct task_struct *task, unsigned long sp);
> +bool unwind_next_frame_reliable(struct unwind_state *state);
> +
> +static inline void unwind_start_reliable(struct unwind_state *state,
> + struct task_struct *task)
> +{
> + unsigned long sp;
> +
> + if (task == current)
> + sp = current_stack_pointer();
> + else
> + sp = task->thread.ksp;
> +
> + __unwind_start_reliable(state, task, sp);
> +}
> +#endif
> +
(Ah, cool, I didn't realize s390 ported the x86 unwind interfaces. We
should look at unifying them someday.)
Why do you need _reliable() variants of the unwind interfaces? Can the
error checking be integrated into unwind_start() and unwind_next_frame()
like they are on x86?
> +#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
> +void __unwind_start_reliable(struct unwind_state *state,
> + struct task_struct *task, unsigned long sp)
> +{
> + struct stack_info *info = &state->stack_info;
> + struct stack_frame *sf;
> + unsigned long ip;
> +
> + memset(state, 0, sizeof(*state));
> + state->task = task;
> +
> + /* Get current stack pointer and initialize stack info */
> + if (get_stack_info_reliable(sp, task, info) ||
> + !on_stack(info, sp, sizeof(struct stack_frame))) {
> + /* Something is wrong with the stack pointer */
> + info->type = STACK_TYPE_UNKNOWN;
> + state->error = true;
> + return;
> + }
> +
> + /* Get the instruction pointer from the stack frame */
> + sf = (struct stack_frame *) sp;
> + ip = READ_ONCE_NOCHECK(sf->gprs[8]);
> +
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> + /* Decode any ftrace redirection */
> + if (ip == (unsigned long) return_to_handler)
> + ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
> + ip, NULL);
> +#endif
The return_to_handler and ifdef checks aren't needed. Those are done
already by the call.
Also it seems a bit odd that the kretprobes check isn't done in this
function next to the ftrace check.
> +
> + /* Update unwind state */
> + state->sp = sp;
> + state->ip = ip;
> +}
> +
> +bool unwind_next_frame_reliable(struct unwind_state *state)
> +{
> + struct stack_info *info = &state->stack_info;
> + struct stack_frame *sf;
> + struct pt_regs *regs;
> + unsigned long sp, ip;
> +
> + sf = (struct stack_frame *) state->sp;
> + sp = READ_ONCE_NOCHECK(sf->back_chain);
> + /*
> + * Idle tasks are special. The final back-chain points to nodat_stack.
> + * See CALL_ON_STACK() in smp_start_secondary() callback used in
> + * __cpu_up(). We just accept it, go to else branch and look for
> + * pt_regs.
> + */
> + if (likely(sp && !(is_idle_task(state->task) &&
> + outside_of_stack(state, sp)))) {
> + /* Non-zero back-chain points to the previous frame */
> + if (unlikely(outside_of_stack(state, sp)))
> + goto out_err;
> +
> + sf = (struct stack_frame *) sp;
> + ip = READ_ONCE_NOCHECK(sf->gprs[8]);
> + } else {
> + /* No back-chain, look for a pt_regs structure */
> + sp = state->sp + STACK_FRAME_OVERHEAD;
> + regs = (struct pt_regs *) sp;
> + if ((unsigned long)regs != info->end - sizeof(struct pt_regs))
> + goto out_err;
> + if (!(state->task->flags & (PF_KTHREAD | PF_IDLE)) &&
> + !user_mode(regs))
> + goto out_err;
> +
> + state->regs = regs;
> + goto out_stop;
> + }
> +
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> + /* Decode any ftrace redirection */
> + if (ip == (unsigned long) return_to_handler)
> + ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
> + ip, (void *) sp);
> +#endif
> +
> + /* Update unwind state */
> + state->sp = sp;
> + state->ip = ip;
> + return true;
> +
> +out_err:
> + state->error = true;
> +out_stop:
> + state->stack_info.type = STACK_TYPE_UNKNOWN;
> + return false;
> +}
> +#endif
For the _reliable() variants of the unwind interfaces, there's a lot of
code duplication with the non-reliable variants. It looks like it would
be a lot cleaner (and easier to follow) if they were integrated.
Overall it's looking good though.
--
Josh
On Sun, 28 Jul 2019, Josh Poimboeuf wrote:
> On Wed, Jul 17, 2019 at 01:01:27PM +0200, Miroslav Benes wrote:
> > > On a related note, do you think it would be feasible to extend (in
> > > another patchset) the reliable stack unwinding code a bit so that we
> > > could feed it pre-baked stacks ... then we could verify that the code
> > > was finding interesting scenarios. That was a passing thought I had
> > > back when Nicolai and I were debugging the ppc64le exception frame
> > > marker bug, but didn't think it worth the time/effort at the time.
> >
> > That is an interesting thought. It would help the testing a lot. I will
> > make a note in my todo list.
>
> Another idea I had for reliable unwinder testing: add a
> CONFIG_RELIABLE_STACKTRACE_DEBUG option which does a periodic stack
> trace and warns if it doesn't reach the end. It could triggered from a
> periodic NMI, or from schedule().
Noted as well.
Miroslav
> > diff --git a/arch/s390/include/asm/unwind.h b/arch/s390/include/asm/unwind.h
> > index d827b5b9a32c..1cc96c54169c 100644
> > --- a/arch/s390/include/asm/unwind.h
> > +++ b/arch/s390/include/asm/unwind.h
> > @@ -45,6 +45,25 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
> > bool unwind_next_frame(struct unwind_state *state);
> > unsigned long unwind_get_return_address(struct unwind_state *state);
> >
> > +#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
> > +void __unwind_start_reliable(struct unwind_state *state,
> > + struct task_struct *task, unsigned long sp);
> > +bool unwind_next_frame_reliable(struct unwind_state *state);
> > +
> > +static inline void unwind_start_reliable(struct unwind_state *state,
> > + struct task_struct *task)
> > +{
> > + unsigned long sp;
> > +
> > + if (task == current)
> > + sp = current_stack_pointer();
> > + else
> > + sp = task->thread.ksp;
> > +
> > + __unwind_start_reliable(state, task, sp);
> > +}
> > +#endif
> > +
>
> (Ah, cool, I didn't realize s390 ported the x86 unwind interfaces. We
> should look at unifying them someday.)
Yes, it is quite recent change.
> Why do you need _reliable() variants of the unwind interfaces? Can the
> error checking be integrated into unwind_start() and unwind_next_frame()
> like they are on x86?
Good question. I rebased the patch a lot of times and it was much easier
in the end just to separate the original and reliable infrastructure. Not
the best for upstream inclusion though.
unwind_start_reliable() is basically the same as the original.
get_stack_info_reliable() is the main difference. It is much simpler in
our case. I wanted to avoid a new parameter or a callback, but let me
think about it again.
unwind_next_frame_reliable() is again a lot simpler than the original one,
because we know that the unwinding happens only on a task stack. I'll
think about inclusion to the unwind_next_frame() though. The code
duplication is not nice.
> > +#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
> > +void __unwind_start_reliable(struct unwind_state *state,
> > + struct task_struct *task, unsigned long sp)
> > +{
> > + struct stack_info *info = &state->stack_info;
> > + struct stack_frame *sf;
> > + unsigned long ip;
> > +
> > + memset(state, 0, sizeof(*state));
> > + state->task = task;
> > +
> > + /* Get current stack pointer and initialize stack info */
> > + if (get_stack_info_reliable(sp, task, info) ||
> > + !on_stack(info, sp, sizeof(struct stack_frame))) {
> > + /* Something is wrong with the stack pointer */
> > + info->type = STACK_TYPE_UNKNOWN;
> > + state->error = true;
> > + return;
> > + }
> > +
> > + /* Get the instruction pointer from the stack frame */
> > + sf = (struct stack_frame *) sp;
> > + ip = READ_ONCE_NOCHECK(sf->gprs[8]);
> > +
> > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > + /* Decode any ftrace redirection */
> > + if (ip == (unsigned long) return_to_handler)
> > + ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
> > + ip, NULL);
> > +#endif
>
> The return_to_handler and ifdef checks aren't needed. Those are done
> already by the call.
Correct. I realized it when Joe asked about the hunk.
> Also it seems a bit odd that the kretprobes check isn't done in this
> function next to the ftrace check.
Ah, yes.
> > +
> > + /* Update unwind state */
> > + state->sp = sp;
> > + state->ip = ip;
> > +}
> > +
> > +bool unwind_next_frame_reliable(struct unwind_state *state)
> > +{
> > + struct stack_info *info = &state->stack_info;
> > + struct stack_frame *sf;
> > + struct pt_regs *regs;
> > + unsigned long sp, ip;
> > +
> > + sf = (struct stack_frame *) state->sp;
> > + sp = READ_ONCE_NOCHECK(sf->back_chain);
> > + /*
> > + * Idle tasks are special. The final back-chain points to nodat_stack.
> > + * See CALL_ON_STACK() in smp_start_secondary() callback used in
> > + * __cpu_up(). We just accept it, go to else branch and look for
> > + * pt_regs.
> > + */
> > + if (likely(sp && !(is_idle_task(state->task) &&
> > + outside_of_stack(state, sp)))) {
> > + /* Non-zero back-chain points to the previous frame */
> > + if (unlikely(outside_of_stack(state, sp)))
> > + goto out_err;
> > +
> > + sf = (struct stack_frame *) sp;
> > + ip = READ_ONCE_NOCHECK(sf->gprs[8]);
> > + } else {
> > + /* No back-chain, look for a pt_regs structure */
> > + sp = state->sp + STACK_FRAME_OVERHEAD;
> > + regs = (struct pt_regs *) sp;
> > + if ((unsigned long)regs != info->end - sizeof(struct pt_regs))
> > + goto out_err;
> > + if (!(state->task->flags & (PF_KTHREAD | PF_IDLE)) &&
> > + !user_mode(regs))
> > + goto out_err;
> > +
> > + state->regs = regs;
> > + goto out_stop;
> > + }
> > +
> > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > + /* Decode any ftrace redirection */
> > + if (ip == (unsigned long) return_to_handler)
> > + ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
> > + ip, (void *) sp);
> > +#endif
> > +
> > + /* Update unwind state */
> > + state->sp = sp;
> > + state->ip = ip;
> > + return true;
> > +
> > +out_err:
> > + state->error = true;
> > +out_stop:
> > + state->stack_info.type = STACK_TYPE_UNKNOWN;
> > + return false;
> > +}
> > +#endif
>
> For the _reliable() variants of the unwind interfaces, there's a lot of
> code duplication with the non-reliable variants. It looks like it would
> be a lot cleaner (and easier to follow) if they were integrated.
True.
> Overall it's looking good though.
Great. Now let me try to make it nicer.
Thanks for the review.
Miroslav