2019-11-06 09:57:09

by Miroslav Benes

[permalink] [raw]
Subject: [PATCH v3 4/4] s390/livepatch: Implement reliable stack tracing for the consistency model

The livepatch consistency model requires reliable stack tracing
architecture support in order to work properly. In order to achieve
this, two main issues have to be solved. First, reliable and consistent
call chain backtracing has to be ensured. Second, the unwinder needs to
be able to detect stack corruptions and return errors.

The "zSeries ELF Application Binary Interface Supplement" says:

"The stack pointer points to the first word of the lowest allocated
stack frame. If the "back chain" is implemented this word will point to
the previously allocated stack frame (towards higher addresses), except
for the first stack frame, which shall have a back chain of zero (NULL).
The stack shall grow downwards, in other words towards lower addresses."

"back chain" is optional. GCC option -mbackchain enables it. Quoting
Martin Schwidefsky [1]:

"The compiler is called with the -mbackchain option, all normal C
function will store the backchain in the function prologue. All
functions written in assembler code should do the same, if you find one
that does not we should fix that. The end result is that a task that
*voluntarily* called schedule() should have a proper backchain at all
times.

Dependent on the use case this may or may not be enough. Asynchronous
interrupts may stop the CPU at the beginning of a function, if kernel
preemption is enabled we can end up with a broken backchain. The
production kernels for IBM Z are all compiled *without* kernel
preemption. So yes, we might get away without the objtool support.

On a side-note, we do have a line item to implement the ORC unwinder for
the kernel, that includes the objtool support. Once we have that we can
drop the -mbackchain option for the kernel build. That gives us a nice
little performance benefit. I hope that the change from backchain to the
ORC unwinder will not be too hard to implement in the livepatch tools."

Since -mbackchain is enabled by default when the kernel is compiled, the
call chain backtracing should be currently ensured and objtool should
not be necessary for livepatch purposes.

Regarding the second issue, stack corruptions and non-reliable states
have to be recognized by the unwinder. Mainly it means to detect
preemption or page faults, the end of the task stack must be reached,
return addresses must be valid text addresses and hacks like function
graph tracing and kretprobes must be properly detected.

Unwinding a running task's stack is not a problem, because there is a
livepatch requirement that every checked task is blocked, except for the
current task. Due to that, the implementation can be much simpler
compared to the existing non-reliable infrastructure. We can consider a
task's kernel/thread stack only and skip the other stacks.

Idle tasks are a bit special. Their final back chains point to no_dat
stacks. See for reference CALL_ON_STACK() in smp_start_secondary()
callback used in __cpu_up(). The unwinding is stopped there and it is
not considered to be a stack corruption.

[1] 20180912121106.31ffa97c@mschwideX1 [not archived on lore.kernel.org]

Signed-off-by: Miroslav Benes <[email protected]>
---
arch/s390/Kconfig | 1 +
arch/s390/kernel/dumpstack.c | 11 +++++++
arch/s390/kernel/stacktrace.c | 46 ++++++++++++++++++++++++++
arch/s390/kernel/unwind_bc.c | 61 +++++++++++++++++++++++++++--------
4 files changed, 106 insertions(+), 13 deletions(-)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 43a81d0ad507..9cfec1000abd 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -170,6 +170,7 @@ config S390
select HAVE_PERF_EVENTS
select HAVE_RCU_TABLE_FREE
select HAVE_REGS_AND_STACK_ACCESS_API
+ select HAVE_RELIABLE_STACKTRACE
select HAVE_RSEQ
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_VIRT_CPU_ACCOUNTING
diff --git a/arch/s390/kernel/dumpstack.c b/arch/s390/kernel/dumpstack.c
index 1ee19e6336cd..0081e75b957c 100644
--- a/arch/s390/kernel/dumpstack.c
+++ b/arch/s390/kernel/dumpstack.c
@@ -94,12 +94,23 @@ int get_stack_info(unsigned long sp, struct task_struct *task,
if (!sp)
goto unknown;

+ /* Sanity check: ABI requires SP to be aligned 8 bytes. */
+ if (sp & 0x7)
+ goto unknown;
+
task = task ? : current;

/* Check per-task stack */
if (in_task_stack(sp, task, info))
goto recursion_check;

+ /*
+ * The reliable unwinding should not start on nodat_stack, async_stack
+ * or restart_stack. The task is either current or must be inactive.
+ */
+ if (unwind_reliable)
+ goto unknown;
+
if (task != current)
goto unknown;

diff --git a/arch/s390/kernel/stacktrace.c b/arch/s390/kernel/stacktrace.c
index 751c136172f7..c5e3a37763f7 100644
--- a/arch/s390/kernel/stacktrace.c
+++ b/arch/s390/kernel/stacktrace.c
@@ -9,6 +9,7 @@
#include <linux/stacktrace.h>
#include <asm/stacktrace.h>
#include <asm/unwind.h>
+#include <asm/kprobes.h>

void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
struct task_struct *task, struct pt_regs *regs)
@@ -22,3 +23,48 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
break;
}
}
+
+/*
+ * This function returns an error if it detects any unreliable features of the
+ * stack. Otherwise it guarantees that the stack trace is reliable.
+ *
+ * If the task is not 'current', the caller *must* ensure the task is inactive.
+ */
+int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
+ void *cookie, struct task_struct *task)
+{
+ struct unwind_state state;
+ unsigned long addr;
+
+ for (unwind_start(&state, task, NULL, 0, true);
+ !unwind_done(&state) && !unwind_error(&state);
+ unwind_next_frame(&state, true)) {
+
+ addr = unwind_get_return_address(&state);
+ if (!addr)
+ return -EINVAL;
+
+#ifdef CONFIG_KPROBES
+ /*
+ * Mark stacktraces with kretprobed functions on them
+ * as unreliable.
+ */
+ if (state.ip == (unsigned long)kretprobe_trampoline)
+ return -EINVAL;
+#endif
+
+ if (!consume_entry(cookie, addr, false))
+ return -EINVAL;
+ }
+
+ /* Check for stack corruption */
+ if (unwind_error(&state))
+ return -EINVAL;
+
+ /* Store kernel_thread_starter, null for swapper/0 */
+ if ((task->flags & (PF_KTHREAD | PF_IDLE)) &&
+ !consume_entry(cookie, state.regs->psw.addr, false))
+ return -EINVAL;
+
+ return 0;
+}
diff --git a/arch/s390/kernel/unwind_bc.c b/arch/s390/kernel/unwind_bc.c
index 092626f4e7c6..9b6bc7539a9e 100644
--- a/arch/s390/kernel/unwind_bc.c
+++ b/arch/s390/kernel/unwind_bc.c
@@ -40,6 +40,10 @@ static bool unwind_update_state(struct unwind_state *state,
unsigned long sp, unsigned long ip,
struct pt_regs *regs, bool reliable)
{
+ /* Sanity check: ABI requires SP to be aligned 8 bytes. */
+ if (sp & 0x7)
+ goto out_err;
+
ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
ip, (void *) sp);

@@ -49,6 +53,11 @@ static bool unwind_update_state(struct unwind_state *state,
state->regs = regs;
state->reliable = reliable;
return true;
+
+out_err:
+ state->error = true;
+ state->stack_info.type = STACK_TYPE_UNKNOWN;
+ return false;
}

static bool unwind_use_regs(struct unwind_state *state)
@@ -85,10 +94,13 @@ static bool unwind_use_frame(struct unwind_state *state, unsigned long sp,
struct stack_frame *sf;
unsigned long ip;

- if (unlikely(outside_of_stack(state, sp))) {
- if (!update_stack_info(state, sp))
- goto out_err;
- }
+ /*
+ * No need to update stack info when unwind_reliable is true. We should
+ * be on a task stack and everything else is an error.
+ */
+ if (unlikely(outside_of_stack(state, sp)) &&
+ (unwind_reliable || !update_stack_info(state, sp)))
+ goto out_err;

sf = (struct stack_frame *) sp;
ip = READ_ONCE_NOCHECK(sf->gprs[8]);
@@ -109,17 +121,31 @@ static bool unwind_look_for_regs(struct unwind_state *state,
unsigned long sp, ip;

sp = state->sp + STACK_FRAME_OVERHEAD;
- if (!on_stack(info, sp, sizeof(struct pt_regs)))
- goto out_stop;
-
regs = (struct pt_regs *) sp;
- if (READ_ONCE_NOCHECK(regs->psw.mask) & PSW_MASK_PSTATE)
- goto out_stop;

- ip = READ_ONCE_NOCHECK(regs->psw.addr);
+ if (!unwind_reliable) {
+ if (!on_stack(info, sp, sizeof(struct pt_regs)))
+ goto out_stop;

- return unwind_update_state(state, sp, ip, regs, true);
+ if (READ_ONCE_NOCHECK(regs->psw.mask) & PSW_MASK_PSTATE)
+ goto out_stop;

+ ip = READ_ONCE_NOCHECK(regs->psw.addr);
+ return unwind_update_state(state, sp, ip, regs, true);
+ }
+
+ /* Unwind reliable */
+ if ((unsigned long)regs != info->end - sizeof(struct pt_regs))
+ goto out_err;
+
+ if (!(state->task->flags & (PF_KTHREAD | PF_IDLE)) && !user_mode(regs))
+ goto out_err;
+
+ state->regs = regs;
+ goto out_stop;
+
+out_err:
+ state->error = true;
out_stop:
state->stack_info.type = STACK_TYPE_UNKNOWN;
return false;
@@ -136,8 +162,17 @@ bool unwind_next_frame(struct unwind_state *state, bool unwind_reliable)
sf = (struct stack_frame *) state->sp;
sp = READ_ONCE_NOCHECK(sf->back_chain);

- /* Non-zero back-chain points to the previous frame */
- if (likely(sp))
+ /*
+ * Non-zero back-chain points to the previous frame
+ *
+ * unwind_reliable case: Idle tasks are special. The final
+ * back-chain points to nodat_stack. See CALL_ON_STACK() in
+ * smp_start_secondary() callback used in __cpu_up(). We just
+ * accept it and look for pt_regs.
+ */
+ if (likely(sp) &&
+ (!unwind_reliable || !(is_idle_task(state->task) &&
+ outside_of_stack(state, sp))))
return unwind_use_frame(state, sp, unwind_reliable);

/* No back-chain, look for a pt_regs structure */
--
2.23.0


2019-11-29 07:43:22

by Vasily Gorbik

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] s390/livepatch: Implement reliable stack tracing for the consistency model

On Wed, Nov 06, 2019 at 10:56:01AM +0100, Miroslav Benes wrote:
> The livepatch consistency model requires reliable stack tracing
> architecture support in order to work properly. In order to achieve
> this, two main issues have to be solved. First, reliable and consistent
> call chain backtracing has to be ensured. Second, the unwinder needs to
> be able to detect stack corruptions and return errors.

I tried to address that in a patch series I pushed here:
https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/log/?h=livepatch

It partially includes your changes from this patch which have been split
in 2 patches:
s390/unwind: add stack pointer alignment sanity checks
s390/livepatch: Implement reliable stack tracing for the consistency model

The primary goal was to make our backchain unwinder reliable itself. And
suitable for livepatching as is (we extra checks at the caller, see
below). Besides unwinder changes few things have been improved to avoid
special handling during unwinding.
- all tasks now have pt_regs at the bottom of task stack.
- final backchain always contains 0, no further references to no_dat stack.
- successful unwinding means reaching pt_regs at the bottom of task stack,
and unwinder guarantees that unwind_error() reflects that.
- final pt_regs at the bottom of task stack is never included in unwinding
results. It never was for user tasks. And kernel tasks cannot return to
that state anyway (and in some cases pt_regs are empty).
- unwinder starts unwinding from a reliable state (not "sp" passed as
an argument).
There is also s390 specific unwinder testing module.

> Idle tasks are a bit special. Their final back chains point to no_dat
> stacks. See for reference CALL_ON_STACK() in smp_start_secondary()
> callback used in __cpu_up(). The unwinding is stopped there and it is
> not considered to be a stack corruption.
I changed that with commit:
s390: avoid misusing CALL_ON_STACK for task stack setup
Now idle tasks are not special, final back chain contains 0.

> ---
> arch/s390/Kconfig | 1 +
> arch/s390/kernel/dumpstack.c | 11 +++++++
> arch/s390/kernel/stacktrace.c | 46 ++++++++++++++++++++++++++
> arch/s390/kernel/unwind_bc.c | 61 +++++++++++++++++++++++++++--------
> 4 files changed, 106 insertions(+), 13 deletions(-)
>
> --- a/arch/s390/kernel/dumpstack.c
> +++ b/arch/s390/kernel/dumpstack.c
> @@ -94,12 +94,23 @@ int get_stack_info(unsigned long sp, struct task_struct *task,
> if (!sp)
> goto unknown;
>
> + /* Sanity check: ABI requires SP to be aligned 8 bytes. */
> + if (sp & 0x7)
> + goto unknown;
> +
This has been split in a separate commit:
s390/unwind: add stack pointer alignment sanity checks

> + /*
> + * The reliable unwinding should not start on nodat_stack, async_stack
> + * or restart_stack. The task is either current or must be inactive.
> + */
> + if (unwind_reliable)
> + goto unknown;
I moved this check in arch_stack_walk_reliable itself. See below.

> static bool unwind_use_regs(struct unwind_state *state)
> @@ -85,10 +94,13 @@ static bool unwind_use_frame(struct unwind_state *state, unsigned long sp,
> struct stack_frame *sf;
> unsigned long ip;
>
> - if (unlikely(outside_of_stack(state, sp))) {
> - if (!update_stack_info(state, sp))
> - goto out_err;
> - }
> + /*
> + * No need to update stack info when unwind_reliable is true. We should
> + * be on a task stack and everything else is an error.
> + */
> + if (unlikely(outside_of_stack(state, sp)) &&
> + (unwind_reliable || !update_stack_info(state, sp)))
> + goto out_err;
I moved this check in arch_stack_walk_reliable itself. See below.

> + /* Unwind reliable */
> + if ((unsigned long)regs != info->end - sizeof(struct pt_regs))
> + goto out_err;
I moved this check in arch_stack_walk_reliable itself. See below.


> @@ -136,8 +162,17 @@ bool unwind_next_frame(struct unwind_state *state, bool unwind_reliable)
> sf = (struct stack_frame *) state->sp;
> sp = READ_ONCE_NOCHECK(sf->back_chain);
>
> - /* Non-zero back-chain points to the previous frame */
> - if (likely(sp))
> + /*
> + * Non-zero back-chain points to the previous frame
> + *
> + * unwind_reliable case: Idle tasks are special. The final
> + * back-chain points to nodat_stack. See CALL_ON_STACK() in
> + * smp_start_secondary() callback used in __cpu_up(). We just
> + * accept it and look for pt_regs.
> + */
> + if (likely(sp) &&
> + (!unwind_reliable || !(is_idle_task(state->task) &&
> + outside_of_stack(state, sp))))
> return unwind_use_frame(state, sp, unwind_reliable);
This is not needed anymore.

In the end this all boils down to arch_stack_walk_reliable
implementation. I made the following changes compaired to your version:
---
- use plain unwind_for_each_frame
- "state.stack_info.type != STACK_TYPE_TASK" guarantees that we are
not leaving task stack.
- "if (state.regs)" guarantees that we have not met an program interrupt
pt_regs (page faults) or preempted. Corresponds to yours:
> + if ((unsigned long)regs != info->end - sizeof(struct pt_regs))
> + goto out_err;
- I don't see a point in storing "kernel_thread_starter", am I missing
something?

diff --git a/arch/s390/kernel/stacktrace.c b/arch/s390/kernel/stacktrace.c
index 79f323388e4d..fc5419ac64c8 100644
--- a/arch/s390/kernel/stacktrace.c
+++ b/arch/s390/kernel/stacktrace.c
@@ -36,9 +36,12 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
struct unwind_state state;
unsigned long addr;

- for (unwind_start(&state, task, NULL, 0, true);
- !unwind_done(&state) && !unwind_error(&state);
- unwind_next_frame(&state, true)) {
+ unwind_for_each_frame(&state, task, NULL, 0) {
+ if (state.stack_info.type != STACK_TYPE_TASK)
+ return -EINVAL;
+
+ if (state.regs)
+ return -EINVAL;

addr = unwind_get_return_address(&state);
if (!addr)
@@ -60,11 +63,5 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
/* Check for stack corruption */
if (unwind_error(&state))
return -EINVAL;
-
- /* Store kernel_thread_starter, null for swapper/0 */
- if ((task->flags & (PF_KTHREAD | PF_IDLE)) &&
- !consume_entry(cookie, state.regs->psw.addr, false))
- return -EINVAL;
-
return 0;
}
--
I'd appreciate if you could give those changes a spin. And check if
something is missing or broken. Or share your livepatching testing
procedure. If everything goes well, I might include these changes in
second pull request for this 5.5 merge window.

I did successfully run ./testing/selftests/livepatch/test-livepatch.sh

https://github.com/lpechacek/qa_test_klp seems outdated. I was able to
fix and run some tests of it but haven't had time to figure out all of
them. Is there a version that would run on top of current Linus tree?

Since I changed your last patch and split it in 2, could you please give
me your Signed-off-by for those 2 commits?

2019-11-29 07:43:25

by Vasily Gorbik

[permalink] [raw]
Subject: [PATCH v4 1/2] s390/unwind: add stack pointer alignment sanity checks

From: Miroslav Benes <[email protected]>

ABI requires SP to be aligned 8 bytes, report unwinding error otherwise.

Signed-off-by: Vasily Gorbik <[email protected]>
---
arch/s390/kernel/dumpstack.c | 4 ++++
arch/s390/kernel/unwind_bc.c | 4 ++++
2 files changed, 8 insertions(+)

diff --git a/arch/s390/kernel/dumpstack.c b/arch/s390/kernel/dumpstack.c
index d74e21a23703..d306fe04489a 100644
--- a/arch/s390/kernel/dumpstack.c
+++ b/arch/s390/kernel/dumpstack.c
@@ -94,6 +94,10 @@ int get_stack_info(unsigned long sp, struct task_struct *task,
if (!sp)
goto unknown;

+ /* Sanity check: ABI requires SP to be aligned 8 bytes. */
+ if (sp & 0x7)
+ goto unknown;
+
/* Check per-task stack */
if (in_task_stack(sp, task, info))
goto recursion_check;
diff --git a/arch/s390/kernel/unwind_bc.c b/arch/s390/kernel/unwind_bc.c
index ef42d5f77ce7..da2d4d4c5b0e 100644
--- a/arch/s390/kernel/unwind_bc.c
+++ b/arch/s390/kernel/unwind_bc.c
@@ -92,6 +92,10 @@ bool unwind_next_frame(struct unwind_state *state)
}
}

+ /* Sanity check: ABI requires SP to be aligned 8 bytes. */
+ if (sp & 0x7)
+ goto out_err;
+
ip = ftrace_graph_ret_addr(state->task, &state->graph_idx, ip, (void *) sp);

/* Update unwind state */
--
2.21.0

2019-11-29 07:45:54

by Vasily Gorbik

[permalink] [raw]
Subject: [PATCH v4 2/2] s390/livepatch: Implement reliable stack tracing for the consistency model

From: Miroslav Benes <[email protected]>

The livepatch consistency model requires reliable stack tracing
architecture support in order to work properly. In order to achieve
this, two main issues have to be solved. First, reliable and consistent
call chain backtracing has to be ensured. Second, the unwinder needs to
be able to detect stack corruptions and return errors.

The "zSeries ELF Application Binary Interface Supplement" says:

"The stack pointer points to the first word of the lowest allocated
stack frame. If the "back chain" is implemented this word will point to
the previously allocated stack frame (towards higher addresses), except
for the first stack frame, which shall have a back chain of zero (NULL).
The stack shall grow downwards, in other words towards lower addresses."

"back chain" is optional. GCC option -mbackchain enables it. Quoting
Martin Schwidefsky [1]:

"The compiler is called with the -mbackchain option, all normal C
function will store the backchain in the function prologue. All
functions written in assembler code should do the same, if you find one
that does not we should fix that. The end result is that a task that
*voluntarily* called schedule() should have a proper backchain at all
times.

Dependent on the use case this may or may not be enough. Asynchronous
interrupts may stop the CPU at the beginning of a function, if kernel
preemption is enabled we can end up with a broken backchain. The
production kernels for IBM Z are all compiled *without* kernel
preemption. So yes, we might get away without the objtool support.

On a side-note, we do have a line item to implement the ORC unwinder for
the kernel, that includes the objtool support. Once we have that we can
drop the -mbackchain option for the kernel build. That gives us a nice
little performance benefit. I hope that the change from backchain to the
ORC unwinder will not be too hard to implement in the livepatch tools."

Since -mbackchain is enabled by default when the kernel is compiled, the
call chain backtracing should be currently ensured and objtool should
not be necessary for livepatch purposes.

Regarding the second issue, stack corruptions and non-reliable states
have to be recognized by the unwinder. Mainly it means to detect
preemption or page faults, the end of the task stack must be reached,
return addresses must be valid text addresses and hacks like function
graph tracing and kretprobes must be properly detected.

Unwinding a running task's stack is not a problem, because there is a
livepatch requirement that every checked task is blocked, except for the
current task. Due to that, we can consider a task's kernel/thread stack
only and skip the other stacks.

[1] 20180912121106.31ffa97c@mschwideX1 [not archived on lore.kernel.org]

Signed-off-by: Vasily Gorbik <[email protected]>
---
arch/s390/Kconfig | 1 +
arch/s390/kernel/stacktrace.c | 43 +++++++++++++++++++++++++++++++++++
2 files changed, 44 insertions(+)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 2528eb9d01fb..367a87c5d7b8 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -170,6 +170,7 @@ config S390
select HAVE_PERF_EVENTS
select HAVE_RCU_TABLE_FREE
select HAVE_REGS_AND_STACK_ACCESS_API
+ select HAVE_RELIABLE_STACKTRACE
select HAVE_RSEQ
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_VIRT_CPU_ACCOUNTING
diff --git a/arch/s390/kernel/stacktrace.c b/arch/s390/kernel/stacktrace.c
index f8fc4f8aef9b..fc5419ac64c8 100644
--- a/arch/s390/kernel/stacktrace.c
+++ b/arch/s390/kernel/stacktrace.c
@@ -9,6 +9,7 @@
#include <linux/stacktrace.h>
#include <asm/stacktrace.h>
#include <asm/unwind.h>
+#include <asm/kprobes.h>

void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
struct task_struct *task, struct pt_regs *regs)
@@ -22,3 +23,45 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
break;
}
}
+
+/*
+ * This function returns an error if it detects any unreliable features of the
+ * stack. Otherwise it guarantees that the stack trace is reliable.
+ *
+ * If the task is not 'current', the caller *must* ensure the task is inactive.
+ */
+int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
+ void *cookie, struct task_struct *task)
+{
+ struct unwind_state state;
+ unsigned long addr;
+
+ unwind_for_each_frame(&state, task, NULL, 0) {
+ if (state.stack_info.type != STACK_TYPE_TASK)
+ return -EINVAL;
+
+ if (state.regs)
+ return -EINVAL;
+
+ addr = unwind_get_return_address(&state);
+ if (!addr)
+ return -EINVAL;
+
+#ifdef CONFIG_KPROBES
+ /*
+ * Mark stacktraces with kretprobed functions on them
+ * as unreliable.
+ */
+ if (state.ip == (unsigned long)kretprobe_trampoline)
+ return -EINVAL;
+#endif
+
+ if (!consume_entry(cookie, addr, false))
+ return -EINVAL;
+ }
+
+ /* Check for stack corruption */
+ if (unwind_error(&state))
+ return -EINVAL;
+ return 0;
+}
--
2.21.0

2019-11-29 18:18:03

by Miroslav Benes

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] s390/livepatch: Implement reliable stack tracing for the consistency model

On Fri, 29 Nov 2019, Vasily Gorbik wrote:

> On Wed, Nov 06, 2019 at 10:56:01AM +0100, Miroslav Benes wrote:
> > The livepatch consistency model requires reliable stack tracing
> > architecture support in order to work properly. In order to achieve
> > this, two main issues have to be solved. First, reliable and consistent
> > call chain backtracing has to be ensured. Second, the unwinder needs to
> > be able to detect stack corruptions and return errors.
>
> I tried to address that in a patch series I pushed here:
> https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/log/?h=livepatch
>
> It partially includes your changes from this patch which have been split
> in 2 patches:
> s390/unwind: add stack pointer alignment sanity checks
> s390/livepatch: Implement reliable stack tracing for the consistency model
>
> The primary goal was to make our backchain unwinder reliable itself. And
> suitable for livepatching as is (we extra checks at the caller, see
> below). Besides unwinder changes few things have been improved to avoid
> special handling during unwinding.
> - all tasks now have pt_regs at the bottom of task stack.
> - final backchain always contains 0, no further references to no_dat stack.
> - successful unwinding means reaching pt_regs at the bottom of task stack,
> and unwinder guarantees that unwind_error() reflects that.
> - final pt_regs at the bottom of task stack is never included in unwinding
> results. It never was for user tasks. And kernel tasks cannot return to
> that state anyway (and in some cases pt_regs are empty).
> - unwinder starts unwinding from a reliable state (not "sp" passed as
> an argument).

Great. Thanks for doing that. It all looks good to me and the outcome is
definitely better. I obviously still had some misconceptions about the
code and it is much clearer now.

> There is also s390 specific unwinder testing module.
>
> > Idle tasks are a bit special. Their final back chains point to no_dat
> > stacks. See for reference CALL_ON_STACK() in smp_start_secondary()
> > callback used in __cpu_up(). The unwinding is stopped there and it is
> > not considered to be a stack corruption.
> I changed that with commit:
> s390: avoid misusing CALL_ON_STACK for task stack setup
> Now idle tasks are not special, final back chain contains 0.
>
> > ---
> > arch/s390/Kconfig | 1 +
> > arch/s390/kernel/dumpstack.c | 11 +++++++
> > arch/s390/kernel/stacktrace.c | 46 ++++++++++++++++++++++++++
> > arch/s390/kernel/unwind_bc.c | 61 +++++++++++++++++++++++++++--------
> > 4 files changed, 106 insertions(+), 13 deletions(-)
> >
> > --- a/arch/s390/kernel/dumpstack.c
> > +++ b/arch/s390/kernel/dumpstack.c
> > @@ -94,12 +94,23 @@ int get_stack_info(unsigned long sp, struct task_struct *task,
> > if (!sp)
> > goto unknown;
> >
> > + /* Sanity check: ABI requires SP to be aligned 8 bytes. */
> > + if (sp & 0x7)
> > + goto unknown;
> > +
> This has been split in a separate commit:
> s390/unwind: add stack pointer alignment sanity checks
>
> > + /*
> > + * The reliable unwinding should not start on nodat_stack, async_stack
> > + * or restart_stack. The task is either current or must be inactive.
> > + */
> > + if (unwind_reliable)
> > + goto unknown;
> I moved this check in arch_stack_walk_reliable itself. See below.
>
> > static bool unwind_use_regs(struct unwind_state *state)
> > @@ -85,10 +94,13 @@ static bool unwind_use_frame(struct unwind_state *state, unsigned long sp,
> > struct stack_frame *sf;
> > unsigned long ip;
> >
> > - if (unlikely(outside_of_stack(state, sp))) {
> > - if (!update_stack_info(state, sp))
> > - goto out_err;
> > - }
> > + /*
> > + * No need to update stack info when unwind_reliable is true. We should
> > + * be on a task stack and everything else is an error.
> > + */
> > + if (unlikely(outside_of_stack(state, sp)) &&
> > + (unwind_reliable || !update_stack_info(state, sp)))
> > + goto out_err;
> I moved this check in arch_stack_walk_reliable itself. See below.
>
> > + /* Unwind reliable */
> > + if ((unsigned long)regs != info->end - sizeof(struct pt_regs))
> > + goto out_err;
> I moved this check in arch_stack_walk_reliable itself. See below.
>
>
> > @@ -136,8 +162,17 @@ bool unwind_next_frame(struct unwind_state *state, bool unwind_reliable)
> > sf = (struct stack_frame *) state->sp;
> > sp = READ_ONCE_NOCHECK(sf->back_chain);
> >
> > - /* Non-zero back-chain points to the previous frame */
> > - if (likely(sp))
> > + /*
> > + * Non-zero back-chain points to the previous frame
> > + *
> > + * unwind_reliable case: Idle tasks are special. The final
> > + * back-chain points to nodat_stack. See CALL_ON_STACK() in
> > + * smp_start_secondary() callback used in __cpu_up(). We just
> > + * accept it and look for pt_regs.
> > + */
> > + if (likely(sp) &&
> > + (!unwind_reliable || !(is_idle_task(state->task) &&
> > + outside_of_stack(state, sp))))
> > return unwind_use_frame(state, sp, unwind_reliable);
> This is not needed anymore.
>
> In the end this all boils down to arch_stack_walk_reliable
> implementation. I made the following changes compaired to your version:
> ---
> - use plain unwind_for_each_frame
> - "state.stack_info.type != STACK_TYPE_TASK" guarantees that we are
> not leaving task stack.
> - "if (state.regs)" guarantees that we have not met an program interrupt
> pt_regs (page faults) or preempted. Corresponds to yours:
> > + if ((unsigned long)regs != info->end - sizeof(struct pt_regs))
> > + goto out_err;

Agreed. All this should be equivalent to the changes I made.

> - I don't see a point in storing "kernel_thread_starter", am I missing
> something?

No, you don't. It was just for the sake of completeness and it is not
worth it.

[...]

> I'd appreciate if you could give those changes a spin. And check if
> something is missing or broken. Or share your livepatching testing
> procedure. If everything goes well, I might include these changes in
> second pull request for this 5.5 merge window.
>
> I did successfully run ./testing/selftests/livepatch/test-livepatch.sh

'make run_tests' in tools/testing/selftests/livepatch/ would call all the
tests in the directory. Especially test-callbacks.sh which stresses the
livepatching even more.

> https://github.com/lpechacek/qa_test_klp seems outdated. I was able to
> fix and run some tests of it but haven't had time to figure out all of
> them. Is there a version that would run on top of current Linus tree?

Ah, sorry. I should have mentioned that. The code became outdated with
recent upstream changes. Libor is working on the updates (CCed).

Anyway, I ran it here and it all works fine.

Tested-by: Miroslav Benes <[email protected]>

Thanks a lot
Miroslav

2019-11-29 18:18:10

by Miroslav Benes

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] s390/unwind: add stack pointer alignment sanity checks

On Fri, 29 Nov 2019, Vasily Gorbik wrote:

> From: Miroslav Benes <[email protected]>
>
> ABI requires SP to be aligned 8 bytes, report unwinding error otherwise.
>
> Signed-off-by: Vasily Gorbik <[email protected]>

Signed-off-by: Miroslav Benes <[email protected]>

M

2019-11-29 18:20:04

by Miroslav Benes

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] s390/livepatch: Implement reliable stack tracing for the consistency model

On Fri, 29 Nov 2019, Vasily Gorbik wrote:

> From: Miroslav Benes <[email protected]>
>
> The livepatch consistency model requires reliable stack tracing
> architecture support in order to work properly. In order to achieve
> this, two main issues have to be solved. First, reliable and consistent
> call chain backtracing has to be ensured. Second, the unwinder needs to
> be able to detect stack corruptions and return errors.
>
> The "zSeries ELF Application Binary Interface Supplement" says:
>
> "The stack pointer points to the first word of the lowest allocated
> stack frame. If the "back chain" is implemented this word will point to
> the previously allocated stack frame (towards higher addresses), except
> for the first stack frame, which shall have a back chain of zero (NULL).
> The stack shall grow downwards, in other words towards lower addresses."
>
> "back chain" is optional. GCC option -mbackchain enables it. Quoting
> Martin Schwidefsky [1]:
>
> "The compiler is called with the -mbackchain option, all normal C
> function will store the backchain in the function prologue. All
> functions written in assembler code should do the same, if you find one
> that does not we should fix that. The end result is that a task that
> *voluntarily* called schedule() should have a proper backchain at all
> times.
>
> Dependent on the use case this may or may not be enough. Asynchronous
> interrupts may stop the CPU at the beginning of a function, if kernel
> preemption is enabled we can end up with a broken backchain. The
> production kernels for IBM Z are all compiled *without* kernel
> preemption. So yes, we might get away without the objtool support.
>
> On a side-note, we do have a line item to implement the ORC unwinder for
> the kernel, that includes the objtool support. Once we have that we can
> drop the -mbackchain option for the kernel build. That gives us a nice
> little performance benefit. I hope that the change from backchain to the
> ORC unwinder will not be too hard to implement in the livepatch tools."
>
> Since -mbackchain is enabled by default when the kernel is compiled, the
> call chain backtracing should be currently ensured and objtool should
> not be necessary for livepatch purposes.
>
> Regarding the second issue, stack corruptions and non-reliable states
> have to be recognized by the unwinder. Mainly it means to detect
> preemption or page faults, the end of the task stack must be reached,
> return addresses must be valid text addresses and hacks like function
> graph tracing and kretprobes must be properly detected.
>
> Unwinding a running task's stack is not a problem, because there is a
> livepatch requirement that every checked task is blocked, except for the
> current task. Due to that, we can consider a task's kernel/thread stack
> only and skip the other stacks.
>
> [1] 20180912121106.31ffa97c@mschwideX1 [not archived on lore.kernel.org]
>
> Signed-off-by: Vasily Gorbik <[email protected]>

Signed-off-by: Miroslav Benes <[email protected]>

M

2019-12-11 13:46:58

by Libor Pechacek

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] s390/livepatch: Implement reliable stack tracing for the consistency model

On Fri 29-11-19 19:16:38, Miroslav Benes wrote:
> On Fri, 29 Nov 2019, Vasily Gorbik wrote:
[...]
> > https://github.com/lpechacek/qa_test_klp seems outdated. I was able to
> > fix and run some tests of it but haven't had time to figure out all of
> > them. Is there a version that would run on top of current Linus tree?
>
> Ah, sorry. I should have mentioned that. The code became outdated with
> recent upstream changes. Libor is working on the updates (CCed).

qa_test_klp has been refreshed. Feel free to test and report issues.

Thanks!

Libor
--
Libor Pechacek
SUSE Labs Remember to have fun...