2016-03-25 19:36:12

by Josh Poimboeuf

Subject: [RFC PATCH v1.9 00/14] livepatch: hybrid consistency model

These patches are still a work in progress, but Jiri asked that I share
them before I go on vacation next week. Based on origin/master because
it has CONFIG_STACK_VALIDATION.

This has two consistency models: the immediate model (like in today's
code) and the new kpatch/kgraft hybrid model.

The default is the hybrid model: kgraft's per-task consistency and
syscall barrier switching combined with kpatch's stack trace switching.
There are also a number of fallback options which make it pretty
flexible, yet the code is still quite simple.

Patches are applied on a per-task basis: each task is switched over
once it's deemed safe to do so. A tiered approach determines when a
task is safe and can be switched.

The first wave of attack is stack checking of sleeping tasks. If no
affected functions are on the stack of a given task, the task is
switched. In most cases this will patch most or all of the tasks on the
first try. Otherwise it'll keep trying periodically. This option is
only available if the architecture has reliable stacks
(CONFIG_RELIABLE_STACKTRACE and CONFIG_STACK_VALIDATION).

The next line of attack, if needed, is syscall/IRQ switching. A task is
switched when it returns from a system call or a user space IRQ. This
approach is less than ideal because it usually requires signaling tasks
to get them to switch. It also doesn't work for kthreads. But it's
still useful in some cases when user tasks get stuck sleeping on an
affected function.

For architectures which don't have reliable stacks, users have two
options:

a) use the hybrid fallback option of using only the syscall/IRQ
switching (which is the default);

b) or they can use the immediate model (which is the model we already
have today) by setting the patch->immediate flag.

There's also a func->immediate flag which allows users to specify that
certain functions in the patch can be applied without per-task
consistency. This might be useful if you want to patch a common
function like schedule(), and the function change doesn't need
consistency but the rest of the patch does.
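
For illustration, a patch module might set these flags roughly as
follows. This is a minimal sketch: the function names and replacement
bodies are made up, and it follows the usual livepatch sample module
pattern (no module_exit, since patch module removal is still on the
TODO list):

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/livepatch.h>

/* replacement functions, defined elsewhere in the patch module */
extern void livepatch_schedule(void);
extern int livepatch_some_driver_fn(int arg);

static struct klp_func funcs[] = {
        {
                /* the schedule() change doesn't need per-task consistency */
                .old_name = "schedule",
                .new_func = livepatch_schedule,
                .immediate = true,
        },
        {
                /* this change does, so it gets the per-task transition */
                .old_name = "some_driver_fn",
                .new_func = livepatch_some_driver_fn,
        },
        { }
};

static struct klp_object objs[] = {
        {
                /* NULL name means vmlinux */
                .funcs = funcs,
        },
        { }
};

static struct klp_patch patch = {
        .mod = THIS_MODULE,
        .objs = objs,
        /* set to true instead to bypass per-task consistency entirely */
        .immediate = false,
};

static int patch_init(void)
{
        int ret;

        ret = klp_register_patch(&patch);
        if (ret)
                return ret;

        ret = klp_enable_patch(&patch);
        if (ret) {
                WARN_ON(klp_unregister_patch(&patch));
                return ret;
        }

        return 0;
}

module_init(patch_init);
MODULE_LICENSE("GPL");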

Still have a lot of TODOs; some of them are listed here. If you see
something objectionable, it might be a good idea to make sure it's not
already on the TODO list :-)

TODO:
- actually test it
- don't use TIF_KLP_NEED_UPDATE in arch-independent code
- figure out new API to keep the use of task_rq_lock() in the sched code
- cleaner way to detect preemption on the stack
- allow patch modules to be removed. still needs more discussion and
thought. maybe something like Miroslav's patch would be good:
https://lkml.kernel.org/r/[email protected]
- come up with a better name than universe? KLP_STATE_PREV/NEXT?
KLP_UNPATCHED/PATCHED? there were some objections to the name in v1.
- update documentation for sysfs, proc, livepatch
- need atomic accesses or READ_ONCE/WRITE_ONCE anywhere?
- ability to force a patch to the goal universe
- try ftrace handler switching idea from v1 cover letter
- split up the patches better
- cc all the right people

v1.9 changes:
- revive from the dead and rebased
- reliable stacks!
- add support for immediate consistency model
- add a ton of comments
- fix up memory barriers
- remove "allow patch modules to be removed" patch for now, it still
needs more discussion and thought - it can be done with something
- "proc/pid/universe" -> "proc/pid/patch_status"
- remove WARN_ON_ONCE from !func condition in ftrace handler -- can
happen because of RCU
- keep klp_mutex private by putting the work_fn in core.c
- convert states from int to boolean
- remove obsolete '@state' comments
- several header file and include improvements suggested by Jiri S
- change kallsyms_lookup_size_offset() errors from EINVAL -> ENOENT
- change proc file permissions S_IRUGO -> S_IRUSR
- use klp_for_each_object/func helpers

v1 was here:
https://lkml.kernel.org/r/[email protected]

Josh Poimboeuf (14):
x86/asm/head: cleanup initial stack variable
x86/asm/head: use a common function for starting CPUs
x86/asm/head: standardize the bottom of the stack for idle tasks
x86: move _stext marker before head code
sched: horrible way to detect whether a task has been preempted
x86: add error handling to dump_trace()
x86/stacktrace: add function for detecting reliable stack traces
livepatch: separate enabled and patched states
livepatch: remove unnecessary object loaded check
livepatch: move patching functions into patch.c
livepatch: store function sizes
livepatch: create per-task consistency model
livepatch: add /proc/<pid>/patch_status
livepatch: update task universe when exiting kernel

arch/Kconfig | 6 +
arch/x86/Kconfig | 1 +
arch/x86/entry/common.c | 6 +-
arch/x86/include/asm/realmode.h | 2 +-
arch/x86/include/asm/smp.h | 3 -
arch/x86/include/asm/stacktrace.h | 36 +--
arch/x86/include/asm/thread_info.h | 2 +
arch/x86/kernel/acpi/sleep.c | 2 +-
arch/x86/kernel/dumpstack.c | 67 ++++--
arch/x86/kernel/dumpstack_32.c | 22 +-
arch/x86/kernel/dumpstack_64.c | 53 +++--
arch/x86/kernel/head_32.S | 8 +-
arch/x86/kernel/head_64.S | 35 ++-
arch/x86/kernel/smpboot.c | 2 +-
arch/x86/kernel/stacktrace.c | 32 +++
arch/x86/kernel/vmlinux.lds.S | 2 +-
fs/proc/base.c | 12 +
include/linux/livepatch.h | 49 ++++-
include/linux/sched.h | 14 +-
include/linux/stacktrace.h | 20 +-
kernel/fork.c | 2 +
kernel/livepatch/Makefile | 2 +-
kernel/livepatch/core.c | 316 +++++++++------------------
kernel/livepatch/patch.c | 226 +++++++++++++++++++
kernel/livepatch/patch.h | 33 +++
kernel/livepatch/transition.c | 435 +++++++++++++++++++++++++++++++++++++
kernel/livepatch/transition.h | 20 ++
kernel/sched/core.c | 28 ++-
kernel/sched/idle.c | 4 +
kernel/stacktrace.c | 4 +-
lib/Kconfig.debug | 6 +
31 files changed, 1135 insertions(+), 315 deletions(-)
create mode 100644 kernel/livepatch/patch.c
create mode 100644 kernel/livepatch/patch.h
create mode 100644 kernel/livepatch/transition.c
create mode 100644 kernel/livepatch/transition.h

--
2.4.3


2016-03-25 19:36:17

by Josh Poimboeuf

Subject: [RFC PATCH v1.9 06/14] x86: add error handling to dump_trace()

In preparation for being able to determine whether a given stack trace
is reliable, allow the stacktrace_ops functions to propagate errors to
dump_trace().
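
As a rough sketch of the intended usage (the real consumer of the
errors arrives later in the series): a nonzero return from ops->stack()
or ops->walk_stack() now aborts the walk and is returned by
dump_trace() instead of being silently dropped, so a caller with some
stacktrace_ops 'ops' and private 'data' can do:

        int ret = dump_trace(task, NULL, NULL, 0, ops, data);

        if (ret)
                return ret;     /* the stack couldn't be fully walked */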

Signed-off-by: Josh Poimboeuf <[email protected]>
---
arch/x86/include/asm/stacktrace.h | 36 +++++++++++++++-----------
arch/x86/kernel/dumpstack.c | 31 +++++++++++------------
arch/x86/kernel/dumpstack_32.c | 22 ++++++++++------
arch/x86/kernel/dumpstack_64.c | 53 ++++++++++++++++++++++++++-------------
4 files changed, 87 insertions(+), 55 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 7c247e7..a64523f3 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -14,26 +14,32 @@ extern int kstack_depth_to_print;
struct thread_info;
struct stacktrace_ops;

-typedef unsigned long (*walk_stack_t)(struct thread_info *tinfo,
- unsigned long *stack,
- unsigned long bp,
- const struct stacktrace_ops *ops,
- void *data,
- unsigned long *end,
- int *graph);
-
-extern unsigned long
+typedef int (*walk_stack_t)(struct thread_info *tinfo,
+ unsigned long *stack,
+ unsigned long *bp,
+ const struct stacktrace_ops *ops,
+ void *data,
+ unsigned long *end,
+ int *graph);
+
+extern int
print_context_stack(struct thread_info *tinfo,
- unsigned long *stack, unsigned long bp,
+ unsigned long *stack, unsigned long *bp,
const struct stacktrace_ops *ops, void *data,
unsigned long *end, int *graph);

-extern unsigned long
+extern int
print_context_stack_bp(struct thread_info *tinfo,
- unsigned long *stack, unsigned long bp,
+ unsigned long *stack, unsigned long *bp,
const struct stacktrace_ops *ops, void *data,
unsigned long *end, int *graph);

+extern int
+print_context_stack_reliable(struct thread_info *tinfo,
+ unsigned long *stack, unsigned long *bp,
+ const struct stacktrace_ops *ops, void *data,
+ unsigned long *end, int *graph);
+
/* Generic stack tracer with callbacks */

struct stacktrace_ops {
@@ -43,9 +49,9 @@ struct stacktrace_ops {
walk_stack_t walk_stack;
};

-void dump_trace(struct task_struct *tsk, struct pt_regs *regs,
- unsigned long *stack, unsigned long bp,
- const struct stacktrace_ops *ops, void *data);
+int dump_trace(struct task_struct *tsk, struct pt_regs *regs,
+ unsigned long *stack, unsigned long bp,
+ const struct stacktrace_ops *ops, void *data);

#ifdef CONFIG_X86_32
#define STACKSLOTS_PER_LINE 8
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 8efa57a..3b10518 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -92,23 +92,22 @@ static inline int valid_stack_ptr(struct thread_info *tinfo,
return p > t && p < t + THREAD_SIZE - size;
}

-unsigned long
-print_context_stack(struct thread_info *tinfo,
- unsigned long *stack, unsigned long bp,
- const struct stacktrace_ops *ops, void *data,
- unsigned long *end, int *graph)
+int print_context_stack(struct thread_info *tinfo,
+ unsigned long *stack, unsigned long *bp,
+ const struct stacktrace_ops *ops, void *data,
+ unsigned long *end, int *graph)
{
- struct stack_frame *frame = (struct stack_frame *)bp;
+ struct stack_frame *frame = (struct stack_frame *)*bp;

while (valid_stack_ptr(tinfo, stack, sizeof(*stack), end)) {
unsigned long addr;

addr = *stack;
if (__kernel_text_address(addr)) {
- if ((unsigned long) stack == bp + sizeof(long)) {
+ if ((unsigned long) stack == *bp + sizeof(long)) {
ops->address(data, addr, 1);
frame = frame->next_frame;
- bp = (unsigned long) frame;
+ *bp = (unsigned long) frame;
} else {
ops->address(data, addr, 0);
}
@@ -116,17 +115,16 @@ print_context_stack(struct thread_info *tinfo,
}
stack++;
}
- return bp;
+ return 0;
}
EXPORT_SYMBOL_GPL(print_context_stack);

-unsigned long
-print_context_stack_bp(struct thread_info *tinfo,
- unsigned long *stack, unsigned long bp,
- const struct stacktrace_ops *ops, void *data,
- unsigned long *end, int *graph)
+int print_context_stack_bp(struct thread_info *tinfo,
+ unsigned long *stack, unsigned long *bp,
+ const struct stacktrace_ops *ops, void *data,
+ unsigned long *end, int *graph)
{
- struct stack_frame *frame = (struct stack_frame *)bp;
+ struct stack_frame *frame = (struct stack_frame *)*bp;
unsigned long *ret_addr = &frame->return_address;

while (valid_stack_ptr(tinfo, ret_addr, sizeof(*ret_addr), end)) {
@@ -142,7 +140,8 @@ print_context_stack_bp(struct thread_info *tinfo,
print_ftrace_graph_addr(addr, data, ops, tinfo, graph);
}

- return (unsigned long)frame;
+ *bp = (unsigned long)frame;
+ return 0;
}
EXPORT_SYMBOL_GPL(print_context_stack_bp);

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 464ffd6..e710bab 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -38,13 +38,14 @@ static void *is_softirq_stack(unsigned long *stack, int cpu)
return is_irq_stack(stack, irq);
}

-void dump_trace(struct task_struct *task, struct pt_regs *regs,
- unsigned long *stack, unsigned long bp,
- const struct stacktrace_ops *ops, void *data)
+int dump_trace(struct task_struct *task, struct pt_regs *regs,
+ unsigned long *stack, unsigned long bp,
+ const struct stacktrace_ops *ops, void *data)
{
const unsigned cpu = get_cpu();
int graph = 0;
u32 *prev_esp;
+ int ret;

if (!task)
task = current;
@@ -69,8 +70,10 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
end_stack = is_softirq_stack(stack, cpu);

context = task_thread_info(task);
- bp = ops->walk_stack(context, stack, bp, ops, data,
- end_stack, &graph);
+ ret = ops->walk_stack(context, stack, &bp, ops, data,
+ end_stack, &graph);
+ if (ret)
+ goto out;

/* Stop if not on irq stack */
if (!end_stack)
@@ -82,11 +85,16 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
if (!stack)
break;

- if (ops->stack(data, "IRQ") < 0)
- break;
+ ret = ops->stack(data, "IRQ");
+ if (ret)
+ goto out;
+
touch_nmi_watchdog();
}
+
+out:
put_cpu();
+ return ret;
}
EXPORT_SYMBOL(dump_trace);

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 5f1c626..0c810ba 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -148,9 +148,9 @@ analyze_stack(int cpu, struct task_struct *task, unsigned long *stack,
* severe exception (double fault, nmi, stack fault, debug, mce) hardware stack
*/

-void dump_trace(struct task_struct *task, struct pt_regs *regs,
- unsigned long *stack, unsigned long bp,
- const struct stacktrace_ops *ops, void *data)
+int dump_trace(struct task_struct *task, struct pt_regs *regs,
+ unsigned long *stack, unsigned long bp,
+ const struct stacktrace_ops *ops, void *data)
{
const unsigned cpu = get_cpu();
struct thread_info *tinfo;
@@ -159,6 +159,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
unsigned used = 0;
int graph = 0;
int done = 0;
+ int ret;

if (!task)
task = current;
@@ -198,13 +199,18 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
break;

case STACK_IS_EXCEPTION:
-
- if (ops->stack(data, id) < 0)
- break;
-
- bp = ops->walk_stack(tinfo, stack, bp, ops,
- data, stack_end, &graph);
- ops->stack(data, "<EOE>");
+ ret = ops->stack(data, id);
+ if (ret)
+ goto out;
+
+ ret = ops->walk_stack(tinfo, stack, &bp, ops, data,
+ stack_end, &graph);
+ if (ret)
+ goto out;
+
+ ret = ops->stack(data, "<EOE>");
+ if (ret)
+ goto out;
/*
* We link to the next stack via the
* second-to-last pointer (index -2 to end) in the
@@ -215,11 +221,15 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
break;

case STACK_IS_IRQ:
+ ret = ops->stack(data, "IRQ");
+ if (ret)
+ goto out;
+
+ ret = ops->walk_stack(tinfo, stack, &bp, ops, data,
+ stack_end, &graph);
+ if (ret)
+ goto out;

- if (ops->stack(data, "IRQ") < 0)
- break;
- bp = ops->walk_stack(tinfo, stack, bp,
- ops, data, stack_end, &graph);
/*
* We link to the next stack (which would be
* the process stack normally) the last
@@ -227,12 +237,18 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
*/
stack = (unsigned long *) (stack_end[-1]);
irq_stack = NULL;
- ops->stack(data, "EOI");
+
+ ret = ops->stack(data, "EOI");
+ if (ret)
+ goto out;
+
done = 0;
break;

case STACK_IS_UNKNOWN:
- ops->stack(data, "UNK");
+ ret = ops->stack(data, "UNK");
+ if (ret)
+ goto out;
break;
}
}
@@ -240,8 +256,11 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
/*
* This handles the process stack:
*/
- bp = ops->walk_stack(tinfo, stack, bp, ops, data, NULL, &graph);
+ ret = ops->walk_stack(tinfo, stack, &bp, ops, data, NULL, &graph);
+
+out:
put_cpu();
+ return ret;
}
EXPORT_SYMBOL(dump_trace);

--
2.4.3

2016-03-25 19:36:22

by Josh Poimboeuf

Subject: [RFC PATCH v1.9 11/14] livepatch: store function sizes

For the consistency model we'll need to know the sizes of the old and
new functions to determine if they're on stacks of any tasks.
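
A minimal sketch of the kind of range check these sizes enable (the
real check arrives with the consistency model patch later in the
series):

        /* is this stack return address inside the old function? */
        if (addr >= func->old_addr && addr < func->old_addr + func->old_size)
                return -EAGAIN; /* the task still uses the old code */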

Signed-off-by: Josh Poimboeuf <[email protected]>
---
include/linux/livepatch.h | 3 +++
kernel/livepatch/core.c | 16 ++++++++++++++++
2 files changed, 19 insertions(+)

diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index 6d45dc7..dd5db74 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -37,6 +37,8 @@
* @old_addr: the address of the function being patched
* @kobj: kobject for sysfs resources
* @stack_node: list node for klp_ops func_stack list
+ * @old_size: size of the old function
+ * @new_size: size of the new function
* @patched: the func has been added to the klp_ops list
*/
struct klp_func {
@@ -56,6 +58,7 @@ struct klp_func {
unsigned long old_addr;
struct kobject kobj;
struct list_head stack_node;
+ unsigned long old_size, new_size;
bool patched;
};

diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index beaf263..b0fd31d 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -540,6 +540,22 @@ static int klp_init_object_loaded(struct klp_patch *patch,
&func->old_addr);
if (ret)
return ret;
+
+ ret = kallsyms_lookup_size_offset(func->old_addr,
+ &func->old_size, NULL);
+ if (!ret) {
+ pr_err("kallsyms size lookup failed for '%s'\n",
+ func->old_name);
+ return -ENOENT;
+ }
+
+ ret = kallsyms_lookup_size_offset((unsigned long)func->new_func,
+ &func->new_size, NULL);
+ if (!ret) {
+ pr_err("kallsyms size lookup failed for '%s' replacement\n",
+ func->old_name);
+ return -ENOENT;
+ }
}

return 0;
--
2.4.3

2016-03-25 19:36:23

by Josh Poimboeuf

Subject: [RFC PATCH v1.9 13/14] livepatch: add /proc/<pid>/patch_status

Expose the per-task klp_universe value so users can determine which
tasks are holding up completion of a patching operation.

Call it "patch_status" rather than "universe": it's at least more
descriptive for the average user.
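
A user-space sketch of the assumed usage (not part of this patch): as
root, list the tasks whose patch_status hasn't yet reached the goal
universe (1 == new, 0 == old, matching the KLP_UNIVERSE_* values):

#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
        int goal = argc > 1 ? atoi(argv[1]) : 1;  /* default: new universe */
        DIR *proc = opendir("/proc");
        struct dirent *de;
        char path[64];
        FILE *f;
        int status;

        if (!proc)
                return 1;

        while ((de = readdir(proc))) {
                if (!isdigit(de->d_name[0]))
                        continue;

                snprintf(path, sizeof(path), "/proc/%s/patch_status",
                         de->d_name);
                f = fopen(path, "r");
                if (!f)
                        continue;

                if (fscanf(f, "%d", &status) == 1 && status != goal)
                        printf("pid %s: patch_status %d\n",
                               de->d_name, status);

                fclose(f);
        }

        closedir(proc);
        return 0;
}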

Signed-off-by: Josh Poimboeuf <[email protected]>
---
fs/proc/base.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index b1755b2..f90998b 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2801,6 +2801,15 @@ static int proc_pid_personality(struct seq_file *m, struct pid_namespace *ns,
return err;
}

+#ifdef CONFIG_LIVEPATCH
+static int proc_pid_patch_status(struct seq_file *m, struct pid_namespace *ns,
+ struct pid *pid, struct task_struct *task)
+{
+ seq_printf(m, "%d\n", task->klp_universe);
+ return 0;
+}
+#endif /* CONFIG_LIVEPATCH */
+
/*
* Thread groups
*/
@@ -2900,6 +2909,9 @@ static const struct pid_entry tgid_base_stuff[] = {
REG("timers", S_IRUGO, proc_timers_operations),
#endif
REG("timerslack_ns", S_IRUGO|S_IWUGO, proc_pid_set_timerslack_ns_operations),
+#ifdef CONFIG_LIVEPATCH
+ ONE("patch_status", S_IRUSR, proc_pid_patch_status),
+#endif
};

static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx)
--
2.4.3

2016-03-25 19:36:20

by Josh Poimboeuf

Subject: [RFC PATCH v1.9 10/14] livepatch: move patching functions into patch.c

Move functions related to the actual patching of functions and objects
into a new patch.c file.

Signed-off-by: Josh Poimboeuf <[email protected]>
---
kernel/livepatch/Makefile | 2 +-
kernel/livepatch/core.c | 174 +------------------------------------------
kernel/livepatch/patch.c | 185 ++++++++++++++++++++++++++++++++++++++++++++++
kernel/livepatch/patch.h | 32 ++++++++
4 files changed, 219 insertions(+), 174 deletions(-)
create mode 100644 kernel/livepatch/patch.c
create mode 100644 kernel/livepatch/patch.h

diff --git a/kernel/livepatch/Makefile b/kernel/livepatch/Makefile
index e8780c0..e136dad 100644
--- a/kernel/livepatch/Makefile
+++ b/kernel/livepatch/Makefile
@@ -1,3 +1,3 @@
obj-$(CONFIG_LIVEPATCH) += livepatch.o

-livepatch-objs := core.o
+livepatch-objs := core.o patch.o
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index 1f70500..beaf263 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -24,30 +24,11 @@
#include <linux/kernel.h>
#include <linux/mutex.h>
#include <linux/slab.h>
-#include <linux/ftrace.h>
#include <linux/list.h>
#include <linux/kallsyms.h>
#include <linux/livepatch.h>
#include <asm/cacheflush.h>
-
-/**
- * struct klp_ops - structure for tracking registered ftrace ops structs
- *
- * A single ftrace_ops is shared between all enabled replacement functions
- * (klp_func structs) which have the same old_addr. This allows the switch
- * between function versions to happen instantaneously by updating the klp_ops
- * struct's func_stack list. The winner is the klp_func at the top of the
- * func_stack (front of the list).
- *
- * @node: node for the global klp_ops list
- * @func_stack: list head for the stack of klp_func's (active func is on top)
- * @fops: registered ftrace ops struct
- */
-struct klp_ops {
- struct list_head node;
- struct list_head func_stack;
- struct ftrace_ops fops;
-};
+#include "patch.h"

/*
* The klp_mutex protects the global lists and state transitions of any
@@ -58,25 +39,9 @@ struct klp_ops {
static DEFINE_MUTEX(klp_mutex);

static LIST_HEAD(klp_patches);
-static LIST_HEAD(klp_ops);

static struct kobject *klp_root_kobj;

-static struct klp_ops *klp_find_ops(unsigned long old_addr)
-{
- struct klp_ops *ops;
- struct klp_func *func;
-
- list_for_each_entry(ops, &klp_ops, node) {
- func = list_first_entry(&ops->func_stack, struct klp_func,
- stack_node);
- if (func->old_addr == old_addr)
- return ops;
- }
-
- return NULL;
-}
-
static bool klp_is_module(struct klp_object *obj)
{
return obj->name;
@@ -277,143 +242,6 @@ out:
return ret;
}

-static void notrace klp_ftrace_handler(unsigned long ip,
- unsigned long parent_ip,
- struct ftrace_ops *fops,
- struct pt_regs *regs)
-{
- struct klp_ops *ops;
- struct klp_func *func;
-
- ops = container_of(fops, struct klp_ops, fops);
-
- rcu_read_lock();
- func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
- stack_node);
- if (WARN_ON_ONCE(!func))
- goto unlock;
-
- klp_arch_set_pc(regs, (unsigned long)func->new_func);
-unlock:
- rcu_read_unlock();
-}
-
-static void klp_unpatch_func(struct klp_func *func)
-{
- struct klp_ops *ops;
-
- if (WARN_ON(!func->patched))
- return;
- if (WARN_ON(!func->old_addr))
- return;
-
- ops = klp_find_ops(func->old_addr);
- if (WARN_ON(!ops))
- return;
-
- if (list_is_singular(&ops->func_stack)) {
- WARN_ON(unregister_ftrace_function(&ops->fops));
- WARN_ON(ftrace_set_filter_ip(&ops->fops, func->old_addr, 1, 0));
-
- list_del_rcu(&func->stack_node);
- list_del(&ops->node);
- kfree(ops);
- } else {
- list_del_rcu(&func->stack_node);
- }
-
- func->patched = false;
-}
-
-static int klp_patch_func(struct klp_func *func)
-{
- struct klp_ops *ops;
- int ret;
-
- if (WARN_ON(!func->old_addr))
- return -EINVAL;
-
- if (WARN_ON(func->patched))
- return -EINVAL;
-
- ops = klp_find_ops(func->old_addr);
- if (!ops) {
- ops = kzalloc(sizeof(*ops), GFP_KERNEL);
- if (!ops)
- return -ENOMEM;
-
- ops->fops.func = klp_ftrace_handler;
- ops->fops.flags = FTRACE_OPS_FL_SAVE_REGS |
- FTRACE_OPS_FL_DYNAMIC |
- FTRACE_OPS_FL_IPMODIFY;
-
- list_add(&ops->node, &klp_ops);
-
- INIT_LIST_HEAD(&ops->func_stack);
- list_add_rcu(&func->stack_node, &ops->func_stack);
-
- ret = ftrace_set_filter_ip(&ops->fops, func->old_addr, 0, 0);
- if (ret) {
- pr_err("failed to set ftrace filter for function '%s' (%d)\n",
- func->old_name, ret);
- goto err;
- }
-
- ret = register_ftrace_function(&ops->fops);
- if (ret) {
- pr_err("failed to register ftrace handler for function '%s' (%d)\n",
- func->old_name, ret);
- ftrace_set_filter_ip(&ops->fops, func->old_addr, 1, 0);
- goto err;
- }
-
-
- } else {
- list_add_rcu(&func->stack_node, &ops->func_stack);
- }
-
- func->patched = true;
-
- return 0;
-
-err:
- list_del_rcu(&func->stack_node);
- list_del(&ops->node);
- kfree(ops);
- return ret;
-}
-
-static void klp_unpatch_object(struct klp_object *obj)
-{
- struct klp_func *func;
-
- klp_for_each_func(obj, func)
- if (func->patched)
- klp_unpatch_func(func);
-
- obj->patched = false;
-}
-
-static int klp_patch_object(struct klp_object *obj)
-{
- struct klp_func *func;
- int ret;
-
- if (WARN_ON(obj->patched))
- return -EINVAL;
-
- klp_for_each_func(obj, func) {
- ret = klp_patch_func(func);
- if (ret) {
- klp_unpatch_object(obj);
- return ret;
- }
- }
- obj->patched = true;
-
- return 0;
-}
-
static int __klp_disable_patch(struct klp_patch *patch)
{
struct klp_object *obj;
diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
new file mode 100644
index 0000000..92e9ee0
--- /dev/null
+++ b/kernel/livepatch/patch.c
@@ -0,0 +1,185 @@
+/*
+ * patch.c - livepatch patching functions
+ *
+ * Copyright (C) 2014 Seth Jennings <[email protected]>
+ * Copyright (C) 2014 SUSE
+ * Copyright (C) 2015 Josh Poimboeuf <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/livepatch.h>
+#include <linux/list.h>
+#include <linux/ftrace.h>
+#include <linux/rculist.h>
+#include <linux/slab.h>
+#include <linux/bug.h>
+#include <linux/printk.h>
+#include "patch.h"
+
+static LIST_HEAD(klp_ops);
+
+struct klp_ops *klp_find_ops(unsigned long old_addr)
+{
+ struct klp_ops *ops;
+ struct klp_func *func;
+
+ list_for_each_entry(ops, &klp_ops, node) {
+ func = list_first_entry(&ops->func_stack, struct klp_func,
+ stack_node);
+ if (func->old_addr == old_addr)
+ return ops;
+ }
+
+ return NULL;
+}
+
+static void notrace klp_ftrace_handler(unsigned long ip,
+ unsigned long parent_ip,
+ struct ftrace_ops *fops,
+ struct pt_regs *regs)
+{
+ struct klp_ops *ops;
+ struct klp_func *func;
+
+ ops = container_of(fops, struct klp_ops, fops);
+
+ rcu_read_lock();
+ func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
+ stack_node);
+ if (WARN_ON_ONCE(!func))
+ goto unlock;
+
+ klp_arch_set_pc(regs, (unsigned long)func->new_func);
+unlock:
+ rcu_read_unlock();
+}
+
+static void klp_unpatch_func(struct klp_func *func)
+{
+ struct klp_ops *ops;
+
+ if (WARN_ON(!func->patched))
+ return;
+ if (WARN_ON(!func->old_addr))
+ return;
+
+ ops = klp_find_ops(func->old_addr);
+ if (WARN_ON(!ops))
+ return;
+
+ if (list_is_singular(&ops->func_stack)) {
+ WARN_ON(unregister_ftrace_function(&ops->fops));
+ WARN_ON(ftrace_set_filter_ip(&ops->fops, func->old_addr, 1, 0));
+
+ list_del_rcu(&func->stack_node);
+ list_del(&ops->node);
+ kfree(ops);
+ } else {
+ list_del_rcu(&func->stack_node);
+ }
+
+ func->patched = false;
+}
+
+static int klp_patch_func(struct klp_func *func)
+{
+ struct klp_ops *ops;
+ int ret;
+
+ if (WARN_ON(!func->old_addr))
+ return -EINVAL;
+
+ if (WARN_ON(func->patched))
+ return -EINVAL;
+
+ ops = klp_find_ops(func->old_addr);
+ if (!ops) {
+ ops = kzalloc(sizeof(*ops), GFP_KERNEL);
+ if (!ops)
+ return -ENOMEM;
+
+ ops->fops.func = klp_ftrace_handler;
+ ops->fops.flags = FTRACE_OPS_FL_SAVE_REGS |
+ FTRACE_OPS_FL_DYNAMIC |
+ FTRACE_OPS_FL_IPMODIFY;
+
+ list_add(&ops->node, &klp_ops);
+
+ INIT_LIST_HEAD(&ops->func_stack);
+ list_add_rcu(&func->stack_node, &ops->func_stack);
+
+ ret = ftrace_set_filter_ip(&ops->fops, func->old_addr, 0, 0);
+ if (ret) {
+ pr_err("failed to set ftrace filter for function '%s' (%d)\n",
+ func->old_name, ret);
+ goto err;
+ }
+
+ ret = register_ftrace_function(&ops->fops);
+ if (ret) {
+ pr_err("failed to register ftrace handler for function '%s' (%d)\n",
+ func->old_name, ret);
+ ftrace_set_filter_ip(&ops->fops, func->old_addr, 1, 0);
+ goto err;
+ }
+
+
+ } else {
+ list_add_rcu(&func->stack_node, &ops->func_stack);
+ }
+
+ func->patched = true;
+
+ return 0;
+
+err:
+ list_del_rcu(&func->stack_node);
+ list_del(&ops->node);
+ kfree(ops);
+ return ret;
+}
+
+void klp_unpatch_object(struct klp_object *obj)
+{
+ struct klp_func *func;
+
+ klp_for_each_func(obj, func)
+ if (func->patched)
+ klp_unpatch_func(func);
+
+ obj->patched = false;
+}
+
+int klp_patch_object(struct klp_object *obj)
+{
+ struct klp_func *func;
+ int ret;
+
+ if (WARN_ON(obj->patched))
+ return -EINVAL;
+
+ klp_for_each_func(obj, func) {
+ ret = klp_patch_func(func);
+ if (ret) {
+ klp_unpatch_object(obj);
+ return ret;
+ }
+ }
+ obj->patched = true;
+
+ return 0;
+}
diff --git a/kernel/livepatch/patch.h b/kernel/livepatch/patch.h
new file mode 100644
index 0000000..2d0cce0
--- /dev/null
+++ b/kernel/livepatch/patch.h
@@ -0,0 +1,32 @@
+#ifndef _LIVEPATCH_PATCH_H
+#define _LIVEPATCH_PATCH_H
+
+#include <linux/livepatch.h>
+#include <linux/list.h>
+#include <linux/ftrace.h>
+
+/**
+ * struct klp_ops - structure for tracking registered ftrace ops structs
+ *
+ * A single ftrace_ops is shared between all enabled replacement functions
+ * (klp_func structs) which have the same old_addr. This allows the switch
+ * between function versions to happen instantaneously by updating the klp_ops
+ * struct's func_stack list. The winner is the klp_func at the top of the
+ * func_stack (front of the list).
+ *
+ * @node: node for the global klp_ops list
+ * @func_stack: list head for the stack of klp_func's (active func is on top)
+ * @fops: registered ftrace ops struct
+ */
+struct klp_ops {
+ struct list_head node;
+ struct list_head func_stack;
+ struct ftrace_ops fops;
+};
+
+struct klp_ops *klp_find_ops(unsigned long old_addr);
+
+int klp_patch_object(struct klp_object *obj);
+void klp_unpatch_object(struct klp_object *obj);
+
+#endif /* _LIVEPATCH_PATCH_H */
--
2.4.3

2016-03-25 19:36:45

by Josh Poimboeuf

Subject: [RFC PATCH v1.9 14/14] livepatch: update task universe when exiting kernel

Update a task's universe when returning from a system call or user
space interrupt, or after handling a signal.

This greatly increases the chances of a patch operation succeeding. If
a task is I/O bound, it can switch universes when returning from a
system call. If a task is CPU bound, it can switch universes when
returning from an interrupt. If a task is sleeping on a to-be-patched
function, the user can send SIGSTOP and SIGCONT to force it to switch.

Since the idle "swapper" tasks don't ever exit the kernel, they're
updated from within the idle loop.

Signed-off-by: Josh Poimboeuf <[email protected]>
---
arch/x86/entry/common.c | 6 +++++-
arch/x86/include/asm/thread_info.h | 2 ++
include/linux/livepatch.h | 2 ++
kernel/livepatch/transition.c | 37 +++++++++++++++++++++++++++++++++----
kernel/sched/idle.c | 4 ++++
5 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index e79d93d..94639dd 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -21,6 +21,7 @@
#include <linux/context_tracking.h>
#include <linux/user-return-notifier.h>
#include <linux/uprobes.h>
+#include <linux/livepatch.h>

#include <asm/desc.h>
#include <asm/traps.h>
@@ -202,7 +203,7 @@ long syscall_trace_enter(struct pt_regs *regs)

#define EXIT_TO_USERMODE_LOOP_FLAGS \
(_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE | \
- _TIF_NEED_RESCHED | _TIF_USER_RETURN_NOTIFY)
+ _TIF_NEED_RESCHED | _TIF_USER_RETURN_NOTIFY | _TIF_KLP_NEED_UPDATE)

static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
{
@@ -236,6 +237,9 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
if (cached_flags & _TIF_USER_RETURN_NOTIFY)
fire_user_return_notifiers();

+ if (unlikely(cached_flags & _TIF_KLP_NEED_UPDATE))
+ klp_update_task_universe(current);
+
/* Disable IRQs and retry */
local_irq_disable();

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 8286669..4e3ea6f 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -97,6 +97,7 @@ struct thread_info {
#define TIF_SECCOMP 8 /* secure computing */
#define TIF_USER_RETURN_NOTIFY 11 /* notify kernel of userspace return */
#define TIF_UPROBE 12 /* breakpointed or singlestepping */
+#define TIF_KLP_NEED_UPDATE 13 /* pending live patching update */
#define TIF_NOTSC 16 /* TSC is not accessible in userland */
#define TIF_IA32 17 /* IA32 compatibility process */
#define TIF_FORK 18 /* ret_from_fork */
@@ -120,6 +121,7 @@ struct thread_info {
#define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
#define _TIF_SECCOMP (1 << TIF_SECCOMP)
#define _TIF_USER_RETURN_NOTIFY (1 << TIF_USER_RETURN_NOTIFY)
+#define _TIF_KLP_NEED_UPDATE (1 << TIF_KLP_NEED_UPDATE)
#define _TIF_UPROBE (1 << TIF_UPROBE)
#define _TIF_NOTSC (1 << TIF_NOTSC)
#define _TIF_IA32 (1 << TIF_IA32)
diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index 4d2e26d..29964ac 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -152,6 +152,8 @@ extern int klp_universe_goal;
*/
static inline void klp_update_task_universe(struct task_struct *task)
{
+ clear_tsk_thread_flag(task, TIF_KLP_NEED_UPDATE);
+
/*
* The corresponding write barriers are in klp_init_transition() and
* klp_start_transition(). See the comments there for an explanation.
diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 0609d84..8a38247 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -277,6 +277,9 @@ success:
*/
void klp_start_transition(int universe)
{
+ struct task_struct *g, *task;
+ unsigned int cpu;
+
if (WARN_ON(klp_universe_goal == universe))
return;

@@ -293,12 +296,38 @@ void klp_start_transition(int universe)
klp_universe_goal = universe;

/*
- * Enforce the ordering of the universe goal write with later
- * task universe writes which are done via
- * klp_try_complete_transition(). The corresponding read barrier is in
- * klp_update_task_universe().
+ * Ensure that if another CPU goes through the syscall barrier, sees
+ * the TIF_KLP_NEED_UPDATE write below, and calls
+ * klp_update_task_universe(), it also sees the above write to the
+ * universe goal. Otherwise it can put the task in the wrong universe.
*/
smp_wmb();
+
+ /*
+ * If the patch can be applied or reverted immediately, skip the
+ * per-task transitions.
+ */
+ if (klp_transition_patch->immediate)
+ return;
+
+ /*
+ * Mark all normal tasks as needing a universe update. As they pass
+ * through the syscall barrier they'll switch over to the goal universe
+ * (unless we switch them in klp_try_complete_transition() first).
+ */
+ read_lock(&tasklist_lock);
+ for_each_process_thread(g, task)
+ set_tsk_thread_flag(task, TIF_KLP_NEED_UPDATE);
+ read_unlock(&tasklist_lock);
+
+ /*
+ * Ditto for the idle "swapper" tasks, though they never cross the
+ * syscall barrier.
+ */
+ get_online_cpus();
+ for_each_online_cpu(cpu)
+ set_tsk_thread_flag(idle_task(cpu), TIF_KLP_NEED_UPDATE);
+ put_online_cpus();
}

/*
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index bd12c6c..94bdad9 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -9,6 +9,7 @@
#include <linux/mm.h>
#include <linux/stackprotector.h>
#include <linux/suspend.h>
+#include <linux/livepatch.h>

#include <asm/tlb.h>

@@ -266,6 +267,9 @@ static void cpu_idle_loop(void)

sched_ttwu_pending();
schedule_preempt_disabled();
+
+ if (unlikely(test_thread_flag(TIF_KLP_NEED_UPDATE)))
+ klp_update_task_universe(current);
}
}

--
2.4.3

2016-03-25 19:37:09

by Josh Poimboeuf

Subject: [RFC PATCH v1.9 12/14] livepatch: create per-task consistency model

Add a basic per-task consistency model. This is the foundation which
will eventually enable us to patch those ~10% of security patches which
change function prototypes and/or data semantics.

When a patch is enabled, livepatch enters into a transition state where
tasks are converging from the old universe to the new universe. If a
given task isn't using any of the patched functions, it's switched to
the new universe. Once all the tasks have been converged to the new
universe, patching is complete.

The same sequence occurs when a patch is disabled, except the tasks
converge from the new universe to the old universe.

The /sys/kernel/livepatch/<patch>/transition file shows whether a patch
is in transition. Only a single patch (the topmost patch on the stack)
can be in transition at a given time. A patch can remain in the
transition state indefinitely, if any of the tasks are stuck in the
previous universe.

A transition can be reversed and effectively canceled by writing the
opposite value to the /sys/kernel/livepatch/<patch>/enabled file while
the transition is in progress. Then all the tasks will attempt to
converge back to the original universe.

Signed-off-by: Josh Poimboeuf <[email protected]>
---
include/linux/livepatch.h | 27 +++
include/linux/sched.h | 3 +
kernel/fork.c | 2 +
kernel/livepatch/Makefile | 2 +-
kernel/livepatch/core.c | 99 +++++++---
kernel/livepatch/patch.c | 43 ++++-
kernel/livepatch/patch.h | 1 +
kernel/livepatch/transition.c | 406 ++++++++++++++++++++++++++++++++++++++++++
kernel/livepatch/transition.h | 20 +++
kernel/sched/core.c | 2 +
10 files changed, 582 insertions(+), 23 deletions(-)
create mode 100644 kernel/livepatch/transition.c
create mode 100644 kernel/livepatch/transition.h

diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index dd5db74..4d2e26d 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -34,12 +34,14 @@
* @new_func: pointer to the patched function code
* @old_sympos: a hint indicating which symbol position the old function
* can be found (optional)
+ * @immediate: patch the func immediately, bypassing backtrace safety checks
* @old_addr: the address of the function being patched
* @kobj: kobject for sysfs resources
* @stack_node: list node for klp_ops func_stack list
* @old_size: size of the old function
* @new_size: size of the new function
* @patched: the func has been added to the klp_ops list
+ * @transition: the func is currently being applied or reverted
*/
struct klp_func {
/* external */
@@ -53,6 +55,7 @@ struct klp_func {
* in kallsyms for the given object is used.
*/
unsigned long old_sympos;
+ bool immediate;

/* internal */
unsigned long old_addr;
@@ -60,6 +63,7 @@ struct klp_func {
struct list_head stack_node;
unsigned long old_size, new_size;
bool patched;
+ bool transition;
};

/**
@@ -106,6 +110,7 @@ struct klp_object {
* struct klp_patch - patch structure for live patching
* @mod: reference to the live patch module
* @objs: object entries for kernel objects to be patched
+ * @immediate: patch all funcs immediately, bypassing safety mechanisms
* @list: list node for global list of registered patches
* @kobj: kobject for sysfs resources
* @enabled: the patch is enabled (but operation may be incomplete)
@@ -114,6 +119,7 @@ struct klp_patch {
/* external */
struct module *mod;
struct klp_object *objs;
+ bool immediate;

/* internal */
struct list_head list;
@@ -136,11 +142,32 @@ int klp_disable_patch(struct klp_patch *);
int klp_module_coming(struct module *mod);
void klp_module_going(struct module *mod);

+extern int klp_universe_goal;
+/*
+ * klp_update_task_universe() - change the patched state of a task
+ * @task: The task to change
+ *
+ * Converts the patched state of the task so that it will switch to the set of
+ * functions in the goal universe.
+ */
+static inline void klp_update_task_universe(struct task_struct *task)
+{
+ /*
+ * The corresponding write barriers are in klp_init_transition() and
+ * klp_start_transition(). See the comments there for an explanation.
+ */
+ smp_rmb();
+
+ task->klp_universe = klp_universe_goal;
+}
+
#else /* !CONFIG_LIVEPATCH */

static inline int klp_module_coming(struct module *mod) { return 0; }
static inline void klp_module_going(struct module *mod) { }

+static inline void klp_update_task_universe(struct task_struct *task) {}
+
#endif /* CONFIG_LIVEPATCH */

#endif /* _LINUX_LIVEPATCH_H_ */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 62d0961..c27286f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1848,6 +1848,9 @@ struct task_struct {
unsigned long task_state_change;
#endif
int pagefault_disabled;
+#ifdef CONFIG_LIVEPATCH
+ int klp_universe;
+#endif
/* CPU-specific state of this task */
struct thread_struct thread;
/*
diff --git a/kernel/fork.c b/kernel/fork.c
index d277e83..27b181e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -76,6 +76,7 @@
#include <linux/compiler.h>
#include <linux/sysctl.h>
#include <linux/kcov.h>
+#include <linux/livepatch.h>

#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -1615,6 +1616,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
total_forks++;
spin_unlock(&current->sighand->siglock);
syscall_tracepoint_update(p);
+ klp_update_task_universe(p);
write_unlock_irq(&tasklist_lock);

proc_fork_connector(p);
diff --git a/kernel/livepatch/Makefile b/kernel/livepatch/Makefile
index e136dad..2b8bdb1 100644
--- a/kernel/livepatch/Makefile
+++ b/kernel/livepatch/Makefile
@@ -1,3 +1,3 @@
obj-$(CONFIG_LIVEPATCH) += livepatch.o

-livepatch-objs := core.o patch.o
+livepatch-objs := core.o patch.o transition.o
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index b0fd31d..19afa9b 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -27,14 +27,18 @@
#include <linux/list.h>
#include <linux/kallsyms.h>
#include <linux/livepatch.h>
+#include <linux/stacktrace.h>
#include <asm/cacheflush.h>
#include "patch.h"
+#include "transition.h"

/*
- * The klp_mutex protects the global lists and state transitions of any
- * structure reachable from them. References to any structure must be obtained
- * under mutex protection (except in klp_ftrace_handler(), which uses RCU to
- * ensure it gets consistent data).
+ * klp_mutex is a coarse lock which serializes access to klp data. All
+ * accesses to klp-related variables and structures must have mutex protection,
+ * except within the following functions which carefully avoid the need for it:
+ *
+ * - klp_ftrace_handler()
+ * - klp_update_task_universe()
*/
static DEFINE_MUTEX(klp_mutex);

@@ -42,6 +46,30 @@ static LIST_HEAD(klp_patches);

static struct kobject *klp_root_kobj;

+static void klp_work_fn(struct work_struct *work);
+static DECLARE_DELAYED_WORK(klp_work, klp_work_fn);
+
+static void klp_schedule_work(void)
+{
+ if (IS_ENABLED(CONFIG_RELIABLE_STACKTRACE))
+ schedule_delayed_work(&klp_work, round_jiffies_relative(HZ));
+}
+
+/*
+ * This work can be performed periodically to finish patching or unpatching any
+ * "straggler" tasks which failed to transition in klp_enable_patch().
+ */
+static void klp_work_fn(struct work_struct *work)
+{
+ mutex_lock(&klp_mutex);
+
+ if (klp_transition_patch)
+ if (!klp_try_complete_transition())
+ klp_schedule_work();
+
+ mutex_unlock(&klp_mutex);
+}
+
static bool klp_is_module(struct klp_object *obj)
{
return obj->name;
@@ -80,7 +108,6 @@ static void klp_find_object_module(struct klp_object *obj)
mutex_unlock(&module_mutex);
}

-/* klp_mutex must be held by caller */
static bool klp_is_patch_registered(struct klp_patch *patch)
{
struct klp_patch *mypatch;
@@ -244,19 +271,18 @@ out:

static int __klp_disable_patch(struct klp_patch *patch)
{
- struct klp_object *obj;
+ if (klp_transition_patch)
+ return -EBUSY;

/* enforce stacking: only the last enabled patch can be disabled */
if (!list_is_last(&patch->list, &klp_patches) &&
list_next_entry(patch, list)->enabled)
return -EBUSY;

- pr_notice("disabling patch '%s'\n", patch->mod->name);
-
- klp_for_each_object(patch, obj) {
- if (obj->patched)
- klp_unpatch_object(obj);
- }
+ klp_init_transition(patch, KLP_UNIVERSE_NEW);
+ klp_start_transition(KLP_UNIVERSE_OLD);
+ if (!klp_try_complete_transition())
+ klp_schedule_work();

patch->enabled = false;

@@ -300,6 +326,9 @@ static int __klp_enable_patch(struct klp_patch *patch)
struct klp_object *obj;
int ret;

+ if (klp_transition_patch)
+ return -EBUSY;
+
if (WARN_ON(patch->enabled))
return -EINVAL;

@@ -311,24 +340,32 @@ static int __klp_enable_patch(struct klp_patch *patch)
pr_notice_once("tainting kernel with TAINT_LIVEPATCH\n");
add_taint(TAINT_LIVEPATCH, LOCKDEP_STILL_OK);

- pr_notice("enabling patch '%s'\n", patch->mod->name);
+ klp_init_transition(patch, KLP_UNIVERSE_OLD);

klp_for_each_object(patch, obj) {
if (!klp_is_object_loaded(obj))
continue;

ret = klp_patch_object(obj);
- if (ret)
- goto unregister;
+ if (ret) {
+ pr_warn("failed to enable patch '%s'\n",
+ patch->mod->name);
+
+ klp_unpatch_objects(patch);
+ klp_complete_transition();
+
+ return ret;
+ }
}

+ klp_start_transition(KLP_UNIVERSE_NEW);
+
+ if (!klp_try_complete_transition())
+ klp_schedule_work();
+
patch->enabled = true;

return 0;
-
-unregister:
- WARN_ON(__klp_disable_patch(patch));
- return ret;
}

/**
@@ -365,6 +402,7 @@ EXPORT_SYMBOL_GPL(klp_enable_patch);
* /sys/kernel/livepatch
* /sys/kernel/livepatch/<patch>
* /sys/kernel/livepatch/<patch>/enabled
+ * /sys/kernel/livepatch/<patch>/transition
* /sys/kernel/livepatch/<patch>/<object>
* /sys/kernel/livepatch/<patch>/<object>/<function,sympos>
*/
@@ -393,7 +431,9 @@ static ssize_t enabled_store(struct kobject *kobj, struct kobj_attribute *attr,
goto err;
}

- if (val) {
+ if (klp_transition_patch == patch) {
+ klp_reverse_transition();
+ } else if (val) {
ret = __klp_enable_patch(patch);
if (ret)
goto err;
@@ -421,9 +461,21 @@ static ssize_t enabled_show(struct kobject *kobj,
return snprintf(buf, PAGE_SIZE-1, "%d\n", patch->enabled);
}

+static ssize_t transition_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct klp_patch *patch;
+
+ patch = container_of(kobj, struct klp_patch, kobj);
+ return snprintf(buf, PAGE_SIZE-1, "%d\n",
+ klp_transition_patch == patch);
+}
+
static struct kobj_attribute enabled_kobj_attr = __ATTR_RW(enabled);
+static struct kobj_attribute transition_kobj_attr = __ATTR_RO(transition);
static struct attribute *klp_patch_attrs[] = {
&enabled_kobj_attr.attr,
+ &transition_kobj_attr.attr,
NULL
};

@@ -510,6 +562,7 @@ static int klp_init_func(struct klp_object *obj, struct klp_func *func)
{
INIT_LIST_HEAD(&func->stack_node);
func->patched = false;
+ func->transition = false;

/* The format for the sysfs directory is <function,sympos> where sympos
* is the nth occurrence of this symbol in kallsyms for the patched
@@ -738,7 +791,11 @@ int klp_module_coming(struct module *mod)
goto err;
}

- if (!patch->enabled)
+ /*
+ * Only patch the module if the patch is enabled or is
+ * in transition.
+ */
+ if (!patch->enabled && klp_transition_patch != patch)
break;

pr_notice("applying patch '%s' to loading module '%s'\n",
diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
index 92e9ee0..f0fa6b5 100644
--- a/kernel/livepatch/patch.c
+++ b/kernel/livepatch/patch.c
@@ -29,6 +29,7 @@
#include <linux/bug.h>
#include <linux/printk.h>
#include "patch.h"
+#include "transition.h"

static LIST_HEAD(klp_ops);

@@ -58,11 +59,42 @@ static void notrace klp_ftrace_handler(unsigned long ip,
ops = container_of(fops, struct klp_ops, fops);

rcu_read_lock();
+
func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
stack_node);
- if (WARN_ON_ONCE(!func))
+
+ if (!func)
goto unlock;

+ /*
+ * See the comment for the 2nd smp_wmb() in klp_init_transition() for
+ * an explanation of why this read barrier is needed.
+ */
+ smp_rmb();
+
+ if (unlikely(func->transition)) {
+
+ /*
+ * See the comment for the 1st smp_wmb() in
+ * klp_init_transition() for an explanation of why this read
+ * barrier is needed.
+ */
+ smp_rmb();
+
+ if (current->klp_universe == KLP_UNIVERSE_OLD) {
+ /*
+ * Use the previously patched version of the function.
+ * If no previous patches exist, use the original
+ * function.
+ */
+ func = list_entry_rcu(func->stack_node.next,
+ struct klp_func, stack_node);
+
+ if (&func->stack_node == &ops->func_stack)
+ goto unlock;
+ }
+ }
+
klp_arch_set_pc(regs, (unsigned long)func->new_func);
unlock:
rcu_read_unlock();
@@ -183,3 +215,12 @@ int klp_patch_object(struct klp_object *obj)

return 0;
}
+
+void klp_unpatch_objects(struct klp_patch *patch)
+{
+ struct klp_object *obj;
+
+ klp_for_each_object(patch, obj)
+ if (obj->patched)
+ klp_unpatch_object(obj);
+}
diff --git a/kernel/livepatch/patch.h b/kernel/livepatch/patch.h
index 2d0cce0..0db2271 100644
--- a/kernel/livepatch/patch.h
+++ b/kernel/livepatch/patch.h
@@ -28,5 +28,6 @@ struct klp_ops *klp_find_ops(unsigned long old_addr);

int klp_patch_object(struct klp_object *obj);
void klp_unpatch_object(struct klp_object *obj);
+void klp_unpatch_objects(struct klp_patch *patch);

#endif /* _LIVEPATCH_PATCH_H */
diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
new file mode 100644
index 0000000..0609d84
--- /dev/null
+++ b/kernel/livepatch/transition.c
@@ -0,0 +1,406 @@
+/*
+ * transition.c - Kernel Live Patching transition functions
+ *
+ * Copyright (C) 2015 Josh Poimboeuf <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/cpu.h>
+#include <linux/stacktrace.h>
+#include "../sched/sched.h"
+
+#include "patch.h"
+#include "transition.h"
+
+#define MAX_STACK_ENTRIES 100
+
+struct klp_patch *klp_transition_patch;
+
+int klp_universe_goal = KLP_UNIVERSE_UNDEFINED;
+
+/*
+ * The transition to the universe goal is complete. Clean up the data
+ * structures.
+ */
+void klp_complete_transition(void)
+{
+ struct klp_object *obj;
+ struct klp_func *func;
+
+ if (klp_transition_patch->immediate)
+ goto done;
+
+ klp_for_each_object(klp_transition_patch, obj)
+ klp_for_each_func(obj, func)
+ func->transition = false;
+
+done:
+ klp_transition_patch = NULL;
+}
+
+/*
+ * Determine whether the given stack trace includes any references to a
+ * to-be-patched or to-be-unpatched function.
+ */
+static int klp_check_stack_func(struct klp_func *func,
+ struct stack_trace *trace)
+{
+ unsigned long func_addr, func_size, address;
+ struct klp_ops *ops;
+ int i;
+
+ if (func->immediate)
+ return 0;
+
+ for (i = 0; i < trace->max_entries; i++) {
+ address = trace->entries[i];
+
+ if (klp_universe_goal == KLP_UNIVERSE_OLD) {
+ /*
+ * Check for the to-be-unpatched function
+ * (the func itself).
+ */
+ func_addr = (unsigned long)func->new_func;
+ func_size = func->new_size;
+ } else {
+ /*
+ * Check for the to-be-patched function
+ * (the previous func).
+ */
+ ops = klp_find_ops(func->old_addr);
+
+ if (list_is_singular(&ops->func_stack)) {
+ /* original function */
+ func_addr = func->old_addr;
+ func_size = func->old_size;
+ } else {
+ /* previously patched function */
+ struct klp_func *prev;
+
+ prev = list_next_entry(func, stack_node);
+ func_addr = (unsigned long)prev->new_func;
+ func_size = prev->new_size;
+ }
+ }
+
+ if (address >= func_addr && address < func_addr + func_size)
+ return -EAGAIN;
+ }
+
+ return 0;
+}
+
+/*
+ * Determine whether it's safe to transition the task to the new universe by
+ * looking for any to-be-patched or to-be-unpatched functions on its stack.
+ */
+static int klp_check_stack(struct task_struct *task)
+{
+ static unsigned long entries[MAX_STACK_ENTRIES];
+ struct stack_trace trace;
+ struct klp_object *obj;
+ struct klp_func *func;
+ int ret;
+
+ trace.skip = 0;
+ trace.nr_entries = 0;
+ trace.max_entries = MAX_STACK_ENTRIES;
+ trace.entries = entries;
+ ret = save_stack_trace_tsk_reliable(task, &trace);
+ WARN_ON_ONCE(ret == -ENOSYS);
+ if (ret) {
+ pr_debug("%s: pid %d (%s) has an unreliable stack\n",
+ __func__, task->pid, task->comm);
+ return ret;
+ }
+
+ klp_for_each_object(klp_transition_patch, obj) {
+ if (!obj->patched)
+ continue;
+ klp_for_each_func(obj, func) {
+ ret = klp_check_stack_func(func, &trace);
+ if (ret) {
+ pr_debug("%s: pid %d (%s) is sleeping on function %s\n",
+ __func__, task->pid, task->comm,
+ func->old_name);
+ return ret;
+ }
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * Try to safely switch a task to the universe goal. If it's currently
+ * running, or it's sleeping on a to-be-patched or to-be-unpatched function, or
+ * if the stack is unreliable, return false.
+ */
+static bool klp_try_switch_task(struct task_struct *task)
+{
+ struct rq *rq;
+ unsigned long flags;
+ int ret;
+ bool success = false;
+
+ /* check if this task has already switched over */
+ if (task->klp_universe == klp_universe_goal)
+ return true;
+
+ /*
+ * For arches which don't have reliable stack traces, we have to rely
+ * on other methods (e.g., switching tasks at the syscall barrier).
+ */
+ if (!IS_ENABLED(CONFIG_RELIABLE_STACKTRACE))
+ return false;
+
+ /*
+ * Now try to check the stack for any to-be-patched or to-be-unpatched
+ * functions. If all goes well, switch the task to the goal universe.
+ */
+ rq = task_rq_lock(task, &flags);
+
+ if (task_running(rq, task) && task != current) {
+ pr_debug("%s: pid %d (%s) is running\n", __func__, task->pid,
+ task->comm);
+ goto done;
+ }
+
+ ret = klp_check_stack(task);
+ if (ret)
+ goto done;
+
+ klp_update_task_universe(task);
+
+ success = true;
+done:
+ task_rq_unlock(rq, task, &flags);
+ return success;
+}
+
+/*
+ * Try to switch all remaining tasks to the goal universe by walking the stacks
+ * of sleeping tasks and looking for any to-be-patched or to-be-unpatched
+ * functions. If such functions are found, the task can't be switched yet.
+ *
+ * If any tasks are still stuck in the starting universe, schedule a retry.
+ */
+bool klp_try_complete_transition(void)
+{
+ unsigned int cpu;
+ struct task_struct *g, *task;
+ bool complete = true;
+
+ /*
+ * If the patch can be applied or reverted immediately, skip the
+ * per-task transitions.
+ */
+ if (klp_transition_patch->immediate)
+ goto success;
+
+ /*
+ * Try to switch the tasks to the goal universe by walking their stacks
+ * and looking for any to-be-patched or to-be-unpatched functions. If
+ * such functions are found on a stack, or if the stack is deemed
+ * unreliable, the task can't be switched yet.
+ *
+ * Usually this will transition most (or all) of the tasks on a system
+ * unless the patch includes changes to a very common function.
+ */
+ read_lock(&tasklist_lock);
+ for_each_process_thread(g, task)
+ if (!klp_try_switch_task(task))
+ complete = false;
+ read_unlock(&tasklist_lock);
+
+ /*
+ * Ditto for the idle "swapper" tasks.
+ */
+ get_online_cpus();
+ for_each_online_cpu(cpu)
+ if (!klp_try_switch_task(idle_task(cpu)))
+ complete = false;
+ put_online_cpus();
+
+ /*
+ * Some tasks weren't able to be switched over. Try again later and/or
+ * wait for other methods like syscall barrier switching.
+ */
+ if (!complete)
+ return false;
+
+success:
+ /*
+ * When unpatching, all tasks have transitioned to the old universe so
+ * we can now remove the new functions from the func_stack.
+ */
+ if (klp_universe_goal == KLP_UNIVERSE_OLD) {
+ klp_unpatch_objects(klp_transition_patch);
+
+ /*
+ * Don't allow any existing instances of ftrace handlers to
+ * access any obsolete funcs before we reset the func
+ * transition states to false. Otherwise the handler may see
+ * the deleted "new" func, see that it's not in transition, and
+ * wrongly pick the new version of the function.
+ */
+ synchronize_rcu();
+ }
+
+ pr_notice("'%s': %s complete\n", klp_transition_patch->mod->name,
+ klp_universe_goal == KLP_UNIVERSE_NEW ? "patching" :
+ "unpatching");
+
+ /* we're done, now cleanup the data structures */
+ klp_complete_transition();
+
+ return true;
+}
+
+/*
+ * Start the transition to the specified universe goal so tasks can begin
+ * switching to it.
+ */
+void klp_start_transition(int universe)
+{
+ if (WARN_ON(klp_universe_goal == universe))
+ return;
+
+ pr_notice("'%s': %s...\n", klp_transition_patch->mod->name,
+ universe == KLP_UNIVERSE_NEW ? "patching" : "unpatching");
+
+ /*
+ * Set the global universe goal which tasks will switch to.
+ *
+ * Note that any newly forked tasks after this call will be born in the
+ * goal universe. So the transition begins here, even before we start
+ * switching tasks.
+ */
+ klp_universe_goal = universe;
+
+ /*
+ * Enforce the ordering of the universe goal write with later
+ * task universe writes which are done via
+ * klp_try_complete_transition(). The corresponding read barrier is in
+ * klp_update_task_universe().
+ */
+ smp_wmb();
+}
+
+/*
+ * This function can be called in the middle of an existing transition to
+ * reverse the direction of the universe goal. This can be done to effectively
+ * cancel an existing enable or disable operation if there are any tasks which
+ * are stuck in the starting universe.
+ */
+void klp_reverse_transition(void)
+{
+ struct klp_patch *patch = klp_transition_patch;
+
+ klp_start_transition(!klp_universe_goal);
+ klp_try_complete_transition();
+
+ patch->enabled = !patch->enabled;
+}
+
+/*
+ * Set the global universe goal and all tasks to the starting universe, and
+ * initialize all function transition states to true in preparation for
+ * patching or unpatching.
+ */
+void klp_init_transition(struct klp_patch *patch, int universe)
+{
+ struct task_struct *g, *task;
+ unsigned int cpu;
+ struct klp_object *obj;
+ struct klp_func *func;
+
+ klp_transition_patch = patch;
+
+ /*
+ * Initialize the universe goal to the starting universe.
+ */
+ klp_universe_goal = universe;
+
+ /*
+ * Ensure that if another CPU forks a task after the below task
+ * universe writes and calls klp_update_task_universe(), it also sees
+ * the above write to the universe goal. Otherwise it may undo the
+ * task universe writes below and mess up the task's starting universe.
+ */
+ smp_wmb();
+
+ /*
+ * If the patch can be applied or reverted immediately, skip the
+ * per-task transitions.
+ */
+ if (patch->immediate)
+ return;
+
+ /*
+ * Initialize the task universes to their starting universe to prepare
+ * them for switching to the goal universe.
+ */
+ read_lock(&tasklist_lock);
+ for_each_process_thread(g, task)
+ klp_update_task_universe(task);
+ read_unlock(&tasklist_lock);
+
+ /*
+ * Ditto for the idle "swapper" tasks.
+ */
+ get_online_cpus();
+ for_each_online_cpu(cpu)
+ klp_update_task_universe(idle_task(cpu));
+ put_online_cpus();
+
+ /*
+ * Ensure klp_ftrace_handler() sees the task->klp_universe updates
+ * before the func->transition updates. Otherwise it could read an
+ * out-of-date task universe and pick the wrong function.
+ */
+ smp_wmb();
+
+ /*
+ * Set the func transition states so klp_ftrace_handler() will know to
+ * switch to the transition logic.
+ *
+ * When patching, the funcs aren't yet in the func_stack and will be
+ * made visible to the ftrace handler shortly by the calls to
+ * klp_patch_object().
+ *
+ * When unpatching, the funcs are already in the func_stack and so are
+ * already visible to the ftrace handler.
+ */
+ klp_for_each_object(patch, obj)
+ klp_for_each_func(obj, func)
+ func->transition = true;
+
+ /*
+ * For the enable path, ensure klp_ftrace_handler() will see the
+ * func->transition updates before the funcs become visible to the
+ * handler. Otherwise the handler may wrongly pick the new func before
+ * the task switches to the new universe.
+ *
+ * For the disable path, the funcs are already visible to the handler.
+ * But we still need to ensure the ftrace handler will see the
+ * func->transition updates before the tasks start switching to the old
+ * universe. Otherwise the handler can miss a universe change which
+ * would result in it wrongly picking the new function.
+ */
+ smp_wmb();
+}
diff --git a/kernel/livepatch/transition.h b/kernel/livepatch/transition.h
new file mode 100644
index 0000000..98e9930
--- /dev/null
+++ b/kernel/livepatch/transition.h
@@ -0,0 +1,20 @@
+#ifndef _LIVEPATCH_TRANSITION_H
+#define _LIVEPATCH_TRANSITION_H
+
+#include <linux/livepatch.h>
+
+enum {
+ KLP_UNIVERSE_UNDEFINED = -1,
+ KLP_UNIVERSE_OLD,
+ KLP_UNIVERSE_NEW,
+};
+
+extern struct klp_patch *klp_transition_patch;
+
+extern void klp_init_transition(struct klp_patch *patch, int universe);
+extern void klp_start_transition(int universe);
+extern void klp_reverse_transition(void);
+extern bool klp_try_complete_transition(void);
+extern void klp_complete_transition(void);
+
+#endif /* _LIVEPATCH_TRANSITION_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index be1ef22..431007b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -74,6 +74,7 @@
#include <linux/context_tracking.h>
#include <linux/compiler.h>
#include <linux/frame.h>
+#include <linux/livepatch.h>

#include <asm/switch_to.h>
#include <asm/tlb.h>
@@ -5082,6 +5083,7 @@ void init_idle(struct task_struct *idle, int cpu)
#ifdef CONFIG_SMP
sprintf(idle->comm, "%s/%d", INIT_TASK_COMM, cpu);
#endif
+ klp_update_task_universe(idle);
}

int cpuset_cpumask_can_shrink(const struct cpumask *cur,
--
2.4.3

2016-03-25 19:37:46

by Josh Poimboeuf

[permalink] [raw]
Subject: [RFC PATCH v1.9 09/14] livepatch: remove unnecessary object loaded check

klp_patch_object()'s callers already ensure that the object is loaded,
so its call to klp_is_object_loaded() is unnecessary.

This will also make it possible to move the patching code into a
separate file.

Signed-off-by: Josh Poimboeuf <[email protected]>
---
kernel/livepatch/core.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index be1e106..1f70500 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -402,9 +402,6 @@ static int klp_patch_object(struct klp_object *obj)
if (WARN_ON(obj->patched))
return -EINVAL;

- if (WARN_ON(!klp_is_object_loaded(obj)))
- return -EINVAL;
-
klp_for_each_func(obj, func) {
ret = klp_patch_func(func);
if (ret) {
--
2.4.3

2016-03-25 19:37:59

by Josh Poimboeuf

[permalink] [raw]
Subject: [RFC PATCH v1.9 08/14] livepatch: separate enabled and patched states

Once we have a consistency model, patches and their objects will be
enabled and disabled at different times. For example, when a patch is
disabled, its loaded objects' funcs can remain registered with ftrace
indefinitely until the unpatching operation is complete and they're no
longer in use.

It's less confusing if we give them different names: patches can be
enabled or disabled; objects (and their funcs) can be patched or
unpatched:

- Enabled means that a patch is logically enabled (but not necessarily
fully applied).

- Patched means that an object's funcs are registered with ftrace and
added to the klp_ops func stack.

Also, since these states are binary, represent them with booleans
instead of ints.
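
Purely for illustration (not part of this patch), the split lets the two
flags legitimately disagree while an operation is in flight; a hypothetical
helper could detect that:

	/* hypothetical helper, just to illustrate the state split */
	static bool klp_patch_transition_pending(struct klp_patch *patch)
	{
		struct klp_object *obj;

		/*
		 * With a consistency model, "enabled" can flip as soon as
		 * the user requests it, while "patched" only follows once
		 * the funcs are (un)registered with ftrace, e.g. after the
		 * transition completes or after a module object is loaded.
		 */
		klp_for_each_object(patch, obj)
			if (obj->patched != patch->enabled)
				return true;

		return false;
	}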

Signed-off-by: Josh Poimboeuf <[email protected]>
---
include/linux/livepatch.h | 17 ++++-------
kernel/livepatch/core.c | 72 +++++++++++++++++++++++------------------------
2 files changed, 42 insertions(+), 47 deletions(-)

diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index bd830d5..6d45dc7 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -28,11 +28,6 @@

#include <asm/livepatch.h>

-enum klp_state {
- KLP_DISABLED,
- KLP_ENABLED
-};
-
/**
* struct klp_func - function structure for live patching
* @old_name: name of the function to be patched
@@ -41,8 +36,8 @@ enum klp_state {
* can be found (optional)
* @old_addr: the address of the function being patched
* @kobj: kobject for sysfs resources
- * @state: tracks function-level patch application state
* @stack_node: list node for klp_ops func_stack list
+ * @patched: the func has been added to the klp_ops list
*/
struct klp_func {
/* external */
@@ -60,8 +55,8 @@ struct klp_func {
/* internal */
unsigned long old_addr;
struct kobject kobj;
- enum klp_state state;
struct list_head stack_node;
+ bool patched;
};

/**
@@ -90,7 +85,7 @@ struct klp_reloc {
* @kobj: kobject for sysfs resources
* @mod: kernel module associated with the patched object
* (NULL for vmlinux)
- * @state: tracks object-level patch application state
+ * @patched: the object's funcs have been added to the klp_ops list
*/
struct klp_object {
/* external */
@@ -101,7 +96,7 @@ struct klp_object {
/* internal */
struct kobject kobj;
struct module *mod;
- enum klp_state state;
+ bool patched;
};

/**
@@ -110,7 +105,7 @@ struct klp_object {
* @objs: object entries for kernel objects to be patched
* @list: list node for global list of registered patches
* @kobj: kobject for sysfs resources
- * @state: tracks patch-level application state
+ * @enabled: the patch is enabled (but operation may be incomplete)
*/
struct klp_patch {
/* external */
@@ -120,7 +115,7 @@ struct klp_patch {
/* internal */
struct list_head list;
struct kobject kobj;
- enum klp_state state;
+ bool enabled;
};

#define klp_for_each_object(patch, obj) \
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index d68fbf6..be1e106 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -298,11 +298,11 @@ unlock:
rcu_read_unlock();
}

-static void klp_disable_func(struct klp_func *func)
+static void klp_unpatch_func(struct klp_func *func)
{
struct klp_ops *ops;

- if (WARN_ON(func->state != KLP_ENABLED))
+ if (WARN_ON(!func->patched))
return;
if (WARN_ON(!func->old_addr))
return;
@@ -322,10 +322,10 @@ static void klp_disable_func(struct klp_func *func)
list_del_rcu(&func->stack_node);
}

- func->state = KLP_DISABLED;
+ func->patched = false;
}

-static int klp_enable_func(struct klp_func *func)
+static int klp_patch_func(struct klp_func *func)
{
struct klp_ops *ops;
int ret;
@@ -333,7 +333,7 @@ static int klp_enable_func(struct klp_func *func)
if (WARN_ON(!func->old_addr))
return -EINVAL;

- if (WARN_ON(func->state != KLP_DISABLED))
+ if (WARN_ON(func->patched))
return -EINVAL;

ops = klp_find_ops(func->old_addr);
@@ -372,7 +372,7 @@ static int klp_enable_func(struct klp_func *func)
list_add_rcu(&func->stack_node, &ops->func_stack);
}

- func->state = KLP_ENABLED;
+ func->patched = true;

return 0;

@@ -383,36 +383,36 @@ err:
return ret;
}

-static void klp_disable_object(struct klp_object *obj)
+static void klp_unpatch_object(struct klp_object *obj)
{
struct klp_func *func;

klp_for_each_func(obj, func)
- if (func->state == KLP_ENABLED)
- klp_disable_func(func);
+ if (func->patched)
+ klp_unpatch_func(func);

- obj->state = KLP_DISABLED;
+ obj->patched = false;
}

-static int klp_enable_object(struct klp_object *obj)
+static int klp_patch_object(struct klp_object *obj)
{
struct klp_func *func;
int ret;

- if (WARN_ON(obj->state != KLP_DISABLED))
+ if (WARN_ON(obj->patched))
return -EINVAL;

if (WARN_ON(!klp_is_object_loaded(obj)))
return -EINVAL;

klp_for_each_func(obj, func) {
- ret = klp_enable_func(func);
+ ret = klp_patch_func(func);
if (ret) {
- klp_disable_object(obj);
+ klp_unpatch_object(obj);
return ret;
}
}
- obj->state = KLP_ENABLED;
+ obj->patched = true;

return 0;
}
@@ -423,17 +423,17 @@ static int __klp_disable_patch(struct klp_patch *patch)

/* enforce stacking: only the last enabled patch can be disabled */
if (!list_is_last(&patch->list, &klp_patches) &&
- list_next_entry(patch, list)->state == KLP_ENABLED)
+ list_next_entry(patch, list)->enabled)
return -EBUSY;

pr_notice("disabling patch '%s'\n", patch->mod->name);

klp_for_each_object(patch, obj) {
- if (obj->state == KLP_ENABLED)
- klp_disable_object(obj);
+ if (obj->patched)
+ klp_unpatch_object(obj);
}

- patch->state = KLP_DISABLED;
+ patch->enabled = false;

return 0;
}
@@ -457,7 +457,7 @@ int klp_disable_patch(struct klp_patch *patch)
goto err;
}

- if (patch->state == KLP_DISABLED) {
+ if (!patch->enabled) {
ret = -EINVAL;
goto err;
}
@@ -475,12 +475,12 @@ static int __klp_enable_patch(struct klp_patch *patch)
struct klp_object *obj;
int ret;

- if (WARN_ON(patch->state != KLP_DISABLED))
+ if (WARN_ON(patch->enabled))
return -EINVAL;

/* enforce stacking: only the first disabled patch can be enabled */
if (patch->list.prev != &klp_patches &&
- list_prev_entry(patch, list)->state == KLP_DISABLED)
+ !list_prev_entry(patch, list)->enabled)
return -EBUSY;

pr_notice_once("tainting kernel with TAINT_LIVEPATCH\n");
@@ -492,12 +492,12 @@ static int __klp_enable_patch(struct klp_patch *patch)
if (!klp_is_object_loaded(obj))
continue;

- ret = klp_enable_object(obj);
+ ret = klp_patch_object(obj);
if (ret)
goto unregister;
}

- patch->state = KLP_ENABLED;
+ patch->enabled = true;

return 0;

@@ -555,20 +555,20 @@ static ssize_t enabled_store(struct kobject *kobj, struct kobj_attribute *attr,
if (ret)
return -EINVAL;

- if (val != KLP_DISABLED && val != KLP_ENABLED)
+ if (val > 1)
return -EINVAL;

patch = container_of(kobj, struct klp_patch, kobj);

mutex_lock(&klp_mutex);

- if (val == patch->state) {
+ if (patch->enabled == val) {
/* already in requested state */
ret = -EINVAL;
goto err;
}

- if (val == KLP_ENABLED) {
+ if (val) {
ret = __klp_enable_patch(patch);
if (ret)
goto err;
@@ -593,7 +593,7 @@ static ssize_t enabled_show(struct kobject *kobj,
struct klp_patch *patch;

patch = container_of(kobj, struct klp_patch, kobj);
- return snprintf(buf, PAGE_SIZE-1, "%d\n", patch->state);
+ return snprintf(buf, PAGE_SIZE-1, "%d\n", patch->enabled);
}

static struct kobj_attribute enabled_kobj_attr = __ATTR_RW(enabled);
@@ -684,7 +684,7 @@ static void klp_free_patch(struct klp_patch *patch)
static int klp_init_func(struct klp_object *obj, struct klp_func *func)
{
INIT_LIST_HEAD(&func->stack_node);
- func->state = KLP_DISABLED;
+ func->patched = false;

/* The format for the sysfs directory is <function,sympos> where sympos
* is the nth occurrence of this symbol in kallsyms for the patched
@@ -729,7 +729,7 @@ static int klp_init_object(struct klp_patch *patch, struct klp_object *obj)
if (!obj->funcs)
return -EINVAL;

- obj->state = KLP_DISABLED;
+ obj->patched = false;
obj->mod = NULL;

klp_find_object_module(obj);
@@ -770,7 +770,7 @@ static int klp_init_patch(struct klp_patch *patch)

mutex_lock(&klp_mutex);

- patch->state = KLP_DISABLED;
+ patch->enabled = false;

ret = kobject_init_and_add(&patch->kobj, &klp_ktype_patch,
klp_root_kobj, "%s", patch->mod->name);
@@ -816,7 +816,7 @@ int klp_unregister_patch(struct klp_patch *patch)
goto out;
}

- if (patch->state == KLP_ENABLED) {
+ if (patch->enabled) {
ret = -EBUSY;
goto out;
}
@@ -897,13 +897,13 @@ int klp_module_coming(struct module *mod)
goto err;
}

- if (patch->state == KLP_DISABLED)
+ if (!patch->enabled)
break;

pr_notice("applying patch '%s' to loading module '%s'\n",
patch->mod->name, obj->mod->name);

- ret = klp_enable_object(obj);
+ ret = klp_patch_object(obj);
if (ret) {
pr_warn("failed to apply patch '%s' to module '%s' (%d)\n",
patch->mod->name, obj->mod->name, ret);
@@ -954,10 +954,10 @@ void klp_module_going(struct module *mod)
if (!klp_is_module(obj) || strcmp(obj->name, mod->name))
continue;

- if (patch->state != KLP_DISABLED) {
+ if (patch->enabled) {
pr_notice("reverting patch '%s' on unloading module '%s'\n",
patch->mod->name, obj->mod->name);
- klp_disable_object(obj);
+ klp_unpatch_object(obj);
}

klp_free_object_loaded(obj);
--
2.4.3

2016-03-25 19:36:15

by Josh Poimboeuf

[permalink] [raw]
Subject: [RFC PATCH v1.9 05/14] sched: horrible way to detect whether a task has been preempted

This is a horrible way to detect whether a task has been preempted.
Come up with something better: task flag? or is there already an
existing mechanism?
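
For context, the reliable stacktrace patch later in this series uses the new
helper roughly like this while walking a task's saved return addresses:

	if (in_preempt_schedule_irq(addr))
		return -EINVAL;	/* preemption frame => stack not reliable */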

Signed-off-by: Josh Poimboeuf <[email protected]>
---
include/linux/sched.h | 11 ++++++++++-
kernel/sched/core.c | 26 ++++++++++++++++++++++++--
2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 589c478..62d0961 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3239,4 +3239,13 @@ struct update_util_data {
void cpufreq_set_update_util_data(int cpu, struct update_util_data *data);
#endif /* CONFIG_CPU_FREQ */

-#endif
+#ifdef CONFIG_PREEMPT
+extern bool in_preempt_schedule_irq(unsigned long addr);
+#else
+static inline bool in_preempt_schedule_irq(unsigned long addr)
+{
+ return false;
+}
+#endif /* CONFIG_PREEMPT */
+
+#endif /* _LINUX_SCHED_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d8465ee..be1ef22 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3334,8 +3334,6 @@ asmlinkage __visible void __sched notrace preempt_schedule_notrace(void)
}
EXPORT_SYMBOL_GPL(preempt_schedule_notrace);

-#endif /* CONFIG_PREEMPT */
-
/*
* this is the entry point to schedule() from kernel preemption
* off of irq context.
@@ -3360,8 +3358,32 @@ asmlinkage __visible void __sched preempt_schedule_irq(void)
} while (need_resched());

exception_exit(prev_state);
+
+ asm("preempt_schedule_irq_end:");
}

+/*
+ * in_preempt_schedule_irq - determine if instruction address is inside the
+ * preempt_schedule_irq() function
+ *
+ * This is used when walking the stack of a task to determine whether an
+ * interrupt frame exists.
+ *
+ * NOTE: This function could return false if the address is in the function
+ * epilogue. But it's good enough for our purposes, because we only care about
+ * addresses which have been saved on a stack. If preempt_schedule_irq() is on
+ * the stack of a task, the saved address will always be prior to the epilogue.
+ */
+bool in_preempt_schedule_irq(unsigned long addr)
+{
+ extern void *preempt_schedule_irq_end;
+
+ return (addr >= (unsigned long)preempt_schedule_irq &&
+ addr < (unsigned long)preempt_schedule_irq_end);
+}
+
+#endif /* CONFIG_PREEMPT */
+
int default_wake_function(wait_queue_t *curr, unsigned mode, int wake_flags,
void *key)
{
--
2.4.3

2016-03-25 19:38:20

by Josh Poimboeuf

[permalink] [raw]
Subject: [RFC PATCH v1.9 07/14] x86/stacktrace: add function for detecting reliable stack traces

For live patching and possibly other use cases, a stack trace is only
useful if you can be assured that it's completely reliable. Add a new
save_stack_trace_tsk_reliable() function to achieve that.

Scenarios which indicate that a stack trace may be unreliable:

- interrupt stacks
- preemption
- corrupted stack data
- newly forked tasks
- running tasks
- the user didn't provide a large enough entries array

Also add a config option so arch-independent code can determine at build
time whether the function is implemented.
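
A minimal sketch of a possible caller (the function name and
MAX_STACK_ENTRIES are made up here, not part of this patch):

	#define MAX_STACK_ENTRIES 128

	static int check_task_stack(struct task_struct *task)
	{
		static unsigned long entries[MAX_STACK_ENTRIES];
		struct stack_trace trace = {
			.entries	= entries,
			.max_entries	= MAX_STACK_ENTRIES,
		};
		int ret;

		ret = save_stack_trace_tsk_reliable(task, &trace);
		if (ret == -ENOSYS)
			return ret;	/* arch has no reliable unwinder */
		if (ret)
			return ret;	/* unreliable trace, try again later */

		/* trace.entries[0 .. trace.nr_entries - 1] can be trusted */
		return 0;
	}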

Signed-off-by: Josh Poimboeuf <[email protected]>
---
arch/Kconfig | 6 ++++++
arch/x86/Kconfig | 1 +
arch/x86/kernel/dumpstack.c | 36 ++++++++++++++++++++++++++++++++++++
arch/x86/kernel/stacktrace.c | 32 ++++++++++++++++++++++++++++++++
include/linux/stacktrace.h | 20 ++++++++++++++++----
kernel/stacktrace.c | 4 ++--
lib/Kconfig.debug | 6 ++++++
7 files changed, 99 insertions(+), 6 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 81869a5..68b95f1 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -589,6 +589,12 @@ config HAVE_STACK_VALIDATION
Architecture supports the 'objtool check' host tool command, which
performs compile-time stack metadata validation.

+config HAVE_RELIABLE_STACKTRACE
+ bool
+ help
+ Architecture has a save_stack_trace_tsk_reliable() function which only
+ returns a stack trace if it can guarantee the trace is reliable.
+
#
# ABI hall of shame
#
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2dc18605..76274b8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -138,6 +138,7 @@ config X86
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
select HAVE_REGS_AND_STACK_ACCESS_API
+ select HAVE_RELIABLE_STACKTRACE if X86_64 && FRAME_POINTER
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_UID16 if X86_32 || IA32_EMULATION
select HAVE_UNSTABLE_SCHED_CLOCK
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 3b10518..9c68bfc 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -145,6 +145,42 @@ int print_context_stack_bp(struct thread_info *tinfo,
}
EXPORT_SYMBOL_GPL(print_context_stack_bp);

+int print_context_stack_reliable(struct thread_info *tinfo,
+ unsigned long *stack, unsigned long *bp,
+ const struct stacktrace_ops *ops,
+ void *data, unsigned long *end, int *graph)
+{
+ struct stack_frame *frame = (struct stack_frame *)*bp;
+ struct stack_frame *last_frame = frame;
+ unsigned long *ret_addr = &frame->return_address;
+
+ if (test_ti_thread_flag(tinfo, TIF_FORK))
+ return -EINVAL;
+
+ while (valid_stack_ptr(tinfo, ret_addr, sizeof(*ret_addr), end)) {
+ unsigned long addr = *ret_addr;
+
+ if (frame <= last_frame || !__kernel_text_address(addr) ||
+ in_preempt_schedule_irq(addr))
+ return -EINVAL;
+
+ if (ops->address(data, addr, 1))
+ return -EINVAL;
+
+ print_ftrace_graph_addr(addr, data, ops, tinfo, graph);
+
+ last_frame = frame;
+ frame = frame->next_frame;
+ ret_addr = &frame->return_address;
+ }
+
+ if (last_frame + 1 != (void *)task_pt_regs(tinfo->task))
+ return -EINVAL;
+
+ *bp = (unsigned long)frame;
+ return 0;
+}
+
static int print_trace_stack(void *data, char *name)
{
printk("%s <%s> ", (char *)data, name);
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 9ee98ee..61078eb 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -14,6 +14,11 @@ static int save_stack_stack(void *data, char *name)
return 0;
}

+static int save_stack_stack_reliable(void *data, char *name)
+{
+ return -EINVAL;
+}
+
static int
__save_stack_address(void *data, unsigned long addr, bool reliable, bool nosched)
{
@@ -59,6 +64,12 @@ static const struct stacktrace_ops save_stack_ops_nosched = {
.walk_stack = print_context_stack,
};

+static const struct stacktrace_ops save_stack_ops_reliable = {
+ .stack = save_stack_stack_reliable,
+ .address = save_stack_address,
+ .walk_stack = print_context_stack_reliable,
+};
+
/*
* Save stack-backtrace addresses into a stack_trace buffer.
*/
@@ -148,3 +159,24 @@ void save_stack_trace_user(struct stack_trace *trace)
trace->entries[trace->nr_entries++] = ULONG_MAX;
}

+#ifdef CONFIG_RELIABLE_STACKTRACE
+/*
+ * Returns 0 if the stack trace is deemed reliable. The caller must ensure
+ * that the task is either sleeping or is the current task.
+ */
+int save_stack_trace_tsk_reliable(struct task_struct *tsk,
+ struct stack_trace *trace)
+{
+ int ret;
+
+ ret = dump_trace(tsk, NULL, NULL, 0, &save_stack_ops_reliable, trace);
+ if (ret)
+ return ret;
+
+ if (trace->nr_entries == trace->max_entries)
+ return -EINVAL;
+
+ trace->entries[trace->nr_entries++] = ULONG_MAX;
+ return 0;
+}
+#endif /* CONFIG_RELIABLE_STACKTRACE */
diff --git a/include/linux/stacktrace.h b/include/linux/stacktrace.h
index 0a34489..527e4cc 100644
--- a/include/linux/stacktrace.h
+++ b/include/linux/stacktrace.h
@@ -2,17 +2,18 @@
#define __LINUX_STACKTRACE_H

#include <linux/types.h>
+#include <linux/errno.h>

struct task_struct;
struct pt_regs;

-#ifdef CONFIG_STACKTRACE
struct stack_trace {
unsigned int nr_entries, max_entries;
unsigned long *entries;
int skip; /* input argument: How many entries to skip */
};

+#ifdef CONFIG_STACKTRACE
extern void save_stack_trace(struct stack_trace *trace);
extern void save_stack_trace_regs(struct pt_regs *regs,
struct stack_trace *trace);
@@ -29,12 +30,23 @@ extern void save_stack_trace_user(struct stack_trace *trace);
# define save_stack_trace_user(trace) do { } while (0)
#endif

-#else
+#else /* !CONFIG_STACKTRACE */
# define save_stack_trace(trace) do { } while (0)
# define save_stack_trace_tsk(tsk, trace) do { } while (0)
# define save_stack_trace_user(trace) do { } while (0)
# define print_stack_trace(trace, spaces) do { } while (0)
# define snprint_stack_trace(buf, size, trace, spaces) do { } while (0)
-#endif
+#endif /* CONFIG_STACKTRACE */

-#endif
+#ifdef CONFIG_RELIABLE_STACKTRACE
+extern int save_stack_trace_tsk_reliable(struct task_struct *tsk,
+ struct stack_trace *trace);
+#else
+static inline int save_stack_trace_tsk_reliable(struct task_struct *tsk,
+ struct stack_trace *trace)
+{
+ return -ENOSYS;
+}
+#endif /* CONFIG_RELIABLE_STACKTRACE */
+
+#endif /* __LINUX_STACKTRACE_H */
diff --git a/kernel/stacktrace.c b/kernel/stacktrace.c
index b6e4c16..f35bc5d 100644
--- a/kernel/stacktrace.c
+++ b/kernel/stacktrace.c
@@ -58,8 +58,8 @@ int snprint_stack_trace(char *buf, size_t size,
EXPORT_SYMBOL_GPL(snprint_stack_trace);

/*
- * Architectures that do not implement save_stack_trace_tsk or
- * save_stack_trace_regs get this weak alias and a once-per-bootup warning
+ * Architectures that do not implement save_stack_trace_*()
+ * get this weak alias and a once-per-bootup warning
* (whenever this facility is utilized - for example by procfs):
*/
__weak void
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1e9a607..1edf69c 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1159,6 +1159,12 @@ config STACKTRACE
It is also used by various kernel debugging features that require
stack trace generation.

+config RELIABLE_STACKTRACE
+ def_bool y
+ depends on HAVE_RELIABLE_STACKTRACE
+ depends on STACKTRACE
+ depends on STACK_VALIDATION
+
config DEBUG_KOBJECT
bool "kobject debugging"
depends on DEBUG_KERNEL
--
2.4.3

2016-03-25 19:36:13

by Josh Poimboeuf

[permalink] [raw]
Subject: [RFC PATCH v1.9 01/14] x86/asm/head: cleanup initial stack variable

The stack_start variable is similar in usage to initial_code and
initial_gs: they're all stored in head_64.S and they're all updated by
SMP and suspend/resume before starting a CPU.

Rename stack_start to initial_stack to be more consistent with the
others.

Also do a few other related cleanups:

- Remove the unused init_rsp variable declaration.

- Remove the ".word 0" statement after the initial_stack definition
because it has no apparent function.

Signed-off-by: Josh Poimboeuf <[email protected]>
---
arch/x86/include/asm/realmode.h | 2 +-
arch/x86/include/asm/smp.h | 3 ---
arch/x86/kernel/acpi/sleep.c | 2 +-
arch/x86/kernel/head_32.S | 8 ++++----
arch/x86/kernel/head_64.S | 10 ++++------
arch/x86/kernel/smpboot.c | 2 +-
6 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 9c6b890..677a671 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -44,9 +44,9 @@ struct trampoline_header {
extern struct real_mode_header *real_mode_header;
extern unsigned char real_mode_blob_end[];

-extern unsigned long init_rsp;
extern unsigned long initial_code;
extern unsigned long initial_gs;
+extern unsigned long initial_stack;

extern unsigned char real_mode_blob[];
extern unsigned char real_mode_relocs[];
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 20a3de5..bd4c76c 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -38,9 +38,6 @@ DECLARE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_bios_cpu_apicid);
DECLARE_EARLY_PER_CPU_READ_MOSTLY(int, x86_cpu_to_logical_apicid);
#endif

-/* Static state in head.S used to set up a CPU */
-extern unsigned long stack_start; /* Initial stack pointer address */
-
struct task_struct;

struct smp_ops {
diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index adb3eaf..4858733 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -99,7 +99,7 @@ int x86_acpi_suspend_lowlevel(void)
saved_magic = 0x12345678;
#else /* CONFIG_64BIT */
#ifdef CONFIG_SMP
- stack_start = (unsigned long)temp_stack + sizeof(temp_stack);
+ initial_stack = (unsigned long)temp_stack + sizeof(temp_stack);
early_gdt_descr.address =
(unsigned long)get_cpu_gdt_table(smp_processor_id());
initial_gs = per_cpu_offset(smp_processor_id());
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 54cdbd2..60a16fc 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -94,7 +94,7 @@ RESERVE_BRK(pagetables, INIT_MAP_SIZE)
*/
__HEAD
ENTRY(startup_32)
- movl pa(stack_start),%ecx
+ movl pa(initial_stack),%ecx

/* test KEEP_SEGMENTS flag to see if the bootloader is asking
us to not reload segments */
@@ -286,7 +286,7 @@ num_subarch_entries = (. - subarch_entries) / 4
* start_secondary().
*/
ENTRY(start_cpu0)
- movl stack_start, %ecx
+ movl initial_stack, %ecx
movl %ecx, %esp
jmp *(initial_code)
ENDPROC(start_cpu0)
@@ -307,7 +307,7 @@ ENTRY(startup_32_smp)
movl %eax,%es
movl %eax,%fs
movl %eax,%gs
- movl pa(stack_start),%ecx
+ movl pa(initial_stack),%ecx
movl %eax,%ss
leal -__PAGE_OFFSET(%ecx),%esp

@@ -714,7 +714,7 @@ ENTRY(initial_page_table)

.data
.balign 4
-ENTRY(stack_start)
+ENTRY(initial_stack)
.long init_thread_union+THREAD_SIZE

__INITRODATA
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 22fbf9d..278270e 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -217,7 +217,7 @@ ENTRY(secondary_startup_64)
movq %rax, %cr0

/* Setup a boot time stack */
- movq stack_start(%rip), %rsp
+ movq initial_stack(%rip), %rsp

/* zero EFLAGS after setting rsp */
pushq $0
@@ -300,7 +300,7 @@ ENTRY(secondary_startup_64)
* start_secondary().
*/
ENTRY(start_cpu0)
- movq stack_start(%rip),%rsp
+ movq initial_stack(%rip),%rsp
movq initial_code(%rip),%rax
pushq $0 # fake return address to stop unwinder
pushq $__KERNEL_CS # set correct cs
@@ -309,17 +309,15 @@ ENTRY(start_cpu0)
ENDPROC(start_cpu0)
#endif

- /* SMP bootup changes these two */
+ /* SMP bootup changes these variables */
__REFDATA
.balign 8
GLOBAL(initial_code)
.quad x86_64_start_kernel
GLOBAL(initial_gs)
.quad INIT_PER_CPU_VAR(irq_stack_union)
-
- GLOBAL(stack_start)
+ GLOBAL(initial_stack)
.quad init_thread_union+THREAD_SIZE-8
- .word 0
__FINITDATA

bad_address:
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index b2c99f8..670f460 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -950,7 +950,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)

early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu);
initial_code = (unsigned long)start_secondary;
- stack_start = idle->thread.sp;
+ initial_stack = idle->thread.sp;

/*
* Enable the espfix hack for this CPU
--
2.4.3

2016-03-25 19:38:54

by Josh Poimboeuf

[permalink] [raw]
Subject: [RFC PATCH v1.9 04/14] x86: move _stext marker before head code

When core_kernel_text() is used to determine whether an address on a
task's stack trace is a kernel text address, it incorrectly returns
false for early text addresses for the head code between the _text and
_stext markers.

Head code is text code too, so mark it as such. This seems to match the
intent of other users of the _stext symbol, and it also seems consistent
with what other architectures are already doing.
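
For reference, the relevant check in core_kernel_text() is roughly the
following (simplified sketch; the real function in kernel/extable.c also
handles init text):

	/* simplified: is the address within the core kernel text range? */
	static int is_core_kernel_text(unsigned long addr)
	{
		return addr >= (unsigned long)_stext &&
		       addr < (unsigned long)_etext;
	}

With the current layout, head code addresses land between _text and _stext
and therefore fail this check even though they are text.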

Signed-off-by: Josh Poimboeuf <[email protected]>
---
arch/x86/kernel/vmlinux.lds.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index d239639..66e6acb 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -91,10 +91,10 @@ SECTIONS
/* Text and read-only data */
.text : AT(ADDR(.text) - LOAD_OFFSET) {
_text = .;
+ _stext = .;
/* bootstrapping code */
HEAD_TEXT
. = ALIGN(8);
- _stext = .;
TEXT_TEXT
SCHED_TEXT
LOCK_TEXT
--
2.4.3

2016-03-25 19:39:19

by Josh Poimboeuf

[permalink] [raw]
Subject: [RFC PATCH v1.9 03/14] x86/asm/head: standardize the bottom of the stack for idle tasks

Thanks to all the recent x86 entry code refactoring, most tasks' kernel
stacks start at the same offset right above their saved pt_regs,
regardless of which syscall was used to enter the kernel. That creates
a useful convention which makes it straightforward to identify the
"bottom" of the stack, which can be useful for stack walking code which
needs to verify the stack is sane.

However there are still a few types of tasks which don't yet follow that
convention:

1) CPU idle tasks, aka the "swapper" tasks

2) freshly forked tasks which haven't run yet and have TIF_FORK set

Make the idle tasks conform to the new stack bottom convention by
starting their stack at a sizeof(pt_regs) offset from the end of the
stack page.
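
With this convention in place, a stack walker can verify that it reached the
bottom of the stack with a check along these lines (sketch, the helper name
is made up):

	/* the last frame should sit immediately below the saved pt_regs */
	static bool reached_stack_bottom(struct task_struct *task,
					 struct stack_frame *last_frame)
	{
		return (void *)(last_frame + 1) == (void *)task_pt_regs(task);
	}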

Signed-off-by: Josh Poimboeuf <[email protected]>
---
arch/x86/kernel/head_64.S | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index f2deae7..255fe54 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -20,6 +20,7 @@
#include <asm/processor-flags.h>
#include <asm/percpu.h>
#include <asm/nops.h>
+#include "../entry/calling.h"

#ifdef CONFIG_PARAVIRT
#include <asm/asm-offsets.h>
@@ -287,8 +288,9 @@ ENTRY(start_cpu)
* REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
* address given in m16:64.
*/
- movq initial_code(%rip),%rax
- pushq $0 # fake return address to stop unwinder
+ call 1f # put return address on stack for unwinder
+1: xorq %rbp, %rbp # clear frame pointer
+ movq initial_code(%rip), %rax
pushq $__KERNEL_CS # set correct cs
pushq %rax # target address in negative space
lretq
@@ -316,7 +318,7 @@ ENDPROC(start_cpu0)
GLOBAL(initial_gs)
.quad INIT_PER_CPU_VAR(irq_stack_union)
GLOBAL(initial_stack)
- .quad init_thread_union+THREAD_SIZE-8
+ .quad init_thread_union + THREAD_SIZE - SIZEOF_PTREGS
__FINITDATA

bad_address:
--
2.4.3

2016-03-25 19:39:34

by Josh Poimboeuf

[permalink] [raw]
Subject: [RFC PATCH v1.9 02/14] x86/asm/head: use a common function for starting CPUs

There are two different pieces of code for starting a CPU: start_cpu0()
and the end of secondary_startup_64(). They're identical except for the
stack setup. Combine the common parts into a shared start_cpu()
function.

Signed-off-by: Josh Poimboeuf <[email protected]>
---
arch/x86/kernel/head_64.S | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 278270e..f2deae7 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -255,13 +255,15 @@ ENTRY(secondary_startup_64)
movl $MSR_GS_BASE,%ecx
movl initial_gs(%rip),%eax
movl initial_gs+4(%rip),%edx
- wrmsr
+ wrmsr

/* rsi is pointer to real mode structure with interesting info.
pass it to C */
movq %rsi, %rdi
-
- /* Finally jump to run C code and to be on real kernel address
+
+ENTRY(start_cpu)
+ /*
+ * Jump to run C code and to be on a real kernel address.
* Since we are running on identity-mapped space we have to jump
* to the full 64bit address, this is only possible as indirect
* jump. In addition we need to ensure %cs is set so we make this
@@ -290,6 +292,7 @@ ENTRY(secondary_startup_64)
pushq $__KERNEL_CS # set correct cs
pushq %rax # target address in negative space
lretq
+ENDPROC(start_cpu)

#include "verify_cpu.S"

@@ -297,15 +300,11 @@ ENTRY(secondary_startup_64)
/*
* Boot CPU0 entry point. It's called from play_dead(). Everything has been set
* up already except stack. We just set up stack here. Then call
- * start_secondary().
+ * start_secondary() via start_cpu().
*/
ENTRY(start_cpu0)
- movq initial_stack(%rip),%rsp
- movq initial_code(%rip),%rax
- pushq $0 # fake return address to stop unwinder
- pushq $__KERNEL_CS # set correct cs
- pushq %rax # target address in negative space
- lretq
+ movq initial_stack(%rip), %rsp
+ jmp start_cpu
ENDPROC(start_cpu0)
#endif

--
2.4.3

2016-03-31 09:33:52

by Jiri Slaby

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 13/14] livepatch: add /proc/<pid>/patch_status

On 03/25/2016, 08:35 PM, Josh Poimboeuf wrote:
> Expose the per-task klp_universe value so users can determine which
> tasks are holding up completion of a patching operation.
>
> Call it "patch_status" rather than "universe": it's at least more
> descriptive for the average user.
>
> Signed-off-by: Josh Poimboeuf <[email protected]>
> ---
> fs/proc/base.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index b1755b2..f90998b 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -2801,6 +2801,15 @@ static int proc_pid_personality(struct seq_file *m, struct pid_namespace *ns,
> return err;
> }
>
> +#ifdef CONFIG_LIVEPATCH
> +static int proc_pid_patch_status(struct seq_file *m, struct pid_namespace *ns,
> + struct pid *pid, struct task_struct *task)
> +{
> + seq_printf(m, "%d\n", task->klp_universe);
> + return 0;
> +}
> +#endif /* CONFIG_LIVEPATCH */
> +
> /*
> * Thread groups
> */
> @@ -2900,6 +2909,9 @@ static const struct pid_entry tgid_base_stuff[] = {
> REG("timers", S_IRUGO, proc_timers_operations),
> #endif
> REG("timerslack_ns", S_IRUGO|S_IWUGO, proc_pid_set_timerslack_ns_operations),
> +#ifdef CONFIG_LIVEPATCH
> + ONE("patch_status", S_IRUSR, proc_pid_patch_status),
> +#endif

I think we want to know the state about all tasks, not only group
leaders. This should go to tid_base_stuff.

thanks,
--
js
suse labs

2016-03-31 09:40:19

by Jiri Slaby

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 13/14] livepatch: add /proc/<pid>/patch_status

On 03/31/2016, 11:33 AM, Jiri Slaby wrote:
> On 03/25/2016, 08:35 PM, Josh Poimboeuf wrote:
>> Expose the per-task klp_universe value so users can determine which
>> tasks are holding up completion of a patching operation.
>>
>> Call it "patch_status" rather than "universe": it's at least more
>> descriptive for the average user.
>>
>> Signed-off-by: Josh Poimboeuf <[email protected]>
>> ---
>> fs/proc/base.c | 12 ++++++++++++
>> 1 file changed, 12 insertions(+)
>>
>> diff --git a/fs/proc/base.c b/fs/proc/base.c
>> index b1755b2..f90998b 100644
>> --- a/fs/proc/base.c
>> +++ b/fs/proc/base.c
>> @@ -2801,6 +2801,15 @@ static int proc_pid_personality(struct seq_file *m, struct pid_namespace *ns,
>> return err;
>> }
>>
>> +#ifdef CONFIG_LIVEPATCH
>> +static int proc_pid_patch_status(struct seq_file *m, struct pid_namespace *ns,
>> + struct pid *pid, struct task_struct *task)
>> +{
>> + seq_printf(m, "%d\n", task->klp_universe);
>> + return 0;
>> +}
>> +#endif /* CONFIG_LIVEPATCH */
>> +
>> /*
>> * Thread groups
>> */
>> @@ -2900,6 +2909,9 @@ static const struct pid_entry tgid_base_stuff[] = {
>> REG("timers", S_IRUGO, proc_timers_operations),
>> #endif
>> REG("timerslack_ns", S_IRUGO|S_IWUGO, proc_pid_set_timerslack_ns_operations),
>> +#ifdef CONFIG_LIVEPATCH
>> + ONE("patch_status", S_IRUSR, proc_pid_patch_status),
>> +#endif
>
> I think we want to know the state about all tasks, not only group
> leaders. This should go to tid_base_stuff.

Or loop over all tasks here by for_each_thread(task, t) and check them all.
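
Something along these lines (untested sketch):

	static int proc_pid_patch_status(struct seq_file *m,
					 struct pid_namespace *ns,
					 struct pid *pid,
					 struct task_struct *task)
	{
		struct task_struct *t;

		rcu_read_lock();
		for_each_thread(task, t)
			seq_printf(m, "%d %d\n", t->pid, t->klp_universe);
		rcu_read_unlock();

		return 0;
	}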

> thanks,
--
js
suse labs

2016-03-31 12:54:31

by Miroslav Benes

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 00/14] livepatch: hybrid consistency model


Hi,

this is a great work. I'll have to review it properly (especially 13/14,
probably several times as it is a heavy stuff), but I've gathered some
notes so there they are.

On Fri, 25 Mar 2016, Josh Poimboeuf wrote:

> These patches are still a work in progress, but Jiri asked that I share
> them before I go on vacation next week. Based on origin/master because
> it has CONFIG_STACK_VALIDATION.
>
> This has two consistency models: the immediate model (like in today's
> code) and the new kpatch/kgraft hybrid model.
>
> The default is the hybrid model: kgraft's per-task consistency and
> syscall barrier switching combined with kpatch's stack trace switching.
> There are also a number of fallback options which make it pretty
> flexible, yet the code is still quite simple.
>
> Patches are applied on a per-task basis, when the task is deemed safe to
> switch over. It uses a tiered approach to determine when a task is safe
> and can be switched.
>
> The first wave of attack is stack checking of sleeping tasks. If no
> affected functions are on the stack of a given task, the task is
> switched. In most cases this will patch most or all of the tasks on the
> first try. Otherwise it'll keep trying periodically. This option is
> only available if the architecture has reliable stacks
> (CONFIG_RELIABLE_STACKTRACE and CONFIG_STACK_VALIDATION).

I think we could allow periodic checking even for
!CONFIG_RELIABLE_STACKTRACE situations. The problematic task could migrate
by itself after some time (meaning it woke up meanwhile and sleeps
somewhere else now, or it went through a syscall boundary). So we can
gain something, especially when combined with a fake signal approach. More
on that below and in my 13/14 mail.

> The next line of attack, if needed, is syscall/IRQ switching. A task is
> switched when it returns from a system call or a user space IRQ. This
> approach is less than ideal because it usually requires signaling tasks
> to get them to switch. It also doesn't work for kthreads. But it's
> still useful in some cases when user tasks get stuck sleeping on an
> affected function.
>
> For architectures which don't have reliable stacks, users have two
> options:
>
> a) use the hybrid fallback option of using only the syscall/IRQ
> switching (which is the default);
>
> b) or they can use the immediate model (which is the model we already
> have today) by setting the patch->immediate flag.
>
> There's also a func->immediate flag which allows users to specify that
> certain functions in the patch can be applied without per-task
> consistency. This might be useful if you want to patch a common
> function like schedule(), and the function change doesn't need
> consistency but the rest of the patch does.
>
> Still have a lot of TODOs, some of them are listed here. If you see
> something objectionable, it might be a good idea to make sure it's not
> already on the TODO list :-)
>
> TODO:
> - actually test it
> - don't use TIP_KLP_NEED_UPDATE in arch-independent code
> - figure out new API to keep the use of task_rq_lock() in the sched code

Hm, no idea how to do it so that everyone is satisfied. I still remember
Peter's protests.

> - cleaner way to detect preemption on the stack
> - allow patch modules to be removed. still needs more discussion and
> thought. maybe something like Miroslav's patch would be good:
> https://lkml.kernel.org/r/[email protected]

Yup, that could be part of the patch set. The second option (to rework
klp_unregister_patch and move kobject_put out of mutex protected parts) is
maybe a way to go. The mutex_trylock approach works as well, but it is not
clean and nice enough I guess. However the patch is there :).

Anyway, module removal should be prohibited when the immediate flag is set
to true. Without the consistency model we cannot say whether it is safe to
remove the module; some process could still be running its code.

> - come up with a better name than universe? KLP_STATE_PREV/NEXT?
> KLP_UNPATCHED/PATCHED? there were some objections to the name in v1.
> - update documentation for sysfs, proc, livepatch
> - need atomic accesses or READ_ONCE/WRITE_ONCE anywhere?
> - ability to force a patch to the goal universe

This could be done by calling klp_update_task_universe() for all tasks,
couldn't it? We have something similar in kgraft.
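
E.g. something like this (untested sketch, the function name is made up):

	void klp_force_transition(void)
	{
		struct task_struct *g, *task;
		unsigned int cpu;

		read_lock(&tasklist_lock);
		for_each_process_thread(g, task)
			klp_update_task_universe(task);
		read_unlock(&tasklist_lock);

		get_online_cpus();
		for_each_online_cpu(cpu)
			klp_update_task_universe(idle_task(cpu));
		put_online_cpus();
	}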

> - try ftrace handler switching idea from v1 cover letter
> - split up the patches better
> - cc all the right people

I'd add a fake signal facility for sleeping non-migrated tasks. This
would accelerate migration to the new universe. We have had it in kgraft
for quite some time and it has worked out great. See
lkml.kernel.org/r/[email protected] which
went with Jiri's kgraft-on-klp patch set. See also Oleg's reply as it is
important (I changed the kgraft implementation according to it).

Miroslav

2016-03-31 13:03:24

by Miroslav Benes

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 07/14] x86/stacktrace: add function for detecting reliable stack traces

On Fri, 25 Mar 2016, Josh Poimboeuf wrote:

> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 2dc18605..76274b8 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -138,6 +138,7 @@ config X86
> select HAVE_PERF_REGS
> select HAVE_PERF_USER_STACK_DUMP
> select HAVE_REGS_AND_STACK_ACCESS_API
> + select HAVE_RELIABLE_STACKTRACE if X86_64 && FRAME_POINTER

I understand we have to rely on frame pointer for now. Do you plan to
switch to dwarf unwinder one day in the future? IOW is there a plan to
implement dwarf stuff generation in objtool and then to have a dwarf-based
stack unwinder upstream and to use it for live patching?

We have FRAME_POINTER unset in SLES for performance reasons (a roughly 5
percent slowdown was measured in the past, though we should redo the
experiments), and one day we'd really like to switch from kgraft to
upstream :). So I'm just asking.

Miroslav

2016-03-31 13:12:47

by Miroslav Benes

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 12/14] livepatch: create per-task consistency model

On Fri, 25 Mar 2016, Josh Poimboeuf wrote:

[...]

> diff --git a/kernel/fork.c b/kernel/fork.c
> index d277e83..27b181e 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -76,6 +76,7 @@
> #include <linux/compiler.h>
> #include <linux/sysctl.h>
> #include <linux/kcov.h>
> +#include <linux/livepatch.h>
>
> #include <asm/pgtable.h>
> #include <asm/pgalloc.h>
> @@ -1615,6 +1616,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
> total_forks++;
> spin_unlock(&current->sighand->siglock);
> syscall_tracepoint_update(p);
> + klp_update_task_universe(p);

Shouldn't we copy the transition state and TIF flag from the parent? I'm
dealing with a race in kgraft and the solution seems to be exactly this
code at this place in copy_process(). I need to think about it.
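
Roughly what I mean (untested sketch):

	/* in copy_process(): inherit the parent's state instead of the goal */
	p->klp_universe = current->klp_universe;
	if (test_tsk_thread_flag(current, TIF_KLP_NEED_UPDATE))
		set_tsk_thread_flag(p, TIF_KLP_NEED_UPDATE);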

[...]

> +static void klp_schedule_work(void)
> +{
> + if (IS_ENABLED(CONFIG_RELIABLE_STACKTRACE))
> + schedule_delayed_work(&klp_work, round_jiffies_relative(HZ));
> +}

As mentioned in my cover letter reply, I'd allow scheduling the delayed
work even for !CONFIG_RELIABLE_STACKTRACE archs and configurations. There
is a check in klp_try_switch_task() for CONFIG_RELIABLE_STACKTRACE, so the
change should be trivial. The patching could be successful even without
reliable stack traces, especially in combination with a fake signal and
syscall boundary migration.
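
I.e. just drop the IS_ENABLED() check (sketch):

	static void klp_schedule_work(void)
	{
		schedule_delayed_work(&klp_work, round_jiffies_relative(HZ));
	}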

Miroslav

2016-04-01 13:34:52

by Miroslav Benes

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 00/14] livepatch: hybrid consistency model


> - actually test it

I tested it a bit; it partially worked and partially did not.

When I applied the sample livepatch module, /proc/cmdline was patched, and
when I ran 'cat /proc/cmdline' I got the correct livepatched message. So far
so good. But the patching itself never finished because of many processes
with unreliable stacks. It almost looked like every sleeping process was
reported. I haven't debugged that yet.

Second, I have a simple test case: a kthread which sleeps in the
to-be-patched function foo() for a while, then sleeps somewhere else, and
repeats that in a loop. After live patch application the kthread is
reported to have an unreliable stack and it is not migrated. The good thing
is that the old universe's foo() is still called, so the consistency model
works.
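
The test kthread does roughly this (simplified sketch):

	static int test_kthread_fn(void *arg)
	{
		while (!kthread_should_stop()) {
			foo();	/* to-be-patched function, sleeps inside */
			schedule_timeout_interruptible(5 * HZ);	/* sleep elsewhere */
		}
		return 0;
	}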

So I guess there is some problem in the stack checking...

Miroslav

2016-04-01 13:39:49

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 00/14] livepatch: hybrid consistency model

On Fri 2016-03-25 14:34:47, Josh Poimboeuf wrote:
> These patches are still a work in progress, but Jiri asked that I share
> them before I go on vacation next week. Based on origin/master because
> it has CONFIG_STACK_VALIDATION.

I have to follow Mirek and say that it is a great work.

> There's also a func->immediate flag which allows users to specify that
> certain functions in the patch can be applied without per-task
> consistency. This might be useful if you want to patch a common
> function like schedule(), and the function change doesn't need
> consistency but the rest of the patch does.

I like the possibility to immediately patch some functions or objects.
Just note that this is not yet completely implemented and it is not
on the TODO list.

We probably should not set func->transition flag when func->immediate
is set or when the related func->object is set. It currently happens
only when patch->immediate is set.
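
I.e. something like this in klp_init_transition() (sketch):

	klp_for_each_object(patch, obj)
		klp_for_each_func(obj, func)
			if (!func->immediate)
				func->transition = true;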

Also we should ignore immediate functions and objects when the stack
is checked.


> Still have a lot of TODOs, some of them are listed here. If you see
> something objectionable, it might be a good idea to make sure it's not
> already on the TODO list :-)
>
> TODO:
> - come up with a better name than universe? KLP_STATE_PREV/NEXT?
> KLP_UNPATCHED/PATCHED? there were some objections to the name in v1.

The name "universe" has an advantage if we later allow to
enable/disable more patches in parallel. The integer might hold
an identifier of the last applied patch. I have been playing with
this for kGraft one year ago and it was really challenging.
We should avoid it if possible. It is not really needed
if we are able to complete any transition in a reasonable time.

If we support only one transition at a time, a simple boolean
or even bit should be enough. The most descriptive name would
be klp_transition_patch_applied but it is quite long.

Note that similar information is provided by TIF_KLP_NEED_UPDATE
flag. We use only this bit in kGraft. It saves some space in
task_struct but it brings other challenges. We need to prevent
migration using a global "kgr_immutable" flag until ftrace handlers
for all patched functions are in place. We need to set the flag
back when the ftrace handler is called and the global "kgr_immutable"
flag is set.

> - update documentation for sysfs, proc, livepatch

Also we should publish somewhere the information about TIF_KLP_NEED_UPDATE
flag, e.g. /proc/<pid>/klp_need_update. It is handy to see what
process blocks the transition. We have something similar in
kGraft, see in_progress_show() at
https://git.kernel.org/cgit/linux/kernel/git/jirislaby/kgraft.git/commit/?h=kgraft-4.4&id=1c82fbd7b1fe240f4ed178a6506a93033f6a4bed


I am still shaking my head around the patches.

Best Regards,
Petr

2016-04-01 15:38:21

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 00/14] livepatch: hybrid consistency model

On Fri 2016-04-01 15:39:44, Petr Mladek wrote:
> On Fri 2016-03-25 14:34:47, Josh Poimboeuf wrote:
> > These patches are still a work in progress, but Jiri asked that I share
> > them before I go on vacation next week. Based on origin/master because
> > it has CONFIG_STACK_VALIDATION.
>
> I have to follow Mirek and say that it is a great work.
>
> > There's also a func->immediate flag which allows users to specify that
> > certain functions in the patch can be applied without per-task
> > consistency. This might be useful if you want to patch a common
> > function like schedule(), and the function change doesn't need
> > consistency but the rest of the patch does.
>
> I like the possibility to immediately patch some functions or objects.
> Just note that this is not yet completely implemented and it is not
> on the TODO list.

Correction. Only patch and func can be marked by the "immediate" flag, not
an object. It looks fine.

> We probably should not set func->transition flag when func->immediate
> is set or when the related func->object is set. It currently happens
> only when patch->immediate is set.

This is true but only for func->transition.


> Also we should ignore immediate functions and objects when the stack
> is checked.

This is already done. I have missed this.

I am sorry for confusion. I should have shaken my head even more
before writing.

Best Regards,
Petr

2016-04-04 15:55:06

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 07/14] x86/stacktrace: add function for detecting reliable stack traces

On Fri 2016-03-25 14:34:54, Josh Poimboeuf wrote:
> For live patching and possibly other use cases, a stack trace is only
> useful if you can be assured that it's completely reliable. Add a new
> save_stack_trace_tsk_reliable() function to achieve that.
>
> Scenarios which indicate that a stack trace may be unreliable:
>
> - interrupt stacks
> - preemption
> - corrupted stack data
> - newly forked tasks
> - running tasks
> - the user didn't provide a large enough entries array
>
> Also add a config option so arch-independent code can determine at build
> time whether the function is implemented.
>
> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> index 3b10518..9c68bfc 100644
> --- a/arch/x86/kernel/dumpstack.c
> +++ b/arch/x86/kernel/dumpstack.c
> @@ -145,6 +145,42 @@ int print_context_stack_bp(struct thread_info *tinfo,
> }
> EXPORT_SYMBOL_GPL(print_context_stack_bp);
>
> +int print_context_stack_reliable(struct thread_info *tinfo,
> + unsigned long *stack, unsigned long *bp,
> + const struct stacktrace_ops *ops,
> + void *data, unsigned long *end, int *graph)
> +{
> + struct stack_frame *frame = (struct stack_frame *)*bp;
> + struct stack_frame *last_frame = frame;

I tried to debug why the patching never finishes as reported by Mirek.
This initialization breaks the whole function, see below.
I would initialize last_frame to NULL or maybe (void *)stack.


> + unsigned long *ret_addr = &frame->return_address;
> +
> + if (test_ti_thread_flag(tinfo, TIF_FORK))
> + return -EINVAL;
> +
> + while (valid_stack_ptr(tinfo, ret_addr, sizeof(*ret_addr), end)) {
> + unsigned long addr = *ret_addr;
> +
> + if (frame <= last_frame || !__kernel_text_address(addr) ||

frame == last_frame in the very first iteration, so we always return -EINVAL.

Best Regards,
Petr

> + in_preempt_schedule_irq(addr))
> + return -EINVAL;
> +
> + if (ops->address(data, addr, 1))
> + return -EINVAL;
> +
> + print_ftrace_graph_addr(addr, data, ops, tinfo, graph);
> +
> + last_frame = frame;
> + frame = frame->next_frame;
> + ret_addr = &frame->return_address;
> + }
> +
> + if (last_frame + 1 != (void *)task_pt_regs(tinfo->task))
> + return -EINVAL;
> +
> + *bp = (unsigned long)frame;
> + return 0;
> +}

2016-04-04 16:56:50

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 13/14] livepatch: add /proc/<pid>/patch_status

On Thu, Mar 31, 2016 at 11:33:46AM +0200, Jiri Slaby wrote:
> On 03/25/2016, 08:35 PM, Josh Poimboeuf wrote:
> > Expose the per-task klp_universe value so users can determine which
> > tasks are holding up completion of a patching operation.
> >
> > Call it "patch_status" rather than "universe": it's at least more
> > descriptive for the average user.
> >
> > Signed-off-by: Josh Poimboeuf <[email protected]>
> > ---
> > fs/proc/base.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> >
> > diff --git a/fs/proc/base.c b/fs/proc/base.c
> > index b1755b2..f90998b 100644
> > --- a/fs/proc/base.c
> > +++ b/fs/proc/base.c
> > @@ -2801,6 +2801,15 @@ static int proc_pid_personality(struct seq_file *m, struct pid_namespace *ns,
> > return err;
> > }
> >
> > +#ifdef CONFIG_LIVEPATCH
> > +static int proc_pid_patch_status(struct seq_file *m, struct pid_namespace *ns,
> > + struct pid *pid, struct task_struct *task)
> > +{
> > + seq_printf(m, "%d\n", task->klp_universe);
> > + return 0;
> > +}
> > +#endif /* CONFIG_LIVEPATCH */
> > +
> > /*
> > * Thread groups
> > */
> > @@ -2900,6 +2909,9 @@ static const struct pid_entry tgid_base_stuff[] = {
> > REG("timers", S_IRUGO, proc_timers_operations),
> > #endif
> > REG("timerslack_ns", S_IRUGO|S_IWUGO, proc_pid_set_timerslack_ns_operations),
> > +#ifdef CONFIG_LIVEPATCH
> > + ONE("patch_status", S_IRUSR, proc_pid_patch_status),
> > +#endif
>
> I think we want to know the state about all tasks, not only group
> leaders. This should go to tid_base_stuff.

Good catch, thanks!

--
Josh

2016-04-04 17:03:13

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 00/14] livepatch: hybrid consistency model

On Thu, Mar 31, 2016 at 02:54:26PM +0200, Miroslav Benes wrote:
>
> Hi,
>
> this is a great work. I'll have to review it properly (especially 13/14,
> probably several times as it is a heavy stuff), but I've gathered some
> notes so there they are.
>
> On Fri, 25 Mar 2016, Josh Poimboeuf wrote:
>
> > These patches are still a work in progress, but Jiri asked that I share
> > them before I go on vacation next week. Based on origin/master because
> > it has CONFIG_STACK_VALIDATION.
> >
> > This has two consistency models: the immediate model (like in today's
> > code) and the new kpatch/kgraft hybrid model.
> >
> > The default is the hybrid model: kgraft's per-task consistency and
> > syscall barrier switching combined with kpatch's stack trace switching.
> > There are also a number of fallback options which make it pretty
> > flexible, yet the code is still quite simple.
> >
> > Patches are applied on a per-task basis, when the task is deemed safe to
> > switch over. It uses a tiered approach to determine when a task is safe
> > and can be switched.
> >
> > The first wave of attack is stack checking of sleeping tasks. If no
> > affected functions are on the stack of a given task, the task is
> > switched. In most cases this will patch most or all of the tasks on the
> > first try. Otherwise it'll keep trying periodically. This option is
> > only available if the architecture has reliable stacks
> > (CONFIG_RELIABLE_STACKTRACE and CONFIG_STACK_VALIDATION).
>
> I think we could allow periodic checking even for
> !CONFIG_RELIABLE_STACKTRACE situations. The problematic task could migrate
> by itself after some time (meaning it woke up meanwhile and sleeps
> somewhere else now, or it went through a syscall boundary). So we can
> gain something, especially when combined with a fake signal approach. More
> on that below and in my 13/14 mail.

Good idea, agreed.

> > The next line of attack, if needed, is syscall/IRQ switching. A task is
> > switched when it returns from a system call or a user space IRQ. This
> > approach is less than ideal because it usually requires signaling tasks
> > to get them to switch. It also doesn't work for kthreads. But it's
> > still useful in some cases when user tasks get stuck sleeping on an
> > affected function.
> >
> > For architectures which don't have reliable stacks, users have two
> > options:
> >
> > a) use the hybrid fallback option of using only the syscall/IRQ
> > switching (which is the default);
> >
> > b) or they can use the immediate model (which is the model we already
> > have today) by setting the patch->immediate flag.
> >
> > There's also a func->immediate flag which allows users to specify that
> > certain functions in the patch can be applied without per-task
> > consistency. This might be useful if you want to patch a common
> > function like schedule(), and the function change doesn't need
> > consistency but the rest of the patch does.
> >
> > Still have a lot of TODOs, some of them are listed here. If you see
> > something objectionable, it might be a good idea to make sure it's not
> > already on the TODO list :-)
> >
> > TODO:
> > - actually test it
> > - don't use TIP_KLP_NEED_UPDATE in arch-independent code
> > - figure out new API to keep the use of task_rq_lock() in the sched code
>
> Hm, no idea how to do it so that everyone is satisfied. I still remember
> Peter's protests.

Yeah, I'll loop in Peter and Ingo on v2 so we can get their input.

> > - cleaner way to detect preemption on the stack
> > - allow patch modules to be removed. still needs more discussion and
> > thought. maybe something like Miroslav's patch would be good:
> > https://lkml.kernel.org/r/[email protected]
>
> Yup, that could be part of the patch set. The second option (to rework
> klp_unregister_patch and move kobject_put out of mutex protected parts) is
> maybe a way to go. The mutex_trylock approach works as well, but it is not
> clean and nice enough I guess. However the patch is there :).
>
> Anyway, module removal should be prohibited when one uses the immediate
> flag set to true. Without the consistency model we cannot say whether it is
> safe to remove the module. Some process could still be in its code.

Good point.

>
> > - come up with a better name than universe? KLP_STATE_PREV/NEXT?
> > KLP_UNPATCHED/PATCHED? there were some objections to the name in v1.
> > - update documentation for sysfs, proc, livepatch
> > - need atomic accesses or READ_ONCE/WRITE_ONCE anywhere?
> > - ability to force a patch to the goal universe
>
> This could be done with a call to klp_update_task_universe() for all tasks,
> couldn't it? We have something similar in kgraft.

Yeah, I think so.
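
Something along these lines, I'd think (just a sketch; the user-facing knob
for forcing and the exact locking still need to be worked out):

	struct task_struct *g, *task;

	read_lock(&tasklist_lock);
	for_each_process_thread(g, task)
		klp_update_task_universe(task);
	read_unlock(&tasklist_lock);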

> > - try ftrace handler switching idea from v1 cover letter
> > - split up the patches better
> > - cc all the right people
>
> I'd add a fake signal facility for sleeping non-migrated tasks. This
> would accelerate a migration to a new universe. We have it in kgraft for
> quite some time and it worked out great. See
> lkml.kernel.org/r/[email protected] which
> went with Jiri's kgraft-on-klp patch set. See also Oleg's reply as it is
> important (I changed kgraft implementation according to that).

Ok, I'll look into sending a fake signal to remaining tasks.
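
(Roughly along these lines, I imagine -- just a sketch; klp_patch_pending()
is a made-up helper meaning "task not yet migrated", and the details would
follow the kgraft patch and Oleg's comments:)

	static void klp_send_fake_signals(void)
	{
		struct task_struct *g, *task;

		read_lock(&tasklist_lock);
		for_each_process_thread(g, task) {
			if (!klp_patch_pending(task))	/* hypothetical helper */
				continue;

			/*
			 * "Fake" signal: nothing is delivered, the task is
			 * only kicked so it passes through the signal code
			 * and hits the syscall/IRQ switching point.
			 */
			spin_lock_irq(&task->sighand->siglock);
			signal_wake_up(task, 0);
			spin_unlock_irq(&task->sighand->siglock);
		}
		read_unlock(&tasklist_lock);
	}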

--
Josh

2016-04-04 17:54:40

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 07/14] x86/stacktrace: add function for detecting reliable stack traces

On Thu, Mar 31, 2016 at 03:03:16PM +0200, Miroslav Benes wrote:
> On Fri, 25 Mar 2016, Josh Poimboeuf wrote:
>
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 2dc18605..76274b8 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -138,6 +138,7 @@ config X86
> > select HAVE_PERF_REGS
> > select HAVE_PERF_USER_STACK_DUMP
> > select HAVE_REGS_AND_STACK_ACCESS_API
> > + select HAVE_RELIABLE_STACKTRACE if X86_64 && FRAME_POINTER
>
> I understand we have to rely on the frame pointer for now. Do you plan to
> switch to a dwarf unwinder one day in the future? IOW is there a plan to
> implement dwarf stuff generation in objtool and then to have a dwarf-based
> stack unwinder upstream and to use it for live patching?

Yes, adding DWARF validation and generation to objtool and creating a
DWARF unwinder upstream are all planned, hopefully soon.

> We have FRAME_POINTER unset in SLES for performance reasons (a slowdown of
> about 5 percent was measured in the past; however, we should redo the
> experiments), and one day we'd really like to switch from kgraft to
> upstream :). So I'm just asking.

If you redo the experiments and have some useful data to share, please
do share :-)

--
Josh

2016-04-04 17:59:03

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 07/14] x86/stacktrace: add function for detecting reliable stack traces

On Mon, Apr 04, 2016 at 05:55:01PM +0200, Petr Mladek wrote:
> On Fri 2016-03-25 14:34:54, Josh Poimboeuf wrote:
> > For live patching and possibly other use cases, a stack trace is only
> > useful if you can be assured that it's completely reliable. Add a new
> > save_stack_trace_tsk_reliable() function to achieve that.
> >
> > Scenarios which indicate that a stack trace may be unreliable:
> >
> > - interrupt stacks
> > - preemption
> > - corrupted stack data
> > - newly forked tasks
> > - running tasks
> > - the user didn't provide a large enough entries array
> >
> > Also add a config option so arch-independent code can determine at build
> > time whether the function is implemented.
> >
> > diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> > index 3b10518..9c68bfc 100644
> > --- a/arch/x86/kernel/dumpstack.c
> > +++ b/arch/x86/kernel/dumpstack.c
> > @@ -145,6 +145,42 @@ int print_context_stack_bp(struct thread_info *tinfo,
> > }
> > EXPORT_SYMBOL_GPL(print_context_stack_bp);
> >
> > +int print_context_stack_reliable(struct thread_info *tinfo,
> > + unsigned long *stack, unsigned long *bp,
> > + const struct stacktrace_ops *ops,
> > + void *data, unsigned long *end, int *graph)
> > +{
> > + struct stack_frame *frame = (struct stack_frame *)*bp;
> > + struct stack_frame *last_frame = frame;
>
> I tried to debug why the patching never finishes as reported by Mirek.
> This initialization breaks the whole function, see below.
> I would initialize last_frame to NULL or maybe (void *)stack.
>
>
> > + unsigned long *ret_addr = &frame->return_address;
> > +
> > + if (test_ti_thread_flag(tinfo, TIF_FORK))
> > + return -EINVAL;
> > +
> > + while (valid_stack_ptr(tinfo, ret_addr, sizeof(*ret_addr), end)) {
> > + unsigned long addr = *ret_addr;
> > +
> > + if (frame <= last_frame || !__kernel_text_address(addr) ||
>
> frame == last_frame in the very first iteration, so we always return -EINVAL.

Ah, right, thanks for debugging it. I actually did test an earlier
iteration of the function, but obviously didn't test this one.

--
Josh

2016-04-04 18:21:44

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 12/14] livepatch: create per-task consistency model

On Thu, Mar 31, 2016 at 03:12:39PM +0200, Miroslav Benes wrote:
> On Fri, 25 Mar 2016, Josh Poimboeuf wrote:
>
> [...]
>
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index d277e83..27b181e 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -76,6 +76,7 @@
> > #include <linux/compiler.h>
> > #include <linux/sysctl.h>
> > #include <linux/kcov.h>
> > +#include <linux/livepatch.h>
> >
> > #include <asm/pgtable.h>
> > #include <asm/pgalloc.h>
> > @@ -1615,6 +1616,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
> > total_forks++;
> > spin_unlock(&current->sighand->siglock);
> > syscall_tracepoint_update(p);
> > + klp_update_task_universe(p);
>
> Shouldn't we copy transition and TIF from the parent? I deal with a race
> in kgraft and the solution seems to be this code exactly at this place in
> copy_process(). I need to think about it.

Hm, can you explain why it should be copied from the parent?

I'm thinking the above code is correct for today, but it should still be
changed to be more future-proof.

Here's my thinking:

A forked task starts out with no stack, so if I understand correctly, it
can safely start out in the goal universe, regardless of which universe
its parent belongs to.

However, the current ret_from_fork code is a mess, and Andy Lutomirski
has mentioned that he would like to give newly forked tasks a proper
stack such that instead of jumping to ret_from_fork, they would just
return from schedule(). In that case, it would no longer be safe to
start the new task in the goal universe because it could be "sleeping"
on a to-be-patched function.

So for proper future proofing, newly forked tasks should be started in
the initial universe (rather than starting in the goal universe or
inheriting the parent's universe). They can then be transitioned over
to the goal universe like any other task. How does that sound?

--
Josh

2016-04-04 18:33:11

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 12/14] livepatch: create per-task consistency model

On Mon, Apr 04, 2016 at 08:27:59PM +0200, Vojtech Pavlik wrote:
> On Mon, Apr 04, 2016 at 01:21:38PM -0500, Josh Poimboeuf wrote:
>
> > Hm, can you explain why it should be copied from the parent?
> >
> > I'm thinking the above code is correct for today, but it should still be
> > changed to be more future-proof.
> >
> > Here's my thinking:
> >
> > A forked task starts out with no stack, so if I understand correctly, it
> > can safely start out in the goal universe, regardless of which universe
> > its parent belongs to.
> >
> > However, the current ret_from_fork code is a mess, and Andy Lutomirski
> > has mentioned that he would like to give newly forked tasks a proper
> > stack such that instead of jumping to ret_from_fork, they would just
> > return from schedule(). In that case, it would no longer be safe to
> > start the new task in the goal universe because it could be "sleeping"
> > on a to-be-patched function.
> >
> > So for proper future proofing, newly forked tasks should be started in
> > the initial universe (rather than starting in the goal universe or
> > inheriting the parent's universe). They can then be transitioned over
> > to the goal universe like any other task. How does that sound?
>
> How could a newly forked task start in the old universe if its parent
> has already been migrated? Any context it inherits will already be from
> the new universe.

Can you be more specific about "context" and why inheritance of it would
be a problem?

--
Josh

2016-04-04 19:00:33

by Vojtech Pavlik

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 12/14] livepatch: create per-task consistency model

On Mon, Apr 04, 2016 at 01:21:38PM -0500, Josh Poimboeuf wrote:

> Hm, can you explain why it should be copied from the parent?
>
> I'm thinking the above code is correct for today, but it should still be
> changed to be more future-proof.
>
> Here's my thinking:
>
> A forked task starts out with no stack, so if I understand correctly, it
> can safely start out in the goal universe, regardless of which universe
> its parent belongs to.
>
> However, the current ret_from_fork code is a mess, and Andy Lutomirski
> has mentioned that he would like to give newly forked tasks a proper
> stack such that instead of jumping to ret_from_fork, they would just
> return from schedule(). In that case, it would no longer be safe to
> start the new task in the goal universe because it could be "sleeping"
> on a to-be-patched function.
>
> So for proper future proofing, newly forked tasks should be started in
> the initial universe (rather than starting in the goal universe or
> inheriting the parent's universe). They can then be transitioned over
> to the goal universe like any other task. How does that sound?

How could a newly forked task start in the old universe if its parent
has already been migrated? Any context it inherits will already be from
the new universe.

--
Vojtech Pavlik
Director SuSE Labs

2016-04-05 11:36:38

by Vojtech Pavlik

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 12/14] livepatch: create per-task consistency model

On Mon, Apr 04, 2016 at 01:33:07PM -0500, Josh Poimboeuf wrote:
> On Mon, Apr 04, 2016 at 08:27:59PM +0200, Vojtech Pavlik wrote:
> > On Mon, Apr 04, 2016 at 01:21:38PM -0500, Josh Poimboeuf wrote:
> >
> > > Hm, can you explain why it should be copied from the parent?
> > >
> > > I'm thinking the above code is correct for today, but it should still be
> > > changed to be more future-proof.
> > >
> > > Here's my thinking:
> > >
> > > A forked task starts out with no stack, so if I understand correctly, it
> > > can safely start out in the goal universe, regardless of which universe
> > > its parent belongs to.
> > >
> > > However, the current ret_from_fork code is a mess, and Andy Lutomirski
> > > has mentioned that he would like to give newly forked tasks a proper
> > > stack such that instead of jumping to ret_from_fork, they would just
> > > return from schedule(). In that case, it would no longer be safe to
> > > start the new task in the goal universe because it could be "sleeping"
> > > on a to-be-patched function.
> > >
> > > So for proper future proofing, newly forked tasks should be started in
> > > the initial universe (rather than starting in the goal universe or
> > > inheriting the parent's universe). They can then be transitioned over
> > > to the goal universe like any other task. How does that sound?
> >
> > How could a newly forked task start in the old universe if its parent
> > has already been migrated? Any context it inherits will already be from
> > the new universe.
>
> Can you be more specific about "context" and why inheritance of it would
> be a problem?

Currently a forked task starts out with no stack, and as such it can
start in the goal universe.

If we create a synthetic stack, then we may need to start in the initial
universe, as the synthetic stack would likely be created using initial
universe return addresses.

If we simply copy the stack of the parent process, which is in my
opinion the safest way, as it places few assumptions on the
compiler, then it may contain either old or new addresses
and we need to copy the universe flag along.

--
Vojtech Pavlik
Director SUSE Labs

2016-04-05 13:44:35

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 00/14] livepatch: hybrid consistency model

On Fri, Apr 01, 2016 at 03:39:44PM +0200, Petr Mladek wrote:
> On Fri 2016-03-25 14:34:47, Josh Poimboeuf wrote:
> > These patches are still a work in progress, but Jiri asked that I share
> > them before I go on vacation next week. Based on origin/master because
> > it has CONFIG_STACK_VALIDATION.
>
> I have to follow Mirek and say that it is great work.
>
> > There's also a func->immediate flag which allows users to specify that
> > certain functions in the patch can be applied without per-task
> > consistency. This might be useful if you want to patch a common
> > function like schedule(), and the function change doesn't need
> > consistency but the rest of the patch does.
>
> We probably should not set func->transition flag when func->immediate
> is set or when the related func->object is set. It currently happens
> only when patch->immediate is set.

Agreed, I'll skip setting func->transition if func->immediate is set.
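
In other words, roughly this wherever the transition flags get set up
(sketch only):

	/* immediate functions don't need per-task consistency */
	if (!patch->immediate && !func->immediate)
		func->transition = true;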

> > Still have a lot of TODOs, some of them are listed here. If you see
> > something objectionable, it might be a good idea to make sure it's not
> > already on the TODO list :-)
> >
> > TODO:
> > - come up with a better name than universe? KLP_STATE_PREV/NEXT?
> > KLP_UNPATCHED/PATCHED? there were some objections to the name in v1.
>
> The name "universe" has an advantage if we later allow to
> enable/disable more patches in parallel. The integer might hold
> an identifier of the last applied patch. I have been playing with
> this for kGraft one year ago and it was really challenging.
> We should avoid it if possible. It is not really needed
> if we are able to complete any transition in a reasonable time.

Agreed, we should really avoid having more than two universes at a time,
and I don't foresee a reason to do that in the future. I'll probably
get rid of the universe name in favor of something more descriptive.

> If we support only one transition at a time, a simple boolean
> or even a bit should be enough. The most descriptive name would
> be klp_transition_patch_applied but it is quite long.

Yeah, I'll change it to a bool.
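
So in task_struct it would become something like this (the name is just a
placeholder until the bikeshedding is done):

	/* set once the task has been switched to the target patch state */
	bool klp_patched;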

> Note that similar information is provided by TIF_KLP_NEED_UPDATE
> flag. We use only this bit in kGraft. It saves some space in
> task_struct but it brings other challenges. We need to prevent
> migration using a global "kgr_immutable" flag until ftrace handlers
> for all patched functions are in place. We need to set the flag
> back when the ftrace handler is called and the global "kgr_immutable"
> flag is set.
>
> > - update documentation for sysfs, proc, livepatch
>
> Also we should publish somewhere the information about TIF_KLP_NEED_UPDATE
> flag, e.g. /proc/<pid>/klp_need_update. It is handy to see what
> process blocks the transition. We have something similar in
> kGraft, see in_progress_show() at
> https://git.kernel.org/cgit/linux/kernel/git/jirislaby/kgraft.git/commit/?h=kgraft-4.4&id=1c82fbd7b1fe240f4ed178a6506a93033f6a4bed

Patch 13/14 exposes the per-task patched state in /proc, so I think that
already does what you're asking for? It doesn't expose
TIF_KLP_NEED_UPDATE, but that's more of an internal detail I think.

--
Josh

2016-04-05 13:53:24

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 12/14] livepatch: create per-task consistency model

On Tue, Apr 05, 2016 at 01:36:34PM +0200, Vojtech Pavlik wrote:
> On Mon, Apr 04, 2016 at 01:33:07PM -0500, Josh Poimboeuf wrote:
> > On Mon, Apr 04, 2016 at 08:27:59PM +0200, Vojtech Pavlik wrote:
> > > On Mon, Apr 04, 2016 at 01:21:38PM -0500, Josh Poimboeuf wrote:
> > >
> > > > Hm, can you explain why it should be copied from the parent?
> > > >
> > > > I'm thinking the above code is correct for today, but it should still be
> > > > changed to be more future-proof.
> > > >
> > > > Here's my thinking:
> > > >
> > > > A forked task starts out with no stack, so if I understand correctly, it
> > > > can safely start out in the goal universe, regardless of which universe
> > > > its parent belongs to.
> > > >
> > > > However, the current ret_from_fork code is a mess, and Andy Lutomirski
> > > > has mentioned that he would like to give newly forked tasks a proper
> > > > stack such that instead of jumping to ret_from_fork, they would just
> > > > return from schedule(). In that case, it would no longer be safe to
> > > > start the new task in the goal universe because it could be "sleeping"
> > > > on a to-be-patched function.
> > > >
> > > > So for proper future proofing, newly forked tasks should be started in
> > > > the initial universe (rather than starting in the goal universe or
> > > > inheriting the parent's universe). They can then be transitioned over
> > > > to the goal universe like any other task. How does that sound?
> > >
> > > How could a newly forked task start in the old universe if its parent
> > > has already been migrated? Any context it inherits will already be from
> > > the new universe.
> >
> > Can you be more specific about "context" and why inheritance of it would
> > be a problem?
>
> Currently a forked task starts out with no stack, and as such it can
> start in the goal universe.
>
> If we create a synthetic stack, then we may need to start in the initial
> universe, as the synthetic stack would likely be created using initial
> universe return addresses.
>
> If we simply copy the stack of the parent process, which is in my
> opinion the safest way, as it places few assumptions on the
> compiler, then it may contain either old or new addresses
> and we need to copy the universe flag along.

That all makes sense. I'll change it to inherit the parent's universe
for now.
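
So copy_process() would call something like the sketch below instead of
unconditionally putting the child in the goal universe (the helper name is
made up):

	static inline void klp_copy_process(struct task_struct *child)
	{
		/* inherit the parent's universe rather than assuming the goal */
		child->klp_universe = current->klp_universe;

		if (test_tsk_thread_flag(current, TIF_KLP_NEED_UPDATE))
			set_tsk_thread_flag(child, TIF_KLP_NEED_UPDATE);
		else
			clear_tsk_thread_flag(child, TIF_KLP_NEED_UPDATE);
	}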

--
Josh

2016-04-05 14:24:37

by Miroslav Benes

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 00/14] livepatch: hybrid consistency model

On Mon, 4 Apr 2016, Josh Poimboeuf wrote:

> > I'd add a fake signal facility for sleeping non-migrated tasks. This
> > would accelerate a migration to a new universe. We have it in kgraft for
> > quite some time and it worked out great. See
> > lkml.kernel.org/r/[email protected] which
> > went with Jiri's kgraft-on-klp patch set. See also Oleg's reply as it is
> > important (I changed kgraft implementation according to that).
>
> Ok, I'll look into sending a fake signal to remaining tasks.

Oh, do not worry about this one. I can prepare a patch once we discuss and
review the rest. I think you have a lot on your plate even now.

There are several options for how to do it. I think it would be nice not to send
a fake signal immediately. So we can...

1. do it after some time of waiting, or

2. we can have a knob in sysfs for an admin to send a fake signal if he
does not want to wait anymore, or

3. (and this is a crazy one) we can trace the progress of a migration and
when there is none for several iterations we can send a signal

...

Miroslav

2016-04-05 14:35:00

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 00/14] livepatch: hybrid consistency model

On Tue, Apr 05, 2016 at 04:24:33PM +0200, Miroslav Benes wrote:
> On Mon, 4 Apr 2016, Josh Poimboeuf wrote:
>
> > > I'd add a fake signal facility for sleeping non-migrated tasks. This
> > > would accelerate a migration to a new universe. We have it in kgraft for
> > > quite some time and it worked out great. See
> > > lkml.kernel.org/r/[email protected] which
> > > went with Jiri's kgraft-on-klp patch set. See also Oleg's reply as it is
> > > important (I changed kgraft implementation according to that).
> >
> > Ok, I'll look into sending a fake signal to remaining tasks.
>
> Oh, do not worry about this one. I can prepare a patch once we discuss and
> review the rest. I think you have a lot on your plate even now.
>
> There are several options for how to do it. I think it would be nice not to send
> a fake signal immediately. So we can...
>
> 1. do it after some time of waiting, or
>
> 2. we can have a knob in sysfs for an admin to send a fake signal if he
> does not want to wait anymore, or
>
> 3. (and this is a crazy one) we can trace the progress of a migration and
> when there is none for several iterations we can send a signal

Ok, thanks, I'll skip it for now. We do need to think about the options
and I do have a full plate already.

I might also procrastinate with the patch module removal patch, since
that's another separate piece which might need more discussion.

--
Josh

2016-04-05 14:53:49

by Miroslav Benes

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 00/14] livepatch: hybrid consistency model

On Tue, 5 Apr 2016, Josh Poimboeuf wrote:

> On Tue, Apr 05, 2016 at 04:24:33PM +0200, Miroslav Benes wrote:
> > On Mon, 4 Apr 2016, Josh Poimboeuf wrote:
> >
> > > > I'd add a fake signal facility for sleeping non-migrated tasks. This
> > > > would accelerate a migration to a new universe. We have it in kgraft for
> > > > quite some time and it worked out great. See
> > > > lkml.kernel.org/r/[email protected] which
> > > > went with Jiri's kgraft-on-klp patch set. See also Oleg's reply as it is
> > > > important (I changed kgraft implementation according to that).
> > >
> > > Ok, I'll look into sending a fake signal to remaining tasks.
> >
> > Oh, do not worry about this one. I can prepare a patch once we discuss and
> > review the rest. I think you have a lot on your plate even now.
> >
> > There are several options for how to do it. I think it would be nice not to send
> > a fake signal immediately. So we can...
> >
> > 1. do it after some time of waiting, or
> >
> > 2. we can have a knob in sysfs for an admin to send a fake signal if he
> > does not want to wait anymore, or
> >
> > 3. (and this is a crazy one) we can trace the progress of a migration and
> > when there is none for several iterations we can send a signal
>
> Ok, thanks, I'll skip it for now. We do need to think about the options
> and I do have a full plate already.
>
> I might also procrastinate with the patch module removal patch, since
> that's another separate piece which might need more discussion.

I'll do this one as well.

Miroslav

2016-04-05 17:32:56

by Minfei Huang

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 12/14] livepatch: create per-task consistency model

On 03/25/16 at 02:34P, Josh Poimboeuf wrote:
> +static int klp_check_stack(struct task_struct *task)
> +{
> + static unsigned long entries[MAX_STACK_ENTRIES];
> + struct stack_trace trace;
> + struct klp_object *obj;
> + struct klp_func *func;
> + int ret;
> +
> + trace.skip = 0;
> + trace.nr_entries = 0;
> + trace.max_entries = MAX_STACK_ENTRIES;
> + trace.entries = entries;
> + ret = save_stack_trace_tsk_reliable(task, &trace);

Format the array entries before using it.

Thanks
Minfei

2016-04-05 21:17:12

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 12/14] livepatch: create per-task consistency model

On Wed, Apr 06, 2016 at 01:32:48AM +0800, Minfei Huang wrote:
> On 03/25/16 at 02:34P, Josh Poimboeuf wrote:
> > +static int klp_check_stack(struct task_struct *task)
> > +{
> > + static unsigned long entries[MAX_STACK_ENTRIES];
> > + struct stack_trace trace;
> > + struct klp_object *obj;
> > + struct klp_func *func;
> > + int ret;
> > +
> > + trace.skip = 0;
> > + trace.nr_entries = 0;
> > + trace.max_entries = MAX_STACK_ENTRIES;
> > + trace.entries = entries;
> > + ret = save_stack_trace_tsk_reliable(task, &trace);
>
> Format the array entries before using it.

Do you mean zero the array? If so, that's not necessary because
save_stack_trace_tsk_reliable() doesn't require it. But regardless it
will automatically be zeroed because it's a static variable.

--
Josh

2016-04-06 08:16:04

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 00/14] livepatch: hybrid consistency model

On Tue 2016-04-05 08:44:30, Josh Poimboeuf wrote:
> On Fri, Apr 01, 2016 at 03:39:44PM +0200, Petr Mladek wrote:
> > On Fri 2016-03-25 14:34:47, Josh Poimboeuf wrote:
> > > - update documentation for sysfs, proc, livepatch
> >
> > Also we should publish somewhere the information about TIF_KLP_NEED_UPDATE
> > flag, e.g. /proc/<pid>/klp_need_update. It is handy to see what
> > process blocks the transition. We have something similar in
> > kGraft, see in_progress_show() at
> > https://git.kernel.org/cgit/linux/kernel/git/jirislaby/kgraft.git/commit/?h=kgraft-4.4&id=1c82fbd7b1fe240f4ed178a6506a93033f6a4bed
>
> Patch 13/14 exposes the per-task patched state in /proc, so I think that
> already does what you're asking for? It doesn't expose
> TIF_KLP_NEED_UPDATE, but that's more of an internal detail I think.

Heh, I have missed it. Yup, the per-thread patch state info seems to
be enough.

Best Regards,
Petr

2016-04-06 13:06:23

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 05/14] sched: horrible way to detect whether a task has been preempted

On Fri 2016-03-25 14:34:52, Josh Poimboeuf wrote:
> This is a horrible way to detect whether a task has been preempted.
> Come up with something better: task flag? or is there already an
> existing mechanism?

What about using kallsyms_lookup_size_offset() to check the address.
It is more heavyweight but less hacky. The following code seems
to work for me:

bool in_preempt_schedule_irq(unsigned long addr)
{
static unsigned long size;

if (unlikely(!size)) {
int ret;

ret = kallsyms_lookup_size_offset(
(unsigned long)preempt_schedule_irq,
size, NULL);

/*
* Warn when the function is used without kallsyms or
* when it is unable to locate preempt_schedule_irq().
* Be conservative and always return true in this case.
*/
if (WARN_ON(!ret))
size = -1L;
}

return (addr - (unsigned long)preempt_schedule_irq <= size);
}


Best Regards,
Petr

2016-04-06 16:34:00

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 05/14] sched: horrible way to detect whether a task has been preempted

On Wed, Apr 06, 2016 at 03:06:19PM +0200, Petr Mladek wrote:
> On Fri 2016-03-25 14:34:52, Josh Poimboeuf wrote:
> > This is a horrible way to detect whether a task has been preempted.
> > Come up with something better: task flag? or is there already an
> > existing mechanism?
>
> What about using kallsyms_lookup_size_offset() to check the address.
> It is more heavyweight but less hacky. The following code seems
> to work for me:
>
> bool in_preempt_schedule_irq(unsigned long addr)
> {
> static unsigned long size;
>
> if (unlikely(!size)) {
> int ret;
>
> ret = kallsyms_lookup_size_offset(
> (unsigned long)preempt_schedule_irq,
> size, NULL);
>
> /*
> * Warn when the function is used without kallsyms or
> * when it is unable to locate preempt_schedule_irq().
> * Be conservative and always return true in this case.
> */
> if (WARN_ON(!ret))
> size = -1L;
> }
>
> return (addr - (unsigned long)preempt_schedule_irq <= size);
> }

Yeah, that would definitely be better. Though still somewhat hacky.

--
Josh

2016-04-07 09:47:05

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 05/14] sched: horrible way to detect whether a task has been preempted

On Wed 2016-04-06 11:33:56, Josh Poimboeuf wrote:
> On Wed, Apr 06, 2016 at 03:06:19PM +0200, Petr Mladek wrote:
> > On Fri 2016-03-25 14:34:52, Josh Poimboeuf wrote:
> > > This is a horrible way to detect whether a task has been preempted.
> > > Come up with something better: task flag? or is there already an
> > > existing mechanism?
> >
> > What about using kallsyms_lookup_size_offset() to check the address.
> > It is more heavyweight but less hacky. The following code seems
> > to work for me:
> >
> > bool in_preempt_schedule_irq(unsigned long addr)
> > {
> > static unsigned long size;
> >
> > if (unlikely(!size)) {
> > int ret;
> >
> > ret = kallsyms_lookup_size_offset(
> > (unsigned long)preempt_schedule_irq,
> > size, NULL);
^^^^
It works even better with &size ;-)

> >
> > /*
> > * Warn when the function is used without kallsyms or
> > * when it is unable to locate preempt_schedule_irq().
> > * Be conservative and always return true in this case.
> > */
> > if (WARN_ON(!ret))
> > size = -1L;
> > }
> >
> > return (addr - (unsigned long)preempt_schedule_irq <= size);
> > }
>
> Yeah, that would definitely be better. Though still somewhat hacky.

Yeah. Well this is the same approach that we use to check if a patched
function is on the stack. We could even move this check into the
livepatch code but then print_context_stack_reliable() will not
always give reliable results.

Best Regards,
Petr

2016-04-07 11:55:59

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 07/14] x86/stacktrace: add function for detecting reliable stack traces

On Fri 2016-03-25 14:34:54, Josh Poimboeuf wrote:
> For live patching and possibly other use cases, a stack trace is only
> useful if you can be assured that it's completely reliable. Add a new
> save_stack_trace_tsk_reliable() function to achieve that.
>
> Scenarios which indicate that a stack trace may be unreliable:
>
> - interrupt stacks
> - preemption
> - corrupted stack data
> - newly forked tasks
> - running tasks
> - the user didn't provide a large enough entries array
>
> Also add a config option so arch-independent code can determine at build
> time whether the function is implemented.
>
> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> index 3b10518..9c68bfc 100644
> --- a/arch/x86/kernel/dumpstack.c
> +++ b/arch/x86/kernel/dumpstack.c
> @@ -145,6 +145,42 @@ int print_context_stack_bp(struct thread_info *tinfo,
> }
> EXPORT_SYMBOL_GPL(print_context_stack_bp);
>
> +int print_context_stack_reliable(struct thread_info *tinfo,
> + unsigned long *stack, unsigned long *bp,
> + const struct stacktrace_ops *ops,
> + void *data, unsigned long *end, int *graph)
> +{
> + struct stack_frame *frame = (struct stack_frame *)*bp;
> + struct stack_frame *last_frame = frame;
> + unsigned long *ret_addr = &frame->return_address;
> +
> + if (test_ti_thread_flag(tinfo, TIF_FORK))
> + return -EINVAL;

Why exactly is a stack of a forked task unreliable, please?

There was some discussion about the possible stack state and the patch
state after forking, see
http://thread.gmane.org/gmane.linux.kernel/2184163/focus=2191057

Anyway, I think that the stack should be ready for use when the process
is visible in the task list. It means that it should be reliable.


> + while (valid_stack_ptr(tinfo, ret_addr, sizeof(*ret_addr), end)) {
> + unsigned long addr = *ret_addr;
> +
> + if (frame <= last_frame || !__kernel_text_address(addr) ||
> + in_preempt_schedule_irq(addr))

I wonder how exactly this works :-)

First, __kernel_text_address() returns true also for dynamically generated
ftrace handlers, see is_ftrace_trampoline(). Do we have a guarantee
that these functions generate a valid stack frame? We might want to
ignore these here.


Second, if I get it correctly, in_preempt_schedule_irq() works because
this functions is called only for tasks that are _not_ running.
It means that we must be exactly at

preempt_schedule_irq()
__schedule()
context_switch()
switch_to()

It means that preempt_schedule_irq() must be on the stack if at
least one of the other functions is not inlined.

As Jiri Kosina explained to me, we check it because it is
called on exit from an interrupt handler. The interrupt might
come at any time, for example, right before a function saves
the stack frame. This is why it makes the stack unreliable.

If I get it correctly, this is the only location where the
running process might get rescheduled from irq context. Other
possibilities keep the process running and the stack is
marked unreliable elsewhere.

Well, I wonder if we should be more suspicious and make
sure that only the regular process stack is used.

Best Regards,
Petr

2016-04-07 12:10:36

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 00/14] livepatch: hybrid consistency model

On Fri 2016-03-25 14:34:47, Josh Poimboeuf wrote:
> TODO:
> - try ftrace handler switching idea from v1 cover letter

I have had a discussion about it with Mirek. This would help with
kthreads. If they are sleeping in a patched function, we wake
them up; this will help to migrate them before they fall asleep again.

But it might be quite tricky. We must make sure to avoid a deadlock.
We probably should not check the stack in atomic context or
in time sensitive functions.

An alternative would be to check the stack and try migration
when the process goes to sleep. It is a location where
we should not be afraid of any deadlocks or slight delay.
There should be a high chance of a successful migration with
a minimal impact on the system throughput.

Best Regards,
Petr

2016-04-07 14:34:07

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 05/14] sched: horrible way to detect whether a task has been preempted

On Thu, Apr 07, 2016 at 11:47:00AM +0200, Petr Mladek wrote:
> On Wed 2016-04-06 11:33:56, Josh Poimboeuf wrote:
> > On Wed, Apr 06, 2016 at 03:06:19PM +0200, Petr Mladek wrote:
> > > On Fri 2016-03-25 14:34:52, Josh Poimboeuf wrote:
> > > > This is a horrible way to detect whether a task has been preempted.
> > > > Come up with something better: task flag? or is there already an
> > > > existing mechanism?
> > >
> > > What about using kallsyms_lookup_size_offset() to check the address.
> > > It is more heavyweight but less hacky. The following code seems
> > > to work for me:
> > >
> > > bool in_preempt_schedule_irq(unsigned long addr)
> > > {
> > > static unsigned long size;
> > >
> > > if (unlikely(!size)) {
> > > int ret;
> > >
> > > ret = kallsyms_lookup_size_offset(
> > > (unsigned long)preempt_schedule_irq,
> > > size, NULL);
> ^^^^
> It works even better with &size ;-)
>
> > >
> > > /*
> > > * Warn when the function is used without kallsyms or
> > > * when it is unable to locate preempt_schedule_irq().
> > > * Be conservative and always return true in this case.
> > > */
> > > if (WARN_ON(!ret))
> > > size = -1L;
> > > }
> > >
> > > return (addr - (unsigned long)preempt_schedule_irq <= size);
> > > }
> >
> > Yeah, that would definitely be better. Though still somewhat hacky.
>
> Yeah. Well this is the same approach that we use to check if a patched
> function is on the stack.

Oh, I agree that it's a good way to check if preempt_schedule_irq() is
on the stack. I'm just not convinced that's the cleanest way to ask
"has this task been preempted".

> We could even move this check into the livepatch code but then
> print_context_stack_reliable() will not always give reliable results.

Why would moving the check to the livepatch code affect the reliability
of print_context_stack_reliable()?

--
Josh

2016-04-07 14:46:59

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 07/14] x86/stacktrace: add function for detecting reliable stack traces

On Thu, Apr 07, 2016 at 01:55:52PM +0200, Petr Mladek wrote:
> On Fri 2016-03-25 14:34:54, Josh Poimboeuf wrote:
> > For live patching and possibly other use cases, a stack trace is only
> > useful if you can be assured that it's completely reliable. Add a new
> > save_stack_trace_tsk_reliable() function to achieve that.
> >
> > Scenarios which indicate that a stack trace may be unreliable:
> >
> > - interrupt stacks
> > - preemption
> > - corrupted stack data
> > - newly forked tasks
> > - running tasks
> > - the user didn't provide a large enough entries array
> >
> > Also add a config option so arch-independent code can determine at build
> > time whether the function is implemented.
> >
> > diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> > index 3b10518..9c68bfc 100644
> > --- a/arch/x86/kernel/dumpstack.c
> > +++ b/arch/x86/kernel/dumpstack.c
> > @@ -145,6 +145,42 @@ int print_context_stack_bp(struct thread_info *tinfo,
> > }
> > EXPORT_SYMBOL_GPL(print_context_stack_bp);
> >
> > +int print_context_stack_reliable(struct thread_info *tinfo,
> > + unsigned long *stack, unsigned long *bp,
> > + const struct stacktrace_ops *ops,
> > + void *data, unsigned long *end, int *graph)
> > +{
> > + struct stack_frame *frame = (struct stack_frame *)*bp;
> > + struct stack_frame *last_frame = frame;
> > + unsigned long *ret_addr = &frame->return_address;
> > +
> > + if (test_ti_thread_flag(tinfo, TIF_FORK))
> > + return -EINVAL;
>
> Why exactly is a stack of a forked task unreliable, please?
>
> There was some discussion about the possible stack state and the patch
> state after forking, see
> http://thread.gmane.org/gmane.linux.kernel/2184163/focus=2191057
>
> Anyway, I think that the stack should be ready for use when the process
> is visible in the task list. It means that it should be reliable.

To be honest I don't remember exactly why I added this check. I think
it's related to the fact that newly forked tasks don't yet have a stack.
I should have definitely added a comment. I'll need to revisit it.

> > + while (valid_stack_ptr(tinfo, ret_addr, sizeof(*ret_addr), end)) {
> > + unsigned long addr = *ret_addr;
> > +
> > + if (frame <= last_frame || !__kernel_text_address(addr) ||
> > + in_preempt_schedule_irq(addr))
>
> I wonder how exactly this works :-)
>
> First, __kernel_text_address() returns true also for dynamically generated
> ftrace handlers, see is_ftrace_trampoline(). Do we have a guarantee
> that these functions generate a valid stack frame? We might want to
> ignore these here.

This is a good point. I think the ftrace code does the right thing.
But we don't necessarily have a way to guarantee that. Maybe we should
consider it unreliable.
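
For example by extending the address check to something like this (sketch;
is_ftrace_trampoline() should cover the dynamically generated handlers):

		if (frame <= last_frame || !__kernel_text_address(addr) ||
		    is_ftrace_trampoline(addr) || in_preempt_schedule_irq(addr))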

> Second, if I get it correctly, in_preempt_schedule_irq() works because
> this functions is called only for tasks that are _not_ running.
> It means that we must be exactly at
>
> preempt_schedule_irq()
> __schedule()
> context_switch()
> switch_to()
>
> It means that preempt_schedule_irq() must be on the stack if at
> least one of the other functions is not inlined.
>
> As Jiri Kosina explained to me, we check it because it is
> called on exit from an interrupt handler. The interrupt might
> come at any time, for example, right before a function saves
> the stack frame. This is why it makes the stack unreliable.
>
> If I get it correctly, this is the only location where the
> running process might get rescheduled from irq context. Other
> possibilities keep the process running and the stack is
> marked unreliable elsewhere.

Right. I guess that needs a better comment too.

> Well, I wonder if we should be more suspicious and make
> sure that only the regular process stack is used.

Notice the save_stack_stack_reliable() function, which is called by
dump_trace() when the task is running on an interrupt or exception
stack. It returns -EINVAL, so the stack gets marked unreliable. Does
that address your concern, or did you mean something else?

--
Josh

2016-04-07 15:08:20

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 00/14] livepatch: hybrid consistency model

On Thu, Apr 07, 2016 at 02:10:30PM +0200, Petr Mladek wrote:
> On Fri 2016-03-25 14:34:47, Josh Poimboeuf wrote:
> > TODO:
> > - try ftrace handler switching idea from v1 cover letter
>
> I have had a discussion about it with Mirek. This would help with
> kthreads. If they are sleeping in a patched function, we wake
> them up; this will help to migrate them before they fall asleep again.
>
> But it might be quite tricky. We must make sure to avoid a deadlock.

I assume a deadlock could only occur if the function is changing locking
semantics, and it's up to the patch author to be careful? Or did I miss
the point?

> We probably should not check the stack in atomic context

Can you elaborate why not?

Regardless, this might be fine, if the only goal of this approach is to
transition kthreads (which I think it is).

> or in time sensitive functions.

Would it be up to the patch author to make this judgement?

> An alternative would be to check the stack and try migration
> when the process goes to sleep. It is a location where
> we should not be afraid of any deadlocks or slight delay.
> There should be a high chance of a successful migration with
> a minimal impact on the system throughput.

But if it's sleeping on a patched function as postulated above, that
doesn't solve the stated problem :-)

--
Josh

2016-04-07 15:47:04

by Jiri Kosina

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 00/14] livepatch: hybrid consistency model

On Thu, 7 Apr 2016, Josh Poimboeuf wrote:

> > > - try ftrace handler switching idea from v1 cover letter
[ ... ]
> > We probably should not check the stack in atomic context
>
> Can you elaborate why not?

I admittedly forgot what the "ftrace handler switching idea" is, and am
not sure where exactly to look for it (could you please point it to me so
that I can refresh my memory), but generally we can't assume that the memory
holding the stack of a sleeping task hasn't been reclaimed and wouldn't need
to have been paged in again.

--
Jiri Kosina
SUSE Labs

2016-04-07 18:03:08

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 00/14] livepatch: hybrid consistency model

On Thu, Apr 07, 2016 at 05:47:00PM +0200, Jiri Kosina wrote:
> On Thu, 7 Apr 2016, Josh Poimboeuf wrote:
>
> > > > - try ftrace handler switching idea from v1 cover letter
> [ ... ]
> > > We probably should not check the stack in atomic context
> >
> > Can you elaborate why not?
>
> I admittedly forgot what the "ftrace handler switching idea" is, and am
> not sure where exactly to look for it (could you please point it to me so
> that I can refresh my memory)

Here's where I originally described it [1]:

| 2) As mentioned above, kthreads which are always sleeping on a patched function
| will never transition to the new universe. This is really a minor issue
| (less than 1% of patches). It's not necessarily something that needs to be
| resolved with this patch set, but it would be good to have some discussion
| about it regardless.
|
| To overcome this issue, I have 1/2 an idea: we could add some stack checking
| code to the ftrace handler itself to transition the kthread to the new
| universe after it re-enters the function it was originally sleeping on, if
| the stack doesn't already have any other to-be-patched functions.
| Combined with the klp_transition_work_fn()'s periodic stack checking of
| sleeping tasks, that would handle most of the cases (except when trying to
| patch the high-level thread_fn itself).

> but generally we can't assume that the memory holding the stack of a
> sleeping task hasn't been reclaimed and wouldn't need to have been
> paged in again.

Hm, we're talking about kernel stacks, right? Are they not always
resident in memory?


[1] https://lkml.kernel.org/r/[email protected]:

--
Josh

2016-04-07 18:33:07

by Jiri Kosina

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 00/14] livepatch: hybrid consistency model

On Thu, 7 Apr 2016, Josh Poimboeuf wrote:

> > I admittedly forgot what the "ftrace handler switching idea" is, and am
> > not sure where exactly to look for it (could you please point it to me so
> > that I can refresh my memory)
>
> Here's where I originally described it [1]:

Thanks!

> | 2) As mentioned above, kthreads which are always sleeping on a patched function
> | will never transition to the new universe. This is really a minor issue
> | (less than 1% of patches). It's not necessarily something that needs to be
> | resolved with this patch set, but it would be good to have some discussion
> | about it regardless.
> |
> | To overcome this issue, I have 1/2 an idea: we could add some stack checking
> | code to the ftrace handler itself to transition the kthread to the new
> | universe after it re-enters the function it was originally sleeping on, if
> | the stack doesn't already have any other to-be-patched functions.
> | Combined with the klp_transition_work_fn()'s periodic stack checking of
> | sleeping tasks, that would handle most of the cases (except when trying to
> | patch the high-level thread_fn itself).
>
> > but generally we can't assume that a memory holding stack of a
> > sleeping task hasn't been reclaimed and wouldn't need to have been
> > paged in again.
>
> Hm, we're talking about kernel stacks, right? Are they not always
> resident in memory?

Sure they are, excuse my evening braindamage.

Thanks,

--
Jiri Kosina
SUSE Labs

2016-04-07 21:15:30

by Jessica Yu

[permalink] [raw]
Subject: Re: sched: horrible way to detect whether a task has been preempted

+++ Josh Poimboeuf [25/03/16 14:34 -0500]:
>This is a horrible way to detect whether a task has been preempted.
>Come up with something better: task flag? or is there already an
>existing mechanism?
>
>Signed-off-by: Josh Poimboeuf <[email protected]>
>---
> include/linux/sched.h | 11 ++++++++++-
> kernel/sched/core.c | 26 ++++++++++++++++++++++++--
> 2 files changed, 34 insertions(+), 3 deletions(-)
>
>diff --git a/include/linux/sched.h b/include/linux/sched.h
>index 589c478..62d0961 100644
>--- a/include/linux/sched.h
>+++ b/include/linux/sched.h
>@@ -3239,4 +3239,13 @@ struct update_util_data {
> void cpufreq_set_update_util_data(int cpu, struct update_util_data *data);
> #endif /* CONFIG_CPU_FREQ */
>
>-#endif
>+#ifdef CONFIG_PREEMPT
>+extern bool in_preempt_schedule_irq(unsigned long addr);
>+#else
>+static inline bool in_preempt_schedule_irq(unsigned long addr)
>+{
>+ return false;
>+}
>+#endif /* CONFIG_PREEMPT */
>+
>+#endif /* _LINUX_SCHED_H */
>diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>index d8465ee..be1ef22 100644
>--- a/kernel/sched/core.c
>+++ b/kernel/sched/core.c
>@@ -3334,8 +3334,6 @@ asmlinkage __visible void __sched notrace preempt_schedule_notrace(void)
> }
> EXPORT_SYMBOL_GPL(preempt_schedule_notrace);
>
>-#endif /* CONFIG_PREEMPT */
>-
> /*
> * this is the entry point to schedule() from kernel preemption
> * off of irq context.
>@@ -3360,8 +3358,32 @@ asmlinkage __visible void __sched preempt_schedule_irq(void)
> } while (need_resched());
>
> exception_exit(prev_state);
>+
>+ asm("preempt_schedule_irq_end:");
> }
>
>+/*
>+ * in_preempt_schedule_irq - determine if instruction address is inside the
>+ * preempt_schedule_irq() function
>+ *
>+ * This is used when walking the stack of a task to determine whether an
>+ * interrupt frame exists.
>+ *
>+ * NOTE: This function could return false if the address is in the function
>+ * epilogue. But it's good enough for our purposes, because we only care about
>+ * addresses which have been saved on a stack. If preempt_schedule_irq() is on
>+ * the stack of a task, the saved address will always be prior to the epilogue.
>+ */
>+bool in_preempt_schedule_irq(unsigned long addr)
>+{
>+ extern void *preempt_schedule_irq_end;
>+
>+ return (addr >= (unsigned long)preempt_schedule_irq &&
>+ addr < (unsigned long)preempt_schedule_irq_end);
>+}
>+
>+#endif /* CONFIG_PREEMPT */
>+
> int default_wake_function(wait_queue_t *curr, unsigned mode, int wake_flags,
> void *key)
> {

Been sort of rattling my head over the scheduler code :-) Just
following the calls in and out of __schedule() it doesn't look like
there is a current flag/mechanism to tell whether or not a task has
been preempted..

Is there any reason why you didn't just create a new task flag,
something like TIF_PREEMPTED_IRQ, which would be set once
preempt_schedule_irq() is entered and unset after __schedule() returns
(for that task)? This would roughly correspond to setting the task
flag when the frame for preempt_schedule_irq() is pushed and unsetting
it just before the frame preempt_schedule_irq() is popped for that
task. This seems simpler than walking through all the frames just to
see if in_preempt_schedule_irq() had been called. Would that work?
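
Roughly something like this is what I had in mind (TIF_PREEMPTED_IRQ is of
course a made-up name and the placement is only a sketch):

	/* in preempt_schedule_irq(), around the __schedule() loop: */
	set_tsk_thread_flag(current, TIF_PREEMPTED_IRQ);
	__schedule(true);	/* the preempted task sleeps in here */
	clear_tsk_thread_flag(current, TIF_PREEMPTED_IRQ);

	/* and the stack walker would just check the flag: */
	if (test_tsk_thread_flag(task, TIF_PREEMPTED_IRQ))
		return -EINVAL;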

Jessica

2016-04-07 21:37:23

by Jiri Kosina

[permalink] [raw]
Subject: Re: sched: horrible way to detect whether a task has been preempted

On Thu, 7 Apr 2016, Jessica Yu wrote:

> Been sort of rattling my head over the scheduler code :-) Just following
> the calls in and out of __schedule() it doesn't look like there is a
> current flag/mechanism to tell whether or not a task has been
> preempted..

Performing the complete stack unwind just to determine whether a task has
been preempted non-voluntarily is a slight overkill indeed :/

> Is there any reason why you didn't just create a new task flag,
> something like TIF_PREEMPTED_IRQ, which would be set once
> preempt_schedule_irq() is entered and unset after __schedule() returns
> (for that task)? This would roughly correspond to setting the task flag
> when the frame for preempt_schedule_irq() is pushed and unsetting it
> just before the frame preempt_schedule_irq() is popped for that task.
> This seems simpler than walking through all the frames just to see if
> in_preempt_schedule_irq() had been called. Would that work?

Alternatively, without eating up a TIF_ space, it'd be possible to push a
magic contents on top of the stack in preempt_schedule_irq() (and pop it
once we are returning from there), and if such magic value is detected, we
just don't bother and claim unreliability.

That has advantages of both approaches combined, i.e. it's relatively
low-cost in terms of performance penalty, and it's reliable (in a sense
that you don't have false positives).

The small disadvantage is that you can (very rarely, depending on the
chosen magic) have false negatives. That probably doesn't hurt too much,
given the high improbability and non-lethal consequences.

How does that sound?

--
Jiri Kosina
SUSE Labs

2016-04-07 22:35:14

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: sched: horrible way to detect whether a task has been preempted

On Thu, Apr 07, 2016 at 11:37:19PM +0200, Jiri Kosina wrote:
> On Thu, 7 Apr 2016, Jessica Yu wrote:
>
> > Been sort of rattling my head over the scheduler code :-) Just following
> > the calls in and out of __schedule() it doesn't look like there is a
> > current flag/mechanism to tell whether or not a task has been
> > preempted..
>
> Performing the complete stack unwind just to determine whether a task has
> been preempted non-voluntarily is a slight overkill indeed :/
>
> > Is there any reason why you didn't just create a new task flag,
> > something like TIF_PREEMPTED_IRQ, which would be set once
> > preempt_schedule_irq() is entered and unset after __schedule() returns
> > (for that task)? This would roughly correspond to setting the task flag
> > when the frame for preempt_schedule_irq() is pushed and unsetting it
> > just before the frame preempt_schedule_irq() is popped for that task.
> > This seems simpler than walking through all the frames just to see if
> > in_preempt_schedule_irq() had been called. Would that work?
>
> Alternatively, without eating up a TIF_ space, it'd be possible to push a
> magic contents on top of the stack in preempt_schedule_irq() (and pop it
> once we are returning from there), and if such magic value is detected, we
> just don't bother and claim unreliability.
>
> That has advantages of both aproaches combined, i.e. it's relatively
> low-cost in terms of performance penalty, and it's reliable (in a sense
> that you don't have false positives).
>
> The small disadvantage is that you can (very rarely, depending on the
> chosen magic) have false negatives. That probably doesn't hurt too much,
> given the high inprobability and non-lethal consequences.
>
> How does that sound?

To do that from C code, I guess we'd still need some arch-specific code
in an asm() statement to do the actual push?

I think I'd prefer just updating some field in the task_struct. That
way it would be simple and arch-independent. And the stack walker
wouldn't have to scan for some special value on the stack.

--
Josh

2016-04-07 22:53:16

by Jiri Kosina

[permalink] [raw]
Subject: Re: sched: horrible way to detect whether a task has been preempted

On Thu, 7 Apr 2016, Josh Poimboeuf wrote:

> To do that from C code, I guess we'd still need some arch-specific code
> in an asm() statement to do the actual push?

This could potentially be worked around, I believe (thinking for example of
an on-stack-allocated local variable with predefined contents that the
compiler would not be allowed to optimize out; certainly not the only
option).

Thanks,

--
Jiri Kosina
SUSE Labs

2016-04-07 23:15:19

by Jessica Yu

[permalink] [raw]
Subject: Re: sched: horrible way to detect whether a task has been preempted

+++ Jiri Kosina [07/04/16 23:37 +0200]:
>On Thu, 7 Apr 2016, Jessica Yu wrote:
>
>> Been sort of rattling my head over the scheduler code :-) Just following
>> the calls in and out of __schedule() it doesn't look like there is a
>> current flag/mechanism to tell whether or not a task has been
>> preempted..
>
>Performing the complete stack unwind just to determine whether a task has
>been preempted non-voluntarily is a slight overkill indeed :/
>
>> Is there any reason why you didn't just create a new task flag,
>> something like TIF_PREEMPTED_IRQ, which would be set once
>> preempt_schedule_irq() is entered and unset after __schedule() returns
>> (for that task)? This would roughly correspond to setting the task flag
>> when the frame for preempt_schedule_irq() is pushed and unsetting it
>> just before the frame preempt_schedule_irq() is popped for that task.
>> This seems simpler than walking through all the frames just to see if
>> in_preempt_schedule_irq() had been called. Would that work?
>
>Alternatively, without eating up a TIF_ space, it'd be possible to push a
>magic contents on top of the stack in preempt_schedule_irq() (and pop it
>once we are returning from there), and if such magic value is detected, we
>just don't bother and claim unreliability.

Ah, but wouldn't we still have to walk through the frames (i.e. enter
the loop in patch 7/14) to look for the magic value in this approach?

>That has advantages of both approaches combined, i.e. it's relatively
>low-cost in terms of performance penalty, and it's reliable (in a sense
>that you don't have false positives).
>
>The small disadvantage is that you can (very rarely, depending on the
>chosen magic) have false negatives. That probably doesn't hurt too much,
>given the high improbability and non-lethal consequences.
>
>How does that sound?
>
>--
>Jiri Kosina
>SUSE Labs
>

2016-04-08 07:05:31

by Jiri Kosina

[permalink] [raw]
Subject: Re: sched: horrible way to detect whether a task has been preempted

On Thu, 7 Apr 2016, Jessica Yu wrote:

> > Alternatively, without eating up a TIF_ space, it'd be possible to push a
> > magic contents on top of the stack in preempt_schedule_irq() (and pop it
> > once we are returning from there), and if such magic value is detected, we
> > just don't bother and claim unreliability.
>
> Ah, but wouldn't we still have to walk through the frames (i.e. enter
> the loop in patch 7/14) to look for the magic value in this approach?

The idea was that it'd be located at a place to which the saved stack pointer
of the sleeping task is pointing (or at a fixed offset from it).

--
Jiri Kosina
SUSE Labs

2016-04-08 08:03:11

by Petr Mladek

[permalink] [raw]
Subject: Re: sched: horrible way to detect whether a task has been preempted

On Fri 2016-04-08 09:05:28, Jiri Kosina wrote:
> On Thu, 7 Apr 2016, Jessica Yu wrote:
>
> > > Alternatively, without eating up a TIF_ space, it'd be possible to push a
> > > magic contents on top of the stack in preempt_schedule_irq() (and pop it
> > > once we are returning from there), and if such magic value is detected, we
> > > just don't bother and claim unreliability.
> >
> > Ah, but wouldn't we still have to walk through the frames (i.e. enter
> > the loop in patch 7/14) to look for the magic value in this approach?
>
> The idea was that it'd be located at a place to which saved stack pointer
> of the sleeping task is pointing to (or at a fixed offset from it).

It is an interesting idea but it looks even more hacky than checking
the frame pointers and return values.

Checking the stack might be an overkill but we already do this for all
the other patched functions.

The big advantage of checking the stack is that it does not add any
overhead to the scheduler code and does not eat any TIF flag or
memory. The overhead is incurred only when we are migrating a task,
and it is charged to a separate process.

Best Regards,
Petr

2016-04-08 08:07:18

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 05/14] sched: horrible way to detect whether a task has been preempted

On Thu 2016-04-07 09:34:03, Josh Poimboeuf wrote:
> On Thu, Apr 07, 2016 at 11:47:00AM +0200, Petr Mladek wrote:
> > On Wed 2016-04-06 11:33:56, Josh Poimboeuf wrote:
> > > On Wed, Apr 06, 2016 at 03:06:19PM +0200, Petr Mladek wrote:
> > We could even move this check into the livepatch code but then
> > print_context_stack_reliable() will not always give reliable results.
>
> Why would moving the check to the livepatch code affect the reliability
> of print_context_stack_reliable()?

print_context_stack_reliable() is a generic function that might
eventually be used outside the livepatch code as well. If
preempt_schedule_irq() is on the stack, it means that the rest
of the stack might be unreliable, and that should be detected
by the function itself.
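
A rough sketch of such an unwinder-side check (the helper name is made up,
and kallsyms is assumed to be available for the symbol size lookup):

#include <linux/kallsyms.h>

/* declared by the kernel; repeated here to keep the sketch self-contained */
asmlinkage void preempt_schedule_irq(void);

/* does 'addr' fall inside preempt_schedule_irq()? */
static bool addr_in_preempt_schedule_irq(unsigned long addr)
{
	unsigned long start = (unsigned long)preempt_schedule_irq;
	unsigned long size, offset;

	if (!kallsyms_lookup_size_offset(start, &size, &offset))
		return false;

	return addr >= start && addr < start + size;
}

print_context_stack_reliable() could then refuse to report a stack as
reliable whenever any saved return address satisfies this check.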

Let's forget the idea of moving the check into the livepatch
code :-)

Best Regards,
Petr

2016-04-08 08:24:21

by Petr Mladek

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 07/14] x86/stacktrace: add function for detecting reliable stack traces

On Thu 2016-04-07 09:46:55, Josh Poimboeuf wrote:
> On Thu, Apr 07, 2016 at 01:55:52PM +0200, Petr Mladek wrote:
> > Well, I wonder if we should be more suspicious and make
> > sure that only the regular process stack is used.
>
> Notice the save_stack_stack_reliable() function, which is called by
> dump_trace() when the task is running on an interrupt or exception
> stack. It returns -EINVAL, so the stack gets marked unreliable. Does
> that address your concern, or did you mean something else?

I see. It does what I wanted.

Thanks,
Petr

2016-04-08 14:31:35

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: sched: horrible way to detect whether a task has been preempted

On Fri, Apr 08, 2016 at 10:03:04AM +0200, Petr Mladek wrote:
> On Fri 2016-04-08 09:05:28, Jiri Kosina wrote:
> > On Thu, 7 Apr 2016, Jessica Yu wrote:
> >
> > > > Alternatively, without eating up a TIF_ space, it'd be possible to push a
> > > > magic contents on top of the stack in preempt_schedule_irq() (and pop it
> > > > once we are returning from there), and if such magic value is detected, we
> > > > just don't bother and claim unreliability.
> > >
> > > Ah, but wouldn't we still have to walk through the frames (i.e. enter
> > > the loop in patch 7/14) to look for the magic value in this approach?
> >
> > The idea was that it'd be located at a place to which saved stack pointer
> > of the sleeping task is pointing to (or at a fixed offset from it).
>
> It is an interesting idea but it looks even more hacky than checking
> the frame pointers and return values.
>
> Checking the stack might be an overkill but we already do this for all
> the other patched functions.
>
> The big advantage about checking the stack is that it does not add
> any overhead to the scheduler code, does not eat any TIF flag or
> memory. The overhead is only when we are migrating a task and it is
> charged to a separate process.

My biggest concern about checking the stack for preempt_schedule_irq()
is that it's kind of brittle:

- What if the preemption code changes such that it's no longer a
reliable indicator? For example, what if preempt_schedule_irq() is
only called in some places, and a new __preempt_schedule_irq() is
called elsewhere?

- Or due to some obscure gcc optimization like partial inlining or
sibling tail calls, preempt_schedule_irq() doesn't show up on the
stack?

- Or the code could silently break if there were another static
preempt_schedule_irq symbol somewhere (though we could prevent this by
searching all symbols to ensure there are no duplicates).

These scenarios are unlikely, but they could conceivably happen...

Anyway, we really wouldn't have to eat a TIF flag. We could instead add
something to task_struct. I can at least propose it. If anybody
doesn't like it, maybe they'll suggest something else, or maybe then we
can go with checking the stack.
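
A bare-bones sketch of that task_struct variant (the field name and the
exact hook placement below are purely illustrative):

/* in include/linux/sched.h */
struct task_struct {
	/* ... existing fields ... */
#ifdef CONFIG_LIVEPATCH
	bool			klp_preempted;
#endif
};

/* in kernel/sched/core.c */
asmlinkage __visible void __sched preempt_schedule_irq(void)
{
	current->klp_preempted = true;

	/* ... existing preemption loop, unchanged ... */

	current->klp_preempted = false;
}

The livepatch code could then test task->klp_preempted for a sleeping task
instead of looking for preempt_schedule_irq() on its stack.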

--
Josh

2016-04-08 14:34:37

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 05/14] sched: horrible way to detect whether a task has been preempted

On Fri, Apr 08, 2016 at 10:07:10AM +0200, Petr Mladek wrote:
> On Thu 2016-04-07 09:34:03, Josh Poimboeuf wrote:
> > On Thu, Apr 07, 2016 at 11:47:00AM +0200, Petr Mladek wrote:
> > > On Wed 2016-04-06 11:33:56, Josh Poimboeuf wrote:
> > > > On Wed, Apr 06, 2016 at 03:06:19PM +0200, Petr Mladek wrote:
> > > We could even move this check into the livepatch code but then
> > > print_context_stack_reliable() will not always give reliable results.
> >
> > Why would moving the check to the livepatch code affect the reliability
> > of print_context_stack_reliable()?
>
> print_context_stack_reliable() is a generic function that might
> eventualy be used also outside livepatch code. If there is
> preempt_schedule_irq() on the stack, it means that the rest
> of the stack might be unreliable and it should be detected
> by the function itself.

Ah, I see now. I actually thought you meant something else (moving
in_preempt_schedule_irq() itself to livepatch code, but still calling it
from print_context_stack_reliable()).

> Let's forget the idea of moving the check into the livepatch
> code :-)

Agreed :-)

--
Josh

2016-04-11 03:29:55

by Jessica Yu

[permalink] [raw]
Subject: Re: x86/stacktrace: add function for detecting reliable stack traces

+++ Josh Poimboeuf [25/03/16 14:34 -0500]:
>For live patching and possibly other use cases, a stack trace is only
>useful if you can be assured that it's completely reliable. Add a new
>save_stack_trace_tsk_reliable() function to achieve that.
>
>Scenarios which indicate that a stack strace may be unreliable:

s/strace/trace

>
>- interrupt stacks
>- preemption
>- corrupted stack data
>- newly forked tasks
>- running tasks
>- the user didn't provide a large enough entries array
>
>Also add a config option so arch-independent code can determine at build
>time whether the function is implemented.
>
>Signed-off-by: Josh Poimboeuf <[email protected]>
>---
> arch/Kconfig | 6 ++++++
> arch/x86/Kconfig | 1 +
> arch/x86/kernel/dumpstack.c | 36 ++++++++++++++++++++++++++++++++++++
> arch/x86/kernel/stacktrace.c | 32 ++++++++++++++++++++++++++++++++
> include/linux/stacktrace.h | 20 ++++++++++++++++----
> kernel/stacktrace.c | 4 ++--
> lib/Kconfig.debug | 6 ++++++
> 7 files changed, 99 insertions(+), 6 deletions(-)
>
>diff --git a/arch/Kconfig b/arch/Kconfig
>index 81869a5..68b95f1 100644
>--- a/arch/Kconfig
>+++ b/arch/Kconfig
>@@ -589,6 +589,12 @@ config HAVE_STACK_VALIDATION
> Architecture supports the 'objtool check' host tool command, which
> performs compile-time stack metadata validation.
>
>+config HAVE_RELIABLE_STACKTRACE
>+ bool
>+ help
>+ Architecure has a save_stack_trace_tsk_reliable() function which only

s/Architecure/Architecture

2016-04-11 03:31:34

by Jessica Yu

[permalink] [raw]
Subject: Re: livepatch: separate enabled and patched states

+++ Josh Poimboeuf [25/03/16 14:34 -0500]:
>Once we have a consistency model, patches and their objects will be
>enabled and disabled at different times. For example, when a patch is
>disabled, its loaded objects' funcs can remain registered with ftrace
>indefinitely until the unpatching operation is complete and they're no
>longer in use.
>
>It's less confusing if we give them different names: patches can be
>enabled or disabled; objects (and their funcs) can be patched or
>unpatched:
>
>- Enabled means that a patch is logically enabled (but not necessarily
> fully applied).
>
>- Patched means that an object's funcs are registered with ftrace and
> added to the klp_ops func stack.
>
>Also, since these states are binary, represent them with booleans
>instead of ints.
>
>Signed-off-by: Josh Poimboeuf <[email protected]>
>---
> include/linux/livepatch.h | 17 ++++-------
> kernel/livepatch/core.c | 72 +++++++++++++++++++++++------------------------
> 2 files changed, 42 insertions(+), 47 deletions(-)
>
>diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
>index bd830d5..6d45dc7 100644
>--- a/include/linux/livepatch.h
>+++ b/include/linux/livepatch.h
>@@ -28,11 +28,6 @@
>
> #include <asm/livepatch.h>
>
>-enum klp_state {
>- KLP_DISABLED,
>- KLP_ENABLED
>-};
>-
> /**
> * struct klp_func - function structure for live patching
> * @old_name: name of the function to be patched
>@@ -41,8 +36,8 @@ enum klp_state {
> * can be found (optional)
> * @old_addr: the address of the function being patched
> * @kobj: kobject for sysfs resources
>- * @state: tracks function-level patch application state
> * @stack_node: list node for klp_ops func_stack list
>+ * @patched: the func has been added to the klp_ops list
> */
> struct klp_func {
> /* external */
>@@ -60,8 +55,8 @@ struct klp_func {
> /* internal */
> unsigned long old_addr;
> struct kobject kobj;
>- enum klp_state state;
> struct list_head stack_node;
>+ bool patched;
> };
>
> /**
>@@ -90,7 +85,7 @@ struct klp_reloc {
> * @kobj: kobject for sysfs resources
> * @mod: kernel module associated with the patched object
> * (NULL for vmlinux)
>- * @state: tracks object-level patch application state
>+ * @patched: the object's funcs have been add to the klp_ops list

s/add/added :)

2016-04-11 08:38:52

by Petr Mladek

[permalink] [raw]
Subject: Re: sched: horrible way to detect whether a task has been preempted

On Fri 2016-04-08 09:31:31, Josh Poimboeuf wrote:
> On Fri, Apr 08, 2016 at 10:03:04AM +0200, Petr Mladek wrote:
> > The big advantage about checking the stack is that it does not add
> > any overhead to the scheduler code, does not eat any TIF flag or
> > memory. The overhead is only when we are migrating a task and it is
> > charged to a separate process.
>
> My biggest concern about checking the stack for preempt_schedule_irq()
> is that it's kind of brittle:
>
> - What if the preemption code changes such that it's no longer a
> reliable indicator? For example, what if preempt_schedule_irq() is
> only called in some places, and a new __preempt_schedule_irq() is
> called elsewhere?
>
> - Or due to some obscure gcc optimization like partial inlining or
> sibling tail calls, preempt_schedule_irq() doesn't show up on the
> stack?
>
> - Or the code could silently break if there were another static
> preempt_schedule_irq symbol somewhere (though we could prevent this by
> searching all symbols to ensure there are no duplicates).
>
> These scenarios are unlikely, but they could conceivably happen...

You are right.

> Anyway, we really wouldn't have to eat a TIF flag. We could instead add
> something to task_struct. I can at least propose it. If anybody
> doesn't like it, maybe they'll suggest something else, or maybe then we
> can go with checking the stack.

Yup, we should give the extra task struct stuff a try. Another
advantage is that it would be easier to port to other
architectures.

Best Regards,
Petr

2016-04-11 14:16:23

by Jiri Slaby

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 07/14] x86/stacktrace: add function for detecting reliable stack traces

On 04/04/2016, 07:54 PM, Josh Poimboeuf wrote:
> On Thu, Mar 31, 2016 at 03:03:16PM +0200, Miroslav Benes wrote:
>> On Fri, 25 Mar 2016, Josh Poimboeuf wrote:
>>
>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>> index 2dc18605..76274b8 100644
>>> --- a/arch/x86/Kconfig
>>> +++ b/arch/x86/Kconfig
>>> @@ -138,6 +138,7 @@ config X86
>>> select HAVE_PERF_REGS
>>> select HAVE_PERF_USER_STACK_DUMP
>>> select HAVE_REGS_AND_STACK_ACCESS_API
>>> + select HAVE_RELIABLE_STACKTRACE if X86_64 && FRAME_POINTER
>>
>> I understand we have to rely on frame pointer for now. Do you plan to
>> switch to dwarf unwinder one day in the future? IOW is there a plan to
>> implement dwarf stuff generation in objtool and then to have a dwarf-based
>> stack unwinder upstream and to use it for live patching?
>
> Yes, adding DWARF validation and generation to objtool and creating a
> DWARF unwinder upstream are all planned, hopefully soon.

I may seem obtrusive, but I would like to offer my help here again. If
you need any kind of help with the unwinder, I can definitely help with
that, be it coding, testing, discussing ideas, or anything else.

We had been using an unwinder in SUSE for over a decade, but it recently
stopped working for assembly code (for obvious reasons). So having a
working and reliable unwinder again is one of our top priorities.

thanks,
--
js
suse labs


2016-04-12 14:44:57

by Chris J Arges

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 08/14] livepatch: separate enabled and patched states

On Fri, Mar 25, 2016 at 02:34:55PM -0500, Josh Poimboeuf wrote:
> Once we have a consistency model, patches and their objects will be
> enabled and disabled at different times. For example, when a patch is
> disabled, its loaded objects' funcs can remain registered with ftrace
> indefinitely until the unpatching operation is complete and they're no
> longer in use.
>
> It's less confusing if we give them different names: patches can be
> enabled or disabled; objects (and their funcs) can be patched or
> unpatched:
>
> - Enabled means that a patch is logically enabled (but not necessarily
> fully applied).
>
> - Patched means that an object's funcs are registered with ftrace and
> added to the klp_ops func stack.
>
> Also, since these states are binary, represent them with booleans
> instead of ints.
>

Josh,

Awesome work here first of all!

Looking through the patchset a bit I see the following bools:
- functions: patched, transitioning
- objects: patched
- patches: enabled

It seems this reflects the following states at a patch level:
disabled - module inserted, not yet logically enabled
enabled - logically enabled, but not all objects/functions patched
transitioning - objects/functions are being applied or reverted
patched - all objects/functions patched

However each object and function could have the same state and the parent object
just reflects the 'aggregate state'. For example if all funcs in an object are
patched then the object is also patched.
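
That aggregate state could also simply be computed on demand, for example
with a helper along these lines (hypothetical, reusing the per-func flag
from this patch):

/* an object is "patched" iff every one of its funcs is patched */
static bool klp_object_fully_patched(struct klp_object *obj)
{
	struct klp_func *func;

	klp_for_each_func(obj, func)
		if (!func->patched)
			return false;

	return true;
}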

Perhaps we need more states (or maybe there will be more in the future), but
wouldn't this just be easier to have something like for each patch, object, and
function?

enum klp_state {
	KLP_DISABLED,
	KLP_ENABLED,
	KLP_TRANSITION,
	KLP_PATCHED,
};


I'm happy to help out too.

--chris


> Signed-off-by: Josh Poimboeuf <[email protected]>
> ---
> include/linux/livepatch.h | 17 ++++-------
> kernel/livepatch/core.c | 72 +++++++++++++++++++++++------------------------
> 2 files changed, 42 insertions(+), 47 deletions(-)
>
> diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
> index bd830d5..6d45dc7 100644
> --- a/include/linux/livepatch.h
> +++ b/include/linux/livepatch.h
> @@ -28,11 +28,6 @@
>
> #include <asm/livepatch.h>
>
> -enum klp_state {
> - KLP_DISABLED,
> - KLP_ENABLED
> -};
> -
> /**
> * struct klp_func - function structure for live patching
> * @old_name: name of the function to be patched
> @@ -41,8 +36,8 @@ enum klp_state {
> * can be found (optional)
> * @old_addr: the address of the function being patched
> * @kobj: kobject for sysfs resources
> - * @state: tracks function-level patch application state
> * @stack_node: list node for klp_ops func_stack list
> + * @patched: the func has been added to the klp_ops list
> */
> struct klp_func {
> /* external */
> @@ -60,8 +55,8 @@ struct klp_func {
> /* internal */
> unsigned long old_addr;
> struct kobject kobj;
> - enum klp_state state;
> struct list_head stack_node;
> + bool patched;
> };
>
> /**
> @@ -90,7 +85,7 @@ struct klp_reloc {
> * @kobj: kobject for sysfs resources
> * @mod: kernel module associated with the patched object
> * (NULL for vmlinux)
> - * @state: tracks object-level patch application state
> + * @patched: the object's funcs have been add to the klp_ops list
> */
> struct klp_object {
> /* external */
> @@ -101,7 +96,7 @@ struct klp_object {
> /* internal */
> struct kobject kobj;
> struct module *mod;
> - enum klp_state state;
> + bool patched;
> };
>
> /**
> @@ -110,7 +105,7 @@ struct klp_object {
> * @objs: object entries for kernel objects to be patched
> * @list: list node for global list of registered patches
> * @kobj: kobject for sysfs resources
> - * @state: tracks patch-level application state
> + * @enabled: the patch is enabled (but operation may be incomplete)
> */
> struct klp_patch {
> /* external */
> @@ -120,7 +115,7 @@ struct klp_patch {
> /* internal */
> struct list_head list;
> struct kobject kobj;
> - enum klp_state state;
> + bool enabled;
> };
>
> #define klp_for_each_object(patch, obj) \
> diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> index d68fbf6..be1e106 100644
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -298,11 +298,11 @@ unlock:
> rcu_read_unlock();
> }
>
> -static void klp_disable_func(struct klp_func *func)
> +static void klp_unpatch_func(struct klp_func *func)
> {
> struct klp_ops *ops;
>
> - if (WARN_ON(func->state != KLP_ENABLED))
> + if (WARN_ON(!func->patched))
> return;
> if (WARN_ON(!func->old_addr))
> return;
> @@ -322,10 +322,10 @@ static void klp_disable_func(struct klp_func *func)
> list_del_rcu(&func->stack_node);
> }
>
> - func->state = KLP_DISABLED;
> + func->patched = false;
> }
>
> -static int klp_enable_func(struct klp_func *func)
> +static int klp_patch_func(struct klp_func *func)
> {
> struct klp_ops *ops;
> int ret;
> @@ -333,7 +333,7 @@ static int klp_enable_func(struct klp_func *func)
> if (WARN_ON(!func->old_addr))
> return -EINVAL;
>
> - if (WARN_ON(func->state != KLP_DISABLED))
> + if (WARN_ON(func->patched))
> return -EINVAL;
>
> ops = klp_find_ops(func->old_addr);
> @@ -372,7 +372,7 @@ static int klp_enable_func(struct klp_func *func)
> list_add_rcu(&func->stack_node, &ops->func_stack);
> }
>
> - func->state = KLP_ENABLED;
> + func->patched = true;
>
> return 0;
>
> @@ -383,36 +383,36 @@ err:
> return ret;
> }
>
> -static void klp_disable_object(struct klp_object *obj)
> +static void klp_unpatch_object(struct klp_object *obj)
> {
> struct klp_func *func;
>
> klp_for_each_func(obj, func)
> - if (func->state == KLP_ENABLED)
> - klp_disable_func(func);
> + if (func->patched)
> + klp_unpatch_func(func);
>
> - obj->state = KLP_DISABLED;
> + obj->patched = false;
> }
>
> -static int klp_enable_object(struct klp_object *obj)
> +static int klp_patch_object(struct klp_object *obj)
> {
> struct klp_func *func;
> int ret;
>
> - if (WARN_ON(obj->state != KLP_DISABLED))
> + if (WARN_ON(obj->patched))
> return -EINVAL;
>
> if (WARN_ON(!klp_is_object_loaded(obj)))
> return -EINVAL;
>
> klp_for_each_func(obj, func) {
> - ret = klp_enable_func(func);
> + ret = klp_patch_func(func);
> if (ret) {
> - klp_disable_object(obj);
> + klp_unpatch_object(obj);
> return ret;
> }
> }
> - obj->state = KLP_ENABLED;
> + obj->patched = true;
>
> return 0;
> }
> @@ -423,17 +423,17 @@ static int __klp_disable_patch(struct klp_patch *patch)
>
> /* enforce stacking: only the last enabled patch can be disabled */
> if (!list_is_last(&patch->list, &klp_patches) &&
> - list_next_entry(patch, list)->state == KLP_ENABLED)
> + list_next_entry(patch, list)->enabled)
> return -EBUSY;
>
> pr_notice("disabling patch '%s'\n", patch->mod->name);
>
> klp_for_each_object(patch, obj) {
> - if (obj->state == KLP_ENABLED)
> - klp_disable_object(obj);
> + if (obj->patched)
> + klp_unpatch_object(obj);
> }
>
> - patch->state = KLP_DISABLED;
> + patch->enabled = false;
>
> return 0;
> }
> @@ -457,7 +457,7 @@ int klp_disable_patch(struct klp_patch *patch)
> goto err;
> }
>
> - if (patch->state == KLP_DISABLED) {
> + if (!patch->enabled) {
> ret = -EINVAL;
> goto err;
> }
> @@ -475,12 +475,12 @@ static int __klp_enable_patch(struct klp_patch *patch)
> struct klp_object *obj;
> int ret;
>
> - if (WARN_ON(patch->state != KLP_DISABLED))
> + if (WARN_ON(patch->enabled))
> return -EINVAL;
>
> /* enforce stacking: only the first disabled patch can be enabled */
> if (patch->list.prev != &klp_patches &&
> - list_prev_entry(patch, list)->state == KLP_DISABLED)
> + !list_prev_entry(patch, list)->enabled)
> return -EBUSY;
>
> pr_notice_once("tainting kernel with TAINT_LIVEPATCH\n");
> @@ -492,12 +492,12 @@ static int __klp_enable_patch(struct klp_patch *patch)
> if (!klp_is_object_loaded(obj))
> continue;
>
> - ret = klp_enable_object(obj);
> + ret = klp_patch_object(obj);
> if (ret)
> goto unregister;
> }
>
> - patch->state = KLP_ENABLED;
> + patch->enabled = true;
>
> return 0;
>
> @@ -555,20 +555,20 @@ static ssize_t enabled_store(struct kobject *kobj, struct kobj_attribute *attr,
> if (ret)
> return -EINVAL;
>
> - if (val != KLP_DISABLED && val != KLP_ENABLED)
> + if (val > 1)
> return -EINVAL;
>
> patch = container_of(kobj, struct klp_patch, kobj);
>
> mutex_lock(&klp_mutex);
>
> - if (val == patch->state) {
> + if (patch->enabled == val) {
> /* already in requested state */
> ret = -EINVAL;
> goto err;
> }
>
> - if (val == KLP_ENABLED) {
> + if (val) {
> ret = __klp_enable_patch(patch);
> if (ret)
> goto err;
> @@ -593,7 +593,7 @@ static ssize_t enabled_show(struct kobject *kobj,
> struct klp_patch *patch;
>
> patch = container_of(kobj, struct klp_patch, kobj);
> - return snprintf(buf, PAGE_SIZE-1, "%d\n", patch->state);
> + return snprintf(buf, PAGE_SIZE-1, "%d\n", patch->enabled);
> }
>
> static struct kobj_attribute enabled_kobj_attr = __ATTR_RW(enabled);
> @@ -684,7 +684,7 @@ static void klp_free_patch(struct klp_patch *patch)
> static int klp_init_func(struct klp_object *obj, struct klp_func *func)
> {
> INIT_LIST_HEAD(&func->stack_node);
> - func->state = KLP_DISABLED;
> + func->patched = false;
>
> /* The format for the sysfs directory is <function,sympos> where sympos
> * is the nth occurrence of this symbol in kallsyms for the patched
> @@ -729,7 +729,7 @@ static int klp_init_object(struct klp_patch *patch, struct klp_object *obj)
> if (!obj->funcs)
> return -EINVAL;
>
> - obj->state = KLP_DISABLED;
> + obj->patched = false;
> obj->mod = NULL;
>
> klp_find_object_module(obj);
> @@ -770,7 +770,7 @@ static int klp_init_patch(struct klp_patch *patch)
>
> mutex_lock(&klp_mutex);
>
> - patch->state = KLP_DISABLED;
> + patch->enabled = false;
>
> ret = kobject_init_and_add(&patch->kobj, &klp_ktype_patch,
> klp_root_kobj, "%s", patch->mod->name);
> @@ -816,7 +816,7 @@ int klp_unregister_patch(struct klp_patch *patch)
> goto out;
> }
>
> - if (patch->state == KLP_ENABLED) {
> + if (patch->enabled) {
> ret = -EBUSY;
> goto out;
> }
> @@ -897,13 +897,13 @@ int klp_module_coming(struct module *mod)
> goto err;
> }
>
> - if (patch->state == KLP_DISABLED)
> + if (!patch->enabled)
> break;
>
> pr_notice("applying patch '%s' to loading module '%s'\n",
> patch->mod->name, obj->mod->name);
>
> - ret = klp_enable_object(obj);
> + ret = klp_patch_object(obj);
> if (ret) {
> pr_warn("failed to apply patch '%s' to module '%s' (%d)\n",
> patch->mod->name, obj->mod->name, ret);
> @@ -954,10 +954,10 @@ void klp_module_going(struct module *mod)
> if (!klp_is_module(obj) || strcmp(obj->name, mod->name))
> continue;
>
> - if (patch->state != KLP_DISABLED) {
> + if (patch->enabled) {
> pr_notice("reverting patch '%s' on unloading module '%s'\n",
> patch->mod->name, obj->mod->name);
> - klp_disable_object(obj);
> + klp_unpatch_object(obj);
> }
>
> klp_free_object_loaded(obj);
> --
> 2.4.3
>

2016-04-12 15:57:01

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 07/14] x86/stacktrace: add function for detecting reliable stack traces

On Mon, Apr 11, 2016 at 04:16:14PM +0200, Jiri Slaby wrote:
> On 04/04/2016, 07:54 PM, Josh Poimboeuf wrote:
> > On Thu, Mar 31, 2016 at 03:03:16PM +0200, Miroslav Benes wrote:
> >> On Fri, 25 Mar 2016, Josh Poimboeuf wrote:
> >>
> >>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> >>> index 2dc18605..76274b8 100644
> >>> --- a/arch/x86/Kconfig
> >>> +++ b/arch/x86/Kconfig
> >>> @@ -138,6 +138,7 @@ config X86
> >>> select HAVE_PERF_REGS
> >>> select HAVE_PERF_USER_STACK_DUMP
> >>> select HAVE_REGS_AND_STACK_ACCESS_API
> >>> + select HAVE_RELIABLE_STACKTRACE if X86_64 && FRAME_POINTER
> >>
> >> I understand we have to rely on frame pointer for now. Do you plan to
> >> switch to dwarf unwinder one day in the future? IOW is there a plan to
> >> implement dwarf stuff generation in objtool and then to have a dwarf-based
> >> stack unwinder upstream and to use it for live patching?
> >
> > Yes, adding DWARF validation and generation to objtool and creating a
> > DWARF unwinder upstream are all planned, hopefully soon.
>
> I may seem to be obtrusive, but here, I would like to offer my help
> again. If you need any kind of help with the unwinder, I can definitely
> help with that. Be it coding, testing, ideas discussion or anything else.
>
> We had been using unwinder for over a decade in SUSE but it stopped
> working for assembly recently (for obvious reasons). So having a working
> and reliable unwinder again is one of the top priorities for us.

Since you already have an unwinder in SUSE, it sounds like what you need
most urgently is DWARF generation support in objtool so that your
unwinder will work again for asm code, right? The kernel already emits
DWARF today (for C code), so an upstream unwinder isn't needed first.

That's high on my TODO list, and the coding is probably 70-80% done. I
don't want to hold up progress... But I don't really feel comfortable
passing the code off because it's quite a big feature with some major
changes to objtool. Being the objtool author, I want to make sure to
give it my full attention.

I'm spread quite thin at the moment but I do hope to have v1 of those
patches soon, hopefully May-ish.

--
Josh

2016-04-12 17:16:06

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 08/14] livepatch: separate enabled and patched states

On Tue, Apr 12, 2016 at 09:44:43AM -0500, Chris J Arges wrote:
> On Fri, Mar 25, 2016 at 02:34:55PM -0500, Josh Poimboeuf wrote:
> > Once we have a consistency model, patches and their objects will be
> > enabled and disabled at different times. For example, when a patch is
> > disabled, its loaded objects' funcs can remain registered with ftrace
> > indefinitely until the unpatching operation is complete and they're no
> > longer in use.
> >
> > It's less confusing if we give them different names: patches can be
> > enabled or disabled; objects (and their funcs) can be patched or
> > unpatched:
> >
> > - Enabled means that a patch is logically enabled (but not necessarily
> > fully applied).
> >
> > - Patched means that an object's funcs are registered with ftrace and
> > added to the klp_ops func stack.
> >
> > Also, since these states are binary, represent them with booleans
> > instead of ints.
> >
>
> Josh,
>
> Awesome work here first of all!
>
> Looking through the patchset a bit I see the following bools:
> - functions: patched, transitioning
> - objects: patched
> - patches: enabled
>
> It seems this reflects the following states at a patch level:
> disabled - module inserted, not yet logically enabled
> enabled - logically enabled, but not all objects/functions patched
> transitioning - objects/functions are being applied or reverted
> patched - all objects/functions patched
>
> However each object and function could have the same state and the parent object
> just reflects the 'aggregate state'. For example if all funcs in an object are
> patched then the object is also patched.
>
> Perhaps we need more states (or maybe there will be more in the future), but
> wouldn't this just be easier to have something like for each patch, object, and
> function?
>
> enum klp_state{
> KLP_DISABLED,
> KLP_ENABLED,
> KLP_TRANSITION,
> KLP_PATCHED,
> }
>
>
> I'm happy to help out too.

Thanks for the comments. First let me try to explain why I chose two
bools rather than a single state variable.

At a func level, it's always in one of the following states:

patched=0 transition=0: unpatched
patched=0 transition=1: unpatched, temporary starting state
patched=1 transition=1: patched, may be visible to some tasks
patched=1 transition=0: patched, visible to all tasks

And for unpatching, it goes in the reverse order:

patched=1 transition=0: patched, visible to all tasks
patched=1 transition=1: patched, may be visible to some tasks
patched=0 transition=1: unpatched, temporary ending state
patched=0 transition=0: unpatched

(note to self, put the above in a comment somewhere)
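
(Purely for illustration, the four combinations above could be summarized
by a small helper like this; it is not part of the patch set:)

static const char *klp_func_state_str(const struct klp_func *func)
{
	if (!func->patched)
		return func->transition ? "unpatched, in transition"
					: "unpatched";

	return func->transition ? "patched, visible to some tasks"
				: "patched, visible to all tasks";
}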

Now, we could convert the states from two bools into a single enum. But
I think it would complicate the code. Because nowhere in the code does
it need to access the full state. In some places it accesses
func->patched and in other places it accesses func->transition, but it
never needs to access both.

So for example, the following check in klp_ftrace_handler():

if (unlikely(func->transition)) {

would change to:

if (unlikely(func->state == KLP_ENABLED || func->state == KLP_TRANSITION)) {

Sure, we could use a helper function to make that more readable. But
with the bools it's clearer and you don't need a helper function.
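
(For comparison, such a helper might look roughly like this; it is
hypothetical since the enum is not actually being used:)

static inline bool klp_func_in_transition(struct klp_func *func)
{
	return func->state == KLP_ENABLED ||
	       func->state == KLP_TRANSITION;
}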

As another example, see the following code in klp_complete_transition():

	klp_for_each_object(klp_transition_patch, obj)
		klp_for_each_func(obj, func)
			func->transition = false;

The last line would have to be changed to something like:

	if (patching...)
		func->state = KLP_PATCHED;
	else /* unpatching */
		func->state = KLP_DISABLED;

So that's why I picked two bools over a single state variable: it seems
to make the code simpler.

As to the other idea about copying the func states to the object and
patch level, I get the feeling that would also complicate the code. We
patch and transition at a function granularity, so the "real" state is
at the func level. Proliferating that state to objects and patches
might be tricky to get right, and it could make it harder to understand
exactly what's going on in the code. And I don't really see a benefit
to doing that.

--
Josh

2016-04-12 17:36:00

by Chris J Arges

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 08/14] livepatch: separate enabled and patched states

On Tue, Apr 12, 2016 at 12:16:00PM -0500, Josh Poimboeuf wrote:
> On Tue, Apr 12, 2016 at 09:44:43AM -0500, Chris J Arges wrote:
> > On Fri, Mar 25, 2016 at 02:34:55PM -0500, Josh Poimboeuf wrote:
> > > Once we have a consistency model, patches and their objects will be
> > > enabled and disabled at different times. For example, when a patch is
> > > disabled, its loaded objects' funcs can remain registered with ftrace
> > > indefinitely until the unpatching operation is complete and they're no
> > > longer in use.
> > >
> > > It's less confusing if we give them different names: patches can be
> > > enabled or disabled; objects (and their funcs) can be patched or
> > > unpatched:
> > >
> > > - Enabled means that a patch is logically enabled (but not necessarily
> > > fully applied).
> > >
> > > - Patched means that an object's funcs are registered with ftrace and
> > > added to the klp_ops func stack.
> > >
> > > Also, since these states are binary, represent them with booleans
> > > instead of ints.
> > >
> >
> > Josh,
> >
> > Awesome work here first of all!
> >
> > Looking through the patchset a bit I see the following bools:
> > - functions: patched, transitioning
> > - objects: patched
> > - patches: enabled
> >
> > It seems this reflects the following states at a patch level:
> > disabled - module inserted, not yet logically enabled
> > enabled - logically enabled, but not all objects/functions patched
> > transitioning - objects/functions are being applied or reverted
> > patched - all objects/functions patched
> >
> > However each object and function could have the same state and the parent object
> > just reflects the 'aggregate state'. For example if all funcs in an object are
> > patched then the object is also patched.
> >
> > Perhaps we need more states (or maybe there will be more in the future), but
> > wouldn't this just be easier to have something like for each patch, object, and
> > function?
> >
> > enum klp_state{
> > KLP_DISABLED,
> > KLP_ENABLED,
> > KLP_TRANSITION,
> > KLP_PATCHED,
> > }
> >
> >
> > I'm happy to help out too.
>
> Thanks for the comments. First let me try to explain why I chose two
> bools rather than a single state variable.
>
> At a func level, it's always in one of the following states:
>
> patched=0 transition=0: unpatched
> patched=0 transition=1: unpatched, temporary starting state
> patched=1 transition=1: patched, may be visible to some tasks
> patched=1 transition=0: patched, visible to all tasks
>
> And for unpatching, it goes in the reverse order:
>
> patched=1 transition=0: patched, visible to all tasks
> patched=1 transition=1: patched, may be visible to some tasks
> patched=0 transition=1: unpatched, temporary ending state
> patched=0 transition=0: unpatched
>
> (note to self, put the above in a comment somewhere)
>

This is helpful and makes more sense.

> Now, we could convert the states from two bools into a single enum. But
> I think it would complicate the code. Because nowhere in the code does
> it need to access the full state. In some places it accesses
> func->patched and in other places it accesses func->transition, but it
> never needs to access both.
>
> So for example, the following check in klp_ftrace_handler():
>
> if (unlikely(func->transition)) {
>
> would change to:
>
> if (unlikely(func->state == KLP_ENABLED || func->state == KLP_TRANSITION)) {
>
> Sure, we could use a helper function to make that more readable. But
> with the bools its clearer and you don't need a helper function.
>
> As another example, see the following code in klp_complete_transition():
>
> klp_for_each_object(klp_transition_patch, obj)
> klp_for_each_func(obj, func)
> func->transition = false;
>
> The last line would have to be changed to something like:
>
> if (patching...)
> func->state = KLP_PATCHED;
> else /* unpatching */
> func->state = KLP_DISABLED;
>
> So that's why I picked two bools over a single state variable: it seems
> to make the code simpler.
>
> As to the other idea about copying the func states to the object and
> patch level, I get the feeling that would also complicate the code. We
> patch and transition at a function granularity, so the "real" state is
> at the func level. Proliferating that state to objects and patches
> might be tricky to get right, and it could make it harder to understand
> exactly what's going on in the code. And I don't really see a benefit
> to doing that.
>
> --
> Josh
>

I think keeping it simple makes a ton of sense, thanks for the explanation.

I'm also envisioning how to troubleshoot patches that don't converge.

So if someone wanted to check if a function has been fully patched on all tasks
they would check:

/sys/kernel/livepatch/<patch>/transition
/sys/kernel/livepatch/<patch>/patched

And see whether patched=1 and transition=0, in which case the patch is fully applied.

However, if patched=1 and transition=1, a user who wanted to dig down and see
which pids we are waiting on could do:

cat /proc/*/patch_status

and check whether any pid's patch_status value still contains KLP_UNIVERSE_OLD.

If we wanted to see which function is keeping a task from being patched, we could:
cat /proc/<pid>/stack
and see whether that stack contains any of the functions in question.

Anyway, I'll keep looking at this patchset; patching using the samples/livepatch
code works for me without issue so far.

--chris

2016-04-12 18:25:22

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 08/14] livepatch: separate enabled and patched states

On Tue, Apr 12, 2016 at 12:35:49PM -0500, Chris J Arges wrote:
> On Tue, Apr 12, 2016 at 12:16:00PM -0500, Josh Poimboeuf wrote:
> > On Tue, Apr 12, 2016 at 09:44:43AM -0500, Chris J Arges wrote:
> > > On Fri, Mar 25, 2016 at 02:34:55PM -0500, Josh Poimboeuf wrote:
> > > > Once we have a consistency model, patches and their objects will be
> > > > enabled and disabled at different times. For example, when a patch is
> > > > disabled, its loaded objects' funcs can remain registered with ftrace
> > > > indefinitely until the unpatching operation is complete and they're no
> > > > longer in use.
> > > >
> > > > It's less confusing if we give them different names: patches can be
> > > > enabled or disabled; objects (and their funcs) can be patched or
> > > > unpatched:
> > > >
> > > > - Enabled means that a patch is logically enabled (but not necessarily
> > > > fully applied).
> > > >
> > > > - Patched means that an object's funcs are registered with ftrace and
> > > > added to the klp_ops func stack.
> > > >
> > > > Also, since these states are binary, represent them with booleans
> > > > instead of ints.
> > > >
> > >
> > > Josh,
> > >
> > > Awesome work here first of all!
> > >
> > > Looking through the patchset a bit I see the following bools:
> > > - functions: patched, transitioning
> > > - objects: patched
> > > - patches: enabled
> > >
> > > It seems this reflects the following states at a patch level:
> > > disabled - module inserted, not yet logically enabled
> > > enabled - logically enabled, but not all objects/functions patched
> > > transitioning - objects/functions are being applied or reverted
> > > patched - all objects/functions patched
> > >
> > > However each object and function could have the same state and the parent object
> > > just reflects the 'aggregate state'. For example if all funcs in an object are
> > > patched then the object is also patched.
> > >
> > > Perhaps we need more states (or maybe there will be more in the future), but
> > > wouldn't this just be easier to have something like for each patch, object, and
> > > function?
> > >
> > > enum klp_state{
> > > KLP_DISABLED,
> > > KLP_ENABLED,
> > > KLP_TRANSITION,
> > > KLP_PATCHED,
> > > }
> > >
> > >
> > > I'm happy to help out too.
> >
> > Thanks for the comments. First let me try to explain why I chose two
> > bools rather than a single state variable.
> >
> > At a func level, it's always in one of the following states:
> >
> > patched=0 transition=0: unpatched
> > patched=0 transition=1: unpatched, temporary starting state
> > patched=1 transition=1: patched, may be visible to some tasks
> > patched=1 transition=0: patched, visible to all tasks
> >
> > And for unpatching, it goes in the reverse order:
> >
> > patched=1 transition=0: patched, visible to all tasks
> > patched=1 transition=1: patched, may be visible to some tasks
> > patched=0 transition=1: unpatched, temporary ending state
> > patched=0 transition=0: unpatched
> >
> > (note to self, put the above in a comment somewhere)
> >
>
> This is helpful and makes more sense.
>
> > Now, we could convert the states from two bools into a single enum. But
> > I think it would complicate the code. Because nowhere in the code does
> > it need to access the full state. In some places it accesses
> > func->patched and in other places it accesses func->transition, but it
> > never needs to access both.
> >
> > So for example, the following check in klp_ftrace_handler():
> >
> > if (unlikely(func->transition)) {
> >
> > would change to:
> >
> > if (unlikely(func->state == KLP_ENABLED || func->state == KLP_TRANSITION)) {
> >
> > Sure, we could use a helper function to make that more readable. But
> > with the bools its clearer and you don't need a helper function.
> >
> > As another example, see the following code in klp_complete_transition():
> >
> > klp_for_each_object(klp_transition_patch, obj)
> > klp_for_each_func(obj, func)
> > func->transition = false;
> >
> > The last line would have to be changed to something like:
> >
> > if (patching...)
> > func->state = KLP_PATCHED;
> > else /* unpatching */
> > func->state = KLP_DISABLED;
> >
> > So that's why I picked two bools over a single state variable: it seems
> > to make the code simpler.
> >
> > As to the other idea about copying the func states to the object and
> > patch level, I get the feeling that would also complicate the code. We
> > patch and transition at a function granularity, so the "real" state is
> > at the func level. Proliferating that state to objects and patches
> > might be tricky to get right, and it could make it harder to understand
> > exactly what's going on in the code. And I don't really see a benefit
> > to doing that.
> >
> > --
> > Josh
> >
>
> I think keeping it simple makes a ton of sense, thanks for the explanation.
>
> I'm also envisioning how to troubleshoot patches that don't converge.
>
> So if someone wanted to check if a function has been fully patched on all tasks
> they would check:
>
> /sys/kernel/livepatch/<patch>/transition
> /sys/kernel/livepatch/<patch>/patched
>
> And see if patched=1 and transition=0 things are applied.
>
> However if patched=1 and transition=1 then if a user wanted to dig down and see
> which pid's we were waiting on we could do:
>
> cat /proc/*/patch_status
>
> and check if any pid's patch_status values still contain KLP_UNIVERSE_OLD.
>
> If we wanted to see which function in the task needs patching we could:
> cat /proc/<pid>/stack
> and see if anything in that stack contains the functions in question.

Yeah. It's not ideal that the user has to check multiple places to see
the "state". But IMO that's not a big deal, and a user space tool
should be able to make it more user friendly.

> Anyway I'll keep looking at this patchset, patching using the
> samples/livepatch code works for me without issue so far.

Ok, thanks!

--
Josh

2016-04-14 08:47:09

by Miroslav Benes

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 14/14] livepatch: update task universe when exiting kernel

On Fri, 25 Mar 2016, Josh Poimboeuf wrote:

> Update a tasks's universe when returning from a system call or user
> space interrupt, or after handling a signal.
>
> This greatly increases the chances of a patch operation succeeding. If
> a task is I/O bound, it can switch universes when returning from a
> system call. If a task is CPU bound, it can switch universes when
> returning from an interrupt. If a task is sleeping on a to-be-patched
> function, the user can send SIGSTOP and SIGCONT to force it to switch.
>
> Since the idle "swapper" tasks don't ever exit the kernel, they're
> updated from within the idle loop.

Well, I am still not familiar enough with Andy's recent rework of the
entry code, but I think all of this is correct. Maybe I would add
a note to the changelog that, since TIF_KLP_NEED_UPDATE is defined as the
14th bit, it is also automatically included in _TIF_ALLWORK_MASK.

> Signed-off-by: Josh Poimboeuf <[email protected]>
> ---
> arch/x86/entry/common.c | 6 +++++-
> arch/x86/include/asm/thread_info.h | 2 ++
> include/linux/livepatch.h | 2 ++
> kernel/livepatch/transition.c | 37 +++++++++++++++++++++++++++++++++----
> kernel/sched/idle.c | 4 ++++
> 5 files changed, 46 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index e79d93d..94639dd 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -21,6 +21,7 @@
> #include <linux/context_tracking.h>
> #include <linux/user-return-notifier.h>
> #include <linux/uprobes.h>
> +#include <linux/livepatch.h>
>
> #include <asm/desc.h>
> #include <asm/traps.h>
> @@ -202,7 +203,7 @@ long syscall_trace_enter(struct pt_regs *regs)
>
> #define EXIT_TO_USERMODE_LOOP_FLAGS \
> (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE | \
> - _TIF_NEED_RESCHED | _TIF_USER_RETURN_NOTIFY)
> + _TIF_NEED_RESCHED | _TIF_USER_RETURN_NOTIFY | _TIF_KLP_NEED_UPDATE)
>
> static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
> {
> @@ -236,6 +237,9 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
> if (cached_flags & _TIF_USER_RETURN_NOTIFY)
> fire_user_return_notifiers();
>
> + if (unlikely(cached_flags & _TIF_KLP_NEED_UPDATE))
> + klp_update_task_universe(current);
> +

There is a comment at the beginning of this function which should be
updated as well I think.

Miroslav

2016-04-14 08:50:33

by Miroslav Benes

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 14/14] livepatch: update task universe when exiting kernel

On Thu, 14 Apr 2016, Miroslav Benes wrote:

> On Fri, 25 Mar 2016, Josh Poimboeuf wrote:
>
> > Update a tasks's universe when returning from a system call or user
> > space interrupt, or after handling a signal.
> >
> > This greatly increases the chances of a patch operation succeeding. If
> > a task is I/O bound, it can switch universes when returning from a
> > system call. If a task is CPU bound, it can switch universes when
> > returning from an interrupt. If a task is sleeping on a to-be-patched
> > function, the user can send SIGSTOP and SIGCONT to force it to switch.
> >
> > Since the idle "swapper" tasks don't ever exit the kernel, they're
> > updated from within the idle loop.
>
> Well, I am still not familiarized enough with Andy's recent rework of
> entry stuff, but I think all of this is correct. Maybe I would add
> a note to the changelog, that since TIF_KLP_NEED_UPDATE is defined 14th
> bit it is also automatically included in _TIF_ALLWORK_MASKS.

And I forgot to add that I would try to prepare a similar thing for s390 and
maybe powerpc (taking the recent development there into account). That's gonna
be fun :)

Miroslav

2016-04-14 09:25:25

by Miroslav Benes

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 12/14] livepatch: create per-task consistency model

On Fri, 25 Mar 2016, Josh Poimboeuf wrote:

> Add a basic per-task consistency model. This is the foundation which
> will eventually enable us to patch those ~10% of security patches which
> change function prototypes and/or data semantics.
>
> When a patch is enabled, livepatch enters into a transition state where
> tasks are converging from the old universe to the new universe. If a
> given task isn't using any of the patched functions, it's switched to
> the new universe. Once all the tasks have been converged to the new
> universe, patching is complete.
>
> The same sequence occurs when a patch is disabled, except the tasks
> converge from the new universe to the old universe.
>
> The /sys/kernel/livepatch/<patch>/transition file shows whether a patch
> is in transition. Only a single patch (the topmost patch on the stack)
> can be in transition at a given time. A patch can remain in the
> transition state indefinitely, if any of the tasks are stuck in the
> previous universe.
>
> A transition can be reversed and effectively canceled by writing the
> opposite value to the /sys/kernel/livepatch/<patch>/enabled file while
> the transition is in progress. Then all the tasks will attempt to
> converge back to the original universe.

So after spending some time with this patch I must say hats off to you. I
haven't managed to find anything serious yet and I doubt I would. Few
things below.

> +/*
> + * klp_update_task_universe() - change the patched state of a task
> + * @task: The task to change
> + *
> + * Converts the patched state of the task so that it will switch to the set of
> + * functions in the goal universe.
> + */
> +static inline void klp_update_task_universe(struct task_struct *task)
> +{
> + /*
> + * The corresponding write barriers are in klp_init_transition() and
> + * klp_start_transition(). See the comments there for an explanation.
> + */
> + smp_rmb();
> +
> + task->klp_universe = klp_universe_goal;
> +}

I wonder whether we should introduce a static_key for this function.
klp_update_task_universe() is called in some important code paths, and it
is called unconditionally when CONFIG_LIVEPATCH is on. It does not matter
much for the syscall paths, because we enter the slow path there only if
TIF_KLP_NEED_UPDATE is set, and that is set iff patching is in progress. But
we also call the function elsewhere. So maybe it would be nice to introduce
a static_key in the function, which would be on if patching is in progress
and off if not.

Maybe it is just an overoptimization though.
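
A rough sketch of the static_key idea (not part of the current patches;
the key would presumably be flipped in klp_init_transition() and
klp_complete_transition(), and where its definition would actually live
is glossed over here):

#include <linux/jump_label.h>

static DEFINE_STATIC_KEY_FALSE(klp_in_transition);

static inline void klp_update_task_universe(struct task_struct *task)
{
	/* compiles down to a NOP while no transition is in progress */
	if (!static_branch_unlikely(&klp_in_transition))
		return;

	smp_rmb();
	task->klp_universe = klp_universe_goal;
}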

[...]

> +static bool klp_try_switch_task(struct task_struct *task)
> +{
> + struct rq *rq;
> + unsigned long flags;
> + int ret;
> + bool success = false;
> +
> + /* check if this task has already switched over */
> + if (task->klp_universe == klp_universe_goal)
> + return true;

[...]

> +/*
> + * This function can be called in the middle of an existing transition to
> + * reverse the direction of the universe goal. This can be done to effectively
> + * cancel an existing enable or disable operation if there are any tasks which
> + * are stuck in the starting universe.
> + */
> +void klp_reverse_transition(void)
> +{
> + struct klp_patch *patch = klp_transition_patch;
> +
> + klp_start_transition(!klp_universe_goal);
> + klp_try_complete_transition();
> +
> + patch->enabled = !patch->enabled;
> +}

This function could be called iff the patching is in progress, there are
some not-yet-migrated tasks and we wait for scheduled workqueue to run
klp_try_complete_transition() again, right? I have two questions.

1. Is the call to klp_try_complete_transition() here necessary? It would
be called via workqueue, wouldn't it? I suspect we don't gain much with
this and we introduce possible future problems because of "double
success" of klp_try_complete_transition() (nothing serious there as of
now though). But I might be wrong because I got really confused by this
piece of code and its context :)

2. klp_reverse_transition() works well because the not-yet-migrated tasks are
still in KLP_UNIVERSE_OLD (without loss of generality), and since
klp_universe_goal is swapped they are in fact already in their target universe.
klp_try_switch_task() returns immediately in that case. The rest are
migrated in the ordinary way.

What about TIF_KLP_NEED_UPDATE (I know it is introduced in a later patch, but
it is easier to discuss it here)? klp_start_transition() resets the flag
for all the tasks. Shouldn't the flag be cleared for a task in
klp_try_switch_task() if task->klp_universe == klp_universe_goal? In fact
it does not matter if it is set; the patching success is determined by the
klp_func->klp_universe booleans. But those tasks would needlessly go
through the syscall slow path (see the note above about the static_key),
which could hurt performance.
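
In code, the suggestion would roughly amount to something like this
(simplified; only the early-exit path of klp_try_switch_task() is shown,
pulled out into a separate helper for the sake of the sketch):

static bool klp_task_already_switched(struct task_struct *task)
{
	if (task->klp_universe != klp_universe_goal)
		return false;

	/* no need to keep pushing this task through the slow exit path */
	clear_tsk_thread_flag(task, TIF_KLP_NEED_UPDATE);
	return true;
}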

Does this make sense?

[...]

transition.h:
>
> +extern void klp_init_transition(struct klp_patch *patch, int universe);
> +extern void klp_start_transition(int universe);
> +extern void klp_reverse_transition(void);
> +extern bool klp_try_complete_transition(void);
> +extern void klp_complete_transition(void);

externs are superfluous here and we do not have them in other header
files.

Anyway, great job!

Miroslav

2016-04-14 13:23:24

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 14/14] livepatch: update task universe when exiting kernel

On Thu, Apr 14, 2016 at 10:47:04AM +0200, Miroslav Benes wrote:
> On Fri, 25 Mar 2016, Josh Poimboeuf wrote:
>
> > Update a tasks's universe when returning from a system call or user
> > space interrupt, or after handling a signal.
> >
> > This greatly increases the chances of a patch operation succeeding. If
> > a task is I/O bound, it can switch universes when returning from a
> > system call. If a task is CPU bound, it can switch universes when
> > returning from an interrupt. If a task is sleeping on a to-be-patched
> > function, the user can send SIGSTOP and SIGCONT to force it to switch.
> >
> > Since the idle "swapper" tasks don't ever exit the kernel, they're
> > updated from within the idle loop.
>
> Well, I am still not familiarized enough with Andy's recent rework of
> entry stuff, but I think all of this is correct. Maybe I would add
> a note to the changelog, that since TIF_KLP_NEED_UPDATE is defined 14th
> bit it is also automatically included in _TIF_ALLWORK_MASKS.

To be honest, putting it in the range of _TIF_ALLWORK_MASK was an
accident. I think the comments in thread_info.h need to be improved a
bit to make that clearer.

> > Signed-off-by: Josh Poimboeuf <[email protected]>
> > ---
> > arch/x86/entry/common.c | 6 +++++-
> > arch/x86/include/asm/thread_info.h | 2 ++
> > include/linux/livepatch.h | 2 ++
> > kernel/livepatch/transition.c | 37 +++++++++++++++++++++++++++++++++----
> > kernel/sched/idle.c | 4 ++++
> > 5 files changed, 46 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> > index e79d93d..94639dd 100644
> > --- a/arch/x86/entry/common.c
> > +++ b/arch/x86/entry/common.c
> > @@ -21,6 +21,7 @@
> > #include <linux/context_tracking.h>
> > #include <linux/user-return-notifier.h>
> > #include <linux/uprobes.h>
> > +#include <linux/livepatch.h>
> >
> > #include <asm/desc.h>
> > #include <asm/traps.h>
> > @@ -202,7 +203,7 @@ long syscall_trace_enter(struct pt_regs *regs)
> >
> > #define EXIT_TO_USERMODE_LOOP_FLAGS \
> > (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE | \
> > - _TIF_NEED_RESCHED | _TIF_USER_RETURN_NOTIFY)
> > + _TIF_NEED_RESCHED | _TIF_USER_RETURN_NOTIFY | _TIF_KLP_NEED_UPDATE)
> >
> > static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
> > {
> > @@ -236,6 +237,9 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
> > if (cached_flags & _TIF_USER_RETURN_NOTIFY)
> > fire_user_return_notifiers();
> >
> > + if (unlikely(cached_flags & _TIF_KLP_NEED_UPDATE))
> > + klp_update_task_universe(current);
> > +
>
> There is a comment at the beginning of this function which should be
> updated as well I think.

Yeah, agreed.

--
Josh

2016-04-14 13:39:55

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 14/14] livepatch: update task universe when exiting kernel

On Thu, Apr 14, 2016 at 10:50:28AM +0200, Miroslav Benes wrote:
> On Thu, 14 Apr 2016, Miroslav Benes wrote:
>
> > On Fri, 25 Mar 2016, Josh Poimboeuf wrote:
> >
> > > Update a tasks's universe when returning from a system call or user
> > > space interrupt, or after handling a signal.
> > >
> > > This greatly increases the chances of a patch operation succeeding. If
> > > a task is I/O bound, it can switch universes when returning from a
> > > system call. If a task is CPU bound, it can switch universes when
> > > returning from an interrupt. If a task is sleeping on a to-be-patched
> > > function, the user can send SIGSTOP and SIGCONT to force it to switch.
> > >
> > > Since the idle "swapper" tasks don't ever exit the kernel, they're
> > > updated from within the idle loop.
> >
> > Well, I am still not familiarized enough with Andy's recent rework of
> > entry stuff, but I think all of this is correct. Maybe I would add
> > a note to the changelog, that since TIF_KLP_NEED_UPDATE is defined 14th
> > bit it is also automatically included in _TIF_ALLWORK_MASKS.
>
> And I forgot to add that I would try to prepare similar thing for s390 and
> maybe powerpc (taking recent development there into account). That's gonna
> be fun :)

Yeah, good point. I've glanced at the entry code for both architectures
and I don't think it'll be too bad, though the devil's in the details.

--
Josh

2016-04-14 16:39:58

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 12/14] livepatch: create per-task consistency model

On Thu, Apr 14, 2016 at 11:25:18AM +0200, Miroslav Benes wrote:
> On Fri, 25 Mar 2016, Josh Poimboeuf wrote:
>
> > Add a basic per-task consistency model. This is the foundation which
> > will eventually enable us to patch those ~10% of security patches which
> > change function prototypes and/or data semantics.
> >
> > When a patch is enabled, livepatch enters into a transition state where
> > tasks are converging from the old universe to the new universe. If a
> > given task isn't using any of the patched functions, it's switched to
> > the new universe. Once all the tasks have been converged to the new
> > universe, patching is complete.
> >
> > The same sequence occurs when a patch is disabled, except the tasks
> > converge from the new universe to the old universe.
> >
> > The /sys/kernel/livepatch/<patch>/transition file shows whether a patch
> > is in transition. Only a single patch (the topmost patch on the stack)
> > can be in transition at a given time. A patch can remain in the
> > transition state indefinitely, if any of the tasks are stuck in the
> > previous universe.
> >
> > A transition can be reversed and effectively canceled by writing the
> > opposite value to the /sys/kernel/livepatch/<patch>/enabled file while
> > the transition is in progress. Then all the tasks will attempt to
> > converge back to the original universe.
>
> So after spending some time with this patch I must say hats off to you. I
> haven't managed to find anything serious yet and I doubt I would. Few
> things below.

Great!

> > +/*
> > + * klp_update_task_universe() - change the patched state of a task
> > + * @task: The task to change
> > + *
> > + * Converts the patched state of the task so that it will switch to the set of
> > + * functions in the goal universe.
> > + */
> > +static inline void klp_update_task_universe(struct task_struct *task)
> > +{
> > + /*
> > + * The corresponding write barriers are in klp_init_transition() and
> > + * klp_start_transition(). See the comments there for an explanation.
> > + */
> > + smp_rmb();
> > +
> > + task->klp_universe = klp_universe_goal;
> > +}
>
> I wonder whether we should introduce a static_key for this function.
> klp_update_task_universe() is called in some important code paths and it
> is called everytime if CONFIG_LIVEPATCH is on. It does not matter much for
> syscall paths, because we enter slowpath there only if TIF_KLP_NEED_UPDATE
> is set and that is set iff patching is in progress. But we call the
> function also elsewhere. So maybe it would be nice to introduce static_key
> in the function which would be on if the patching is in progress and off
> if not.
>
> Maybe it is just an overoptimization though.

klp_update_task_universe() is called from:

- exit_to_usermode_loop() (slow path for syscall/irq exit)
- copy_process() (though I'll be changing this code so children inherit
parents' flags)
- init_idle()
- cpu_idle_loop()
- various places in livepatch code

I think none of those are fast paths, so my feeling is that a static key
would be overkill.
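(For reference, I understand the suggestion to be roughly the following --
a sketch only; the key name and the enable/disable points are my
assumption:)

/* include/linux/livepatch.h */
extern struct static_key_false klp_transition_key;

static inline void klp_update_task_universe(struct task_struct *task)
{
	/* fast path out when no patching is in progress */
	if (!static_branch_unlikely(&klp_transition_key))
		return;

	/*
	 * The corresponding write barriers are in klp_init_transition() and
	 * klp_start_transition().
	 */
	smp_rmb();

	task->klp_universe = klp_universe_goal;
}

/* kernel/livepatch/transition.c */
DEFINE_STATIC_KEY_FALSE(klp_transition_key);

/*
 * klp_init_transition() would do static_branch_enable(&klp_transition_key),
 * klp_complete_transition() would do static_branch_disable(&klp_transition_key).
 */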

> [...]
>
> > +static bool klp_try_switch_task(struct task_struct *task)
> > +{
> > + struct rq *rq;
> > + unsigned long flags;
> > + int ret;
> > + bool success = false;
> > +
> > + /* check if this task has already switched over */
> > + if (task->klp_universe == klp_universe_goal)
> > + return true;
>
> [...]
>
> > +/*
> > + * This function can be called in the middle of an existing transition to
> > + * reverse the direction of the universe goal. This can be done to effectively
> > + * cancel an existing enable or disable operation if there are any tasks which
> > + * are stuck in the starting universe.
> > + */
> > +void klp_reverse_transition(void)
> > +{
> > + struct klp_patch *patch = klp_transition_patch;
> > +
> > + klp_start_transition(!klp_universe_goal);
> > + klp_try_complete_transition();
> > +
> > + patch->enabled = !patch->enabled;
> > +}
>
> This function could be called iff the patching is in progress, there are
> some not-yet-migrated tasks and we wait for scheduled workqueue to run
> klp_try_complete_transition() again, right? I have two questions.
>
> 1. Is the call to klp_try_complete_transition() here necessary? It would
> be called via workqueue, wouldn't it? I suspect we don't gain much with
> this and we introduce possible future problems because of "double
> success" of klp_try_complete_transition() (nothing serious there as of
> now though). But I might be wrong because I got really confused by this
> piece of code and its context :)

Yeah, the call to klp_try_complete_transition() isn't necessary because
of the workqueue. It's still kind of nice to have though, because it
tries to transition immediately rather than waiting for the work (and we
might end up deciding to change the delayed work frequency to be less
often than once per second).

I can't really envision that future bugs related to "double success"
would be much to worry about. The work function does check for
klp_transition_patch beforehand, and if it didn't,
klp_complete_transition() would die loudly with a NULL pointer
dereference. Of course, it's always hard to anticipate future code
changes and I could be completely wrong.

So I would vote to keep this call, but I don't feel too strongly about
it since it isn't strictly needed.
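(For context, the periodic retry I'm referring to is just a self-arming
delayed work, roughly like the sketch below; the one-second period and the
klp_mutex locking are placeholders:)

static void klp_transition_work_fn(struct work_struct *work)
{
	mutex_lock(&klp_mutex);

	if (klp_transition_patch)
		klp_try_complete_transition();

	mutex_unlock(&klp_mutex);
}
static DECLARE_DELAYED_WORK(klp_transition_work, klp_transition_work_fn);

/* re-armed whenever klp_try_complete_transition() doesn't succeed: */
schedule_delayed_work(&klp_transition_work, round_jiffies_relative(HZ));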

> 2. klp_reverse_transition() works well because not-yet-migrated tasks are
> still in KLP_UNIVERSE_OLD (without loss of generality) and since
> klp_universe_goal is swapped they are in fact in their target universe.
> klp_try_switch_task() returns immediately in such case. The rest is
> migrated in an ordinary way.
>
> What about TIF_KLP_NEED_UPDATE (I know it is introduced in later patch but
> it is easier to discuss it here)? klp_start_transition() resets the flag
> for all the tasks. Shouldn't the flag be cleared for the task in
> klp_try_switch_task() if task->klp_universe == klp_universe_goal? In fact
> it does not matter if it is set. The patching success is determined by
> klp_func->klp_universe booleans. But those tasks would needlessly go
> through syscall slowpaths (see note above about static_key) which could
> hurt performance.
>
> Does this make sense?

I agree with you, but for a different reason :-)

And actually there's another way this could happen: if a task is forked
during klp_start_transition() after setting the universe goal but before
setting the thread flags, it would already be transitioned but its flag
would get unnecessarily set. (Though as I mentioned above, the
copy_process() code will be changed anyway, so that should no longer be
an issue).

But I don't think it's really a performance issue, because:

1) Assuming a reliable stack, in most cases the majority of tasks are
transitioned immediately. So when reversing the transition, there
would be only a few tasks that would get the flag set unnecessarily.

2) For those tasks which do have the flag set unnecessarily, only the
next syscall will do the slow path. After that the flag gets
cleared. So it only slows down a single syscall for each task. At a
macro level, that's just a drop in the bucket for performance and,
IMO, not worth worrying about and adding (potentially buggy) code
for.

However, I can think of another reason setting the flag unnecessarily
might be bad: it's just conceptually confusing and could maybe lead to
bugs. (ok, here we go trying to predict the future again!)

Specifically, after klp_complete_transition(), a reasonable livepatch
developer (is that an oxymoron? ;-)) might assume that all tasks'
TIF_KLP_NEED_UPDATE flags are cleared and might write code which makes
that assumption.

And, even with today's code it's kind of hard to convince myself that
having a spurious TIF_KLP_NEED_UPDATE flag set from a previous patching
operation won't cause issues on the next patch.

So maybe we should do as you suggest and clear the flag in
klp_try_switch_task() if the task is already in the goal universe.

Or if we wanted to be paranoid we could just clear all the flags in
klp_complete_transition(), but maybe that's a little too sloppy/lazy?
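Something like one of these two variants is what I mean (untested sketches;
'g' and 'task' are task_struct pointers):

/* 1) in klp_try_switch_task(): */
	/* check if this task has already switched over */
	if (task->klp_universe == klp_universe_goal) {
		/* don't leave a stale TIF flag behind */
		clear_tsk_thread_flag(task, TIF_KLP_NEED_UPDATE);
		return true;
	}

/* 2) in klp_complete_transition(), clear everything: */
	read_lock(&tasklist_lock);
	for_each_process_thread(g, task)
		clear_tsk_thread_flag(task, TIF_KLP_NEED_UPDATE);
	read_unlock(&tasklist_lock);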

> [...]
>
> transition.h:
> >
> > +extern void klp_init_transition(struct klp_patch *patch, int universe);
> > +extern void klp_start_transition(int universe);
> > +extern void klp_reverse_transition(void);
> > +extern bool klp_try_complete_transition(void);
> > +extern void klp_complete_transition(void);
>
> externs are superfluous here and we do not have them in other header
> files.

Ah, right.

> Anyway, great job!

Thanks! And thanks for the thoughtful reviews!

--
Josh

2016-04-15 09:18:09

by Miroslav Benes

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 12/14] livepatch: create per-task consistency model

On Thu, 14 Apr 2016, Josh Poimboeuf wrote:

> On Thu, Apr 14, 2016 at 11:25:18AM +0200, Miroslav Benes wrote:
> > On Fri, 25 Mar 2016, Josh Poimboeuf wrote:
> >
> > > +/*
> > > + * klp_update_task_universe() - change the patched state of a task
> > > + * @task: The task to change
> > > + *
> > > + * Converts the patched state of the task so that it will switch to the set of
> > > + * functions in the goal universe.
> > > + */
> > > +static inline void klp_update_task_universe(struct task_struct *task)
> > > +{
> > > + /*
> > > + * The corresponding write barriers are in klp_init_transition() and
> > > + * klp_start_transition(). See the comments there for an explanation.
> > > + */
> > > + smp_rmb();
> > > +
> > > + task->klp_universe = klp_universe_goal;
> > > +}
> >
> > I wonder whether we should introduce a static_key for this function.
> > klp_update_task_universe() is called in some important code paths and it
> > is called every time if CONFIG_LIVEPATCH is on. It does not matter much for
> > syscall paths, because we enter slowpath there only if TIF_KLP_NEED_UPDATE
> > is set and that is set iff patching is in progress. But we call the
> > function also elsewhere. So maybe it would be nice to introduce static_key
> > in the function which would be on if the patching is in progress and off
> > if not.
> >
> > Maybe it is just an overoptimization though.
>
> klp_update_task_universe() is called from:
>
> - exit_to_usermode_loop() (slow path for syscall/irq exit)
> - copy_process() (though I'll be changing this code so children inherit
> parents' flags)
> - init_idle()
> - cpu_idle_loop()
> - various places in livepatch code
>
> I think none of those are fast paths, so my feeling is that a static key
> would be overkill.

Agreed.

> > > +static bool klp_try_switch_task(struct task_struct *task)
> > > +{
> > > + struct rq *rq;
> > > + unsigned long flags;
> > > + int ret;
> > > + bool success = false;
> > > +
> > > + /* check if this task has already switched over */
> > > + if (task->klp_universe == klp_universe_goal)
> > > + return true;
> >
> > [...]
> >
> > > +/*
> > > + * This function can be called in the middle of an existing transition to
> > > + * reverse the direction of the universe goal. This can be done to effectively
> > > + * cancel an existing enable or disable operation if there are any tasks which
> > > + * are stuck in the starting universe.
> > > + */
> > > +void klp_reverse_transition(void)
> > > +{
> > > + struct klp_patch *patch = klp_transition_patch;
> > > +
> > > + klp_start_transition(!klp_universe_goal);
> > > + klp_try_complete_transition();
> > > +
> > > + patch->enabled = !patch->enabled;
> > > +}
> >
> > This function could be called iff the patching is in progress, there are
> > some not-yet-migrated tasks and we wait for scheduled workqueue to run
> > klp_try_complete_transition() again, right? I have two questions.
> >
> > 1. Is the call to klp_try_complete_transition() here necessary? It would
> > be called via workqueue, wouldn't it? I suspect we don't gain much with
> > this and we introduce possible future problems because of "double
> > success" of klp_try_complete_transition() (nothing serious there as of
> > now though). But I might be wrong because I got really confused by this
> > piece of code and its context :)
>
> Yeah, the call to klp_try_complete_transition() isn't necessary because
> of the workqueue. It's still kind of nice to have though, because it
> tries to transition immediately rather than waiting for the work (and we
> might end up deciding to change the delayed work frequency to be less
> often than once per second).

Yes, there are potential benefits to a lower frequency.

> I can't really envision that future bugs related to "double success"
> would be much to worry about. The work function does check for
> klp_transition_patch beforehand, and if it didn't,
> klp_complete_transition() would die loudly with a NULL pointer
> dereference. Of course, it's always hard to anticipate future code
> changes and I could be completely wrong.

Ah, right. There's a check and we clear klp_transition_patch in
klp_try_complete_transition() (I always forget there is a call to
klp_complete_transition "hidden" there :)).

> So I would vote to keep this call, but I don't feel too strongly about
> it since it isn't strictly needed.

Yeah, let's leave it as it is because there is no real harm.

> > 2. klp_reverse_transition() works well because not-yet-migrated tasks are
> > still in KLP_UNIVERSE_OLD (without loss of generality) and since
> > klp_universe_goal is swapped they are in fact in their target universe.
> > klp_try_switch_task() returns immediately in such case. The rest is
> > migrated in an ordinary way.
> >
> > What about TIF_KLP_NEED_UPDATE (I know it is introduced in later patch but
> > it is easier to discuss it here)? klp_start_transition() resets the flag
> > for all the tasks. Shouldn't the flag be cleared for the task in
> > klp_try_switch_task() if task->klp_universe == klp_universe_goal? In fact
> > it does not matter if it is set. The patching success is determined by
> > klp_func->klp_universe booleans. But those tasks would needlessly go
> > through syscall slowpaths (see note above about static_key) which could
> > hurt performance.
> >
> > Does this make sense?
>
> I agree with you, but for a different reason :-)
>
> And actually there's another way this could happen: if a task is forked
> during klp_start_transition() after setting the universe goal but before
> setting the thread flags, it would already be transitioned but its flag
> would get unnecessarily set. (Though as I mentioned above, the
> copy_process() code will be changed anyway, so that should no longer be
> an issue).
>
> But I don't think it's really a performance issue, because:
>
> 1) Assuming a reliable stack, in most cases the majority of tasks are
> transitioned immediately. So when reversing the transition, there
> would be only a few tasks that would get the flag set unnecessarily.

Correct. The majority of tasks would get the flag cleared before they even
get a chance to go through the syscall gate.

> 2) For those tasks which do have the flag set unnecessarily, only the
> next syscall will do the slow path. After that the flag gets
> cleared. So it only slows down a single syscall for each task. At a
> macro level, that's just a drop in the bucket for performance and,
> IMO, not worth worrying about and adding (potentially buggy) code
> for.

Makes sense.

> However, I can think of another reason setting the flag unnecessarily
> might be bad: it's just conceptually confusing and could maybe lead to
> bugs. (ok, here we go trying to predict the future again!)
>
> Specifically, after klp_complete_transition(), a reasonable livepatch
> developer (is that an oxymoron? ;-)) might assume that all tasks'
> TIF_KLP_NEED_UPDATE flags are cleared and might write code which makes
> that assumption.
>
> And, even with today's code it's kind of hard to convince myself that
> having a spurious TIF_KLP_NEED_UPDATE flag set from a previous patching
> operation won't cause issues on the next patch.
>
> So maybe we should do as you suggest and clear the flag in
> klp_try_switch_task() if the task is already in the goal universe.
>
> Or if we wanted to be paranoid we could just clear all the flags in
> klp_complete_transition(), but maybe that's a little too sloppy/lazy?

I'll leave it up to you, because I am not able to decide that. Both
options seem good to me. The former is somewhat more appealing, the latter
is a safe bet.

Thanks,
Miroslav

2016-04-18 15:01:19

by Miroslav Benes

[permalink] [raw]
Subject: [RFC PATCH 2/2] s390/klp: update task universe when exiting kernel

Update a task's universe when returning from a system call or user
space interrupt, or after handling a signal.

This greatly increases the chances of a patch operation succeeding. If
a task is I/O bound, it can switch universes when returning from a
system call. If a task is CPU bound, it can switch universes when
returning from an interrupt. If a task is sleeping on a to-be-patched
function, the user can send SIGSTOP and SIGCONT to force it to switch.

Since there are two ways the syscall can be restarted on return from
signal handling, it is important to clear the flag before do_signal() is
called. Otherwise we could miss the migration if we used the
SIGSTOP/SIGCONT procedure or a fake signal to migrate tasks blocking the
patching. If we place our hook at the sysc_work label in entry.S, before
TIF_SIGPENDING is evaluated, we kill two birds with one stone: the task
is correctly migrated in all return paths from a syscall.

Signed-off-by: Miroslav Benes <[email protected]>
---
arch/s390/include/asm/thread_info.h | 2 ++
arch/s390/kernel/entry.S | 31 ++++++++++++++++++++++++++++++-
2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/thread_info.h b/arch/s390/include/asm/thread_info.h
index 8642c1dab382..451dd2ddc3be 100644
--- a/arch/s390/include/asm/thread_info.h
+++ b/arch/s390/include/asm/thread_info.h
@@ -75,6 +75,7 @@ void arch_release_task_struct(struct task_struct *tsk);
#define TIF_SIGPENDING 1 /* signal pending */
#define TIF_NEED_RESCHED 2 /* rescheduling necessary */
#define TIF_UPROBE 3 /* breakpointed or single-stepping */
+#define TIF_KLP_NEED_UPDATE 4 /* pending live patching update */

#define TIF_31BIT 16 /* 32bit process */
#define TIF_MEMDIE 17 /* is terminating due to OOM killer */
@@ -93,6 +94,7 @@ void arch_release_task_struct(struct task_struct *tsk);
#define _TIF_SIGPENDING _BITUL(TIF_SIGPENDING)
#define _TIF_NEED_RESCHED _BITUL(TIF_NEED_RESCHED)
#define _TIF_UPROBE _BITUL(TIF_UPROBE)
+#define _TIF_KLP_NEED_UPDATE _BITUL(TIF_KLP_NEED_UPDATE)

#define _TIF_31BIT _BITUL(TIF_31BIT)
#define _TIF_SINGLE_STEP _BITUL(TIF_SINGLE_STEP)
diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index 2d47f9cfcb36..cbc3a3b4aa6e 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -46,7 +46,7 @@ STACK_SIZE = 1 << STACK_SHIFT
STACK_INIT = STACK_SIZE - STACK_FRAME_OVERHEAD - __PT_SIZE

_TIF_WORK = (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_NEED_RESCHED | \
- _TIF_UPROBE)
+ _TIF_UPROBE | _TIF_KLP_NEED_UPDATE )
_TIF_TRACE = (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SECCOMP | \
_TIF_SYSCALL_TRACEPOINT)
_CIF_WORK = (_CIF_MCCK_PENDING | _CIF_ASCE | _CIF_FPU)
@@ -329,6 +329,11 @@ ENTRY(system_call)
#endif
TSTMSK __PT_FLAGS(%r11),_PIF_PER_TRAP
jo .Lsysc_singlestep
+#ifdef CONFIG_LIVEPATCH
+ TSTMSK __TI_flags(%r12),_TIF_KLP_NEED_UPDATE
+ jo .Lsysc_update_klp # handle KLP just before signals and
+ # possible syscall restart
+#endif
TSTMSK __TI_flags(%r12),_TIF_SIGPENDING
jo .Lsysc_sigpending
TSTMSK __TI_flags(%r12),_TIF_NOTIFY_RESUME
@@ -404,6 +409,16 @@ ENTRY(system_call)
#endif

#
+# _TIF_KLP_NEED_UPDATE is set, call klp_update_task_universe
+#
+#ifdef CONFIG_LIVEPATCH
+.Lsysc_update_klp:
+ lg %r2,__TI_task(%r12)
+ larl %r14,.Lsysc_return
+ jg klp_update_task_universe
+#endif
+
+#
# _PIF_PER_TRAP is set, call do_per_trap
#
.Lsysc_singlestep:
@@ -652,6 +667,10 @@ ENTRY(io_int_handler)
jo .Lio_mcck_pending
TSTMSK __TI_flags(%r12),_TIF_NEED_RESCHED
jo .Lio_reschedule
+#ifdef CONFIG_LIVEPATCH
+ TSTMSK __TI_flags(%r12),_TIF_KLP_NEED_UPDATE
+ jo .Lio_update_klp
+#endif
TSTMSK __TI_flags(%r12),_TIF_SIGPENDING
jo .Lio_sigpending
TSTMSK __TI_flags(%r12),_TIF_NOTIFY_RESUME
@@ -698,6 +717,16 @@ ENTRY(io_int_handler)
j .Lio_return

#
+# _TIF_KLP_NEED_UPDATE is set, call klp_update_task_universe
+#
+#ifdef CONFIG_LIVEPATCH
+.Lio_update_klp:
+ lg %r2,__TI_task(%r12)
+ larl %r14,.Lio_return
+ jg klp_update_task_universe
+#endif
+
+#
# _TIF_SIGPENDING or is set, call do_signal
#
.Lio_sigpending:
--
2.8.1

2016-04-18 15:01:18

by Miroslav Benes

[permalink] [raw]
Subject: [RFC PATCH 0/2] s390/klp: s390 support

So this is something we have had in kGraft for a while (though the actual
implementation in s390's entry.S differs).

The first patch is needed because we want our TIF flag to be part of
_TIF_WORK and s390's tm instruction tests only 8 bits.

The second patch adds a call to klp_update_task_universe() to entry.S,
specifically to the syscall and interrupt return paths.

WARNING: It is only compile-tested. It cannot even be linked, because
klp_update_task_universe() is static inline. Josh, you're gonna change
this part anyway to remove TIF_KLP_NEED_UPDATE from arch-independent
code, aren't you?

Comments are obviously welcome. s390 maintainers are not CC'ed yet.

Jiri Slaby (1):
s390: livepatch, reorganize TIF bits

Miroslav Benes (1):
s390/klp: update task universe when exiting kernel

arch/s390/include/asm/thread_info.h | 24 ++++++++++++++++--------
arch/s390/kernel/entry.S | 31 ++++++++++++++++++++++++++++++-
2 files changed, 46 insertions(+), 9 deletions(-)

--
2.8.1

2016-04-18 15:01:17

by Miroslav Benes

[permalink] [raw]
Subject: [RFC PATCH 1/2] s390: livepatch, reorganize TIF bits

From: Jiri Slaby <[email protected]>

Signed-off-by: Jiri Slaby <[email protected]>
---
arch/s390/include/asm/thread_info.h | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/s390/include/asm/thread_info.h b/arch/s390/include/asm/thread_info.h
index 2fffc2c27581..8642c1dab382 100644
--- a/arch/s390/include/asm/thread_info.h
+++ b/arch/s390/include/asm/thread_info.h
@@ -70,14 +70,12 @@ void arch_release_task_struct(struct task_struct *tsk);
/*
* thread information flags bit numbers
*/
+/* _TIF_WORK bits */
#define TIF_NOTIFY_RESUME 0 /* callback before returning to user */
#define TIF_SIGPENDING 1 /* signal pending */
#define TIF_NEED_RESCHED 2 /* rescheduling necessary */
-#define TIF_SYSCALL_TRACE 3 /* syscall trace active */
-#define TIF_SYSCALL_AUDIT 4 /* syscall auditing active */
-#define TIF_SECCOMP 5 /* secure computing */
-#define TIF_SYSCALL_TRACEPOINT 6 /* syscall tracepoint instrumentation */
-#define TIF_UPROBE 7 /* breakpointed or single-stepping */
+#define TIF_UPROBE 3 /* breakpointed or single-stepping */
+
#define TIF_31BIT 16 /* 32bit process */
#define TIF_MEMDIE 17 /* is terminating due to OOM killer */
#define TIF_RESTORE_SIGMASK 18 /* restore signal mask in do_signal() */
@@ -85,15 +83,23 @@ void arch_release_task_struct(struct task_struct *tsk);
#define TIF_BLOCK_STEP 20 /* This task is block stepped */
#define TIF_UPROBE_SINGLESTEP 21 /* This task is uprobe single stepped */

+/* _TIF_TRACE bits */
+#define TIF_SYSCALL_TRACE 24 /* syscall trace active */
+#define TIF_SYSCALL_AUDIT 25 /* syscall auditing active */
+#define TIF_SECCOMP 26 /* secure computing */
+#define TIF_SYSCALL_TRACEPOINT 27 /* syscall tracepoint instrumentation */
+
#define _TIF_NOTIFY_RESUME _BITUL(TIF_NOTIFY_RESUME)
#define _TIF_SIGPENDING _BITUL(TIF_SIGPENDING)
#define _TIF_NEED_RESCHED _BITUL(TIF_NEED_RESCHED)
+#define _TIF_UPROBE _BITUL(TIF_UPROBE)
+
+#define _TIF_31BIT _BITUL(TIF_31BIT)
+#define _TIF_SINGLE_STEP _BITUL(TIF_SINGLE_STEP)
+
#define _TIF_SYSCALL_TRACE _BITUL(TIF_SYSCALL_TRACE)
#define _TIF_SYSCALL_AUDIT _BITUL(TIF_SYSCALL_AUDIT)
#define _TIF_SECCOMP _BITUL(TIF_SECCOMP)
#define _TIF_SYSCALL_TRACEPOINT _BITUL(TIF_SYSCALL_TRACEPOINT)
-#define _TIF_UPROBE _BITUL(TIF_UPROBE)
-#define _TIF_31BIT _BITUL(TIF_31BIT)
-#define _TIF_SINGLE_STEP _BITUL(TIF_SINGLE_STEP)

#endif /* _ASM_THREAD_INFO_H */
--
2.8.1

2016-04-18 15:17:58

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] s390/klp: s390 support

On Mon, Apr 18, 2016 at 05:01:08PM +0200, Miroslav Benes wrote:
> So this is something we have had in kGraft for a while (though the actual
> implementation in s390's entry.S differs).
>
> The first patch is needed because we want our TIF flag to be part of
> _TIF_WORK and s390's tm instruction tests only 8 bits.
>
> The second patch adds a call to klp_update_task_universe() to entry.S,
> specifically to the syscall and interrupt return paths.
>
> WARNING: It is only compile-tested. It cannot even be linked, because
> klp_update_task_universe() is static inline. Josh, you're gonna change
> this part anyway to remove TIF_KLP_NEED_UPDATE from arch-independent
> code, aren't you?
>
> Comments are obviously welcome. s390 maintainers are not CC'ed yet.
>
> Jiri Slaby (1):
> s390: livepatch, reorganize TIF bits
>
> Miroslav Benes (1):
> s390/klp: update task universe when exiting kernel
>
> arch/s390/include/asm/thread_info.h | 24 ++++++++++++++++--------
> arch/s390/kernel/entry.S | 31 ++++++++++++++++++++++++++++++-
> 2 files changed, 46 insertions(+), 9 deletions(-)

Yeah, I think I will need to do something like change
klp_update_task_universe() to be non-inline. Thanks a lot for the code!
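(By non-inline I mean roughly moving the body out of the header, something
like this sketch:)

/* include/linux/livepatch.h */
void klp_update_task_universe(struct task_struct *task);

/* kernel/livepatch/transition.c */
void klp_update_task_universe(struct task_struct *task)
{
	/*
	 * The corresponding write barriers are in klp_init_transition() and
	 * klp_start_transition().
	 */
	smp_rmb();

	task->klp_universe = klp_universe_goal;
}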

--
Josh

2016-04-28 18:53:57

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 00/14] livepatch: hybrid consistency model

On Tue, Apr 05, 2016 at 08:44:30AM -0500, Josh Poimboeuf wrote:
> On Fri, Apr 01, 2016 at 03:39:44PM +0200, Petr Mladek wrote:
> > > There's also a func->immediate flag which allows users to specify that
> > > certain functions in the patch can be applied without per-task
> > > consistency. This might be useful if you want to patch a common
> > > function like schedule(), and the function change doesn't need
> > > consistency but the rest of the patch does.
> >
> > We probably should not set func->transition flag when func->immediate
> > is set or when the related func->object is set. It currently happens
> > only when patch->immediate is set.
>
> Agreed, I'll skip setting func->transition if func->immediate is set.

So I'm getting ready to post v2, and I think I changed my mind on this
one, for a couple of reasons:

1) It's conceptually simpler if func->transition gets set for all
functions, so there are fewer edge cases to consider.

2) For unpatching, if func->transition is set, func->immediate results
in the ftrace handler picking the old function immediately, which is
more expected and in line with the name 'immediate'. If 'transition'
is not set then it doesn't switch to the old function until the
klp_func gets removed from the func stack.
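(To illustrate what I mean, here's roughly how I picture the unpatching
case in the ftrace handler -- a sketch only, the exact checks and names may
still change for v2:)

	if (unlikely(func->transition)) {
		int universe = func->immediate ? klp_universe_goal :
						 current->klp_universe;

		if (universe == KLP_UNIVERSE_OLD) {
			/*
			 * Fall back to the previously patched function on
			 * the func stack, or to the original function if
			 * there is none.
			 */
			func = list_entry_rcu(func->stack_node.next,
					      struct klp_func, stack_node);
			if (&func->stack_node == &ops->func_stack)
				goto unlock;
		}
	}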

> > If we support only one transition at a time, a simple boolean
> > or even bit should be enough. The most descriptive name would
> > be klp_transition_patch_applied but it is quite long.
>
> Yeah, I'll change it to a bool.

I could probably go either way on this one, but I'm leaving it as a
non-bool for now, because I think:

klp_patch_target == KLP_PATCHED
klp_patch_target == KLP_UNPATCHED

reads better and does a better job describing its purpose than:

klp_target_patched
!klp_target_patched

And also, the corresponding task_struct.patch_state variable is now a
tri-state variable, with KLP_UNDEFINED (-1) being the third option to
indicate there's currently no patch operation in progress. And I think
it's easier to understand and implement if we use the same
KLP_PATCHED/UNPATCHED defines for both variables.
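(Concretely, something along these lines -- the exact names, values and
initialization are still subject to change:)

#define KLP_UNDEFINED	-1
#define KLP_UNPATCHED	 0
#define KLP_PATCHED	 1

/* global goal of the current transition */
static int klp_patch_target = KLP_UNPATCHED;

/* in struct task_struct; KLP_UNDEFINED when no patch operation is in progress */
int patch_state;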

--
Josh

2016-12-01 20:28:43

by Jiri Slaby

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 07/14] x86/stacktrace: add function for detecting reliable stack traces

On 06/13/2016, 11:58 PM, Josh Poimboeuf wrote:
> On Thu, Jun 09, 2016 at 10:31:01AM +0200, Jiri Slaby wrote:
>> On 04/12/2016, 05:56 PM, Josh Poimboeuf wrote:
>>>> We had been using unwinder for over a decade in SUSE but it stopped
>>>> working for assembly recently (for obvious reasons). So having a working
>>>> and reliable unwinder again is one of the top priorities for us.
>>>
>>> Since you already have an unwinder in SUSE, it sounds like what you need
>>> most urgently is DWARF generation support in objtool so that your
>>> unwinder will work again for asm code, right? The kernel already emits
>>> DWARF today (for C code), so an upstream unwinder isn't needed first.
>>
>> Right, that's exact.
>>
>>> That's high on my TODO list, and the coding is probably 70-80% done. I
>>> don't want to hold up progress... But I don't really feel comfortable
>>> passing the code off because it's quite a big feature with some major
>>> changes to objtool. Being the objtool author, I want to make sure to
>>> give it my full attention.
>>
>> Sure. There is going to be no pressure to merge the v1 of the changes.
>> But it would be nice to see what you have now so that we can start
>> reviewing and proposing patches on the top of what you have :).
>
> The code is still quite a mess and doesn't even work yet. I think it
> would be counterproductive for everybody if I were to share it at this
> stage.
>
>>> I'm spread quite thin at the moment but I do hope to have v1 of those
>>> patches soon, hopefully May-ish.
>>
>> So I hope we will see something we can start working on now. I can
>> really dedicate one-man hours to that work.
>
> I do appreciate your offer to help. Hopefully soon I'll get it to at
> least a decently working and halfway readable state, and then I'll post
> it. Then any help will be much appreciated.

This has been beating us for way too long, so we are at the point where
we want to have DWARF generation for asm in the very near future. So may
I ask you again to send me what you have now? If not, I will start
working on it from scratch to have something rather soon :).

thanks,
--
js
suse labs

2016-12-01 20:59:27

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v1.9 07/14] x86/stacktrace: add function for detecting reliable stack traces

On Thu, Dec 01, 2016 at 09:28:37PM +0100, Jiri Slaby wrote:
> On 06/13/2016, 11:58 PM, Josh Poimboeuf wrote:
> > On Thu, Jun 09, 2016 at 10:31:01AM +0200, Jiri Slaby wrote:
> >> On 04/12/2016, 05:56 PM, Josh Poimboeuf wrote:
> >>>> We had been using unwinder for over a decade in SUSE but it stopped
> >>>> working for assembly recently (for obvious reasons). So having a working
> >>>> and reliable unwinder again is one of the top priorities for us.
> >>>
> >>> Since you already have an unwinder in SUSE, it sounds like what you need
> >>> most urgently is DWARF generation support in objtool so that your
> >>> unwinder will work again for asm code, right? The kernel already emits
> >>> DWARF today (for C code), so an upstream unwinder isn't needed first.
> >>
> >> Right, that's exact.
> >>
> >>> That's high on my TODO list, and the coding is probably 70-80% done. I
> >>> don't want to hold up progress... But I don't really feel comfortable
> >>> passing the code off because it's quite a big feature with some major
> >>> changes to objtool. Being the objtool author, I want to make sure to
> >>> give it my full attention.
> >>
> >> Sure. There is going to be no pressure to merge the v1 of the changes.
> >> But it would be nice to see what you have now so that we can start
> >> reviewing and proposing patches on the top of what you have :).
> >
> > The code is still quite a mess and doesn't even work yet. I think it
> > would be counterproductive for everybody if I were to share it at this
> > stage.
> >
> >>> I'm spread quite thin at the moment but I do hope to have v1 of those
> >>> patches soon, hopefully May-ish.
> >>
> >> So I hope we will see something we can start working on now. I can
> >> really dedicate one-man hours to that work.
> >
> > I do appreciate your offer to help. Hopefully soon I'll get it to at
> > least a decently working and halfway readable state, and then I'll post
> > it. Then any help will be much appreciated.
>
> This has been beating us for way too long, so we are at the point where
> we want to have DWARF generation for asm in the very near future. So may
> I ask you again to send me what you have now? If not, I will start
> working on it from scratch to have something rather soon :).

I know I've been saying this for months now, but I really do plan to
have it ready "real soon now". In fact I made a ton of progress on it
in October (had to rewrite the main objtool algorithm, basically) but
then I had to shelve it again to work on other things.

I honestly am planning on picking it back up as soon as I post the
consistency model v3, probably next week.

Here's its current state, though it ain't pretty:

https://github.com/jpoimboe/linux/tree/objtool-dwarf

It only has DWARF analysis, not generation. Generation will be a
logical extension of analysis: it already has an internal representation
of the DWARF state for each instruction; it just needs to convert it to
CFI and write it out.

Ignore all the crypto changes; those are frame pointer fixes discovered
by the new, more stringent frame pointer checks, since I completely
rewrote the frame pointer checking as a side effect of adding DWARF
analysis.

--
Josh