LinuxLists.cc - [PATCH v2] livepatch/rcu: Fix stacking of patches when RCU infrastructure is patched

2017-06-14 08:55:14

Subject: [PATCH v2] livepatch/rcu: Fix stacking of patches when RCU infrastructure is patched

rcu_read_(un)lock(), list_*_rcu(), and synchronize_rcu() are used to
safely access and manipulate the list of patches that modify the same
function. In particular, they protect the variable func_stack that is
accessible from the ftrace handler via struct ftrace_ops and klp_ops.

Of course, they also synchronize some state of the patch on top of
the stack, e.g. func->transition in klp_ftrace_handler().

At the same time, this mechanism also guards the manipulation of
task->patch_state, which is modified according to the state of the
transition and the state of the process.
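To make the protected state concrete, here is a simplified,
single-threaded C model of the choice klp_ftrace_handler() makes from
func_stack, func->transition, and the task's patch_state. It is only a
sketch under stated assumptions: struct model_func and select_impl()
are hypothetical names, the KLP_* values mirror the kernel's, and the
smp_rmb() barriers and the real list_head layout are left out. A torn
view of these three pieces of state is what would lead to the wrong
decision described further down.

```c
#include <assert.h>
#include <stddef.h>

/* Values mirror the kernel's KLP_* patch states. */
#define KLP_UNDEFINED	-1
#define KLP_UNPATCHED	 0
#define KLP_PATCHED	 1

/* Hypothetical stand-in for struct klp_func on ops->func_stack. */
struct model_func {
	const char *new_func;          /* replacement implementation */
	int transition;                /* models func->transition */
	const struct model_func *next; /* next (older) entry on the stack */
};

/*
 * Returns the implementation a task would run: the top of the stack,
 * the previously patched version, or the original function.
 * Single-threaded model: the kernel additionally needs smp_rmb().
 */
static const char *select_impl(const struct model_func *func_stack,
			       int task_patch_state, const char *original)
{
	const struct model_func *func = func_stack; /* list_first_or_null_rcu() */

	if (!func)
		return original;  /* kernel: WARN_ON_ONCE(), original runs */

	if (func->transition) {
		/* The kernel warns here: the task must have a defined state. */
		assert(task_patch_state != KLP_UNDEFINED);
		if (task_patch_state == KLP_UNPATCHED) {
			func = func->next;        /* previously patched version */
			if (!func)
				return original;  /* no older patch: original code */
		}
	}
	return func->new_func;
}
```

For example, while foo_v2 is being applied on top of foo_v1, a task
that is still KLP_UNPATCHED is sent to foo_v1, and a task that is
already KLP_PATCHED is sent to foo_v2.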

Now, all this works well as long as RCU works well. Sadly, livepatching
might get into some corner cases where this is not true. For example,
RCU is not watching when rcu_read_lock() is taken in idle threads.
This is because idle threads might sleep and would otherwise prevent
the grace period from completing for too long.

There are ways to make RCU watch even in idle threads; see
rcu_irq_enter(). But there is a small location inside the RCU
infrastructure where even this does not work.

This small problematic location can be detected either before
calling rcu_irq_enter() by rcu_irq_enter_disabled() or later by
rcu_is_watching(). Sadly, there is no safe way to handle it.
Once we detect that RCU was not watching, we might see an inconsistent
state of the function stack and the related variables in
klp_ftrace_handler(). Then we could make a wrong decision,
use an incompatible implementation of the function, and
break the consistency of the system. We could warn, but
we could not avoid the damage.

Fortunately, ftrace has similar problems and they seem to be
solved well there. It uses a heavyweight implementation of some
RCU operations. In particular, it replaces:

+ rcu_read_lock() with preempt_disable_notrace()
+ rcu_read_unlock() with preempt_enable_notrace()
+ synchronize_rcu() with schedule_on_each_cpu(sync_work)

My understanding is that this is an RCU implementation from the
stone age. It meets the core RCU requirements, but it is rather
inefficient. In particular, it does not allow batching or speeding
up the synchronize calls.
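This scheme is essentially quiescent-state-based reclamation: a grace
period ends once every CPU has passed a point outside any read-side
section, which is what running an empty work item on each CPU with
schedule_on_each_cpu() proves. The following user-space sketch
illustrates the idea under loudly stated assumptions: POSIX threads
stand in for CPUs, and a store to a per-thread generation counter
stands in for the context switch that lets klp_sync() run. All names
(model_cpu(), model_synchronize(), run_model()) are hypothetical; this
is not how the kernel implements it.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model, not kernel code: threads play CPUs, the read-side
 * section plays preempt_disable_notrace()..preempt_enable_notrace(), and
 * the store to seen_gen[] plays the context switch that lets the empty
 * klp_sync() work item run on that CPU. */
#define NR_MODEL_CPUS 4

struct node {
	atomic_bool reclaimed;      /* set only after a grace period */
	struct node *next_garbage;  /* retired nodes, freed at the end */
};

static _Atomic(struct node *) shared;        /* RCU-protected pointer */
static atomic_long sync_gen;                 /* grace-period counter */
static atomic_long seen_gen[NR_MODEL_CPUS];  /* last counter each CPU saw */
static atomic_bool stop;
static atomic_long violations;               /* reads of reclaimed nodes */

static void *model_cpu(void *arg)
{
	long cpu = (long)arg;

	while (!atomic_load(&stop)) {
		/* Read-side section: no quiescent point inside. */
		struct node *p = atomic_load(&shared);
		if (atomic_load(&p->reclaimed))
			atomic_fetch_add(&violations, 1);

		/* Quiescent point, outside any read-side section. */
		atomic_store(&seen_gen[cpu], atomic_load(&sync_gen));
		sched_yield();
	}
	return NULL;
}

/* Plays schedule_on_each_cpu(klp_sync): wait until every CPU has passed
 * a quiescent point after this grace period started. */
static void model_synchronize(void)
{
	long gen = atomic_fetch_add(&sync_gen, 1) + 1;

	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		while (atomic_load(&seen_gen[cpu]) < gen)
			sched_yield();
}

static struct node *new_node(void)
{
	struct node *n = malloc(sizeof(*n));

	assert(n != NULL);
	atomic_init(&n->reclaimed, false);
	n->next_garbage = NULL;
	return n;
}

/* Replaces the shared node `updates` times; returns how many reads hit
 * an already reclaimed node. */
static long run_model(int updates)
{
	pthread_t tid[NR_MODEL_CPUS];
	struct node *garbage = NULL;

	atomic_store(&shared, new_node());
	atomic_store(&stop, false);
	atomic_store(&violations, 0);
	atomic_store(&sync_gen, 0);
	for (long cpu = 0; cpu < NR_MODEL_CPUS; cpu++) {
		atomic_store(&seen_gen[cpu], 0);
		if (pthread_create(&tid[cpu], NULL, model_cpu, (void *)cpu))
			abort();
	}

	for (int i = 0; i < updates; i++) {
		struct node *old = atomic_exchange(&shared, new_node());

		model_synchronize();                  /* grace period */
		atomic_store(&old->reclaimed, true);  /* no reader holds it now */
		old->next_garbage = garbage;
		garbage = old;
	}

	atomic_store(&stop, true);
	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		pthread_join(tid[cpu], NULL);

	while (garbage) {
		struct node *next = garbage->next_garbage;

		free(garbage);
		garbage = next;
	}
	free(atomic_load(&shared));
	return atomic_load(&violations);
}
```

A call such as run_model(200) should report zero reads of reclaimed
nodes. Dropping the model_synchronize() call would make such reads
possible, although whether a particular run hits one depends on timing.
The cost is visible too: every update waits for all model CPUs, with no
batching, which matches the inefficiency noted above.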

On the other hand, it is very simple. It allows us to safely trace
and/or livepatch even the RCU core infrastructure. The inefficiency
is not a big issue because using ftrace or livepatches on production
systems is a rare operation. Safety is much more important than a
negligible extra load.

Note that the alternative implementation follows the RCU principles.
Therefore, we could, and actually must, use the list_*_rcu() variants
when manipulating func_stack. These functions access the pointers in
the right order and with the right barriers. But they do not use any
other information that would be set only by rcu_read_lock().
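A minimal sketch of this point: what the readers need from the
list_*_rcu() helpers is publication ordering, not any state set by
rcu_read_lock(). Here C11 release/acquire atomics stand in for the
ordering that list_add_rcu() and list_first_or_null_rcu() provide (the
kernel uses its own memory model, not C11). Writers are assumed to be
serialized, as they are by klp_mutex, and struct stack_entry and the
stack_*() helpers are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct klp_func on func_stack. */
struct stack_entry {
	const char *new_func;        /* replacement implementation */
	struct stack_entry *next;    /* older entry; fixed before publication */
};

struct model_stack {
	_Atomic(struct stack_entry *) head;   /* models ops->func_stack */
};

/* Like list_add_rcu() at the head: fill in the entry first, then
 * publish it with a release store. Writers are assumed serialized. */
static void stack_push(struct model_stack *s, struct stack_entry *e)
{
	assert(e != NULL);
	e->next = atomic_load_explicit(&s->head, memory_order_relaxed);
	atomic_store_explicit(&s->head, e, memory_order_release);
}

/* Like list_del_rcu() of the top entry: the entry itself is left
 * intact so that a concurrent reader can still follow e->next. */
static void stack_pop(struct model_stack *s)
{
	struct stack_entry *e =
		atomic_load_explicit(&s->head, memory_order_relaxed);

	if (e)
		atomic_store_explicit(&s->head, e->next, memory_order_release);
}

/* Like list_first_or_null_rcu(): the acquire load pairs with the
 * release stores above, so a reader that sees an entry also sees the
 * fields written before it was published. */
static struct stack_entry *stack_first_or_null(struct model_stack *s)
{
	return atomic_load_explicit(&s->head, memory_order_acquire);
}
```

Nothing here depends on how the read-side section is delimited, which
is why the same helpers keep working when rcu_read_lock() is replaced
by preempt_disable_notrace(). A popped entry still has to wait for a
grace period before it can be reused or freed.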

Also note that there are actually two problems solved in ftrace:

First, it cares about the consistency of RCU read sections.
This is solved the way described above and used in this patch.

Second, ftrace needs to make sure that nobody is inside the dynamic
trampoline when it is being freed. For this, it also calls
synchronize_rcu_tasks() in ftrace_shutdown() on preemptible kernels.

Livepatch has a similar problem, but it is solved by ftrace for free.
klp_ftrace_handler() is a good guy and never sleeps. In addition,
it is registered with FTRACE_OPS_FL_DYNAMIC. As a result,
unregister_ftrace_function() calls:

* schedule_on_each_cpu(ftrace_sync) - always
* synchronize_rcu_tasks() - on preemptible kernels

The effect is that nobody is inside either the dynamic trampoline
or the ftrace handler after unregister_ftrace_function() returns.

Signed-off-by: Petr Mladek <[email protected]>
---
Changes against v1:

+ fixed typo in comments
+ moved additional notes about ftrace into the commit message

There are no changes in the code.

The discussion about the previous version starts at
https://lkml.kernel.org/r/[email protected]

kernel/livepatch/patch.c | 8 ++++++--
kernel/livepatch/transition.c | 36 +++++++++++++++++++++++++++++++-----
2 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
index f8269036bf0b..52c4e907c14b 100644
--- a/kernel/livepatch/patch.c
+++ b/kernel/livepatch/patch.c
@@ -59,7 +59,11 @@ static void notrace klp_ftrace_handler(unsigned long ip,

ops = container_of(fops, struct klp_ops, fops);

- rcu_read_lock();
+ /*
+ * A variant of synchronize_sched() is used to allow patching functions
+ * where RCU is not watching, see klp_synchronize_transition().
+ */
+ preempt_disable_notrace();

func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
stack_node);
@@ -115,7 +119,7 @@ static void notrace klp_ftrace_handler(unsigned long ip,

klp_arch_set_pc(regs, (unsigned long)func->new_func);
unlock:
- rcu_read_unlock();
+ preempt_enable_notrace();
}

/*
diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index adc0cc64aa4b..181a06e92786 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -49,6 +49,28 @@ static void klp_transition_work_fn(struct work_struct *work)
static DECLARE_DELAYED_WORK(klp_transition_work, klp_transition_work_fn);

/*
+ * This function is just a stub to implement a hard force
+ * of synchronize_sched(). This requires synchronizing
+ * tasks even in userspace and idle.
+ */
+static void klp_sync(struct work_struct *work)
+{
+}
+
+/*
+ * We also allow patching functions where RCU is not watching,
+ * e.g. before user_exit(). We cannot rely on the RCU infrastructure
+ * to do the synchronization. Instead, hard force the sched synchronization.
+ *
+ * This approach allows using RCU functions for manipulating func_stack
+ * safely.
+ */
+static void klp_synchronize_transition(void)
+{
+ schedule_on_each_cpu(klp_sync);
+}
+
+/*
* The transition to the target patch state is complete. Clean up the data
* structures.
*/
@@ -73,7 +95,7 @@ static void klp_complete_transition(void)
* func->transition gets cleared, the handler may choose a
* removed function.
*/
- synchronize_rcu();
+ klp_synchronize_transition();
}

if (klp_transition_patch->immediate)
@@ -92,7 +114,7 @@ static void klp_complete_transition(void)

/* Prevent klp_ftrace_handler() from seeing KLP_UNDEFINED state */
if (klp_target_state == KLP_PATCHED)
- synchronize_rcu();
+ klp_synchronize_transition();

read_lock(&tasklist_lock);
for_each_process_thread(g, task) {
@@ -136,7 +158,11 @@ void klp_cancel_transition(void)
*/
void klp_update_patch_state(struct task_struct *task)
{
- rcu_read_lock();
+ /*
+ * A variant of synchronize_sched() is used to allow patching functions
+ * where RCU is not watching, see klp_synchronize_transition().
+ */
+ preempt_disable_notrace();

/*
* This test_and_clear_tsk_thread_flag() call also serves as a read
@@ -153,7 +179,7 @@ void klp_update_patch_state(struct task_struct *task)
if (test_and_clear_tsk_thread_flag(task, TIF_PATCH_PENDING))
task->patch_state = READ_ONCE(klp_target_state);

- rcu_read_unlock();
+ preempt_enable_notrace();
}

/*
@@ -539,7 +565,7 @@ void klp_reverse_transition(void)
clear_tsk_thread_flag(idle_task(cpu), TIF_PATCH_PENDING);

/* Let any remaining calls to klp_update_patch_state() complete */
- synchronize_rcu();
+ klp_synchronize_transition();

klp_start_transition();
}
--
1.8.5.6

2017-06-14 17:05:22

by Josh Poimboeuf


Subject: Re: [PATCH v2] livepatch/rcu: Fix stacking of patches when RCU infrastructure is patched

On Wed, Jun 14, 2017 at 10:54:52AM +0200, Petr Mladek wrote:
> [...]
> Signed-off-by: Petr Mladek <[email protected]>

Acked-by: Josh Poimboeuf <[email protected]>

--
Josh

2017-06-15 08:57:55

by Miroslav Benes


Subject: Re: [PATCH v2] livepatch/rcu: Fix stacking of patches when RCU infrastructure is patched

On Wed, 14 Jun 2017, Petr Mladek wrote:

> [...]
>
> Livepatch has similar problem but it is solved by ftrace for free.
> klp_ftrace_handler() is a good guy and newer sleeps. In addition,

s/newer/never/

> [...]
> Signed-off-by: Petr Mladek <[email protected]>

Acked-by: Miroslav Benes <[email protected]>

> +/*
> + * We allow to patch also functions where RCU is not watching,
> + * e.g. before user_exit(). We can not rely on the RCU infrastructure
> + * to do the synchronization. Instead hard force the sched synchronization.
> + *
> + * This approach allows to use RCU functions for manipulating func_stack
> + * a safe way .

s/a safe way /safely/.

Miroslav

2017-06-20 08:50:43

by Jiri Kosina


Subject: Re: [PATCH v2] livepatch/rcu: Fix stacking of patches when RCU infrastructure is patched

I've slightly adjusted the changelog and the comment (as noted by
Miroslav) and queued the patch for 4.12.

Thanks,

--
Jiri Kosina
SUSE Labs