Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752182AbbBJP7W (ORCPT ); Tue, 10 Feb 2015 10:59:22 -0500 Received: from cantor2.suse.de ([195.135.220.15]:32776 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752149AbbBJP7U (ORCPT ); Tue, 10 Feb 2015 10:59:20 -0500 Date: Tue, 10 Feb 2015 16:59:17 +0100 (CET) From: Miroslav Benes To: Josh Poimboeuf cc: Seth Jennings , Jiri Kosina , Vojtech Pavlik , Masami Hiramatsu , live-patching@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH 6/9] livepatch: create per-task consistency model In-Reply-To: <2c3d1e685dae5cccc2dfdb1b24c241b2f1c89348.1423499826.git.jpoimboe@redhat.com> Message-ID: References: <2c3d1e685dae5cccc2dfdb1b24c241b2f1c89348.1423499826.git.jpoimboe@redhat.com> User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5776 Lines: 162 On Mon, 9 Feb 2015, Josh Poimboeuf wrote: > Add a basic per-task consistency model. This is the foundation which > will eventually enable us to patch those ~10% of security patches which > change function prototypes and/or data semantics. > > When a patch is enabled, livepatch enters into a transition state where > tasks are converging from the old universe to the new universe. If a > given task isn't using any of the patched functions, it's switched to > the new universe. Once all the tasks have been converged to the new > universe, patching is complete. > > The same sequence occurs when a patch is disabled, except the tasks > converge from the new universe to the old universe. > > The /sys/kernel/livepatch//transition file shows whether a patch > is in transition. Only a single patch (the topmost patch on the stack) > can be in transition at a given time. A patch can remain in the > transition state indefinitely, if any of the tasks are stuck in the > previous universe. > > A transition can be reversed and effectively canceled by writing the > opposite value to the /sys/kernel/livepatch//enabled file while > the transition is in progress. Then all the tasks will attempt to > converge back to the original universe. Hi Josh, first, thanks a lot for great work. I'm starting to go through it and it's gonna take me some time to do and send a complete review. Anyway, I suspect there is a possible race in the code. I'm still not sure though. See below... [...] > @@ -38,14 +39,34 @@ static void notrace klp_ftrace_handler(unsigned long ip, > ops = container_of(fops, struct klp_ops, fops); > > rcu_read_lock(); > + > func = list_first_or_null_rcu(&ops->func_stack, struct klp_func, > stack_node); > - rcu_read_unlock(); > > if (WARN_ON_ONCE(!func)) > - return; > + goto unlock; > + > + if (unlikely(func->transition)) { > + /* corresponding smp_wmb() is in klp_init_transition() */ > + smp_rmb(); > + > + if (current->klp_universe == KLP_UNIVERSE_OLD) { > + /* > + * Use the previously patched version of the function. > + * If no previous patches exist, use the original > + * function. > + */ > + func = list_entry_rcu(func->stack_node.next, > + struct klp_func, stack_node); > + > + if (&func->stack_node == &ops->func_stack) > + goto unlock; > + } > + } > > klp_arch_set_pc(regs, (unsigned long)func->new_func); > +unlock: > + rcu_read_unlock(); > } The problem is that there is no guarantee that ftrace handler is called in an atomic context. Hence it could be preempted (if CONFIG_PREEMPT is y) and it could be preempted anywhere before rcu_read_lock (which disables preemption for CONFIG_PREEMPT). Ftrace often uses ftrace_ops_list_func as a callback which calls the handlers with preemption disabled. But not always. For dynamic trampolines it should call the handlers directly and preemption is not disabled. So... > +/* > + * Try to transition all tasks to the universe goal. If any tasks are still > + * stuck in the original universe, schedule a retry. > + */ > +void klp_try_complete_transition(void) > +{ > + unsigned int cpu; > + struct task_struct *g, *t; > + bool complete = true; > + > + /* try to transition all normal tasks */ > + read_lock(&tasklist_lock); > + for_each_process_thread(g, t) > + if (!klp_transition_task(t)) > + complete = false; > + read_unlock(&tasklist_lock); > + > + /* try to transition the idle "swapper" tasks */ > + get_online_cpus(); > + for_each_online_cpu(cpu) > + if (!klp_transition_task(idle_task(cpu))) > + complete = false; > + put_online_cpus(); > + > + /* if not complete, try again later */ > + if (!complete) { > + schedule_delayed_work(&klp_transition_work, > + round_jiffies_relative(HZ)); > + return; > + } > + > + /* success! unpatch obsolete functions and do some cleanup */ > + > + if (klp_universe_goal == KLP_UNIVERSE_OLD) { > + klp_unpatch_objects(klp_transition_patch); > + > + /* prevent ftrace handler from reading old func->transition */ > + synchronize_rcu(); > + } > + > + pr_notice("'%s': %s complete\n", klp_transition_patch->mod->name, > + klp_universe_goal == KLP_UNIVERSE_NEW ? "patching" : > + "unpatching"); > + > + klp_complete_transition(); > +} ...synchronize_rcu() could be insufficient. There still can be some process in our ftrace handler after the call. Consider the following scenario: When synchronize_rcu is called some process could have been preempted on some other cpu somewhere at the start of the ftrace handler before rcu_read_lock. synchronize_rcu waits for the grace period to pass, but that does not mean anything for our process in the handler, because it is not in rcu critical section. There is no guarantee that after synchronize_rcu the process would be away from the handler. "Meanwhile" klp_try_complete_transition continues and calls klp_complete_transition. This clears func->transition flags. Now the process in the handler could be scheduled again. It reads the wrong value of func->transition and redirection to the wrong function is done. What do you think? I hope I made myself clear. There is the similar problem for dynamic trampolines in ftrace. You cannot remove them unless there is no process in the handler. I think rcu-tasks were merged a while ago for this purpose. However ftrace does not use them yet and I don't know if we could exploit them to solve this issue. I need to think more about it. Anyway thanks a lot! Miroslav -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/