Date: Tue, 10 Feb 2015 16:59:17 +0100 (CET)
From: Miroslav Benes <mbenes@suse.cz>
To: Josh Poimboeuf <jpoimboe@redhat.com>
cc: Seth Jennings <sjenning@redhat.com>, Jiri Kosina <jkosina@suse.cz>,
        Vojtech Pavlik <vojtech@suse.cz>,
        Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
        live-patching@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 6/9] livepatch: create per-task consistency model
In-Reply-To: <2c3d1e685dae5cccc2dfdb1b24c241b2f1c89348.1423499826.git.jpoimboe@redhat.com>
Message-ID: <alpine.LNX.2.00.1502101627350.18649@pobox.suse.cz>
References: <cover.1423499826.git.jpoimboe@redhat.com> <2c3d1e685dae5cccc2dfdb1b24c241b2f1c89348.1423499826.git.jpoimboe@redhat.com>
User-Agent: Alpine 2.00 (LNX 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5776
Lines: 162


On Mon, 9 Feb 2015, Josh Poimboeuf wrote:

> Add a basic per-task consistency model.  This is the foundation which
> will eventually enable us to patch those ~10% of security patches which
> change function prototypes and/or data semantics.
> 
> When a patch is enabled, livepatch enters into a transition state where
> tasks are converging from the old universe to the new universe.  If a
> given task isn't using any of the patched functions, it's switched to
> the new universe.  Once all the tasks have been converged to the new
> universe, patching is complete.
> 
> The same sequence occurs when a patch is disabled, except the tasks
> converge from the new universe to the old universe.
> 
> The /sys/kernel/livepatch/<patch>/transition file shows whether a patch
> is in transition.  Only a single patch (the topmost patch on the stack)
> can be in transition at a given time.  A patch can remain in the
> transition state indefinitely, if any of the tasks are stuck in the
> previous universe.
> 
> A transition can be reversed and effectively canceled by writing the
> opposite value to the /sys/kernel/livepatch/<patch>/enabled file while
> the transition is in progress.  Then all the tasks will attempt to
> converge back to the original universe.

Hi Josh,

first, thanks a lot for great work. I'm starting to go through it and it's 
gonna take me some time to do and send a complete review. Anyway, I 
suspect there is a possible race in the code. I'm still not sure though. 
See below...

[...]

> @@ -38,14 +39,34 @@ static void notrace klp_ftrace_handler(unsigned long ip,
>  	ops = container_of(fops, struct klp_ops, fops);
>  
>  	rcu_read_lock();
> +
>  	func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
>  				      stack_node);
> -	rcu_read_unlock();
>  
>  	if (WARN_ON_ONCE(!func))
> -		return;
> +		goto unlock;
> +
> +	if (unlikely(func->transition)) {
> +		/* corresponding smp_wmb() is in klp_init_transition() */
> +		smp_rmb();
> +
> +		if (current->klp_universe == KLP_UNIVERSE_OLD) {
> +			/*
> +			 * Use the previously patched version of the function.
> +			 * If no previous patches exist, use the original
> +			 * function.
> +			 */
> +			func = list_entry_rcu(func->stack_node.next,
> +					      struct klp_func, stack_node);
> +
> +			if (&func->stack_node == &ops->func_stack)
> +				goto unlock;
> +		}
> +	}
>  
>  	klp_arch_set_pc(regs, (unsigned long)func->new_func);
> +unlock:
> +	rcu_read_unlock();
>  }

The problem is that there is no guarantee that ftrace handler is called in 
an atomic context. Hence it could be preempted (if CONFIG_PREEMPT is y) 
and it could be preempted anywhere before rcu_read_lock (which disables 
preemption for CONFIG_PREEMPT). Ftrace often uses ftrace_ops_list_func as 
a callback which calls the handlers with preemption disabled. But not 
always. For dynamic trampolines it should call the handlers directly and 
preemption is not disabled.

So...

> +/*
> + * Try to transition all tasks to the universe goal.  If any tasks are still
> + * stuck in the original universe, schedule a retry.
> + */
> +void klp_try_complete_transition(void)
> +{
> +	unsigned int cpu;
> +	struct task_struct *g, *t;
> +	bool complete = true;
> +
> +	/* try to transition all normal tasks */
> +	read_lock(&tasklist_lock);
> +	for_each_process_thread(g, t)
> +		if (!klp_transition_task(t))
> +			complete = false;
> +	read_unlock(&tasklist_lock);
> +
> +	/* try to transition the idle "swapper" tasks */
> +	get_online_cpus();
> +	for_each_online_cpu(cpu)
> +		if (!klp_transition_task(idle_task(cpu)))
> +			complete = false;
> +	put_online_cpus();
> +
> +	/* if not complete, try again later */
> +	if (!complete) {
> +		schedule_delayed_work(&klp_transition_work,
> +				      round_jiffies_relative(HZ));
> +		return;
> +	}
> +
> +	/* success! unpatch obsolete functions and do some cleanup */
> +
> +	if (klp_universe_goal == KLP_UNIVERSE_OLD) {
> +		klp_unpatch_objects(klp_transition_patch);
> +
> +		/* prevent ftrace handler from reading old func->transition */
> +		synchronize_rcu();
> +	}
> +
> +	pr_notice("'%s': %s complete\n", klp_transition_patch->mod->name,
> +		  klp_universe_goal == KLP_UNIVERSE_NEW ? "patching" :
> +							  "unpatching");
> +
> +	klp_complete_transition();
> +}

...synchronize_rcu() could be insufficient. There still can be some  
process in our ftrace handler after the call.

Consider the following scenario:

When synchronize_rcu is called some process could have been preempted on 
some other cpu somewhere at the start of the ftrace handler before  
rcu_read_lock. synchronize_rcu waits for the grace period to pass, but that 
does not mean anything for our process in the handler, because it is not 
in rcu critical section. There is no guarantee that after synchronize_rcu 
the process would be away from the handler. 

"Meanwhile" klp_try_complete_transition continues and calls 
klp_complete_transition. This clears func->transition flags. Now the 
process in the handler could be scheduled again. It reads the wrong value 
of func->transition and redirection to the wrong function is done.

What do you think? I hope I made myself clear.

There is the similar problem for dynamic trampolines in ftrace. You cannot 
remove them unless there is no process in the handler. I think rcu-tasks 
were merged a while ago for this purpose. However ftrace does not use them 
yet and I don't know if we could exploit them to solve this issue. I need 
to think more about it.

Anyway thanks a lot!

Miroslav
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/