Date: Thu, 31 Mar 2016 14:54:26 +0200 (CEST)
From: Miroslav Benes <mbenes@suse.cz>
To: Josh Poimboeuf <jpoimboe@redhat.com>
cc: Jiri Kosina <jikos@kernel.org>, Jessica Yu <jeyu@redhat.com>,
        linux-kernel@vger.kernel.org, live-patching@vger.kernel.org,
        Vojtech Pavlik <vojtech@suse.com>
Subject: Re: [RFC PATCH v1.9 00/14] livepatch: hybrid consistency model
In-Reply-To: <cover.1458933243.git.jpoimboe@redhat.com>
Message-ID: <alpine.LNX.2.00.1603311434430.23131@pobox.suse.cz>
References: <cover.1458933243.git.jpoimboe@redhat.com>
User-Agent: Alpine 2.00 (LNX 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 4936
Lines: 108


Hi,

this is a great work. I'll have to review it properly (especially 13/14, 
probably several times as it is a heavy stuff), but I've gathered some 
notes so there they are.

On Fri, 25 Mar 2016, Josh Poimboeuf wrote:

> These patches are still a work in progress, but Jiri asked that I share
> them before I go on vacation next week.  Based on origin/master because
> it has CONFIG_STACK_VALIDATION.
> 
> This has two consistency models: the immediate model (like in today's
> code) and the new kpatch/kgraft hybrid model.
> 
> The default is the hybrid model: kgraft's per-task consistency and
> syscall barrier switching combined with kpatch's stack trace switching.
> There are also a number of fallback options which make it pretty
> flexible, yet the code is still quite simple.
> 
> Patches are applied on a per-task basis, when the task is deemed safe to
> switch over.  It uses a tiered approach to determine when a task is safe
> and can be switched.
> 
> The first wave of attack is stack checking of sleeping tasks.  If no
> affected functions are on the stack of a given task, the task is
> switched.  In most cases this will patch most or all of the tasks on the
> first try.  Otherwise it'll keep trying periodically.  This option is
> only available if the architecture has reliable stacks
> (CONFIG_RELIABLE_STACKTRACE and CONFIG_STACK_VALIDATION).

I think we could allow periodic checking even for 
!CONFIG_RELIABLE_STACKTRACE situations. The problematic task could migrate 
by itself after some time (meaning it woke up meanwhile and sleeps 
somewhere else now, or it went through a syscall boundary). So we can 
gain something, especially when combined with a fake signal approach. More 
on that below and in my 13/14 mail.

> The next line of attack, if needed, is syscall/IRQ switching.  A task is
> switched when it returns from a system call or a user space IRQ.  This
> approach is less than ideal because it usually requires signaling tasks
> to get them to switch.  It also doesn't work for kthreads.  But it's
> still useful in some cases when user tasks get stuck sleeping on an
> affected function.
> 
> For architectures which don't have reliable stacks, users have two
> options:
> 
> a) use the hybrid fallback option of using only the syscall/IRQ
>    switching (which is the default);
> 
> b) or they can use the immediate model (which is the model we already
>    have today) by setting the patch->immediate flag.
> 
> There's also a func->immediate flag which allows users to specify that
> certain functions in the patch can be applied without per-task
> consistency.  This might be useful if you want to patch a common
> function like schedule(), and the function change doesn't need
> consistency but the rest of the patch does.
> 
> Still have a lot of TODOs, some of them are listed here.  If you see
> something objectionable, it might be a good idea to make sure it's not
> already on the TODO list :-)
> 
> TODO:
> - actually test it
> - don't use TIP_KLP_NEED_UPDATE in arch-independent code
> - figure out new API to keep the use of task_rq_lock() in the sched code

Hm, no idea how to do it so that everyone is satisfied. I still remember 
Peter's protests.

> - cleaner way to detect preemption on the stack
> - allow patch modules to be removed.  still needs more discussion and
>   thought.  maybe something like Miroslav's patch would be good:
>   https://lkml.kernel.org/r/alpine.LNX.2.00.1512150857510.24899@pobox.suse.cz

Yup, that could be part of the patch set. The second option (to rework 
klp_unregister_patch and move kobject_put out of mutex protected parts) is 
maybe a way to go. The mutex_trylock approach works as well, but it is not 
clean and nice enough I guess. However the patch is there :).

Anyway the module removal should be prohibited when one uses immmediate 
flag set to true. Without consistency model we cannot say if it is safe to 
remove the module. Some process could still be in its code.

> - come up with a better name than universe?  KLP_STATE_PREV/NEXT?
>   KLP_UNPATCHED/PATCHED?  there were some objections to the name in v1.
> - update documentation for sysfs, proc, livepatch
> - need atomic accesses or READ_ONCE/WRITE_ONCE anywhere?
> - ability to force a patch to the goal universe

This could be made by a call to klp_update_task_universe for all tasks, 
couldn't it? We have something similar in kgraft.

> - try ftrace handler switching idea from v1 cover letter
> - split up the patches better
> - cc all the right people

I'd add a fake signal facility for sleeping non-migrated tasks. This 
would accelerate a migration to a new universe. We have it in kgraft for 
quite some time and it worked out great. See 
lkml.kernel.org/r/1430739625-4658-9-git-send-email-jslaby@suse.cz which 
went with Jiri's kgraft-on-klp patch set. See also Oleg's reply as it is 
important (I changed kgraft implementation according to that).

Miroslav