Date: Mon, 4 Apr 2016 12:03:10 -0500
From: Josh Poimboeuf <jpoimboe@redhat.com>
To: Miroslav Benes <mbenes@suse.cz>
Cc: Jiri Kosina <jikos@kernel.org>, Jessica Yu <jeyu@redhat.com>,
        linux-kernel@vger.kernel.org, live-patching@vger.kernel.org,
        Vojtech Pavlik <vojtech@suse.com>
Subject: Re: [RFC PATCH v1.9 00/14] livepatch: hybrid consistency model
Message-ID: <20160404170310.bpbb5uhaz423hd6f@treble.redhat.com>
References: <cover.1458933243.git.jpoimboe@redhat.com>
 <alpine.LNX.2.00.1603311434430.23131@pobox.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <alpine.LNX.2.00.1603311434430.23131@pobox.suse.cz>
User-Agent: Mutt/1.5.23.1 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5389
Lines: 121

On Thu, Mar 31, 2016 at 02:54:26PM +0200, Miroslav Benes wrote:
> 
> Hi,
> 
> this is a great work. I'll have to review it properly (especially 13/14, 
> probably several times as it is a heavy stuff), but I've gathered some 
> notes so there they are.
> 
> On Fri, 25 Mar 2016, Josh Poimboeuf wrote:
> 
> > These patches are still a work in progress, but Jiri asked that I share
> > them before I go on vacation next week.  Based on origin/master because
> > it has CONFIG_STACK_VALIDATION.
> > 
> > This has two consistency models: the immediate model (like in today's
> > code) and the new kpatch/kgraft hybrid model.
> > 
> > The default is the hybrid model: kgraft's per-task consistency and
> > syscall barrier switching combined with kpatch's stack trace switching.
> > There are also a number of fallback options which make it pretty
> > flexible, yet the code is still quite simple.
> > 
> > Patches are applied on a per-task basis, when the task is deemed safe to
> > switch over.  It uses a tiered approach to determine when a task is safe
> > and can be switched.
> > 
> > The first wave of attack is stack checking of sleeping tasks.  If no
> > affected functions are on the stack of a given task, the task is
> > switched.  In most cases this will patch most or all of the tasks on the
> > first try.  Otherwise it'll keep trying periodically.  This option is
> > only available if the architecture has reliable stacks
> > (CONFIG_RELIABLE_STACKTRACE and CONFIG_STACK_VALIDATION).
> 
> I think we could allow periodic checking even for 
> !CONFIG_RELIABLE_STACKTRACE situations. The problematic task could migrate 
> by itself after some time (meaning it woke up meanwhile and sleeps 
> somewhere else now, or it went through a syscall boundary). So we can 
> gain something, especially when combined with a fake signal approach. More 
> on that below and in my 13/14 mail.

Good idea, agreed.

> > The next line of attack, if needed, is syscall/IRQ switching.  A task is
> > switched when it returns from a system call or a user space IRQ.  This
> > approach is less than ideal because it usually requires signaling tasks
> > to get them to switch.  It also doesn't work for kthreads.  But it's
> > still useful in some cases when user tasks get stuck sleeping on an
> > affected function.
> > 
> > For architectures which don't have reliable stacks, users have two
> > options:
> > 
> > a) use the hybrid fallback option of using only the syscall/IRQ
> >    switching (which is the default);
> > 
> > b) or they can use the immediate model (which is the model we already
> >    have today) by setting the patch->immediate flag.
> > 
> > There's also a func->immediate flag which allows users to specify that
> > certain functions in the patch can be applied without per-task
> > consistency.  This might be useful if you want to patch a common
> > function like schedule(), and the function change doesn't need
> > consistency but the rest of the patch does.
> > 
> > Still have a lot of TODOs, some of them are listed here.  If you see
> > something objectionable, it might be a good idea to make sure it's not
> > already on the TODO list :-)
> > 
> > TODO:
> > - actually test it
> > - don't use TIP_KLP_NEED_UPDATE in arch-independent code
> > - figure out new API to keep the use of task_rq_lock() in the sched code
> 
> Hm, no idea how to do it so that everyone is satisfied. I still remember 
> Peter's protests.

Yeah, I'll loop in Peter and Ingo on v2 so we can get their input.

> > - cleaner way to detect preemption on the stack
> > - allow patch modules to be removed.  still needs more discussion and
> >   thought.  maybe something like Miroslav's patch would be good:
> >   https://lkml.kernel.org/r/alpine.LNX.2.00.1512150857510.24899@pobox.suse.cz
> 
> Yup, that could be part of the patch set. The second option (to rework 
> klp_unregister_patch and move kobject_put out of mutex protected parts) is 
> maybe a way to go. The mutex_trylock approach works as well, but it is not 
> clean and nice enough I guess. However the patch is there :).
> 
> Anyway the module removal should be prohibited when one uses immmediate 
> flag set to true. Without consistency model we cannot say if it is safe to 
> remove the module. Some process could still be in its code.

Good point.

> 
> > - come up with a better name than universe?  KLP_STATE_PREV/NEXT?
> >   KLP_UNPATCHED/PATCHED?  there were some objections to the name in v1.
> > - update documentation for sysfs, proc, livepatch
> > - need atomic accesses or READ_ONCE/WRITE_ONCE anywhere?
> > - ability to force a patch to the goal universe
> 
> This could be made by a call to klp_update_task_universe for all tasks, 
> couldn't it? We have something similar in kgraft.

Yeah, I think so.

> > - try ftrace handler switching idea from v1 cover letter
> > - split up the patches better
> > - cc all the right people
> 
> I'd add a fake signal facility for sleeping non-migrated tasks. This 
> would accelerate a migration to a new universe. We have it in kgraft for 
> quite some time and it worked out great. See 
> lkml.kernel.org/r/1430739625-4658-9-git-send-email-jslaby@suse.cz which 
> went with Jiri's kgraft-on-klp patch set. See also Oleg's reply as it is 
> important (I changed kgraft implementation according to that).

Ok, I'll look into sending a fake signal to remaining tasks.

-- 
Josh