Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755741AbcDDRDN (ORCPT ); Mon, 4 Apr 2016 13:03:13 -0400 Received: from mx1.redhat.com ([209.132.183.28]:47940 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753287AbcDDRDM (ORCPT ); Mon, 4 Apr 2016 13:03:12 -0400 Date: Mon, 4 Apr 2016 12:03:10 -0500 From: Josh Poimboeuf To: Miroslav Benes Cc: Jiri Kosina , Jessica Yu , linux-kernel@vger.kernel.org, live-patching@vger.kernel.org, Vojtech Pavlik Subject: Re: [RFC PATCH v1.9 00/14] livepatch: hybrid consistency model Message-ID: <20160404170310.bpbb5uhaz423hd6f@treble.redhat.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23.1 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5389 Lines: 121 On Thu, Mar 31, 2016 at 02:54:26PM +0200, Miroslav Benes wrote: > > Hi, > > this is a great work. I'll have to review it properly (especially 13/14, > probably several times as it is a heavy stuff), but I've gathered some > notes so there they are. > > On Fri, 25 Mar 2016, Josh Poimboeuf wrote: > > > These patches are still a work in progress, but Jiri asked that I share > > them before I go on vacation next week. Based on origin/master because > > it has CONFIG_STACK_VALIDATION. > > > > This has two consistency models: the immediate model (like in today's > > code) and the new kpatch/kgraft hybrid model. > > > > The default is the hybrid model: kgraft's per-task consistency and > > syscall barrier switching combined with kpatch's stack trace switching. > > There are also a number of fallback options which make it pretty > > flexible, yet the code is still quite simple. > > > > Patches are applied on a per-task basis, when the task is deemed safe to > > switch over. It uses a tiered approach to determine when a task is safe > > and can be switched. > > > > The first wave of attack is stack checking of sleeping tasks. If no > > affected functions are on the stack of a given task, the task is > > switched. In most cases this will patch most or all of the tasks on the > > first try. Otherwise it'll keep trying periodically. This option is > > only available if the architecture has reliable stacks > > (CONFIG_RELIABLE_STACKTRACE and CONFIG_STACK_VALIDATION). > > I think we could allow periodic checking even for > !CONFIG_RELIABLE_STACKTRACE situations. The problematic task could migrate > by itself after some time (meaning it woke up meanwhile and sleeps > somewhere else now, or it went through a syscall boundary). So we can > gain something, especially when combined with a fake signal approach. More > on that below and in my 13/14 mail. Good idea, agreed. > > The next line of attack, if needed, is syscall/IRQ switching. A task is > > switched when it returns from a system call or a user space IRQ. This > > approach is less than ideal because it usually requires signaling tasks > > to get them to switch. It also doesn't work for kthreads. But it's > > still useful in some cases when user tasks get stuck sleeping on an > > affected function. > > > > For architectures which don't have reliable stacks, users have two > > options: > > > > a) use the hybrid fallback option of using only the syscall/IRQ > > switching (which is the default); > > > > b) or they can use the immediate model (which is the model we already > > have today) by setting the patch->immediate flag. > > > > There's also a func->immediate flag which allows users to specify that > > certain functions in the patch can be applied without per-task > > consistency. This might be useful if you want to patch a common > > function like schedule(), and the function change doesn't need > > consistency but the rest of the patch does. > > > > Still have a lot of TODOs, some of them are listed here. If you see > > something objectionable, it might be a good idea to make sure it's not > > already on the TODO list :-) > > > > TODO: > > - actually test it > > - don't use TIP_KLP_NEED_UPDATE in arch-independent code > > - figure out new API to keep the use of task_rq_lock() in the sched code > > Hm, no idea how to do it so that everyone is satisfied. I still remember > Peter's protests. Yeah, I'll loop in Peter and Ingo on v2 so we can get their input. > > - cleaner way to detect preemption on the stack > > - allow patch modules to be removed. still needs more discussion and > > thought. maybe something like Miroslav's patch would be good: > > https://lkml.kernel.org/r/alpine.LNX.2.00.1512150857510.24899@pobox.suse.cz > > Yup, that could be part of the patch set. The second option (to rework > klp_unregister_patch and move kobject_put out of mutex protected parts) is > maybe a way to go. The mutex_trylock approach works as well, but it is not > clean and nice enough I guess. However the patch is there :). > > Anyway the module removal should be prohibited when one uses immmediate > flag set to true. Without consistency model we cannot say if it is safe to > remove the module. Some process could still be in its code. Good point. > > > - come up with a better name than universe? KLP_STATE_PREV/NEXT? > > KLP_UNPATCHED/PATCHED? there were some objections to the name in v1. > > - update documentation for sysfs, proc, livepatch > > - need atomic accesses or READ_ONCE/WRITE_ONCE anywhere? > > - ability to force a patch to the goal universe > > This could be made by a call to klp_update_task_universe for all tasks, > couldn't it? We have something similar in kgraft. Yeah, I think so. > > - try ftrace handler switching idea from v1 cover letter > > - split up the patches better > > - cc all the right people > > I'd add a fake signal facility for sleeping non-migrated tasks. This > would accelerate a migration to a new universe. We have it in kgraft for > quite some time and it worked out great. See > lkml.kernel.org/r/1430739625-4658-9-git-send-email-jslaby@suse.cz which > went with Jiri's kgraft-on-klp patch set. See also Oleg's reply as it is > important (I changed kgraft implementation according to that). Ok, I'll look into sending a fake signal to remaining tasks. -- Josh