Date: Sun, 22 Feb 2015 08:37:58 -0600
From: Josh Poimboeuf
To: Ingo Molnar
Cc: Jiri Kosina, Vojtech Pavlik, Peter Zijlstra, Andrew Morton,
    Ingo Molnar, Seth Jennings, linux-kernel@vger.kernel.org,
    Linus Torvalds, Arjan van de Ven, Thomas Gleixner, Peter Zijlstra,
    Borislav Petkov, live-patching@vger.kernel.org
Subject: Re: live kernel upgrades (was: live kernel patching design)
Message-ID: <20150222143758.GA4399@treble.redhat.com>

[ adding live-patching mailing list to CC ]

On Sun, Feb 22, 2015 at 10:46:39AM +0100, Ingo Molnar wrote:
> * Ingo Molnar wrote:
>
> > Anyway, let me try to reboot this discussion back to
> > technological details by summing up my arguments in
> > another mail.
>
> So here's how I see the kGraft and kpatch series.
> To not put too fine a point on it, I think they are fundamentally
> misguided in both implementation and in design, which turns
> them into an (unwilling) extended arm of the security
> theater:
>
>  - kGraft creates a 'mixed' state where old kernel
>    functions and new kernel functions are allowed to
>    co-exist,

Yes, some tasks may be running old functions and some tasks may be
running new functions. This would only cause a problem if there are
changes to global data semantics. We have guidelines the patch author
can follow to ensure that this isn't a problem.

>    attempting to get the patching done within a bound
>    amount of time.

Don't forget about my RFC [1] which converges the system to a patched
state within a few seconds. If the system isn't patched by then, the
user space tool can trigger a safe patch revert.

>  - kpatch uses kernel stack backtraces to determine whether
>    a task is executing a function or not - which IMO is
>    fundamentally fragile as kernel stack backtraces are
>    'debug info' and are maintained and created as such:
>    we've had long lasting stack backtrace bugs which would
>    now be turned into 'potentially patching a live
>    function' type of functional (and hard to debug) bugs.
>    I didn't see much effort that tries to turn this
>    equation around and makes kernel stacktraces more
>    robust.

Again, I proposed several stack unwinding validation improvements which
would make this a non-issue IMO.

>  - the whole 'consistency model' talk both projects employ
>    reminds me of how we grew 'security modules': where
>    people running various mediocre projects would in the
>    end not seek to create a superior upstream project, but
>    would seek the 'consensus' in the form of cross-acking
>    each others' patches as long as their own code got
>    upstream as well ...

That's just not the case. The consistency models were used to describe
the features and the pros and cons of the different approaches. The RFC
is not a compromise to get "cross-acks".
IMO it's an improvement on both kpatch and kGraft. See the RFC cover
letter [1] and the original consistency model discussion [2] for more
details.

>    I'm not blaming Linus for giving in to allowing security
>    modules: they might be the right model for such a hard
>    to define and in good part psychological discipline as
>    'security', but I sure don't see the necessity of doing
>    that for 'live kernel patching'.
>
> More importantly, both kGraft and kpatch are pretty limited
> in what kinds of updates they allow, and neither kGraft nor
> kpatch has any clear path towards applying more complex
> fixes to kernel images that I can see: kGraft can only
> apply the simplest of fixes where both versions of a
> function are interchangeable and kpatch is only marginally
> better at that - and that's pretty fundamental to both
> projects!

Sorry, but that is just not true. We can apply complex patches,
including "non-interchangeable functions" and data structures/semantics.

The catch is that it requires the patch author to put in the work to
modify the patch to make it compatible with live patching. But that's
an acceptable tradeoff for distros who want to support live patching.

> I think all of these problems could be resolved by shooting
> for the moon instead:
>
>  - work towards allowing arbitrary live kernel upgrades!
>
> not just 'live kernel patches'.
>
> Work towards the goal of full live kernel upgrades between
> any two versions of a kernel that supports live kernel
> upgrades (and that doesn't have fatal bugs in the kernel
> upgrade support code requiring a hard system restart).
>
> Arbitrary live kernel upgrades could be achieved by
> starting with the 'simple method' I outlined in earlier
> mails, using some of the methods that kpatch and kGraft are
> both utilizing or planning to utilize:
>
>  - implement user task and kthread parking to get the
>    kernel into quiescent state.
>
>  - implement (optional, thus ABI-compatible)
>    system call interruptability and restartability
>    support.
>
>  - implement task state and (limited) device state
>    snapshotting support
>
>  - implement live kernel upgrades by:
>
>     - snapshotting all system state transparently
>
>     - fast-rebooting into the new kernel image without
>       shutting down and rebooting user-space, i.e. _much_
>       faster than a regular reboot.
>
>     - restoring system state transparently within the new
>       kernel image and resuming system workloads where
>       they were left.
>
> Even complex external state like TCP socket state and
> graphics state can be preserved over an upgrade. As far as
> the user is concerned, nothing happened but a brief pause -
> and he's now running a v3.21 kernel, not v3.20.
>
> Obviously one of the simplest utilizations of live kernel
> upgrades would be to apply simple security fixes to
> production systems. But that's just a very simple
> application of a much broader capability.
>
> Note that if done right, then the time to perform a live
> kernel upgrade on a typical system could be brought to well
> below 10 seconds system stoppage time: adequate to the vast
> majority of installations.
>
> For special installations or well optimized hardware the
> latency could possibly be brought below 1 second stoppage
> time.
>
> This 'live kernel upgrades' approach would have various
> advantages:
>
>  - it brings together various principles working towards
>    shared goals:
>
>      - the boot time reduction folks
>      - the checkpoint/restore folks
>      - the hibernation folks
>      - the suspend/resume and power management folks
>      - the live patching folks (you)
>      - the syscall latency reduction folks
>
>    if so many disciplines are working together then maybe
>    something really good and long term maintainable can
>    crystalize out of that effort.
>
>  - it ignores the security theater that treats security
>    fixes as a separate, disproportionally more important
>    class of fixes and instead allows arbitrary complex
>    changes over live kernel upgrades.
>
>  - there's no need to 'engineer' live patches separately,
>    there's no need to review them and their usage sites
>    for live patching relevant side effects. Just create a
>    'better' kernel as defined by users of that kernel:
>
>      - in the enterprise distro space create a more stable
>        kernel and allow transparent upgrades into it.
>
>      - in the desktop distro space create a kernel that
>        will contain fixes and support for latest hardware.
>
>      - etc.
>
>    there's the need to engineer c/r and device state
>    support, but that's a much more concentrated and
>    specific field with many usecases beyond live
>    kernel upgrades.
>
> We have many of the building blocks in place and have them
> available:
>
>  - the freezer code already attempts at parking/unparking
>    threads transparently, that could be fixed/extended.
>
>  - hibernation, regular suspend/resume and in general
>    power management has in essence already implemented
>    most building blocks needed to enumerate and
>    checkpoint/restore device state that otherwise gets
>    lost in a shutdown/reboot cycle.
>
>  - c/r patches started user state enumeration and
>    checkpoint/restore logic
>
> A feature like arbitrary live kernel upgrades would be well
> worth the pain and would be worth the complications, and
> it's actually very feasible technically.
>
> The goals of the current live kernel patching projects,
> "being able to apply only the simplest of live patches",
> which would in my opinion mostly serve the security
> theater? They are not forward looking enough, and in that
> sense they could even be counterproductive.

My only issue with this proposal is the assertion that it's somehow a
replacement for kpatch or kGraft.
IIUC, the idea is basically to kick the tasks out of the kernel,
checkpoint them, replace the entire kernel, and restore the tasks. It
sounds like kind of a variation on kexec+criu.

There really needs to be a distinction between the two: live upgrading
vs live patching. They are two completely different animals, for
different use cases.

Live upgrading is basically a less disruptive reboot, with application
states preserved.

Live patching is applying simple fixes which are as close to *zero*
disruption as possible.

Your upgrade proposal is an *enormous* disruption to the system:

- a latency of "well below 10" seconds is completely unacceptable to
  most users who want to patch the kernel of a production system _while_
  it's in production.

- more importantly, the number of things that would have to go *right*
  in order to apply a simple security fix means that it is
  _exponentially_ more complex than live patching.

That kind of risk and disruption to the system is *exactly* what a live
patching user is trying to avoid.

[1] https://lkml.org/lkml/2015/2/9/475
[2] https://lkml.org/lkml/2014/11/7/354

-- 
Josh