Date: Sun, 22 Feb 2015 10:46:39 +0100
From: Ingo Molnar
To: Jiri Kosina
Cc: Vojtech Pavlik, Josh Poimboeuf, Peter Zijlstra, Andrew Morton,
    Ingo Molnar, Seth Jennings, linux-kernel@vger.kernel.org,
    Linus Torvalds, Arjan van de Ven, Thomas Gleixner, Borislav Petkov
Subject: live kernel upgrades (was: live kernel patching design)
Message-ID: <20150222094639.GA23684@gmail.com>
In-Reply-To: <20150222084601.GA23491@gmail.com>

* Ingo Molnar wrote:

> Anyway, let me try to reboot this discussion back to
> technological details by summing up my arguments in
> another mail.

So here's how I see the kGraft and kpatch series. To not
put too fine a point on it, I think they are fundamentally
misguided in both implementation and in design, which turns
them into an (unwilling) extended arm of the security
theater:

 - kGraft creates a 'mixed' state where old kernel
   functions and new kernel functions are allowed to
   co-exist. Furthermore, there is currently no guarantee
   that the patching gets done within a bounded amount of
   time.

 - kpatch uses kernel stack backtraces to determine whether
   a task is executing a function or not - which IMO is
   fundamentally fragile, as kernel stack backtraces are
   'debug info' and are maintained and created as such:
   we've had long-lasting stack backtrace bugs which would
   now be turned into 'potentially patching a live
   function' type of functional (and hard-to-debug) bugs.
   I haven't seen much effort to turn this equation around
   and make kernel stack traces more robust.

 - the whole 'consistency model' talk both projects employ
   reminds me of how we grew 'security modules': where
   people running various mediocre projects would in the
   end not seek to create a superior upstream project, but
   would seek the 'consensus' in the form of cross-acking
   each other's patches as long as their own code got
   upstream as well ...

   I'm not blaming Linus for giving in to allowing security
   modules: they might be the right model for such a
   hard-to-define and in good part psychological discipline
   as 'security', but I sure don't see the necessity of
   doing that for 'live kernel patching'.

More importantly, both kGraft and kpatch are pretty limited
in what kinds of updates they allow, and neither kGraft nor
kpatch has any clear path towards applying more complex
fixes to kernel images that I can see: kGraft can only
apply the simplest of fixes where both versions of a
function are interchangeable, and kpatch is only marginally
better at that - and that's pretty fundamental to both
projects!

I think all of these problems could be resolved by shooting
for the moon instead:

 - work towards allowing arbitrary live kernel upgrades!

not just 'live kernel patches'. Work towards the goal of
full live kernel upgrades between any two versions of a
kernel that supports live kernel upgrades (and that doesn't
have fatal bugs in the kernel upgrade support code
requiring a hard system restart).

Arbitrary live kernel upgrades could be achieved by
starting with the 'simple method' I outlined in earlier
mails, using some of the methods that kpatch and kGraft are
both utilizing or planning to utilize:

 - implement user task and kthread parking to get the
   kernel into a quiescent state.

 - implement (optional, thus ABI-compatible) system call
   interruptibility and restartability support.

 - implement task state and (limited) device state
   snapshotting support.

 - implement live kernel upgrades by:

     - snapshotting all system state transparently

     - fast-rebooting into the new kernel image without
       shutting down and rebooting user-space, i.e. _much_
       faster than a regular reboot.

     - restoring system state transparently within the new
       kernel image and resuming system workloads where
       they were left.

Even complex external state like TCP socket state and
graphics state can be preserved over an upgrade. As far as
the user is concerned, nothing happened but a brief pause -
and he's now running a v3.21 kernel, not v3.20.

Obviously one of the simplest utilizations of live kernel
upgrades would be to apply simple security fixes to
production systems. But that's just a very simple
application of a much broader capability.

Note that if done right, then the time to perform a live
kernel upgrade on a typical system could be brought to well
below 10 seconds of system stoppage time: adequate for the
vast majority of installations.

For special installations or well optimized hardware the
latency could possibly be brought below 1 second of
stoppage time.
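
The park / snapshot / fast-reboot / restore sequence listed
above can be sketched as a toy user-space model. This is
only an illustration of the ordering constraint (nothing is
snapshotted until everything is parked), not kernel code:
the `Task` and `Kernel` classes, their fields and
`live_upgrade()` are hypothetical names invented for this
sketch, and the 'fast reboot' is reduced to constructing a
new object.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Task:
    pid: int
    state: dict           # stands in for registers, memory maps, sockets, ...
    parked: bool = False


@dataclass
class Kernel:
    version: str
    tasks: dict = field(default_factory=dict)   # pid -> Task

    def park_all(self):
        """Step 1: bring every task to a quiescent point."""
        for task in self.tasks.values():
            task.parked = True

    def snapshot(self):
        """Step 2: capture task state; refused unless everything is parked."""
        if not all(t.parked for t in self.tasks.values()):
            raise RuntimeError("cannot snapshot: some tasks are still running")
        return copy.deepcopy(self.tasks)

    def restore(self, snap):
        """Step 4: re-create the tasks in this (new) kernel and resume them."""
        self.tasks = copy.deepcopy(snap)
        for task in self.tasks.values():
            task.parked = False


def live_upgrade(old, new_version):
    old.park_all()
    snap = old.snapshot()
    new = Kernel(version=new_version)   # Step 3: stands in for the fast reboot
    new.restore(snap)
    return new


# Hypothetical workload: one task with some state that must survive.
old = Kernel("v3.20", {1: Task(1, {"pc": 0x1000, "sockets": ["tcp:443"]})})
new = live_upgrade(old, "v3.21")
print(new.version, new.tasks[1].state)
```

The point of the model is that the task state carried into
the new kernel is whatever the snapshot captured, so the
hard engineering work sits in making the snapshot complete,
not in the switch-over itself.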

This 'live kernel upgrades' approach would have various
advantages:

 - it brings together various principles working towards
   shared goals:

     - the boot time reduction folks
     - the checkpoint/restore folks
     - the hibernation folks
     - the suspend/resume and power management folks
     - the live patching folks (you)
     - the syscall latency reduction folks

   if so many disciplines are working together then maybe
   something really good and long-term maintainable can
   crystallize out of that effort.

 - it ignores the security theater that treats security
   fixes as a separate, disproportionately more important
   class of fixes and instead allows arbitrarily complex
   changes over live kernel upgrades.

 - there's no need to 'engineer' live patches separately,
   there's no need to review them and their usage sites for
   live-patching-relevant side effects. Just create a
   'better' kernel as defined by users of that kernel:

     - in the enterprise distro space create a more stable
       kernel and allow transparent upgrades into it.

     - in the desktop distro space create a kernel that
       will contain fixes and support for latest hardware.

     - etc.

   there's the need to engineer c/r and device state
   support, but that's a much more concentrated and
   specific field with many usecases beyond live kernel
   upgrades.

We have many of the building blocks in place and have them
available:

 - the freezer code already attempts to park/unpark threads
   transparently; that could be fixed/extended.

 - hibernation, regular suspend/resume and in general power
   management have in essence already implemented most
   building blocks needed to enumerate and
   checkpoint/restore device state that otherwise gets lost
   in a shutdown/reboot cycle.

 - c/r patches started user state enumeration and
   checkpoint/restore logic

A feature like arbitrary live kernel upgrades would be well
worth the pain and would be worth the complications, and
it's actually very feasible technically.
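
To make the 'parking' building block concrete, here is a
minimal user-space sketch of cooperative parking: workers
only stop at explicit safe points, and the coordinator
waits until every worker has reached one. It is not how the
kernel freezer is implemented; the `Parker` class, the
`safe_point()` / `park_all()` / `unpark_all()` names and the
worker loop are all assumptions made for the example.

```python
import threading
import time


class Parker:
    def __init__(self, n_workers):
        self.n_workers = n_workers
        self.park_requested = False
        self.parked = 0
        self.cond = threading.Condition()

    def safe_point(self):
        """Called by workers where no state is half-updated."""
        with self.cond:
            if not self.park_requested:
                return
            self.parked += 1
            self.cond.notify_all()          # tell the coordinator we are parked
            while self.park_requested:
                self.cond.wait()
            self.parked -= 1

    def park_all(self):
        """Block until every worker sits in safe_point()."""
        with self.cond:
            self.park_requested = True
            while self.parked < self.n_workers:
                self.cond.wait()

    def unpark_all(self):
        with self.cond:
            self.park_requested = False
            self.cond.notify_all()


def worker(idx, parker, counters, stop):
    while not stop.is_set():
        counters[idx] += 1                  # stands in for real work
        time.sleep(0.001)
        parker.safe_point()                 # the only place parking can happen


N = 3
parker = Parker(N)
counters = [0] * N
stop = threading.Event()
threads = [threading.Thread(target=worker, args=(i, parker, counters, stop))
           for i in range(N)]
for t in threads:
    t.start()

parker.park_all()                           # system is now quiescent
frozen = list(counters)
time.sleep(0.05)                            # nothing may progress meanwhile
still_frozen = list(counters)

parker.unpark_all()
stop.set()
for t in threads:
    t.join()
```

While parked, the counters do not move, which is the
property a snapshot needs. The sketch also shows the weak
spot: a worker that never reaches a safe point would make
`park_all()` wait forever, which is why a bound on parking
time has to be engineered rather than assumed.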

The goal of the current live kernel patching projects,
"being able to apply only the simplest of live patches",
which would in my opinion mostly serve the security
theater? It is not forward-looking enough, and in that
sense it could even be counterproductive.

Thanks,

	Ingo