Date: Sun, 22 Feb 2015 10:46:39 +0100
From: Ingo Molnar <mingo@kernel.org>
To: Jiri Kosina <jkosina@suse.cz>
Cc: Vojtech Pavlik <vojtech@suse.com>, Josh Poimboeuf <jpoimboe@redhat.com>,
        Peter Zijlstra <peterz@infradead.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Ingo Molnar <mingo@redhat.com>, Seth Jennings <sjenning@redhat.com>,
        linux-kernel@vger.kernel.org,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Arjan van de Ven <arjan@infradead.org>,
        Thomas Gleixner <tglx@linutronix.de>,
        Peter Zijlstra <a.p.zijlstra@chello.nl>,
        Borislav Petkov <bp@alien8.de>
Subject: live kernel upgrades (was: live kernel patching design)
Message-ID: <20150222094639.GA23684@gmail.com>
References: <alpine.LNX.2.00.1502201151390.28769@pobox.suse.cz>
 <20150220194901.GB3603@gmail.com>
 <20150220214613.GA21598@suse.com>
 <20150221181852.GA8406@gmail.com>
 <alpine.LNX.2.00.1502211952120.2357@pobox.suse.cz>
 <20150221191607.GA9534@gmail.com>
 <alpine.LNX.2.00.1502212025450.2357@pobox.suse.cz>
 <20150221194840.GA10126@gmail.com>
 <alpine.LNX.2.00.1502212058080.2357@pobox.suse.cz>
 <20150222084601.GA23491@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150222084601.GA23491@gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 6928
Lines: 184


* Ingo Molnar <mingo@kernel.org> wrote:

> Anyway, let me try to reboot this discussion back to 
> technological details by summing up my arguments in 
> another mail.

So here's how I see the kGraft and kpatch series. Not to 
put too fine a point on it, I think they are fundamentally 
misguided in both implementation and in design, which turns 
them into an (unwilling) extended arm of the security 
theater:

 - kGraft creates a 'mixed' state where old kernel
   functions and new kernel functions are allowed to
   co-exist; furthermore, there's currently no guarantee
   that the patching completes within a bounded amount
   of time.

 - kpatch uses kernel stack backtraces to determine whether
   a task is executing a function or not - which IMO is
   fundamentally fragile as kernel stack backtraces are
   'debug info' and are maintained and created as such:
   we've had long lasting stack backtrace bugs which would
   now be turned into 'potentially patching a live
   function' type of functional (and hard to debug) bugs.
   I haven't seen much effort to turn this equation
   around and make kernel stack traces more robust.

 - the whole 'consistency model' talk both projects employ 
   reminds me of how we grew 'security modules': where 
   people running various mediocre projects would in the 
   end not seek to create a superior upstream project, but 
   would seek the 'consensus' in the form of cross-acking 
   each others' patches as long as their own code got 
   upstream as well ...

   I'm not blaming Linus for giving in to allowing security
   modules: they might be the right model for such a 
   hard-to-define and in good part psychological discipline 
   as 'security', but I sure don't see the necessity of 
   doing that for 'live kernel patching'.
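To make the first point concrete, here is a toy model of a 'mixed' state, in Python and purely illustrative: the function names, the per-task 'migrated' flag and the safe-point rule are hypothetical simplifications, not kGraft's actual code. Each task sees either the old or the new version of a function depending on its own flag, and patching is only finished once every task has passed a safe point, which nothing in this model forces to happen in bounded time.

```python
from dataclasses import dataclass


def old_fn(x):
    # Hypothetical "old" version of a patched kernel function.
    return x + 1


def new_fn(x):
    # Hypothetical "new" (fixed) version of the same function.
    return x + 2


@dataclass
class Task:
    name: str
    migrated: bool = False  # hypothetical per-task flag: old or new code


def call_patched(task, x):
    # Dispatch on the caller's own flag, so both versions of the
    # function are live in the system at the same time.
    return new_fn(x) if task.migrated else old_fn(x)


def safe_point(task):
    # A task switches to the new code only when it passes a safe point
    # (in this toy model: whenever safe_point() is called for it).
    task.migrated = True


def transition_done(tasks):
    # Patching is complete only once every task has migrated; nothing
    # here bounds how long that takes.
    return all(t.migrated for t in tasks)


tasks = [Task("A"), Task("B")]
safe_point(tasks[0])                          # A passes a safe point, B does not
print([call_patched(t, 1) for t in tasks])   # [3, 2]: mixed state
print(transition_done(tasks))                 # False until B reaches a safe point
```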
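Similarly, a toy model of the stack-based activeness check from the second point (Python, illustrative only; the address range, the backtraces and the helper names are made up, and kpatch's real check is more involved): a function counts as active if any return address in a task's backtrace falls inside its text range. If the unwinder silently drops a frame, the check concludes that it is safe to patch a function that is actually running.

```python
# Hypothetical text range [start, end) of the function to be patched.
FUNC_START, FUNC_END = 0x1000, 0x1080


def is_active(backtrace, start=FUNC_START, end=FUNC_END):
    # The function is considered "in use" if any return address in the
    # task's backtrace lies inside its text range.
    return any(start <= addr < end for addr in backtrace)


def safe_to_patch(backtraces):
    # Patching is allowed only if no task has the function on its stack.
    return not any(is_active(bt) for bt in backtraces)


real_stack = [0x2010, 0x1040, 0x3000]  # task really is inside the function
unwound = [0x2010, 0x3000]             # buggy unwinder skipped that frame

print(safe_to_patch([real_stack]))  # False: the correct answer
print(safe_to_patch([unwound]))     # True: wrong, caused by the unwinder bug
```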

More importantly, both kGraft and kpatch are pretty limited 
in what kinds of updates they allow, and neither kGraft nor 
kpatch has any clear path towards applying more complex 
fixes to kernel images that I can see: kGraft can only 
apply the simplest of fixes where both versions of a 
function are interchangeable, and kpatch is only marginally 
better at that - and that's pretty fundamental to both 
projects!

I think all of these problems could be resolved by shooting 
for the moon instead:

  - work towards allowing arbitrary live kernel upgrades!

not just 'live kernel patches'.

Work towards the goal of full live kernel upgrades between 
any two versions of a kernel that supports live kernel 
upgrades (and that doesn't have fatal bugs in the kernel 
upgrade support code requiring a hard system restart).

Arbitrary live kernel upgrades could be achieved by 
starting with the 'simple method' I outlined in earlier 
mails, using some of the methods that kpatch and kGraft are 
both utilizing or planning to utilize:

  - implement user task and kthread parking to get the 
    kernel into a quiescent state.

  - implement (optional, thus ABI-compatible) 
    system call interruptibility and restartability 
    support.

  - implement task state and (limited) device state
    snapshotting support.

  - implement live kernel upgrades by:

      - snapshotting all system state transparently

      - fast-rebooting into the new kernel image without 
        shutting down and rebooting user-space, i.e. _much_ 
        faster than a regular reboot.

      - restoring system state transparently within the new 
        kernel image and resuming system workloads where 
        they were left.
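The upgrade sequence above can be sketched as a toy model (Python, illustrative only; Kernel, TaskState and the JSON snapshot format are hypothetical stand-ins, and real task and device state is vastly richer): park every task, serialize state into a format both kernels understand, 'boot' the new kernel from that snapshot, then unpark the tasks where they left off.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TaskState:
    pid: int
    pc: int              # toy "program counter": where the task stopped
    parked: bool = False


class Kernel:
    """Hypothetical stand-in for a running kernel image."""

    def __init__(self, version, tasks=None):
        self.version = version
        self.tasks = tasks or []

    def park_all(self):
        # Step 1: bring every task to a quiescent point.
        for t in self.tasks:
            t.parked = True

    def snapshot(self):
        # Step 2: serialize state into plain data; refuse unless quiescent.
        if not all(t.parked for t in self.tasks):
            raise RuntimeError("system not quiescent")
        return json.dumps([asdict(t) for t in self.tasks])

    @classmethod
    def restore(cls, version, blob):
        # Step 3: the new kernel image rebuilds its state from the snapshot.
        return cls(version, [TaskState(**d) for d in json.loads(blob)])

    def resume_all(self):
        # Step 4: unpark the tasks so they continue where they left off.
        for t in self.tasks:
            t.parked = False


old = Kernel("3.20", [TaskState(pid=1, pc=0x40), TaskState(pid=42, pc=0x99)])
old.park_all()
blob = old.snapshot()
new = Kernel.restore("3.21", blob)  # stands in for the fast reboot
new.resume_all()
print(new.version, [(t.pid, t.pc) for t in new.tasks])  # 3.21 [(1, 64), (42, 153)]
```

The point of the design in this model: the snapshot is plain data rather than in-memory kernel structures, so the old and new kernels may differ arbitrarily in their internals as long as both understand the snapshot format.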

Even complex external state like TCP socket state and 
graphics state can be preserved over an upgrade. As far as 
the user is concerned, nothing happened but a brief pause - 
and he's now running a v3.21 kernel, not v3.20.

Obviously one of the simplest utilizations of live kernel 
upgrades would be to apply simple security fixes to 
production systems. But that's just a very simple 
application of a much broader capability.

Note that, if done right, the system stoppage time of a 
live kernel upgrade on a typical system could be brought 
well below 10 seconds: adequate for the vast majority of 
installations.

For special installations or well-optimized hardware, the 
stoppage time could possibly be brought below 1 second.

This 'live kernel upgrades' approach would have various 
advantages:

  - it brings together various groups working towards 
    shared goals:

      - the boot time reduction folks
      - the checkpoint/restore folks
      - the hibernation folks
      - the suspend/resume and power management folks
      - the live patching folks (you)
      - the syscall latency reduction folks

    if so many disciplines are working together then maybe 
    something really good and long-term maintainable can 
    crystallize out of that effort.

  - it ignores the security theater that treats security
    fixes as a separate, disproportionately more important
    class of fixes and instead allows arbitrarily complex 
    changes over live kernel upgrades.

  - there's no need to 'engineer' live patches separately, 
    and no need to review them and their usage sites for 
    side effects relevant to live patching. Just create a 
    'better' kernel as defined by users of that kernel:

      - in the enterprise distro space create a more stable 
        kernel and allow transparent upgrades into it.

      - in the desktop distro space create a kernel that 
        will contain fixes and support for latest hardware.

      - etc.

    there's still a need to engineer checkpoint/restore 
    (c/r) and device state support, but that's a much more 
    concentrated and specific field with many use cases 
    beyond live kernel upgrades.

We already have many of the building blocks in place:

  - the freezer code already attempts to park/unpark
    threads transparently; that could be fixed/extended.

  - hibernation, regular suspend/resume and in general
    power management has in essence already implemented
    most building blocks needed to enumerate and
    checkpoint/restore device state that otherwise gets
    lost in a shutdown/reboot cycle.

  - the c/r patches have started on user state enumeration
    and checkpoint/restore logic.

A feature like arbitrary live kernel upgrades would be well 
worth the pain and the complications, and it's actually 
very feasible technically.

As for the goals of the current live kernel patching 
projects, "being able to apply only the simplest of live 
patches", which in my opinion would mostly serve the 
security theater: they are not forward-looking enough, and 
in that sense they could even be counterproductive.

Thanks,

	Ingo