Date: Tue, 24 Feb 2015 11:23:29 +0100
From: Ingo Molnar <mingo@kernel.org>
To: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>, Vojtech Pavlik <vojtech@suse.com>,
        Peter Zijlstra <peterz@infradead.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Ingo Molnar <mingo@redhat.com>, Seth Jennings <sjenning@redhat.com>,
        linux-kernel@vger.kernel.org,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Arjan van de Ven <arjan@infradead.org>,
        Thomas Gleixner <tglx@linutronix.de>,
        Peter Zijlstra <a.p.zijlstra@chello.nl>,
        Borislav Petkov <bp@alien8.de>, live-patching@vger.kernel.org
Subject: Re: live kernel upgrades (was: live kernel patching design)
Message-ID: <20150224102328.GC19976@gmail.com>
References: <20150220214613.GA21598@suse.com>
 <20150221181852.GA8406@gmail.com>
 <alpine.LNX.2.00.1502211952120.2357@pobox.suse.cz>
 <20150221191607.GA9534@gmail.com>
 <alpine.LNX.2.00.1502212025450.2357@pobox.suse.cz>
 <20150221194840.GA10126@gmail.com>
 <alpine.LNX.2.00.1502212058080.2357@pobox.suse.cz>
 <20150222084601.GA23491@gmail.com>
 <20150222094639.GA23684@gmail.com>
 <20150222143758.GA4399@treble.redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150222143758.GA4399@treble.redhat.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3468
Lines: 91


* Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> Your upgrade proposal is an *enormous* disruption to the 
> system:
> 
> - a latency of "well below 10" seconds is completely
>   unacceptable to most users who want to patch the kernel 
>   of a production system _while_ it's in production.

I think this statement is false for the following reasons.

  - I'd say the majority of system operators of production 
    systems can live with a couple of seconds of delay at a 
    well defined moment of the day or week - with gradual, 
    pretty much open ended improvements in that latency 
    down the line.

  - I think your argument ignores the fact that live 
    upgrades would extend the scope of 'users willing to 
    patch the kernel of a production system' _enormously_. 

    For example, I have a production system with this much 
    uptime:

       10:50:09 up 153 days,  3:58, 34 users,  load average: 0.00, 0.02, 0.05

    Currently I'm reluctant to reboot the system to
    upgrade the kernel (due to a reboot's intrusiveness),
    which is why it has achieved a relatively high
    uptime, but I'd definitely allow the kernel to upgrade
    at 0:00am just fine. (I'd even give it up to a few
    minutes, as long as TCP connections don't time out.)

    And I don't think my usecase is special.

What gradual improvements in live upgrade latency am I 
talking about?

 - For example the majority of pure user-space process 
   pages in RAM could be saved from the old kernel over 
   into the new kernel - i.e. they'd stay in place in RAM, 
   but they'd be re-hashed for the new data structures. 
   This avoids a big chunk of checkpointing overhead.

 - Likewise, most of the page cache could be saved from an
   old kernel to a new kernel as well - further reducing
   checkpointing overhead.

 - The PROT_NONE mechanism of the current NUMA balancing
   code could be used to transparently mark user-space 
   pages as 'checkpointed'. This would reduce system 
   interruption as only 'newly modified' pages would have 
   to be checkpointed when the upgrade happens.

 - Hardware devices could be marked as 'already in a
   well-defined state', skipping the more expensive steps
   of driver initialization.

 - Possibly full user-space page tables could be preserved
   over an upgrade: this way user-space execution would be
   unaffected even at the micro level: cache layout, TLB
   patterns, etc.

There are lots of gradual speedups possible with such a
model, IMO.
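
To make the 'only newly modified pages' point concrete, here
is a toy user-space model of incremental checkpointing. It
is a sketch under stated assumptions, not kernel code: the
class PreCheckpointer and its methods are hypothetical names
made up for illustration, and an explicit write() call
stands in for the fault a protected page would take on its
first modification.

```python
class PreCheckpointer:
    """Toy model: copy pages ahead of time, re-copy only modified ones."""

    def __init__(self, num_pages):
        self.pages = [b""] * num_pages  # current page contents
        self.saved = {}                 # page index -> checkpointed copy
        self.clean = set()              # pages whose saved copy is current

    def write(self, idx, data):
        # Stand-in for a write fault on a page marked 'checkpointed':
        # the page is modified and loses its 'checkpointed' mark.
        self.pages[idx] = data
        self.clean.discard(idx)

    def background_pass(self):
        # Runs while the system is live: copy every page that is not
        # marked 'checkpointed', then mark it. Returns pages copied.
        copied = 0
        for idx, data in enumerate(self.pages):
            if idx not in self.clean:
                self.saved[idx] = data
                self.clean.add(idx)
                copied += 1
        return copied

    def final_pass(self):
        # Runs during the upgrade pause: same work, but only the pages
        # modified since the last background pass remain to be copied.
        return self.background_pass()
```

In this model the length of the pause scales with the number
of pages written since the last background pass, not with
the total number of pages. How closely a real implementation
would follow that depends on details this sketch ignores
(fault cost, page sizes, how often the background pass runs).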

With live kernel patching we run into a brick wall of 
complexity straight away: we have to analyze the nature of 
the kernel modification, in the context of live patching, 
and that only works for the simplest of kernel 
modifications.

With live kernel upgrades no such brick wall exists: just
about any transition between kernel versions is possible.

Granted, with live kernel upgrades it's much more complex 
to get the 'simple' case into an even rudimentarily working 
fashion (full userspace state has to be enumerated, saved 
and restored), but once we are there, it's a whole new 
category of goodness and it probably covers 90%+ of the 
live kernel patching usecases on day 1 already ...

Thanks,

	Ingo