Date: Tue, 24 Feb 2015 13:36:01 +0100
From: Vojtech Pavlik
To: Ingo Molnar
Cc: Josh Poimboeuf, Jiri Kosina, Peter Zijlstra, Andrew Morton, Seth Jennings, linux-kernel@vger.kernel.org, Linus Torvalds, Arjan van de Ven, Thomas Gleixner, Borislav Petkov, live-patching@vger.kernel.org
Subject: Re: live kernel upgrades (was: live kernel patching design)

On Tue, Feb 24, 2015 at 11:23:29AM +0100, Ingo Molnar wrote:

> > Your upgrade proposal is an *enormous* disruption to the
> > system:
> >
> > - a latency of "well below 10" seconds is completely
> >   unacceptable to most users who want to patch the kernel
> >   of a production system _while_ it's in production.
>
> I think this statement is false for the following reasons.

The statement is very true.
> - I'd say the majority of system operators of production
>   systems can live with a couple of seconds of delay at a
>   well defined moment of the day or week - with gradual,
>   pretty much open ended improvements in that latency
>   down the line.

In the most common corporate setting, any noticeable outage, even outside business hours, requires advance notice and the agreement of all stakeholders - the teams that depend on the system.

If a live patching technology introduces an outage, it isn't "live", and for these bureaucratic reasons it will not be used; a regular reboot will be scheduled instead.

> - I think your argument ignores the fact that live
>   upgrades would extend the scope of 'users willing to
>   patch the kernel of a production system' _enormously_.
>
>   For example, I have a production system with this much
>   uptime:
>
>     10:50:09 up 153 days, 3:58, 34 users, load average: 0.00, 0.02, 0.05
>
>   While I'm currently reluctant to reboot the system to
>   upgrade the kernel (due to a reboot's intrusiveness),
>   which is why it has achieved a relatively high uptime,
>   I'd definitely allow the kernel to upgrade at 0:00am
>   just fine. (I'd even give it up to a few minutes, as
>   long as TCP connections don't time out.)
>
>   And I don't think my usecase is special.

I agree that this is useful. But it is a different problem, one that only partially overlaps with what we're trying to achieve with live patching.

If you can make full kernel upgrades work this way - which I doubt is achievable in the next 10 years, given all the research and infrastructure needed - then you certainly gain an additional group of users. And a great tool. A large portion of those who ask for live patching won't use it, though.

But honestly, I prefer a solution that works for small patches now over a solution for unlimited patches sometime in the next decade.

> What gradual improvements in live upgrade latency am I
> talking about?
> - For example the majority of pure user-space process
>   pages in RAM could be saved from the old kernel over
>   into the new kernel - i.e. they'd stay in place in RAM,
>   but they'd be re-hashed for the new data structures.
>   This avoids a big chunk of checkpointing overhead.

I'd have hoped this would be a given. If you can't preserve memory contents and have to reload everything from disk, you might as well reboot entirely; the time needed would not be much greater.

> - Likewise, most of the page cache could be saved from an
>   old kernel to a new kernel as well - further reducing
>   checkpointing overhead.
>
> - The PROT_NONE mechanism of the current NUMA balancing
>   code could be used to transparently mark user-space
>   pages as 'checkpointed'. This would reduce system
>   interruption as only 'newly modified' pages would have
>   to be checkpointed when the upgrade happens.
>
> - Hardware devices could be marked as 'already in a well
>   defined state', skipping the more expensive steps of
>   driver initialization.
>
> - Possibly full user-space page tables could be preserved
>   over an upgrade: this way user-space execution would be
>   unaffected even at the micro level: cache layout, TLB
>   patterns, etc.
>
> There's lots of gradual speedups possible with such a model
> IMO.

Yes, and as I say above, guaranteeing decades of employment. ;)

> With live kernel patching we run into a brick wall of
> complexity straight away: we have to analyze the nature of
> the kernel modification, in the context of live patching,
> and that only works for the simplest of kernel
> modifications.

But you're able to _use_ it.

> With live kernel upgrades no such brick wall exists, just
> about any transition between kernel versions is possible.

The brick wall you run into is "I need to implement full kernel state serialization before I can do anything at all." That's something where it isn't even clear _how_ to do it.
Particularly with the Linux kernel's development model, where internal ABIs and structures are always in flux, it may not even be realistic.

> Granted, with live kernel upgrades it's much more complex
> to get the 'simple' case into an even rudimentarily
> working fashion (full userspace state has to be
> enumerated, saved and restored), but once we are there,
> it's a whole new category of goodness and it probably
> covers 90%+ of the live kernel patching usecases on
> day 1 already ...

Feel free to start working on it. I'll stick with live patching.

-- 
Vojtech Pavlik
Director SUSE Labs