Date: Tue, 24 Feb 2015 11:23:29 +0100
From: Ingo Molnar
To: Josh Poimboeuf
Cc: Jiri Kosina, Vojtech Pavlik, Peter Zijlstra, Andrew Morton,
    Ingo Molnar, Seth Jennings, linux-kernel@vger.kernel.org,
    Linus Torvalds, Arjan van de Ven, Thomas Gleixner,
    Peter Zijlstra, Borislav Petkov, live-patching@vger.kernel.org
Subject: Re: live kernel upgrades (was: live kernel patching design)
Message-ID: <20150224102328.GC19976@gmail.com>
In-Reply-To: <20150222143758.GA4399@treble.redhat.com>

* Josh Poimboeuf wrote:

> Your upgrade proposal is an *enormous* disruption to the
> system:
>
> - a latency of "well below 10" seconds is completely
>   unacceptable to most users who want to patch the kernel
>   of a production system _while_ it's in production.

I think this statement is false for the following reasons.

  - I'd say the majority of system operators of production
    systems can live with a couple of seconds of delay at a
    well defined moment of the day or week - with gradual,
    pretty much open ended improvements in that latency
    down the line.

  - I think your argument ignores the fact that live
    upgrades would extend the scope of 'users willing to
    patch the kernel of a production system' _enormously_.

    For example, I have a production system with this much
    uptime:

       10:50:09 up 153 days,  3:58, 34 users,  load average: 0.00, 0.02, 0.05

    Currently I'm reluctant to reboot the system to upgrade
    the kernel (due to a reboot's intrusiveness), which is
    why it has achieved a relatively high uptime, but I'd
    definitely allow the kernel to upgrade at midnight just
    fine. (I'd even give it up to a few minutes, as long as
    TCP connections don't time out.)

    And I don't think my use case is special.

What gradual improvements in live upgrade latency am I
talking about?

 - For example, the majority of pure user-space process
   pages in RAM could be saved from the old kernel over
   into the new kernel - i.e. they'd stay in place in RAM,
   but they'd be re-hashed for the new data structures.
   This avoids a big chunk of checkpointing overhead.

 - Likewise, most of the page cache could be saved from an
   old kernel to a new kernel as well - further reducing
   checkpointing overhead.

 - The PROT_NONE mechanism of the current NUMA balancing
   code could be used to transparently mark user-space
   pages as 'checkpointed'. This would reduce system
   interruption, as only 'newly modified' pages would have
   to be checkpointed when the upgrade happens.

 - Hardware devices could be marked as 'already in a well
   defined state', skipping the more expensive steps of
   driver initialization.

 - Possibly full user-space page tables could be preserved
   over an upgrade: this way user-space execution would be
   unaffected even at the micro level: cache layout, TLB
   patterns, etc.

There are lots of gradual speedups possible with such a
model IMO.

With live kernel patching we run into a brick wall of
complexity straight away: we have to analyze the nature of
the kernel modification, in the context of live patching,
and that only works for the simplest of kernel
modifications.

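As a rough illustration of the PROT_NONE item above, here is
a toy Python model of incremental checkpointing: pages are
saved in the background while the system runs, a write to a
saved page marks it as needing a re-save, and only those
pages remain to be saved during the upgrade pause. The class
and method names are hypothetical, and the explicit write()
call merely stands in for the page-protection fault a kernel
would take; none of this is kernel code.

```python
class CheckpointTracker:
    """Toy model: track which pages still need saving before an upgrade."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        # Pages whose current contents have already been saved.
        self.checkpointed = set()

    def background_checkpoint(self):
        """Save every page not yet saved; return how many were copied."""
        pending = set(range(self.num_pages)) - self.checkpointed
        self.checkpointed |= pending
        return len(pending)

    def write(self, page):
        """Model a write fault: the saved copy of this page is now stale."""
        if not 0 <= page < self.num_pages:
            raise IndexError("page out of range")
        self.checkpointed.discard(page)

    def final_checkpoint(self):
        """Pages that must still be copied during the upgrade pause."""
        return self.background_checkpoint()


if __name__ == "__main__":
    tracker = CheckpointTracker(num_pages=1000)
    tracker.background_checkpoint()    # all 1000 pages saved ahead of time
    for page in (3, 7, 7, 42):         # writes to three distinct pages
        tracker.write(page)
    print(tracker.final_checkpoint())  # prints 3
```

In this model the pause-time work scales with the number of
pages written since the last background pass, not with the
total amount of memory.
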
With live kernel upgrades no such brick wall exists: just
about any transition between kernel versions is possible.

Granted, with live kernel upgrades it's much more complex
to get the 'simple' case into an even rudimentarily working
fashion (full userspace state has to be enumerated, saved
and restored), but once we are there, it's a whole new
category of goodness and it probably covers 90%+ of the
live kernel patching use cases on day 1 already ...

Thanks,

	Ingo