Date: Sat, 21 Feb 2015 19:30:05 +0100
From: Ingo Molnar
To: Josh Poimboeuf
Cc: Vojtech Pavlik, Jiri Kosina, Peter Zijlstra, Andrew Morton,
	Ingo Molnar, Seth Jennings, linux-kernel@vger.kernel.org,
	Linus Torvalds
Subject: Re: live patching design (was: Re: [PATCH 1/3] sched: add sched_task_call())
Message-ID: <20150221183005.GB8406@gmail.com>
References: <20150219214229.GD15980@treble.redhat.com>
	<20150220095003.GA23506@gmail.com>
	<20150220104418.GD25076@gmail.com>
	<20150220194901.GB3603@gmail.com>
	<20150220214613.GA21598@suse.com>
	<20150220220845.GI15980@treble.redhat.com>
In-Reply-To: <20150220220845.GI15980@treble.redhat.com>

* Josh Poimboeuf wrote:

> On Fri, Feb 20, 2015 at 10:46:13PM +0100, Vojtech Pavlik wrote:
> > On Fri, Feb 20, 2015 at 08:49:01PM +0100, Ingo Molnar wrote:
> > >
> > > I.e. it's in essence the strong stop-all atomic
> > > patching model of 'kpatch', combined with the
> > > reliable avoidance of kernel stacks that 'kgraft'
> > > uses.
> > >
> > > That should be the starting point, because it's the
> > > most reliable method.
> >
> > In the consistency models discussion, this was marked
> > the "LEAVE_KERNEL+SWITCH_KERNEL" model. It's indeed the
> > strongest model of all, but also comes at the highest
> > cost in terms of impact on running tasks. It's so high
> > (the interruption may be seconds or more) that it was
> > deemed not worth implementing.
>
> Yeah, this is way too disruptive to the user.
>
> Even the comparatively tiny latency caused by kpatch's
> use of stop_machine() was considered unacceptable by
> some.

Unreliable, unrobust patching is even more disruptive...

What I think makes it fragile in the long term is that we
combine two unrobust, unlikely mechanisms: the chance that
a task just happens to execute a patched function, with the
chance that debug information is unreliable.

For example, the code patching used by tracing got debugged
to a fair degree because we rely on that patching for actual
tracing functionality. Even with that relatively robust
usage model we had our crises ...

I just don't see how a stack-backtrace-based live patching
method can become robust in the long run.

> Plus a lot of processes would see EINTR, causing more
> havoc.

Parking threads safely in user mode does not require the
propagation of syscall interruption to user-space.

(It does have some other requirements, such as making all
syscalls interruptible to a 'special' signalling method
that only live patching triggers - even syscalls that are
uninterruptible under the normal ABI, such as sys_sync().)

On the other hand, if it's too slow, people will work on
improving signal propagation latencies: making syscalls
more readily interruptible and more seamlessly restartable
has various other advantages beyond live kernel patching.

I.e. it's a win-win scenario and will improve various areas
of the kernel in terms of syscall interruptibility
latencies.

Thanks,

	Ingo