Date: Sat, 21 Feb 2015 19:30:05 +0100
From: Ingo Molnar <mingo@kernel.org>
To: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Vojtech Pavlik <vojtech@suse.com>, Jiri Kosina <jkosina@suse.cz>,
        Peter Zijlstra <peterz@infradead.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Ingo Molnar <mingo@redhat.com>, Seth Jennings <sjenning@redhat.com>,
        linux-kernel@vger.kernel.org,
        Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: live patching design (was: Re: [PATCH 1/3] sched: add
 sched_task_call())
Message-ID: <20150221183005.GB8406@gmail.com>
References: <20150219214229.GD15980@treble.redhat.com>
 <alpine.LNX.2.00.1502200830430.28769@pobox.suse.cz>
 <alpine.LNX.2.00.1502200939230.28769@pobox.suse.cz>
 <20150220095003.GA23506@gmail.com>
 <alpine.LNX.2.00.1502201052050.28769@pobox.suse.cz>
 <20150220104418.GD25076@gmail.com>
 <alpine.LNX.2.00.1502201151390.28769@pobox.suse.cz>
 <20150220194901.GB3603@gmail.com>
 <20150220214613.GA21598@suse.com>
 <20150220220845.GI15980@treble.redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150220220845.GI15980@treble.redhat.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2574
Lines: 70


* Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> On Fri, Feb 20, 2015 at 10:46:13PM +0100, Vojtech Pavlik wrote:
> > On Fri, Feb 20, 2015 at 08:49:01PM +0100, Ingo Molnar wrote:
> >
> > > I.e. it's in essence the strong stop-all atomic 
> > > patching model of 'kpatch', combined with the 
> > > reliable avoidance of kernel stacks that 'kgraft' 
> > > uses.
> > 
> > > That should be the starting point, because it's the 
> > > most reliable method.
> > 
> > In the consistency models discussion, this was marked 
> > the "LEAVE_KERNEL+SWITCH_KERNEL" model. It's indeed the 
> > strongest model of all, but also comes at the highest 
> > cost in terms of impact on running tasks. It's so high 
> > (the interruption may be seconds or more) that it was 
> > deemed not worth implementing.
> 
> Yeah, this is way too disruptive to the user.
> 
> Even the comparatively tiny latency caused by kpatch's 
> use of stop_machine() was considered unacceptable by 
> some.

Unreliable, unrobust patching is even more disruptive...

What I think makes it fragile in the long term is that we 
combine two non-robust, rarely exercised conditions: the 
chance that a task just happens to be executing a patched 
function, and the chance that debug information is unreliable.

For example, the code patching done for tracing got debugged 
to a fair degree because we rely on that patching for actual 
tracing functionality. Even with that relatively robust usage 
model we had our crises ...

I just don't see how a stack backtrace based live patching 
method can become robust in the long run.

> Plus a lot of processes would see EINTR, causing more 
> havoc.

Parking threads safely in user mode does not require the 
propagation of syscall interruption to user-space.

(It does have some other requirements, such as making all 
syscalls interruptible by a 'special' signalling method 
that only live patching triggers - even syscalls that are 
uninterruptible under the normal ABI, such as sys_sync().)
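
As a user-space analogy only (this is not the proposed live 
patching mechanism): the existing POSIX SA_RESTART flag already 
shows that a blocking syscall can be interrupted by a signal and 
transparently restarted, without the caller ever seeing EINTR. 
In the sketch below, read_across_signal() and on_signal() are 
hypothetical names made up for illustration, and the sleeps are 
an assumed, timing-based way to make the signal arrive while the 
parent is blocked in read().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t handler_ran;

/* Hypothetical handler: only records that the signal was delivered. */
static void on_signal(int sig)
{
	(void)sig;
	handler_ran = 1;
}

static void sleep_100ms(void)
{
	struct timespec ts = { 0, 100 * 1000 * 1000 };

	nanosleep(&ts, NULL);
}

/*
 * Block in read() on a pipe while a child first sends SIGUSR1 to us
 * and only later writes one byte. Returns read()'s return value and
 * stores the resulting errno in *err; returns -2 on setup failure.
 */
static ssize_t read_across_signal(int sa_flags, int *err)
{
	struct sigaction sa;
	int fds[2];
	char byte = 0;
	ssize_t ret;
	pid_t child;

	assert(err != NULL);

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_signal;
	sa.sa_flags = sa_flags;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) < 0 || pipe(fds) < 0)
		return -2;

	handler_ran = 0;
	child = fork();
	if (child < 0)
		return -2;

	if (child == 0) {
		close(fds[0]);
		sleep_100ms();		/* give the parent time to block */
		kill(getppid(), SIGUSR1);
		sleep_100ms();		/* signal first, data afterwards */
		if (write(fds[1], "x", 1) != 1)
			_exit(1);
		_exit(0);
	}

	close(fds[1]);
	errno = 0;
	ret = read(fds[0], &byte, 1);	/* interrupted by SIGUSR1 here */
	*err = errno;
	close(fds[0]);
	waitpid(child, NULL, 0);
	return ret;
}
```

Calling read_across_signal(SA_RESTART, &err) is expected to 
return 1 with err left at 0, even though the handler ran in the 
middle of the read(). With sa_flags set to 0 the same call would 
typically fail with EINTR instead, which is the user-visible 
disruption that a restart-based scheme avoids.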

On the other hand, if it's too slow, people will work on 
improving signal propagation latencies: making syscalls 
more readily interruptible and more seamlessly restartable 
has various other advantages beyond live kernel patching.

I.e. it's a win-win scenario and will improve various areas 
of the kernel in terms of syscall interruptibility 
latencies.

Thanks,

	Ingo