Date: Tue, 17 Feb 2015 15:25:32 -0600
From: Josh Poimboeuf <jpoimboe@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>, Ingo Molnar <mingo@redhat.com>,
        Jiri Kosina <jkosina@suse.cz>, Seth Jennings <sjenning@redhat.com>,
        linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/3] sched: add sched_task_call()
Message-ID: <20150217212532.GJ11861@treble.redhat.com>
References: <cover.1424109806.git.jpoimboe@redhat.com>
 <d8614d494bb136343c884cbdd780b9985faa07a0.1424109806.git.jpoimboe@redhat.com>
 <20150216204436.GH5029@twins.programming.kicks-ass.net>
 <20150216220505.GB11861@treble.redhat.com>
 <20150217092450.GI5029@twins.programming.kicks-ass.net>
 <20150217141211.GC11861@treble.redhat.com>
 <20150217181541.GP5029@twins.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20150217181541.GP5029@twins.programming.kicks-ass.net>
User-Agent: Mutt/1.5.23.1-rc1 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5570
Lines: 133

On Tue, Feb 17, 2015 at 07:15:41PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 17, 2015 at 08:12:11AM -0600, Josh Poimboeuf wrote:
> > On Tue, Feb 17, 2015 at 10:24:50AM +0100, Peter Zijlstra wrote:
> > > So far stack unwinding has basically been a best effort debug output
> > > kind of thing, you're wanting to make the integrity of the kernel depend
> > > on it.
> > > 
> > > You require an absolute 100% correctness of the stack unwinder -- where
> > > today it is; as stated above; a best effort debug output thing.
> > > 
> > > That is a _big_ change.
> > 
> > True, this does seem to be the first attempt to rely on the correctness
> > of the stack walking code.
> > 
> > > Has this been properly considered; has all the asm of the relevant
> > > architectures been audited? Are you planning on maintaining that level
> > > of audit for all new patches?
> > 
> > I agree, the unwinder needs to be 100% correct.  Currently we only
> > support x86_64.  Luckily, in general, stacks are very well defined and
> > walking the stack of a sleeping task should be straightforward.  I don't
> > think it would be too hard to ensure the stack unwinder is right for
> > other architectures as we port them.
> 
> I would not be too sure about that, the x86 framepointer thing is not
> universal. IIRC some have to have some form of in-kernel dwarf
> unwinding.
> 
> And I'm assuming you're hard relying on CONFIG_FRAMEPOINTER here,
> because otherwise x86 stacks are a mess too.

Yeah, it'll rely on CONFIG_FRAME_POINTER.  IIUC, the arches we care
about now (x86, power, s390, arm64) all have frame pointer support, and
the stack formats are all sane, AFAICT.

If we ever port livepatch to a more obscure arch for which walking the
stack is more difficult, we'll have several options:

a) spend the time to ensure the unwinding code is correct and resilient
   to errors;

b) leave the consistency model compiled code out if !FRAME_POINTER and
   allow users to patch without one (similar to the livepatch code
   that's already in the livepatch tree today); or

c) not support that arch.

> So even with CONFIG_FRAMEPOINTER, things like copy_to/from_user, which
> are implemented in asm, don't honour that. So if one of those faults and
> the fault handler sleeps, you'll miss say your
> 'copy_user_enhanced_fast_string' entry.
> 
> Then again, those asm functions don't have function trace bits either,
> so you can't patch them to begin with I suppose.
> 
> Here's to hoping you don't have to..

Right.  We can't patch asm functions because we rely on ftrace, which
needs the -mfentry prologue.

> > > Because the way you propose to do things, we'll end up with silent but
> > > deadly fail if the unwinder is less than 100% correct. No way to easily
> > > debug that, no warns, just silent corruption.
> > 
> > That's a good point.  There's definitely room for additional error
> > checking in the x86 stack walking code.  A couple of ideas:
> > 
> > - make sure it starts from a __schedule() call at the top
> > - make sure we actually reach the bottom of the stack
> > - make sure each stack frame's return address is immediately after a
> >   call instruction
> > 
> > If we hit any of those errors, we can bail out, unregister with ftrace
> > and restore the system to its original state.
> 
> And then hope you can get a better trace next time around? Or will you
> fall-back to an alternative method of patching?

Yeah, on second thought, we wouldn't have to cancel the patch.  We could
defer to check the task's stack again at a later time.  If it's stuck
there, the user can try sending it a signal to unstick it, or cancel the
patching process.  Those mechanisms are already in place with my
consistency model patch set.

I'd also do a WARN_ON_ONCE and a dump of the full stack data, since I'm
guessing it would either indicate an error in the unwinding code or
point us to an unexpected stack condition.

> > > Are you really really sure you want to go do this?
> > 
> > Basically, yes.  We've had a lot of conversations about many different
> > variations of how to do this over the past year, and this is by far the
> > best approach we've come up with.
> 
> For some reason I'm thinking jikos is going to disagree with you on that
> :-)
> 
> I'm further thinking we don't actually need 2 (or more) different means
> of live patching in the kernel. So you all had better sit down (again)
> and come up with something you all agree on.

Yeah, I also _really_ want to avoid multiple consistency models.

In fact, that's a big motivation behind my consistency model patch set.
It's heavily inspired by a suggestion from Vojtech [1].  It combines
kpatch (backtrace checking) with kGraft (per-thread consistency).  It
has several advantages over either of the individual approaches.  See
http://lwn.net/Articles/632582/ where I describe its pros over both
kpatch and kGraft.

Jiri, would you and Vojtech agree that the proposed consistency model is
all we need?  Or do you still want to do the multiple consistency model
thing?

> > FWIW, we've been doing something similar with kpatch and stop_machine()
> > over the last 1+ years, and have run into zero problems with that
> > approach.
> 
> Could be you've just been lucky...

I can't really disagree with that ;-)


[1] https://lkml.org/lkml/2014/11/7/354

-- 
Josh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/