Date: Tue, 17 Feb 2015 19:15:41 +0100
From: Peter Zijlstra <peterz@infradead.org>
To: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, Ingo Molnar <mingo@redhat.com>,
        Jiri Kosina <jkosina@suse.cz>, Seth Jennings <sjenning@redhat.com>,
        linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/3] sched: add sched_task_call()
Message-ID: <20150217181541.GP5029@twins.programming.kicks-ass.net>
References: <cover.1424109806.git.jpoimboe@redhat.com>
 <d8614d494bb136343c884cbdd780b9985faa07a0.1424109806.git.jpoimboe@redhat.com>
 <20150216204436.GH5029@twins.programming.kicks-ass.net>
 <20150216220505.GB11861@treble.redhat.com>
 <20150217092450.GI5029@twins.programming.kicks-ass.net>
 <20150217141211.GC11861@treble.redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150217141211.GC11861@treble.redhat.com>
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3477
Lines: 83

On Tue, Feb 17, 2015 at 08:12:11AM -0600, Josh Poimboeuf wrote:
> On Tue, Feb 17, 2015 at 10:24:50AM +0100, Peter Zijlstra wrote:
> > So far stack unwinding has basically been a best effort debug output
> > kind of thing, you're wanting to make the integrity of the kernel depend
> > on it.
> > 
> > You require an absolute 100% correctness of the stack unwinder -- where
> > today it is; as stated above; a best effort debug output thing.
> > 
> > That is a _big_ change.
> 
> True, this does seem to be the first attempt to rely on the correctness
> of the stack walking code.
> 
> > Has this been properly considered; has all the asm of the relevant
> > architectures been audited? Are you planning on maintaining that level
> > of audit for all new patches?
> 
> I agree, the unwinder needs to be 100% correct.  Currently we only
> support x86_64.  Luckily, in general, stacks are very well defined and
> walking the stack of a sleeping task should be straightforward.  I don't
> think it would be too hard to ensure the stack unwinder is right for
> other architectures as we port them.

I would not be too sure about that, the x86 framepointer thing is not
universal. IIRC some have to have some form of in-kernel dwarf
unwinding.

And I'm assuming you're hard relying on CONFIG_FRAMEPOINTER here,
because otherwise x86 stacks are a mess too.

So even with CONFIG_FRAMEPOINTER, things like copy_to/from_user, which
are implemented in asm, don't honour that. So if one of those faults and
the fault handler sleeps, you'll miss say your
'copy_user_enhanced_fast_string' entry.

Then again, those asm functions don't have function trace bits either,
so you can't patch them to begin with I suppose.

Here's to hoping you don't have to..

> > Because the way you propose to do things, we'll end up with silent but
> > deadly fail if the unwinder is less than 100% correct. No way to easily
> > debug that, no warns, just silent corruption.
> 
> That's a good point.  There's definitely room for additional error
> checking in the x86 stack walking code.  A couple of ideas:
> 
> - make sure it starts from a __schedule() call at the top
> - make sure we actually reach the bottom of the stack
> - make sure each stack frame's return address is immediately after a
>   call instruction
> 
> If we hit any of those errors, we can bail out, unregister with ftrace
> and restore the system to its original state.

And then hope you can get a better trace next time around? Or will you
fall-back to an alternative method of patching?

> > Are you really really sure you want to go do this?
> 
> Basically, yes.  We've had a lot of conversations about many different
> variations of how to do this over the past year, and this is by far the
> best approach we've come up with.

For some reason I'm thinking jikos is going to disagree with you on that
:-)

I'm further thinking we don't actually need 2 (or more) different means
of live patching in the kernel. So you all had better sit down (again)
and come up with something you all agree on.

> FWIW, we've been doing something similar with kpatch and stop_machine()
> over the last 1+ years, and have run into zero problems with that
> approach.

Could be you've just been lucky...

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/