Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755987AbbBQSPv (ORCPT ); Tue, 17 Feb 2015 13:15:51 -0500 Received: from bombadil.infradead.org ([198.137.202.9]:58377 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753872AbbBQSPu (ORCPT ); Tue, 17 Feb 2015 13:15:50 -0500 Date: Tue, 17 Feb 2015 19:15:41 +0100 From: Peter Zijlstra To: Josh Poimboeuf Cc: Andrew Morton , Ingo Molnar , Jiri Kosina , Seth Jennings , linux-kernel@vger.kernel.org Subject: Re: [PATCH 1/3] sched: add sched_task_call() Message-ID: <20150217181541.GP5029@twins.programming.kicks-ass.net> References: <20150216204436.GH5029@twins.programming.kicks-ass.net> <20150216220505.GB11861@treble.redhat.com> <20150217092450.GI5029@twins.programming.kicks-ass.net> <20150217141211.GC11861@treble.redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150217141211.GC11861@treble.redhat.com> User-Agent: Mutt/1.5.21 (2012-12-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3477 Lines: 83 On Tue, Feb 17, 2015 at 08:12:11AM -0600, Josh Poimboeuf wrote: > On Tue, Feb 17, 2015 at 10:24:50AM +0100, Peter Zijlstra wrote: > > So far stack unwinding has basically been a best effort debug output > > kind of thing, you're wanting to make the integrity of the kernel depend > > on it. > > > > You require an absolute 100% correctness of the stack unwinder -- where > > today it is; as stated above; a best effort debug output thing. > > > > That is a _big_ change. > > True, this does seem to be the first attempt to rely on the correctness > of the stack walking code. > > > Has this been properly considered; has all the asm of the relevant > > architectures been audited? Are you planning on maintaining that level > > of audit for all new patches? > > I agree, the unwinder needs to be 100% correct. Currently we only > support x86_64. Luckily, in general, stacks are very well defined and > walking the stack of a sleeping task should be straightforward. I don't > think it would be too hard to ensure the stack unwinder is right for > other architectures as we port them. I would not be too sure about that, the x86 framepointer thing is not universal. IIRC some have to have some form of in-kernel dwarf unwinding. And I'm assuming you're hard relying on CONFIG_FRAMEPOINTER here, because otherwise x86 stacks are a mess too. So even with CONFIG_FRAMEPOINTER, things like copy_to/from_user, which are implemented in asm, don't honour that. So if one of those faults and the fault handler sleeps, you'll miss say your 'copy_user_enhanced_fast_string' entry. Then again, those asm functions don't have function trace bits either, so you can't patch them to begin with I suppose. Here's to hoping you don't have to.. > > Because the way you propose to do things, we'll end up with silent but > > deadly fail if the unwinder is less than 100% correct. No way to easily > > debug that, no warns, just silent corruption. > > That's a good point. There's definitely room for additional error > checking in the x86 stack walking code. A couple of ideas: > > - make sure it starts from a __schedule() call at the top > - make sure we actually reach the bottom of the stack > - make sure each stack frame's return address is immediately after a > call instruction > > If we hit any of those errors, we can bail out, unregister with ftrace > and restore the system to its original state. And then hope you can get a better trace next time around? Or will you fall-back to an alternative method of patching? > > Are you really really sure you want to go do this? > > Basically, yes. We've had a lot of conversations about many different > variations of how to do this over the past year, and this is by far the > best approach we've come up with. For some reason I'm thinking jikos is going to disagree with you on that :-) I'm further thinking we don't actually need 2 (or more) different means of live patching in the kernel. So you all had better sit down (again) and come up with something you all agree on. > FWIW, we've been doing something similar with kpatch and stop_machine() > over the last 1+ years, and have run into zero problems with that > approach. Could be you've just been lucky... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/