Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756693AbbBQONj (ORCPT ); Tue, 17 Feb 2015 09:13:39 -0500 Received: from mx1.redhat.com ([209.132.183.28]:50184 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756624AbbBQONi (ORCPT ); Tue, 17 Feb 2015 09:13:38 -0500 Date: Tue, 17 Feb 2015 08:12:11 -0600 From: Josh Poimboeuf To: Peter Zijlstra Cc: Andrew Morton , Ingo Molnar , Jiri Kosina , Seth Jennings , linux-kernel@vger.kernel.org Subject: Re: [PATCH 1/3] sched: add sched_task_call() Message-ID: <20150217141211.GC11861@treble.redhat.com> References: <20150216204436.GH5029@twins.programming.kicks-ass.net> <20150216220505.GB11861@treble.redhat.com> <20150217092450.GI5029@twins.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20150217092450.GI5029@twins.programming.kicks-ass.net> User-Agent: Mutt/1.5.23.1-rc1 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2727 Lines: 68 On Tue, Feb 17, 2015 at 10:24:50AM +0100, Peter Zijlstra wrote: > On Mon, Feb 16, 2015 at 04:05:05PM -0600, Josh Poimboeuf wrote: > > Yeah, I can understand that. I definitely want to avoid touching the > > scheduler code. Basically I'm trying to find a way to atomically do the > > following: > > > > if (task is sleeping) { > > walk the stack > > if (certain set of functions isn't on the stack) > > set (or clear) a thread flag for the task > > } > > > > Any ideas on how I can achieve that? So far my ideas are: > > So far stack unwinding has basically been a best effort debug output > kind of thing, you're wanting to make the integrity of the kernel depend > on it. > > You require an absolute 100% correctness of the stack unwinder -- where > today it is; as stated above; a best effort debug output thing. > > That is a _big_ change. True, this does seem to be the first attempt to rely on the correctness of the stack walking code. > Has this been properly considered; has all the asm of the relevant > architectures been audited? Are you planning on maintaining that level > of audit for all new patches? I agree, the unwinder needs to be 100% correct. Currently we only support x86_64. Luckily, in general, stacks are very well defined and walking the stack of a sleeping task should be straightforward. I don't think it would be too hard to ensure the stack unwinder is right for other architectures as we port them. > Because the way you propose to do things, we'll end up with silent but > deadly fail if the unwinder is less than 100% correct. No way to easily > debug that, no warns, just silent corruption. That's a good point. There's definitely room for additional error checking in the x86 stack walking code. A couple of ideas: - make sure it starts from a __schedule() call at the top - make sure we actually reach the bottom of the stack - make sure each stack frame's return address is immediately after a call instruction If we hit any of those errors, we can bail out, unregister with ftrace and restore the system to its original state. > Are you really really sure you want to go do this? Basically, yes. We've had a lot of conversations about many different variations of how to do this over the past year, and this is by far the best approach we've come up with. FWIW, we've been doing something similar with kpatch and stop_machine() over the last 1+ years, and have run into zero problems with that approach. -- Josh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/