Date: Fri, 29 Apr 2016 15:11:39 -0500
From: Josh Poimboeuf
To: Andy Lutomirski
Cc: Jessica Yu, Jiri Kosina, Miroslav Benes, Ingo Molnar,
	Peter Zijlstra, Michael Ellerman, Heiko Carstens,
	live-patching@vger.kernel.org, linux-kernel@vger.kernel.org,
	X86 ML, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
	Vojtech Pavlik, Jiri Slaby, Petr Mladek, Chris J Arges,
	Andy Lutomirski
Subject: Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking
Message-ID: <20160429201139.pudoged2yathyo64@treble>

On Fri, Apr 29, 2016 at 11:06:53AM -0700, Andy Lutomirski wrote:
> On Thu, Apr 28, 2016 at 1:44 PM, Josh Poimboeuf wrote:
> > A preempted function might not have had a chance to save the frame
> > pointer to the stack yet, which can result in its caller getting skipped
> > on a stack trace.
> >
> > Add a flag to indicate when the task has been preempted so that stack
> > dump code can determine whether the stack trace is reliable.
>
> I think I like this, but how do you handle the rather similar case in
> which a task goes to sleep because it's waiting on IO that happened in
> response to get_user, put_user, copy_from_user, etc?

Hm, good question.
I was thinking that page faults had a dedicated stack, but now looking
at the entry and traps code, that doesn't seem to be the case.

Anyway, I think it shouldn't be a problem if we make sure that any
kernel function which might trigger a valid page fault (e.g.,
copy_user_generic_string) does the proper frame pointer setup first.
Then the stack should still be reliable.

In fact I might be able to teach objtool to enforce that: any function
which uses an exception table should create a stack frame.

Or alternatively, maybe set some kind of flag for page faults, similar
to what I did with this patch.

-- 
Josh