Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753597AbdG2J2j (ORCPT ); Sat, 29 Jul 2017 05:28:39 -0400 Received: from merlin.infradead.org ([205.233.59.134]:50256 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753457AbdG2J2h (ORCPT ); Sat, 29 Jul 2017 05:28:37 -0400 Date: Sat, 29 Jul 2017 11:28:12 +0200 From: Peter Zijlstra To: Andy Lutomirski Cc: Ingo Molnar , Josh Poimboeuf , Andres Freund , X86 ML , "linux-kernel@vger.kernel.org" , live-patching@vger.kernel.org, Linus Torvalds , Andy Lutomirski , Jiri Slaby , "H. Peter Anvin" , Mike Galbraith , Jiri Olsa , Arnaldo Carvalho de Melo , Namhyung Kim , Alexander Shishkin Subject: Re: [RFC] perf: Delayed userspace unwind (Was: [PATCH v3 00/10] x86: ORC unwinder) Message-ID: <20170729092812.GC6524@worktop.programming.kicks-ass.net> References: <20170712214920.5droainfqjmq7sgu@alap3.anarazel.de> <20170712223225.zkq7tdb7pzgb3wy7@treble> <20170713071253.a3slz3j5tcgy3rkk@hirez.programming.kicks-ass.net> <20170713085015.yjjv5ig2znplx5jl@hirez.programming.kicks-ass.net> <20170713085114.h4vjgg7jjbl6dohb@hirez.programming.kicks-ass.net> <20170713091911.aj7e7dvrbqcyxh7l@gmail.com> <20170725115512.k3wk62fdnxdd6enu@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.22.1 (2013-10-16) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 1744 Lines: 42 On Fri, Jul 28, 2017 at 08:35:16PM -0700, Andy Lutomirski wrote: > I haven't checked task_work specifically, but a bunch of the exit work > is permitted to sleep, which is potentially useful. Yes. > If this becomes successful enough that we could eventually deprecate > the old code, I wonder if copy_from_user_nmi() could go away? :) So we still use that for things like the PEBS IP fixup for older CPUs. That needs to read the userspace code. Also, since all this is optional on userspace asking for the new format, we will probably (forever) need to support userspace not asking for it. > > + if (!work->func) { > > + work->func = perf_callchain_work; > > + /* > > + * We cannot do set_notify_resume() from NMI context, > > + * also, knowing we are already in an interrupted > > + * context and will pass return to userspace, we can > > + * simply set TIF_NOTIFY_RESUME. > > + */ > > + task_work_add(current, work, false); > > + set_tsk_thread_flag(current, TIF_NOTIFY_RESUME); > > There's a more or leas unavoidable window in which this won't be > noticed, which could plausibly confuse userspace. It might be > possible to figure out a way for an NMI to tell if it lands in this > window, but it would be a bit tricky. Correct, I have been thinking on how to do that but haven't found anything particularly nice yet. > Also, is the task_work code prepared to handle task_work_add during > exit? That is one I hadn't thought of, but basically task_work_add() will fail if the task is too far gone. At that point we should fallback to the 'old' behaviour and simply include the information in the kernel SAMPLE record.