Date: Thu, 13 Jul 2017 11:19:11 +0200
From: Ingo Molnar
To: Peter Zijlstra
Cc: Josh Poimboeuf, Andres Freund, x86@kernel.org, linux-kernel@vger.kernel.org,
    live-patching@vger.kernel.org, Linus Torvalds, Andy Lutomirski, Jiri Slaby,
    "H. Peter Anvin", Mike Galbraith, Jiri Olsa, Arnaldo Carvalho de Melo,
    Namhyung Kim, Alexander Shishkin
Subject: Re: [PATCH v3 00/10] x86: ORC unwinder (previously undwarf)
Message-ID: <20170713091911.aj7e7dvrbqcyxh7l@gmail.com>
In-Reply-To: <20170713085114.h4vjgg7jjbl6dohb@hirez.programming.kicks-ass.net>

* Peter Zijlstra wrote:

> > One gloriously ugly hack would be to delay the userspace unwind to
> > return-to-userspace, at which point we have a schedulable context and can take
> > faults.

I don't think it's ugly, and it has various advantages:

> > Of course, then you have to somehow identify this later unwind sample with all
> > relevant prior samples and stitch the whole thing back together, but that
> > should be doable.
> >
> > In fact, it would not be at all hard to do, just queue a task_work from the
> > NMI and have that do the EH based unwind.

This would have a couple of advantages:

 - as you mention, being able to fault in debug info and generally do
   IO/scheduling,

 - profiling overhead would be accounted to the task context that generates it,
   not the NMI context,

 - there would be a natural batching/coalescing optimization if multiple events
   hit the same system call: the user-space backtrace would only have to be
   looked up once for all samples that got collected.

This could be done by separating the user-space backtrace into a separate event,
and perf tooling would then apply the same user-space backtrace to all prior
kernel samples.

I.e. the ring-buffer would have trace entries like:

 [ kernel sample #1, with kernel backtrace #1 ]
 [ kernel sample #2, with kernel backtrace #2 ]
 [ kernel sample #3, with kernel backtrace #3 ]
 [ user-space backtrace #1 at syscall return ]
 ...

Note how the three kernel samples didn't have to do any user-space unwinding at
all, so the user-space unwinding overhead got reduced by a factor of 3.

Tooling would know that 'user-space backtrace #1' applies to the previous three
kernel samples.

Or so?

Thanks,

	Ingo
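
A rough sketch of the tooling-side stitching described above, in Python. This
is not perf code. The record types (KernelSample, UserBacktrace), the stitch()
helper and the frame names are all hypothetical, and keying the pending samples
by task is an assumption the mail does not spell out. It only shows one way a
deferred user-space backtrace could be applied to the kernel samples that
precede it in the ring-buffer:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class KernelSample:
    """Hypothetical kernel sample; the user-space stack is filled in later."""
    task: int
    kernel_stack: List[str]
    user_stack: Optional[List[str]] = None


@dataclass
class UserBacktrace:
    """Hypothetical deferred user-space backtrace, emitted at syscall return."""
    task: int
    user_stack: List[str]


def stitch(events: List[Union[KernelSample, UserBacktrace]]) -> List[KernelSample]:
    """Apply each user-space backtrace to the prior pending samples of its task."""
    pending: Dict[int, List[KernelSample]] = {}  # task -> samples awaiting a user stack
    done: List[KernelSample] = []
    for ev in events:
        if isinstance(ev, KernelSample):
            pending.setdefault(ev.task, []).append(ev)
        else:
            # One user-space unwind serves every sample collected since the
            # previous return to user space for this task.
            for sample in pending.pop(ev.task, []):
                sample.user_stack = ev.user_stack
                done.append(sample)
    # Samples never followed by a user backtrace keep user_stack = None.
    for samples in pending.values():
        done.extend(samples)
    return done


# Mirrors the ring-buffer layout above: three kernel samples, one user backtrace.
events = [
    KernelSample(1, ["kfunc_a"]),
    KernelSample(1, ["kfunc_a", "kfunc_b"]),
    KernelSample(1, ["kfunc_c"]),
    UserBacktrace(1, ["main", "do_read"]),
]
stitched = stitch(events)
```

Here all three kernel samples end up sharing the single user-space backtrace,
which is the factor-of-3 saving in the example.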