Date: Thu, 13 Jul 2017 07:17:55 -0500
From: Josh Poimboeuf
To: Ingo Molnar
Cc: Peter Zijlstra, Andres Freund, x86@kernel.org, linux-kernel@vger.kernel.org,
	live-patching@vger.kernel.org, Linus Torvalds, Andy Lutomirski, Jiri Slaby,
	"H. Peter Anvin", Mike Galbraith, Jiri Olsa, Arnaldo Carvalho de Melo,
	Namhyung Kim, Alexander Shishkin
Subject: Re: [PATCH v3 00/10] x86: ORC unwinder (previously undwarf)
Message-ID: <20170713121755.hsuvecrzvyxbdvvk@treble>
In-Reply-To: <20170713091911.aj7e7dvrbqcyxh7l@gmail.com>

On Thu, Jul 13, 2017 at 11:19:11AM
+0200, Ingo Molnar wrote:
>
> * Peter Zijlstra wrote:
>
> > > One gloriously ugly hack would be to delay the userspace unwind to
> > > return-to-userspace, at which point we have a schedulable context and can take
> > > faults.
>
> I don't think it's ugly, and it has various advantages:
>
> > > Of course, then you have to somehow identify this later unwind sample with all
> > > relevant prior samples and stitch the whole thing back together, but that
> > > should be doable.
> > >
> > > In fact, it would not be at all hard to do, just queue a task_work from the
> > > NMI and have that do the EH based unwind.
>
> This would have a couple of advantages:
>
> - as you mention, being able to fault in debug info and generally do
>   IO/scheduling,
>
> - profiling overhead would be accounted to the task context that generates it,
>   not the NMI context,
>
> - there would be a natural batching/coalescing optimization if multiple events
>   hit the same system call: the user-space backtrace would only have to be looked
>   up once for all samples that got collected.
>
> This could be done by separating the user-space backtrace into a separate event,
> and perf tooling would then apply the same user-space backtrace to all prior
> kernel samples.
>
> I.e. the ring-buffer would have trace entries like:
>
>   [ kernel sample #1, with kernel backtrace #1 ]
>   [ kernel sample #2, with kernel backtrace #2 ]
>   [ kernel sample #3, with kernel backtrace #3 ]
>   [ user-space backtrace #1 at syscall return ]
>   ...
>
> Note how the three kernel samples didn't have to do any user-space unwinding at
> all, so the user-space unwinding overhead got reduced by a factor of 3.
>
> Tooling would know that 'user-space backtrace #1' applies to the previous three
> kernel samples.
>
> Or so?
BTW, while we're throwing out ideas for this, here's another idea, though
it's almost certainly not a good one :-)

For user space stack unwinding, the kernel could emulate what the kernel
'guess' unwinder does by scanning the user space stack and returning all
the text addresses it finds.  The results wouldn't be 100% accurate, but
they could end up being useful over time.

-- 
Josh