Date: Tue, 1 Dec 2015 08:28:26 +0100
From: Ingo Molnar
To: Peter Zijlstra
Cc: "Wangnan (F)", Jiri Olsa, Arnaldo Carvalho de Melo, David Ahern, Milian Wolff, linux-kernel@vger.kernel.org, pi3orama, lizefan 00213767
Subject: Re: [BUG REPORT] perf tools: x86_64: Broken calllchain when sampling taken at 'callq' instruction
Message-ID: <20151201072826.GB28270@gmail.com>

* Peter Zijlstra wrote:

> On Fri, Nov 27, 2015 at 09:38:11AM +0100, Ingo Molnar wrote:
> >
> > * Peter Zijlstra wrote:
> >
> > > On Thu, Nov 19, 2015 at 11:23:00AM +0100, Ingo Molnar wrote:
> > > > PEBS is an asynchronous hardware tracing mechanism, when batched PEBS is used it
> > > > might not even result in any interruption of execution.
> > > > The 'pt_regs' does not necessarily correspond to an interrupted,
> > > > restartable context - we take the RIP from the PEBS machinery and also
> > > > use LBR and disassembly to determine the previous instruction, before
> > > > reporting it to user-space.
> > >
> > > Note that modern PEBS hardware (hsw+) does the rollback in hardware.
> > > Prior to that we indeed do it manually using the LBR.
> > >
> > > As to pt_regs, we construct a franken pt_regs based on the actual PEBS
> > > buffer overflow PMI and bits from the PEBS record (which also includes
> > > some register state). See
> > > arch/x86/kernel/cpu/perf_event_intel_ds.c:setup_pebs_sample_data().
> > >
> > > We always copy the flags, ip, bp and sp from the PEBS record into the
> > > interrupt pt_regs.
> > >
> > > And note that the PEBS record is constructed at instruction retirement,
> > > so it shows the state _after_ the instruction, with exception of the
> > > (hsw+) real_ip field.
> > >
> > > So the unwinder will have to be taught that if the IP points at a stack
> > > altering instruction (call, push, etc.) it will have to 'undo' the
> > > effects on the actual stack (I appreciate this might be 'interesting'
> > > for things like: pop, ret, etc.).
> >
> > So do we dump both the 'real' and the actual RIP, to not force tooling into having
> > to decode instructions and such?
>
> Nope, we only expose the corrected one.
>
> > (Which is pretty hard and fragile and not always
> > possible with instructions that destroy the original RIP, like JMP, etc.)
>
> Not sure what you're getting at here. We don't need the uncorrected
> instruction.

Well, we need it for stack unwinding, as you point out:

> But the problem here is that we rewind the instruction stream, but not
> the stack. And the stack unwinder is (obviously) interested in the stack
> state.

Unwinding the stack state would fix it - but an equivalent solution would be
to pass along the original RIP: we'd then have a self-consistent RIP/RSP pair.

Especially since unwinding the RSP is probably hard:

> I'm not sure we want (or need) to go undo the specific instruction's
> stack effect in-kernel. If the !DWARF unwinders are similarly confused
> we might need to put it in kernel (expensive *groan*). If it's only the
> DWARF muck then it's something that can be done in userspace just
> fine, although we might need to copy slightly more of the stack than SP
> is pointing at, such that we can undo RET/POP etc. which would have data
> beyond the head of stack.
>
> The easiest solution might be to figure out the biggest stack offset for
> any instruction and always capture that much over the head of stack.

So I think the problem here is that the RSP does not match up with the RIP. We
can either pass along the original RIP+RSP, or the fixed-up pair - but
currently we pass along only half of it (a corrected RIP with an uncorrected
RSP), which corrupts DWARF unwinding state, as DWARF unwinding doesn't
tolerate such errors.

Thanks,

	Ingo
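
A minimal sketch of the RIP/RSP mismatch discussed above, under heavy
simplifying assumptions: a toy stack held in a dict, a fixed 8-byte stack
effect per instruction, and a single CFI-style rule (CFA = RSP + offset,
return address at CFA - 8). The names STACK_EFFECT, undo_stack_effect(),
return_address() and extra_capture_bytes() are hypothetical and do not
correspond to any kernel or perf code.

```python
WORD = 8  # bytes per stack slot on x86_64

# Hypothetical, simplified table: change applied to RSP when the
# instruction retires (real instructions can have other operand sizes).
STACK_EFFECT = {"call": -WORD, "push": -WORD, "pop": +WORD, "ret": +WORD}


def undo_stack_effect(mnemonic, rsp_after):
    """Return the RSP value from before `mnemonic` retired."""
    return rsp_after - STACK_EFFECT.get(mnemonic, 0)


def return_address(stack, rsp, cfa_offset):
    """Toy unwind rule: CFA = RSP + cfa_offset, return address at CFA - WORD."""
    cfa = rsp + cfa_offset
    return stack[cfa - WORD]


def extra_capture_bytes():
    """Largest amount by which an instruction moves RSP upward (POP/RET).

    Data the instruction consumed then sits beyond the head of stack, so
    this many extra bytes would have to be captured to undo it later.
    """
    return max(delta for delta in STACK_EFFECT.values() if delta > 0)


# Toy stack contents around a sample taken at a 'callq' instruction.
stack = {
    0x1008: 0x400123,  # caller's own return address
    0x1000: 0xDEAD,    # some saved register in the caller's frame
    0x0FF8: 0x400456,  # return address pushed by the sampled callq
}
rsp_after = 0x0FF8     # RSP as recorded at retirement (after the push)
cfa_offset = 16        # assumed CFI rule valid at the callq's own address

# Rewound RIP combined with the post-instruction RSP: reads the wrong slot.
broken = return_address(stack, rsp_after, cfa_offset)

# Rewound RIP combined with a rewound RSP: a self-consistent pair.
fixed = return_address(stack, undo_stack_effect("call", rsp_after), cfa_offset)
```

In this toy setup `broken` picks up the saved-register slot instead of the
return address, while `fixed` finds the caller's return address. Passing the
uncorrected RIP together with the recorded RSP would be the other way to get
a consistent pair; that case is not modelled here.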