Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp441746pxb; Sat, 6 Mar 2021 05:00:48 -0800 (PST) X-Google-Smtp-Source: ABdhPJw6/tDBz+vds75Ip6xPlmWnTqyKp7Ky7QQRA7YdmHQR/H8275dUtdzLU7+pwSq5r3nTd/AB X-Received: by 2002:a17:907:78d9:: with SMTP id kv25mr6941028ejc.415.1615035648167; Sat, 06 Mar 2021 05:00:48 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1615035648; cv=none; d=google.com; s=arc-20160816; b=sHOqHNZk4rjyI66GMlPiG8qGhYubhjw2R/sSf5GrNfLA4tKH7CYyPFABoB3N0N2gJ3 Apg1Fq3fW9B8P4LZZIQandzOc2F5fmcT1BXEVkj4a9MDLmYZ4Ue6y0Y/MYfHkdLQOTG3 peDlUvvlxc10qcOg7rmlTnm81nmE7tJyC+jfE+3cpIL5JoTzKgYmCsB8Gl9AJEkBWVKV yeqtqMtVSEzFYrlW8UhVNCddddWibEJkg1UaP9ubqQ5oR1ddNDGjUimeTVNMMi2KvYdy fRPZm7VWLL3eI+2xfZLndZPibTKvbeHiirPw2Jtgy3HDMJb+aDeyYLrv6h+NYciLBD9M eN/g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=l9Io8/Dtn134grdaiQuJY3elPBa+qV2F6TbgV8/ioNY=; b=JC2lD6pwouN31xtw6eHORNkv2/sbga0gBBuzwOqdGEwly3BMddeb7+CZ+oCHsfajJq MYLDi5sSFiL2N50ZETm6JiTSY1XrMKpibTvrqjMcRmQ/0l7hEF80CZ4QDTehdKmqPBnC 68E9YQz4dJit4TpQmvd81bbxiclQWx39qZRW9w/UvVYadJ9UTwMSzu2ptvh7O0HgwheR f7tN0241Ux0IjftmpYJOL3FpFvpiaqAdhu9PtvtLOd8sLKXS4SKrammv7UHAELufJ3K6 Q3lG8qTFqxDPpqc6LZHCJcYRMOk4h4Sv2TrvCqqbqYSTEHAc2MeorZro4XzYGiFvC2Vh 88hQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=rXly2PXN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id eb10si3400520ejc.273.2021.03.06.05.00.25; Sat, 06 Mar 2021 05:00:48 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=rXly2PXN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230438AbhCFMzl (ORCPT + 99 others); Sat, 6 Mar 2021 07:55:41 -0500 Received: from mail.kernel.org ([198.145.29.99]:47064 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230439AbhCFMzf (ORCPT ); Sat, 6 Mar 2021 07:55:35 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id 6A5EA65009; Sat, 6 Mar 2021 12:55:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1615035334; bh=ezJonZsW5VUFEppqbj+9l/V+HQgHfbegXc0Ez2nPw7c=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=rXly2PXNd6/iZFHpre/hKPSfW1ScoRf5nekgi+nlshuDKpl5LMa4FbDAU8wq+JuqX FXL885SnE/Vz90cak2jj/YR7XMpiAjQOSFXS0NB0BSI4MU1Eo7A/g2LeR4wFOlJ+vi ZQeWQnBAwkp6wrlk2rlstjDEAMCYau8incUejJ6Eb0f6s/roEMqm0usDOAQhrz/6Eo DVXO73Cu4eMgwOJhXCWKryhzCDLuktSQGkICL9Ql3/d9WpaXbUQx6oUlSkoWbq6t3t JK+wNRyGMq2A2M4uRVvj1VRTj10W1cWCzndGWC6/Vi1gf04ZuFvaePM/QCLE3/l68g DyT/kQ7pdif6Q== Received: by quaco.ghostprotocols.net (Postfix, from userid 1000) id 3FC1840647; Sat, 6 Mar 2021 09:55:32 -0300 (-03) Date: Sat, 6 Mar 2021 09:55:32 -0300 From: Arnaldo Carvalho de Melo To: James Clark Cc: Alexandre Truong , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, John Garry , Will Deacon , Mathieu Poirier , Leo Yan , Peter Zijlstra , Ingo Molnar , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Kemeng Shi , Ian Rogers , Andi Kleen , Kan Liang , Jin Yao , Adrian Hunter , Suzuki K Poulose , Al Grant , Wilco Dijkstra Subject: Re: [PATCH RESEND WITH CCs v3 4/4] perf tools: determine if LR is the return address Message-ID: References: <20210304163255.10363-1-alexandre.truong@arm.com> <20210304163255.10363-4-alexandre.truong@arm.com> <57fdc0c9-ea72-5cd0-405a-c7bff0daf9d6@arm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <57fdc0c9-ea72-5cd0-405a-c7bff0daf9d6@arm.com> X-Url: http://acmel.wordpress.com Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Em Fri, Mar 05, 2021 at 10:54:03AM +0200, James Clark escreveu: > I've tested this patchset on a few different applications and have seen it significantly improve > quality of frame pointer stacks on aarch64. For example with GDB 10 and default build options, > 'bfd_calc_gnu_debuglink_crc32' is a leaf function, and its caller 'gdb_bfd_crc' is ommitted, > but with the patchset it is included. I've also confirmed that this is correct from looking at > the source code. > > Before: > > # Children Self Command Shared Object Symbol > # ........ ........ ............... .......................... ........... > # > 34.55% 0.00% gdb-100 gdb-100 [.] _start > 0.78% > _start > __libc_start_main > main > gdb_main > captured_command_loop > gdb_do_one_event > check_async_event_handlers > fetch_inferior_event > inferior_event_handler > do_all_inferior_continuations > attach_post_wait > post_create_inferior > svr4_solib_create_inferior_hook > solib_add > solib_read_symbols > symbol_file_add_with_addrs > read_symbols > elf_symfile_read > find_separate_debug_file_by_debuglink[abi:cxx11] > find_separate_debug_file > separate_debug_file_exists > gdb_bfd_crc > bfd_calc_gnu_debuglink_crc32 > > After: > > # Children Self Command Shared Object Symbol > # ........ ........ ............... .......................... ........... > # > 34.55% 0.00% gdb-100 gdb-100 [.] _start > 0.78% > _start > __libc_start_main > main > gdb_main > captured_command_loop > gdb_do_one_event > check_async_event_handlers > fetch_inferior_event > inferior_event_handler > do_all_inferior_continuations > attach_post_wait > post_create_inferior > svr4_solib_create_inferior_hook > solib_add > solib_read_symbols > symbol_file_add_with_addrs > read_symbols > elf_symfile_read > find_separate_debug_file_by_debuglink[abi:cxx11] > find_separate_debug_file > separate_debug_file_exists > get_file_crc <--------------------- leaf frame caller added > bfd_calc_gnu_debuglink_crc32 > > There is a question about whether the overhead of recording all the registers is acceptable, for > filesize and time. We could make it a manual step, at the cost of not showing better frame pointer > stacks by default. Can someone quantify this, i.e. how much space per perf.data for a typical scenario? But anyway, I'm applying it as is now, we can change it if needed, its not like files with the extra registers won't be valid if/when we decide not to collect it by default in the future. If we decide to make this selectable, we should have it as a .perfconfig knob as well, so that one can set it and change the default, etc. - Arnaldo > Tested-by: James Clark > > On 04/03/2021 18:32, Alexandre Truong wrote: > > On arm64 and frame pointer mode (e.g: perf record --callgraph fp), > > use dwarf unwind info to check if the link register is the return > > address in order to inject it to the frame pointer stack. > > > > Write the following application: > > > > int a = 10; > > > > void f2(void) > > { > > for (int i = 0; i < 1000000; i++) > > a *= a; > > } > > > > void f1() > > { > > for (int i = 0; i < 10; i++) > > f2(); > > } > > > > int main (void) > > { > > f1(); > > return 0; > > } > > > > with the following compilation flags: > > gcc -fno-omit-frame-pointer -fno-inline -O2 > > > > The compiler omits the frame pointer for f2 on arm. This is a problem > > with any leaf call, for example an application with many different > > calls to malloc() would always omit the calling frame, even if it > > can be determined. > > > > ./perf record --call-graph fp ./a.out > > ./perf report > > > > currently gives the following stack: > > > > 0xffffea52f361 > > _start > > __libc_start_main > > main > > f2 > > > > After this change, perf report correctly shows f1() calling f2(), > > even though it was missing from the frame pointer unwind: > > > > ./perf report > > > > 0xffffea52f361 > > _start > > __libc_start_main > > main > > f1 > > f2 > > > > Signed-off-by: Alexandre Truong > > Cc: John Garry > > Cc: Will Deacon > > Cc: Mathieu Poirier > > Cc: Leo Yan > > Cc: Peter Zijlstra > > Cc: Ingo Molnar > > Cc: Arnaldo Carvalho de Melo > > Cc: Mark Rutland > > Cc: Alexander Shishkin > > Cc: Jiri Olsa > > Cc: Namhyung Kim > > Cc: Kemeng Shi > > Cc: Ian Rogers > > Cc: Andi Kleen > > Cc: Kan Liang > > Cc: Jin Yao > > Cc: Adrian Hunter > > Cc: Suzuki K Poulose > > Cc: Al Grant > > Cc: James Clark > > Cc: Wilco Dijkstra > > --- > > tools/perf/util/Build | 1 + > > .../util/arm-frame-pointer-unwind-support.c | 44 +++++++++++++++++++ > > .../util/arm-frame-pointer-unwind-support.h | 7 +++ > > tools/perf/util/machine.c | 9 ++-- > > 4 files changed, 58 insertions(+), 3 deletions(-) > > create mode 100644 tools/perf/util/arm-frame-pointer-unwind-support.c > > create mode 100644 tools/perf/util/arm-frame-pointer-unwind-support.h > > > > diff --git a/tools/perf/util/Build b/tools/perf/util/Build > > index 188521f34347..3b82cb992bce 100644 > > --- a/tools/perf/util/Build > > +++ b/tools/perf/util/Build > > @@ -1,3 +1,4 @@ > > +perf-y += arm-frame-pointer-unwind-support.o > > perf-y += annotate.o > > perf-y += block-info.o > > perf-y += block-range.o > > diff --git a/tools/perf/util/arm-frame-pointer-unwind-support.c b/tools/perf/util/arm-frame-pointer-unwind-support.c > > new file mode 100644 > > index 000000000000..964efd08e72e > > --- /dev/null > > +++ b/tools/perf/util/arm-frame-pointer-unwind-support.c > > @@ -0,0 +1,44 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > +#include "../arch/arm64/include/uapi/asm/perf_regs.h" > > +#include "arch/arm64/include/perf_regs.h" > > +#include "event.h" > > +#include "arm-frame-pointer-unwind-support.h" > > +#include "callchain.h" > > +#include "unwind.h" > > + > > +struct entries { > > + u64 stack[2]; > > + size_t length; > > +}; > > + > > +static bool get_leaf_frame_caller_enabled(struct perf_sample *sample) > > +{ > > + return callchain_param.record_mode == CALLCHAIN_FP && sample->user_regs.regs > > + && sample->user_regs.mask == PERF_REGS_MASK; > > +} > > + > > +static int add_entry(struct unwind_entry *entry, void *arg) > > +{ > > + struct entries *entries = arg; > > + > > + entries->stack[entries->length++] = entry->ip; > > + return 0; > > +} > > + > > +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread) > > +{ > > + int ret; > > + > > + struct entries entries = {{0, 0}, 0}; > > + > > + if (!get_leaf_frame_caller_enabled(sample)) > > + return 0; > > + > > + ret = unwind__get_entries(add_entry, &entries, thread, sample, 2); > > + > > + if (ret || entries.length != 2) > > + return ret; > > + > > + return callchain_param.order == ORDER_CALLER ? > > + entries.stack[0] : entries.stack[1]; > > +} > > diff --git a/tools/perf/util/arm-frame-pointer-unwind-support.h b/tools/perf/util/arm-frame-pointer-unwind-support.h > > new file mode 100644 > > index 000000000000..16dc03fa9abe > > --- /dev/null > > +++ b/tools/perf/util/arm-frame-pointer-unwind-support.h > > @@ -0,0 +1,7 @@ > > +/* SPDX-License-Identifier: GPL-2.0 */ > > +#ifndef __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H > > +#define __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H > > + > > +u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread); > > + > > +#endif /* __PERF_ARM_FRAME_POINTER_UNWIND_SUPPORT_H */ > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c > > index 7f03ffa016b0..dfb72dbc0e2d 100644 > > --- a/tools/perf/util/machine.c > > +++ b/tools/perf/util/machine.c > > @@ -34,6 +34,7 @@ > > #include "bpf-event.h" > > #include // page_size > > #include "cgroup.h" > > +#include "arm-frame-pointer-unwind-support.h" > > > > #include > > #include > > @@ -2671,10 +2672,12 @@ static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread, > > return err; > > } > > > > -static u64 get_leaf_frame_caller(struct perf_sample *sample __maybe_unused, > > - struct thread *thread __maybe_unused) > > +static u64 get_leaf_frame_caller(struct perf_sample *sample, struct thread *thread) > > { > > - return 0; > > + if (strncmp(thread->maps->machine->env->arch, "aarch64", 7) == 0) > > + return get_leaf_frame_caller_aarch64(sample, thread); > > + else > > + return 0; > > } > > > > static int thread__resolve_callchain_sample(struct thread *thread, > > -- - Arnaldo