Message-ID: <55418A4D.5010900@huawei.com>
Date: Thu, 30 Apr 2015 09:50:05 +0800
From: Hou Pengyang
To: Will Deacon
CC: "a.p.zijlstra@chello.nl", "paulus@samba.org", "mingo@redhat.com", "acme@kernel.org", "wangnan0@huawei.com", "Catalin Marinas", "linux-kernel@vger.kernel.org", "linux-arm-kernel@lists.infradead.org"
Subject: Re: [PATCH] arm64: perf: Fix callchain parse error with kernel tracepoint events
References: <1430227248-19657-1-git-send-email-houpengyang@huawei.com> <20150429101234.GJ8236@arm.com>
In-Reply-To: <20150429101234.GJ8236@arm.com>

On 2015/4/29 18:12, Will Deacon wrote:
> Hello,
>
> On Tue, Apr 28, 2015 at 02:20:48PM +0100, Hou Pengyang wrote:
>> For ARM64, when tracing with tracepoint events, the IP and cpsr are set
>> to 0, preventing the perf code from parsing the callchain and resolving
>> the symbols correctly.
>>
>> ./perf record -e sched:sched_switch -g --call-graph dwarf ls
>> [ perf record: Captured and wrote 0.146 MB perf.data ]
>> ./perf report -f
>> Samples: 194 of event 'sched:sched_switch', Event count (approx.): 194
>>   Children      Self  Command  Shared Object  Symbol
>>    100.00%   100.00%  ls       [unknown]      [.] 0000000000000000
>>
>> The fix is to implement perf_arch_fetch_caller_regs for ARM64, which fills
>> several necessary registers used for callchain unwinding, including pc, sp,
>> fp and psr.
>>
>> With this patch, callchain can be parsed correctly as follows:
>>
>> ......
>> +    2.63%     0.00%  ls  [kernel.kallsyms]  [k] vfs_symlink
>> +    2.63%     0.00%  ls  [kernel.kallsyms]  [k] follow_down
>> +    2.63%     0.00%  ls  [kernel.kallsyms]  [k] pfkey_get
>> +    2.63%     0.00%  ls  [kernel.kallsyms]  [k] do_execveat_common.isra.33
>> -    2.63%     0.00%  ls  [kernel.kallsyms]  [k] pfkey_send_policy_notify
>>        pfkey_send_policy_notify
>>        pfkey_get
>>        v9fs_vfs_rename
>>        page_follow_link_light
>>        link_path_walk
>>        el0_svc_naked
>> .......
>>
>> For tracepoint events, stack parsing also doesn't work well for ARM. Jean Pihet
>> came up with a patch:
>> http://thread.gmane.org/gmane.linux.kernel/1734283/focus=1734280
>
> Any chance you could revive that series too, please? I'd like to update both
> arm and arm64 together, since we're currently working at merging the two
> perf backends and introducing discrepancies is going to delay that even
> longer.
>
>> Signed-off-by: Hou Pengyang
>> ---
>>  arch/arm64/include/asm/perf_event.h | 16 ++++++++++++++++
>>  1 file changed, 16 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
>> index d26d1d5..16a074f 100644
>> --- a/arch/arm64/include/asm/perf_event.h
>> +++ b/arch/arm64/include/asm/perf_event.h
>> @@ -24,4 +24,20 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
>>  #define perf_misc_flags(regs) perf_misc_flags(regs)
>>  #endif
>>
>> +#define perf_arch_fetch_caller_regs(regs, __ip) { \
>> +	unsigned long sp; \
>> +	__asm__ ("mov %[sp], sp\n" : [sp] "=r" (sp)); \
>> +	(regs)->pc = (__ip); \
>> +	__asm__ ( \
>> +	"str %[sp], %[_arm64_sp] \n\t" \
>> +	"str x29, %[_arm64_fp] \n\t" \
>> +	"mrs %[_arm64_cpsr], spsr_el1 \n\t" \
>> +	: [_arm64_sp] "=m" (regs->sp), \
>> +	  [_arm64_fp] "=m" (regs->regs[29]), \
>> +	  [_arm64_cpsr] "=r" (regs->pstate) \
>
> Does this really all need to be in assembly code? Ideally we'd use something
> like __builtin_stack_pointer and __builtin_frame_pointer. That just leaves
> the CPSR, but given that it's (a) only used for user_mode_regs tests and (b)
> this macro is only used by ftrace, then we just set it to a static value
> indicating that we're at EL1.
>
> So I *think* we should be able to write this as three lines of C.
>

Hi Will,

As you said, we can get fp with __builtin_frame_address() and set pstate
to a static value. However, there is no GCC builtin function like
__builtin_stack_pointer for sp, so assembly code is still needed there.

Also, if CONFIG_FRAME_POINTER is disabled, can fp still be obtained with
__builtin_frame_address()?

> Will
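To make the mostly-C variant discussed above concrete, here is a rough user-space sketch, not the patch itself. It assumes GCC or Clang. struct sketch_regs is a hypothetical stand-in for the arm64 struct pt_regs, and sketch_fetch_caller_regs, SKETCH_READ_SP, SKETCH_PSR_MODE_EL1h and sketch_capture are made-up names for illustration. The x86-64 and fallback branches exist only so the sketch builds on non-arm64 hosts. It does not settle the CONFIG_FRAME_POINTER question.

```c
#include <stdint.h>

/* Hypothetical stand-in for arm64's struct pt_regs (illustration only). */
struct sketch_regs {
	uint64_t regs[31];	/* x0..x30; regs[29] is the frame pointer */
	uint64_t sp;
	uint64_t pc;
	uint64_t pstate;
};

/* Static "we are at EL1" value; 0x5 is the EL1h mode encoding. */
#define SKETCH_PSR_MODE_EL1h	0x00000005UL

/* Reading sp is the one step with no builtin, so it stays as one asm line. */
#if defined(__aarch64__)
#define SKETCH_READ_SP(dst)	__asm__ volatile("mov %0, sp" : "=r" (dst))
#elif defined(__x86_64__)
/* Assumption: host-only branch so the sketch can be compiled off arm64. */
#define SKETCH_READ_SP(dst)	__asm__ volatile("movq %%rsp, %0" : "=r" (dst))
#else
/* Assumption: crude fallback, approximates sp with the frame address. */
#define SKETCH_READ_SP(dst)	((dst) = (uint64_t)(uintptr_t)__builtin_frame_address(0))
#endif

/* pc from the caller-supplied ip, fp from the builtin, sp from asm,
 * pstate from a constant instead of reading spsr_el1. */
#define sketch_fetch_caller_regs(r, ip) do {					\
	uint64_t sketch_sp;							\
	SKETCH_READ_SP(sketch_sp);						\
	(r)->pc = (uint64_t)(ip);						\
	(r)->regs[29] = (uint64_t)(uintptr_t)__builtin_frame_address(0);	\
	(r)->sp = sketch_sp;							\
	(r)->pstate = SKETCH_PSR_MODE_EL1h;					\
} while (0)

/* Example caller: capture the registers at this point and return them. */
static struct sketch_regs sketch_capture(uint64_t ip)
{
	struct sketch_regs r = {0};

	sketch_fetch_caller_regs(&r, ip);
	return r;
}
```

The sp read is expanded inside the macro rather than wrapped in a helper function, so the value comes from the frame of the function that uses the macro and not from a helper's frame.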