2015-05-08 05:52:41

by Hou Pengyang

Subject: [PATCH v4 0/2] arm & arm64: perf: Fix callchain parse error with kernel tracepoint events

For arm & arm64, when tracing with tracepoint events, the IP and cpsr
(pstate on arm64) are set to 0, preventing the perf code from parsing
the callchain and resolving the symbols correctly.

These two patches fix this by implementing perf_arch_fetch_caller_regs
for arm and arm64, which fills in the register state needed for
callchain unwinding and symbol resolution.

v3->v4:
- fix compile errors

v2->v3:
- split the original patch into two, one for arm and the other for arm64;
- change '|=' to '=' when setting cpsr.

Hou Pengyang (2):
arm: perf: Fix callchain parse error with kernel tracepoint events
arm64: perf: Fix callchain parse error with kernel tracepoint events

arch/arm/include/asm/perf_event.h | 7 +++++++
arch/arm64/include/asm/perf_event.h | 7 +++++++
2 files changed, 14 insertions(+)

--
1.8.3.4


2015-05-08 05:52:59

by Hou Pengyang

Subject: [PATCH v4 1/2] arm: perf: Fix callchain parse error with kernel tracepoint events

For ARM, when tracing with tracepoint events, the IP and cpsr are set
to 0, preventing the perf code from parsing the callchain and resolving
the symbols correctly.

./perf record -e sched:sched_switch -g --call-graph dwarf ls
[ perf record: Captured and wrote 0.006 MB perf.data ]
./perf report -f
Samples: 5 of event 'sched:sched_switch', Event count (approx.): 5
Children Self Command Shared Object Symbol
100.00% 100.00% ls [unknown] [.] 00000000

The fix is to implement perf_arch_fetch_caller_regs for ARM, which fills
in the registers needed for callchain unwinding: pc, sp, fp and cpsr.
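
A minimal userspace sketch of why the zeroed registers matter, assuming
the ARM user_mode() test looks only at the low mode bits of cpsr. The
struct, the MOCK_* constants and the mock_arm_* helpers are hypothetical
stand-ins for illustration, not kernel code; sp is approximated by the
address of a local variable because current_stack_pointer is kernel-only.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the fields of the ARM
 * struct pt_regs that the patch touches; the real layout differs. */
struct mock_arm_regs {
    unsigned long pc;
    unsigned long sp;
    unsigned long fp;
    unsigned long cpsr;
};

#define MOCK_MODE_MASK 0x0000000fUL /* low mode bits checked for user mode */
#define MOCK_SVC_MODE  0x00000013UL /* ARM supervisor mode */

/* Userspace approximation of the macro added by the patch. */
#define mock_arm_fetch_caller_regs(regs, ip) do {                      \
    volatile char mock_marker = 0;                                     \
    (regs)->pc = (ip);                                                 \
    (regs)->fp = (unsigned long)(uintptr_t)__builtin_frame_address(0); \
    (regs)->sp = (unsigned long)(uintptr_t)&mock_marker;               \
    (regs)->cpsr = MOCK_SVC_MODE;                                      \
} while (0)

/* Mirrors the idea of user_mode(): mode bits of 0 mean user mode. */
static int mock_arm_user_mode(const struct mock_arm_regs *regs)
{
    return (regs->cpsr & MOCK_MODE_MASK) == 0;
}

/* Start from zeroed registers, as in the failing case, then fill them. */
static struct mock_arm_regs mock_arm_capture(unsigned long ip)
{
    struct mock_arm_regs regs = {0};

    mock_arm_fetch_caller_regs(&regs, ip);
    return regs;
}
```

With all fields left at 0, mock_arm_user_mode() reports user mode and pc
is 0, which matches the "[.] 00000000" entry in the report above. After
mock_arm_capture(), the mode bits are nonzero and pc holds the caller's
address, so the sample is classified as a kernel one.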

With this patch, the callchain is parsed correctly:

.....
- 100.00% 100.00% ls [kernel.kallsyms] [k] __sched_text_start
+ __sched_text_start
+ 20.00% 0.00% ls libc-2.18.so [.] _dl_addr
+ 20.00% 0.00% ls libc-2.18.so [.] write
.....

Jean Pihet found this on ARM and came up with a patch:
http://thread.gmane.org/gmane.linux.kernel/1734283/focus=1734280

This patch rewrites Jean's patch in C.

Signed-off-by: Hou Pengyang <[email protected]>
---
arch/arm/include/asm/perf_event.h | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/arm/include/asm/perf_event.h b/arch/arm/include/asm/perf_event.h
index d9cf138..4f9dec4 100644
--- a/arch/arm/include/asm/perf_event.h
+++ b/arch/arm/include/asm/perf_event.h
@@ -19,4 +19,11 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
#define perf_misc_flags(regs) perf_misc_flags(regs)
#endif

+#define perf_arch_fetch_caller_regs(regs, __ip) { \
+ (regs)->ARM_pc = (__ip); \
+ (regs)->ARM_fp = (unsigned long) __builtin_frame_address(0); \
+ (regs)->ARM_sp = current_stack_pointer; \
+ (regs)->ARM_cpsr = SVC_MODE; \
+}
+
#endif /* __ARM_PERF_EVENT_H__ */
--
1.8.3.4

2015-05-08 05:52:32

by Hou Pengyang

Subject: [PATCH v4 2/2] arm64: perf: Fix callchain parse error with kernel tracepoint events

For ARM64, when tracing with tracepoint events, the IP and pstate are set
to 0, preventing the perf code from parsing the callchain and resolving
the symbols correctly.

./perf record -e sched:sched_switch -g --call-graph dwarf ls
[ perf record: Captured and wrote 0.146 MB perf.data ]
./perf report -f
Samples: 194 of event 'sched:sched_switch', Event count (approx.): 194
Children Self Command Shared Object Symbol
100.00% 100.00% ls [unknown] [.] 0000000000000000

The fix is to implement perf_arch_fetch_caller_regs for ARM64, which fills
in the registers needed for callchain unwinding: pc, sp, fp and spsr.

With this patch, the callchain is parsed correctly, as follows:

......
+ 2.63% 0.00% ls [kernel.kallsyms] [k] vfs_symlink
+ 2.63% 0.00% ls [kernel.kallsyms] [k] follow_down
+ 2.63% 0.00% ls [kernel.kallsyms] [k] pfkey_get
+ 2.63% 0.00% ls [kernel.kallsyms] [k] do_execveat_common.isra.33
- 2.63% 0.00% ls [kernel.kallsyms] [k] pfkey_send_policy_notify
pfkey_send_policy_notify
pfkey_get
v9fs_vfs_rename
page_follow_link_light
link_path_walk
el0_svc_naked
.......

Signed-off-by: Hou Pengyang <[email protected]>
---
arch/arm64/include/asm/perf_event.h | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
index d26d1d5..6471773 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -24,4 +24,11 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
#define perf_misc_flags(regs) perf_misc_flags(regs)
#endif

+#define perf_arch_fetch_caller_regs(regs, __ip) { \
+ (regs)->pc = (__ip); \
+ (regs)->regs[AARCH64_INSN_REG_FP] = (unsigned long) __builtin_frame_address(0); \
+ (regs)->sp = current_stack_pointer; \
+ (regs)->pstate = PSR_MODE_EL1h; \
+}
+
#endif
--
1.8.3.4

2015-05-08 15:38:06

by Will Deacon

Subject: Re: [PATCH v4 2/2] arm64: perf: Fix callchain parse error with kernel tracepoint events

On Fri, May 08, 2015 at 06:43:04AM +0100, Hou Pengyang wrote:
> For ARM64, when tracing with tracepoint events, the IP and pstate are set
> to 0, preventing the perf code parsing the callchain and resolving the
> symbols correctly.
>
> ./perf record -e sched:sched_switch -g --call-graph dwarf ls
> [ perf record: Captured and wrote 0.146 MB perf.data ]
> ./perf report -f
> Samples: 194 of event 'sched:sched_switch', Event count (approx.): 194
> Children Self Command Shared Object Symbol
> 100.00% 100.00% ls [unknown] [.] 0000000000000000
>
> The fix is to implement perf_arch_fetch_caller_regs for ARM64, which fills
> several necessary registers used for callchain unwinding, including pc,sp,
> fp and spsr .
>
> With this patch, callchain can be parsed correctly as follows:
>
> ......
> + 2.63% 0.00% ls [kernel.kallsyms] [k] vfs_symlink
> + 2.63% 0.00% ls [kernel.kallsyms] [k] follow_down
> + 2.63% 0.00% ls [kernel.kallsyms] [k] pfkey_get
> + 2.63% 0.00% ls [kernel.kallsyms] [k] do_execveat_common.isra.33
> - 2.63% 0.00% ls [kernel.kallsyms] [k] pfkey_send_policy_notify
> pfkey_send_policy_notify
> pfkey_get
> v9fs_vfs_rename
> page_follow_link_light
> link_path_walk
> el0_svc_naked
> .......
>
> Signed-off-by: Hou Pengyang <[email protected]>
> ---
> arch/arm64/include/asm/perf_event.h | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
> index d26d1d5..6471773 100644
> --- a/arch/arm64/include/asm/perf_event.h
> +++ b/arch/arm64/include/asm/perf_event.h
> @@ -24,4 +24,11 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
> #define perf_misc_flags(regs) perf_misc_flags(regs)
> #endif
>
> +#define perf_arch_fetch_caller_regs(regs, __ip) { \
> + (regs)->pc = (__ip); \
> + (regs)->regs[AARCH64_INSN_REG_FP] = (unsigned long) __builtin_frame_address(0); \

Just a minor thing, but I'd rather we explicitly used '29' as the index
here. The AARCH64_INSN_REG_FP is really for the instruction generation
code used by BPF and I think it's better to be explicit about the register
number here.
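
A rough userspace sketch of the suggestion, assuming a simplified layout
with 31 general-purpose slots followed by sp, pc and pstate. The struct,
the MOCK_* constants and the mock_arm64_* helpers are hypothetical names
used for illustration only; the stack pointer is left out because
current_stack_pointer is kernel-only.

```c
#include <stdint.h>

#define MOCK_FP_REGNUM     29                    /* x29 is the frame pointer */
#define MOCK_PSR_MODE_EL0t 0x00000000ULL
#define MOCK_PSR_MODE_EL1h 0x00000005ULL
#define MOCK_PSR_MODE_MASK 0x0000000fULL

/* Hypothetical, simplified stand-in for the arm64 struct pt_regs. */
struct mock_arm64_regs {
    uint64_t regs[31];
    uint64_t sp;
    uint64_t pc;
    uint64_t pstate;
};

/* Mode bits equal to EL0t mean the sample looks like user space. */
static int mock_arm64_user_mode(const struct mock_arm64_regs *regs)
{
    return (regs->pstate & MOCK_PSR_MODE_MASK) == MOCK_PSR_MODE_EL0t;
}

/* Fill pc, fp and pstate, using the explicit register number for fp. */
static struct mock_arm64_regs mock_arm64_capture(uint64_t ip)
{
    struct mock_arm64_regs regs = {0};

    regs.pc = ip;
    regs.regs[MOCK_FP_REGNUM] = (uint64_t)(uintptr_t)__builtin_frame_address(0);
    regs.pstate = MOCK_PSR_MODE_EL1h;
    return regs;
}
```

Writing the index as a plain 29 keeps the perf header independent of the
instruction-encoding definitions; the named constant in the sketch is
there only to label the number.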

Anyway, I've queued your arch/arm/ patch and Catalin can take this one
for 4.2 once you've made the small change above and added my Ack.

Thanks,

Will

2015-05-10 11:09:10

by Hou Pengyang

Subject: [PATCH v5] arm64: perf: Fix callchain parse error with kernel tracepoint events

For ARM64, when tracing with tracepoint events, the IP and pstate are set
to 0, preventing the perf code from parsing the callchain and resolving
the symbols correctly.

./perf record -e sched:sched_switch -g --call-graph dwarf ls
[ perf record: Captured and wrote 0.146 MB perf.data ]
./perf report -f
Samples: 194 of event 'sched:sched_switch', Event count (approx.): 194
Children Self Command Shared Object Symbol
100.00% 100.00% ls [unknown] [.] 0000000000000000

The fix is to implement perf_arch_fetch_caller_regs for ARM64, which fills
in the registers needed for callchain unwinding: pc, sp, fp and spsr.

With this patch, the callchain is parsed correctly, as follows:

......
+ 2.63% 0.00% ls [kernel.kallsyms] [k] vfs_symlink
+ 2.63% 0.00% ls [kernel.kallsyms] [k] follow_down
+ 2.63% 0.00% ls [kernel.kallsyms] [k] pfkey_get
+ 2.63% 0.00% ls [kernel.kallsyms] [k] do_execveat_common.isra.33
- 2.63% 0.00% ls [kernel.kallsyms] [k] pfkey_send_policy_notify
pfkey_send_policy_notify
pfkey_get
v9fs_vfs_rename
page_follow_link_light
link_path_walk
el0_svc_naked
.......

Signed-off-by: Hou Pengyang <[email protected]>
Acked-by: Will Deacon <[email protected]>
---
arch/arm64/include/asm/perf_event.h | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
index d26d1d5..6471773 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -24,4 +24,11 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
#define perf_misc_flags(regs) perf_misc_flags(regs)
#endif

+#define perf_arch_fetch_caller_regs(regs, __ip) { \
+ (regs)->pc = (__ip); \
+ (regs)->regs[29] = (unsigned long) __builtin_frame_address(0); \
+ (regs)->sp = current_stack_pointer; \
+ (regs)->pstate = PSR_MODE_EL1h; \
+}
+
#endif
--
1.8.5.2

2015-05-19 16:52:11

by Catalin Marinas

Subject: Re: [PATCH v5] arm64: perf: Fix callchain parse error with kernel tracepoint events

On Sun, May 10, 2015 at 11:07:40AM +0000, Hou Pengyang wrote:
> For ARM64, when tracing with tracepoint events, the IP and pstate are set
> to 0, preventing the perf code parsing the callchain and resolving the
> symbols correctly.
>
> ./perf record -e sched:sched_switch -g --call-graph dwarf ls
> [ perf record: Captured and wrote 0.146 MB perf.data ]
> ./perf report -f
> Samples: 194 of event 'sched:sched_switch', Event count (approx.): 194
> Children Self Command Shared Object Symbol
> 100.00% 100.00% ls [unknown] [.] 0000000000000000
>
> The fix is to implement perf_arch_fetch_caller_regs for ARM64, which fills
> several necessary registers used for callchain unwinding, including pc,sp,
> fp and spsr .
>
> With this patch, callchain can be parsed correctly as follows:
>
> ......
> + 2.63% 0.00% ls [kernel.kallsyms] [k] vfs_symlink
> + 2.63% 0.00% ls [kernel.kallsyms] [k] follow_down
> + 2.63% 0.00% ls [kernel.kallsyms] [k] pfkey_get
> + 2.63% 0.00% ls [kernel.kallsyms] [k] do_execveat_common.isra.33
> - 2.63% 0.00% ls [kernel.kallsyms] [k] pfkey_send_policy_notify
> pfkey_send_policy_notify
> pfkey_get
> v9fs_vfs_rename
> page_follow_link_light
> link_path_walk
> el0_svc_naked
> .......
>
> Signed-off-by: Hou Pengyang <[email protected]>
> Acked-by: Will Deacon <[email protected]>

Queued for 4.2. Thanks.

--
Catalin

2015-05-20 06:53:11

by Jean Pihet

Subject: Re: [PATCH v5] arm64: perf: Fix callchain parse error with kernel tracepoint events

Hi Catalin, Will,

On Tue, May 19, 2015 at 6:52 PM, Catalin Marinas
<[email protected]> wrote:
> On Sun, May 10, 2015 at 11:07:40AM +0000, Hou Pengyang wrote:
>> For ARM64, when tracing with tracepoint events, the IP and pstate are set
>> to 0, preventing the perf code parsing the callchain and resolving the
>> symbols correctly.
>>
>> ./perf record -e sched:sched_switch -g --call-graph dwarf ls
>> [ perf record: Captured and wrote 0.146 MB perf.data ]
>> ./perf report -f
>> Samples: 194 of event 'sched:sched_switch', Event count (approx.): 194
>> Children Self Command Shared Object Symbol
>> 100.00% 100.00% ls [unknown] [.] 0000000000000000
>>
>> The fix is to implement perf_arch_fetch_caller_regs for ARM64, which fills
>> several necessary registers used for callchain unwinding, including pc,sp,
>> fp and spsr .
>>
>> With this patch, callchain can be parsed correctly as follows:
>>
>> ......
>> + 2.63% 0.00% ls [kernel.kallsyms] [k] vfs_symlink
>> + 2.63% 0.00% ls [kernel.kallsyms] [k] follow_down
>> + 2.63% 0.00% ls [kernel.kallsyms] [k] pfkey_get
>> + 2.63% 0.00% ls [kernel.kallsyms] [k] do_execveat_common.isra.33
>> - 2.63% 0.00% ls [kernel.kallsyms] [k] pfkey_send_policy_notify
>> pfkey_send_policy_notify
>> pfkey_get
>> v9fs_vfs_rename
>> page_follow_link_light
>> link_path_walk
>> el0_svc_naked
>> .......
>>
>> Signed-off-by: Hou Pengyang <[email protected]>
>> Acked-by: Will Deacon <[email protected]>
>
> Queued for 4.2. Thanks.

Nice to see this one going out, finally.

Cheers,
Jean

>
> --
> Catalin