Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755022Ab2HGNYx (ORCPT ); Tue, 7 Aug 2012 09:24:53 -0400 Received: from mx1.redhat.com ([209.132.183.28]:12922 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754477Ab2HGNVX (ORCPT ); Tue, 7 Aug 2012 09:21:23 -0400 From: Jiri Olsa To: linux-kernel@vger.kernel.org Cc: Arnaldo Carvalho de Melo , Arun Sharma , Benjamin Redelings , Corey Ashford , Cyrill Gorcunov , "Frank Ch. Eigler" , Frederic Weisbecker , Ingo Molnar , Masami Hiramatsu , Paul Mackerras , Peter Zijlstra , Robert Richter , Stephane Eranian , Tom Zanussi , Ulrich Drepper Subject: [PATCHv10 00/12] perf: Add backtrace post dwarf unwind Date: Tue, 7 Aug 2012 15:20:35 +0200 Message-Id: <1344345647-11536-1-git-send-email-jolsa@redhat.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7680 Lines: 186 hi, patches available also as tarball in here: http://people.redhat.com/~jolsa/perf_post_unwind_v10.tar.bz2 v10 changes: - omit copy_from_user_nmi_nochk function - record -g option fix for unlimited arg len v9 changes: - rebased to current tip tree v8 changes: - patch 2 - added dump registers ABI specification as suggested by Stephane - v7 patches 9,10,16,17 already in v7 changes: - omitted v6 patches 9 and 15 They need more work and will be sent separately. I dont want to hold off whole patchset because of them. We could miss some related backtraces (syscall, vdso) in this version. - v6 patch 11, 14, 20 already in v6 changes: patch 01/23 - unrelated - ftrace stuff patch 03/23 - added PERF_SAMPLE_REGS_USER bit - added regs_user initialization patch 07/23 - added PERF_SAMPLE_STACK_USER bit - sample_stack_user changed to u32 and added size check new patches 1,9,10,20 v5 changes: patch 1/19 - having just one enum set of the perf registers patch 2/19 - using for_each_set_bit for scanning the mask - single regs enum for both 32 and 64 bits versions - using regs mask != 0 trigger to trigger the regs dump patch 5/19 - adding perf_output_skip so we can skip undumped part of the stack in RB patch 6/19 - using stack size != 0 trigger to trigger the stack dump - do not zero the memory for non retrieved part of the stack dump patch 7/19 - adding exclude_callchain_kernel attribute patch 8/19 - this could be taken without the rest of the series v4 changes: - no real change from v3, just rebase - v3 patch 06/17 got already merged v3 changes: patch 01/17 - added HAVE_PERF_REGS config option patch 02/17, 04/17 - regs and stack perf interface is more general now patch 06/17 - unrelated online fix for i386 compilation patch 16/17 - few namespace fixies --- Adding the post unwinding user stack backtrace using dwarf unwind via libunwind. The original work was done by Frederic. I mostly took his patches and make them compile in current kernel code plus I added some stuff here and there. The main idea is to store user registers and portion of user stack when the sample data during the record phase. Then during the report, when the data is presented, perform the actual dwarf dwarf unwind. attached patches: 01/12 perf: Unified API to record selective sets of arch registers 02/12 perf: Add ability to attach user level registers dump to sample 03/12 perf: Factor __output_copy to be usable with specific copy function 04/12 perf: Add perf_output_skip function to skip bytes in sample 05/12 perf: Add ability to attach user stack dump to sample 06/12 perf: Add attribute to filter out callchains 07/12 perf tools: Adding PERF_ATTR_SIZE_VER2 to the header swap check 08/12 perf tools: Add interface to arch registers sets 09/12 perf tools: Add libunwind dependency for DWARF CFI unwinding 10/12 perf tools: Support user regs and stack in sample parsing 11/12 perf tools: Support for DWARF CFI unwinding on post processing 12/12 perf tools: Support for DWARF mode callchain I tested on Fedora. There was not much gain on i386, because the binaries are compiled with frame pointers. Thought the dwarf backtrace is more accurate and unwraps calls in more details (functions that do not set the frame pointers). I could see some improvement on x86_64, where I got full backtrace where current code could got just the first address out of the instruction pointer. Example on x86_64: [dwarf] perf record -g dwarf -e syscalls:sys_enter_write date 100.00% date libc-2.14.90.so [.] __GI___libc_write | --- __GI___libc_write _IO_file_write@@GLIBC_2.2.5 new_do_write _IO_do_write@@GLIBC_2.2.5 _IO_file_overflow@@GLIBC_2.2.5 0x4022cd 0x401ee6 __libc_start_main 0x4020b9 [frame pointer] perf record -g fp -e syscalls:sys_enter_write date 100.00% date libc-2.14.90.so [.] __GI___libc_write | --- __GI___libc_write Also I tested on coreutils binaries mainly, but I could see getting wider backtraces with dwarf unwind for more complex application like firefox. Attached patches should work on both x86 and x86_64. The unwind backtrace can be interrupted by following reasons: - bug in unwind information of processed shared library - bug in unwind processing code (most likely ;) ) - insufficient dump stack size - until full syscall register storage and vdso support we could miss some related backtraces jirka Cc: Arnaldo Carvalho de Melo Cc: Arun Sharma Cc: Benjamin Redelings Cc: Corey Ashford Cc: Cyrill Gorcunov Cc: Frank Ch. Eigler Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Masami Hiramatsu Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Robert Richter Cc: Stephane Eranian Cc: Tom Zanussi Cc: Ulrich Drepper --- arch/Kconfig | 13 ++ arch/x86/Kconfig | 2 + arch/x86/include/asm/perf_event.h | 2 + arch/x86/include/asm/perf_regs.h | 33 ++++++ arch/x86/kernel/Makefile | 2 + arch/x86/kernel/perf_regs.c | 105 ++++++++++++++++ include/linux/perf_event.h | 60 +++++++++- include/linux/perf_regs.h | 25 ++++ kernel/events/callchain.c | 38 +++--- kernel/events/core.c | 214 +++++++++++++++++++++++++++++++++ kernel/events/internal.h | 82 +++++++++---- kernel/events/ring_buffer.c | 10 +- tools/perf/Makefile | 45 ++++++- tools/perf/arch/x86/Makefile | 3 + tools/perf/arch/x86/include/perf_regs.h | 80 +++++++++++++ tools/perf/arch/x86/util/unwind.c | 111 +++++++++++++++++ tools/perf/builtin-record.c | 114 +++++++++++++++++- tools/perf/builtin-report.c | 18 +-- tools/perf/builtin-script.c | 6 +- tools/perf/builtin-top.c | 6 +- tools/perf/config/feature-tests.mak | 25 ++++ tools/perf/perf.h | 9 +- tools/perf/util/event.h | 12 ++ tools/perf/util/evsel.c | 41 ++++++- tools/perf/util/header.c | 3 + tools/perf/util/include/linux/compiler.h | 1 + tools/perf/util/map.h | 5 +- tools/perf/util/perf_regs.h | 14 +++ tools/perf/util/session.c | 107 ++++++++++++++--- tools/perf/util/session.h | 6 +- tools/perf/util/trace-event.h | 2 + tools/perf/util/unwind.c | 567 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ tools/perf/util/unwind.h | 34 ++++++ 33 files changed, 1713 insertions(+), 82 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/