Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752751AbbHJM4M (ORCPT ); Mon, 10 Aug 2015 08:56:12 -0400 Received: from mga11.intel.com ([192.55.52.93]:4008 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751691AbbHJM4J (ORCPT ); Mon, 10 Aug 2015 08:56:09 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.15,645,1432623600"; d="scan'208";a="781179792" From: "Liang, Kan" To: Arnaldo Carvalho de Melo CC: Jiri Olsa , Namhyung Kim , "Andi Kleen" , "linux-kernel@vger.kernel.org" Subject: RE: [PATCH RFC V9 2/3] perf,tools: per-event callgraph support Thread-Topic: [PATCH RFC V9 2/3] perf,tools: per-event callgraph support Thread-Index: AQHQ0L00G1F8ocgTRE2LrVXdCAzqOZ4Ar+cvgASDs2A= Date: Mon, 10 Aug 2015 12:56:04 +0000 Message-ID: <37D7C6CF3E00A74B8858931C1DB2F077018DE420@SHSMSX103.ccr.corp.intel.com> References: <1438890294-33409-1-git-send-email-kan.liang@intel.com> <1438890294-33409-2-git-send-email-kan.liang@intel.com> <20150807153843.GD3325@kernel.org> <20150807154938.GE3325@kernel.org> In-Reply-To: <20150807154938.GE3325@kernel.org> Accept-Language: zh-CN, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by mail.home.local id t7ACuGcT003453 Content-Length: 6150 Lines: 149 > Em Fri, Aug 07, 2015 at 12:38:43PM -0300, Arnaldo Carvalho de Melo > escreveu: > > Em Thu, Aug 06, 2015 at 03:44:53PM -0400, kan.liang@intel.com escreveu: > > > From: Kan Liang Here are some examples and > > > test results. > > > > > > 1. Comparing the elapsed time and perf.data size from "kernbench -M - > H". > > > > > > The test command for FULL callgraph and time support. > > > "perf record -e > > > '{cpu/cpu-cycles,period=100000/,cpu/instructions,period=20000/p}' > > > --call-graph fp --time" > > Some other oddity I noticed while testing: > > For: > > [root@zoo ~]# perf report --header-only > # ======== > # captured on: Fri Aug 7 12:39:13 2015 > # hostname : zoo > # os release : 4.2.0-rc5+ > # perf version : 4.2.rc4.g1cd951 > # arch : x86_64 > # nrcpus online : 4 > # nrcpus avail : 4 > # cpudesc : Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz # cpuid : > GenuineIntel,6,58,9 # total memory : 8080692 kB # cmdline : > /home/acme/bin/perf record -e {cpu/cpu-cycles,call- > graph=fp,time,period=10000/pp,cpu/instructions,call- > graph=no,time=0,period=20000/p} -a # event : name = cpu/cpu-cycles,call- > graph=fp,time,period=10000/pp, , id = { 38537, 38538, 38539, 38540 }, type > = 4, size = 112, config = 0x3c, { sample_perio # event : name = > cpu/instructions,call-graph=no,time=0,period=20000/p, , id = { 38541, > 38542, 38543, 38544 }, type = 4, size = 112, config = 0xc0, { sample_pe # > HEADER_CPU_TOPOLOGY info available, use -I to display # > HEADER_NUMA_TOPOLOGY info available, use -I to display # pmu > mappings: cpu = 4, software = 1, power = 7, uncore_imc = 8, tracepoint = 2, > uncore_cbox_0 = 9, uncore_cbox_1 = 10, breakpoint = 5 # group: > {cpu/cpu-cycles,call-graph=fp,time,period=10000/pp,cpu/instructions,call- > graph=no,time=0,period=20000/p} > # ======== > # > > I get: > > Samples: 2K of event 'cpu/instructions,call- > graph=no,time=0,period=20000/p', Event count (approx.): 46956518 > Children Self Command Shared Object Symbol ◆ > - 67.56% 0.00% qemu-system-x86 [unknown] [.] > 0xad5e258d4c544155 ▒ > 0xad5e258d4c544155 ▒ > - 67.56% 0.00% qemu-system-x86 libc-2.20.so [.] __libc_start_main ▒ > __libc_start_main ▒ > 0xad5e258d4c544155 ▒ > - 67.56% 0.00% qemu-system-x86 perf [.] main ▒ > main ▒ > __libc_start_main ▒ > 0xad5e258d4c544155 ▒ > - 67.56% 0.00% qemu-system-x86 perf [.] run_builtin ▒ > run_builtin ▒ > main ▒ > __libc_start_main ▒ > > This is in the 'perf report' TUI, why, for an event with 'callgraph=no', we get > callchains? How come? > That's the design. For sampling multiple events, it may not be needed to collect callgraphs for all of them. Because the sample sites are usually nearby. It's enough to collect the callgraphs on a reference event. For other events, it can still show callgraphs according to the callgraphs on a reference event. Thanks, Kan > If I try it with the --stdio, well, its there as well: > > # Samples: 2K of event 'cpu/instructions,call- > graph=no,time=0,period=20000/p' > # Event count (approx.): 46956518 > # > # Children Self Command Shared Object Symbol > # ........ ........ ............... .......................... .............................................. > # > 67.56% 0.00% qemu-system-x86 [unknown] [.] > 0xad5e258d4c544155 > | > ---0xad5e258d4c544155 > > 67.56% 0.00% qemu-system-x86 libc-2.20.so [.] > __libc_start_main > | > ---__libc_start_main > 0xad5e258d4c544155 > > 67.56% 0.00% qemu-system-x86 perf [.] main > | > ---main > __libc_start_main > 0xad5e258d4c544155 > > 67.56% 0.00% qemu-system-x86 perf [.] run_builtin > | > ---run_builtin > main > __libc_start_main > 0xad5e258d4c544155 > > Can you take a look? What I have, i.e. the patches from Jiri and Kan moving > stuff out of the way of the python binding and this patch is at the > tmp.perf/core branch in my git tree: > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git > > - Arnaldo > > > > Jiri, while testing this I noticed that the message for EINVAL when > > using the cpu// syntax (per-event settings) is cryptic: > > > > [root@zoo ~]# perf record -e 'cpu/cpu-cycles,call- > graph=fp,time,period=100000/p' ls > > Error: > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) > for event (cpu/cpu-cycles,call-graph=fp,time,period=100000/p). > > /bin/dmesg may provide additional information. > > No CONFIG_PERF_EVENTS=y kernel support configured? > > > > Whereas if we use -F, it is much, much clearer, telling the user > > exactly what is failing and what needs to be done to make it work: > > > > [root@zoo ~]# perf record -F 100000 -e cpu/cpu-cycles/ usleep 1 > > Maximum frequency rate (25000) reached. > > Please use -F freq option with lower value or consider > > tweaking /proc/sys/kernel/perf_event_max_sample_rate. > > [root@zoo ~]# > > > > Hope this is something easy to wire up, given your event parsing kung > > foo skillz... > > > > ;-) > > > > - Arnaldo ????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?