Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753580AbbEEVxo (ORCPT ); Tue, 5 May 2015 17:53:44 -0400 Received: from mail-wg0-f44.google.com ([74.125.82.44]:33778 "EHLO mail-wg0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751174AbbEEVxm (ORCPT ); Tue, 5 May 2015 17:53:42 -0400 Date: Tue, 5 May 2015 23:53:28 +0200 From: Rabin Vincent To: David Ahern , Yunlong Song Cc: a.p.zijlstra@chello.nl, paulus@samba.org, mingo@redhat.com, Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org, wangnan0@huawei.com Subject: Re: [Question] How does perf still record the stack of a specified pid even when that process is interrupted and CPU is scheduled to other process Message-ID: <20150505215328.GA3397@debian> References: <553A45CA.8020808@huawei.com> <553A4C18.3030609@gmail.com> <553B9F30.1040100@huawei.com> <553BB895.20800@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <553BB895.20800@gmail.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3979 Lines: 88 On Sat, Apr 25, 2015 at 09:53:57AM -0600, David Ahern wrote: > On 4/25/15 8:05 AM, Yunlong Song wrote: > >But this only shows the system call like strace, but we want the call > >stack of kernel functions in fact. > > > We haven't added the callchain option yet; on the to-do list. > > perf trace record -g -- iozone ... > perf trace -i perf.data -s > --> summary of system calls, max/min/average times > > perf trace -i perf.data --duration 10.0 -T > --> note the timestamp where the write took a "long" time > > perf script > --> search down to *around* the time of interest; you want the syscall > entry; timestamp is for exit Now if I understood the use case right, what Yulong Song wants to know is what the iozone process is doing in the kernel (i.e. the stacktrace of why exactly it goes to sleep / what it's waiting for) during these sys_writes which take a long time. The commands above will identify the sys_write which takes time but only provide the stacktrace at the entry and exit of the syscall, but this do not show why the process blocked or what it did inside the system call. So a way to get what is required for the use case would be to make the following changes to the above sequence: (1) include the sched:* events when perf trace record is run (2) around the time of interest, look at the kernel stack st the sched:switch events between the entry and the exit. This will show what the process was waiting for when it when it blocked. The stacktraces at the stat_runtime events in the process may also be useful to understand what was going on. Example: $ perf trace record -g -e sched:* -- dd if=/dev/zero of=/x bs=10M count=100 conv=fsync $ perf trace -i perf.data -s dd (147), 364 events, 94.3%, 0.000 msec syscall calls min avg max stddev --------------- -------- --------- --------- --------- ------ write 63 266.019 316.896 963.413 4.69% ... $ perf trace -i perf.data --duration 960 -T 91916.354 (963.413 ms): dd/147 write(arg0: 1, arg1: 139729327423488, arg2: 10485760, arg3: 582, arg4: 100, arg5: 72340172838076673) = 10485760 $ perf script ... dd 147 [002] 90.952941: raw_syscalls:sys_enter: NR 1 (1, 7f1544ed0000, a00000, 246, 64, 101010101010101) 211a00 syscall_trace_enter_phase2 ([kernel.kallsyms]) 7d79c3 tracesys_phase2 ([kernel.kallsyms]) ed20 [unknown] (/lib/libpthread-2.16.so) (... trimmed everything except the longest block: ...) dd 147 [002] 91.204723: sched:sched_switch: prev_comm=dd prev_pid=147 prev_prio=120 prev_state=D ==> next_comm=swapper/2 next_pid=0 next_prio=120 7d1121 __schedule ([kernel.kallsyms]) 7d1817 schedule ([kernel.kallsyms]) 4578a5 jbd2_log_wait_commit ([kernel.kallsyms]) 4546ff jbd2_log_do_checkpoint ([kernel.kallsyms]) 45489f __jbd2_log_wait_for_space ([kernel.kallsyms]) 44c2d8 start_this_handle ([kernel.kallsyms]) 44c7b6 jbd2__journal_start ([kernel.kallsyms]) 42fc3b __ext4_journal_start_sb ([kernel.kallsyms]) 4012e9 ext4_write_begin ([kernel.kallsyms]) 33795f generic_perform_write ([kernel.kallsyms]) 338b68 __generic_file_write_iter ([kernel.kallsyms]) 3f4371 ext4_file_write_iter ([kernel.kallsyms]) 381ae1 __vfs_write ([kernel.kallsyms]) 381f07 vfs_write ([kernel.kallsyms]) 3825dd sys_write ([kernel.kallsyms]) 7d7a1a tracesys_phase2 ([kernel.kallsyms]) ed20 [unknown] (/lib/libpthread-2.16.so) swapper 0 [002] 91.875573: sched:sched_wakeup: comm=dd pid=147 prio=120 success=1 target_cpu=002 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/