Date: Tue, 13 Jun 2017 16:04:34 -0400
From: Steven Rostedt <rostedt@goodmis.org>
To: Will Hawkins
Cc: LKML
Subject: Re: Ftrace vs perf user page fault statistics differences
Message-ID: <20170613160434.605fdef2@gandalf.local.home>
References: <20170612212005.1bf8c44d@gandalf.local.home> <20170613090649.2a5edb6a@gandalf.local.home>

On Tue, 13 Jun 2017 14:02:08 -0400
Will Hawkins wrote:

> Thank you for pointing this out. I had been using -F for exactly the
> reason that you mentioned. I failed to include it in the command that
> I sent along. Very sorry for the confusion.
> Here is an updated version of the command that I issued:
>
>   sudo ./trace-cmd record -e exceptions:page_fault_user -T --profile \
>       -l handle_mm_fault -F ../one_page_play/page
>
> and I generated output like
>
>   ./trace-cmd report --profile
>
> and I see the following (among some other output):
>
>   Event: page_fault_user:0x7f094f7dd104 (1)
>   Event: page_fault_user:0x4000e0 (1)
>   Event: page_fault_user:0x7f094f7eae4a (1)
>   Event: page_fault_user:0x7f094f860d40 (1)
>   Event: page_fault_user:0x7f094f7db560 (1)
>   Event: page_fault_user:0x4040cb (1)
>   Event: page_fault_user:0x401825 (1)
>   Event: page_fault_user:0x401473 (1)
>   Event: page_fault_user:0x7f094f7e64c4 (1)
>   Event: page_fault_user:0x7f094f7f1212 (1)
>
> That output comes from under the "task: page-" heading, so it seems
> like those faults are being attributed to the page task.
>
> This command seems to show something interesting:
>
>   sudo ./trace-cmd record -e exceptions:page_fault_user \
>       -p function_graph -g __do_fault -F ../one_page_play/page
>
> and the relevant output from
>
>   ./trace-cmd report --profile
>
> is
>
>   task: page-4032
>   Event: func: __do_fault() (4) Total: 6685 Avg: 1671
>       Max: 2398(ts:170150.060916) Min: 855(ts:170150.054713)
>   Event: page_fault_user:0x7ffad3143d40 (1)
>   Event: page_fault_user:0x4000e0 (1)
>   Event: page_fault_user:0x401473 (1)
>   Event: page_fault_user:0x7ffad30c94c4 (1)
>
> This is closer to what I would expect. The first of the two 0x4...
> addresses is the entry point and the second is the target. Basically,
> that is exactly what I expect. The other two are the "suspicious"
> entries. Neither matches the copy_user_enhanced_fast_string symbol
> location, and neither is loaded in the binary (according to gdb).

As you state below, there are faults recorded before the exec. That is
true with trace-cmd (I'm not sure about perf), as trace-cmd does some
work after tracing is started and before the exec is called.
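[Editor's note: the pre-exec attribution described above can be seen in
miniature with a plain shell wrapper. This is only a sketch of the
fork-then-exec pattern, not trace-cmd's actual implementation: any work
the child does before exec runs under the same pid the traced program
will inherit, so faults taken there are attributed to that pid.]

```shell
# Sketch of the fork-then-exec pattern a tracer like trace-cmd uses.
# The subshell is the forked child; everything before 'exec' runs --
# and can take page faults -- under the pid the target will inherit.
run_traced() {
    (
        # A real tracer would enable tracing on this pid here,
        # before the exec replaces the process image.
        echo "pre-exec work in child"
        exec "$@"    # replace the child with the target program
    )
}

run_traced /bin/true && echo "target ran ok"
```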
> It is odd to me that the output from the previous command includes
> information about the trace-cmd process since I specified the -F
> option.

Did the trace-cmd process have the same pid as the page program? The
-F makes it follow the pid, which can be trace-cmd before the exec.

Oh, and what versions of the kernel and trace-cmd are you running?
It's best to always get the latest trace-cmd; older versions did have
some bugs.

Also, function_graph can have some residual traces from a sched_switch.
That is, the tracing of a task specified by -F is enabled and disabled
per cpu at sched_switch. The sched_switch tracepoint fires before the
actual context switch. When the next task is the task specified by -F,
tracing begins at the tracepoint, which means the tail of the context
switch is traced while the previous task is still running. Here, in
kernel/sched/core.c in __schedule():

	trace_sched_switch(preempt, prev, next);

	/* Also unlocks the rq: */
	rq = context_switch(rq, prev, next, &rf);

The context_switch() function will be traced even when the previous
task isn't the task from -F.

> But, back to exactly what you asked. Here is the result of running
> perf again with the -a option. I ran this command:
>
>   sudo perf record --call-graph fp -e page-faults -a ../one_page_play/page
>
>   50.41%  page  ld-2.19.so         [.] do_lookup_x
>           |
>           --- do_lookup_x
>
>   44.21%  perf  [kernel.kallsyms]  [k] iov_iter_fault_in_readable
>           |
>           --- iov_iter_fault_in_readable
>               generic_file_buffered_write
>               __generic_file_aio_write
>               generic_file_aio_write
>               ext4_file_write
>               do_sync_write
>               vfs_write
>               sys_write
>               system_call_fastpath
>               __write_nocancel
>               0x4081a5
>               0x407a40
>               __libc_start_main
>
>    4.13%  perf  perf               [.] 0x0000000000015b54
>           |
>           --- 0x415b54
>               0x4081a5
>               0x407a40
>               __libc_start_main
>
>    0.41%  page  page               [.] _start
>           |
>           --- _start
>
>    0.41%  page  page               [.] target
>           |
>           --- target
>
>    0.41%  page  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
>           |
>           --- copy_user_enhanced_fast_string
>               load_elf_binary
>               search_binary_handler
>               do_execve_common.isra.23
>               sys_execve
>               stub_execve
>               __execve
>
> What is interesting is that the output differs based on whether I've
> dropped the kernel caches before I run perf record. When I do that,
> there are no page faults attributed to the entry point or the target
> of the program. I would imagine that, after dropping caches, the
> readahead handler picks up those pages when the binary is loaded and
> negates the need for a page fault. That, indeed, seems to be the
> case. I can see that when I run perf with an additional
> block:block_rq_issue event. Immediately after dropping the caches,
> there is a block request event. On subsequent executions, there is no
> such event, but there are the page faults that I expect.
>
> What I did notice is that some of the page faults come from the task
> before execve is called. From what I've seen, this looks like the
> kernel reclaiming pages from the spawning process before it is
> replaced with the new binary (during the call to execve).
> After the execve, there seem to be two page faults:
>
>   page-4613  [006] 171795.748310: funcgraph_entry:   0.151 us | mutex_unlock();
>   page-4613  [006] 171795.748313: funcgraph_entry:   0.166 us | __fsnotify_parent();
>   page-4613  [006] 171795.748313: funcgraph_entry:   0.321 us | fsnotify();
>   page-4613  [006] 171795.748314: funcgraph_entry:   0.090 us | __sb_end_write();
>   page-4613  [006] 171795.748317: funcgraph_entry:            | trace_do_page_fault() {
>   page-4613  [006] 171795.748317: page_fault_user: address=__per_cpu_end ip=__per_cpu_end error_code=0x4
>   page-4613  [006] 171795.748319: funcgraph_exit:    2.254 us | }
>   page-4613  [006] 171795.748321: funcgraph_entry:            | trace_do_page_fault() {
>   page-4613  [006] 171795.748322: page_fault_user: address=__per_cpu_end ip=__per_cpu_end error_code=0x14
>   page-4613  [006] 171795.748323: funcgraph_exit:    1.144 us | }
>
> NOTICE THIS:
>
>   page-4613  [006] 171795.748324: funcgraph_entry:            | sys_execve() {
>   page-4613  [007] 171795.748391: block_rq_issue: 8,0 R 0 () 764812912 + 16 [trace-cmd]
>   page-4613  [005] 171795.759476: funcgraph_exit: # 11152.111 us | }
>   page-4613  [005] 171795.759477: funcgraph_entry:   3.745 us | do_notify_resume();
>   page-4613  [005] 171795.759481: funcgraph_entry:            | trace_do_page_fault() {
>   page-4613  [005] 171795.759482: page_fault_user: address=__per_cpu_end ip=__per_cpu_end error_code=0x14
>   page-4613  [005] 171795.759487: funcgraph_exit:    5.833 us | }
>   page-4613  [005] 171795.759488: funcgraph_entry:            | trace_do_page_fault() {
>   page-4613  [005] 171795.759489: page_fault_user: address=__per_cpu_end ip=__per_cpu_end error_code=0x14
>   page-4613  [005] 171795.759490: funcgraph_exit:    2.003 us | }
>   page-4613  [005] 171795.759492: funcgraph_entry:            | sys_exit() {
>
> I wish that I could get something "better" than
> "address=__per_cpu_end ip=__per_cpu_end error_code=0x14" for those
> page faults. That would really tell me more about whether this is the
> "correct" behavior.
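[Editor's note: the cache-drop experiment described above can be
scripted. This is only a sketch under stated assumptions -- it needs
root and an installed perf, and `/bin/true` stands in for the
../one_page_play/page binary -- with guards so it degrades to a no-op
elsewhere.]

```shell
# Cold-vs-warm comparison for page faults and block I/O, as described
# in the thread: drop the page cache, record a cold run, then record a
# warm run. Guards make this a no-op when the prerequisites (root,
# perf) are missing.
cold_warm_compare() {
    target=${1:-/bin/true}   # stand-in for ../one_page_play/page
    if ! command -v perf >/dev/null 2>&1; then
        echo "perf not installed; skipping"
        return 0
    fi
    if [ ! -w /proc/sys/vm/drop_caches ]; then
        echo "need root to drop caches; skipping"
        return 0
    fi
    sync
    echo 3 > /proc/sys/vm/drop_caches   # flush the page cache (cold run)
    perf record -e page-faults,block:block_rq_issue -o cold.data "$target"
    perf record -e page-faults,block:block_rq_issue -o warm.data "$target"
    echo "recorded cold.data and warm.data"
}

cold_warm_compare
```

The cold run should show a block_rq_issue event and few user-page
faults for the binary's pages (readahead pulls them in); the warm run
should show the opposite, matching the observation above.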
Also, you might want to do a trace-cmd report -l, which lets you see
if the event happened in interrupt context.

> As ever, thank you very much for your help! Using these tools has
> been an incredible learning experience. I still think that I am just
> missing something stupid, but I really appreciate your patience.

No no, you got me curious as well. Although, the only page faults I
see are from before the exec and from the actual program.

Could you run with -F on your program with the other options you have
and report a full:

	trace-cmd report -l

And then you can ask about what you don't understand being there.

-- Steve