Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754547AbcK1IaJ (ORCPT ); Mon, 28 Nov 2016 03:30:09 -0500 Received: from mga14.intel.com ([192.55.52.115]:61515 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754367AbcK1I3z (ORCPT ); Mon, 28 Nov 2016 03:29:55 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.31,563,1473145200"; d="scan'208";a="36345279" Date: Mon, 28 Nov 2016 16:29:50 +0800 From: Aaron Lu To: xiaolong.ye@intel.com Cc: Linus Torvalds , Dave Hansen , LKML , lkp@01.org, linux-mm@kvack.org Subject: Re: [lkp] [mremap] 5d1904204c: will-it-scale.per_thread_ops -13.1% regression Message-ID: <20161128082950.GA1901@aaronlu.sh.intel.com> References: <20161127182153.GE2501@yexl-desktop> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20161127182153.GE2501@yexl-desktop> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 11033 Lines: 211 +linux-mm On Mon, Nov 28, 2016 at 02:21:53AM +0800, kernel test robot wrote: > > Greeting, Thanks for the report. > > FYI, we noticed a -13.1% regression of will-it-scale.per_thread_ops due to commit: I took a look at the test, it 1 creates an eventfd with the counter's initial value set to 0; 2 writes 1 to this eventfd, i.e. set its counter to 1; 3 does read, i.e. return the value of 1 and reset the counter to 0; 4 loop to step 2. I don't see move_vma/move_page_tables/move_ptes involved in this test though. I also tried trace-cmd to see if I missed anything: # trace-cmd record -p function --func-stack -l move_vma -l move_page_tables -l move_huge_pmd ./eventfd1_threads -t 1 -s 10 The report is: # /usr/local/bin/trace-cmd report CPU 0 is empty CPU 1 is empty CPU 2 is empty CPU 3 is empty CPU 4 is empty CPU 5 is empty CPU 6 is empty cpus=8 eventfd1_thread-21210 [007] 2626.438884: function: move_page_tables eventfd1_thread-21210 [007] 2626.438889: kernel_stack: => setup_arg_pages (ffffffff81282ac0) => load_elf_binary (ffffffff812e4503) => search_binary_handler (ffffffff81282268) => exec_binprm (ffffffff8149d6ab) => do_execveat_common.isra.41 (ffffffff81284882) => do_execve (ffffffff8128498c) => SyS_execve (ffffffff81284c3e) => do_syscall_64 (ffffffff81002a76) => return_from_SYSCALL_64 (ffffffff81cbd509) i.e., only one call of move_page_tables is in the log, and it's at the very beginning of the run, not during. I also did the run on my Sandybridge desktop(4 cores 8 threads with 12G memory) and a Broadwell EP, they don't have this performance drop either. Perhaps this problem only occurs on some machine. I'll continue to take a look at this report, hopefully I can figure out why this commit is bisected while its change doesn't seem to play a part in the test. Your comments are greatly appreciated. Thanks, Aaron > > > commit 5d1904204c99596b50a700f092fe49d78edba400 ("mremap: fix race between mremap() and page cleanning") > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master > > in testcase: will-it-scale > on test machine: 12 threads Intel(R) Core(TM) i7 CPU X 980 @ 3.33GHz with 6G memory > with following parameters: > > test: eventfd1 > cpufreq_governor: performance > > test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two. > test-url: https://github.com/antonblanchard/will-it-scale > > > Details are as below: > --------------------------------------------------------------------------------------------------> > > > To reproduce: > > git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git > cd lkp-tests > bin/lkp install job.yaml # job file is attached in this email > bin/lkp run job.yaml > > ========================================================================================= > compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase: > gcc-6/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/wsm/eventfd1/will-it-scale > > commit: > 961b708e95 (" fixes for amdgpu, and a bunch of arm drivers.") > 5d1904204c ("mremap: fix race between mremap() and page cleanning") > > 961b708e95181041 5d1904204c99596b50a700f092 > ---------------- -------------------------- > fail:runs %reproduction fail:runs > | | | > %stddev %change %stddev > \ | \ > 2459656 ? 0% -13.1% 2137017 ? 1% will-it-scale.per_thread_ops > 2865527 ? 3% +4.2% 2986100 ? 0% will-it-scale.per_process_ops > 0.62 ? 11% -13.2% 0.54 ? 1% will-it-scale.scalability > 893.40 ? 0% +1.3% 905.24 ? 0% will-it-scale.time.system_time > 169.92 ? 0% -7.0% 158.09 ? 0% will-it-scale.time.user_time > 176943 ? 6% +26.1% 223131 ? 11% cpuidle.C1E-NHM.time > 10.00 ? 6% -10.9% 8.91 ? 4% turbostat.CPU%c6 > 30508 ? 1% +3.4% 31541 ? 0% vmstat.system.cs > 27239 ? 0% +1.5% 27650 ? 0% vmstat.system.in > 2.03 ? 2% -11.6% 1.80 ? 6% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64 > 4.11 ? 1% -12.0% 3.61 ? 4% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_swapgs > 1.70 ? 3% -13.8% 1.46 ? 5% perf-profile.children.cycles-pp.__fget_light > 2.03 ? 2% -11.6% 1.80 ? 6% perf-profile.children.cycles-pp.entry_SYSCALL_64 > 4.11 ? 1% -12.0% 3.61 ? 4% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_swapgs > 12.79 ? 1% -10.0% 11.50 ? 6% perf-profile.children.cycles-pp.selinux_file_permission > 1.70 ? 3% -13.8% 1.46 ? 5% perf-profile.self.cycles-pp.__fget_light > 2.03 ? 2% -11.6% 1.80 ? 6% perf-profile.self.cycles-pp.entry_SYSCALL_64 > 4.11 ? 1% -12.0% 3.61 ? 4% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_swapgs > 5.85 ? 2% -12.5% 5.12 ? 5% perf-profile.self.cycles-pp.selinux_file_permission > 1.472e+12 ? 0% -5.5% 1.392e+12 ? 0% perf-stat.branch-instructions > 0.89 ? 0% -6.0% 0.83 ? 0% perf-stat.branch-miss-rate% > 1.303e+10 ? 0% -11.1% 1.158e+10 ? 0% perf-stat.branch-misses > 5.534e+08 ? 4% -6.9% 5.151e+08 ? 1% perf-stat.cache-references > 9347877 ? 1% +3.4% 9663609 ? 0% perf-stat.context-switches > 2.298e+12 ? 0% -5.6% 2.168e+12 ? 0% perf-stat.dTLB-loads > 1.525e+12 ? 1% -5.4% 1.442e+12 ? 0% perf-stat.dTLB-stores > 7.795e+12 ? 0% -5.5% 7.363e+12 ? 0% perf-stat.iTLB-loads > 6.694e+12 ? 1% -4.5% 6.391e+12 ? 2% perf-stat.instructions > 0.93 ? 0% -5.5% 0.88 ? 0% perf-stat.ipc > 119024 ? 5% -11.3% 105523 ? 8% sched_debug.cfs_rq:/.exec_clock.max > 5933459 ? 19% +24.5% 7385120 ? 3% sched_debug.cpu.nr_switches.max > 1684848 ? 15% +20.6% 2032107 ? 3% sched_debug.cpu.nr_switches.stddev > 5929704 ? 19% +24.5% 7382036 ? 3% sched_debug.cpu.sched_count.max > 1684318 ? 15% +20.6% 2031701 ? 3% sched_debug.cpu.sched_count.stddev > 2826278 ? 18% +30.4% 3684493 ? 3% sched_debug.cpu.sched_goidle.max > 804195 ? 14% +26.2% 1014783 ? 3% sched_debug.cpu.sched_goidle.stddev > 2969365 ? 19% +24.3% 3692180 ? 3% sched_debug.cpu.ttwu_count.max > 843614 ? 15% +20.5% 1016263 ? 3% sched_debug.cpu.ttwu_count.stddev > 2963657 ? 19% +24.4% 3687897 ? 3% sched_debug.cpu.ttwu_local.max > 843104 ? 15% +20.5% 1016333 ? 3% sched_debug.cpu.ttwu_local.stddev > > > > will-it-scale.time.user_time > > 172 ++--------------------*--------*---*----------------------------------+ > 170 ++..*....*...*....*. *.. . ..*.... ..*...*.... | > *. *....*. *. *...* > 168 ++ | > 166 ++ | > | | > 164 ++ | > 162 ++ | > 160 ++ | > | O O | > 158 ++ O O O | > 156 ++ O | > O O | > 154 ++ O O | > 152 ++--O-----------------------------------------------------------------+ > > > will-it-scale.time.system_time > > 912 ++--------------------------------------------------------------------+ > 910 ++ O O | > O O O | > 908 ++ O | > 906 ++ O O O | > 904 ++ O O | > 902 ++ | > | | > 900 ++ | > 898 ++ | > 896 ++ | > 894 ++ ..* > *...*.... ..*.... ..*....*...*....*...*...*....*. | > 892 ++ *. *...*...*....*...*. | > 890 ++--------------------------------------------------------------------+ > > > will-it-scale.per_thread_ops > > 2.55e+06 ++---------------------------------------------------------------+ > 2.5e+06 ++ .*...*.. .*...*... ..*. .*.. | > |.. . .. *...*....*. .. .*... .. . | > 2.45e+06 *+ * .. * *...*...* > 2.4e+06 ++ * | > 2.35e+06 ++ | > 2.3e+06 ++ | > | | > 2.25e+06 ++ | > 2.2e+06 ++ O | > 2.15e+06 ++ O O O O | > 2.1e+06 ++ O O | > O O O | > 2.05e+06 ++ O | > 2e+06 ++---------------------------------------------------------------+ > > [*] bisect-good sample > [O] bisect-bad sample > > > Disclaimer: > Results have been estimated based on internal Intel analysis and are provided > for informational purposes only. Any difference in system hardware or software > design or configuration may affect actual performance. > > > Thanks, > Xiaolong