2018-05-20 02:32:30

by kernel test robot

Subject: [lkp-robot] [sched/numa] 789ba28013: pxz.throughput -5.8% regression


Greetings,

FYI, we noticed a -5.8% regression of pxz.throughput due to commit:


commit: 789ba28013ce23dbf5e9f5f014f4233b35523bf3 ("Revert "sched/numa: Delay retrying placement for automatic NUMA balance after wake_affine()"")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: pxz
on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 64G memory
with the following parameters:

nr_threads: 25%
cpufreq_governor: performance



Details are as below:
-------------------------------------------------------------------------------------------------->

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/testcase:
gcc-7/performance/x86_64-rhel-7.2/25%/debian-x86_64-2016-08-31.cgz/lkp-bdw-ep3/pxz

commit:
94d7dbf108 (" - A stable fix for DM integrity to use kvfree.")
789ba28013 ("Revert "sched/numa: Delay retrying placement for automatic NUMA balance after wake_affine()"")

94d7dbf108813ea4 789ba28013ce23dbf5e9f5f014
---------------- --------------------------
         %stddev    %change           %stddev
             \         |                  \
 1.113e+08            -5.8%   1.048e+08        pxz.throughput
      8327 ±  3%      +9.2%        9094 ±  2%  pxz.time.involuntary_context_switches
   1872519 ±  3%     +25.8%     2356056 ±  3%  pxz.time.minor_page_faults
      1958            -3.9%        1881        pxz.time.percent_of_cpu_this_job_got
     69.87            -1.9%       68.55        pxz.time.system_time
      5976            -3.3%        5779        pxz.time.user_time
   2800729 ±  3%     +21.9%     3412916 ±  4%  interrupts.CAL:Function_call_interrupts
     98846            +1.4%      100214        vmstat.system.in
    493.25 ±  6%     +21.4%      599.00 ±  5%  slabinfo.skbuff_fclone_cache.active_objs
    493.25 ±  6%     +21.4%      599.00 ±  5%  slabinfo.skbuff_fclone_cache.num_objs
     28978 ± 16%     +34.5%       38972 ±  4%  numa-meminfo.node0.SReclaimable
     90527 ± 10%     +18.7%      107450 ±  2%  numa-meminfo.node0.Slab
     13699 ± 14%     -25.7%       10176        numa-meminfo.node1.Mapped
     36266 ± 13%     -27.3%       26383 ±  6%  numa-meminfo.node1.SReclaimable
     26839 ± 22%     -47.2%       14178 ± 39%  numa-meminfo.node1.Shmem
      2902 ± 19%     +31.9%        3828 ±  2%  numa-vmstat.node0.nr_mapped
      7244 ± 16%     +34.5%        9743 ±  4%  numa-vmstat.node0.nr_slab_reclaimable
      3497 ± 15%     -26.1%        2586 ±  2%  numa-vmstat.node1.nr_mapped
      6727 ± 22%     -47.1%        3560 ± 39%  numa-vmstat.node1.nr_shmem
      9067 ± 13%     -27.3%        6596 ±  6%  numa-vmstat.node1.nr_slab_reclaimable
   1186310 ±  5%     +34.2%     1591738 ±  4%  proc-vmstat.numa_hint_faults
   1103415 ±  6%     +27.5%     1406754 ±  4%  proc-vmstat.numa_hint_faults_local
     40480 ±  3%     +29.4%       52378 ±  3%  proc-vmstat.numa_huge_pte_updates
   1052001 ±  8%     +41.4%     1487662 ±  4%  proc-vmstat.numa_pages_migrated
  21884436 ±  4%     +29.7%    28383097 ±  3%  proc-vmstat.numa_pte_updates
   2667563 ±  2%     +18.4%     3158089 ±  2%  proc-vmstat.pgfault
   1052001 ±  8%     +41.4%     1487662 ±  4%  proc-vmstat.pgmigrate_success
      1278 ±  2%     -19.8%        1025 ±  2%  proc-vmstat.thp_split_pmd
      3.38 ±  7%       -2.0        1.38 ±101%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_kernel
      3.43 ±  7%       -1.9        1.56 ± 84%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64
      3.44 ±  7%       -1.7        1.72 ± 68%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64
      3.44 ±  7%       -1.7        1.72 ± 68%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_kernel.secondary_startup_64
      3.44 ±  7%       -1.7        1.72 ± 68%  perf-profile.calltrace.cycles-pp.start_kernel.secondary_startup_64
      3.44 ±  7%       -1.7        1.72 ± 68%  perf-profile.children.cycles-pp.start_kernel
      0.08 ± 23%       +0.0        0.11 ±  7%  perf-profile.children.cycles-pp.anon_pipe_buf_release
      0.35 ±  4%       +0.0        0.39 ±  8%  perf-profile.children.cycles-pp.ktime_get
      0.12 ±  5%       +0.0        0.14 ±  9%  perf-profile.self.cycles-pp.__hrtimer_run_queues
      0.15 ±  8%       +0.0        0.18 ±  2%  perf-profile.self.cycles-pp._raw_spin_lock
      0.08 ± 23%       +0.0        0.11 ±  7%  perf-profile.self.cycles-pp.anon_pipe_buf_release
     17.47 ±  6%     -12.1%       15.36 ±  8%  sched_debug.cfs_rq:/.runnable_load_avg.avg
      9.62 ±  2%     -15.4%        8.13 ±  6%  sched_debug.cpu.cpu_load[0].avg
      9.47 ±  5%     -12.6%        8.28 ±  3%  sched_debug.cpu.cpu_load[1].avg
     11487 ±  8%     -15.3%        9724 ±  4%  sched_debug.cpu.load.avg
    386.67 ± 22%     +43.7%      555.50 ± 15%  sched_debug.cpu.nr_switches.min
      5325 ±  5%     -16.6%        4441 ±  4%  sched_debug.cpu.nr_switches.stddev
      0.00 ± 57%    +100.0%        0.01 ± 11%  sched_debug.cpu.nr_uninterruptible.avg
    119.04 ± 44%     +71.9%      204.67 ± 24%  sched_debug.cpu.sched_count.min
     12503 ± 16%     -22.4%        9698 ± 13%  sched_debug.cpu.sched_goidle.max
     55.83 ± 44%     +78.0%       99.38 ± 23%  sched_debug.cpu.sched_goidle.min
      2214 ±  4%     -23.0%        1705 ±  7%  sched_debug.cpu.sched_goidle.stddev
     13757 ± 15%     -18.9%       11158 ± 12%  sched_debug.cpu.ttwu_count.max
     72.42 ± 16%     +59.4%      115.46 ± 23%  sched_debug.cpu.ttwu_count.min
      2213 ±  6%     -18.6%        1802 ±  7%  sched_debug.cpu.ttwu_count.stddev
     41.67 ± 10%     +36.6%       56.92 ± 14%  sched_debug.cpu.ttwu_local.min
 1.829e+12            -5.1%   1.736e+12        perf-stat.branch-instructions
 4.612e+10            -4.7%   4.396e+10        perf-stat.branch-misses
     43.24             -0.7       42.51        perf-stat.cache-miss-rate%
 9.154e+10            -6.2%   8.584e+10        perf-stat.cache-misses
 2.117e+11            -4.6%   2.019e+11        perf-stat.cache-references
      1.52            +2.5%        1.56        perf-stat.cpi
 1.774e+13            -2.8%   1.725e+13        perf-stat.cpu-cycles
    863.00 ± 15%    +118.4%        1884 ±  2%  perf-stat.cpu-migrations
      0.02             +0.0        0.02 ±  2%  perf-stat.dTLB-load-miss-rate%
 3.512e+12            -4.5%   3.356e+12        perf-stat.dTLB-loads
      0.02             +0.0        0.02 ±  2%  perf-stat.dTLB-store-miss-rate%
  2.11e+08           +12.4%   2.373e+08 ±  2%  perf-stat.dTLB-store-misses
 1.393e+12            -5.5%   1.316e+12        perf-stat.dTLB-stores
 1.164e+13            -5.1%   1.104e+13        perf-stat.instructions
      0.66            -2.5%        0.64        perf-stat.ipc
   2642298 ±  2%     +18.6%     3133154 ±  2%  perf-stat.minor-faults
      2.09 ± 29%       +4.5        6.55 ±  6%  perf-stat.node-load-miss-rate%
 1.387e+09 ± 31%    +170.7%   3.753e+09 ± 11%  perf-stat.node-load-misses
 6.456e+10 ±  7%     -17.3%   5.337e+10 ±  6%  perf-stat.node-loads
      2.59 ±  7%       +3.8        6.44 ± 11%  perf-stat.node-store-miss-rate%
 1.674e+08 ±  7%    +132.0%   3.883e+08 ± 14%  perf-stat.node-store-misses
 6.285e+09           -10.6%   5.619e+09 ±  3%  perf-stat.node-stores
   2642302 ±  2%     +18.6%     3133157 ±  2%  perf-stat.page-faults
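
The %change column and a few of the derived perf-stat rows can be cross-checked from the raw values in the table. The following is only a rough sketch, not part of the robot's tooling: the helper names pct_change and miss_rate are hypothetical, and the miss-rate formula (misses divided by misses plus loads) is an assumption that happens to reproduce the reported node-load-miss-rate% to within rounding of the inputs.

```python
def pct_change(base, new):
    """Relative change of `new` versus `base`, in percent."""
    return (new - base) / base * 100.0


def miss_rate(misses, loads):
    """Misses as a percentage of (misses + loads). Assumed formula."""
    return misses / (misses + loads) * 100.0


# Values copied from the table: (94d7dbf108, 789ba28013).
throughput = (1.113e8, 1.048e8)
cpu_cycles = (1.774e13, 1.725e13)
instructions = (1.164e13, 1.104e13)
node_load_misses = (1.387e9, 3.753e9)
node_loads = (6.456e10, 5.337e10)

print(f"pxz.throughput change: {pct_change(*throughput):+.1f}%")

for label, i in (("94d7dbf108", 0), ("789ba28013", 1)):
    # Cycles per instruction, computed from the two raw counters.
    cpi = cpu_cycles[i] / instructions[i]
    rate = miss_rate(node_load_misses[i], node_loads[i])
    print(f"{label}: cpi={cpi:.2f} node-load-miss-rate={rate:.2f}%")
```

With the table's rounded inputs this gives the reported -5.8% throughput change and the reported cpi values; the node-load miss rates come out within a few hundredths of a point of the table, which is what one would expect from inputs rounded to four significant digits.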



pxz.throughput

[ASCII plot; layout lost in extraction. Y axis runs from 1.02e+08 to 1.13e+08. The "+" samples lie between about 1.11e+08 and 1.12e+08; the "O" samples lie between about 1.03e+08 and 1.07e+08.]





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Xiaolong


Attachments:
(No filename) (9.19 kB)
config-4.17.0-rc4-00046-g789ba28 (167.06 kB)
job-script (6.61 kB)
job.yaml (4.33 kB)
reproduce (263.00 B)