Greetings,
FYI, we noticed a -4.7% regression of fio.read_iops due to commit:
commit: 75cc3c9161cd95f43ebf6c6a938d4d98ab195bbd ("mm/lru: move lock into lru_note_cost")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: fio-basic
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 256G memory
with the following parameters (an illustrative fio invocation follows the list):
disk: 2pmem
fs: ext4
runtime: 200s
nr_task: 50%
time_based: tb
rw: randread
bs: 4k
ioengine: mmap
test_size: 200G
cpufreq_governor: performance
ucode: 0x5003006
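
For reference, the parameters above correspond roughly to a fio invocation
like the sketch below. The mount point and the numjobs value (48 = 50% of
the 96 CPU threads) are illustrative assumptions; the job.yaml attached to
this email remains authoritative.

	# Illustrative sketch only -- see the attached job.yaml for the real job.
	# /fs/pmem0 (ext4 on one of the two pmem disks) is an assumed mount point.
	fio --name=randread --directory=/fs/pmem0 \
	    --rw=randread --bs=4k --ioengine=mmap \
	    --size=200G --runtime=200s --time_based \
	    --numjobs=48
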
test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio
If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
bin/lkp run generated-yaml-file
=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/time_based/ucode:
4k/gcc-9/performance/2pmem/ext4/mmap/x86_64-rhel-8.3/50%/debian-10.4-x86_64-20200603.cgz/200s/randread/lkp-csl-2sp6/200G/fio-basic/tb/0x5003006
commit:
c7c7b80c39 ("mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn")
75cc3c9161 ("mm/lru: move lock into lru_note_cost")
c7c7b80c39a18d99 75cc3c9161cd95f43ebf6c6a938
---------------- ---------------------------
%stddev %change %stddev
0.06 +0.0 0.06 fio.latency_20ms%
0.17 ± 6% -0.1 0.10 ± 12% fio.latency_250us%
2.42 ± 5% +0.3 2.73 ± 4% fio.latency_50us%
10762 -4.7% 10251 fio.read_bw_MBps
15928 +5.4% 16792 fio.read_clat_mean_us
620449 ± 4% +15.2% 714702 ± 5% fio.read_clat_stddev
2755207 -4.7% 2624496 fio.read_iops
4.356e+09 -4.7% 4.15e+09 fio.time.file_system_inputs
548995 -13.6% 474105 fio.time.involuntary_context_switches
5.445e+08 -4.7% 5.188e+08 fio.time.major_page_faults
5.512e+08 -4.7% 5.252e+08 fio.workload
2.60 -4.2% 2.50 iostat.cpu.user
993.70 ± 5% -9.6% 898.57 ± 5% sched_debug.cfs_rq:/.util_est_enqueued.max
148.41 -1.2% 146.66 turbostat.RAMWatt
213.43 ± 3% -34.7% 139.43 ± 5% numa-vmstat.node0.nr_isolated_file
210.57 ± 5% -33.0% 141.14 ± 5% numa-vmstat.node1.nr_isolated_file
10692349 -4.9% 10171517 vmstat.io.bi
7671 -9.8% 6917 vmstat.system.cs
42.20 ± 3% +7.6% 45.42 ± 3% perf-sched.total_wait_and_delay.average.ms
42.18 ± 3% +7.6% 45.40 ± 3% perf-sched.total_wait_time.average.ms
10233 ± 2% -17.2% 8477 ± 3% perf-sched.wait_and_delay.count.__sched_text_start.__sched_text_start.preempt_schedule_common._cond_resched.__alloc_pages_slowpath
459.86 ± 7% +36.0% 625.57 ± 14% perf-sched.wait_and_delay.count.__sched_text_start.__sched_text_start.preempt_schedule_common._cond_resched.down_read
20707 ± 4% -9.2% 18791 ± 5% perf-sched.wait_and_delay.count.__sched_text_start.__sched_text_start.worker_thread.kthread.ret_from_fork
0.01 ± 8% +11171.4% 1.24 ±178% perf-sched.wait_time.avg.ms.__sched_text_start.__sched_text_start.preempt_schedule_common._cond_resched.__alloc_pages_nodemask
0.02 ± 71% +1.7e+05% 36.36 ±177% perf-sched.wait_time.max.ms.__sched_text_start.__sched_text_start.preempt_schedule_common._cond_resched.__alloc_pages_nodemask
46297815 -1.3% 45718611 interrupts.CAL:Function_call_interrupts
536.29 ± 11% -18.9% 434.71 ± 12% interrupts.CPU13.RES:Rescheduling_interrupts
562.71 ± 13% -18.5% 458.43 ± 7% interrupts.CPU16.RES:Rescheduling_interrupts
757634 ± 9% -13.6% 654501 ± 6% interrupts.CPU16.TLB:TLB_shootdowns
536.00 ± 13% -18.3% 438.00 ± 5% interrupts.CPU17.RES:Rescheduling_interrupts
550.57 ± 9% -21.3% 433.57 ± 8% interrupts.CPU18.RES:Rescheduling_interrupts
4251 ± 18% +60.7% 6833 ± 9% interrupts.CPU25.NMI:Non-maskable_interrupts
4251 ± 18% +60.7% 6833 ± 9% interrupts.CPU25.PMI:Performance_monitoring_interrupts
506.29 ± 11% -21.9% 395.57 ± 12% interrupts.CPU35.RES:Rescheduling_interrupts
772187 ± 11% -17.0% 640700 ± 9% interrupts.CPU35.TLB:TLB_shootdowns
752779 ± 10% -23.8% 573337 ± 19% interrupts.CPU37.TLB:TLB_shootdowns
374466 -4.8% 356349 proc-vmstat.allocstall_movable
8293 ± 2% -6.9% 7723 ± 2% proc-vmstat.kswapd_low_wmark_hit_quickly
426.29 -34.5% 279.14 ± 4% proc-vmstat.nr_isolated_file
4.288e+08 -5.8% 4.039e+08 proc-vmstat.numa_hit
4.287e+08 -5.8% 4.038e+08 proc-vmstat.numa_local
8297 ± 2% -6.9% 7727 ± 2% proc-vmstat.pageoutrun
20856484 -4.5% 19927281 proc-vmstat.pgalloc_dma32
5.25e+08 -4.7% 5e+08 proc-vmstat.pgalloc_normal
1.09e+09 -4.7% 1.038e+09 proc-vmstat.pgfault
5.355e+08 -4.8% 5.097e+08 proc-vmstat.pgfree
5.445e+08 -4.7% 5.187e+08 proc-vmstat.pgmajfault
2.178e+09 -4.7% 2.075e+09 proc-vmstat.pgpgin
9.606e+08 -5.0% 9.122e+08 proc-vmstat.pgscan_direct
1.079e+09 -4.7% 1.028e+09 proc-vmstat.pgscan_file
4.938e+08 -4.8% 4.698e+08 proc-vmstat.pgsteal_direct
5.345e+08 -4.8% 5.087e+08 proc-vmstat.pgsteal_file
40747069 ± 2% -4.6% 38881007 proc-vmstat.pgsteal_kswapd
33706519 -4.6% 32144491 proc-vmstat.workingset_refault_file
22.69 ± 9% -11.5 11.23 ± 9% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.shrink_inactive_list.shrink_lruvec.shrink_node
21.89 ± 9% -10.7 11.20 ± 9% perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.shrink_inactive_list.shrink_lruvec.shrink_node.do_try_to_free_pages
11.69 ± 10% -2.6 9.05 ± 9% perf-profile.calltrace.cycles-pp.shrink_page_list.shrink_inactive_list.shrink_lruvec.shrink_node.do_try_to_free_pages
4.06 ± 12% -1.8 2.27 ± 9% perf-profile.calltrace.cycles-pp.try_to_unmap_flush.shrink_page_list.shrink_inactive_list.shrink_lruvec.shrink_node
4.06 ± 12% -1.8 2.27 ± 9% perf-profile.calltrace.cycles-pp.arch_tlbbatch_flush.try_to_unmap_flush.shrink_page_list.shrink_inactive_list.shrink_lruvec
4.06 ± 12% -1.8 2.27 ± 9% perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush.shrink_page_list.shrink_inactive_list
3.96 ± 12% -1.7 2.22 ± 10% perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush.shrink_page_list
0.00 +12.5 12.46 ± 8% perf-profile.calltrace.cycles-pp.lru_note_cost.shrink_inactive_list.shrink_lruvec.shrink_node.do_try_to_free_pages
0.00 +12.5 12.50 ± 10% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.lru_note_cost.shrink_inactive_list.shrink_lruvec
0.00 +12.6 12.56 ± 10% perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.lru_note_cost.shrink_inactive_list.shrink_lruvec.shrink_node
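
The calltrace shift above -- lock-slowpath cycles moving out of
shrink_inactive_list() proper and reappearing under lru_note_cost() --
matches the shape of the blamed commit: the caller no longer charges reclaim
cost while it already holds the LRU lock; instead lru_note_cost() re-takes
the lock once per level of the memcg hierarchy walk. Below is a minimal C
sketch of the pattern, not the literal upstream diff; the exact lock object
at this point in the series and the cost-decay details are assumed/elided.

	/*
	 * Sketch only -- not the literal diff of 75cc3c9161cd.
	 *
	 * Before: shrink_inactive_list() already held the LRU lock and
	 * called lru_note_cost() under it:
	 *
	 *	spin_lock_irq(&pgdat->lru_lock);
	 *	...
	 *	lru_note_cost(lruvec, file, stat.nr_pageout);
	 *	...
	 *	spin_unlock_irq(&pgdat->lru_lock);
	 *
	 * After: lru_note_cost() acquires the lock itself, once per level
	 * of the memcg hierarchy, so under contention the queued-spinlock
	 * slowpath is entered repeatedly where a single hold used to do.
	 */
	void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages)
	{
		do {
			/* Node-wide lock assumed here; later patches in the
			 * series switch this to a per-lruvec lock. */
			spin_lock_irq(&lruvec_pgdat(lruvec)->lru_lock);
			if (file)
				lruvec->file_cost += nr_pages;
			else
				lruvec->anon_cost += nr_pages;
			/* cost-decay logic elided */
			spin_unlock_irq(&lruvec_pgdat(lruvec)->lru_lock);
		} while ((lruvec = parent_lruvec(lruvec)));
	}
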
12.95 ± 10% -2.9 10.09 ± 9% perf-profile.children.cycles-pp.shrink_page_list
4.34 ± 12% -1.9 2.46 ± 9% perf-profile.children.cycles-pp.try_to_unmap_flush
4.34 ± 12% -1.9 2.46 ± 9% perf-profile.children.cycles-pp.arch_tlbbatch_flush
4.34 ± 12% -1.9 2.46 ± 9% perf-profile.children.cycles-pp.on_each_cpu_cond_mask
4.24 ± 12% -1.8 2.41 ± 10% perf-profile.children.cycles-pp.smp_call_function_many_cond
2.77 ± 10% -0.3 2.44 ± 10% perf-profile.children.cycles-pp.page_referenced
2.09 ± 10% -0.2 1.84 ± 10% perf-profile.children.cycles-pp.page_referenced_one
1.98 ± 10% -0.2 1.75 ± 10% perf-profile.children.cycles-pp.page_vma_mapped_walk
0.98 ± 10% -0.1 0.85 ± 10% perf-profile.children.cycles-pp.isolate_lru_pages
0.37 ± 11% -0.0 0.32 ± 9% perf-profile.children.cycles-pp.sync_regs
0.09 ± 15% -0.0 0.05 ± 6% perf-profile.children.cycles-pp.smp_call_function_single
0.06 ± 10% +13.0 13.02 ± 8% perf-profile.children.cycles-pp.lru_note_cost
4.09 ± 12% -1.8 2.28 ± 10% perf-profile.self.cycles-pp.smp_call_function_many_cond
3.96 ± 11% -0.5 3.42 ± 10% perf-profile.self.cycles-pp.filemap_map_pages
1.58 ± 10% -0.2 1.40 ± 10% perf-profile.self.cycles-pp.page_vma_mapped_walk
0.30 ± 11% -0.1 0.23 ± 16% perf-profile.self.cycles-pp.__remove_mapping
0.09 ± 18% -0.0 0.04 ± 40% perf-profile.self.cycles-pp.smp_call_function_single
0.36 ± 10% -0.0 0.32 ± 9% perf-profile.self.cycles-pp.sync_regs
0.16 ± 10% -0.0 0.13 ± 14% perf-profile.self.cycles-pp.move_pages_to_lru
0.06 ± 10% +0.0 0.08 ± 10% perf-profile.self.cycles-pp.lru_note_cost
0.12 ± 12% +0.1 0.17 ± 14% perf-profile.self.cycles-pp._raw_spin_lock_irq
1.311e+10 -3.7% 1.262e+10 perf-stat.i.branch-instructions
1.148e+08 -4.1% 1.101e+08 perf-stat.i.branch-misses
4.887e+08 -5.1% 4.637e+08 perf-stat.i.cache-misses
6.388e+08 -4.3% 6.115e+08 perf-stat.i.cache-references
7634 -10.1% 6867 perf-stat.i.context-switches
2.24 +4.1% 2.33 perf-stat.i.cpi
329.58 +5.5% 347.60 perf-stat.i.cycles-between-cache-misses
1.598e+10 -3.8% 1.536e+10 perf-stat.i.dTLB-loads
8.453e+09 -4.6% 8.062e+09 perf-stat.i.dTLB-stores
2862085 -3.2% 2770116 perf-stat.i.iTLB-loads
6.452e+10 -3.9% 6.201e+10 perf-stat.i.instructions
0.46 -3.9% 0.45 perf-stat.i.ipc
2707617 -4.7% 2579975 perf-stat.i.major-faults
398.00 -4.0% 381.98 perf-stat.i.metric.M/sec
70003070 ± 2% -5.1% 66402385 ± 2% perf-stat.i.node-stores
2711171 -4.7% 2583543 perf-stat.i.page-faults
2.15 +4.1% 2.24 perf-stat.overall.cpi
283.80 +5.4% 299.24 perf-stat.overall.cycles-between-cache-misses
0.47 -3.9% 0.45 perf-stat.overall.ipc
23424 +1.1% 23674 perf-stat.overall.path-length
1.304e+10 -3.7% 1.256e+10 perf-stat.ps.branch-instructions
1.142e+08 -4.1% 1.095e+08 perf-stat.ps.branch-misses
4.864e+08 -5.2% 4.613e+08 perf-stat.ps.cache-misses
6.357e+08 -4.3% 6.083e+08 perf-stat.ps.cache-references
7594 -10.0% 6832 perf-stat.ps.context-switches
1.59e+10 -3.9% 1.528e+10 perf-stat.ps.dTLB-loads
8.412e+09 -4.7% 8.02e+09 perf-stat.ps.dTLB-stores
2847228 -3.2% 2755506 perf-stat.ps.iTLB-loads
6.42e+10 -3.9% 6.169e+10 perf-stat.ps.instructions
2694573 -4.8% 2566453 perf-stat.ps.major-faults
69668649 ± 2% -5.2% 66048410 ± 2% perf-stat.ps.node-stores
2698105 -4.7% 2570000 perf-stat.ps.page-faults
1.291e+13 -3.7% 1.243e+13 perf-stat.total.instructions
fio.read_bw_MBps
11000 +-------------------------------------------------------------------+
| + : : + + + :: +.+. |
10800 |-+ : :: : : : :: : +: : .+ ++. +. .+ .|
|.++. : + + : : +. .++.+.+ .+.+ : :.+.+ : + + +.+ + + |
10600 |-+ + + + + + + + + |
| |
10400 |-+ |
| O O O OO OO OO |
10200 |-+O O O O O O O O O O |
| O O O O |
10000 |-O O O O |
| O O O O O |
9800 |-+ O O |
| |
9600 +-------------------------------------------------------------------+
fio.read_iops
2.85e+06 +----------------------------------------------------------------+
| + + |
2.8e+06 |-+ :: + :: +.+ |
| +. : : :: + +.: : : : .+ .+ |
2.75e+06 |-++. : + : : : +. +. :+ .+. : + :.+ +.++.+ + +.|
|.+ + +.+ ++.+.++.+ +.+ + ++ + + |
2.7e+06 |-+ |
| |
2.65e+06 |-+ O O OO O |
| O O O O O O O O O |
2.6e+06 |-+ OO O O O O O O |
| O O O |
2.55e+06 |-O O O O O |
| O O O |
2.5e+06 +----------------------------------------------------------------+
fio.workload
5.7e+08 +-----------------------------------------------------------------+
| + + |
5.6e+08 |-+ :: + : : +.+ |
| +. : : :: + + : : + : .+ .+ |
5.5e+08 |-++. : + : : : .+ .+ + : .+ + + :+ +.++.+ + +.|
|.+ + +.+ +.++.+.++ +.+ + +.+ + + |
5.4e+08 |-+ |
| |
5.3e+08 |-+ O OO O O |
| O O O O O O O O O |
5.2e+08 |-+ OO O O O O O O |
| O O O |
5.1e+08 |-O O O O O |
| O O O |
5e+08 +-----------------------------------------------------------------+
fio.time.major_page_faults
5.6e+08 +-----------------------------------------------------------------+
| + + |
5.5e+08 |-+ + :: + + + : : .+.+ |
| +. :+ : : :: .+ +: .+ + :: : + +.++.+.+ .++.|
5.4e+08 |.+ + + +.+ : .+ .+.++.+ .+ + :.+ + + + + |
| + + + + |
5.3e+08 |-+ |
| O OO O O |
5.2e+08 |-+O O O O O O O O O |
| OO O O O O |
5.1e+08 |-+ O O O O |
| O O O O O O |
5e+08 |-+ O O O |
| O |
4.9e+08 +-----------------------------------------------------------------+
fio.time.file_system_inputs
4.45e+09 +----------------------------------------------------------------+
4.4e+09 |-+ :: + :: +.+ |
| +. : : :: + +.: : + :.+ .+ .+ |
4.35e+09 |-++. : + : : : +. +. :+ .+. : + :+ + +.+ + +.|
4.3e+09 |.+ + +.+ ++.+.++.+ +.+ + ++ + + |
| |
4.25e+09 |-+ |
4.2e+09 |-+ O O |
4.15e+09 |-+O O O O O O O |
| O O O O O O O O O O |
4.1e+09 |-+ O O O O |
4.05e+09 |-O O O O |
| O O O O O |
4e+09 |-+ O O |
3.95e+09 +----------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation
Thanks,
Oliver Sang