Greetings,
FYI, we noticed a 92.0% improvement in fsmark.files_per_sec (the new helper is sketched below, after the test description) due to commit:
commit: 70bed0d5447e08702c7595d26c88ca37e8eb88b4 ("block: add a bdev_write_cache helper")
git://git.infradead.org/users/hch/block.git block-api
in testcase: fsmark
on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz with 128G memory
with the following parameters (an approximate fs_mark command line is sketched after this list):
iterations: 1x
nr_threads: 1t
disk: 1HDD
fs: btrfs
fs2: nfsv4
filesize: 4K
test_size: 40M
sync_method: fsyncBeforeClose
nr_files_per_directory: 1fpd
cpufreq_governor: performance
ucode: 0xd000331
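
These parameters map roughly onto an fs_mark invocation of the shape below. This is an illustrative sketch only: the mount point is hypothetical and the exact command line is generated from the attached job.yaml. With a 40M test size and 4K files, a single thread writes 10240 files, one file per directory, fsyncing each file before close (-S 1), for a single loop (-L 1).

  # illustrative only; /mnt/nfs-client is a hypothetical NFSv4 mount backed by btrfs on the single HDD
  fs_mark -d /mnt/nfs-client -t 1 -n 10240 -s 4096 -N 1 -S 1 -L 1
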
test-description: fsmark is a file system benchmark that tests synchronous write workloads, for example a mail server workload.
test-url: https://sourceforge.net/projects/fsmark/
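
For context, the commit under test adds a small block-layer accessor. Below is a minimal sketch of what such a helper looks like; the authoritative definition lives in include/linux/blkdev.h in the tree referenced above, and the assumption here is that it simply reports the QUEUE_FLAG_WC ("write cache") state of the device's request queue so callers no longer reach into queue internals directly.

  /*
   * Sketch of the helper added by 70bed0d544 (assumed implementation:
   * report whether the block device advertises a volatile write cache
   * by testing QUEUE_FLAG_WC on its request queue).
   */
  static inline bool bdev_write_cache(struct block_device *bdev)
  {
          return test_bit(QUEUE_FLAG_WC, &bdev_get_queue(bdev)->queue_flags);
  }

Callers in file systems and drivers that previously open-coded this queue-flag test can then use the helper instead; as the commit title suggests, the change is an API convenience rather than a new feature.
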
Details are as follows:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.
=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase/ucode:
gcc-11/performance/1HDD/4K/nfsv4/btrfs/1x/x86_64-rhel-8.3/1fpd/1t/debian-10.4-x86_64-20200603.cgz/fsyncBeforeClose/lkp-icl-2sp6/40M/fsmark/0xd000331
commit:
6cccbfebc0 ("block: add a bdev_nonrot helper")
70bed0d544 ("block: add a bdev_write_cache helper")
6cccbfebc02395ae 70bed0d5447e08702c7595d26c8
---------------- ---------------------------
%stddev %change %stddev
\ | \
19.10 +92.0% 36.67 fsmark.files_per_sec
536.13 -47.9% 279.40 fsmark.time.elapsed_time
536.13 -47.9% 279.40 fsmark.time.elapsed_time.max
53273 +2.7% 54708 fsmark.time.voluntary_context_switches
1.49 -2.1% 1.46 iostat.cpu.iowait
908369 ± 17% -39.4% 550808 ± 28% numa-numastat.node1.numa_hit
6.694e+10 -48.0% 3.482e+10 cpuidle..time
1.385e+08 -47.7% 72505602 cpuidle..usage
0.03 +0.0 0.04 ± 3% mpstat.cpu.all.sys%
0.01 ± 3% +0.0 0.01 ± 4% mpstat.cpu.all.usr%
577.53 -44.4% 321.22 uptime.boot
70827 -44.7% 39155 uptime.idle
2334 +102.7% 4732 vmstat.io.bo
3380 +45.5% 4919 vmstat.system.cs
1.38e+08 -47.7% 72098570 turbostat.IRQ
22732 ± 12% -38.8% 13910 ± 6% turbostat.POLL
51.67 -3.9% 49.67 ± 2% turbostat.PkgTmp
134519 +15.4% 155275 meminfo.Active
10873 ± 3% -32.7% 7312 meminfo.Active(anon)
123645 +19.7% 147962 meminfo.Active(file)
29545 -12.3% 25909 meminfo.Shmem
256478 -36.6% 162537 ± 39% numa-meminfo.node0.AnonHugePages
7918 ± 30% -55.5% 3522 ± 9% numa-meminfo.node1.Active(anon)
20189 ± 46% +484.8% 118058 ± 66% numa-meminfo.node1.AnonPages
55896 ± 34% +176.2% 154400 ± 47% numa-meminfo.node1.AnonPages.max
25261 ± 31% +383.3% 122094 ± 64% numa-meminfo.node1.Inactive(anon)
1467 ± 16% +26.8% 1860 ± 11% numa-meminfo.node1.PageTables
12916 ± 22% -45.2% 7081 ± 55% numa-meminfo.node1.Shmem
1978 ± 30% -55.5% 880.00 ± 9% numa-vmstat.node1.nr_active_anon
5049 ± 46% +484.5% 29514 ± 66% numa-vmstat.node1.nr_anon_pages
6319 ± 31% +383.1% 30528 ± 64% numa-vmstat.node1.nr_inactive_anon
366.00 ± 17% +26.8% 464.17 ± 10% numa-vmstat.node1.nr_page_table_pages
3231 ± 22% -45.1% 1773 ± 55% numa-vmstat.node1.nr_shmem
1978 ± 30% -55.5% 880.00 ± 9% numa-vmstat.node1.nr_zone_active_anon
6319 ± 31% +383.1% 30528 ± 64% numa-vmstat.node1.nr_zone_inactive_anon
907485 ± 17% -39.2% 551338 ± 28% numa-vmstat.node1.numa_hit
3311 +42.4% 4714 perf-stat.i.context-switches
133.20 +1.8% 135.58 perf-stat.i.cpu-migrations
2.952e+08 +4.3% 3.078e+08 perf-stat.i.dTLB-loads
1.587e+08 +4.3% 1.655e+08 perf-stat.i.dTLB-stores
2945 +4.7% 3084 perf-stat.i.minor-faults
94.72 -1.8 92.97 perf-stat.i.node-load-miss-rate%
6976 ± 19% +65.2% 11527 ± 14% perf-stat.i.node-loads
56884 ± 12% +51.6% 86264 ± 6% perf-stat.i.node-stores
2946 +4.7% 3085 perf-stat.i.page-faults
92.90 -2.4 90.53 perf-stat.overall.node-load-miss-rate%
3305 +42.1% 4697 perf-stat.ps.context-switches
2.946e+08 +4.1% 3.067e+08 perf-stat.ps.dTLB-loads
1.584e+08 +4.1% 1.649e+08 perf-stat.ps.dTLB-stores
2939 +4.5% 3072 perf-stat.ps.minor-faults
6962 ± 19% +64.9% 11483 ± 14% perf-stat.ps.node-loads
56769 ± 12% +51.4% 85938 ± 6% perf-stat.ps.node-stores
2940 +4.5% 3073 perf-stat.ps.page-faults
5.8e+11 ± 3% -46.4% 3.106e+11 ± 4% perf-stat.total.instructions
2718 ± 3% -32.8% 1826 proc-vmstat.nr_active_anon
30918 +19.5% 36954 proc-vmstat.nr_active_file
82517 +2.3% 84385 proc-vmstat.nr_anon_pages
170379 +5.1% 179015 proc-vmstat.nr_dirtied
160.83 +32.7% 213.50 proc-vmstat.nr_dirty
87111 +2.3% 89076 proc-vmstat.nr_inactive_anon
9165 +1.9% 9340 proc-vmstat.nr_mapped
1104 +7.4% 1186 proc-vmstat.nr_page_table_pages
7386 -12.3% 6475 proc-vmstat.nr_shmem
170150 +5.0% 178704 proc-vmstat.nr_written
2718 ± 3% -32.8% 1826 proc-vmstat.nr_zone_active_anon
30918 +19.5% 36954 proc-vmstat.nr_zone_active_file
87111 +2.3% 89076 proc-vmstat.nr_zone_inactive_anon
161.33 +33.5% 215.33 proc-vmstat.nr_zone_write_pending
1722532 -29.5% 1214402 proc-vmstat.numa_hit
1606723 -31.6% 1098636 proc-vmstat.numa_local
1722459 -29.5% 1214419 proc-vmstat.pgalloc_normal
1723177 -42.0% 999350 proc-vmstat.pgfault
1598401 -32.6% 1077857 proc-vmstat.pgfree
1260337 +6.1% 1337822 proc-vmstat.pgpgout
145698 -44.0% 81595 proc-vmstat.pgreuse
34.69 ± 24% +42.8% 49.55 ± 16% sched_debug.cfs_rq:/.load_avg.avg
49.89 ± 7% +55.5% 77.57 ± 4% sched_debug.cfs_rq:/.runnable_avg.avg
633.30 ± 2% +15.3% 730.40 ± 7% sched_debug.cfs_rq:/.runnable_avg.max
116.72 ± 8% +25.6% 146.60 ± 6% sched_debug.cfs_rq:/.runnable_avg.stddev
49.75 ± 7% +55.3% 77.28 ± 4% sched_debug.cfs_rq:/.util_avg.avg
632.78 ± 2% +15.1% 728.53 ± 7% sched_debug.cfs_rq:/.util_avg.max
116.60 ± 8% +25.5% 146.31 ± 6% sched_debug.cfs_rq:/.util_avg.stddev
4.18 ± 22% +53.9% 6.44 ± 15% sched_debug.cfs_rq:/.util_est_enqueued.avg
178.60 ± 10% +39.6% 249.40 ± 10% sched_debug.cfs_rq:/.util_est_enqueued.max
22.91 ± 16% +41.2% 32.34 ± 8% sched_debug.cfs_rq:/.util_est_enqueued.stddev
116236 ± 8% +24.6% 144804 ± 7% sched_debug.cpu.avg_idle.stddev
259878 ± 5% -38.2% 160679 sched_debug.cpu.clock.avg
259881 ± 5% -38.2% 160683 sched_debug.cpu.clock.max
259874 ± 5% -38.2% 160675 sched_debug.cpu.clock.min
1.97 ± 7% +14.2% 2.26 ± 9% sched_debug.cpu.clock.stddev
255028 ± 4% -38.2% 157678 sched_debug.cpu.clock_task.avg
255665 ± 5% -38.2% 158126 sched_debug.cpu.clock_task.max
249556 ± 5% -39.2% 151775 sched_debug.cpu.clock_task.min
11619 ± 3% -22.5% 9002 sched_debug.cpu.curr->pid.max
1173 ± 4% -9.7% 1059 ± 3% sched_debug.cpu.curr->pid.stddev
0.03 ± 7% +26.7% 0.03 ± 5% sched_debug.cpu.nr_running.avg
0.15 ± 2% +10.2% 0.16 ± 2% sched_debug.cpu.nr_running.stddev
8223 ± 4% -15.8% 6924 sched_debug.cpu.nr_switches.avg
1411 ± 9% -22.7% 1090 ± 13% sched_debug.cpu.nr_switches.min
259875 ± 5% -38.2% 160676 sched_debug.cpu_clk
259153 ± 5% -38.3% 159957 sched_debug.ktime
261040 ± 5% -38.2% 161334 sched_debug.sched_clk
53.97 ± 6% -6.4 47.54 ± 2% perf-profile.calltrace.cycles-pp.mwait_idle_with_hints.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
54.36 ± 6% -6.4 47.99 ± 2% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
84.36 -2.3 82.02 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
92.94 -2.0 90.93 perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.secondary_startup_64_no_verify
85.53 -2.0 83.56 perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.secondary_startup_64_no_verify
0.92 ± 11% +0.1 1.07 ± 4% perf-profile.calltrace.cycles-pp.rcu_idle_exit.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
0.29 ±100% +0.4 0.74 ± 10% perf-profile.calltrace.cycles-pp.rcu_core.__softirqentry_text_start.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
0.29 ±101% +0.5 0.81 ± 11% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork
0.10 ±223% +0.5 0.64 ± 10% perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork
3.04 ± 8% +0.5 3.58 ± 10% perf-profile.calltrace.cycles-pp.__softirqentry_text_start.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
1.18 ± 7% +0.9 2.05 ± 18% perf-profile.calltrace.cycles-pp.ret_from_fork
1.18 ± 7% +0.9 2.05 ± 18% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
28.17 ± 7% +3.8 31.99 ± 2% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
54.25 ± 6% -6.4 47.90 ± 2% perf-profile.children.cycles-pp.mwait_idle_with_hints
54.65 ± 6% -6.3 48.32 ± 2% perf-profile.children.cycles-pp.intel_idle
86.06 -2.0 84.03 perf-profile.children.cycles-pp.cpuidle_enter_state
86.29 -2.0 84.28 perf-profile.children.cycles-pp.cpuidle_enter
93.82 -2.0 91.85 perf-profile.children.cycles-pp.cpuidle_idle_call
0.07 ± 21% +0.0 0.11 ± 12% perf-profile.children.cycles-pp.can_stop_idle_tick
0.05 ± 50% +0.0 0.09 ± 26% perf-profile.children.cycles-pp.mmap_region
0.04 ± 47% +0.0 0.09 ± 22% perf-profile.children.cycles-pp.call_transmit
0.04 ± 47% +0.0 0.09 ± 22% perf-profile.children.cycles-pp.xprt_transmit
0.06 ± 11% +0.0 0.11 ± 24% perf-profile.children.cycles-pp.process_backlog
0.06 ± 17% +0.0 0.11 ± 20% perf-profile.children.cycles-pp.__local_bh_enable_ip
0.04 ± 72% +0.0 0.09 ± 29% perf-profile.children.cycles-pp.handle_irq_event
0.04 ± 72% +0.0 0.09 ± 29% perf-profile.children.cycles-pp.__handle_irq_event_percpu
0.05 ± 45% +0.1 0.10 ± 20% perf-profile.children.cycles-pp.ip6_protocol_deliver_rcu
0.05 ± 45% +0.1 0.10 ± 20% perf-profile.children.cycles-pp.tcp_v6_rcv
0.04 ± 74% +0.1 0.10 ± 27% perf-profile.children.cycles-pp.rpc_async_schedule
0.07 ± 23% +0.1 0.12 ± 21% perf-profile.children.cycles-pp.ip6_finish_output2
0.04 ± 72% +0.1 0.09 ± 30% perf-profile.children.cycles-pp.__common_interrupt
0.05 ± 45% +0.1 0.10 ± 20% perf-profile.children.cycles-pp.ip6_input_finish
0.05 ± 46% +0.1 0.10 ± 19% perf-profile.children.cycles-pp.__netif_receive_skb_one_core
0.06 ± 13% +0.1 0.11 ± 21% perf-profile.children.cycles-pp.__napi_poll
0.31 ± 10% +0.1 0.37 ± 4% perf-profile.children.cycles-pp.error_entry
0.07 ± 23% +0.1 0.13 ± 18% perf-profile.children.cycles-pp.ip6_xmit
0.04 ± 73% +0.1 0.09 ± 23% perf-profile.children.cycles-pp.xs_tcp_send_request
0.06 ± 13% +0.1 0.12 ± 18% perf-profile.children.cycles-pp.net_rx_action
0.04 ± 73% +0.1 0.09 ± 22% perf-profile.children.cycles-pp.xprt_request_transmit
0.04 ± 71% +0.1 0.09 ± 23% perf-profile.children.cycles-pp.tcp_v6_do_rcv
0.04 ± 71% +0.1 0.09 ± 23% perf-profile.children.cycles-pp.tcp_rcv_established
0.02 ±145% +0.1 0.08 ± 26% perf-profile.children.cycles-pp.inode_permission
0.07 ± 23% +0.1 0.13 ± 17% perf-profile.children.cycles-pp.inet6_csk_xmit
0.08 ± 17% +0.1 0.14 ± 14% perf-profile.children.cycles-pp.__tcp_transmit_skb
0.05 ± 48% +0.1 0.11 ± 20% perf-profile.children.cycles-pp.rpc_run_task
0.04 ± 71% +0.1 0.10 ± 23% perf-profile.children.cycles-pp.queue_work_on
0.05 ± 46% +0.1 0.11 ± 20% perf-profile.children.cycles-pp.rpc_execute
0.08 ± 23% +0.1 0.15 ± 23% perf-profile.children.cycles-pp.svc_recv
0.08 ± 25% +0.1 0.15 ± 38% perf-profile.children.cycles-pp.do_softirq
0.07 ± 9% +0.1 0.14 ± 16% perf-profile.children.cycles-pp.__tcp_push_pending_frames
0.07 ± 11% +0.1 0.14 ± 16% perf-profile.children.cycles-pp.tcp_write_xmit
0.10 ± 23% +0.1 0.18 ± 12% perf-profile.children.cycles-pp.__rpc_execute
0.08 ± 14% +0.1 0.15 ± 14% perf-profile.children.cycles-pp.__queue_work
0.07 ± 10% +0.1 0.15 ± 15% perf-profile.children.cycles-pp.tcp_sock_set_cork
0.15 ± 16% +0.1 0.24 ± 14% perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
0.13 ± 27% +0.1 0.23 ± 24% perf-profile.children.cycles-pp.open
0.22 ± 12% +0.1 0.32 ± 11% perf-profile.children.cycles-pp.try_to_wake_up
0.18 ± 18% +0.1 0.30 ± 19% perf-profile.children.cycles-pp.perf_trace_sched_switch
0.03 ±100% +0.1 0.16 ± 45% perf-profile.children.cycles-pp.btree_csum_one_bio
0.03 ±100% +0.1 0.16 ± 45% perf-profile.children.cycles-pp.csum_one_extent_buffer
0.29 ± 17% +0.2 0.44 ± 10% perf-profile.children.cycles-pp.unwind_next_frame
0.32 ± 27% +0.2 0.48 ± 14% perf-profile.children.cycles-pp.io_serial_in
0.40 ± 17% +0.2 0.59 ± 10% perf-profile.children.cycles-pp.get_perf_callchain
0.40 ± 17% +0.2 0.59 ± 10% perf-profile.children.cycles-pp.perf_callchain
0.34 ± 16% +0.2 0.53 ± 10% perf-profile.children.cycles-pp.perf_callchain_kernel
0.45 ± 18% +0.2 0.64 ± 10% perf-profile.children.cycles-pp.process_one_work
0.43 ± 16% +0.2 0.62 ± 10% perf-profile.children.cycles-pp.perf_prepare_sample
0.36 ± 19% +0.2 0.58 ± 12% perf-profile.children.cycles-pp.note_gp_changes
0.48 ± 16% +0.2 0.71 ± 11% perf-profile.children.cycles-pp.perf_event_output_forward
0.55 ± 12% +0.2 0.79 ± 8% perf-profile.children.cycles-pp.rcu_core
0.48 ± 15% +0.2 0.72 ± 11% perf-profile.children.cycles-pp.__perf_event_overflow
0.50 ± 15% +0.2 0.75 ± 11% perf-profile.children.cycles-pp.perf_tp_event
0.52 ± 14% +0.3 0.81 ± 11% perf-profile.children.cycles-pp.worker_thread
0.99 ± 13% +0.3 1.28 ± 5% perf-profile.children.cycles-pp.irqtime_account_irq
1.54 ± 12% +0.4 1.90 ± 4% perf-profile.children.cycles-pp.sched_clock_cpu
3.20 ± 8% +0.7 3.87 ± 9% perf-profile.children.cycles-pp.__softirqentry_text_start
3.88 ± 9% +0.8 4.64 ± 10% perf-profile.children.cycles-pp.__irq_exit_rcu
1.18 ± 7% +0.9 2.05 ± 18% perf-profile.children.cycles-pp.kthread
1.19 ± 7% +0.9 2.07 ± 18% perf-profile.children.cycles-pp.ret_from_fork
25.11 ± 8% +3.4 28.51 ± 2% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
54.23 ± 6% -6.4 47.83 ± 2% perf-profile.self.cycles-pp.mwait_idle_with_hints
0.22 ± 11% +0.1 0.30 ± 17% perf-profile.self.cycles-pp.sched_clock_cpu
0.32 ± 27% +0.2 0.48 ± 14% perf-profile.self.cycles-pp.io_serial_in
1.19 ± 12% +0.2 1.44 ± 4% perf-profile.self.cycles-pp.native_sched_clock
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://01.org/lkp