2023-09-22 17:28:19

by kernel test robot

Subject: [linus:master] [workqueue] 636b927eba: stress-ng.io.ops_per_sec 19.5% improvement



Hello,

kernel test robot noticed a 19.5% improvement of stress-ng.io.ops_per_sec on:


commit: 636b927eba5bc633753f8eb80f35e1d5be806e51 ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz (Skylake) with 32G memory
parameters:

nr_threads: 10%
disk: 1SSD
testtime: 60s
fs: xfs
class: filesystem
test: io
cpufreq_governor: performance






Details are below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230922/[email protected]

=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
filesystem/gcc-12/performance/1SSD/xfs/x86_64-rhel-8.3/10%/debian-11.1-x86_64-20220510.cgz/lkp-skl-d08/io/stress-ng/60s
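
The two lines above pair up position by position: the first is a slash-separated list of parameter names, the second the matching values. A minimal sketch of reading them back into named parameters follows. The helper name parse_job_path is hypothetical and is not part of lkp-tests; the two strings are copied from this report.

```python
# Key and value paths copied verbatim from this report.
KEYS = ("class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/"
        "rootfs/tbox_group/test/testcase/testtime")
VALUES = ("filesystem/gcc-12/performance/1SSD/xfs/x86_64-rhel-8.3/10%/"
          "debian-11.1-x86_64-20220510.cgz/lkp-skl-d08/io/stress-ng/60s")


def parse_job_path(keys, values):
    """Pair each slash-separated key with the value in the same position.

    Hypothetical helper for illustration; it assumes no value contains
    a '/' itself, which holds for the strings in this report.
    """
    key_list = keys.split("/")
    value_list = values.split("/")
    if len(key_list) != len(value_list):
        raise ValueError("key and value paths have different lengths")
    return dict(zip(key_list, value_list))


params = parse_job_path(KEYS, VALUES)
for name, value in params.items():
    print(f"{name:>16}: {value}")
```

Read this way, the fields not listed under "parameters:" become visible as well: compiler (gcc-12), kconfig (x86_64-rhel-8.3), rootfs and tbox_group (lkp-skl-d08).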

commit:
4cbfd3de73 ("workqueue: Call wq_update_unbound_numa() on all CPUs in NUMA node on CPU hotplug")
636b927eba ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues")

4cbfd3de737b9d00 636b927eba5bc633753f8eb80f3
---------------- ---------------------------
%stddev %change %stddev
\ | \
1.53 ± 2% +0.3 1.82 ± 3% mpstat.cpu.all.usr%
0.04 ± 25% -58.2% 0.02 ± 43% perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork
7.29 -2.7% 7.09 iostat.cpu.system
1.52 ± 2% +18.2% 1.80 ± 3% iostat.cpu.user
58.72 ± 46% -63.9% 21.18 ± 50% sched_debug.cfs_rq:/.removed.load_avg.avg
205.63 ± 27% -47.3% 108.41 ± 52% sched_debug.cfs_rq:/.removed.load_avg.stddev
0.13 ± 3% +13.8% 0.15 ± 4% turbostat.IPC
82.74 +1.5% 83.95 turbostat.PkgWatt
2954572 +19.5% 3529576 ± 4% stress-ng.io.ops
49242 +19.5% 58826 ± 4% stress-ng.io.ops_per_sec
151.67 -3.8% 145.86 stress-ng.time.system_time
27.02 +21.6% 32.86 ± 4% stress-ng.time.user_time
1.017e+09 +21.7% 1.238e+09 ± 3% perf-stat.i.branch-instructions
2.07 -0.4 1.71 ± 3% perf-stat.i.branch-miss-rate%
1.42e+08 +20.9% 1.717e+08 ± 2% perf-stat.i.cache-references
2.54 -17.6% 2.09 ± 4% perf-stat.i.cpi
0.13 -0.0 0.12 perf-stat.i.dTLB-load-miss-rate%
1359466 +19.1% 1618588 ± 4% perf-stat.i.dTLB-load-misses
1.134e+09 +19.6% 1.356e+09 ± 3% perf-stat.i.dTLB-loads
0.00 ± 7% -0.0 0.00 ± 3% perf-stat.i.dTLB-store-miss-rate%
5.421e+08 +19.6% 6.483e+08 ± 3% perf-stat.i.dTLB-stores
63.26 ± 4% +6.1 69.35 ± 2% perf-stat.i.iTLB-load-miss-rate%
5.08e+09 +20.7% 6.131e+09 ± 3% perf-stat.i.instructions
0.42 +19.3% 0.50 ± 3% perf-stat.i.ipc
78.71 +20.4% 94.79 ± 3% perf-stat.i.metric.M/sec
2.23 -0.4 1.85 ± 3% perf-stat.overall.branch-miss-rate%
0.33 ± 4% -0.1 0.28 ± 6% perf-stat.overall.cache-miss-rate%
2.44 -16.8% 2.03 ± 3% perf-stat.overall.cpi
0.00 ± 4% -0.0 0.00 ± 3% perf-stat.overall.dTLB-store-miss-rate%
62.97 ± 5% +7.3 70.29 ± 3% perf-stat.overall.iTLB-load-miss-rate%
0.41 +20.4% 0.49 ± 3% perf-stat.overall.ipc
1.001e+09 +21.7% 1.218e+09 ± 3% perf-stat.ps.branch-instructions
1.398e+08 +20.9% 1.69e+08 ± 2% perf-stat.ps.cache-references
1337922 +19.1% 1592892 ± 4% perf-stat.ps.dTLB-load-misses
1.116e+09 +19.6% 1.334e+09 ± 3% perf-stat.ps.dTLB-loads
5.335e+08 +19.6% 6.38e+08 ± 3% perf-stat.ps.dTLB-stores
4.999e+09 +20.7% 6.033e+09 ± 3% perf-stat.ps.instructions
3.167e+11 +20.4% 3.811e+11 ± 3% perf-stat.total.instructions
21.48 ± 3% -7.5 13.96 ± 12% perf-profile.calltrace.cycles-pp._raw_spin_lock.iterate_supers.ksys_sync.__x64_sys_sync.do_syscall_64
18.33 ± 3% -7.0 11.30 ± 14% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.iterate_supers.ksys_sync.__x64_sys_sync
35.16 ± 3% -6.9 28.21 ± 6% perf-profile.calltrace.cycles-pp.iterate_supers.ksys_sync.__x64_sys_sync.do_syscall_64.entry_SYSCALL_64_after_hwframe
36.20 ± 3% -6.9 29.35 ± 6% perf-profile.calltrace.cycles-pp.ksys_sync.__x64_sys_sync.do_syscall_64.entry_SYSCALL_64_after_hwframe.sync
36.20 ± 3% -6.9 29.35 ± 6% perf-profile.calltrace.cycles-pp.__x64_sys_sync.do_syscall_64.entry_SYSCALL_64_after_hwframe.sync
36.44 ± 3% -6.8 29.59 ± 6% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.sync
36.54 ± 3% -6.8 29.71 ± 6% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.sync
36.86 ± 3% -6.8 30.08 ± 6% perf-profile.calltrace.cycles-pp.sync
29.64 ± 8% -5.1 24.54 ± 15% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
29.64 ± 8% -5.1 24.54 ± 15% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
29.63 ± 8% -5.1 24.54 ± 15% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
29.45 ± 8% -5.1 24.37 ± 15% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
29.07 ± 8% -5.1 24.00 ± 15% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
29.78 ± 8% -4.1 25.66 ± 7% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
28.89 ± 8% -4.1 24.82 ± 7% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
26.19 ± 8% -4.0 22.20 ± 8% perf-profile.calltrace.cycles-pp.intel_idle_ibrs.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
9.73 ± 3% -1.8 7.94 ± 4% perf-profile.calltrace.cycles-pp.down_read.iterate_supers.ksys_sync.__x64_sys_sync.do_syscall_64
1.34 ± 5% +0.2 1.50 ± 4% perf-profile.calltrace.cycles-pp._find_next_bit.get_nr_inodes.get_nr_dirty_inodes.writeback_inodes_sb.sync_filesystem
2.11 ± 5% +0.2 2.32 ± 6% perf-profile.calltrace.cycles-pp.__entry_text_start.syncfs
1.14 ± 7% +0.2 1.36 ± 8% perf-profile.calltrace.cycles-pp.up_read.iterate_supers.ksys_sync.__x64_sys_sync.do_syscall_64
2.48 ± 6% +0.4 2.84 ± 4% perf-profile.calltrace.cycles-pp.down_read.__x64_sys_syncfs.do_syscall_64.entry_SYSCALL_64_after_hwframe.syncfs
4.64 ± 5% +0.6 5.23 ± 5% perf-profile.calltrace.cycles-pp.get_nr_inodes.get_nr_dirty_inodes.writeback_inodes_sb.sync_filesystem.__x64_sys_syncfs
4.99 ± 6% +0.6 5.58 ± 3% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.syncfs
4.33 ± 4% +0.6 4.97 ± 5% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syncfs
9.99 ± 5% +1.2 11.21 ± 4% perf-profile.calltrace.cycles-pp.get_nr_dirty_inodes.writeback_inodes_sb.sync_filesystem.__x64_sys_syncfs.do_syscall_64
10.31 ± 5% +1.3 11.56 ± 4% perf-profile.calltrace.cycles-pp.writeback_inodes_sb.sync_filesystem.__x64_sys_syncfs.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +1.3 1.35 ± 8% perf-profile.calltrace.cycles-pp.mutex_spin_on_owner.__mutex_lock.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq
0.00 +1.8 1.78 ± 8% perf-profile.calltrace.cycles-pp.__mutex_lock.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force
0.00 +2.6 2.64 ± 5% perf-profile.calltrace.cycles-pp.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force.xfs_fs_sync_fs.iterate_supers
0.00 +2.7 2.66 ± 5% perf-profile.calltrace.cycles-pp.xlog_cil_force_seq.xfs_log_force.xfs_fs_sync_fs.iterate_supers.ksys_sync
0.00 +2.7 2.69 ± 5% perf-profile.calltrace.cycles-pp.xfs_log_force.xfs_fs_sync_fs.iterate_supers.ksys_sync.__x64_sys_sync
0.00 +2.7 2.70 ± 5% perf-profile.calltrace.cycles-pp.xfs_fs_sync_fs.iterate_supers.ksys_sync.__x64_sys_sync.do_syscall_64
0.00 +6.5 6.49 ± 7% perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.flush_workqueue_prep_pwqs.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq
0.66 ± 9% +7.0 7.63 ± 6% perf-profile.calltrace.cycles-pp.xlog_cil_force_seq.xfs_log_force.xfs_fs_sync_fs.sync_filesystem.__x64_sys_syncfs
0.62 ± 10% +7.0 7.59 ± 6% perf-profile.calltrace.cycles-pp.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force.xfs_fs_sync_fs.sync_filesystem
0.70 ± 9% +7.0 7.68 ± 6% perf-profile.calltrace.cycles-pp.xfs_log_force.xfs_fs_sync_fs.sync_filesystem.__x64_sys_syncfs.do_syscall_64
0.77 ± 8% +7.0 7.76 ± 6% perf-profile.calltrace.cycles-pp.xfs_fs_sync_fs.sync_filesystem.__x64_sys_syncfs.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +7.5 7.46 ± 6% perf-profile.calltrace.cycles-pp.flush_workqueue_prep_pwqs.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force
12.60 ± 5% +8.5 21.12 ± 5% perf-profile.calltrace.cycles-pp.sync_filesystem.__x64_sys_syncfs.do_syscall_64.entry_SYSCALL_64_after_hwframe.syncfs
16.57 ± 5% +9.2 25.72 ± 4% perf-profile.calltrace.cycles-pp.__x64_sys_syncfs.do_syscall_64.entry_SYSCALL_64_after_hwframe.syncfs
0.50 ± 45% +9.7 10.18 ± 6% perf-profile.calltrace.cycles-pp.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force.xfs_fs_sync_fs
21.43 ± 5% +9.8 31.22 ± 4% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syncfs
24.17 ± 5% +10.1 34.30 ± 4% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syncfs
31.94 ± 5% +11.0 42.94 ± 4% perf-profile.calltrace.cycles-pp.syncfs
22.38 ± 3% -7.5 14.88 ± 11% perf-profile.children.cycles-pp._raw_spin_lock
18.34 ± 3% -7.0 11.30 ± 14% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
35.22 ± 3% -6.9 28.30 ± 6% perf-profile.children.cycles-pp.iterate_supers
36.20 ± 3% -6.9 29.35 ± 6% perf-profile.children.cycles-pp.__x64_sys_sync
36.20 ± 3% -6.9 29.35 ± 6% perf-profile.children.cycles-pp.ksys_sync
36.88 ± 3% -6.8 30.10 ± 6% perf-profile.children.cycles-pp.sync
29.64 ± 8% -5.1 24.54 ± 15% perf-profile.children.cycles-pp.start_secondary
29.78 ± 8% -4.1 25.66 ± 7% perf-profile.children.cycles-pp.do_idle
29.78 ± 8% -4.1 25.66 ± 7% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
29.78 ± 8% -4.1 25.66 ± 7% perf-profile.children.cycles-pp.cpu_startup_entry
29.20 ± 8% -4.1 25.10 ± 7% perf-profile.children.cycles-pp.cpuidle_enter
29.59 ± 8% -4.1 25.50 ± 7% perf-profile.children.cycles-pp.cpuidle_idle_call
29.19 ± 8% -4.1 25.10 ± 7% perf-profile.children.cycles-pp.cpuidle_enter_state
26.26 ± 8% -4.0 22.28 ± 8% perf-profile.children.cycles-pp.intel_idle_ibrs
12.26 ± 3% -1.4 10.84 ± 3% perf-profile.children.cycles-pp.down_read
1.89 ± 12% -0.3 1.62 ± 6% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.23 ± 7% -0.1 0.18 ± 11% perf-profile.children.cycles-pp.ktime_get
0.09 ± 10% +0.1 0.14 ± 11% perf-profile.children.cycles-pp.up_write
0.16 ± 14% +0.1 0.22 ± 10% perf-profile.children.cycles-pp.sync_fs_one_sb
0.36 ± 7% +0.1 0.44 ± 4% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.47 ± 7% +0.1 0.55 ± 5% perf-profile.children.cycles-pp.__fget_light
0.38 ± 11% +0.1 0.47 ± 6% perf-profile.children.cycles-pp.mutex_lock
0.46 ± 9% +0.1 0.56 ± 6% perf-profile.children.cycles-pp.__cond_resched
1.24 ± 5% +0.2 1.44 ± 5% perf-profile.children.cycles-pp.sync_inodes_sb
0.00 +0.2 0.22 ± 13% perf-profile.children.cycles-pp.osq_lock
0.44 ± 10% +0.2 0.66 ± 7% perf-profile.children.cycles-pp.mutex_unlock
2.51 ± 6% +0.3 2.77 ± 6% perf-profile.children.cycles-pp.__entry_text_start
1.46 ± 8% +0.3 1.81 ± 6% perf-profile.children.cycles-pp.up_read
4.66 ± 4% +0.6 5.28 ± 5% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
4.92 ± 5% +0.6 5.55 ± 4% perf-profile.children.cycles-pp.get_nr_inodes
10.29 ± 5% +1.2 11.54 ± 4% perf-profile.children.cycles-pp.get_nr_dirty_inodes
10.32 ± 5% +1.3 11.58 ± 4% perf-profile.children.cycles-pp.writeback_inodes_sb
0.00 +1.6 1.62 ± 7% perf-profile.children.cycles-pp.mutex_spin_on_owner
0.00 +2.1 2.12 ± 7% perf-profile.children.cycles-pp.__mutex_lock
0.25 ± 11% +6.4 6.63 ± 7% perf-profile.children.cycles-pp._raw_spin_lock_irq
0.49 ± 9% +7.0 7.50 ± 6% perf-profile.children.cycles-pp.flush_workqueue_prep_pwqs
12.65 ± 5% +8.5 21.16 ± 5% perf-profile.children.cycles-pp.sync_filesystem
16.61 ± 5% +9.2 25.76 ± 4% perf-profile.children.cycles-pp.__x64_sys_syncfs
0.86 ± 10% +9.3 10.19 ± 6% perf-profile.children.cycles-pp.__flush_workqueue
0.91 ± 8% +9.3 10.24 ± 6% perf-profile.children.cycles-pp.xlog_cil_push_now
0.97 ± 8% +9.3 10.30 ± 6% perf-profile.children.cycles-pp.xlog_cil_force_seq
1.11 ± 8% +9.4 10.46 ± 6% perf-profile.children.cycles-pp.xfs_fs_sync_fs
1.02 ± 9% +9.4 10.38 ± 6% perf-profile.children.cycles-pp.xfs_log_force
32.30 ± 5% +11.1 43.37 ± 4% perf-profile.children.cycles-pp.syncfs
18.22 ± 3% -7.0 11.25 ± 14% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
26.26 ± 8% -4.0 22.27 ± 8% perf-profile.self.cycles-pp.intel_idle_ibrs
11.86 ± 3% -1.5 10.36 ± 3% perf-profile.self.cycles-pp.down_read
3.99 ± 2% -0.5 3.53 ± 4% perf-profile.self.cycles-pp._raw_spin_lock
1.62 ± 6% -0.4 1.27 ± 9% perf-profile.self.cycles-pp.iterate_supers
0.12 ± 12% -0.0 0.09 ± 13% perf-profile.self.cycles-pp.ktime_get
0.02 ±141% +0.0 0.06 ± 13% perf-profile.self.cycles-pp.writeback_inodes_sb
0.08 ± 13% +0.1 0.14 ± 12% perf-profile.self.cycles-pp.up_write
0.16 ± 14% +0.1 0.22 ± 10% perf-profile.self.cycles-pp.sync_fs_one_sb
0.39 ± 7% +0.1 0.46 ± 7% perf-profile.self.cycles-pp.syncfs
0.31 ± 8% +0.1 0.38 ± 4% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.46 ± 7% +0.1 0.54 ± 6% perf-profile.self.cycles-pp.__fget_light
0.34 ± 10% +0.1 0.43 ± 6% perf-profile.self.cycles-pp.mutex_lock
0.30 ± 13% +0.1 0.39 ± 7% perf-profile.self.cycles-pp.__cond_resched
0.32 ± 7% +0.1 0.43 ± 6% perf-profile.self.cycles-pp.sync_filesystem
0.00 +0.2 0.22 ± 13% perf-profile.self.cycles-pp.osq_lock
0.43 ± 10% +0.2 0.65 ± 7% perf-profile.self.cycles-pp.mutex_unlock
0.00 +0.2 0.24 ± 8% perf-profile.self.cycles-pp.__mutex_lock
1.40 ± 7% +0.3 1.72 ± 6% perf-profile.self.cycles-pp.up_read
2.99 ± 5% +0.4 3.36 ± 3% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
4.21 ± 5% +0.5 4.71 ± 5% perf-profile.self.cycles-pp.get_nr_dirty_inodes
3.82 ± 4% +0.5 4.33 ± 5% perf-profile.self.cycles-pp.get_nr_inodes
4.48 ± 3% +0.6 5.11 ± 6% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.26 ± 6% +0.6 0.91 ± 7% perf-profile.self.cycles-pp.flush_workqueue_prep_pwqs
0.00 +1.6 1.61 ± 7% perf-profile.self.cycles-pp.mutex_spin_on_owner
0.24 ± 11% +6.3 6.58 ± 7% perf-profile.self.cycles-pp._raw_spin_lock_irq
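
The middle column of the table mixes two kinds of change: entries ending in "%" are relative changes against the parent commit, while entries without "%" (the rows whose metric is itself a percentage, such as perf-profile cycles or miss rates) read as absolute differences in percentage points. The sketch below recomputes a few rows from the printed values to illustrate this reading. The helper names are hypothetical, and because the report's own figures come from unrounded per-run data, rows recomputed from the rounded values shown here can differ in the last digit.

```python
def pct_change(base, new):
    """Relative change in percent, as in rows such as '+19.5%'."""
    return (new - base) / base * 100.0


def point_delta(base, new):
    """Absolute difference, as in rows such as '-7.5' (percentage points)."""
    return new - base


# (metric, parent commit value, tested commit value), copied from the table.
relative_rows = [
    ("stress-ng.io.ops_per_sec", 49242, 58826),
    ("stress-ng.time.user_time", 27.02, 32.86),
    ("stress-ng.time.system_time", 151.67, 145.86),
]
absolute_rows = [
    ("mpstat.cpu.all.usr%", 1.53, 1.82),
    ("calltrace _raw_spin_lock.iterate_supers", 21.48, 13.96),
]

for name, base, new in relative_rows:
    print(f"{name}: {pct_change(base, new):+.1f}%")
for name, base, new in absolute_rows:
    print(f"{name}: {point_delta(base, new):+.1f}")

# Two consistency checks on the printed values. Assumption: the run lasted
# exactly the configured 60 s; the actual duration may differ slightly.
ops_per_sec_parent = int(2954572 / 60)
ops_per_sec_tested = int(3529576 / 60)
# IPC is the reciprocal of CPI (perf-stat.overall.cpi vs. overall.ipc).
ipc_parent = round(1 / 2.44, 2)
ipc_tested = round(1 / 2.03, 2)
print(ops_per_sec_parent, ops_per_sec_tested, ipc_parent, ipc_tested)
```

Under that reading, the drop in _raw_spin_lock time under iterate_supers and the new time in flush_workqueue_prep_pwqs are both shares of total sampled cycles, so they can be compared directly with each other but not with the "%" rows.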




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki