2021-12-15 13:58:20

by kernel test robot

Subject: [loop] e3f9387aea: stress-ng.loop.ops_per_sec 78.9% improvement



Greetings,

FYI, we noticed a 78.9% improvement of stress-ng.loop.ops_per_sec due to commit:


commit: e3f9387aea67742b9d1f4de8e5bb2fd08a8a4584 ("loop: Use pr_warn_once() for loop_control_remove() warning")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
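
For context, the commit subject points at a one-line change in
loop_control_remove() in drivers/block/loop.c: the warning emitted when
userspace asks to remove an unspecified loop device is demoted from
pr_warn() to pr_warn_once(). Roughly (a sketch reconstructed from the
subject line; the exact surrounding code may differ from the patch):

	static int loop_control_remove(int idx)
	{
		if (idx < 0) {
			/* Previously pr_warn(): logged on every call, so a
			 * tight ioctl() loop flooded the (serial) console. */
			pr_warn_once("deleting an unspecified loop device is not supported.\n");
			return -EINVAL;
		}
		/* ... device lookup and removal elided ... */
	}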

in testcase: stress-ng
on test machine: 96 threads 2 sockets Ice Lake with 256G memory
with following parameters:

nr_threads: 100%
disk: 1HDD
testtime: 60s
class: device
test: loop
cpufreq_governor: performance
ucode: 0xb000280






Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# If you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.

=========================================================================================
class/compiler/cpufreq_governor/disk/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
device/gcc-9/performance/1HDD/x86_64-rhel-8.3/100%/debian-10.4-x86_64-20200603.cgz/lkp-icl-2sp1/loop/stress-ng/60s/0xb000280

commit:
d422f40163 ("zram: only make zram_wb_devops for CONFIG_ZRAM_WRITEBACK")
e3f9387aea ("loop: Use pr_warn_once() for loop_control_remove() warning")

d422f40163087408 e3f9387aea67742b9d1f4de8e5b
---------------- ---------------------------
%stddev %change %stddev
\ | \
3361 +79.7% 6041 ± 2% stress-ng.loop.ops
55.12 +78.9% 98.63 stress-ng.loop.ops_per_sec
4244 +61.3% 6847 ± 2% stress-ng.time.involuntary_context_switches
140.71 ± 2% +25.1% 176.00 ± 11% stress-ng.time.percent_of_cpu_this_job_got
90.04 ± 2% +24.7% 112.27 ± 12% stress-ng.time.system_time
152534 ± 8% +38.8% 211722 ± 13% stress-ng.time.voluntary_context_switches
8000 ± 4% +20.0% 9598 ± 13% meminfo.Active
7396 ± 3% +20.8% 8933 ± 14% meminfo.Active(anon)
1.78 ± 4% +0.5 2.30 ± 9% mpstat.cpu.all.sys%
0.36 ± 2% +0.1 0.50 ± 6% mpstat.cpu.all.usr%
61315 ± 2% +29.6% 79471 ± 4% turbostat.C1
0.05 +0.0 0.08 ± 11% turbostat.C1%
3804 ± 3% +56.8% 5965 ± 6% vmstat.io.bi
17445 ± 4% +40.1% 24441 ± 6% vmstat.system.cs
30900 ± 2% +56.3% 48285 ± 6% softirqs.BLOCK
10068 ± 8% -14.0% 8663 ± 11% softirqs.CPU15.SCHED
9690 ± 4% -10.3% 8692 ± 8% softirqs.CPU32.SCHED
9637 ± 7% -14.8% 8207 ± 11% softirqs.CPU52.SCHED
9340 ± 10% -14.3% 8007 ± 6% softirqs.CPU72.SCHED
9936 ± 8% -14.4% 8506 ± 11% softirqs.CPU82.SCHED
9724 ± 9% -12.5% 8510 ± 10% softirqs.CPU93.SCHED
207351 ± 9% +24.3% 257838 ± 8% softirqs.RCU
1844 ± 3% +21.1% 2233 ± 14% proc-vmstat.nr_active_anon
5929 ± 4% +9.3% 6479 ± 3% proc-vmstat.nr_page_table_pages
1844 ± 3% +21.1% 2233 ± 14% proc-vmstat.nr_zone_active_anon
645257 +14.3% 737465 ± 3% proc-vmstat.numa_hit
554490 +16.5% 645922 ± 3% proc-vmstat.numa_local
27004 +56.1% 42164 ± 4% proc-vmstat.pgactivate
682350 +17.4% 800875 ± 3% proc-vmstat.pgalloc_normal
579709 +20.2% 696873 ± 3% proc-vmstat.pgfree
261468 ± 3% +54.9% 404974 ± 7% proc-vmstat.pgpgin
30452 ± 4% +48.5% 45231 ± 11% proc-vmstat.unevictable_pgs_culled
1.351e+09 ± 8% +22.3% 1.651e+09 ± 4% perf-stat.i.branch-instructions
21750 ± 8% +30.1% 28299 ± 8% perf-stat.i.context-switches
903.42 ± 10% +37.4% 1241 ± 9% perf-stat.i.cpu-migrations
1.463e+09 ± 8% +22.8% 1.796e+09 ± 4% perf-stat.i.dTLB-loads
6.545e+08 ± 7% +22.0% 7.983e+08 ± 4% perf-stat.i.dTLB-stores
6.28e+09 ± 8% +20.9% 7.595e+09 ± 4% perf-stat.i.instructions
0.38 ± 6% +14.1% 0.43 ± 8% perf-stat.i.ipc
24.95 ± 7% -15.6% 21.06 ± 9% perf-stat.i.major-faults
32.26 ± 7% +29.9% 41.91 ± 3% perf-stat.i.metric.M/sec
986939 ± 7% +33.0% 1312402 ± 6% perf-stat.i.node-load-misses
320022 ± 7% +34.4% 430212 ± 4% perf-stat.i.node-loads
1097382 ± 7% +39.2% 1527172 ± 11% perf-stat.i.node-store-misses
1.068e+09 ± 2% +31.8% 1.408e+09 ± 4% perf-stat.ps.branch-instructions
6396422 ± 8% +29.6% 8290131 ± 5% perf-stat.ps.cache-misses
17520 ± 4% +41.0% 24709 ± 7% perf-stat.ps.context-switches
1.412e+10 ± 7% +13.9% 1.609e+10 ± 6% perf-stat.ps.cpu-cycles
732.74 ± 4% +48.3% 1086 ± 7% perf-stat.ps.cpu-migrations
1.162e+09 ± 2% +32.4% 1.539e+09 ± 3% perf-stat.ps.dTLB-loads
5.203e+08 +31.1% 6.821e+08 ± 3% perf-stat.ps.dTLB-stores
4.946e+09 ± 2% +30.6% 6.458e+09 ± 3% perf-stat.ps.instructions
838233 ± 7% +40.1% 1174167 ± 5% perf-stat.ps.node-load-misses
258560 ± 2% +44.7% 374131 perf-stat.ps.node-loads
874444 ± 9% +49.8% 1310136 ± 7% perf-stat.ps.node-store-misses
987545 ± 9% +25.4% 1238042 ± 8% perf-stat.ps.node-stores
3.284e+11 +29.0% 4.238e+11 ± 4% perf-stat.total.instructions
36.05 ± 11% -9.1 26.95 ± 18% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
35.61 ± 10% -8.9 26.66 ± 18% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
35.61 ± 10% -8.9 26.66 ± 18% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
35.55 ± 10% -8.9 26.62 ± 18% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
31.96 ± 10% -8.4 23.53 ± 18% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary
32.49 ± 9% -8.3 24.19 ± 18% perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
8.21 ± 77% -8.2 0.00 perf-profile.calltrace.cycles-pp._printk.loop_control_ioctl.cold.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe
8.21 ± 77% -8.2 0.00 perf-profile.calltrace.cycles-pp.vprintk_emit._printk.loop_control_ioctl.cold.__x64_sys_ioctl.do_syscall_64
8.21 ± 77% -8.2 0.00 perf-profile.calltrace.cycles-pp.loop_control_ioctl.cold.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl
7.66 ± 81% -7.7 0.00 perf-profile.calltrace.cycles-pp.console_unlock.vprintk_emit._printk.loop_control_ioctl.cold.__x64_sys_ioctl
7.28 ± 81% -7.3 0.00 perf-profile.calltrace.cycles-pp.serial8250_console_write.console_unlock.vprintk_emit._printk.loop_control_ioctl.cold
19.03 ± 6% -4.7 14.35 ± 17% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
0.71 ± 8% -0.4 0.32 ± 86% perf-profile.calltrace.cycles-pp.rebalance_domains.__softirqentry_text_start.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
1.12 ± 22% -0.4 0.76 ± 25% perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues
1.23 ± 17% +3.2 4.45 ±107% perf-profile.calltrace.cycles-pp.kernfs_remove.__kobject_del.kobject_del.blk_mq_unregister_hctx.blk_mq_unregister_dev
1.31 ± 17% +3.2 4.54 ±105% perf-profile.calltrace.cycles-pp.__kobject_del.kobject_del.blk_mq_unregister_hctx.blk_mq_unregister_dev.blk_unregister_queue
1.32 ± 17% +3.2 4.55 ±104% perf-profile.calltrace.cycles-pp.kobject_del.blk_mq_unregister_hctx.blk_mq_unregister_dev.blk_unregister_queue.del_gendisk
36.05 ± 11% -9.1 26.95 ± 18% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
36.05 ± 11% -9.1 26.95 ± 18% perf-profile.children.cycles-pp.cpu_startup_entry
36.04 ± 11% -9.1 26.95 ± 18% perf-profile.children.cycles-pp.do_idle
35.61 ± 10% -8.9 26.66 ± 18% perf-profile.children.cycles-pp.start_secondary
32.91 ± 10% -8.4 24.47 ± 18% perf-profile.children.cycles-pp.cpuidle_enter
32.87 ± 10% -8.4 24.44 ± 18% perf-profile.children.cycles-pp.cpuidle_enter_state
8.21 ± 77% -8.2 0.00 perf-profile.children.cycles-pp.loop_control_ioctl.cold
19.26 ± 6% -4.7 14.52 ± 17% perf-profile.children.cycles-pp.intel_idle
1.18 ± 22% -0.4 0.82 ± 23% perf-profile.children.cycles-pp.scheduler_tick
0.73 ± 8% -0.2 0.50 ± 20% perf-profile.children.cycles-pp.rebalance_domains
0.60 ± 12% -0.2 0.44 ± 15% perf-profile.children.cycles-pp.native_irq_return_iret
0.43 ± 8% -0.1 0.31 ± 15% perf-profile.children.cycles-pp.lapic_next_deadline
0.39 ± 7% -0.1 0.29 ± 21% perf-profile.children.cycles-pp.read_tsc
0.26 ± 11% -0.1 0.18 ± 22% perf-profile.children.cycles-pp.arch_scale_freq_tick
0.29 ± 7% -0.1 0.22 ± 17% perf-profile.children.cycles-pp.native_apic_msr_eoi_write
0.10 ± 11% -0.0 0.06 ± 44% perf-profile.children.cycles-pp.call_cpuidle
0.09 ± 11% -0.0 0.06 ± 43% perf-profile.children.cycles-pp.schedule_timeout
0.22 ± 24% +0.4 0.64 ± 84% perf-profile.children.cycles-pp.device_add
0.67 ± 22% +0.7 1.38 ± 60% perf-profile.children.cycles-pp.kernfs_add_one
1.81 ± 17% +1.0 2.81 ± 27% perf-profile.children.cycles-pp.device_add_disk
1.32 ± 17% +3.2 4.55 ±104% perf-profile.children.cycles-pp.blk_mq_unregister_hctx
1.38 ± 16% +3.2 4.62 ±102% perf-profile.children.cycles-pp.blk_mq_unregister_dev
1.50 ± 18% +3.3 4.81 ±100% perf-profile.children.cycles-pp.kernfs_remove
1.57 ± 17% +3.3 4.90 ± 98% perf-profile.children.cycles-pp.blk_unregister_queue
1.61 ± 17% +3.4 5.05 ± 98% perf-profile.children.cycles-pp.kobject_del
2.38 ± 20% +6.7 9.05 ±115% perf-profile.children.cycles-pp.rwsem_down_write_slowpath
2.38 ± 17% +7.1 9.50 ±129% perf-profile.children.cycles-pp.del_gendisk
19.24 ± 6% -4.7 14.51 ± 17% perf-profile.self.cycles-pp.intel_idle
1.69 ± 9% -0.5 1.22 ± 17% perf-profile.self.cycles-pp.cpuidle_enter_state
0.60 ± 12% -0.2 0.44 ± 15% perf-profile.self.cycles-pp.native_irq_return_iret
0.43 ± 8% -0.1 0.31 ± 15% perf-profile.self.cycles-pp.lapic_next_deadline
0.38 ± 7% -0.1 0.28 ± 21% perf-profile.self.cycles-pp.read_tsc
0.26 ± 12% -0.1 0.17 ± 21% perf-profile.self.cycles-pp.arch_scale_freq_tick
0.29 ± 6% -0.1 0.22 ± 19% perf-profile.self.cycles-pp.native_apic_msr_eoi_write
0.09 ± 9% -0.0 0.06 ± 44% perf-profile.self.cycles-pp.call_cpuidle
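
The calltrace rows above show where the gain comes from: the
_printk -> vprintk_emit -> console_unlock -> serial8250_console_write
path, which consumed about 8% of cycles under the parent commit,
disappears entirely once the warning is emitted only once.
pr_warn_once() is built on the kernel's usual once-guard pattern; a
simplified illustration of the idea (a sketch, not the kernel's actual
printk_once() implementation, which also places the flag in a dedicated
.data.once section):

	/* Only the first caller emits the message; later calls are a
	 * cheap flag test.  Each expansion site gets its own flag
	 * because the static lives inside the macro body. */
	#define my_warn_once(fmt, ...)			\
	({						\
		static bool __warned;			\
							\
		if (!__warned) {			\
			__warned = true;		\
			pr_warn(fmt, ##__VA_ARGS__);	\
		}					\
	})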




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
config-5.16.0-rc1-00023-ge3f9387aea67 (169.45 kB)
job-script (8.11 kB)
job.yaml (5.54 kB)
reproduce (336.00 B)

2021-12-15 16:02:24

by Tetsuo Handa

Subject: Re: [loop] e3f9387aea: stress-ng.loop.ops_per_sec 78.9% improvement

On 2021/12/15 22:58, kernel test robot wrote:
>
>
> Greetings,
>
> FYI, we noticed a 78.9% improvement of stress-ng.loop.ops_per_sec due to commit:
>
>
> commit: e3f9387aea67742b9d1f4de8e5bb2fd08a8a4584 ("loop: Use pr_warn_once() for loop_control_remove() warning")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>

Avoiding the printk() flooding will reduce latency, but I'm curious
why the testing program is calling ioctl(fd, LOOP_CTL_REMOVE, -1).
Since such an ioctl() is pointless, measuring its ops_per_sec is also pointless...
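
For reference, the call in question looks like this from userspace
(a minimal standalone example for illustration; not stress-ng's actual
source):

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <unistd.h>
	#include <linux/loop.h>

	int main(void)
	{
		int fd = open("/dev/loop-control", O_RDWR);

		if (fd < 0) {
			perror("open /dev/loop-control");
			return 1;
		}
		/* Removing loop device "number -1" always fails with
		 * -EINVAL; before commit e3f9387aea it also logged a
		 * kernel warning on every call. */
		if (ioctl(fd, LOOP_CTL_REMOVE, -1) < 0)
			perror("ioctl(LOOP_CTL_REMOVE)");
		close(fd);
		return 0;
	}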