Date: 2024-04-01 15:32:51
From: Oliver Sang

Subject: [linus:master] [sock_diag] f44e64990b: stress-ng.sockdiag.ops_per_sec 147.0% improvement



Hello,

kernel test robot noticed a 147.0% improvement of stress-ng.sockdiag.ops_per_sec on:


commit: f44e64990beb41167bd7c313d90bcf7e290c3582 ("sock_diag: remove sock_diag_mutex")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
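
For context, the patch drops the global mutex that funneled every sock_diag
netlink request through a single lock. A minimal before/after sketch of
net/core/sock_diag.c (illustrative, inferred from the commit subject and the
profile data below, not the verbatim diff):

	/* Before: one mutex serialized all requests; the profile below shows
	 * ~98% of cycles spinning in osq_lock under __mutex_lock. */
	static DEFINE_MUTEX(sock_diag_mutex);

	static void sock_diag_rcv(struct sk_buff *skb)
	{
		mutex_lock(&sock_diag_mutex);
		netlink_rcv_skb(skb, &sock_diag_rcv_msg);
		mutex_unlock(&sock_diag_mutex);
	}

	/* After: requests enter the demux directly and can run concurrently. */
	static void sock_diag_rcv(struct sk_buff *skb)
	{
		netlink_rcv_skb(skb, &sock_diag_rcv_msg);
	}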

testcase: stress-ng
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: sockdiag
	cpufreq_governor: performance
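
To reproduce locally, the parameters above translate roughly to the following
stress-ng invocation (a sketch; assumes a stress-ng build that includes the
sockdiag stressor, 224 workers being nr_threads=100% on this machine):

	stress-ng --sockdiag 224 --timeout 60s --metrics-brief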






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240401/[email protected]
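
The typical lkp-tests flow for such an archive is roughly (a sketch; job.yaml
refers to the job file shipped in the archive above):

	git clone https://github.com/intel/lkp-tests.git
	cd lkp-tests
	sudo bin/lkp install job.yaml   # install dependencies for the job
	sudo bin/lkp run job.yaml       # run the benchmark as configured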

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/sockdiag/stress-ng/60s

commit:
86e8921df0 ("sock_diag: allow concurrent operation in sock_diag_rcv_msg()")
f44e64990b ("sock_diag: remove sock_diag_mutex")

86e8921df05c6e94 f44e64990beb41167bd7c313d90
---------------- ---------------------------
%stddev %change %stddev
\ | \
6805 ± 37% +630.7% 49725 ±137% numa-meminfo.node0.Active
6767 ± 37% +634.2% 49687 ±138% numa-meminfo.node0.Active(anon)
7690 +906.4% 77394 ± 48% vmstat.system.cs
420471 +6.2% 446552 vmstat.system.in
2.491e+08 +147.0% 6.154e+08 ± 40% stress-ng.sockdiag.ops
4152375 +147.0% 10257279 ± 40% stress-ng.sockdiag.ops_per_sec
86849 +350.0% 390836 ± 28% stress-ng.time.involuntary_context_switches
0.55 -0.3 0.30 ± 20% mpstat.cpu.all.irq%
0.10 ± 3% +0.0 0.15 ± 22% mpstat.cpu.all.soft%
0.46 +0.1 0.54 ± 2% mpstat.cpu.all.usr%
46.33 ± 12% -84.2% 7.33 ± 84% mpstat.max_utilization.seconds
2234616 ± 2% +136.2% 5279086 ± 37% numa-numastat.node0.local_node
2378097 +124.6% 5342166 ± 36% numa-numastat.node0.numa_hit
2678667 ± 2% +108.5% 5584120 ± 35% numa-numastat.node1.local_node
2768310 ± 3% +107.9% 5755443 ± 34% numa-numastat.node1.numa_hit
1211899 +13.3% 1372481 ± 2% meminfo.Inactive
1211695 +13.3% 1372284 ± 2% meminfo.Inactive(anon)
540274 +25.9% 680362 ± 7% meminfo.Mapped
449208 +8.6% 487827 ± 7% meminfo.SUnreclaim
862353 +23.0% 1060355 ± 3% meminfo.Shmem
161.00 ± 21% +579.5% 1094 ± 64% perf-c2c.DRAM.local
1480 ± 15% +661.4% 11271 ± 57% perf-c2c.DRAM.remote
1391 ± 14% +1182.4% 17843 ± 65% perf-c2c.HITM.local
585.00 ± 10% +1199.8% 7604 ± 59% perf-c2c.HITM.remote
1976 ± 13% +1187.6% 25447 ± 63% perf-c2c.HITM.total
965151 ± 3% -47.0% 511917 ± 6% sched_debug.cpu.avg_idle.avg
225203 ± 48% -84.8% 34261 ±130% sched_debug.cpu.avg_idle.min
1759 ± 6% +542.3% 11302 ± 45% sched_debug.cpu.nr_switches.avg
899.42 +738.6% 7542 ± 42% sched_debug.cpu.nr_switches.min
-30.17 +221.8% -97.08 sched_debug.cpu.nr_uninterruptible.min
1739 ± 37% +612.9% 12403 ±138% numa-vmstat.node0.nr_active_anon
1739 ± 37% +612.9% 12403 ±138% numa-vmstat.node0.nr_zone_active_anon
2377796 +124.5% 5337172 ± 36% numa-vmstat.node0.numa_hit
2234316 ± 2% +136.0% 5274091 ± 37% numa-vmstat.node0.numa_local
2767474 ± 3% +107.8% 5750481 ± 34% numa-vmstat.node1.numa_hit
2677832 ± 2% +108.3% 5579160 ± 35% numa-vmstat.node1.numa_local
980143 +5.0% 1028901 proc-vmstat.nr_file_pages
303091 +13.2% 342957 ± 2% proc-vmstat.nr_inactive_anon
40864 +1.6% 41510 proc-vmstat.nr_kernel_stack
135507 +25.7% 170340 ± 7% proc-vmstat.nr_mapped
215970 +22.6% 264729 ± 3% proc-vmstat.nr_shmem
41429 +7.8% 44664 ± 7% proc-vmstat.nr_slab_reclaimable
112306 +8.7% 122083 ± 7% proc-vmstat.nr_slab_unreclaimable
303091 +13.2% 342957 ± 2% proc-vmstat.nr_zone_inactive_anon
37590 ± 28% +51.2% 56819 ± 18% proc-vmstat.numa_hint_faults
5148855 +115.5% 11093970 ± 35% proc-vmstat.numa_hit
4915589 +120.9% 10859566 ± 36% proc-vmstat.numa_local
206083 ± 27% +58.4% 326447 ± 14% proc-vmstat.numa_pte_updates
32486467 +143.2% 79020889 ± 39% proc-vmstat.pgalloc_normal
759303 +16.3% 882814 ± 3% proc-vmstat.pgfault
32050628 +144.9% 78486695 ± 40% proc-vmstat.pgfree
0.13 ± 7% +536.1% 0.85 ± 21% perf-stat.i.MPKI
3.083e+10 -56.4% 1.344e+10 ± 2% perf-stat.i.branch-instructions
0.19 ± 3% +6870.7 6870 ±104% perf-stat.i.branch-miss-rate%
42989880 ± 2% +2e+06% 8.623e+11 ±101% perf-stat.i.branch-misses
16796111 ± 9% +189.6% 48642444 ± 25% perf-stat.i.cache-misses
68857289 ± 5% +196.5% 2.042e+08 ± 12% perf-stat.i.cache-references
7918 +929.7% 81533 ± 44% perf-stat.i.context-switches
3.94 +165.6% 10.46 perf-stat.i.cpi
39043 ± 10% -64.0% 14047 ± 19% perf-stat.i.cycles-between-cache-misses
1.541e+11 -62.6% 5.76e+10 ± 3% perf-stat.i.instructions
0.26 -61.2% 0.10 perf-stat.i.ipc
0.10 ± 92% +479.0% 0.56 ± 28% perf-stat.i.major-faults
12344 +20.7% 14898 ± 3% perf-stat.i.minor-faults
12345 +20.7% 14899 ± 3% perf-stat.i.page-faults
0.11 ± 9% +685.5% 0.84 ± 21% perf-stat.overall.MPKI
0.12 ± 2% +9674.7 9674 ±101% perf-stat.overall.branch-miss-rate%
4.00 +166.1% 10.63 perf-stat.overall.cpi
37756 ± 10% -65.1% 13184 ± 18% perf-stat.overall.cycles-between-cache-misses
0.25 -62.4% 0.09 perf-stat.overall.ipc
2.952e+10 -56.5% 1.284e+10 ± 2% perf-stat.ps.branch-instructions
35366132 ± 2% +3.6e+06% 1.256e+12 ±100% perf-stat.ps.branch-misses
15767609 ± 9% +194.8% 46490063 ± 26% perf-stat.ps.cache-misses
67236264 ± 4% +194.1% 1.977e+08 ± 12% perf-stat.ps.cache-references
7505 +941.8% 78193 ± 47% perf-stat.ps.context-switches
1.475e+11 -62.7% 5.497e+10 ± 3% perf-stat.ps.instructions
0.08 ± 88% +399.3% 0.41 ± 28% perf-stat.ps.major-faults
10427 ± 2% +19.6% 12474 ± 3% perf-stat.ps.minor-faults
10428 ± 2% +19.6% 12475 ± 3% perf-stat.ps.page-faults
8.86e+12 -62.6% 3.315e+12 ± 3% perf-stat.total.instructions
99.55 -99.6 0.00 perf-profile.calltrace.cycles-pp.sock_diag_rcv.netlink_unicast.netlink_sendmsg.____sys_sendmsg.___sys_sendmsg
99.10 -99.1 0.00 perf-profile.calltrace.cycles-pp.__mutex_lock.sock_diag_rcv.netlink_unicast.netlink_sendmsg.____sys_sendmsg
98.57 -98.6 0.00 perf-profile.calltrace.cycles-pp.osq_lock.__mutex_lock.sock_diag_rcv.netlink_unicast.netlink_sendmsg
99.57 -62.8 36.75 ±107% perf-profile.calltrace.cycles-pp.netlink_unicast.netlink_sendmsg.____sys_sendmsg.___sys_sendmsg.__sys_sendmsg
99.58 -62.8 36.82 ±107% perf-profile.calltrace.cycles-pp.netlink_sendmsg.____sys_sendmsg.___sys_sendmsg.__sys_sendmsg.do_syscall_64
99.58 -62.8 36.82 ±107% perf-profile.calltrace.cycles-pp.____sys_sendmsg.___sys_sendmsg.__sys_sendmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe
99.60 -62.8 36.84 ±107% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendmsg
99.60 -62.8 36.84 ±107% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.sendmsg
99.60 -62.8 36.84 ±107% perf-profile.calltrace.cycles-pp.sendmsg
99.59 -62.8 36.83 ±107% perf-profile.calltrace.cycles-pp.___sys_sendmsg.__sys_sendmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendmsg
99.59 -62.8 36.83 ±107% perf-profile.calltrace.cycles-pp.__sys_sendmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendmsg
0.00 +36.2 36.16 ±108% perf-profile.calltrace.cycles-pp._raw_spin_lock.unix_diag_dump.netlink_dump.__netlink_dump_start.unix_diag_handler_dump
0.00 +36.7 36.65 ±107% perf-profile.calltrace.cycles-pp.unix_diag_dump.netlink_dump.__netlink_dump_start.unix_diag_handler_dump.sock_diag_rcv_msg
0.00 +36.7 36.69 ±107% perf-profile.calltrace.cycles-pp.netlink_dump.__netlink_dump_start.unix_diag_handler_dump.sock_diag_rcv_msg.netlink_rcv_skb
0.00 +36.7 36.70 ±107% perf-profile.calltrace.cycles-pp.__netlink_dump_start.unix_diag_handler_dump.sock_diag_rcv_msg.netlink_rcv_skb.netlink_unicast
0.00 +36.7 36.70 ±107% perf-profile.calltrace.cycles-pp.unix_diag_handler_dump.sock_diag_rcv_msg.netlink_rcv_skb.netlink_unicast.netlink_sendmsg
0.00 +36.7 36.72 ±107% perf-profile.calltrace.cycles-pp.sock_diag_rcv_msg.netlink_rcv_skb.netlink_unicast.netlink_sendmsg.____sys_sendmsg
0.00 +36.7 36.72 ±107% perf-profile.calltrace.cycles-pp.netlink_rcv_skb.netlink_unicast.netlink_sendmsg.____sys_sendmsg.___sys_sendmsg
99.55 -99.6 0.00 perf-profile.children.cycles-pp.sock_diag_rcv
99.10 -99.1 0.00 perf-profile.children.cycles-pp.__mutex_lock
98.60 -98.6 0.00 perf-profile.children.cycles-pp.osq_lock
99.57 -62.8 36.75 ±107% perf-profile.children.cycles-pp.netlink_unicast
99.58 -62.8 36.82 ±107% perf-profile.children.cycles-pp.netlink_sendmsg
99.58 -62.8 36.82 ±107% perf-profile.children.cycles-pp.____sys_sendmsg
99.60 -62.8 36.85 ±107% perf-profile.children.cycles-pp.sendmsg
99.59 -62.8 36.83 ±107% perf-profile.children.cycles-pp.___sys_sendmsg
99.59 -62.8 36.83 ±107% perf-profile.children.cycles-pp.__sys_sendmsg
0.51 ± 2% -0.3 0.22 ± 27% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.50 ± 2% -0.3 0.21 ± 27% perf-profile.children.cycles-pp.hrtimer_interrupt
0.62 ± 2% -0.3 0.35 ± 18% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.64 ± 2% -0.3 0.37 ± 15% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.38 ± 3% -0.2 0.15 ± 22% perf-profile.children.cycles-pp.tick_nohz_highres_handler
0.39 ± 3% -0.2 0.17 ± 16% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.36 ± 4% -0.2 0.14 ± 21% perf-profile.children.cycles-pp.tick_sched_handle
0.36 ± 3% -0.2 0.14 ± 21% perf-profile.children.cycles-pp.update_process_times
0.31 ± 4% -0.2 0.12 ± 19% perf-profile.children.cycles-pp.scheduler_tick
0.24 ± 3% -0.2 0.08 ± 31% perf-profile.children.cycles-pp.task_tick_fair
0.17 ± 6% -0.0 0.12 ± 14% perf-profile.children.cycles-pp.main
0.17 ± 6% -0.0 0.12 ± 14% perf-profile.children.cycles-pp.run_builtin
0.17 ± 6% -0.0 0.13 ± 15% perf-profile.children.cycles-pp.cmd_record
0.17 ± 5% -0.0 0.12 ± 14% perf-profile.children.cycles-pp.record__mmap_read_evlist
0.16 ± 5% -0.0 0.12 ± 12% perf-profile.children.cycles-pp.perf_mmap__push
0.09 ± 5% -0.0 0.08 ± 10% perf-profile.children.cycles-pp.writen
0.09 ± 4% -0.0 0.08 ± 10% perf-profile.children.cycles-pp.write
0.08 ± 5% -0.0 0.07 ± 7% perf-profile.children.cycles-pp.ksys_write
0.07 ± 5% -0.0 0.06 ± 8% perf-profile.children.cycles-pp.shmem_file_write_iter
0.10 +0.0 0.13 ± 5% perf-profile.children.cycles-pp.irq_exit_rcu
0.09 ± 4% +0.0 0.13 ± 26% perf-profile.children.cycles-pp.rcu_core
0.10 ± 3% +0.0 0.14 ± 17% perf-profile.children.cycles-pp.__do_softirq
0.05 +0.1 0.12 ± 48% perf-profile.children.cycles-pp.__sys_recvmsg
0.06 ± 8% +0.1 0.14 ± 47% perf-profile.children.cycles-pp.recvmsg
0.00 +0.1 0.09 ± 48% perf-profile.children.cycles-pp.netlink_recvmsg
0.00 +0.1 0.09 ± 48% perf-profile.children.cycles-pp.sock_recvmsg
0.02 ± 99% +0.1 0.12 ± 47% perf-profile.children.cycles-pp.___sys_recvmsg
0.00 +0.1 0.10 ± 49% perf-profile.children.cycles-pp.____sys_recvmsg
0.07 +0.1 0.18 ± 49% perf-profile.children.cycles-pp.sk_diag_fill
0.00 +0.1 0.12 ± 62% perf-profile.children.cycles-pp._raw_read_lock
0.00 +0.2 0.18 ± 61% perf-profile.children.cycles-pp.sock_i_ino
0.00 +0.9 0.85 ± 60% perf-profile.children.cycles-pp.__wake_up
0.00 +1.1 1.07 ± 60% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.00 +23.6 23.58 ± 63% perf-profile.children.cycles-pp.netlink_create
0.00 +23.6 23.62 ± 63% perf-profile.children.cycles-pp.__sock_create
0.00 +23.7 23.65 ± 62% perf-profile.children.cycles-pp.__sys_socket
0.00 +23.7 23.65 ± 62% perf-profile.children.cycles-pp.__x64_sys_socket
0.00 +23.7 23.66 ± 62% perf-profile.children.cycles-pp.__socket
0.29 +35.9 36.21 ±108% perf-profile.children.cycles-pp._raw_spin_lock
0.42 +36.3 36.67 ±107% perf-profile.children.cycles-pp.unix_diag_dump
0.44 +36.3 36.70 ±107% perf-profile.children.cycles-pp.__netlink_dump_start
0.44 +36.3 36.70 ±107% perf-profile.children.cycles-pp.unix_diag_handler_dump
0.44 +36.3 36.72 ±107% perf-profile.children.cycles-pp.sock_diag_rcv_msg
0.44 +36.3 36.72 ±107% perf-profile.children.cycles-pp.netlink_rcv_skb
0.44 +36.3 36.73 ±107% perf-profile.children.cycles-pp.netlink_dump
0.00 +38.9 38.94 ± 63% perf-profile.children.cycles-pp.__sock_release
0.00 +38.9 38.94 ± 63% perf-profile.children.cycles-pp.netlink_release
0.00 +38.9 38.94 ± 63% perf-profile.children.cycles-pp.sock_close
0.00 +39.0 38.98 ± 63% perf-profile.children.cycles-pp.__fput
0.00 +39.0 38.99 ± 62% perf-profile.children.cycles-pp.__x64_sys_close
0.00 +39.0 39.01 ± 62% perf-profile.children.cycles-pp.__close
0.00 +94.8 94.81 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
98.02 -98.0 0.00 perf-profile.self.cycles-pp.osq_lock
0.06 ± 6% +0.1 0.15 ± 54% perf-profile.self.cycles-pp.unix_diag_dump
0.00 +0.1 0.11 ± 60% perf-profile.self.cycles-pp._raw_read_lock
0.00 +0.3 0.25 ± 51% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.26 ± 2% +2.6 2.82 ± 72% perf-profile.self.cycles-pp._raw_spin_lock
0.00 +94.7 94.65 ± 2% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki