2023-11-23 05:04:20

by kernel test robot

Subject: [linus:master] [mm, pcp] 6ccdcb6d3a: stress-ng.judy.ops_per_sec -4.7% regression



Hello,

kernel test robot noticed a -4.7% regression of stress-ng.judy.ops_per_sec on:


commit: 6ccdcb6d3a741c4e005ca6ffd4a62ddf8b5bead3 ("mm, pcp: reduce detecting time of consecutive high order page freeing")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
parameters:

nr_threads: 100%
testtime: 60s
class: cpu-cache
test: judy
disk: 1SSD
cpufreq_governor: performance
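
(For reference, these parameters map roughly onto a direct stress-ng
invocation like the sketch below. This is an assumption about the
equivalent command line, since the actual job is generated by the
lkp-tests harness; the authoritative reproduce materials are linked
further down.)

    # nr_threads=100% -> one judy worker per online CPU (0 = all CPUs)
    # testtime=60s    -> --timeout 60s
    stress-ng --judy 0 --timeout 60s --metrics-brief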


In addition to that, the commit also has significant impact on the following tests:

+------------------+-------------------------------------------------------------------------------------------------+
| testcase: change | lmbench3: lmbench3.TCP.socket.bandwidth.10MB.MB/sec 23.7% improvement |
| test machine | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory |
| test parameters | cpufreq_governor=performance |
| | mode=development |
| | nr_threads=100% |
| | test=TCP |
| | test_memory_size=50% |
+------------------+-------------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.file-ioctl.ops_per_sec -6.6% regression |
| test machine | 36 threads 1 sockets Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz (Skylake) with 32G memory |
| test parameters | class=filesystem |
| | cpufreq_governor=performance |
| | disk=1SSD |
| | fs=btrfs |
| | nr_threads=10% |
| | test=file-ioctl |
| | testtime=60s |
+------------------+-------------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231123/[email protected]

=========================================================================================
class/compiler/cpufreq_governor/disk/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
cpu-cache/gcc-12/performance/1SSD/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/judy/stress-ng/60s

commit:
57c0419c5f ("mm, pcp: decrease PCP high if free pages < high watermark")
6ccdcb6d3a ("mm, pcp: reduce detecting time of consecutive high order page freeing")

57c0419c5f0ea2cc 6ccdcb6d3a741c4e005ca6ffd4a
---------------- ---------------------------
%stddev %change %stddev
\ | \
4.57 ? 5% +46.8% 6.71 ? 17% iostat.cpu.system
2842 +1.0% 2871 turbostat.Bzy_MHz
0.12 ? 3% +0.4 0.55 ? 26% mpstat.cpu.all.soft%
3.05 ? 6% +1.8 4.86 ? 20% mpstat.cpu.all.sys%
81120642 -2.9% 78746159 proc-vmstat.numa_hit
80886548 -2.9% 78513494 proc-vmstat.numa_local
82771023 -2.9% 80399459 proc-vmstat.pgalloc_normal
82356596 -2.9% 79991041 proc-vmstat.pgfree
12325708 ? 3% +5.3% 12974746 perf-stat.i.dTLB-load-misses
0.38 ? 44% +27.2% 0.48 perf-stat.overall.cpi
668.74 ? 44% +24.7% 834.02 perf-stat.overall.cycles-between-cache-misses
0.00 ? 45% +0.0 0.01 ? 10% perf-stat.overall.dTLB-load-miss-rate%
10040254 ? 44% +26.0% 12650801 perf-stat.ps.dTLB-load-misses
7036371 ? 3% -2.8% 6842720 stress-ng.judy.Judy_delete_operations_per_sec
9244466 ? 3% -7.8% 8524505 ? 3% stress-ng.judy.Judy_insert_operations_per_sec
2912 ? 3% -4.7% 2774 stress-ng.judy.ops_per_sec
13316 ? 8% +22.8% 16355 ? 13% stress-ng.time.maximum_resident_set_size
445.86 ? 5% +64.2% 732.21 ? 15% stress-ng.time.system_time
40885 ? 40% +373.8% 193712 ? 11% sched_debug.cfs_rq:/.left_vruntime.avg
465264 ? 31% +142.5% 1128399 ? 5% sched_debug.cfs_rq:/.left_vruntime.stddev
8322 ? 34% +140.8% 20039 ? 17% sched_debug.cfs_rq:/.load.avg
40886 ? 40% +373.8% 193713 ? 11% sched_debug.cfs_rq:/.right_vruntime.avg
465274 ? 31% +142.5% 1128401 ? 5% sched_debug.cfs_rq:/.right_vruntime.stddev
818.77 ? 10% +43.3% 1172 ? 5% sched_debug.cpu.curr->pid.stddev
0.05 ? 74% +659.6% 0.41 ? 35% perf-sched.sch_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
0.10 ? 48% +140.3% 0.24 ? 11% perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
0.01 ? 14% +102.6% 0.03 ? 29% perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.05 ?122% +1322.6% 0.65 ? 20% perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
1.70 ? 79% +729.3% 14.10 ? 48% perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
1.08 ?101% +233.4% 3.60 ? 7% perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
0.01 ? 8% +54.7% 0.02 ? 18% perf-sched.total_sch_delay.average.ms
0.18 ? 5% +555.7% 1.20 ? 38% perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
0.21 ? 4% +524.6% 1.29 ? 47% perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
235.65 ? 31% -57.0% 101.40 ? 17% perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
127.50 ?100% +126.3% 288.50 ? 9% perf-sched.wait_and_delay.count.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function_single
125.83 ?144% +407.2% 638.17 ? 27% perf-sched.wait_and_delay.count.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
344.50 ? 36% +114.6% 739.33 ? 24% perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64
0.92 ?114% +482.2% 5.38 ? 47% perf-sched.wait_and_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function_single
3.22 ? 89% +223.9% 10.44 ? 50% perf-sched.wait_and_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
0.18 ? 43% +471.8% 1.01 ? 36% perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages.__folio_alloc.vma_alloc_folio.do_anonymous_page
34.39 ? 46% +88.8% 64.95 ? 18% perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.21 ? 13% +813.6% 1.95 ? 38% perf-sched.wait_time.avg.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.constprop
0.18 ? 15% +457.1% 1.02 ? 58% perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.unmap_region.constprop.0
417.61 ? 68% -87.6% 51.85 ?146% perf-sched.wait_time.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.22 ? 25% +614.2% 1.57 ? 71% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
0.18 ? 5% +556.3% 1.20 ? 38% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
0.21 ? 4% +524.6% 1.29 ? 47% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
38.72 ? 39% -53.1% 18.17 ? 30% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
235.60 ? 31% -57.0% 101.37 ? 17% perf-sched.wait_time.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
2.17 ? 30% +45.3% 3.16 ? 13% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
1.02 ?131% +574.3% 6.90 ? 52% perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages.__folio_alloc.vma_alloc_folio.do_anonymous_page
0.18 ?191% +92359.0% 169.05 ?219% perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
69.64 ? 44% +33.2% 92.76 ? 4% perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.64 ? 67% +653.6% 4.82 ? 54% perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.unmap_region.constprop.0
1.75 ? 49% +206.5% 5.38 ? 47% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function_single
3.22 ? 89% +223.9% 10.44 ? 50% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi


***************************************************************************************************
lkp-ivb-2ep1: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_threads/rootfs/tbox_group/test/test_memory_size/testcase:
gcc-12/performance/x86_64-rhel-8.3/development/100%/debian-11.1-x86_64-20220510.cgz/lkp-ivb-2ep1/TCP/50%/lmbench3

commit:
57c0419c5f ("mm, pcp: decrease PCP high if free pages < high watermark")
6ccdcb6d3a ("mm, pcp: reduce detecting time of consecutive high order page freeing")

57c0419c5f0ea2cc 6ccdcb6d3a741c4e005ca6ffd4a
---------------- ---------------------------
%stddev %change %stddev
\ | \
0.07 ? 38% +105.0% 0.14 ? 32% perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
26.75 -4.9% 25.45 turbostat.RAMWatt
678809 +7.2% 727594 ? 2% vmstat.system.cs
97929782 -13.1% 85054266 numa-numastat.node0.local_node
97933343 -13.1% 85056081 numa-numastat.node0.numa_hit
97933344 -13.1% 85055901 numa-vmstat.node0.numa_hit
97929783 -13.1% 85054086 numa-vmstat.node0.numa_local
32188 +23.7% 39813 lmbench3.TCP.socket.bandwidth.10MB.MB/sec
652.63 -4.4% 624.04 lmbench3.time.elapsed_time
652.63 -4.4% 624.04 lmbench3.time.elapsed_time.max
8597 -5.9% 8092 lmbench3.time.system_time
0.88 ? 7% -0.1 0.76 ? 5% perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.71 ? 10% -0.1 0.61 ? 7% perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_read.ksys_read.do_syscall_64
0.78 ? 3% -0.1 0.70 ? 6% perf-profile.children.cycles-pp.security_socket_recvmsg
0.36 ? 9% +0.1 0.42 ? 11% perf-profile.children.cycles-pp.skb_page_frag_refill
0.40 ? 10% +0.1 0.48 ? 12% perf-profile.children.cycles-pp.sk_page_frag_refill
0.51 ? 4% -0.1 0.44 ? 13% perf-profile.self.cycles-pp.sock_read_iter
0.36 ? 10% +0.1 0.42 ? 11% perf-profile.self.cycles-pp.skb_page_frag_refill
158897 ? 2% -6.8% 148107 proc-vmstat.nr_anon_pages
160213 ? 2% -6.8% 149290 proc-vmstat.nr_inactive_anon
160213 ? 2% -6.8% 149290 proc-vmstat.nr_zone_inactive_anon
1.715e+08 -7.1% 1.593e+08 proc-vmstat.numa_hit
1.715e+08 -7.1% 1.592e+08 proc-vmstat.numa_local
1.367e+09 -7.1% 1.27e+09 proc-vmstat.pgalloc_normal
2324641 -2.7% 2261187 proc-vmstat.pgfault
1.367e+09 -7.1% 1.27e+09 proc-vmstat.pgfree
77011 -4.4% 73597 proc-vmstat.pgreuse
5.99 ? 3% -29.9% 4.20 ? 4% perf-stat.i.MPKI
7.914e+09 ? 2% +4.5% 8.271e+09 perf-stat.i.branch-instructions
1.51e+08 +4.6% 1.579e+08 perf-stat.i.branch-misses
7.65 ? 4% -0.9 6.73 ? 3% perf-stat.i.cache-miss-rate%
66394790 ? 2% -21.9% 51865866 ? 3% perf-stat.i.cache-misses
682132 +7.2% 731279 ? 2% perf-stat.i.context-switches
4.01 -16.0% 3.37 perf-stat.i.cpi
71772 ? 4% +11.5% 80055 ? 8% perf-stat.i.cycles-between-cache-misses
9.368e+09 ? 2% +3.6% 9.706e+09 perf-stat.i.dTLB-stores
33695419 ? 2% +7.1% 36096466 ? 2% perf-stat.i.iTLB-load-misses
573897 ? 35% -38.6% 352477 ? 19% perf-stat.i.iTLB-loads
4.09e+10 ? 2% +4.5% 4.273e+10 perf-stat.i.instructions
0.37 +4.3% 0.39 perf-stat.i.ipc
0.09 ? 22% -44.0% 0.05 ? 26% perf-stat.i.major-faults
490.16 ? 2% -8.6% 448.21 ? 2% perf-stat.i.metric.K/sec
635.38 ? 2% +3.5% 657.46 perf-stat.i.metric.M/sec
37.54 +2.3 39.84 perf-stat.i.node-load-miss-rate%
8300835 ? 2% -10.8% 7406820 ? 2% perf-stat.i.node-load-misses
76993977 ? 3% -6.6% 71936169 ? 3% perf-stat.i.node-loads
26.58 ? 4% +4.1 30.71 ? 3% perf-stat.i.node-store-miss-rate%
2341211 ? 4% -29.6% 1648802 ? 3% perf-stat.i.node-store-misses
34198780 ? 3% -33.2% 22857201 ? 3% perf-stat.i.node-stores
1.63 -25.5% 1.21 ? 3% perf-stat.overall.MPKI
10.67 -2.3 8.36 perf-stat.overall.cache-miss-rate%
2.83 -5.2% 2.69 perf-stat.overall.cpi
1740 +27.3% 2216 ? 3% perf-stat.overall.cycles-between-cache-misses
0.35 +5.5% 0.37 perf-stat.overall.ipc
9.73 -0.4 9.34 perf-stat.overall.node-load-miss-rate%
6.39 +0.3 6.72 perf-stat.overall.node-store-miss-rate%
7.914e+09 ? 2% +4.6% 8.276e+09 perf-stat.ps.branch-instructions
1.509e+08 +4.7% 1.579e+08 perf-stat.ps.branch-misses
66615187 ? 2% -22.1% 51881477 ? 3% perf-stat.ps.cache-misses
679734 +7.2% 729007 ? 2% perf-stat.ps.context-switches
9.369e+09 ? 2% +3.7% 9.712e+09 perf-stat.ps.dTLB-stores
33673038 ? 2% +7.2% 36098564 ? 2% perf-stat.ps.iTLB-load-misses
4.09e+10 ? 2% +4.6% 4.276e+10 perf-stat.ps.instructions
0.09 ? 23% -44.4% 0.05 ? 26% perf-stat.ps.major-faults
8328473 ? 2% -11.0% 7410272 ? 2% perf-stat.ps.node-load-misses
77301667 ? 3% -6.9% 71997671 ? 3% perf-stat.ps.node-loads
2344250 ? 4% -29.7% 1647553 ? 3% perf-stat.ps.node-store-misses
34315831 ? 3% -33.4% 22865994 ? 3% perf-stat.ps.node-stores



***************************************************************************************************
lkp-skl-d08: 36 threads 1 sockets Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz (Skylake) with 32G memory
=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
filesystem/gcc-12/performance/1SSD/btrfs/x86_64-rhel-8.3/10%/debian-11.1-x86_64-20220510.cgz/lkp-skl-d08/file-ioctl/stress-ng/60s

commit:
57c0419c5f ("mm, pcp: decrease PCP high if free pages < high watermark")
6ccdcb6d3a ("mm, pcp: reduce detecting time of consecutive high order page freeing")

57c0419c5f0ea2cc 6ccdcb6d3a741c4e005ca6ffd4a
---------------- ---------------------------
%stddev %change %stddev
\ | \
127.00 ? 10% +36.1% 172.83 ? 15% perf-c2c.HITM.local
0.00 ? 72% +130.4% 0.01 ? 30% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.alloc_extent_state.__clear_extent_bit.btrfs_clone_files
14.83 ? 19% +33.7% 19.83 ? 10% sched_debug.cpu.nr_uninterruptible.max
339939 -6.6% 317593 stress-ng.file-ioctl.ops
5665 -6.6% 5293 stress-ng.file-ioctl.ops_per_sec
6444 ? 4% -25.2% 4820 ? 5% stress-ng.time.involuntary_context_switches
89198237 -6.5% 83411572 proc-vmstat.numa_hit
89117176 -6.8% 83056324 proc-vmstat.numa_local
92833230 -6.6% 86743293 proc-vmstat.pgalloc_normal
92791999 -6.6% 86700599 proc-vmstat.pgfree
0.25 ? 56% +110.2% 0.53 ? 12% perf-stat.i.major-faults
127575 ? 27% +138.3% 303957 ? 3% perf-stat.i.node-stores
0.25 ? 56% +110.2% 0.52 ? 12% perf-stat.ps.major-faults
125751 ? 27% +138.3% 299653 ? 3% perf-stat.ps.node-stores
1.199e+12 -2.1% 1.174e+12 perf-stat.total.instructions
15.80 -0.7 15.14 perf-profile.calltrace.cycles-pp.filemap_read_folio.do_read_cache_folio.vfs_dedupe_file_range_compare.__generic_remap_file_range_prep.generic_remap_file_range_prep
15.46 -0.6 14.84 perf-profile.calltrace.cycles-pp.btrfs_read_folio.filemap_read_folio.do_read_cache_folio.vfs_dedupe_file_range_compare.__generic_remap_file_range_prep
9.84 -0.5 9.32 perf-profile.calltrace.cycles-pp.memcmp.vfs_dedupe_file_range_compare.__generic_remap_file_range_prep.generic_remap_file_range_prep.btrfs_remap_file_range
11.95 -0.4 11.52 perf-profile.calltrace.cycles-pp.btrfs_do_readpage.btrfs_read_folio.filemap_read_folio.do_read_cache_folio.vfs_dedupe_file_range_compare
8.72 ? 2% -0.4 8.28 perf-profile.calltrace.cycles-pp.filemap_add_folio.do_read_cache_folio.vfs_dedupe_file_range_compare.__generic_remap_file_range_prep.generic_remap_file_range_prep
5.56 ? 2% -0.4 5.18 perf-profile.calltrace.cycles-pp.__filemap_add_folio.filemap_add_folio.do_read_cache_folio.vfs_dedupe_file_range_compare.__generic_remap_file_range_prep
0.64 ? 10% -0.3 0.36 ? 71% perf-profile.calltrace.cycles-pp.find_free_extent.btrfs_reserve_extent.__btrfs_prealloc_file_range.btrfs_prealloc_file_range.btrfs_fallocate
2.57 ? 5% -0.3 2.29 ? 2% perf-profile.calltrace.cycles-pp.ioctl_preallocate.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl
2.44 ? 6% -0.3 2.17 ? 2% perf-profile.calltrace.cycles-pp.btrfs_fallocate.vfs_fallocate.ioctl_preallocate.__x64_sys_ioctl.do_syscall_64
2.53 ? 5% -0.3 2.26 ? 2% perf-profile.calltrace.cycles-pp.vfs_fallocate.ioctl_preallocate.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.66 ? 9% -0.2 0.46 ? 45% perf-profile.calltrace.cycles-pp.btrfs_reserve_extent.__btrfs_prealloc_file_range.btrfs_prealloc_file_range.btrfs_fallocate.vfs_fallocate
1.42 ? 3% -0.1 1.31 ? 4% perf-profile.calltrace.cycles-pp.clear_state_bit.__clear_extent_bit.btrfs_invalidate_folio.truncate_cleanup_folio.truncate_inode_pages_range
0.70 ? 4% -0.1 0.62 ? 2% perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.__filemap_add_folio.filemap_add_folio.do_read_cache_folio.vfs_dedupe_file_range_compare
0.69 ? 4% -0.1 0.63 ? 4% perf-profile.calltrace.cycles-pp.btrfs_punch_hole.btrfs_fallocate.vfs_fallocate.ioctl_preallocate.__x64_sys_ioctl
29.90 +0.6 30.49 perf-profile.calltrace.cycles-pp.do_read_cache_folio.vfs_dedupe_file_range_compare.__generic_remap_file_range_prep.generic_remap_file_range_prep.btrfs_remap_file_range
0.00 +0.9 0.86 ? 6% perf-profile.calltrace.cycles-pp.__list_del_entry_valid_or_report.rmqueue_bulk.__rmqueue_pcplist.rmqueue.get_page_from_freelist
68.10 +1.2 69.29 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl.stress_run
68.47 +1.2 69.68 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.ioctl.stress_run
67.35 +1.2 68.59 perf-profile.calltrace.cycles-pp.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl.stress_run
21.54 ? 3% +1.5 23.02 perf-profile.calltrace.cycles-pp.ioctl_file_clone.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe
21.51 ? 3% +1.5 23.00 perf-profile.calltrace.cycles-pp.do_clone_file_range.vfs_clone_file_range.ioctl_file_clone.do_vfs_ioctl.__x64_sys_ioctl
21.46 ? 3% +1.5 22.94 perf-profile.calltrace.cycles-pp.btrfs_remap_file_range.do_clone_file_range.vfs_clone_file_range.ioctl_file_clone.do_vfs_ioctl
21.53 ? 3% +1.5 23.01 perf-profile.calltrace.cycles-pp.vfs_clone_file_range.ioctl_file_clone.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64
0.00 +1.5 1.49 ? 3% perf-profile.calltrace.cycles-pp.__free_one_page.free_pcppages_bulk.free_unref_page_commit.free_unref_page.btrfs_clone
21.15 ? 3% +1.5 22.66 perf-profile.calltrace.cycles-pp.btrfs_clone_files.btrfs_remap_file_range.do_clone_file_range.vfs_clone_file_range.ioctl_file_clone
64.61 +1.5 66.16 perf-profile.calltrace.cycles-pp.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl
2.66 ? 2% +1.8 4.51 ? 3% perf-profile.calltrace.cycles-pp.folio_alloc.do_read_cache_folio.vfs_dedupe_file_range_compare.__generic_remap_file_range_prep.generic_remap_file_range_prep
0.97 ? 3% +1.8 2.82 ? 5% perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages.folio_alloc.do_read_cache_folio
2.02 ? 3% +1.9 3.90 ? 4% perf-profile.calltrace.cycles-pp.__alloc_pages.folio_alloc.do_read_cache_folio.vfs_dedupe_file_range_compare.__generic_remap_file_range_prep
1.27 ? 2% +1.9 3.17 ? 4% perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.folio_alloc.do_read_cache_folio.vfs_dedupe_file_range_compare
0.35 ? 70% +2.0 2.31 ? 5% perf-profile.calltrace.cycles-pp.__rmqueue_pcplist.rmqueue.get_page_from_freelist.__alloc_pages.folio_alloc
0.00 +2.0 2.00 ? 4% perf-profile.calltrace.cycles-pp.rmqueue_bulk.__rmqueue_pcplist.rmqueue.get_page_from_freelist.__alloc_pages
1.72 ? 2% +2.1 3.78 perf-profile.calltrace.cycles-pp.btrfs_clone.btrfs_clone_files.btrfs_remap_file_range.do_clone_file_range.vfs_clone_file_range
0.00 +2.1 2.09 ? 2% perf-profile.calltrace.cycles-pp.free_pcppages_bulk.free_unref_page_commit.free_unref_page.btrfs_clone.btrfs_clone_files
0.00 +2.1 2.12 ? 2% perf-profile.calltrace.cycles-pp.free_unref_page_commit.free_unref_page.btrfs_clone.btrfs_clone_files.btrfs_remap_file_range
0.00 +2.1 2.14 ? 2% perf-profile.calltrace.cycles-pp.free_unref_page.btrfs_clone.btrfs_clone_files.btrfs_remap_file_range.do_clone_file_range
15.81 -0.7 15.15 perf-profile.children.cycles-pp.filemap_read_folio
15.47 -0.6 14.86 perf-profile.children.cycles-pp.btrfs_read_folio
9.89 -0.5 9.38 perf-profile.children.cycles-pp.memcmp
11.98 -0.4 11.54 perf-profile.children.cycles-pp.btrfs_do_readpage
8.74 ? 2% -0.4 8.30 perf-profile.children.cycles-pp.filemap_add_folio
9.73 ? 3% -0.4 9.35 perf-profile.children.cycles-pp.__clear_extent_bit
5.66 ? 2% -0.4 5.30 perf-profile.children.cycles-pp.__filemap_add_folio
2.45 ? 6% -0.3 2.17 ? 2% perf-profile.children.cycles-pp.btrfs_fallocate
2.57 ? 5% -0.3 2.29 ? 2% perf-profile.children.cycles-pp.ioctl_preallocate
2.53 ? 5% -0.3 2.26 ? 2% perf-profile.children.cycles-pp.vfs_fallocate
4.67 ? 2% -0.3 4.41 ? 3% perf-profile.children.cycles-pp.__set_extent_bit
4.83 ? 2% -0.3 4.58 ? 3% perf-profile.children.cycles-pp.lock_extent
5.06 ? 2% -0.2 4.82 ? 2% perf-profile.children.cycles-pp.alloc_extent_state
4.11 ? 2% -0.2 3.94 ? 2% perf-profile.children.cycles-pp.kmem_cache_alloc
1.37 ? 4% -0.1 1.25 ? 2% perf-profile.children.cycles-pp.__mod_lruvec_page_state
0.66 ? 9% -0.1 0.54 ? 6% perf-profile.children.cycles-pp.btrfs_reserve_extent
0.64 ? 10% -0.1 0.53 ? 6% perf-profile.children.cycles-pp.find_free_extent
0.96 ? 4% -0.1 0.87 ? 6% perf-profile.children.cycles-pp.__wake_up
0.62 ? 4% -0.1 0.54 ? 6% perf-profile.children.cycles-pp.__cond_resched
1.20 ? 4% -0.1 1.12 ? 3% perf-profile.children.cycles-pp.free_extent_state
0.99 ? 3% -0.1 0.92 ? 4% perf-profile.children.cycles-pp.syscall_return_via_sysret
0.89 ? 3% -0.1 0.81 ? 5% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.69 ? 4% -0.1 0.64 ? 4% perf-profile.children.cycles-pp.btrfs_punch_hole
0.12 ? 10% -0.0 0.09 ? 10% perf-profile.children.cycles-pp.__fget_light
0.02 ?141% +0.0 0.06 ? 13% perf-profile.children.cycles-pp.calc_available_free_space
0.29 ? 8% +0.1 0.39 ? 6% perf-profile.children.cycles-pp.__mod_zone_page_state
0.09 ? 17% +0.2 0.25 ? 6% perf-profile.children.cycles-pp.__kmalloc_node
0.09 ? 15% +0.2 0.25 ? 4% perf-profile.children.cycles-pp.kvmalloc_node
0.08 ? 11% +0.2 0.24 ? 4% perf-profile.children.cycles-pp.__kmalloc_large_node
0.24 ? 13% +0.2 0.41 ? 4% perf-profile.children.cycles-pp.__list_add_valid_or_report
0.32 ? 15% +0.6 0.91 ? 4% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
30.03 +0.6 30.64 perf-profile.children.cycles-pp.do_read_cache_folio
1.10 ? 4% +0.6 1.72 ? 2% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.58 ? 6% +0.9 1.50 ? 5% perf-profile.children.cycles-pp.__list_del_entry_valid_or_report
67.36 +1.2 68.60 perf-profile.children.cycles-pp.__x64_sys_ioctl
21.52 ? 3% +1.5 23.00 perf-profile.children.cycles-pp.do_clone_file_range
21.54 ? 3% +1.5 23.02 perf-profile.children.cycles-pp.ioctl_file_clone
21.53 ? 3% +1.5 23.01 perf-profile.children.cycles-pp.vfs_clone_file_range
21.16 ? 3% +1.5 22.66 perf-profile.children.cycles-pp.btrfs_clone_files
0.00 +1.5 1.52 ? 3% perf-profile.children.cycles-pp.__free_one_page
64.61 +1.5 66.16 perf-profile.children.cycles-pp.do_vfs_ioctl
64.16 +1.5 65.71 perf-profile.children.cycles-pp.btrfs_remap_file_range
2.68 ? 3% +1.8 4.52 ? 3% perf-profile.children.cycles-pp.folio_alloc
0.54 ? 6% +2.0 2.51 ? 5% perf-profile.children.cycles-pp.__rmqueue_pcplist
1.03 ? 3% +2.0 3.04 ? 5% perf-profile.children.cycles-pp.rmqueue
2.16 ? 3% +2.0 4.19 ? 4% perf-profile.children.cycles-pp.__alloc_pages
1.32 ? 2% +2.1 3.42 ? 4% perf-profile.children.cycles-pp.get_page_from_freelist
0.00 +2.1 2.10 ? 2% perf-profile.children.cycles-pp.free_pcppages_bulk
2.66 ? 2% +2.1 4.77 perf-profile.children.cycles-pp.btrfs_clone
0.03 ?100% +2.1 2.17 ? 2% perf-profile.children.cycles-pp.free_unref_page
0.40 ? 6% +2.2 2.55 ? 2% perf-profile.children.cycles-pp.free_unref_page_commit
0.00 +2.2 2.21 ? 4% perf-profile.children.cycles-pp.rmqueue_bulk
9.82 -0.5 9.32 perf-profile.self.cycles-pp.memcmp
0.84 ? 5% -0.1 0.76 ? 6% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
1.13 ? 4% -0.1 1.05 ? 2% perf-profile.self.cycles-pp.free_extent_state
0.99 ? 3% -0.1 0.92 ? 4% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.22 ? 8% -0.1 0.16 ? 13% perf-profile.self.cycles-pp.find_free_extent
0.38 ? 4% -0.1 0.32 ? 8% perf-profile.self.cycles-pp.__cond_resched
0.12 ? 10% -0.0 0.08 ? 11% perf-profile.self.cycles-pp.__fget_light
0.06 ? 7% -0.0 0.04 ? 45% perf-profile.self.cycles-pp.__x64_sys_ioctl
0.07 ? 15% +0.0 0.10 ? 9% perf-profile.self.cycles-pp.folio_alloc
0.28 ? 10% +0.1 0.36 ? 7% perf-profile.self.cycles-pp.get_page_from_freelist
0.26 ? 8% +0.1 0.36 ? 4% perf-profile.self.cycles-pp.__mod_zone_page_state
0.22 ? 14% +0.2 0.38 ? 5% perf-profile.self.cycles-pp.__list_add_valid_or_report
0.00 +0.2 0.24 ? 6% perf-profile.self.cycles-pp.free_pcppages_bulk
0.32 ? 15% +0.6 0.91 ? 4% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.00 +0.6 0.62 ? 10% perf-profile.self.cycles-pp.rmqueue_bulk
0.55 ? 6% +0.9 1.46 ? 5% perf-profile.self.cycles-pp.__list_del_entry_valid_or_report
0.00 +1.3 1.32 ? 4% perf-profile.self.cycles-pp.__free_one_page





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


2023-11-23 05:43:21

by Huang, Ying

Subject: Re: [linus:master] [mm, pcp] 6ccdcb6d3a: stress-ng.judy.ops_per_sec -4.7% regression

Hi,

Thanks for the test!

kernel test robot <[email protected]> writes:

> Hello,
>
> kernel test robot noticed a -4.7% regression of stress-ng.judy.ops_per_sec on:
>
>
> commit: 6ccdcb6d3a741c4e005ca6ffd4a62ddf8b5bead3 ("mm, pcp: reduce detecting time of consecutive high order page freeing")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> testcase: stress-ng
> test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
> parameters:
>
> nr_threads: 100%
> testtime: 60s
> class: cpu-cache
> test: judy
> disk: 1SSD
> cpufreq_governor: performance
>
>
> In addition to that, the commit also has significant impact on the following tests:
>
> +------------------+-------------------------------------------------------------------------------------------------+
> | testcase: change | lmbench3: lmbench3.TCP.socket.bandwidth.10MB.MB/sec 23.7% improvement |
> | test machine | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory |
> | test parameters | cpufreq_governor=performance |
> | | mode=development |
> | | nr_threads=100% |
> | | test=TCP |
> | | test_memory_size=50% |
> +------------------+-------------------------------------------------------------------------------------------------+
> | testcase: change | stress-ng: stress-ng.file-ioctl.ops_per_sec -6.6% regression |
> | test machine | 36 threads 1 sockets Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz (Skylake) with 32G memory |
> | test parameters | class=filesystem |
> | | cpufreq_governor=performance |
> | | disk=1SSD |
> | | fs=btrfs |
> | | nr_threads=10% |
> | | test=file-ioctl |
> | | testtime=60s |
> +------------------+-------------------------------------------------------------------------------------------------+

It's expected that this commit will benefit some workloads (mainly
network and inter-process communication related ones) and hurt some
others. But the whole series should not have much regression. Can you
try the whole series for the regression test cases? The series starts
from commit ca71fe1ad922 ("mm, pcp: avoid to drain PCP when process
exit") and ends at commit 6ccdcb6d3a74 ("mm, pcp: reduce detecting time
of consecutive high order page freeing").

--
Best Regards,
Huang, Ying

2023-11-24 06:55:21

by kernel test robot

Subject: Re: [linus:master] [mm, pcp] 6ccdcb6d3a: stress-ng.judy.ops_per_sec -4.7% regression

Hi, Huang Ying,

On Thu, Nov 23, 2023 at 01:40:02PM +0800, Huang, Ying wrote:
> Hi,
>
> Thanks for the test!
>
> kernel test robot <[email protected]> writes:
>
> > Hello,
> >
> > kernel test robot noticed a -4.7% regression of stress-ng.judy.ops_per_sec on:
> >
> >
> > commit: 6ccdcb6d3a741c4e005ca6ffd4a62ddf8b5bead3 ("mm, pcp: reduce detecting time of consecutive high order page freeing")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > testcase: stress-ng
> > test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
> > parameters:
> >
> > nr_threads: 100%
> > testtime: 60s
> > class: cpu-cache
> > test: judy
> > disk: 1SSD
> > cpufreq_governor: performance
> >
> >
> > In addition to that, the commit also has significant impact on the following tests:
> >
> > +------------------+-------------------------------------------------------------------------------------------------+
> > | testcase: change | lmbench3: lmbench3.TCP.socket.bandwidth.10MB.MB/sec 23.7% improvement |
> > | test machine | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory |
> > | test parameters | cpufreq_governor=performance |
> > | | mode=development |
> > | | nr_threads=100% |
> > | | test=TCP |
> > | | test_memory_size=50% |
> > +------------------+-------------------------------------------------------------------------------------------------+
> > | testcase: change | stress-ng: stress-ng.file-ioctl.ops_per_sec -6.6% regression |
> > | test machine | 36 threads 1 sockets Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz (Skylake) with 32G memory |
> > | test parameters | class=filesystem |
> > | | cpufreq_governor=performance |
> > | | disk=1SSD |
> > | | fs=btrfs |
> > | | nr_threads=10% |
> > | | test=file-ioctl |
> > | | testtime=60s |
> > +------------------+-------------------------------------------------------------------------------------------------+
>
> It's expected that this commit will benefit some workloads (mainly
> network and inter-process communication related ones) and hurt some
> others. But the whole series should not have much regression. Can you
> try the whole series for the regression test cases? The series starts
> from commit ca71fe1ad922 ("mm, pcp: avoid to drain PCP when process
> exit") and ends at commit 6ccdcb6d3a74 ("mm, pcp: reduce detecting time
> of consecutive high order page freeing").

Since the full series is:
* 6ccdcb6d3a741 mm, pcp: reduce detecting time of consecutive high order page freeing
* 57c0419c5f0ea mm, pcp: decrease PCP high if free pages < high watermark
* 51a755c56dc05 mm: tune PCP high automatically
* 90b41691b9881 mm: add framework for PCP high auto-tuning
* c0a242394cb98 mm, page_alloc: scale the number of pages that are batch allocated
* 52166607ecc98 mm: restrict the pcp batch scale factor to avoid too long latency
* 362d37a106dd3 mm, pcp: reduce lock contention for draining high-order pages
* 94a3bfe4073cd cacheinfo: calculate size of per-CPU data cache slice
* ca71fe1ad9221 mm, pcp: avoid to drain PCP when process exit
* 1f4f7f0f8845d mm/oom_killer: simplify OOM killer info dump helper

I tested 1f4f7f0f8845d vs 6ccdcb6d3a741.

For stress-ng.judy.ops_per_sec, there is a smaller regression (-2.0%):
(full comparison is attached as ncompare-judy)

=========================================================================================
class/compiler/cpufreq_governor/disk/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
cpu-cache/gcc-12/performance/1SSD/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/judy/stress-ng/60s

1f4f7f0f8845dbac 6ccdcb6d3a741c4e005ca6ffd4a
---------------- ---------------------------
%stddev %change %stddev
\ | \
6925490 -0.9% 6862477 stress-ng.judy.Judy_delete_operations_per_sec
22515488 -0.4% 22420191 stress-ng.judy.Judy_find_operations_per_sec
9036524 -3.9% 8685310 ? 3% stress-ng.judy.Judy_insert_operations_per_sec
171299 -2.0% 167905 stress-ng.judy.ops
2853 -2.0% 2796 stress-ng.judy.ops_per_sec


For stress-ng.file-ioctl.ops_per_sec, there is a similar regression (-6.9%):
(full comparison is attached as ncompare-file-ioctl)

=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
filesystem/gcc-12/performance/1SSD/btrfs/x86_64-rhel-8.3/10%/debian-11.1-x86_64-20220510.cgz/lkp-skl-d08/file-ioctl/stress-ng/60s

1f4f7f0f8845dbac 6ccdcb6d3a741c4e005ca6ffd4a
---------------- ---------------------------
%stddev %change %stddev
\ | \
340971 -6.9% 317411 stress-ng.file-ioctl.ops
5682 -6.9% 5290 stress-ng.file-ioctl.ops_per_sec

>
> --
> Best Regards,
> Huang, Ying
>


Attachments:
ncompare-judy (179.95 kB)
ncompare-file-ioctl (204.33 kB)