2021-03-01 07:50:47

by Oliver Sang

Subject: [mm, slub] 8ff60eb052: stress-ng.rawpkt.ops_per_sec -47.9% regression


Greetings,

FYI, we noticed a -47.9% regression of stress-ng.rawpkt.ops_per_sec due to commit:


commit: 8ff60eb052eeba95cfb3efe16b08c9199f8121cf ("mm, slub: consider rest of partial list if acquire_slab() fails")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: stress-ng
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
with following parameters:

nr_threads: 100%
disk: 1HDD
testtime: 60s
class: network
test: rawpkt
cpufreq_governor: performance
ucode: 0x5003006




If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml
bin/lkp run compatible-job.yaml

=========================================================================================
class/compiler/cpufreq_governor/disk/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
network/gcc-9/performance/1HDD/x86_64-rhel-8.3/100%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp5/rawpkt/stress-ng/60s/0x5003006

commit:
e609571b5f ("Merge tag 'nfs-for-5.11-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs")
8ff60eb052 ("mm, slub: consider rest of partial list if acquire_slab() fails")

e609571b5ffa3528 8ff60eb052eeba95cfb3efe16b0
---------------- ---------------------------
%stddev %change %stddev
\ | \
1.124e+09 ? 2% -47.8% 5.863e+08 ? 8% stress-ng.rawpkt.ops
18655820 ? 2% -47.9% 9723826 ? 8% stress-ng.rawpkt.ops_per_sec
928438 ? 19% -59.8% 372785 ? 4% stress-ng.time.involuntary_context_switches
6084 +3.6% 6301 stress-ng.time.percent_of_cpu_this_job_got
3736 +4.4% 3901 stress-ng.time.system_time
55.88 ? 4% -50.1% 27.86 ? 7% stress-ng.time.user_time
1191546 ? 21% -87.4% 150267 ? 36% stress-ng.time.voluntary_context_switches
1.01 ? 2% -37.9% 0.62 ? 5% iostat.cpu.user
113603 ? 5% +10.7% 125808 ? 8% numa-meminfo.node1.Slab
163259 ? 21% +45.3% 237153 ? 16% numa-numastat.node1.local_node
0.84 ? 2% -0.2 0.69 ? 2% mpstat.cpu.all.irq%
1.02 ? 2% -0.4 0.63 ? 5% mpstat.cpu.all.usr%
35285 ? 20% -72.1% 9859 ? 12% vmstat.system.cs
223592 -9.0% 203509 ? 5% vmstat.system.in
262.17 ? 20% +56.8% 411.00 ? 21% perf-sched.total_wait_and_delay.count.ms
0.04 ?155% -93.5% 0.00 ?102% perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.[unknown]
43.67 ?110% +316.0% 181.67 ? 49% perf-sched.wait_and_delay.count.pipe_read.new_sync_read.vfs_read.ksys_read
0.04 ?155% -92.2% 0.00 ?101% perf-sched.wait_and_delay.max.ms.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.[unknown]
435842 ? 2% +14.1% 497175 ? 3% proc-vmstat.numa_hit
349209 ? 3% +17.6% 410544 ? 4% proc-vmstat.numa_local
31061 ? 4% -8.4% 28439 ? 5% proc-vmstat.pgactivate
450752 ? 2% +14.7% 516982 ? 3% proc-vmstat.pgalloc_normal
279007 ? 5% +25.9% 351289 ? 5% proc-vmstat.pgfree
11704 ? 2% +9.7% 12840 proc-vmstat.pgreuse
15741 ? 2% +14.8% 18066 slabinfo.kmalloc-512.active_objs
2054 ? 3% +13.3% 2327 slabinfo.kmalloc-512.active_slabs
16440 ? 3% +13.3% 18619 slabinfo.kmalloc-512.num_objs
2054 ? 3% +13.3% 2327 slabinfo.kmalloc-512.num_slabs
852.00 ? 4% -18.8% 692.00 ? 8% slabinfo.kmem_cache_node.active_objs
896.00 ? 4% -17.9% 736.00 ? 8% slabinfo.kmem_cache_node.num_objs
976.08 ? 9% +24.9% 1218 ? 3% sched_debug.cfs_rq:/.util_est_enqueued.max
188.78 ? 3% +26.5% 238.72 ? 4% sched_debug.cfs_rq:/.util_est_enqueued.stddev
761508 ? 2% -9.4% 690280 ? 6% sched_debug.cpu.avg_idle.avg
293608 ? 5% +24.7% 365999 ? 6% sched_debug.cpu.avg_idle.stddev
1094 ? 4% +55.0% 1696 ? 14% sched_debug.cpu.clock_task.stddev
12810 ? 17% -63.6% 4658 ? 8% sched_debug.cpu.nr_switches.avg
23981 ? 16% -46.7% 12772 ? 16% sched_debug.cpu.nr_switches.max
9530 ? 17% -69.5% 2904 ? 6% sched_debug.cpu.nr_switches.min
2755 ? 19% -40.1% 1649 ? 16% sched_debug.cpu.nr_switches.stddev
86875 ? 7% -33.6% 57728 ? 18% softirqs.CPU0.NET_RX
88565 ? 4% -29.0% 62881 ? 10% softirqs.CPU1.NET_RX
90473 ? 4% -20.9% 71605 ? 14% softirqs.CPU14.NET_RX
87831 ? 5% -16.6% 73211 ? 12% softirqs.CPU20.NET_RX
87462 ? 6% -15.4% 73965 ? 11% softirqs.CPU22.NET_RX
87859 ? 3% -18.9% 71265 ? 9% softirqs.CPU25.NET_RX
84595 ? 6% -14.0% 72776 ? 9% softirqs.CPU26.NET_RX
85293 ? 5% -13.6% 73687 ? 8% softirqs.CPU3.NET_RX
88162 ? 3% -17.5% 72728 ? 17% softirqs.CPU31.NET_RX
90329 ? 4% -20.6% 71720 ? 20% softirqs.CPU4.NET_RX
88323 -13.5% 76403 ? 10% softirqs.CPU46.NET_RX
87494 ? 3% -15.8% 73639 ? 10% softirqs.CPU49.NET_RX
88203 ? 3% -19.8% 70756 ? 25% softirqs.CPU50.NET_RX
89676 ? 2% -19.4% 72241 ? 23% softirqs.CPU53.NET_RX
91841 ? 5% -13.1% 79841 ? 9% softirqs.CPU58.NET_RX
85602 ? 6% -17.2% 70897 ? 7% softirqs.CPU61.NET_RX
86180 ? 7% -12.8% 75147 ? 6% softirqs.CPU72.NET_RX
87646 ? 4% -17.9% 71952 ? 16% softirqs.CPU84.NET_RX
89651 ? 2% -10.2% 80518 ? 4% softirqs.CPU87.NET_RX
89839 ? 4% -12.5% 78636 ? 4% softirqs.CPU91.NET_RX
84982 ? 3% -9.4% 77013 ? 3% softirqs.CPU92.NET_RX
88677 ? 3% -12.6% 77504 ? 8% softirqs.CPU94.NET_RX
85184 ? 4% -11.5% 75412 ? 2% softirqs.CPU95.NET_RX
8289001 -9.6% 7490295 softirqs.NET_RX
263934 ? 6% -32.6% 177774 ? 3% softirqs.RCU
1.251e+10 -21.5% 9.824e+09 ? 2% perf-stat.i.branch-instructions
1.13 -0.2 0.94 ? 2% perf-stat.i.branch-miss-rate%
1.284e+08 ? 2% -36.4% 81641703 ? 5% perf-stat.i.branch-misses
29.36 -2.2 27.19 ? 2% perf-stat.i.cache-miss-rate%
3.904e+08 ? 3% -34.9% 2.542e+08 ? 6% perf-stat.i.cache-misses
1.314e+09 ? 3% -30.1% 9.188e+08 ? 5% perf-stat.i.cache-references
36014 ? 20% -72.3% 9966 ? 12% perf-stat.i.context-switches
4.21 +33.0% 5.60 ? 3% perf-stat.i.cpi
3948 ? 21% -81.1% 744.68 ? 25% perf-stat.i.cpu-migrations
704.33 ? 3% +48.5% 1045 ? 5% perf-stat.i.cycles-between-cache-misses
1.719e+10 -24.6% 1.297e+10 ? 3% perf-stat.i.dTLB-loads
9.821e+09 ? 2% -36.1% 6.278e+09 ? 5% perf-stat.i.dTLB-stores
50428350 ? 2% -38.8% 30876163 ? 6% perf-stat.i.iTLB-load-misses
5.91e+10 -24.8% 4.445e+10 ? 3% perf-stat.i.instructions
1291 +19.3% 1540 ? 4% perf-stat.i.instructions-per-iTLB-miss
0.26 -22.1% 0.20 ? 2% perf-stat.i.ipc
0.92 ? 8% -29.0% 0.65 ? 4% perf-stat.i.metric.K/sec
428.01 -26.6% 314.14 ? 3% perf-stat.i.metric.M/sec
96603525 ? 2% -38.8% 59084362 ? 5% perf-stat.i.node-load-misses
6056756 ? 7% -33.1% 4049124 ? 12% perf-stat.i.node-loads
1.404e+08 -28.8% 99976367 ? 5% perf-stat.i.node-store-misses
1.03 -0.2 0.83 ? 2% perf-stat.overall.branch-miss-rate%
29.71 -2.1 27.65 ? 2% perf-stat.overall.cache-miss-rate%
4.31 +33.2% 5.74 ? 3% perf-stat.overall.cpi
652.28 ? 3% +54.3% 1006 ? 6% perf-stat.overall.cycles-between-cache-misses
1172 +23.1% 1443 ? 4% perf-stat.overall.instructions-per-iTLB-miss
0.23 -24.9% 0.17 ? 3% perf-stat.overall.ipc
1.233e+10 -21.5% 9.672e+09 ? 2% perf-stat.ps.branch-instructions
1.264e+08 ? 2% -36.5% 80273397 ? 5% perf-stat.ps.branch-misses
3.848e+08 ? 3% -34.9% 2.504e+08 ? 6% perf-stat.ps.cache-misses
1.296e+09 ? 3% -30.1% 9.052e+08 ? 5% perf-stat.ps.cache-references
35593 ? 20% -72.5% 9799 ? 12% perf-stat.ps.context-switches
3905 ? 21% -81.3% 731.25 ? 24% perf-stat.ps.cpu-migrations
1.693e+10 -24.6% 1.277e+10 ? 3% perf-stat.ps.dTLB-loads
9.679e+09 ? 2% -36.1% 6.182e+09 ? 5% perf-stat.ps.dTLB-stores
49690918 ? 2% -38.8% 30402390 ? 6% perf-stat.ps.iTLB-load-misses
5.822e+10 -24.8% 4.377e+10 ? 3% perf-stat.ps.instructions
95230786 ? 2% -38.9% 58194445 ? 5% perf-stat.ps.node-load-misses
5976769 ? 7% -33.2% 3994178 ? 12% perf-stat.ps.node-loads
1.383e+08 -28.8% 98460706 ? 5% perf-stat.ps.node-store-misses
3.679e+12 -24.6% 2.772e+12 ? 3% perf-stat.total.instructions
589979 ? 18% -80.3% 115952 ? 20% interrupts.CAL:Function_call_interrupts
6494 ? 32% -70.6% 1910 ? 44% interrupts.CPU0.CAL:Function_call_interrupts
2674 ? 26% -84.3% 420.83 ? 43% interrupts.CPU0.RES:Rescheduling_interrupts
6789 ? 25% -70.7% 1989 ? 36% interrupts.CPU1.CAL:Function_call_interrupts
2856 ? 25% -81.9% 515.67 ? 44% interrupts.CPU1.RES:Rescheduling_interrupts
6130 ? 27% -79.7% 1245 ? 33% interrupts.CPU10.CAL:Function_call_interrupts
2883 ? 26% -85.6% 416.33 ? 48% interrupts.CPU10.RES:Rescheduling_interrupts
6709 ? 8% -82.5% 1176 ? 19% interrupts.CPU11.CAL:Function_call_interrupts
2980 ? 17% -87.2% 380.67 ? 38% interrupts.CPU11.RES:Rescheduling_interrupts
6011 ? 18% -80.2% 1188 ? 24% interrupts.CPU12.CAL:Function_call_interrupts
2855 ? 19% -86.3% 392.50 ? 36% interrupts.CPU12.RES:Rescheduling_interrupts
6334 ? 21% -81.5% 1172 ? 19% interrupts.CPU13.CAL:Function_call_interrupts
2873 ? 28% -87.1% 372.00 ? 35% interrupts.CPU13.RES:Rescheduling_interrupts
5601 ? 19% -76.4% 1322 ? 26% interrupts.CPU14.CAL:Function_call_interrupts
2640 ? 20% -84.7% 403.50 ? 34% interrupts.CPU14.RES:Rescheduling_interrupts
6966 ? 36% -82.8% 1197 ? 22% interrupts.CPU15.CAL:Function_call_interrupts
3039 ? 27% -87.4% 382.67 ? 37% interrupts.CPU15.RES:Rescheduling_interrupts
6364 ? 18% -81.4% 1182 ? 24% interrupts.CPU16.CAL:Function_call_interrupts
2866 ? 20% -87.1% 369.00 ? 37% interrupts.CPU16.RES:Rescheduling_interrupts
6189 ? 25% -81.4% 1151 ? 18% interrupts.CPU17.CAL:Function_call_interrupts
2850 ? 29% -87.3% 361.50 ? 33% interrupts.CPU17.RES:Rescheduling_interrupts
7179 ? 36% -84.4% 1119 ? 19% interrupts.CPU18.CAL:Function_call_interrupts
3107 ? 34% -89.0% 341.83 ? 32% interrupts.CPU18.RES:Rescheduling_interrupts
6694 ? 24% -81.2% 1259 ? 22% interrupts.CPU19.CAL:Function_call_interrupts
2961 ? 21% -86.7% 395.17 ? 32% interrupts.CPU19.RES:Rescheduling_interrupts
6879 ? 33% -79.0% 1442 ? 26% interrupts.CPU2.CAL:Function_call_interrupts
2898 ? 33% -84.8% 440.50 ? 40% interrupts.CPU2.RES:Rescheduling_interrupts
6089 ? 14% -77.8% 1354 ? 37% interrupts.CPU20.CAL:Function_call_interrupts
2787 ? 18% -85.0% 419.33 ? 47% interrupts.CPU20.RES:Rescheduling_interrupts
6853 ? 29% -79.8% 1382 ? 43% interrupts.CPU21.CAL:Function_call_interrupts
3062 ? 28% -86.9% 401.67 ? 45% interrupts.CPU21.RES:Rescheduling_interrupts
6319 ? 29% -80.0% 1263 ? 18% interrupts.CPU22.CAL:Function_call_interrupts
2842 ? 31% -85.6% 409.33 ? 29% interrupts.CPU22.RES:Rescheduling_interrupts
6688 ? 33% -81.3% 1251 ? 35% interrupts.CPU23.CAL:Function_call_interrupts
2994 ? 26% -86.8% 396.00 ? 49% interrupts.CPU23.RES:Rescheduling_interrupts
6494 ? 20% -82.2% 1155 ? 17% interrupts.CPU24.CAL:Function_call_interrupts
2742 ? 26% -89.1% 299.50 ? 23% interrupts.CPU24.RES:Rescheduling_interrupts
5696 ? 20% -79.8% 1148 ? 21% interrupts.CPU25.CAL:Function_call_interrupts
2661 ? 23% -89.4% 282.67 ? 20% interrupts.CPU25.RES:Rescheduling_interrupts
6224 ? 15% -80.3% 1225 ? 17% interrupts.CPU26.CAL:Function_call_interrupts
2778 ? 22% -88.2% 327.83 ? 23% interrupts.CPU26.RES:Rescheduling_interrupts
5783 ? 22% -80.0% 1153 ? 20% interrupts.CPU27.CAL:Function_call_interrupts
2671 ? 23% -89.0% 294.67 ? 25% interrupts.CPU27.RES:Rescheduling_interrupts
5911 ? 14% -81.8% 1073 ? 29% interrupts.CPU28.CAL:Function_call_interrupts
2646 ? 21% -89.1% 288.83 ? 39% interrupts.CPU28.RES:Rescheduling_interrupts
5625 ? 11% -79.5% 1153 ? 26% interrupts.CPU29.CAL:Function_call_interrupts
2542 ? 19% -88.8% 283.83 ? 26% interrupts.CPU29.RES:Rescheduling_interrupts
6468 ? 20% -80.3% 1272 ? 29% interrupts.CPU3.CAL:Function_call_interrupts
2876 ? 24% -87.1% 372.33 ? 45% interrupts.CPU3.RES:Rescheduling_interrupts
5770 ? 18% -80.2% 1139 ? 27% interrupts.CPU30.CAL:Function_call_interrupts
2563 ? 21% -89.5% 269.67 ? 29% interrupts.CPU30.RES:Rescheduling_interrupts
5439 ? 15% -78.2% 1183 ? 29% interrupts.CPU31.CAL:Function_call_interrupts
2634 ? 24% -89.3% 282.83 ? 38% interrupts.CPU31.RES:Rescheduling_interrupts
5705 ? 23% -79.2% 1189 ? 25% interrupts.CPU32.CAL:Function_call_interrupts
2596 ? 25% -88.1% 308.83 ? 29% interrupts.CPU32.RES:Rescheduling_interrupts
5801 ? 14% -81.4% 1081 ? 15% interrupts.CPU33.CAL:Function_call_interrupts
2733 ? 25% -89.7% 280.17 ? 20% interrupts.CPU33.RES:Rescheduling_interrupts
6720 ? 38% -84.8% 1019 ? 23% interrupts.CPU34.CAL:Function_call_interrupts
2918 ? 33% -91.1% 260.67 ? 31% interrupts.CPU34.RES:Rescheduling_interrupts
5440 ? 23% -81.2% 1025 ? 17% interrupts.CPU35.CAL:Function_call_interrupts
2466 ? 25% -89.6% 257.67 ? 21% interrupts.CPU35.RES:Rescheduling_interrupts
6056 ? 32% -79.8% 1225 ? 25% interrupts.CPU36.CAL:Function_call_interrupts
2681 ? 28% -88.3% 315.00 ? 32% interrupts.CPU36.RES:Rescheduling_interrupts
6068 ? 24% -81.6% 1116 ? 19% interrupts.CPU37.CAL:Function_call_interrupts
2752 ? 31% -89.3% 293.83 ? 21% interrupts.CPU37.RES:Rescheduling_interrupts
7006 ? 29% -83.2% 1176 ? 26% interrupts.CPU38.CAL:Function_call_interrupts
2930 ? 30% -89.2% 317.33 ? 28% interrupts.CPU38.RES:Rescheduling_interrupts
5998 ? 23% -78.9% 1264 ? 30% interrupts.CPU39.CAL:Function_call_interrupts
2738 ? 30% -87.9% 331.17 ? 33% interrupts.CPU39.RES:Rescheduling_interrupts
5704 ? 28% -72.2% 1584 ? 51% interrupts.CPU4.CAL:Function_call_interrupts
2724 ? 29% -82.8% 470.00 ? 57% interrupts.CPU4.RES:Rescheduling_interrupts
5984 ? 23% -81.6% 1101 ? 21% interrupts.CPU40.CAL:Function_call_interrupts
2693 ? 27% -88.8% 301.33 ? 33% interrupts.CPU40.RES:Rescheduling_interrupts
5981 ? 17% -80.5% 1164 ? 30% interrupts.CPU41.CAL:Function_call_interrupts
2607 ? 20% -87.8% 318.50 ? 34% interrupts.CPU41.RES:Rescheduling_interrupts
5725 ? 32% -79.8% 1159 ? 28% interrupts.CPU42.CAL:Function_call_interrupts
2590 ? 28% -88.4% 301.50 ? 33% interrupts.CPU42.RES:Rescheduling_interrupts
6014 ? 12% -81.5% 1114 ? 30% interrupts.CPU43.CAL:Function_call_interrupts
2679 ? 19% -89.2% 289.83 ? 31% interrupts.CPU43.RES:Rescheduling_interrupts
6010 ? 15% -80.2% 1190 ? 23% interrupts.CPU44.CAL:Function_call_interrupts
2750 ? 22% -88.8% 308.33 ? 29% interrupts.CPU44.RES:Rescheduling_interrupts
6372 ? 24% -82.1% 1143 ? 23% interrupts.CPU45.CAL:Function_call_interrupts
2753 ? 29% -89.2% 296.50 ? 29% interrupts.CPU45.RES:Rescheduling_interrupts
5551 ? 20% -79.1% 1157 ? 27% interrupts.CPU46.CAL:Function_call_interrupts
2620 ? 29% -89.2% 283.00 ? 29% interrupts.CPU46.RES:Rescheduling_interrupts
6120 ? 11% -81.9% 1105 ? 22% interrupts.CPU47.CAL:Function_call_interrupts
2667 ? 20% -89.5% 280.50 ? 25% interrupts.CPU47.RES:Rescheduling_interrupts
6219 ? 23% -79.5% 1275 ? 24% interrupts.CPU48.CAL:Function_call_interrupts
2824 ? 25% -86.7% 374.33 ? 40% interrupts.CPU48.RES:Rescheduling_interrupts
6248 ? 19% -80.3% 1230 ? 20% interrupts.CPU49.CAL:Function_call_interrupts
2927 ? 23% -86.4% 396.83 ? 32% interrupts.CPU49.RES:Rescheduling_interrupts
5867 ? 26% -78.3% 1271 ? 33% interrupts.CPU5.CAL:Function_call_interrupts
2828 ? 26% -87.9% 343.67 ? 41% interrupts.CPU5.RES:Rescheduling_interrupts
6211 ? 30% -78.5% 1338 ? 17% interrupts.CPU50.CAL:Function_call_interrupts
2851 ? 31% -85.6% 409.33 ? 28% interrupts.CPU50.RES:Rescheduling_interrupts
6986 ? 23% -81.4% 1300 ? 29% interrupts.CPU51.CAL:Function_call_interrupts
3123 ? 26% -86.9% 409.83 ? 41% interrupts.CPU51.RES:Rescheduling_interrupts
6163 ? 22% -79.7% 1248 ? 31% interrupts.CPU52.CAL:Function_call_interrupts
2884 ? 26% -86.6% 387.17 ? 47% interrupts.CPU52.RES:Rescheduling_interrupts
5943 ? 22% -76.2% 1414 ? 49% interrupts.CPU53.CAL:Function_call_interrupts
2874 ? 24% -84.7% 441.17 ? 56% interrupts.CPU53.RES:Rescheduling_interrupts
5450 ? 19% -80.0% 1090 ? 22% interrupts.CPU54.CAL:Function_call_interrupts
2654 ? 22% -86.9% 347.50 ? 37% interrupts.CPU54.RES:Rescheduling_interrupts
6716 ? 37% -83.6% 1103 ? 22% interrupts.CPU55.CAL:Function_call_interrupts
3007 ? 35% -87.9% 362.83 ? 34% interrupts.CPU55.RES:Rescheduling_interrupts
6335 ? 21% -82.9% 1083 ? 23% interrupts.CPU56.CAL:Function_call_interrupts
2844 ? 21% -88.0% 342.17 ? 38% interrupts.CPU56.RES:Rescheduling_interrupts
7027 ? 33% -82.7% 1212 ? 27% interrupts.CPU57.CAL:Function_call_interrupts
2996 ? 23% -87.7% 368.33 ? 45% interrupts.CPU57.RES:Rescheduling_interrupts
5575 ? 20% -77.4% 1258 ? 28% interrupts.CPU58.CAL:Function_call_interrupts
2690 ? 20% -85.3% 395.00 ? 42% interrupts.CPU58.RES:Rescheduling_interrupts
6542 ? 23% -81.7% 1197 ? 19% interrupts.CPU59.CAL:Function_call_interrupts
2981 ? 25% -87.4% 375.00 ? 32% interrupts.CPU59.RES:Rescheduling_interrupts
6469 ? 37% -77.2% 1472 ? 42% interrupts.CPU6.CAL:Function_call_interrupts
2846 ? 28% -84.9% 430.17 ? 51% interrupts.CPU6.RES:Rescheduling_interrupts
6204 ? 18% -81.2% 1165 ? 17% interrupts.CPU60.CAL:Function_call_interrupts
2958 ? 16% -87.5% 368.83 ? 30% interrupts.CPU60.RES:Rescheduling_interrupts
6390 ? 24% -78.6% 1366 ? 32% interrupts.CPU61.CAL:Function_call_interrupts
2901 ? 23% -84.5% 450.83 ? 38% interrupts.CPU61.RES:Rescheduling_interrupts
6846 ? 39% -81.1% 1296 ? 30% interrupts.CPU62.CAL:Function_call_interrupts
2982 ? 28% -85.4% 434.33 ? 39% interrupts.CPU62.RES:Rescheduling_interrupts
6169 ? 24% -79.3% 1275 ? 29% interrupts.CPU63.CAL:Function_call_interrupts
2924 ? 27% -86.4% 397.00 ? 46% interrupts.CPU63.RES:Rescheduling_interrupts
5953 ? 24% -78.1% 1301 ? 37% interrupts.CPU64.CAL:Function_call_interrupts
2828 ? 23% -85.9% 399.67 ? 50% interrupts.CPU64.RES:Rescheduling_interrupts
6789 ? 35% -81.2% 1278 ? 22% interrupts.CPU65.CAL:Function_call_interrupts
2966 ? 34% -83.3% 494.33 ? 38% interrupts.CPU65.RES:Rescheduling_interrupts
6238 ? 13% -80.6% 1207 ? 21% interrupts.CPU66.CAL:Function_call_interrupts
2841 ? 20% -86.3% 388.17 ? 38% interrupts.CPU66.RES:Rescheduling_interrupts
7804 ? 31% -83.8% 1264 ? 20% interrupts.CPU67.CAL:Function_call_interrupts
3251 ? 30% -85.6% 469.50 ? 40% interrupts.CPU67.RES:Rescheduling_interrupts
6345 ? 22% -78.7% 1349 ? 38% interrupts.CPU68.CAL:Function_call_interrupts
2894 ? 23% -85.2% 429.83 ? 41% interrupts.CPU68.RES:Rescheduling_interrupts
6538 ? 33% -81.6% 1206 ? 21% interrupts.CPU69.CAL:Function_call_interrupts
2980 ? 32% -85.1% 444.00 ? 38% interrupts.CPU69.RES:Rescheduling_interrupts
5864 ? 14% -77.8% 1303 ? 45% interrupts.CPU7.CAL:Function_call_interrupts
2764 ? 21% -85.7% 396.00 ? 68% interrupts.CPU7.RES:Rescheduling_interrupts
7424 ? 41% -84.2% 1175 ? 22% interrupts.CPU70.CAL:Function_call_interrupts
3206 ? 36% -87.3% 405.67 ? 36% interrupts.CPU70.RES:Rescheduling_interrupts
5816 ? 15% -81.5% 1076 ? 18% interrupts.CPU71.CAL:Function_call_interrupts
2805 ? 23% -84.6% 431.50 ? 47% interrupts.CPU71.RES:Rescheduling_interrupts
5587 ? 18% -80.0% 1120 ? 19% interrupts.CPU72.CAL:Function_call_interrupts
2555 ? 24% -87.8% 312.50 ? 29% interrupts.CPU72.RES:Rescheduling_interrupts
6126 ? 27% -81.0% 1165 ? 18% interrupts.CPU73.CAL:Function_call_interrupts
2781 ? 28% -87.0% 361.00 ? 44% interrupts.CPU73.RES:Rescheduling_interrupts
5540 ? 19% -79.2% 1152 ? 26% interrupts.CPU74.CAL:Function_call_interrupts
2585 ? 23% -88.1% 307.33 ? 29% interrupts.CPU74.RES:Rescheduling_interrupts
5759 ? 13% -80.2% 1139 ? 21% interrupts.CPU75.CAL:Function_call_interrupts
2721 ? 21% -89.6% 284.17 ? 29% interrupts.CPU75.RES:Rescheduling_interrupts
5926 ? 30% -83.3% 991.50 ? 22% interrupts.CPU76.CAL:Function_call_interrupts
2658 ? 27% -90.7% 246.67 ? 30% interrupts.CPU76.RES:Rescheduling_interrupts
5383 ? 14% -80.8% 1033 ? 24% interrupts.CPU77.CAL:Function_call_interrupts
2531 ? 26% -89.7% 260.50 ? 29% interrupts.CPU77.RES:Rescheduling_interrupts
5828 ? 30% -81.3% 1091 ? 15% interrupts.CPU78.CAL:Function_call_interrupts
2628 ? 27% -89.0% 289.50 ? 17% interrupts.CPU78.RES:Rescheduling_interrupts
5660 ? 18% -81.8% 1030 ? 23% interrupts.CPU79.CAL:Function_call_interrupts
2688 ? 25% -90.1% 265.17 ? 31% interrupts.CPU79.RES:Rescheduling_interrupts
7237 ? 31% -81.7% 1324 ? 41% interrupts.CPU8.CAL:Function_call_interrupts
3074 ? 23% -87.1% 395.83 ? 51% interrupts.CPU8.RES:Rescheduling_interrupts
5761 ? 26% -81.0% 1093 ? 22% interrupts.CPU80.CAL:Function_call_interrupts
2635 ? 27% -88.8% 294.33 ? 30% interrupts.CPU80.RES:Rescheduling_interrupts
5602 ? 13% -79.3% 1160 ? 26% interrupts.CPU81.CAL:Function_call_interrupts
2646 ? 25% -88.4% 306.83 ? 28% interrupts.CPU81.RES:Rescheduling_interrupts
6164 ? 12% -82.0% 1107 ? 18% interrupts.CPU82.CAL:Function_call_interrupts
2695 ? 24% -88.6% 307.33 ? 17% interrupts.CPU82.RES:Rescheduling_interrupts
5549 ? 12% -80.5% 1082 ? 18% interrupts.CPU83.CAL:Function_call_interrupts
2522 ? 17% -89.2% 272.50 ? 23% interrupts.CPU83.RES:Rescheduling_interrupts
5684 ? 17% -76.9% 1314 ? 33% interrupts.CPU84.CAL:Function_call_interrupts
2585 ? 24% -87.5% 322.50 ? 34% interrupts.CPU84.RES:Rescheduling_interrupts
6007 ? 17% -81.3% 1122 ? 22% interrupts.CPU85.CAL:Function_call_interrupts
2721 ? 23% -89.3% 290.83 ? 24% interrupts.CPU85.RES:Rescheduling_interrupts
5862 ? 16% -79.5% 1203 ? 32% interrupts.CPU86.CAL:Function_call_interrupts
2611 ? 19% -88.2% 307.00 ? 39% interrupts.CPU86.RES:Rescheduling_interrupts
5473 ? 24% -79.3% 1134 ? 29% interrupts.CPU87.CAL:Function_call_interrupts
2574 ? 29% -88.2% 304.00 ? 32% interrupts.CPU87.RES:Rescheduling_interrupts
5942 ? 24% -82.7% 1026 ? 18% interrupts.CPU88.CAL:Function_call_interrupts
2700 ? 23% -90.4% 259.83 ? 23% interrupts.CPU88.RES:Rescheduling_interrupts
5635 ? 25% -81.7% 1028 ? 21% interrupts.CPU89.CAL:Function_call_interrupts
2525 ? 28% -89.7% 259.17 ? 24% interrupts.CPU89.RES:Rescheduling_interrupts
6870 ? 43% -81.6% 1262 ? 30% interrupts.CPU9.CAL:Function_call_interrupts
2999 ? 33% -87.1% 387.50 ? 37% interrupts.CPU9.RES:Rescheduling_interrupts
6213 ? 15% -83.5% 1026 ? 20% interrupts.CPU90.CAL:Function_call_interrupts
2722 ? 22% -90.5% 257.33 ? 21% interrupts.CPU90.RES:Rescheduling_interrupts
5178 ? 12% -77.5% 1164 ? 29% interrupts.CPU91.CAL:Function_call_interrupts
2463 ? 22% -88.3% 288.33 ? 35% interrupts.CPU91.RES:Rescheduling_interrupts
6085 ? 17% -81.4% 1131 ? 25% interrupts.CPU92.CAL:Function_call_interrupts
2795 ? 27% -89.9% 281.00 ? 30% interrupts.CPU92.RES:Rescheduling_interrupts
6144 ? 35% -81.8% 1117 ? 28% interrupts.CPU93.CAL:Function_call_interrupts
2684 ? 32% -85.9% 378.83 ? 81% interrupts.CPU93.RES:Rescheduling_interrupts
5420 ? 23% -80.0% 1086 ? 35% interrupts.CPU94.CAL:Function_call_interrupts
2550 ? 31% -89.5% 266.67 ? 29% interrupts.CPU94.RES:Rescheduling_interrupts
5215 ? 10% -78.0% 1145 ? 24% interrupts.CPU95.CAL:Function_call_interrupts
2429 ? 21% -84.7% 372.67 ? 17% interrupts.CPU95.RES:Rescheduling_interrupts
184.83 ? 4% +23.0% 227.33 ? 5% interrupts.IWI:IRQ_work_interrupts
266923 ? 23% -87.4% 33507 ? 30% interrupts.RES:Rescheduling_interrupts
68.52 -5.0 63.55 perf-profile.calltrace.cycles-pp.__dev_queue_xmit.packet_snd.sock_sendmsg.__sys_sendto.__x64_sys_sendto
69.00 -4.9 64.14 perf-profile.calltrace.cycles-pp.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe
69.00 -4.9 64.15 perf-profile.calltrace.cycles-pp.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe
68.96 -4.8 64.11 perf-profile.calltrace.cycles-pp.sock_sendmsg.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe
68.93 -4.8 64.09 perf-profile.calltrace.cycles-pp.packet_snd.sock_sendmsg.__sys_sendto.__x64_sys_sendto.do_syscall_64
4.83 ? 3% -2.9 1.89 ? 14% perf-profile.calltrace.cycles-pp.skb_recv_datagram.packet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64
4.76 ? 3% -2.9 1.86 ? 14% perf-profile.calltrace.cycles-pp.__skb_recv_datagram.skb_recv_datagram.packet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom
34.25 -2.7 31.52 perf-profile.calltrace.cycles-pp.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.packet_snd.sock_sendmsg
34.24 -2.7 31.51 perf-profile.calltrace.cycles-pp.__softirqentry_text_start.asm_call_sysvec_on_stack.do_softirq_own_stack.do_softirq.__local_bh_enable_ip
34.24 -2.7 31.51 perf-profile.calltrace.cycles-pp.asm_call_sysvec_on_stack.do_softirq_own_stack.do_softirq.__local_bh_enable_ip.__dev_queue_xmit
34.24 -2.7 31.52 perf-profile.calltrace.cycles-pp.do_softirq_own_stack.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.packet_snd
34.25 -2.7 31.53 perf-profile.calltrace.cycles-pp.__local_bh_enable_ip.__dev_queue_xmit.packet_snd.sock_sendmsg.__sys_sendto
34.21 -2.7 31.49 perf-profile.calltrace.cycles-pp.net_rx_action.__softirqentry_text_start.asm_call_sysvec_on_stack.do_softirq_own_stack.do_softirq
34.19 -2.7 31.47 perf-profile.calltrace.cycles-pp.process_backlog.net_rx_action.__softirqentry_text_start.asm_call_sysvec_on_stack.do_softirq_own_stack
34.17 -2.7 31.45 perf-profile.calltrace.cycles-pp.__netif_receive_skb_one_core.process_backlog.net_rx_action.__softirqentry_text_start.asm_call_sysvec_on_stack
34.07 -2.7 31.38 perf-profile.calltrace.cycles-pp.__netif_receive_skb_core.__netif_receive_skb_one_core.process_backlog.net_rx_action.__softirqentry_text_start
4.64 ? 2% -2.6 2.01 ? 11% perf-profile.calltrace.cycles-pp.consume_skb.skb_free_datagram.packet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom
4.32 ? 3% -2.6 1.72 ? 14% perf-profile.calltrace.cycles-pp.__skb_try_recv_datagram.__skb_recv_datagram.skb_recv_datagram.packet_recvmsg.__sys_recvfrom
34.24 -2.2 32.01 perf-profile.calltrace.cycles-pp.dev_hard_start_xmit.__dev_queue_xmit.packet_snd.sock_sendmsg.__sys_sendto
34.13 -2.2 31.91 perf-profile.calltrace.cycles-pp.dev_queue_xmit_nit.dev_hard_start_xmit.__dev_queue_xmit.packet_snd.sock_sendmsg
31.32 -2.2 29.10 perf-profile.calltrace.cycles-pp.packet_rcv.__netif_receive_skb_core.__netif_receive_skb_one_core.process_backlog.net_rx_action
31.28 -2.2 29.07 perf-profile.calltrace.cycles-pp.packet_rcv.dev_queue_xmit_nit.dev_hard_start_xmit.__dev_queue_xmit.packet_snd
3.54 ? 3% -2.2 1.38 ? 14% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__skb_try_recv_datagram.__skb_recv_datagram.skb_recv_datagram.packet_recvmsg
4.12 ? 2% -2.1 2.01 ? 10% perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.packet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64
4.07 ? 2% -2.1 1.99 ? 10% perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.packet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom
5.46 ? 8% -1.7 3.77 ? 2% perf-profile.calltrace.cycles-pp._raw_spin_lock.packet_rcv.dev_queue_xmit_nit.dev_hard_start_xmit.__dev_queue_xmit
5.26 ? 7% -1.6 3.68 ? 2% perf-profile.calltrace.cycles-pp._raw_spin_lock.packet_rcv.__netif_receive_skb_core.__netif_receive_skb_one_core.process_backlog
3.24 ? 3% -1.5 1.71 ? 8% perf-profile.calltrace.cycles-pp.__skb_clone.packet_rcv.__netif_receive_skb_core.__netif_receive_skb_one_core.process_backlog
3.22 ? 2% -1.5 1.71 ? 7% perf-profile.calltrace.cycles-pp.__skb_clone.packet_rcv.dev_queue_xmit_nit.dev_hard_start_xmit.__dev_queue_xmit
2.28 ? 3% -1.4 0.84 ? 15% perf-profile.calltrace.cycles-pp.skb_release_all.consume_skb.skb_free_datagram.packet_recvmsg.__sys_recvfrom
2.25 ? 3% -1.4 0.82 ? 15% perf-profile.calltrace.cycles-pp.skb_release_head_state.skb_release_all.consume_skb.skb_free_datagram.packet_recvmsg
2.19 ? 3% -1.4 0.80 ? 15% perf-profile.calltrace.cycles-pp.sock_rfree.skb_release_head_state.skb_release_all.consume_skb.skb_free_datagram
2.47 ? 2% -1.4 1.11 ? 12% perf-profile.calltrace.cycles-pp.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.packet_recvmsg.__sys_recvfrom
2.39 -1.3 1.08 ? 12% perf-profile.calltrace.cycles-pp.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.packet_recvmsg
1.98 ? 4% -1.2 0.76 ? 15% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__skb_try_recv_datagram.__skb_recv_datagram.skb_recv_datagram
4.49 ? 8% -1.2 3.27 ? 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.packet_rcv.dev_queue_xmit_nit.dev_hard_start_xmit
2.25 ? 2% -1.2 1.09 ? 9% perf-profile.calltrace.cycles-pp.skb_release_data.consume_skb.skb_free_datagram.packet_recvmsg.__sys_recvfrom
4.31 ? 8% -1.1 3.18 ? 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.packet_rcv.__netif_receive_skb_core.__netif_receive_skb_one_core
1.95 ? 4% -1.1 0.83 ? 10% perf-profile.calltrace.cycles-pp.sock_def_readable.packet_rcv.dev_queue_xmit_nit.dev_hard_start_xmit.__dev_queue_xmit
1.90 ? 5% -1.1 0.84 ? 9% perf-profile.calltrace.cycles-pp.sock_def_readable.packet_rcv.__netif_receive_skb_core.__netif_receive_skb_one_core.process_backlog
1.64 ? 2% -0.9 0.74 ? 11% perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.skb_free_datagram.packet_recvmsg.__sys_recvfrom
1.65 ? 2% -0.8 0.82 ? 10% perf-profile.calltrace.cycles-pp.move_addr_to_user.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.27 ? 2% -0.5 0.73 ? 8% perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.packet_recvmsg.__sys_recvfrom
0.87 ? 2% -0.4 0.45 ? 45% perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.packet_recvmsg
1.19 ? 3% -0.2 1.00 ? 3% perf-profile.calltrace.cycles-pp.kfree_skb.packet_rcv.__netif_receive_skb_core.__netif_receive_skb_one_core.process_backlog
1.12 ? 3% -0.2 0.95 ? 3% perf-profile.calltrace.cycles-pp.kfree_skb.packet_rcv.dev_queue_xmit_nit.dev_hard_start_xmit.__dev_queue_xmit
99.08 +0.4 99.53 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
98.74 +0.6 99.36 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
5.58 ? 12% +4.4 9.98 ? 2% perf-profile.calltrace.cycles-pp.skb_clone.packet_rcv.__netif_receive_skb_core.__netif_receive_skb_one_core.process_backlog
5.47 ? 12% +4.5 9.94 ? 2% perf-profile.calltrace.cycles-pp.kmem_cache_alloc.skb_clone.packet_rcv.__netif_receive_skb_core.__netif_receive_skb_one_core
5.46 ? 12% +4.5 9.92 ? 2% perf-profile.calltrace.cycles-pp.skb_clone.packet_rcv.dev_queue_xmit_nit.dev_hard_start_xmit.__dev_queue_xmit
5.37 ? 12% +4.5 9.88 ? 2% perf-profile.calltrace.cycles-pp.kmem_cache_alloc.skb_clone.packet_rcv.dev_queue_xmit_nit.dev_hard_start_xmit
5.18 ? 13% +4.5 9.71 ? 2% perf-profile.calltrace.cycles-pp.__slab_alloc.kmem_cache_alloc.skb_clone.packet_rcv.__netif_receive_skb_core
5.08 ? 13% +4.6 9.66 ? 2% perf-profile.calltrace.cycles-pp.__slab_alloc.kmem_cache_alloc.skb_clone.packet_rcv.dev_queue_xmit_nit
29.57 ? 2% +5.5 35.12 perf-profile.calltrace.cycles-pp.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
29.48 ? 2% +5.6 35.08 perf-profile.calltrace.cycles-pp.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
26.62 ? 2% +7.1 33.67 perf-profile.calltrace.cycles-pp.packet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
8.83 ? 15% +8.8 17.66 ? 2% perf-profile.calltrace.cycles-pp._raw_spin_lock.get_partial_node.___slab_alloc.__slab_alloc.kmem_cache_alloc
8.58 ? 15% +8.9 17.48 ? 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.get_partial_node.___slab_alloc.__slab_alloc
9.46 ? 14% +9.0 18.44 ? 2% perf-profile.calltrace.cycles-pp.get_partial_node.___slab_alloc.__slab_alloc.kmem_cache_alloc.skb_clone
10.19 ? 13% +9.1 19.28 ? 2% perf-profile.calltrace.cycles-pp.___slab_alloc.__slab_alloc.kmem_cache_alloc.skb_clone.packet_rcv
14.31 ? 6% +14.0 28.27 ? 3% perf-profile.calltrace.cycles-pp.skb_free_datagram.packet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64
9.50 ? 11% +16.7 26.21 ? 4% perf-profile.calltrace.cycles-pp.kmem_cache_free.skb_free_datagram.packet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom
6.09 ? 18% +17.8 23.88 ? 5% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.unfreeze_partials.put_cpu_partial.kmem_cache_free
6.37 ? 18% +17.9 24.23 ? 5% perf-profile.calltrace.cycles-pp._raw_spin_lock.unfreeze_partials.put_cpu_partial.kmem_cache_free.skb_free_datagram
6.92 ? 17% +18.0 24.94 ? 5% perf-profile.calltrace.cycles-pp.unfreeze_partials.put_cpu_partial.kmem_cache_free.skb_free_datagram.packet_recvmsg
7.03 ? 16% +18.0 25.06 ? 5% perf-profile.calltrace.cycles-pp.put_cpu_partial.kmem_cache_free.skb_free_datagram.packet_recvmsg.__sys_recvfrom
68.52 -5.0 63.55 perf-profile.children.cycles-pp.__dev_queue_xmit
69.00 -4.9 64.14 perf-profile.children.cycles-pp.__sys_sendto
69.00 -4.8 64.15 perf-profile.children.cycles-pp.__x64_sys_sendto
68.96 -4.8 64.11 perf-profile.children.cycles-pp.sock_sendmsg
68.93 -4.8 64.09 perf-profile.children.cycles-pp.packet_snd
62.72 -4.5 58.25 perf-profile.children.cycles-pp.packet_rcv
5.98 ? 3% -3.4 2.63 ? 11% perf-profile.children.cycles-pp.consume_skb
6.54 ? 3% -3.0 3.49 ? 7% perf-profile.children.cycles-pp.__skb_clone
4.84 ? 3% -2.9 1.90 ? 14% perf-profile.children.cycles-pp.skb_recv_datagram
4.78 ? 3% -2.9 1.86 ? 14% perf-profile.children.cycles-pp.__skb_recv_datagram
34.54 -2.8 31.72 perf-profile.children.cycles-pp.asm_call_sysvec_on_stack
34.25 -2.7 31.52 perf-profile.children.cycles-pp.do_softirq_own_stack
34.25 -2.7 31.52 perf-profile.children.cycles-pp.do_softirq
34.25 -2.7 31.52 perf-profile.children.cycles-pp.__softirqentry_text_start
34.25 -2.7 31.53 perf-profile.children.cycles-pp.__local_bh_enable_ip
34.21 -2.7 31.49 perf-profile.children.cycles-pp.net_rx_action
34.19 -2.7 31.47 perf-profile.children.cycles-pp.process_backlog
34.17 -2.7 31.46 perf-profile.children.cycles-pp.__netif_receive_skb_one_core
34.11 -2.7 31.41 perf-profile.children.cycles-pp.__netif_receive_skb_core
4.33 ? 3% -2.6 1.72 ? 14% perf-profile.children.cycles-pp.__skb_try_recv_datagram
3.81 ? 4% -2.4 1.40 ? 14% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
34.24 -2.2 32.01 perf-profile.children.cycles-pp.dev_hard_start_xmit
34.16 -2.2 31.93 perf-profile.children.cycles-pp.dev_queue_xmit_nit
3.86 ? 4% -2.2 1.67 ? 10% perf-profile.children.cycles-pp.sock_def_readable
4.12 ? 2% -2.1 2.02 ? 11% perf-profile.children.cycles-pp.skb_copy_datagram_iter
4.07 ? 2% -2.1 1.99 ? 11% perf-profile.children.cycles-pp.__skb_datagram_iter
2.29 ? 3% -1.4 0.84 ? 15% perf-profile.children.cycles-pp.skb_release_all
2.26 ? 3% -1.4 0.83 ? 15% perf-profile.children.cycles-pp.skb_release_head_state
2.20 ? 3% -1.4 0.80 ? 15% perf-profile.children.cycles-pp.sock_rfree
2.53 ? 2% -1.4 1.16 ? 12% perf-profile.children.cycles-pp.__check_object_size
2.49 ? 2% -1.4 1.12 ? 12% perf-profile.children.cycles-pp.simple_copy_to_iter
2.27 ? 2% -1.2 1.10 ? 9% perf-profile.children.cycles-pp.skb_release_data
1.74 ? 2% -0.9 0.82 ? 11% perf-profile.children.cycles-pp.__slab_free
1.68 ? 2% -0.8 0.84 ? 11% perf-profile.children.cycles-pp.move_addr_to_user
1.28 ? 2% -0.5 0.73 ? 8% perf-profile.children.cycles-pp._copy_to_iter
1.01 ? 2% -0.4 0.61 ? 6% perf-profile.children.cycles-pp.copy_user_generic_unrolled
2.32 ? 3% -0.4 1.96 ? 3% perf-profile.children.cycles-pp.kfree_skb
0.70 ? 2% -0.3 0.35 ? 10% perf-profile.children.cycles-pp._copy_to_user
0.88 ? 2% -0.3 0.54 ? 7% perf-profile.children.cycles-pp.copyout
0.55 -0.3 0.27 ? 12% perf-profile.children.cycles-pp.sock_recvmsg
0.59 ? 3% -0.3 0.31 ? 10% perf-profile.children.cycles-pp.__might_fault
0.53 ? 3% -0.3 0.27 ? 9% perf-profile.children.cycles-pp.syscall_return_via_sysret
0.49 -0.3 0.24 ? 12% perf-profile.children.cycles-pp.security_socket_recvmsg
0.41 -0.2 0.20 ? 12% perf-profile.children.cycles-pp.aa_sk_perm
0.42 ? 4% -0.2 0.21 ? 10% perf-profile.children.cycles-pp.___might_sleep
0.35 ? 3% -0.2 0.17 ? 10% perf-profile.children.cycles-pp.__check_heap_object
0.32 ? 3% -0.2 0.16 ? 10% perf-profile.children.cycles-pp.sockfd_lookup_light
0.52 -0.2 0.36 ? 5% perf-profile.children.cycles-pp.__copy_skb_header
0.26 ? 9% -0.1 0.12 ? 18% perf-profile.children.cycles-pp.__skb_try_recv_from_queue
0.28 ? 2% -0.1 0.14 ? 11% perf-profile.children.cycles-pp.__get_user_4
0.28 ? 3% -0.1 0.14 ? 11% perf-profile.children.cycles-pp.__put_user_nocheck_4
0.27 ? 2% -0.1 0.14 ? 12% perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
0.44 ? 4% -0.1 0.32 ? 4% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.49 ? 3% -0.1 0.37 ? 4% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.23 ? 3% -0.1 0.11 ? 11% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.24 ? 3% -0.1 0.12 ? 11% perf-profile.children.cycles-pp.__fget_light
0.23 ? 4% -0.1 0.12 ? 12% perf-profile.children.cycles-pp.__entry_text_start
0.42 ? 4% -0.1 0.30 ? 5% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.41 ? 4% -0.1 0.29 ? 5% perf-profile.children.cycles-pp.hrtimer_interrupt
0.22 ? 3% -0.1 0.11 ? 10% perf-profile.children.cycles-pp.__might_sleep
0.20 ? 5% -0.1 0.09 ? 10% perf-profile.children.cycles-pp.__virt_addr_valid
0.19 ? 3% -0.1 0.09 ? 11% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.10 -0.1 0.03 ?100% perf-profile.children.cycles-pp.import_single_range
0.21 ? 5% -0.1 0.14 ? 8% perf-profile.children.cycles-pp.tick_sched_timer
0.25 ? 5% -0.1 0.19 ? 4% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.10 ? 4% -0.1 0.04 ? 45% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.18 ? 6% -0.1 0.12 ? 6% perf-profile.children.cycles-pp.update_process_times
0.18 ? 6% -0.1 0.13 ? 5% perf-profile.children.cycles-pp.tick_sched_handle
0.14 ? 6% -0.0 0.09 ? 5% perf-profile.children.cycles-pp.scheduler_tick
0.15 ? 4% -0.0 0.11 ? 6% perf-profile.children.cycles-pp.__x86_indirect_thunk_rax
0.09 ? 10% -0.0 0.06 ? 13% perf-profile.children.cycles-pp.ktime_get
0.10 ? 10% -0.0 0.07 ? 5% perf-profile.children.cycles-pp.task_tick_fair
0.08 ? 6% -0.0 0.05 perf-profile.children.cycles-pp.ip_rcv_core
0.08 ? 6% -0.0 0.05 ? 8% perf-profile.children.cycles-pp.ip_rcv
0.08 ? 4% -0.0 0.07 ? 7% perf-profile.children.cycles-pp.__kmalloc_reserve
0.08 -0.0 0.07 ? 7% perf-profile.children.cycles-pp.__kmalloc_node_track_caller
0.11 ? 4% -0.0 0.10 perf-profile.children.cycles-pp.skb_push
0.08 ? 6% -0.0 0.06 ? 7% perf-profile.children.cycles-pp.loopback_xmit
0.21 ? 5% +0.1 0.30 ? 5% perf-profile.children.cycles-pp.__list_del_entry_valid
0.28 ? 4% +0.1 0.42 ? 2% perf-profile.children.cycles-pp.sock_alloc_send_pskb
0.22 ? 5% +0.1 0.37 ? 2% perf-profile.children.cycles-pp.alloc_skb_with_frags
0.21 ? 5% +0.1 0.36 ? 2% perf-profile.children.cycles-pp.__alloc_skb
0.12 ? 8% +0.2 0.29 ? 2% perf-profile.children.cycles-pp.kmem_cache_alloc_node
99.12 +0.4 99.56 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
98.77 +0.6 99.39 perf-profile.children.cycles-pp.do_syscall_64
29.59 ? 2% +5.5 35.13 perf-profile.children.cycles-pp.__x64_sys_recvfrom
29.50 ? 2% +5.6 35.09 perf-profile.children.cycles-pp.__sys_recvfrom
26.64 ? 2% +7.0 33.68 perf-profile.children.cycles-pp.packet_recvmsg
11.20 ? 12% +9.2 20.39 ? 2% perf-profile.children.cycles-pp.skb_clone
11.03 ? 12% +9.3 20.31 ? 2% perf-profile.children.cycles-pp.kmem_cache_alloc
9.72 ? 14% +9.4 19.16 ? 2% perf-profile.children.cycles-pp.get_partial_node
10.51 ? 13% +9.6 20.06 ? 2% perf-profile.children.cycles-pp.___slab_alloc
10.59 ? 13% +9.6 20.18 ? 2% perf-profile.children.cycles-pp.__slab_alloc
14.33 ? 6% +14.0 28.28 ? 3% perf-profile.children.cycles-pp.skb_free_datagram
9.60 ? 11% +16.7 26.28 ? 4% perf-profile.children.cycles-pp.kmem_cache_free
6.94 ? 17% +18.0 24.95 ? 5% perf-profile.children.cycles-pp.unfreeze_partials
7.05 ? 16% +18.0 25.10 ? 5% perf-profile.children.cycles-pp.put_cpu_partial
25.93 ? 6% +23.3 49.26 ? 3% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
26.26 ? 6% +23.8 50.07 ? 3% perf-profile.children.cycles-pp._raw_spin_lock
26.58 -3.6 22.95 ? 2% perf-profile.self.cycles-pp.packet_rcv
6.00 ? 3% -2.9 3.12 ? 8% perf-profile.self.cycles-pp.__skb_clone
3.49 ? 3% -1.9 1.63 ? 9% perf-profile.self.cycles-pp.sock_def_readable
3.34 ? 2% -1.9 1.49 ? 12% perf-profile.self.cycles-pp.packet_recvmsg
2.18 ? 3% -1.4 0.79 ? 15% perf-profile.self.cycles-pp.sock_rfree
2.26 ? 2% -1.2 1.10 ? 9% perf-profile.self.cycles-pp.skb_release_data
1.89 ? 2% -1.0 0.84 ? 13% perf-profile.self.cycles-pp.__check_object_size
1.60 ? 2% -1.0 0.63 ? 14% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
2.55 ? 3% -1.0 1.58 ? 3% perf-profile.self.cycles-pp._raw_spin_lock
1.72 ? 3% -0.9 0.81 ? 10% perf-profile.self.cycles-pp.__slab_free
1.36 ? 4% -0.7 0.63 ? 9% perf-profile.self.cycles-pp.consume_skb
2.71 ? 3% -0.5 2.25 ? 4% perf-profile.self.cycles-pp.__netif_receive_skb_core
0.88 ? 3% -0.5 0.43 ? 11% perf-profile.self.cycles-pp.kmem_cache_free
0.98 -0.4 0.59 ? 6% perf-profile.self.cycles-pp.copy_user_generic_unrolled
2.27 ? 3% -0.3 1.92 ? 3% perf-profile.self.cycles-pp.kfree_skb
2.56 ? 3% -0.3 2.26 ? 4% perf-profile.self.cycles-pp.dev_queue_xmit_nit
0.53 ? 3% -0.3 0.27 ? 9% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.39 ? 4% -0.3 0.14 ? 13% perf-profile.self.cycles-pp.__skb_recv_datagram
0.42 ? 4% -0.2 0.21 ? 10% perf-profile.self.cycles-pp.___might_sleep
0.29 ? 2% -0.2 0.12 ? 17% perf-profile.self.cycles-pp.__skb_try_recv_datagram
0.34 ? 3% -0.2 0.17 ? 9% perf-profile.self.cycles-pp.__check_heap_object
0.30 ? 3% -0.2 0.14 ? 11% perf-profile.self.cycles-pp.__skb_datagram_iter
0.51 ? 2% -0.2 0.35 ? 5% perf-profile.self.cycles-pp.__copy_skb_header
0.26 ? 9% -0.1 0.12 ? 15% perf-profile.self.cycles-pp.__skb_try_recv_from_queue
0.59 ? 3% -0.1 0.45 ? 5% perf-profile.self.cycles-pp.kmem_cache_alloc
0.27 ? 2% -0.1 0.14 ? 11% perf-profile.self.cycles-pp.__get_user_4
0.27 ? 3% -0.1 0.14 ? 9% perf-profile.self.cycles-pp.__put_user_nocheck_4
0.25 ? 2% -0.1 0.13 ? 12% perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
0.23 ? 2% -0.1 0.11 ? 12% perf-profile.self.cycles-pp.__sys_recvfrom
0.24 ? 3% -0.1 0.12 ? 11% perf-profile.self.cycles-pp.__fget_light
0.24 ? 2% -0.1 0.13 ? 13% perf-profile.self.cycles-pp.aa_sk_perm
0.23 ? 4% -0.1 0.12 ? 12% perf-profile.self.cycles-pp.__entry_text_start
0.15 ? 4% -0.1 0.04 ? 45% perf-profile.self.cycles-pp.skb_free_datagram
0.20 ? 4% -0.1 0.10 ? 9% perf-profile.self.cycles-pp._copy_to_iter
0.21 ? 3% -0.1 0.10 ? 9% perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
0.20 ? 3% -0.1 0.10 ? 11% perf-profile.self.cycles-pp.__might_sleep
0.19 ? 6% -0.1 0.09 ? 11% perf-profile.self.cycles-pp.__virt_addr_valid
0.17 ? 4% -0.1 0.07 ? 11% perf-profile.self.cycles-pp.skb_clone
0.18 ? 2% -0.1 0.09 ? 13% perf-profile.self.cycles-pp.move_addr_to_user
0.17 ? 4% -0.1 0.09 ? 13% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.09 ? 9% -0.1 0.03 ? 70% perf-profile.self.cycles-pp.__might_fault
0.09 ? 8% -0.1 0.03 ?100% perf-profile.self.cycles-pp.ktime_get
0.09 ? 6% -0.0 0.06 perf-profile.self.cycles-pp.__x86_indirect_thunk_rax
0.44 ? 2% +0.1 0.52 ? 3% perf-profile.self.cycles-pp.get_partial_node
0.21 ? 5% +0.1 0.30 ? 5% perf-profile.self.cycles-pp.__list_del_entry_valid
0.79 +0.1 0.90 ? 2% perf-profile.self.cycles-pp.___slab_alloc
0.50 ? 6% +0.2 0.67 perf-profile.self.cycles-pp.unfreeze_partials
25.86 ? 6% +23.4 49.22 ? 3% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath



stress-ng.time.user_time

60 +----------------------------------------------------------------------+
|. .++.+. .+ +. .+.++.+ +. .++.+. .++. .+ + + + |
55 |-+. +. .+ + + ++.+.+ + + +.+ +.+.|
50 |-+ + + |
| |
45 |-+ |
| |
40 |-+ |
| O O |
35 |-+ |
30 |-+ O |
| O O O O O O O |
25 |-+ O O O O O O O O O OO O O |
| OO O O O |
20 +----------------------------------------------------------------------+


stress-ng.time.system_time

3950 +--------------------------------------------------------------------+
| O O O O O |
3900 |-OO O O OO O O O O O O OO OO O OO |
| O O O |
| O |
3850 |-+ |
| |
3800 |-+ |
| |
3750 |-++ +.+ + .+. .+. + + ++ .|
| + + + : .+.+ :+ .++.+ ++ + + : + : + + .++ |
|+ + +.+ +. .+ +.+ +.+.+ +.+ +. .+ +.+ |
3700 |-+ + + |
| |
3650 +--------------------------------------------------------------------+


stress-ng.time.percent_of_cpu_this_job_got

6350 +--------------------------------------------------------------------+
| O O O O O O O O O O O |
6300 |-O O O O O O O O OO O OO |
| O O O |
6250 |-+ O |
| |
6200 |-+ |
| |
6150 |-+ |
| + + + |
6100 |-+ : +.+ + +. .+. .+. :: :: .++ ++.|
|: : + :.+.+.+ + + .+ + ++ ++. : : : : + + + |
6050 |:+ + + +. + +.+ +.+ +.+ +. + +.+ |
| + + |
6000 +--------------------------------------------------------------------+


stress-ng.rawpkt.ops

1.2e+09 +-----------------------------------------------------------------+
|. .+. .+.++.+.+ + .+.++.+.+ .+.++.+.++.+.++.+.++ +. .++.+. +.|
1.1e+09 |-++ ++ + + + + |
1e+09 |-+ |
| |
9e+08 |-+ |
| |
8e+08 |-+ |
| |
7e+08 |-+ O O O O |
6e+08 |-O O |
| O OO O O O O OO |
5e+08 |-+O O O O O O O O O O O O |
| O |
4e+08 +-----------------------------------------------------------------+


stress-ng.rawpkt.ops_per_sec

2e+07 +-----------------------------------------------------------------+
|. .+. .+.++.+.+ + .+.++.+.+ .+.+ .+.++.+.++.+.++ +. .++.+. +.|
1.8e+07 |-++ ++ + + + + + |
| |
1.6e+07 |-+ |
| |
1.4e+07 |-+ |
| |
1.2e+07 |-+ O O |
| O O |
1e+07 |-O O O O O |
| O O OO O O O O O O O O O O |
8e+06 |-+O O O O O |
| |
6e+06 +-----------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Oliver Sang


Attachments:
(No filename) (60.03 kB)
config-5.11.0-rc3-00033-g8ff60eb052ee (175.07 kB)
job-script (8.00 kB)
job.yaml (5.54 kB)
reproduce (349.00 B)

2021-03-09 23:59:02

by Linus Torvalds

Subject: Re: [mm, slub] 8ff60eb052: stress-ng.rawpkt.ops_per_sec -47.9% regression

Jann,
it looks like that change of yours had a rather big negative impact
on this load.

On Sun, Feb 28, 2021 at 11:49 PM kernel test robot
<[email protected]> wrote:
>
> FYI, we noticed a -47.9% regression of stress-ng.rawpkt.ops_per_sec due to commit:

Looking at the profile, nothing really obvious stands out, although
some of the numbers imply more polling and less waiting, i.e.:

Lots less context switching:

> 12810 ± 17% -63.6% 4658 ± 8% sched_debug.cpu.nr_switches.avg

Less time spent sending packets:

> 68.52 -5.0 63.55 perf-profile.calltrace.cycles-pp.__dev_queue_xmit.packet_snd.sock_sendmsg.__sys_sendto.__x64_sys_sendto
> 69.00 -4.9 64.14 perf-profile.calltrace.cycles-pp.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe

and quite a lot more time spent in what looks like the receive path,
which allocates the packets:

> 5.47 ± 12% +4.5 9.94 ± 2% perf-profile.calltrace.cycles-pp.kmem_cache_alloc.skb_clone.packet_rcv.__netif_receive_skb_core.__netif_receive_skb_one_core
> 5.46 ± 12% +4.5 9.92 ± 2% perf-profile.calltrace.cycles-pp.skb_clone.packet_rcv.dev_queue_xmit_nit.dev_hard_start_xmit.__dev_queue_xmit
> 5.37 ± 12% +4.5 9.88 ± 2% perf-profile.calltrace.cycles-pp.kmem_cache_alloc.skb_clone.packet_rcv.dev_queue_xmit_nit.dev_hard_start_xmit
> 5.18 ± 13% +4.5 9.71 ± 2% perf-profile.calltrace.cycles-pp.__slab_alloc.kmem_cache_alloc.skb_clone.packet_rcv.__netif_receive_skb_core
> 5.08 ± 13% +4.6 9.66 ± 2% perf-profile.calltrace.cycles-pp.__slab_alloc.kmem_cache_alloc.skb_clone.packet_rcv.dev_queue_xmit_nit
> 29.57 ± 2% +5.5 35.12 perf-profile.calltrace.cycles-pp.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 29.48 ± 2% +5.6 35.08 perf-profile.calltrace.cycles-pp.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 26.62 ± 2% +7.1 33.67 perf-profile.calltrace.cycles-pp.packet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 8.83 ± 15% +8.8 17.66 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock.get_partial_node.___slab_alloc.__slab_alloc.kmem_cache_alloc
> 8.58 ± 15% +8.9 17.48 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.get_partial_node.___slab_alloc.__slab_alloc
> 9.46 ± 14% +9.0 18.44 ± 2% perf-profile.calltrace.cycles-pp.get_partial_node.___slab_alloc.__slab_alloc.kmem_cache_alloc.skb_clone
> 10.19 ± 13% +9.1 19.28 ± 2% perf-profile.calltrace.cycles-pp.___slab_alloc.__slab_alloc.kmem_cache_alloc.skb_clone.packet_rcv
> 14.31 ± 6% +14.0 28.27 ± 3% perf-profile.calltrace.cycles-pp.skb_free_datagram.packet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64
> 9.50 ± 11% +16.7 26.21 ± 4% perf-profile.calltrace.cycles-pp.kmem_cache_free.skb_free_datagram.packet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom
> 6.09 ± 18% +17.8 23.88 ± 5% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.unfreeze_partials.put_cpu_partial.kmem_cache_free
> 6.37 ± 18% +17.9 24.23 ± 5% perf-profile.calltrace.cycles-pp._raw_spin_lock.unfreeze_partials.put_cpu_partial.kmem_cache_free.skb_free_datagram
> 6.92 ± 17% +18.0 24.94 ± 5% perf-profile.calltrace.cycles-pp.unfreeze_partials.put_cpu_partial.kmem_cache_free.skb_free_datagram.packet_recvmsg
> 7.03 ± 16% +18.0 25.06 ± 5% perf-profile.calltrace.cycles-pp.put_cpu_partial.kmem_cache_free.skb_free_datagram.packet_recvmsg.__sys_recvfrom

.. and I think the reason is here:

> 26.26 ± 6% +23.8 50.07 ± 3% perf-profile.children.cycles-pp._raw_spin_lock

Look at that +23.8 for _raw_spin_lock, and it really shows up here too:

> 25.86 ± 6% +23.4 49.22 ± 3% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath

I think what is going on is that your change caused the "contention on
the freelist" case to now loop - possibly several times, and
expensively with atomic operations - while you are holding the
'n->list_lock' spinlock in get_partial_node().

End result: contention on the freelist now becomes *huge* contention
on that list_lock instead.

Before, it would release the list lock, and generally then (maybe) try
again. Or more likely just get another page and avoid the contention.
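
For reference, the change essentially turns a "break" into a "continue"
inside that locked loop. Roughly (a trimmed, paraphrased sketch of the
get_partial_node() loop around that commit, not the verbatim mm/slub.c
code):

        spin_lock(&n->list_lock);
        list_for_each_entry_safe(page, page2, &n->partial, slab_list) {
                void *t = acquire_slab(s, n, page, object == NULL, &objects);

                if (!t)
                        /*
                         * Before 8ff60eb052 this was "break": give up on
                         * the partial list, drop n->list_lock and let the
                         * caller allocate a new slab.  After the commit it
                         * is "continue", so losing the cmpxchg race means
                         * walking on to the next partial slab while still
                         * holding n->list_lock.
                         */
                        continue;

                /* ... otherwise take objects from this slab as before ... */
        }
        spin_unlock(&n->list_lock);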

So when you wrote:

However, the current code accidentally stops looking at the partial list
completely in that case. Especially on kernels without CONFIG_NUMA set,
this means that get_partial() fails and new_slab_objects() falls back to
new_slab(), allocating new pages. This could lead to an unnecessary
increase in memory fragmentation.

it really looks like this might well have been very intentional
indeed. Or at least very beneficial for _some_ loads.

Comments?

Linus

2021-03-10 07:01:25

by Christoph Lameter

[permalink] [raw]
Subject: Re: [mm, slub] 8ff60eb052: stress-ng.rawpkt.ops_per_sec -47.9% regression

On Tue, 9 Mar 2021, Linus Torvalds wrote:

> So when you wrote:
>
> However, the current code accidentally stops looking at the partial list
> completely in that case. Especially on kernels without CONFIG_NUMA set,
> this means that get_partial() fails and new_slab_objects() falls back to
> new_slab(), allocating new pages. This could lead to an unnecessary
> increase in memory fragmentation.
>
> it really looks like this might well have been very intentional
> indeed. Or at least very beneficial for _some_ loads.
>
> Comments?

Yes, the thought was that adding an additional page when there is
contention on a page's objects will increase possible concurrency
while avoiding locks, and increase the ability to allocate / free
concurrently from a multitude of objects.
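
That is the path the commit message refers to: when get_partial() comes
back empty, the slow path simply adds a fresh page instead of fighting
over the existing partial slabs. Roughly (a trimmed paraphrase of the
new_slab_objects() path in that kernel, not the exact code):

        freelist = get_partial(s, flags, node, c);
        if (freelist)
                return freelist;                /* reuse a partial slab */

        page = new_slab(s, flags, node);        /* else allocate a fresh slab page */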

2021-03-10 18:22:15

by Linus Torvalds

Subject: Re: [mm, slub] 8ff60eb052: stress-ng.rawpkt.ops_per_sec -47.9% regression

On Tue, Mar 9, 2021 at 10:59 PM Christoph Lameter <[email protected]> wrote:
>
> >
> > it really looks like this might well have been very intentional
> > indeed. Or at least very beneficial for _some_ loads.
>
> Yes the thought was that adding an additional page when contention is
> there on the page objects will increase possible concurrency while
> avoiding locks and increase the ability to allocate / free concurrently
> from a multitude of objects.

I wonder if we might have a "try twice before failing" middle ground,
rather than breaking out on the very first cmpxchg failure (or
continuing forever).

Yes, yes, it claims a "Fixes:", but the commit it claims to fix really
does explicitly _mention_ avoiding the loop in the commit message, and
this kernel test robot report very much implies that that original
commit was right, and the "fix" is wrong.

Jann - if you had other loads that showed problems, that would be
worth documenting.

And as mentioned, maybe having a _limited_ retry, rather than a
"continue for as long as there is contention" that clearly regresses
on this (perhaps odd) load?
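
Purely to illustrate the shape of that (an untested sketch; the
"retries" counter and the limit of 2 are made-up placeholders, not a
proposed patch):

        unsigned int retries = 0;

        spin_lock(&n->list_lock);
        list_for_each_entry_safe(page, page2, &n->partial, slab_list) {
                void *t = acquire_slab(s, n, page, object == NULL, &objects);

                if (!t) {
                        /*
                         * Lost the cmpxchg race; retry a bounded number
                         * of times rather than either giving up right
                         * away or spinning over the whole list while
                         * holding n->list_lock.
                         */
                        if (++retries > 2)
                                break;
                        continue;
                }

                /* ... take objects from this slab as before ... */
        }
        spin_unlock(&n->list_lock);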

But for now, I think the thing to do is revert.

Linus