Greetings,
FYI, we noticed a -8.6% regression of redis.set_total_throughput_rps due to commit:
commit: f3412b3879b4f7c4313b186b03940d4791345534 ("net: make sure net_rx_action() calls skb_defer_free_flush()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: redis
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
with the following parameters (a sketch of the equivalent manual settings follows the list):
all: 1
sc_overcommit_memory: 1
sc_somaxconn: 65535
thp_enabled: never
thp_defrag: never
cluster: cs-localhost
cpu_node_bind: even
nr_processes: 4
test: set,get
data_size: 1024
n_client: 5
requests: 68000000
n_pipeline: 3
key_len: 68000000
cpufreq_governor: performance
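The sc_*, thp_*, and cpufreq knobs above correspond to standard sysctl and sysfs settings. A minimal sketch of the assumed manual equivalents follows; the attached job.yaml is authoritative for how lkp actually applies them:

    # Sketch of the assumed manual equivalents of the tunables above
    # (the attached job.yaml is authoritative).
    sysctl -w vm.overcommit_memory=1      # sc_overcommit_memory: 1
    sysctl -w net.core.somaxconn=65535    # sc_somaxconn: 65535
    echo never > /sys/kernel/mm/transparent_hugepage/enabled   # thp_enabled: never
    echo never > /sys/kernel/mm/transparent_hugepage/defrag    # thp_defrag: never
    for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
        echo performance > "$g"           # cpufreq_governor: performance
    done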
test-description: Redis benchmark is a utility for checking the performance of Redis by running commands from N clients at the same time, sending M total queries (similar to Apache's ab utility).
test-url: https://redis.io/topics/benchmarks
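For reference, the job's knobs map onto a redis-benchmark invocation roughly as sketched below. This is an approximation only: the -r mapping for key_len is an assumption, and nr_processes, the cs-localhost cluster, and CPU-node binding are handled by the lkp job itself.

    # Sketch: approximate single-instance equivalent of the job's knobs.
    # -t test list, -c clients, -n total requests, -d payload bytes,
    # -P pipeline depth; -r (random keyspace) for key_len is an assumption.
    redis-benchmark -t set,get -c 5 -n 68000000 -d 1024 -P 3 -r 68000000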
If you fix the issue, kindly add the following tags:
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-lkp/[email protected]
Details are as below:
=========================================================================================
all/cluster/compiler/cpu_node_bind/cpufreq_governor/data_size/kconfig/key_len/n_client/n_pipeline/nr_processes/requests/rootfs/sc_overcommit_memory/sc_somaxconn/tbox_group/test/testcase/thp_defrag/thp_enabled:
1/cs-localhost/gcc-11/even/performance/1024/x86_64-rhel-8.3/68000000/5/3/4/68000000/debian-11.1-x86_64-20220510.cgz/1/65535/lkp-csl-2sp7/set,get/redis/never/never
commit:
be5fd933f8 ("Merge branch 'add-reset-deassertion-for-aspeed-mdio'")
f3412b3879 ("net: make sure net_rx_action() calls skb_defer_free_flush()")
be5fd933f8c15967 f3412b3879b4f7c4313b186b039
---------------- ---------------------------
old value ±stddev      %change      new value ±stddev      metric
252015 -9.5% 227984 ± 2% redis.get_avg_throughput_rps
67.46 +10.6% 74.62 ± 2% redis.get_avg_time_sec
756045 -9.5% 683953 ± 2% redis.get_total_throughput_rps
202.38 +10.6% 223.86 ± 2% redis.get_total_time_sec
205530 -8.6% 187839 redis.set_avg_throughput_rps
82.71 +9.5% 90.53 redis.set_avg_time_sec
616591 -8.6% 563518 redis.set_total_throughput_rps
248.14 +9.5% 271.59 redis.set_total_time_sec
154.24 +9.6% 169.06 ± 2% redis.time.elapsed_time
154.24 +9.6% 169.06 ± 2% redis.time.elapsed_time.max
42820 ± 3% +18.7% 50810 ± 8% redis.time.involuntary_context_switches
263.43 +5.4% 277.60 redis.time.system_time
1.17 +0.3 1.50 ± 2% mpstat.cpu.all.soft%
3.952e+08 +23.9% 4.898e+08 vmstat.memory.free
0.35 ± 10% -0.1 0.27 ± 14% turbostat.C1%
21655037 ± 24% +41.0% 30533175 ± 2% turbostat.C1E
8586843 -26.1% 6343681 numa-numastat.node0.local_node
8614193 -25.9% 6381593 numa-numastat.node0.numa_hit
11431037 -18.6% 9300164 numa-numastat.node1.local_node
11488598 -18.6% 9350032 numa-numastat.node1.numa_hit
3.939e+08 +23.8% 4.877e+08 meminfo.MemAvailable
3.961e+08 +23.7% 4.898e+08 meminfo.MemFree
1.32e+08 -71.0% 38219657 meminfo.Memused
48034792 -99.6% 186078 meminfo.SUnreclaim
48141413 -99.4% 292816 meminfo.Slab
1.32e+08 -70.8% 38570676 meminfo.max_used_kB
1.968e+08 +26.4% 2.487e+08 numa-meminfo.node0.MemFree
67057835 -77.4% 15180098 numa-meminfo.node0.MemUsed
24036023 ± 2% -99.5% 110016 ± 3% numa-meminfo.node0.SUnreclaim
24124421 ± 2% -99.2% 196729 ± 2% numa-meminfo.node0.Slab
22960 ±127% +211.3% 71480 ± 43% numa-meminfo.node1.Inactive(file)
1.992e+08 +21.0% 2.412e+08 numa-meminfo.node1.MemFree
64969647 -64.5% 23043393 numa-meminfo.node1.MemUsed
24021896 -99.7% 76037 ± 5% numa-meminfo.node1.SUnreclaim
24040119 -99.6% 96063 ± 6% numa-meminfo.node1.Slab
49193940 +26.4% 62165766 numa-vmstat.node0.nr_free_pages
6010453 ± 2% -99.5% 27506 ± 3% numa-vmstat.node0.nr_slab_unreclaimable
8614420 -25.9% 6381821 numa-vmstat.node0.numa_hit
8587070 -26.1% 6343909 numa-vmstat.node0.numa_local
49803307 +21.1% 60287281 numa-vmstat.node1.nr_free_pages
5740 ±127% +211.8% 17896 ± 43% numa-vmstat.node1.nr_inactive_file
6006861 -99.7% 19010 ± 5% numa-vmstat.node1.nr_slab_unreclaimable
5740 ±127% +211.8% 17896 ± 43% numa-vmstat.node1.nr_zone_inactive_file
11488668 -18.6% 9350066 numa-vmstat.node1.numa_hit
11431107 -18.6% 9300198 numa-vmstat.node1.numa_local
520.47 ± 25% +55.0% 806.75 ± 20% sched_debug.cfs_rq:/.load_avg.max
97.63 ± 24% +48.1% 144.55 ± 19% sched_debug.cfs_rq:/.load_avg.stddev
56.18 ± 64% +113.7% 120.03 ± 31% sched_debug.cfs_rq:/.removed.load_avg.stddev
3.31 ± 78% +193.3% 9.71 ± 39% sched_debug.cfs_rq:/.removed.runnable_avg.avg
21.14 ± 70% +147.9% 52.39 ± 37% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
3.31 ± 78% +193.3% 9.71 ± 39% sched_debug.cfs_rq:/.removed.util_avg.avg
21.14 ± 70% +147.9% 52.39 ± 37% sched_debug.cfs_rq:/.removed.util_avg.stddev
47802 ± 32% -55.9% 21078 ± 36% sched_debug.cfs_rq:/.spread0.max
7311 ± 4% -21.3% 5756 ± 10% sched_debug.cpu.avg_idle.min
752476 ± 23% -66.9% 249133 ± 29% sched_debug.cpu.nr_switches.max
119990 ± 21% -54.4% 54695 ± 24% sched_debug.cpu.nr_switches.stddev
1883 ± 3% +9.6% 2064 ± 8% proc-vmstat.nr_active_anon
9833460 +23.8% 12175527 proc-vmstat.nr_dirty_background_threshold
19690964 +23.8% 24380824 proc-vmstat.nr_dirty_threshold
99012757 +23.7% 1.225e+08 proc-vmstat.nr_free_pages
5515 ± 3% -11.1% 4901 ± 13% proc-vmstat.nr_shmem
12009233 -99.6% 46518 proc-vmstat.nr_slab_unreclaimable
1883 ± 3% +9.6% 2064 ± 8% proc-vmstat.nr_zone_active_anon
20104413 -21.7% 15734336 proc-vmstat.numa_hit
20021345 -21.8% 15647296 proc-vmstat.numa_local
20106105 -21.7% 15735973 proc-vmstat.pgalloc_normal
2391744 ± 3% +105.8% 4922820 proc-vmstat.pgfree
24271 +5.9% 25697 ± 2% proc-vmstat.pgreuse
2.685e+09 -7.1% 2.494e+09 ± 2% perf-stat.i.branch-instructions
1.10 +0.0 1.12 perf-stat.i.branch-miss-rate%
30474911 -5.1% 28916630 ± 3% perf-stat.i.branch-misses
86117392 -6.2% 80770756 ± 2% perf-stat.i.cache-misses
1.039e+08 -7.1% 96556388 ± 2% perf-stat.i.cache-references
1.40 +8.7% 1.52 ± 3% perf-stat.i.cpi
220.93 +7.0% 236.43 ± 3% perf-stat.i.cycles-between-cache-misses
3.851e+09 -7.4% 3.568e+09 ± 2% perf-stat.i.dTLB-loads
606669 -7.8% 559360 perf-stat.i.dTLB-store-misses
2.094e+09 -7.4% 1.94e+09 ± 2% perf-stat.i.dTLB-stores
16798215 -5.8% 15820186 ± 2% perf-stat.i.iTLB-load-misses
1.353e+10 -7.2% 1.256e+10 ± 2% perf-stat.i.instructions
0.72 -7.6% 0.67 ± 3% perf-stat.i.ipc
657.86 ± 5% +52.4% 1002 ± 17% perf-stat.i.metric.K/sec
90.85 -7.7% 83.88 ± 2% perf-stat.i.metric.M/sec
167749 -7.1% 155916 perf-stat.i.minor-faults
74.54 +3.6 78.17 perf-stat.i.node-load-miss-rate%
26164317 +6.4% 27849868 ± 2% perf-stat.i.node-load-misses
9207088 -14.1% 7912609 perf-stat.i.node-loads
57.34 ± 11% +26.2 83.58 perf-stat.i.node-store-miss-rate%
6626833 ± 12% +41.2% 9354265 ± 3% perf-stat.i.node-store-misses
5283168 ± 14% -60.9% 2064586 ± 2% perf-stat.i.node-stores
167752 -7.1% 155918 perf-stat.i.page-faults
1.14 +0.0 1.16 perf-stat.overall.branch-miss-rate%
1.39 +8.5% 1.51 ± 3% perf-stat.overall.cpi
218.81 +7.3% 234.87 ± 3% perf-stat.overall.cycles-between-cache-misses
0.72 -7.8% 0.66 ± 3% perf-stat.overall.ipc
73.98 +3.9 77.87 perf-stat.overall.node-load-miss-rate%
55.58 ± 11% +26.3 81.92 perf-stat.overall.node-store-miss-rate%
2.667e+09 -7.1% 2.479e+09 ± 2% perf-stat.ps.branch-instructions
30284745 -5.1% 28735640 ± 3% perf-stat.ps.branch-misses
85544824 -6.1% 80295476 ± 2% perf-stat.ps.cache-misses
1.032e+08 -7.0% 95982345 ± 2% perf-stat.ps.cache-references
3.826e+09 -7.3% 3.547e+09 ± 2% perf-stat.ps.dTLB-loads
603282 -7.8% 555963 perf-stat.ps.dTLB-store-misses
2.081e+09 -7.3% 1.928e+09 ± 2% perf-stat.ps.dTLB-stores
16687441 -5.7% 15728404 ± 2% perf-stat.ps.iTLB-load-misses
1.345e+10 -7.2% 1.248e+10 ± 2% perf-stat.ps.instructions
166736 -7.0% 154987 perf-stat.ps.minor-faults
25993409 +6.5% 27688192 ± 2% perf-stat.ps.node-load-misses
9142252 -14.0% 7866166 perf-stat.ps.node-loads
6583167 ± 12% +41.3% 9299683 ± 3% perf-stat.ps.node-store-misses
5252541 ± 14% -61.0% 2051094 ± 2% perf-stat.ps.node-stores
166739 -7.0% 154989 perf-stat.ps.page-faults
2.078e+12 +1.7% 2.114e+12 perf-stat.total.instructions
1.38 ± 9% -1.0 0.36 ± 70% perf-profile.calltrace.cycles-pp.__dev_queue_xmit.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit
1.45 ± 7% -0.3 1.12 ± 17% perf-profile.calltrace.cycles-pp.__alloc_skb.tcp_stream_alloc_skb.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg
1.01 ± 9% -0.2 0.85 ± 11% perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg.sock_read_iter
0.00 +0.7 0.65 ± 6% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.net_rx_action.__softirqentry_text_start.do_softirq.__local_bh_enable_ip
0.00 +0.7 0.67 ± 29% perf-profile.calltrace.cycles-pp.__kfree_skb.net_rx_action.__softirqentry_text_start.do_softirq.__local_bh_enable_ip
0.00 +0.9 0.86 ± 29% perf-profile.calltrace.cycles-pp.skb_release_data.__kfree_skb.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_established
0.00 +0.9 0.91 ± 29% perf-profile.calltrace.cycles-pp.__kfree_skb.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv
1.50 ± 11% +2.0 3.45 ± 4% perf-profile.calltrace.cycles-pp.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv
2.76 ± 9% +2.0 4.72 ± 5% perf-profile.calltrace.cycles-pp.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu
11.76 ± 7% +2.4 14.18 ± 6% perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
12.09 ± 7% +2.5 14.57 ± 5% perf-profile.calltrace.cycles-pp.__libc_write
11.93 ± 7% +2.5 14.42 ± 5% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_write
11.87 ± 7% +2.5 14.36 ± 5% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
14.67 ± 7% +3.7 18.40 ± 7% perf-profile.calltrace.cycles-pp.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit.__tcp_push_pending_frames
12.72 ± 7% +4.1 16.84 ± 7% perf-profile.calltrace.cycles-pp.net_rx_action.__softirqentry_text_start.do_softirq.__local_bh_enable_ip.ip_finish_output2
13.00 ± 7% +4.1 17.13 ± 7% perf-profile.calltrace.cycles-pp.__softirqentry_text_start.do_softirq.__local_bh_enable_ip.ip_finish_output2.__ip_queue_xmit
13.05 ± 7% +4.1 17.18 ± 7% perf-profile.calltrace.cycles-pp.do_softirq.__local_bh_enable_ip.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_skb
13.06 ± 7% +4.1 17.21 ± 7% perf-profile.calltrace.cycles-pp.__local_bh_enable_ip.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit
0.68 ± 9% -0.4 0.25 ± 19% perf-profile.children.cycles-pp.tcp_cleanup_rbuf
1.39 ± 9% -0.4 0.97 ± 7% perf-profile.children.cycles-pp.__dev_queue_xmit
0.54 ± 9% -0.3 0.20 ± 13% perf-profile.children.cycles-pp.___slab_alloc
1.46 ± 8% -0.3 1.20 ± 4% perf-profile.children.cycles-pp.__alloc_skb
0.44 ± 9% -0.2 0.28 ± 7% perf-profile.children.cycles-pp.kmalloc_reserve
0.42 ± 9% -0.2 0.27 ± 7% perf-profile.children.cycles-pp.__kmalloc_node_track_caller
0.35 ± 10% -0.2 0.20 ± 8% perf-profile.children.cycles-pp.kmem_cache_alloc_node
0.14 ± 13% -0.1 0.08 ± 10% perf-profile.children.cycles-pp.__alloc_pages
0.13 ± 12% -0.1 0.07 ± 18% perf-profile.children.cycles-pp.get_page_from_freelist
0.13 ± 9% -0.0 0.10 ± 14% perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
0.08 ± 7% +0.0 0.10 ± 10% perf-profile.children.cycles-pp.obj_cgroup_charge
0.09 ± 12% +0.1 0.16 ± 9% perf-profile.children.cycles-pp.validate_xmit_skb
0.00 +0.1 0.09 ± 8% perf-profile.children.cycles-pp.free_unref_page
0.46 ± 16% +0.1 0.57 ± 7% perf-profile.children.cycles-pp.skb_page_frag_refill
0.46 ± 15% +0.1 0.58 ± 8% perf-profile.children.cycles-pp.sk_page_frag_refill
0.71 ± 11% +0.1 0.85 ± 7% perf-profile.children.cycles-pp.__skb_clone
0.05 ± 9% +0.2 0.24 ± 22% perf-profile.children.cycles-pp.kfree_skbmem
0.01 ±200% +0.3 0.32 ± 4% perf-profile.children.cycles-pp.kfree
0.14 ± 12% +0.4 0.55 perf-profile.children.cycles-pp.__ksize
0.00 +0.6 0.55 ± 4% perf-profile.children.cycles-pp.dst_release
0.04 ± 90% +0.6 0.60 ± 5% perf-profile.children.cycles-pp.skb_release_head_state
0.14 ± 21% +0.6 0.72 ± 8% perf-profile.children.cycles-pp.__slab_free
0.14 ± 17% +0.6 0.72 ± 8% perf-profile.children.cycles-pp.skb_attempt_defer_free
0.78 ± 6% +0.6 1.36 ± 8% perf-profile.children.cycles-pp.kmem_cache_free
0.29 ± 28% +1.2 1.47 ± 5% perf-profile.children.cycles-pp.skb_release_data
1.14 ± 8% +1.5 2.66 ± 5% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.55 ± 23% +1.6 2.14 ± 5% perf-profile.children.cycles-pp.__kfree_skb
1.51 ± 11% +2.0 3.47 ± 4% perf-profile.children.cycles-pp.tcp_clean_rtx_queue
2.77 ± 9% +2.0 4.74 ± 5% perf-profile.children.cycles-pp.tcp_ack
14.68 ± 7% +3.7 18.40 ± 7% perf-profile.children.cycles-pp.ip_finish_output2
14.24 ± 7% +3.9 18.13 ± 7% perf-profile.children.cycles-pp.__softirqentry_text_start
13.05 ± 7% +4.1 17.18 ± 7% perf-profile.children.cycles-pp.do_softirq
12.72 ± 7% +4.1 16.86 ± 7% perf-profile.children.cycles-pp.net_rx_action
13.11 ± 7% +4.1 17.26 ± 7% perf-profile.children.cycles-pp.__local_bh_enable_ip
0.83 ± 10% -0.5 0.34 ± 8% perf-profile.self.cycles-pp.__dev_queue_xmit
0.66 ± 9% -0.4 0.24 ± 18% perf-profile.self.cycles-pp.tcp_cleanup_rbuf
0.53 ± 8% -0.4 0.16 ± 15% perf-profile.self.cycles-pp.__alloc_skb
0.48 ± 10% -0.2 0.25 ± 12% perf-profile.self.cycles-pp.__ip_queue_xmit
0.20 ± 19% -0.1 0.06 ± 48% perf-profile.self.cycles-pp.__kfree_skb
0.41 ± 18% -0.1 0.28 ± 16% perf-profile.self.cycles-pp.tcp_rtt_estimator
0.14 ± 6% -0.0 0.11 ± 25% perf-profile.self.cycles-pp.orc_find
0.12 ± 17% +0.0 0.15 ± 7% perf-profile.self.cycles-pp.__kmalloc_node_track_caller
0.10 ± 11% +0.1 0.15 ± 14% perf-profile.self.cycles-pp.___slab_alloc
0.02 ±122% +0.1 0.10 ± 8% perf-profile.self.cycles-pp.validate_xmit_skb
0.00 +0.1 0.09 ± 14% perf-profile.self.cycles-pp.kfree
0.40 ± 16% +0.1 0.50 ± 10% perf-profile.self.cycles-pp.skb_page_frag_refill
0.65 ± 10% +0.1 0.80 ± 7% perf-profile.self.cycles-pp.__skb_clone
0.05 ± 9% +0.2 0.22 ± 17% perf-profile.self.cycles-pp.kfree_skbmem
0.14 ± 16% +0.2 0.34 ± 10% perf-profile.self.cycles-pp.kmem_cache_free
0.14 ± 11% +0.4 0.54 ± 2% perf-profile.self.cycles-pp.__ksize
0.00 +0.5 0.54 ± 5% perf-profile.self.cycles-pp.dst_release
0.14 ± 21% +0.6 0.70 ± 10% perf-profile.self.cycles-pp.__slab_free
0.36 ± 12% +0.6 1.00 ± 5% perf-profile.self.cycles-pp.tcp_clean_rtx_queue
0.15 ± 6% +0.7 0.89 ± 8% perf-profile.self.cycles-pp.net_rx_action
0.24 ± 21% +0.8 1.05 ± 7% perf-profile.self.cycles-pp.skb_release_data
1.12 ± 7% +1.5 2.64 ± 5% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# If you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.
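That is, a clean re-run would start with something like (paths taken from the note above):

rm -rf ~/.lkp /lkp    # remove lkp state, then repeat the steps above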
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests