Date: 2021-03-17 13:37:25
From: kernel test robot

Subject: [bpf] a9ed15dae0: netperf.Throughput_tps 3.9% improvement



Greetings,

FYI, we noticed a 3.9% improvement of netperf.Throughput_tps due to commit:


commit: a9ed15dae0755a0368735e0556a462d8519bdb05 ("bpf: Split cgroup_bpf_enabled per attach type")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
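For context on why this commit helps a pure-UDP workload: before the split, the kernel had a single cgroup_bpf_enabled toggle, so attaching a cgroup BPF program for any hook made every hook's fast path (including UDP send/receive) take the extra check; after the split there is one toggle per attach type, and only the hooks that actually have programs attached pay the cost. The sketch below is a userspace approximation of that structural change using plain counters; the real kernel code uses static branches (jump labels) and its exact identifiers may differ, so treat the names as illustrative.

/*
 * Userspace sketch of the idea behind "bpf: Split cgroup_bpf_enabled
 * per attach type".  Identifiers are illustrative, not the kernel's;
 * plain counters stand in for the kernel's static branches.
 */
#include <stdio.h>

enum cgroup_bpf_attach_type {
	CGROUP_INET_INGRESS,
	CGROUP_INET_EGRESS,
	CGROUP_UDP4_RECVMSG,
	MAX_CGROUP_BPF_ATTACH_TYPE,
};

/* Before: one global flag -- attaching ANY cgroup BPF program made
 * EVERY hook take the slow path. */
static int cgroup_bpf_enabled_global;

/* After: one flag per attach type -- only hooks that actually have
 * programs attached pay the extra cost. */
static int cgroup_bpf_enabled[MAX_CGROUP_BPF_ATTACH_TYPE];

static void attach_program(enum cgroup_bpf_attach_type type)
{
	cgroup_bpf_enabled_global++;          /* old scheme */
	cgroup_bpf_enabled[type]++;           /* new scheme */
}

int main(void)
{
	attach_program(CGROUP_INET_EGRESS);   /* e.g. an egress filter */

	/* Old scheme: the UDP recvmsg path sees "enabled" and does
	 * extra work even though no recvmsg program is attached. */
	printf("old: recvmsg hook check = %d\n",
	       cgroup_bpf_enabled_global != 0);

	/* New scheme: the recvmsg check stays false, so the fast
	 * path is untouched. */
	printf("new: recvmsg hook check = %d\n",
	       cgroup_bpf_enabled[CGROUP_UDP4_RECVMSG] != 0);
	return 0;
}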


in testcase: netperf
on test machine: 88 threads Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory
with following parameters:

ip: ipv4
runtime: 300s
nr_threads: 25%
cluster: cs-localhost
test: UDP_RR
cpufreq_governor: performance
ucode: 0x5003006

test-description: Netperf is a benchmark that can be used to measure various aspects of networking performance.
test-url: http://www.netperf.org/netperf/
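In UDP_RR mode, netperf counts request/response transactions per second over UDP: each transaction is one small request followed by one small response on the same socket pair, which is why the send and receive syscall paths dominate the perf-profile section below. The fragment below is a simplified stand-in for the client side of such a loop (the port, payload size, and iteration count are hypothetical, and it assumes a UDP echo server is listening; it is not netperf source):

/* Minimal UDP_RR-style client loop: one "transaction" is a request
 * send plus a response receive on a connected UDP socket.  Port,
 * payload size and iteration count are illustrative only. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	struct sockaddr_in srv = {
		.sin_family = AF_INET,
		.sin_port   = htons(9999),     /* hypothetical echo port */
	};
	inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);
	connect(fd, (struct sockaddr *)&srv, sizeof(srv));

	char req = 'x', rsp;
	long transactions = 0;

	for (int i = 0; i < 1000000; i++) {    /* netperf runs for -l secs */
		send(fd, &req, 1, 0);          /* request */
		if (recv(fd, &rsp, 1, 0) == 1) /* response */
			transactions++;
	}
	printf("transactions completed: %ld\n", transactions);
	close(fd);
	return 0;
}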





Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml
bin/lkp run compatible-job.yaml

=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase/ucode:
cs-localhost/gcc-9/performance/ipv4/x86_64-rhel-8.3/25%/debian-10.4-x86_64-20200603.cgz/300s/lkp-csl-2sp9/UDP_RR/netperf/0x5003006

commit:
20f2505fb4 ("bpf: Try to avoid kzalloc in cgroup/{s,g}etsockopt")
a9ed15dae0 ("bpf: Split cgroup_bpf_enabled per attach type")

20f2505fb436cfa6 a9ed15dae0755a0368735e0556a
---------------- ---------------------------
       %stddev      %change        %stddev
           \            |              \
2049344 +3.9% 2129386 netperf.Throughput_total_tps
93152 +3.9% 96790 netperf.Throughput_tps
9796 ± 2% +7.2% 10501 netperf.time.involuntary_context_switches
6.147e+08 +3.9% 6.387e+08 netperf.time.voluntary_context_switches
6.148e+08 +3.9% 6.388e+08 netperf.workload
10.42 -1.1 9.36 turbostat.C1%
8089495 +3.9% 8405213 vmstat.system.cs
2.774e+09 -10.2% 2.492e+09 cpuidle.C1.time
1.754e+09 +10.7% 1.941e+09 cpuidle.POLL.time
5.579e+08 +17.4% 6.552e+08 cpuidle.POLL.usage
3.00 ±223% +686.6% 23.56 ±105% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
15.59 ± 74% +109.0% 32.59 ± 12% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork
0.01 ± 71% +1483.3% 0.11 ± 56% perf-sched.sch_delay.max.ms.preempt_schedule_common._cond_resched.stop_one_cpu.affine_move_task.__set_cpus_allowed_ptr
1182 ± 73% +82.1% 2152 ± 6% perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork
0.01 ± 71% +65.8% 0.01 ± 4% perf-sched.total_sch_delay.average.ms
833.13 ± 82% +100.0% 1666 ± 28% perf-sched.wait_and_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
833.12 ± 82% +100.0% 1666 ± 28% perf-sched.wait_time.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
3550136 ± 7% +17.9% 4185038 ± 5% softirqs.CPU0.NET_RX
7292845 ± 8% +11.0% 8091559 ± 4% softirqs.CPU11.NET_RX
16054 ± 11% -13.2% 13942 ± 5% softirqs.CPU16.RCU
15910 ± 9% -11.6% 14064 ± 6% softirqs.CPU22.RCU
15262 ± 17% -15.3% 12920 ± 4% softirqs.CPU3.RCU
15230 ± 13% -16.3% 12751 ± 5% softirqs.CPU49.RCU
14806 ± 13% -13.5% 12811 ± 3% softirqs.CPU50.RCU
14895 ± 14% -15.5% 12580 ± 6% softirqs.CPU55.RCU
14851 ± 12% -13.7% 12810 ± 4% softirqs.CPU59.RCU
5330188 ± 3% +8.8% 5801721 ± 5% softirqs.CPU63.NET_RX
5480730 ± 8% +15.0% 6302697 ± 4% softirqs.CPU80.NET_RX
731732 ± 3% +22.3% 894653 ± 5% interrupts.CAL:Function_call_interrupts
6652 ± 31% +58.2% 10526 ± 29% interrupts.CPU25.CAL:Function_call_interrupts
9123 ± 24% +57.8% 14392 ± 9% interrupts.CPU3.CAL:Function_call_interrupts
4532 ± 10% +19.6% 5418 ± 7% interrupts.CPU34.NMI:Non-maskable_interrupts
4532 ± 10% +19.6% 5418 ± 7% interrupts.CPU34.PMI:Performance_monitoring_interrupts
12587 ± 4% -18.6% 10245 ± 6% interrupts.CPU36.RES:Rescheduling_interrupts
3988 ± 30% +43.1% 5706 ± 9% interrupts.CPU37.NMI:Non-maskable_interrupts
3988 ± 30% +43.1% 5706 ± 9% interrupts.CPU37.PMI:Performance_monitoring_interrupts
4298 ± 20% +23.8% 5321 ± 10% interrupts.CPU38.NMI:Non-maskable_interrupts
4298 ± 20% +23.8% 5321 ± 10% interrupts.CPU38.PMI:Performance_monitoring_interrupts
2638 ± 29% +83.9% 4851 ± 20% interrupts.CPU43.NMI:Non-maskable_interrupts
2638 ± 29% +83.9% 4851 ± 20% interrupts.CPU43.PMI:Performance_monitoring_interrupts
9514 ± 19% +37.5% 13080 ± 12% interrupts.CPU47.CAL:Function_call_interrupts
8019 ± 25% +47.6% 11836 ± 34% interrupts.CPU50.CAL:Function_call_interrupts
3317 ± 37% +54.5% 5125 ± 9% interrupts.CPU63.NMI:Non-maskable_interrupts
3317 ± 37% +54.5% 5125 ± 9% interrupts.CPU63.PMI:Performance_monitoring_interrupts
6959 ± 46% +86.9% 13006 ± 30% interrupts.CPU73.CAL:Function_call_interrupts
19.24 +3.9% 20.00 perf-stat.i.MPKI
2.521e+08 +1.0% 2.547e+08 perf-stat.i.branch-misses
10735680 ± 8% +54.9% 16625059 ± 29% perf-stat.i.cache-misses
1.401e+09 +4.4% 1.462e+09 perf-stat.i.cache-references
8146058 +3.9% 8463745 perf-stat.i.context-switches
1.47 +1.5% 1.49 perf-stat.i.cpi
1.072e+11 +1.9% 1.093e+11 perf-stat.i.cpu-cycles
14335 ± 11% -35.3% 9272 ± 31% perf-stat.i.cycles-between-cache-misses
83.51 +0.9 84.43 perf-stat.i.iTLB-load-miss-rate%
38402582 -6.7% 35845533 perf-stat.i.iTLB-loads
0.69 -1.4% 0.68 perf-stat.i.ipc
1.22 +1.9% 1.24 perf-stat.i.metric.GHz
1939336 ± 3% +11.7% 2166223 ± 6% perf-stat.i.node-store-misses
18.96 +3.9% 19.69 perf-stat.overall.MPKI
1.45 +1.4% 1.47 perf-stat.overall.cpi
10050 ± 8% -29.4% 7096 ± 25% perf-stat.overall.cycles-between-cache-misses
83.62 +0.9 84.55 perf-stat.overall.iTLB-load-miss-rate%
0.69 -1.4% 0.68 perf-stat.overall.ipc
36166 -3.3% 34988 perf-stat.overall.path-length
2.512e+08 +1.0% 2.538e+08 perf-stat.ps.branch-misses
10702292 ± 8% +54.9% 16573593 ± 29% perf-stat.ps.cache-misses
1.396e+09 +4.4% 1.457e+09 perf-stat.ps.cache-references
8118168 +3.9% 8434896 perf-stat.ps.context-switches
1.069e+11 +1.9% 1.089e+11 perf-stat.ps.cpu-cycles
38273181 -6.7% 35723998 perf-stat.ps.iTLB-loads
1932777 ± 3% +11.7% 2159007 ± 6% perf-stat.ps.node-store-misses
14.95 -0.9 14.04 ± 6% perf-profile.calltrace.cycles-pp.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
14.81 -0.9 13.92 ± 6% perf-profile.calltrace.cycles-pp.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
13.33 -0.9 12.45 ± 6% perf-profile.calltrace.cycles-pp.inet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
13.28 -0.9 12.40 ± 6% perf-profile.calltrace.cycles-pp.udp_recvmsg.inet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64
0.72 ± 2% -0.1 0.67 ± 7% perf-profile.calltrace.cycles-pp.__kmalloc_node_track_caller.__kmalloc_reserve.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
14.97 -0.9 14.06 ± 6% perf-profile.children.cycles-pp.__x64_sys_recvfrom
14.82 -0.9 13.93 ± 6% perf-profile.children.cycles-pp.__sys_recvfrom
13.34 -0.9 12.45 ± 6% perf-profile.children.cycles-pp.inet_recvmsg
13.29 -0.9 12.41 ± 6% perf-profile.children.cycles-pp.udp_recvmsg
1.01 -0.6 0.43 ± 9% perf-profile.children.cycles-pp._raw_spin_lock_bh
0.27 ± 4% -0.1 0.18 ± 5% perf-profile.children.cycles-pp.__netif_receive_skb_core
0.55 ± 2% -0.1 0.48 ± 7% perf-profile.children.cycles-pp.__slab_free
0.44 ± 2% -0.1 0.38 ± 9% perf-profile.children.cycles-pp.skb_set_owner_w
0.74 ± 2% -0.1 0.68 ± 7% perf-profile.children.cycles-pp.__kmalloc_node_track_caller
0.29 -0.1 0.24 ± 5% perf-profile.children.cycles-pp.___perf_sw_event
0.21 -0.0 0.16 ± 8% perf-profile.children.cycles-pp.migrate_enable
0.25 -0.0 0.20 ± 9% perf-profile.children.cycles-pp.__might_sleep
0.12 ± 5% -0.0 0.08 ± 10% perf-profile.children.cycles-pp._cond_resched
0.07 ± 5% -0.0 0.04 ± 71% perf-profile.children.cycles-pp.raw_local_deliver
0.24 ± 3% -0.0 0.20 ± 6% perf-profile.children.cycles-pp.security_socket_recvmsg
0.18 ± 4% -0.0 0.15 ± 7% perf-profile.children.cycles-pp.__ip_finish_output
0.12 ± 4% -0.0 0.09 ± 10% perf-profile.children.cycles-pp.rcu_read_unlock_strict
0.26 ± 2% -0.0 0.23 ± 7% perf-profile.children.cycles-pp.sock_recvmsg
0.24 ± 2% -0.0 0.22 ± 4% perf-profile.children.cycles-pp.ipv4_mtu
0.00 +0.1 0.05 ± 8% perf-profile.children.cycles-pp.rcu_note_context_switch
0.28 ± 2% +0.1 0.34 ± 4% perf-profile.children.cycles-pp.skb_release_data
0.39 ± 3% +0.1 0.53 ± 7% perf-profile.children.cycles-pp.__cgroup_bpf_run_filter_skb
0.27 ± 4% +0.2 0.43 ± 7% perf-profile.children.cycles-pp.ip_finish_output
0.99 -0.6 0.42 ± 9% perf-profile.self.cycles-pp._raw_spin_lock_bh
0.27 ± 3% -0.1 0.18 ± 7% perf-profile.self.cycles-pp.__netif_receive_skb_core
0.22 ± 4% -0.1 0.14 ± 8% perf-profile.self.cycles-pp.validate_xmit_skb
0.59 ± 2% -0.1 0.52 ± 5% perf-profile.self.cycles-pp.udp_sendmsg
0.22 ± 3% -0.1 0.15 ± 9% perf-profile.self.cycles-pp.__local_bh_enable_ip
0.54 ± 2% -0.1 0.47 ± 7% perf-profile.self.cycles-pp.__slab_free
0.43 ± 2% -0.1 0.37 ± 9% perf-profile.self.cycles-pp.skb_set_owner_w
0.67 ± 2% -0.1 0.61 ± 7% perf-profile.self.cycles-pp.__skb_wait_for_more_packets
0.29 ± 3% -0.0 0.25 ± 6% perf-profile.self.cycles-pp.__kmalloc_node_track_caller
0.19 ± 3% -0.0 0.15 ± 6% perf-profile.self.cycles-pp.migrate_enable
0.07 ± 11% -0.0 0.03 ± 70% perf-profile.self.cycles-pp.raw_local_deliver
0.21 ± 3% -0.0 0.18 ± 8% perf-profile.self.cycles-pp.__might_sleep
0.13 ± 11% -0.0 0.10 ± 10% perf-profile.self.cycles-pp.udp_unicast_rcv_skb
0.21 ± 4% -0.0 0.18 ± 6% perf-profile.self.cycles-pp.___perf_sw_event
0.09 ± 5% -0.0 0.07 ± 10% perf-profile.self.cycles-pp.rcu_read_unlock_strict
0.23 ± 2% -0.0 0.21 ± 4% perf-profile.self.cycles-pp.ipv4_mtu
0.12 ± 4% -0.0 0.10 ± 9% perf-profile.self.cycles-pp.migrate_disable
0.08 ± 6% -0.0 0.06 ± 11% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.06 ± 7% +0.0 0.10 ± 8% perf-profile.self.cycles-pp.asm_call_sysvec_on_stack
0.27 ± 3% +0.0 0.31 ± 10% perf-profile.self.cycles-pp.__udp_enqueue_schedule_skb
0.28 ± 2% +0.1 0.34 ± 4% perf-profile.self.cycles-pp.skb_release_data
0.01 ±223% +0.1 0.07 ± 6% perf-profile.self.cycles-pp.__ip_select_ident
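A note on the derived metrics in the table above (my reading of LKP's conventions, not stated in the report itself): perf-stat.overall.path-length is instructions retired per unit of netperf.workload, i.e. per transaction. Cross-checking from the parent-commit column: 1.069e11 cycles/s x 0.69 ipc ~= 7.4e10 instructions/s, and 6.148e8 transactions / 300 s ~= 2.05e6 transactions/s, giving roughly 36000 instructions per transaction, consistent with the reported 36166. The 3.3% drop in path-length lines up with the 3.9% gain in Throughput_tps.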



netperf.Throughput_tps

97500 +-------------------------------------------------------------------+
97000 |-+ |
| O |
96500 |-+ O O |
96000 |-+ |
| |
95500 |-+ |
95000 |-+ |
94500 |-+ |
| |
94000 |-+ |
93500 |-+ ......+.............+............ |
| ...+...... +.............|
93000 |-+ ....... |
92500 +-------------------------------------------------------------------+


netperf.Throughput_total_tps

2.16e+06 +----------------------------------------------------------------+
| |
2.14e+06 |-+ O |
| O |
2.12e+06 |-+ O O |
| |
2.1e+06 |-+ |
| |
2.08e+06 |-+ |
| |
2.06e+06 |-+ ......+............ |
| ......+...... +............+............|
2.04e+06 |...... |
| |
2.02e+06 +----------------------------------------------------------------+


netperf.workload

6.45e+08 +----------------------------------------------------------------+
| O |
6.4e+08 |-+ O |
| O |
6.35e+08 |-+ O |
| |
6.3e+08 |-+ |
| |
6.25e+08 |-+ |
| |
6.2e+08 |-+ |
| ......+............+............ |
6.15e+08 |-+ ......+...... +............|
|...... |
6.1e+08 +----------------------------------------------------------------+


netperf.time.voluntary_context_switches

6.45e+08 +----------------------------------------------------------------+
| O |
6.4e+08 |-+ |
| O O |
6.35e+08 |-+ O |
| |
6.3e+08 |-+ |
| |
6.25e+08 |-+ |
| |
6.2e+08 |-+ |
| ......+............+............ |
6.15e+08 |-+ ......+...... +............|
|...... |
6.1e+08 +----------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected]         Intel Corporation

Thanks,
Oliver Sang


Attachments:
config-5.11.0-rc4-00516-ga9ed15dae075 (175.07 kB)
job-script (8.24 kB)
job.yaml (5.55 kB)
reproduce (1.38 kB)