2021-09-12 15:19:13

by kernel test robot

Subject: [sched/fair] 56498cfb04: netperf.Throughput_tps -5.4% regression



Greeting,

FYI, we noticed a -5.4% regression of netperf.Throughput_tps due to commit:


commit: 56498cfb045d7147cdcba33795d19429afcd1d00 ("sched/fair: Avoid a second scan of target in select_idle_cpu")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master



in testcase: netperf
on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz with 256G memory
with following parameters:

ip: ipv4
runtime: 300s
nr_threads: 50%
cluster: cs-localhost
test: UDP_RR
cpufreq_governor: performance
ucode: 0xd000280

test-description: Netperf is a benchmark that can be used to measure various aspects of networking performance.
test-url: http://www.netperf.org/netperf/
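
For reference, UDP_RR measures synchronous request/response transactions per second: each transaction is one small UDP send followed by a blocking receive of the reply, so on cs-localhost the run is dominated by task wakeups and scheduling rather than bulk data movement. The snippet below is not netperf itself, only a minimal self-contained C sketch of that ping-pong pattern; the port, payload size, iteration count and the 1-second startup wait are arbitrary choices for the illustration, and error handling is omitted.

/* Minimal localhost UDP request/response loop, in the spirit of UDP_RR.
 * Not netperf: port, payload and iteration count are arbitrary. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

#define PORT  19999      /* arbitrary test port */
#define ITERS 100000     /* transactions to time */

int main(void)
{
	struct sockaddr_in srv = {
		.sin_family      = AF_INET,
		.sin_port        = htons(PORT),
		.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
	};
	char buf[1] = { 0 };

	if (fork() == 0) {
		/* "netserver": echo every 1-byte request back to its sender */
		int s = socket(AF_INET, SOCK_DGRAM, 0);
		struct sockaddr_in cli;
		socklen_t len = sizeof(cli);

		bind(s, (struct sockaddr *)&srv, sizeof(srv));
		for (int i = 0; i < ITERS; i++) {
			recvfrom(s, buf, 1, 0, (struct sockaddr *)&cli, &len);
			sendto(s, buf, 1, 0, (struct sockaddr *)&cli, len);
		}
		_exit(0);
	}

	/* "netperf": send a request, block for the response, repeat */
	int s = socket(AF_INET, SOCK_DGRAM, 0);
	struct timeval t0, t1;

	sleep(1);	/* crude wait for the child to bind */
	gettimeofday(&t0, NULL);
	for (int i = 0; i < ITERS; i++) {
		sendto(s, buf, 1, 0, (struct sockaddr *)&srv, sizeof(srv));
		recv(s, buf, 1, 0);
	}
	gettimeofday(&t1, NULL);

	double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
	printf("%.0f transactions/sec\n", ITERS / secs);
	wait(NULL);
	return 0;
}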



If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
bin/lkp run generated-yaml-file

=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase/ucode:
cs-localhost/gcc-9/performance/ipv4/x86_64-rhel-8.3/50%/debian-10.4-x86_64-20200603.cgz/300s/lkp-icl-2sp2/UDP_RR/netperf/0xd000280

commit:
89aafd67f2 ("sched/fair: Use prev instead of new target as recent_used_cpu")
56498cfb04 ("sched/fair: Avoid a second scan of target in select_idle_cpu")

89aafd67f28c9e3b 56498cfb045d7147cdcba33795d
---------------- ---------------------------
         %stddev     %change         %stddev
4980199 -5.4% 4710753 netperf.Throughput_total_tps
77815 -5.4% 73605 netperf.Throughput_tps
3057 -1.5% 3012 netperf.time.percent_of_cpu_this_job_got
8502 -1.0% 8417 netperf.time.system_time
704.60 ? 6% -7.3% 652.99 netperf.time.user_time
1.475e+09 -4.6% 1.407e+09 netperf.time.voluntary_context_switches
1.494e+09 -5.4% 1.413e+09 netperf.workload
81047 ? 24% +100.4% 162384 ? 32% numa-meminfo.node1.AnonPages.max
74515 ? 45% -62.9% 27610 ? 70% numa-numastat.node0.other_node
74346 ? 45% -62.2% 28115 ? 67% numa-vmstat.node0.numa_other
44.36 +0.8 45.20 turbostat.C1%
8565 ? 16% -14.3% 7342 ? 20% interrupts.CPU93.NMI:Non-maskable_interrupts
8565 ? 16% -14.3% 7342 ? 20% interrupts.CPU93.PMI:Performance_monitoring_interrupts
19377271 -5.2% 18374412 vmstat.system.cs
9888699 -5.0% 9397833 vmstat.system.in
789686 ? 4% +5.8% 835223 ? 3% proc-vmstat.nr_mapped
3724 +2.9% 3834 proc-vmstat.nr_page_table_pages
1120470 +5.3% 1179918 ? 4% proc-vmstat.numa_pte_updates
9.40 ? 2% +37.2% 12.90 perf-stat.i.MPKI
3.325e+10 -5.3% 3.148e+10 perf-stat.i.branch-instructions
4.653e+08 -4.8% 4.431e+08 perf-stat.i.branch-misses
14090850 ? 7% +23.6% 17410487 ? 7% perf-stat.i.cache-misses
1.558e+09 +29.7% 2.021e+09 perf-stat.i.cache-references
19525342 -5.1% 18528333 perf-stat.i.context-switches
1.48 +4.0% 1.54 perf-stat.i.cpi
2.478e+11 -2.3% 2.422e+11 perf-stat.i.cpu-cycles
20475 ? 7% -23.0% 15760 ? 9% perf-stat.i.cycles-between-cache-misses
0.02 ? 2% +0.0 0.03 ? 92% perf-stat.i.dTLB-load-miss-rate%
8389062 ? 2% +12.1% 9400511 ? 2% perf-stat.i.dTLB-load-misses
4.883e+10 -5.4% 4.619e+10 perf-stat.i.dTLB-loads
0.00 ? 6% +0.0 0.00 ? 56% perf-stat.i.dTLB-store-miss-rate%
2.983e+10 -5.5% 2.821e+10 perf-stat.i.dTLB-stores
1.688e+11 -5.4% 1.598e+11 perf-stat.i.instructions
0.68 -3.4% 0.66 perf-stat.i.ipc
1.94 -2.3% 1.89 perf-stat.i.metric.GHz
886.48 -4.9% 842.91 perf-stat.i.metric.M/sec
9.23 ? 2% +37.1% 12.65 perf-stat.overall.MPKI
1.47 +3.3% 1.52 perf-stat.overall.cpi
17698 ? 7% -21.0% 13980 ? 7% perf-stat.overall.cycles-between-cache-misses
0.02 ? 2% +0.0 0.02 ? 3% perf-stat.overall.dTLB-load-miss-rate%
0.00 ? 6% +0.0 0.00 ? 6% perf-stat.overall.dTLB-store-miss-rate%
0.68 -3.2% 0.66 perf-stat.overall.ipc
3.314e+10 -5.3% 3.137e+10 perf-stat.ps.branch-instructions
4.638e+08 -4.8% 4.416e+08 perf-stat.ps.branch-misses
14042045 ? 8% +23.6% 17351753 ? 7% perf-stat.ps.cache-misses
1.553e+09 +29.7% 2.014e+09 perf-stat.ps.cache-references
19460780 -5.1% 18466153 perf-stat.ps.context-switches
2.47e+11 -2.3% 2.414e+11 perf-stat.ps.cpu-cycles
8348081 ? 2% +12.1% 9359431 ? 2% perf-stat.ps.dTLB-load-misses
4.866e+10 -5.4% 4.603e+10 perf-stat.ps.dTLB-loads
2.973e+10 -5.5% 2.811e+10 perf-stat.ps.dTLB-stores
1.683e+11 -5.4% 1.592e+11 perf-stat.ps.instructions
5.085e+13 -4.8% 4.841e+13 perf-stat.total.instructions
44.71 -1.0 43.72 perf-profile.calltrace.cycles-pp.asm_sysvec_reschedule_ipi.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
27.78 -0.6 27.20 perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
27.63 -0.6 27.07 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary
26.02 -0.5 25.49 perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
1.81 ? 2% -0.2 1.58 ? 2% perf-profile.calltrace.cycles-pp.ip_route_output_flow.udp_sendmsg.sock_sendmsg.__sys_sendto.__x64_sys_sendto
1.76 ? 2% -0.2 1.53 ? 2% perf-profile.calltrace.cycles-pp.ip_route_output_key_hash.ip_route_output_flow.udp_sendmsg.sock_sendmsg.__sys_sendto
1.51 ? 2% -0.2 1.31 ? 2% perf-profile.calltrace.cycles-pp.ip_route_output_key_hash_rcu.ip_route_output_key_hash.ip_route_output_flow.udp_sendmsg.sock_sendmsg
2.75 ? 2% -0.1 2.64 ? 2% perf-profile.calltrace.cycles-pp.menu_select.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
1.02 ? 4% -0.1 0.92 ? 4% perf-profile.calltrace.cycles-pp.fib_table_lookup.ip_route_output_key_hash_rcu.ip_route_output_key_hash.ip_route_output_flow.udp_sendmsg
0.97 -0.1 0.91 perf-profile.calltrace.cycles-pp.ip_generic_getfrag.__ip_append_data.ip_make_skb.udp_sendmsg.sock_sendmsg
0.77 -0.1 0.70 perf-profile.calltrace.cycles-pp.security_socket_sendmsg.sock_sendmsg.__sys_sendto.__x64_sys_sendto.do_syscall_64
0.61 -0.1 0.55 perf-profile.calltrace.cycles-pp._copy_from_iter_full.ip_generic_getfrag.__ip_append_data.ip_make_skb.udp_sendmsg
5.78 -0.1 5.73 perf-profile.calltrace.cycles-pp.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.do_idle
1.44 -0.0 1.40 perf-profile.calltrace.cycles-pp.ttwu_do_wakeup.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
1.34 -0.0 1.29 perf-profile.calltrace.cycles-pp.check_preempt_curr.ttwu_do_wakeup.try_to_wake_up.autoremove_wake_function.__wake_up_common
0.84 -0.0 0.80 perf-profile.calltrace.cycles-pp.native_apic_msr_eoi_write.sysvec_reschedule_ipi.asm_sysvec_reschedule_ipi.acpi_idle_do_entry.acpi_idle_enter
0.54 -0.0 0.52 perf-profile.calltrace.cycles-pp.__switch_to
0.63 -0.0 0.60 perf-profile.calltrace.cycles-pp.update_load_avg.set_next_entity.pick_next_task_fair.pick_next_task.__schedule
0.55 -0.0 0.53 perf-profile.calltrace.cycles-pp.pick_next_task.__schedule.schedule.schedule_timeout.__skb_wait_for_more_packets
1.04 +0.0 1.08 perf-profile.calltrace.cycles-pp.update_load_avg.dequeue_entity.dequeue_task_fair.__schedule.schedule
0.62 ? 2% +0.0 0.66 perf-profile.calltrace.cycles-pp.update_load_avg.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
0.58 ? 2% +0.0 0.63 perf-profile.calltrace.cycles-pp.__slab_free.__consume_stateless_skb.udp_recvmsg.inet_recvmsg.__sys_recvfrom
0.70 ? 2% +0.1 0.75 perf-profile.calltrace.cycles-pp.perf_trace_sched_wakeup_template.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
2.13 +0.1 2.18 perf-profile.calltrace.cycles-pp.sysvec_reschedule_ipi.asm_sysvec_reschedule_ipi.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
1.29 ? 2% +0.1 1.35 perf-profile.calltrace.cycles-pp.update_curr.dequeue_entity.dequeue_task_fair.__schedule.schedule
0.92 +0.1 0.98 perf-profile.calltrace.cycles-pp.irqtime_account_irq.__softirqentry_text_start.do_softirq.__local_bh_enable_ip.ip_finish_output2
0.80 ? 2% +0.1 0.86 perf-profile.calltrace.cycles-pp.sock_wfree.loopback_xmit.dev_hard_start_xmit.__dev_queue_xmit.ip_finish_output2
0.65 ? 2% +0.1 0.73 perf-profile.calltrace.cycles-pp.switch_mm_irqs_off.__schedule.schedule_idle.do_idle.cpu_startup_entry
0.94 +0.1 1.03 perf-profile.calltrace.cycles-pp.tick_nohz_idle_exit.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
0.92 ? 2% +0.1 1.00 perf-profile.calltrace.cycles-pp.__check_object_size.udp_recvmsg.inet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom
0.96 +0.1 1.05 perf-profile.calltrace.cycles-pp.select_task_rq_fair.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
1.64 ? 2% +0.1 1.73 perf-profile.calltrace.cycles-pp.loopback_xmit.dev_hard_start_xmit.__dev_queue_xmit.ip_finish_output2.ip_output
1.34 ? 3% +0.1 1.43 perf-profile.calltrace.cycles-pp.__consume_stateless_skb.udp_recvmsg.inet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom
1.77 +0.1 1.87 perf-profile.calltrace.cycles-pp.dev_hard_start_xmit.__dev_queue_xmit.ip_finish_output2.ip_output.ip_send_skb
2.96 +0.1 3.06 perf-profile.calltrace.cycles-pp.sock_alloc_send_pskb.__ip_append_data.ip_make_skb.udp_sendmsg.sock_sendmsg
2.52 +0.1 2.62 perf-profile.calltrace.cycles-pp.__dev_queue_xmit.ip_finish_output2.ip_output.ip_send_skb.udp_send_skb
0.56 ? 3% +0.1 0.66 perf-profile.calltrace.cycles-pp.update_ts_time_stats.tick_nohz_idle_exit.do_idle.cpu_startup_entry.start_secondary
1.09 ? 2% +0.1 1.20 perf-profile.calltrace.cycles-pp._copy_to_iter.udp_recvmsg.inet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom
0.84 ? 3% +0.1 0.95 perf-profile.calltrace.cycles-pp.irqentry_enter.sysvec_reschedule_ipi.asm_sysvec_reschedule_ipi.acpi_idle_do_entry.acpi_idle_enter
0.68 ? 2% +0.1 0.78 perf-profile.calltrace.cycles-pp.select_idle_sibling.select_task_rq_fair.try_to_wake_up.autoremove_wake_function.__wake_up_common
3.13 +0.1 3.23 perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_task_fair.__schedule.schedule.schedule_timeout
0.73 ? 3% +0.1 0.83 perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.udp_recvmsg.inet_recvmsg.__sys_recvfrom
0.62 ? 4% +0.1 0.72 perf-profile.calltrace.cycles-pp.copy_user_generic_unrolled.copyout._copy_to_iter.udp_recvmsg.inet_recvmsg
3.50 +0.1 3.60 perf-profile.calltrace.cycles-pp.dequeue_task_fair.__schedule.schedule.schedule_timeout.__skb_wait_for_more_packets
0.79 ? 3% +0.1 0.91 perf-profile.calltrace.cycles-pp.__ksize.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.__ip_append_data
7.12 +0.1 7.25 perf-profile.calltrace.cycles-pp.schedule.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg
6.89 +0.2 7.04 perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp
0.99 ? 3% +0.2 1.15 perf-profile.calltrace.cycles-pp.update_rq_clock.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
1.90 +0.2 2.07 perf-profile.calltrace.cycles-pp.set_next_entity.pick_next_task_fair.pick_next_task.__schedule.schedule_idle
1.94 +0.2 2.10 perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
7.79 +0.2 8.00 perf-profile.calltrace.cycles-pp.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg.inet_recvmsg
2.62 +0.2 2.82 perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
2.52 +0.2 2.73 perf-profile.calltrace.cycles-pp.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
2.79 +0.2 3.02 perf-profile.calltrace.cycles-pp.pick_next_task_fair.pick_next_task.__schedule.schedule_idle.do_idle
3.10 +0.3 3.36 perf-profile.calltrace.cycles-pp.pick_next_task.__schedule.schedule_idle.do_idle.cpu_startup_entry
0.35 ? 70% +0.3 0.62 perf-profile.calltrace.cycles-pp.nr_iowait_cpu.update_ts_time_stats.tick_nohz_idle_exit.do_idle.cpu_startup_entry
9.02 +0.3 9.31 perf-profile.calltrace.cycles-pp.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg.inet_recvmsg.__sys_recvfrom
6.05 +0.4 6.40 perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
5.78 +0.4 6.15 perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
10.77 +0.4 11.17 perf-profile.calltrace.cycles-pp.__skb_recv_udp.udp_recvmsg.inet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom
0.00 +0.5 0.52 ? 3% perf-profile.calltrace.cycles-pp.skb_release_data.__consume_stateless_skb.udp_recvmsg.inet_recvmsg.__sys_recvfrom
18.15 +0.6 18.73 perf-profile.calltrace.cycles-pp.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
17.97 +0.6 18.56 perf-profile.calltrace.cycles-pp.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
55.03 +0.6 55.64 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
8.05 +0.6 8.67 perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.sock_def_readable
8.28 +0.6 8.90 perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.sock_def_readable.__udp_enqueue_schedule_skb
15.65 +0.7 16.30 perf-profile.calltrace.cycles-pp.inet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
15.52 +0.7 16.18 perf-profile.calltrace.cycles-pp.udp_recvmsg.inet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64
15.18 +0.7 15.86 perf-profile.calltrace.cycles-pp.net_rx_action.__softirqentry_text_start.do_softirq.__local_bh_enable_ip.ip_finish_output2
14.65 +0.7 15.34 perf-profile.calltrace.cycles-pp.__napi_poll.net_rx_action.__softirqentry_text_start.do_softirq.__local_bh_enable_ip
17.11 +0.7 17.81 perf-profile.calltrace.cycles-pp.__local_bh_enable_ip.ip_finish_output2.ip_output.ip_send_skb.udp_send_skb
16.91 +0.7 17.61 perf-profile.calltrace.cycles-pp.do_softirq.__local_bh_enable_ip.ip_finish_output2.ip_output.ip_send_skb
16.65 +0.7 17.36 perf-profile.calltrace.cycles-pp.__softirqentry_text_start.do_softirq.__local_bh_enable_ip.ip_finish_output2.ip_output
8.79 +0.7 9.51 perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.sock_def_readable.__udp_enqueue_schedule_skb.udp_queue_rcv_one_skb
9.06 +0.7 9.79 perf-profile.calltrace.cycles-pp.__wake_up_common_lock.sock_def_readable.__udp_enqueue_schedule_skb.udp_queue_rcv_one_skb.udp_unicast_rcv_skb
21.83 +0.7 22.58 perf-profile.calltrace.cycles-pp.udp_send_skb.udp_sendmsg.sock_sendmsg.__sys_sendto.__x64_sys_sendto
14.55 +0.8 15.31 perf-profile.calltrace.cycles-pp.process_backlog.__napi_poll.net_rx_action.__softirqentry_text_start.do_softirq
21.40 +0.8 22.16 perf-profile.calltrace.cycles-pp.ip_send_skb.udp_send_skb.udp_sendmsg.sock_sendmsg.__sys_sendto
21.02 +0.8 21.78 perf-profile.calltrace.cycles-pp.ip_output.ip_send_skb.udp_send_skb.udp_sendmsg.sock_sendmsg
20.17 +0.8 20.95 perf-profile.calltrace.cycles-pp.ip_finish_output2.ip_output.ip_send_skb.udp_send_skb.udp_sendmsg
9.51 +0.8 10.30 perf-profile.calltrace.cycles-pp.sock_def_readable.__udp_enqueue_schedule_skb.udp_queue_rcv_one_skb.udp_unicast_rcv_skb.__udp4_lib_rcv
9.87 +0.8 10.66 perf-profile.calltrace.cycles-pp.__udp_enqueue_schedule_skb.udp_queue_rcv_one_skb.udp_unicast_rcv_skb.__udp4_lib_rcv.ip_protocol_deliver_rcu
13.38 +0.8 14.20 perf-profile.calltrace.cycles-pp.__netif_receive_skb_one_core.process_backlog.__napi_poll.net_rx_action.__softirqentry_text_start
12.87 +0.8 13.72 perf-profile.calltrace.cycles-pp.ip_rcv.__netif_receive_skb_one_core.process_backlog.__napi_poll.net_rx_action
11.06 +0.9 11.91 perf-profile.calltrace.cycles-pp.udp_queue_rcv_one_skb.udp_unicast_rcv_skb.__udp4_lib_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish
12.27 +0.9 13.13 perf-profile.calltrace.cycles-pp.ip_local_deliver_finish.ip_local_deliver.ip_rcv.__netif_receive_skb_one_core.process_backlog
12.21 +0.9 13.07 perf-profile.calltrace.cycles-pp.ip_protocol_deliver_rcu.ip_local_deliver_finish.ip_local_deliver.ip_rcv.__netif_receive_skb_one_core
12.00 +0.9 12.86 perf-profile.calltrace.cycles-pp.__udp4_lib_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.ip_local_deliver.ip_rcv
11.15 +0.9 12.01 perf-profile.calltrace.cycles-pp.udp_unicast_rcv_skb.__udp4_lib_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.ip_local_deliver
12.35 +0.9 13.21 perf-profile.calltrace.cycles-pp.ip_local_deliver.ip_rcv.__netif_receive_skb_one_core.process_backlog.__napi_poll
28.03 -0.6 27.44 perf-profile.children.cycles-pp.cpuidle_enter
27.91 -0.6 27.34 perf-profile.children.cycles-pp.cpuidle_enter_state
26.24 -0.5 25.72 perf-profile.children.cycles-pp.acpi_idle_enter
25.83 -0.5 25.32 perf-profile.children.cycles-pp.acpi_idle_do_entry
23.91 -0.5 23.43 perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
1.82 ? 2% -0.2 1.59 ? 2% perf-profile.children.cycles-pp.ip_route_output_flow
1.77 -0.2 1.55 ? 2% perf-profile.children.cycles-pp.ip_route_output_key_hash
1.54 ? 2% -0.2 1.34 ? 2% perf-profile.children.cycles-pp.ip_route_output_key_hash_rcu
2.82 ? 3% -0.1 2.70 ? 2% perf-profile.children.cycles-pp.menu_select
1.03 ? 4% -0.1 0.93 ? 4% perf-profile.children.cycles-pp.fib_table_lookup
0.77 ? 2% -0.1 0.69 perf-profile.children.cycles-pp.syscall_return_via_sysret
0.58 ? 2% -0.1 0.51 ? 2% perf-profile.children.cycles-pp.kmem_cache_alloc_node
0.63 -0.1 0.57 perf-profile.children.cycles-pp._copy_from_iter_full
0.78 -0.1 0.72 perf-profile.children.cycles-pp.security_socket_sendmsg
0.99 -0.1 0.93 perf-profile.children.cycles-pp.ip_generic_getfrag
1.23 -0.1 1.17 perf-profile.children.cycles-pp.sched_clock_cpu
0.58 -0.1 0.52 perf-profile.children.cycles-pp.move_addr_to_kernel
1.46 -0.1 1.41 perf-profile.children.cycles-pp.ttwu_do_wakeup
0.45 ? 2% -0.0 0.40 perf-profile.children.cycles-pp._copy_from_user
1.12 -0.0 1.08 perf-profile.children.cycles-pp.__switch_to
0.34 ? 4% -0.0 0.30 perf-profile.children.cycles-pp.rcu_read_unlock_strict
1.35 -0.0 1.30 perf-profile.children.cycles-pp.check_preempt_curr
1.32 -0.0 1.27 perf-profile.children.cycles-pp.native_irq_return_iret
1.00 -0.0 0.95 perf-profile.children.cycles-pp.native_sched_clock
0.17 ? 6% -0.0 0.13 ? 3% perf-profile.children.cycles-pp.xfrm_lookup_route
1.00 ? 2% -0.0 0.96 perf-profile.children.cycles-pp.native_apic_msr_eoi_write
0.23 ? 4% -0.0 0.19 perf-profile.children.cycles-pp.__cond_resched
0.20 ? 11% -0.0 0.16 ? 6% perf-profile.children.cycles-pp.ip_setup_cork
0.24 ? 4% -0.0 0.20 ? 2% perf-profile.children.cycles-pp.import_single_range
0.89 -0.0 0.86 perf-profile.children.cycles-pp.sockfd_lookup_light
0.11 ? 3% -0.0 0.08 ? 5% perf-profile.children.cycles-pp.apparmor_socket_sendmsg
0.50 -0.0 0.48 perf-profile.children.cycles-pp.tick_nohz_idle_enter
0.07 ? 9% -0.0 0.04 ? 45% perf-profile.children.cycles-pp.find_exception
0.13 ? 6% -0.0 0.10 ? 3% perf-profile.children.cycles-pp.__fdget
0.33 -0.0 0.30 perf-profile.children.cycles-pp.copy_fpregs_to_fpstate
0.43 -0.0 0.41 perf-profile.children.cycles-pp.__wrgsbase_inactive
0.11 ? 6% -0.0 0.09 ? 4% perf-profile.children.cycles-pp.xfrm_lookup_with_ifid
0.09 ? 7% -0.0 0.07 ? 5% perf-profile.children.cycles-pp.rb_erase
0.07 ? 10% -0.0 0.05 perf-profile.children.cycles-pp.finish_wait
0.41 -0.0 0.39 perf-profile.children.cycles-pp.sock_recvmsg
0.36 -0.0 0.34 ? 2% perf-profile.children.cycles-pp.security_socket_recvmsg
0.25 ? 4% -0.0 0.23 ? 2% perf-profile.children.cycles-pp.resched_curr
0.31 -0.0 0.30 perf-profile.children.cycles-pp.kfree
0.27 -0.0 0.25 perf-profile.children.cycles-pp.copyin
0.06 ? 7% -0.0 0.05 ? 8% perf-profile.children.cycles-pp.native_smp_send_reschedule
0.43 ? 2% +0.0 0.46 ? 2% perf-profile.children.cycles-pp._raw_spin_lock_bh
0.28 +0.0 0.31 ? 2% perf-profile.children.cycles-pp.available_idle_cpu
0.44 ? 3% +0.0 0.48 perf-profile.children.cycles-pp.perf_tp_event
0.48 +0.0 0.52 ? 2% perf-profile.children.cycles-pp.pick_next_entity
0.40 ? 3% +0.1 0.46 ? 2% perf-profile.children.cycles-pp.__list_del_entry_valid
2.17 +0.1 2.22 perf-profile.children.cycles-pp.sysvec_reschedule_ipi
0.91 +0.1 0.97 perf-profile.children.cycles-pp.__slab_free
2.40 +0.1 2.46 perf-profile.children.cycles-pp.update_load_avg
0.71 ? 2% +0.1 0.77 perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
0.97 +0.1 1.04 perf-profile.children.cycles-pp.irqtime_account_irq
0.40 ? 3% +0.1 0.47 ? 2% perf-profile.children.cycles-pp.skb_set_owner_w
1.44 +0.1 1.50 perf-profile.children.cycles-pp.update_curr
0.81 ? 2% +0.1 0.88 perf-profile.children.cycles-pp.sock_wfree
0.37 ? 5% +0.1 0.44 perf-profile.children.cycles-pp.cpus_share_cache
0.42 ? 5% +0.1 0.50 ? 3% perf-profile.children.cycles-pp.sock_def_write_space
0.45 ? 6% +0.1 0.53 ? 4% perf-profile.children.cycles-pp.skb_release_data
1.37 +0.1 1.45 perf-profile.children.cycles-pp.__check_object_size
0.66 ? 2% +0.1 0.74 perf-profile.children.cycles-pp.switch_mm_irqs_off
0.97 +0.1 1.06 perf-profile.children.cycles-pp.tick_nohz_idle_exit
0.99 +0.1 1.08 perf-profile.children.cycles-pp.select_task_rq_fair
1.09 ? 2% +0.1 1.18 perf-profile.children.cycles-pp.copy_user_generic_unrolled
1.67 ? 2% +0.1 1.76 perf-profile.children.cycles-pp.loopback_xmit
1.35 ? 3% +0.1 1.44 perf-profile.children.cycles-pp.__consume_stateless_skb
1.78 ? 2% +0.1 1.88 perf-profile.children.cycles-pp.dev_hard_start_xmit
2.55 +0.1 2.64 perf-profile.children.cycles-pp.__dev_queue_xmit
2.97 +0.1 3.07 perf-profile.children.cycles-pp.sock_alloc_send_pskb
0.70 ? 2% +0.1 0.80 perf-profile.children.cycles-pp.select_idle_sibling
0.74 ? 3% +0.1 0.84 perf-profile.children.cycles-pp.copyout
1.11 ? 2% +0.1 1.21 perf-profile.children.cycles-pp._copy_to_iter
0.87 ? 3% +0.1 0.97 perf-profile.children.cycles-pp.irqentry_enter
3.21 +0.1 3.32 perf-profile.children.cycles-pp.dequeue_entity
0.59 ? 2% +0.1 0.69 perf-profile.children.cycles-pp.update_ts_time_stats
0.59 ? 3% +0.1 0.70 perf-profile.children.cycles-pp.nr_iowait_cpu
3.52 +0.1 3.62 perf-profile.children.cycles-pp.dequeue_task_fair
0.84 ? 3% +0.1 0.96 perf-profile.children.cycles-pp.__ksize
7.14 +0.1 7.27 perf-profile.children.cycles-pp.schedule
2.01 +0.2 2.17 perf-profile.children.cycles-pp.set_next_entity
2.01 +0.2 2.17 perf-profile.children.cycles-pp.enqueue_entity
1.71 ? 2% +0.2 1.90 perf-profile.children.cycles-pp.update_rq_clock
7.82 +0.2 8.02 perf-profile.children.cycles-pp.schedule_timeout
2.54 +0.2 2.76 perf-profile.children.cycles-pp.enqueue_task_fair
2.63 +0.2 2.84 perf-profile.children.cycles-pp.ttwu_do_activate
3.16 +0.2 3.39 perf-profile.children.cycles-pp.pick_next_task_fair
3.70 +0.2 3.95 perf-profile.children.cycles-pp.pick_next_task
9.06 +0.3 9.34 perf-profile.children.cycles-pp.__skb_wait_for_more_packets
6.12 +0.4 6.48 perf-profile.children.cycles-pp.schedule_idle
10.86 +0.4 11.26 perf-profile.children.cycles-pp.__skb_recv_udp
12.90 +0.5 13.41 perf-profile.children.cycles-pp.__schedule
18.19 +0.6 18.77 perf-profile.children.cycles-pp.__x64_sys_recvfrom
18.01 +0.6 18.60 perf-profile.children.cycles-pp.__sys_recvfrom
55.13 +0.6 55.74 perf-profile.children.cycles-pp.do_syscall_64
8.11 +0.6 8.73 perf-profile.children.cycles-pp.try_to_wake_up
8.30 +0.6 8.92 perf-profile.children.cycles-pp.autoremove_wake_function
15.66 +0.6 16.31 perf-profile.children.cycles-pp.inet_recvmsg
15.54 +0.7 16.20 perf-profile.children.cycles-pp.udp_recvmsg
15.22 +0.7 15.89 perf-profile.children.cycles-pp.net_rx_action
17.23 +0.7 17.92 perf-profile.children.cycles-pp.__local_bh_enable_ip
16.94 +0.7 17.64 perf-profile.children.cycles-pp.do_softirq
14.78 +0.7 15.48 perf-profile.children.cycles-pp.__napi_poll
16.77 +0.7 17.48 perf-profile.children.cycles-pp.__softirqentry_text_start
8.80 +0.7 9.52 perf-profile.children.cycles-pp.__wake_up_common
9.09 +0.7 9.82 perf-profile.children.cycles-pp.__wake_up_common_lock
14.59 +0.7 15.34 perf-profile.children.cycles-pp.process_backlog
21.84 +0.8 22.60 perf-profile.children.cycles-pp.udp_send_skb
21.43 +0.8 22.19 perf-profile.children.cycles-pp.ip_send_skb
21.04 +0.8 21.81 perf-profile.children.cycles-pp.ip_output
20.19 +0.8 20.98 perf-profile.children.cycles-pp.ip_finish_output2
9.52 +0.8 10.31 perf-profile.children.cycles-pp.sock_def_readable
9.89 +0.8 10.68 perf-profile.children.cycles-pp.__udp_enqueue_schedule_skb
13.40 +0.8 14.23 perf-profile.children.cycles-pp.__netif_receive_skb_one_core
12.90 +0.8 13.75 perf-profile.children.cycles-pp.ip_rcv
11.09 +0.8 11.94 perf-profile.children.cycles-pp.udp_queue_rcv_one_skb
12.30 +0.9 13.15 perf-profile.children.cycles-pp.ip_local_deliver_finish
12.23 +0.9 13.09 perf-profile.children.cycles-pp.ip_protocol_deliver_rcu
12.02 +0.9 12.88 perf-profile.children.cycles-pp.__udp4_lib_rcv
11.16 +0.9 12.02 perf-profile.children.cycles-pp.udp_unicast_rcv_skb
12.36 +0.9 13.22 perf-profile.children.cycles-pp.ip_local_deliver
21.50 -0.5 20.98 perf-profile.self.cycles-pp.acpi_idle_do_entry
1.08 ? 2% -0.1 0.94 perf-profile.self.cycles-pp.udp_sendmsg
0.77 -0.1 0.68 perf-profile.self.cycles-pp.syscall_return_via_sysret
0.42 ? 4% -0.1 0.36 ? 2% perf-profile.self.cycles-pp.ip_route_output_key_hash_rcu
0.53 -0.1 0.46 perf-profile.self.cycles-pp.process_backlog
1.01 ? 2% -0.1 0.96 perf-profile.self.cycles-pp.menu_select
1.00 -0.0 0.96 perf-profile.self.cycles-pp.native_apic_msr_eoi_write
0.96 -0.0 0.92 perf-profile.self.cycles-pp.native_sched_clock
1.04 -0.0 1.00 perf-profile.self.cycles-pp.__switch_to
1.31 -0.0 1.27 perf-profile.self.cycles-pp.native_irq_return_iret
0.14 ? 4% -0.0 0.10 ? 4% perf-profile.self.cycles-pp.__cond_resched
0.57 -0.0 0.54 perf-profile.self.cycles-pp.cpuidle_enter_state
0.20 ? 5% -0.0 0.16 ? 3% perf-profile.self.cycles-pp.rcu_read_unlock_strict
0.22 ? 4% -0.0 0.19 ? 5% perf-profile.self.cycles-pp.kmem_cache_free
0.17 ? 4% -0.0 0.14 ? 3% perf-profile.self.cycles-pp._copy_from_iter_full
0.10 ? 8% -0.0 0.07 ? 5% perf-profile.self.cycles-pp._copy_from_user
0.14 ? 10% -0.0 0.11 ? 3% perf-profile.self.cycles-pp.ip_setup_cork
0.22 ? 5% -0.0 0.19 perf-profile.self.cycles-pp.import_single_range
0.44 ? 3% -0.0 0.41 ? 2% perf-profile.self.cycles-pp.__ip_append_data
0.32 ? 2% -0.0 0.29 ? 2% perf-profile.self.cycles-pp.kmem_cache_alloc_node
0.10 ? 7% -0.0 0.08 ? 4% perf-profile.self.cycles-pp.__fdget
0.29 -0.0 0.26 perf-profile.self.cycles-pp.__napi_poll
0.49 -0.0 0.47 ? 2% perf-profile.self.cycles-pp.__softirqentry_text_start
0.40 -0.0 0.37 perf-profile.self.cycles-pp.net_rx_action
0.23 ? 3% -0.0 0.20 ? 3% perf-profile.self.cycles-pp.ip_route_output_key_hash
0.10 ? 3% -0.0 0.08 ? 4% perf-profile.self.cycles-pp.apparmor_socket_sendmsg
0.46 ? 2% -0.0 0.44 ? 2% perf-profile.self.cycles-pp.___might_sleep
0.32 -0.0 0.30 ? 2% perf-profile.self.cycles-pp.copy_fpregs_to_fpstate
0.30 -0.0 0.28 perf-profile.self.cycles-pp.kfree
0.42 -0.0 0.40 perf-profile.self.cycles-pp.__wrgsbase_inactive
0.27 -0.0 0.26 perf-profile.self.cycles-pp.security_socket_sendmsg
0.11 ? 3% -0.0 0.10 ? 5% perf-profile.self.cycles-pp.security_socket_recvmsg
0.10 ? 4% -0.0 0.09 ? 4% perf-profile.self.cycles-pp.xfrm_lookup_with_ifid
0.07 ? 7% -0.0 0.05 perf-profile.self.cycles-pp.rb_erase
0.17 ? 4% -0.0 0.15 ? 2% perf-profile.self.cycles-pp.schedule
0.18 ? 2% -0.0 0.17 perf-profile.self.cycles-pp.sched_clock_cpu
0.06 ? 7% -0.0 0.05 ? 7% perf-profile.self.cycles-pp.native_smp_send_reschedule
0.19 -0.0 0.18 perf-profile.self.cycles-pp.__x64_sys_sendto
0.14 -0.0 0.13 perf-profile.self.cycles-pp.tick_nohz_next_event
0.13 ? 2% +0.0 0.15 ? 3% perf-profile.self.cycles-pp.perf_trace_sched_wakeup_template
0.81 +0.0 0.84 perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.38 +0.0 0.40 ? 2% perf-profile.self.cycles-pp.pick_next_task
0.19 ? 18% +0.0 0.22 ? 4% perf-profile.self.cycles-pp.poll_idle
0.26 ? 2% +0.0 0.29 perf-profile.self.cycles-pp.available_idle_cpu
0.41 ? 2% +0.0 0.45 perf-profile.self.cycles-pp._raw_spin_lock_bh
0.54 ? 2% +0.0 0.58 perf-profile.self.cycles-pp.enqueue_task_fair
0.46 +0.0 0.50 ? 2% perf-profile.self.cycles-pp.pick_next_entity
0.33 ? 3% +0.0 0.37 perf-profile.self.cycles-pp.perf_tp_event
0.38 ? 3% +0.1 0.43 ? 2% perf-profile.self.cycles-pp.__list_del_entry_valid
0.90 +0.1 0.95 perf-profile.self.cycles-pp.__slab_free
0.90 ? 2% +0.1 0.95 perf-profile.self.cycles-pp.update_load_avg
1.53 +0.1 1.59 perf-profile.self.cycles-pp.__schedule
0.41 ? 4% +0.1 0.47 perf-profile.self.cycles-pp.sock_def_readable
0.40 ? 3% +0.1 0.46 ? 2% perf-profile.self.cycles-pp.skb_set_owner_w
0.37 ? 4% +0.1 0.44 perf-profile.self.cycles-pp.cpus_share_cache
0.76 ? 4% +0.1 0.83 perf-profile.self.cycles-pp.update_curr
0.42 ? 5% +0.1 0.49 ? 2% perf-profile.self.cycles-pp.sock_def_write_space
0.68 ? 2% +0.1 0.76 perf-profile.self.cycles-pp.schedule_timeout
0.60 ? 3% +0.1 0.67 perf-profile.self.cycles-pp.irqtime_account_irq
0.44 ? 6% +0.1 0.52 ? 3% perf-profile.self.cycles-pp.skb_release_data
0.86 ? 2% +0.1 0.94 perf-profile.self.cycles-pp.__skb_recv_udp
0.65 ? 2% +0.1 0.73 perf-profile.self.cycles-pp.switch_mm_irqs_off
0.54 ? 4% +0.1 0.62 ? 2% perf-profile.self.cycles-pp.udp_queue_rcv_one_skb
0.83 ? 2% +0.1 0.92 perf-profile.self.cycles-pp.__check_object_size
1.04 ? 3% +0.1 1.13 perf-profile.self.cycles-pp.copy_user_generic_unrolled
0.50 ? 4% +0.1 0.59 perf-profile.self.cycles-pp.__wake_up_common
0.58 ? 5% +0.1 0.68 perf-profile.self.cycles-pp.irqentry_enter
0.93 +0.1 1.04 perf-profile.self.cycles-pp.enqueue_entity
0.82 ? 2% +0.1 0.92 perf-profile.self.cycles-pp.__skb_wait_for_more_packets
0.58 ? 3% +0.1 0.68 perf-profile.self.cycles-pp.nr_iowait_cpu
0.82 ? 3% +0.1 0.94 perf-profile.self.cycles-pp.__ksize
0.75 ? 4% +0.1 0.87 perf-profile.self.cycles-pp.try_to_wake_up
1.19 ? 2% +0.2 1.39 perf-profile.self.cycles-pp.set_next_entity
1.24 ? 4% +0.2 1.45 perf-profile.self.cycles-pp.update_rq_clock



netperf.Throughput_tps

90000 +-------------------------------------------------------------------+
| .+. |
80000 |.+.+.++.+ +.+.++.+.+.+.+.++.+.+.+.+.++ +.+.+ +.+.+.+.+.++.+.+.|
70000 |-O O OO O O O O OO O O O O O O O O O O: O : O : O: O O O OO |
| : : : : |
60000 |-+ : : : : |
50000 |-+ : : : : |
| : : : : |
40000 |-+ : : : : |
30000 |-+ : : :: |
| : : :: |
20000 |-+ :: : |
10000 |-+ : : |
| : : |
0 +-------------------------------------------------------------------+


netperf.Throughput_total_tps

6e+06 +-------------------------------------------------------------------+
| |
5e+06 |.+.+.++.+.+.+.+.++.+.+.+.+.++.+.+.+.+.++ +.+.+ +.+.+.+.+.++.+.+.|
| O O OO O O O O OO O O O O OO O O O O O: O : O : O: O O O O OO |
| : : : : |
4e+06 |-+ : : : : |
| : : : : |
3e+06 |-+ : : : : |
| : : : : |
2e+06 |-+ : : :: |
| : : :: |
| :: : |
1e+06 |-+ : : |
| : : |
0 +-------------------------------------------------------------------+


netperf.workload

1.6e+09 +-----------------------------------------------------------------+
|.+.++.+.+ +.+.+.+.++.+.+.+.++.+.+.++.+ +.++ +.+.++.+.+.++.+.|
1.4e+09 |-O OO O O OO O O O OO O O O OO O O OO : O : O: O : O O O O O |
1.2e+09 |-+ : : : : |
| : : : : |
1e+09 |-+ : : : : |
| : : : : |
8e+08 |-+ : : : : |
| : : : : |
6e+08 |-+ : : : : |
4e+08 |-+ : : : : |
| : : |
2e+08 |-+ : : |
| : : |
0 +-----------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
config-5.13.0-rc6-00086-g56498cfb045d (176.83 kB)
job-script (8.18 kB)
job.yaml (5.61 kB)
reproduce (3.47 kB)

2021-09-22 12:49:31

by Peter Zijlstra

Subject: Re: [sched/fair] 56498cfb04: netperf.Throughput_tps -5.4% regression

On Sun, Sep 12, 2021 at 11:34:47PM +0800, kernel test robot wrote:
>
>
> Greeting,
>
> FYI, we noticed a -5.4% regression of netperf.Throughput_tps due to commit:
>
>
> commit: 56498cfb045d7147cdcba33795d19429afcd1d00 ("sched/fair: Avoid a second scan of target in select_idle_cpu")
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

Mel, was this in line with your own benchmarks?

2021-09-22 13:44:55

by Mel Gorman

Subject: Re: [sched/fair] 56498cfb04: netperf.Throughput_tps -5.4% regression

On Wed, Sep 22, 2021 at 02:44:00PM +0200, Peter Zijlstra wrote:
> On Sun, Sep 12, 2021 at 11:34:47PM +0800, kernel test robot wrote:
> >
> >
> > Greeting,
> >
> > FYI, we noticed a -5.4% regression of netperf.Throughput_tps due to commit:
> >
> >
> > commit: 56498cfb045d7147cdcba33795d19429afcd1d00 ("sched/fair: Avoid a second scan of target in select_idle_cpu")
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>
> Mel, was this in line with your own benchmarks?

UDP_RR was not run but I could add it for future reference as a
socket-based ping-pong test. However, it would not be equivalent to the
lkp setup as I only run one client/server pair.

For UDP_STREAM with a single client, the significant differences reported
were:

machine1: +1.07% to +1.54% depending on packet size
machine2: -1.4% to +0.9%
machine3: +1.5% to -2.46%
machine4: +1.16% to +1.64%
machine5: -1.59% to +1.23%
machine6: -2.10% to +1.83%

So it was a mix of small gains and some regressions, with more gains than
losses. As netperf is running over localhost, it can be a bit unreliable,
and other workloads showed more gains than losses. On machine 2, total
system CPU usage went from 1195.21 seconds to 1197.52 seconds, but
activities like context switches and interrupt deliveries were broadly
similar. There were differences in the total number of slab pages used,
but the trends were roughly similar and probably reflect the system's
starting state more than anything else.

On balance, I concluded that rescanning the target is wasteful and that,
while there might be slight variances, they would be difficult to
reproduce consistently. The largest concern is that skipping the target
means one additional rq is potentially examined, which would incur a
small penalty if that turned out to be a wasteful search.
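
To make that trade-off concrete, here is a small standalone C sketch of a
depth-limited wrap-around scan. It is not the actual select_idle_cpu()
code, which walks a cpumask with for_each_cpu_wrap() and limits the scan
depth separately; the idle flags, CPU count and probe budget below are
invented for the illustration.

#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8

/* Toy per-CPU idle flags; in the kernel this comes from the runqueues. */
static bool cpu_is_idle[NR_CPUS] = { 0, 0, 0, 0, 0, 1, 0, 0 };

/*
 * Scan at most @nr CPUs for an idle one, starting at @start and wrapping
 * around. Returns the CPU found, or -1 if the budget runs out first.
 */
static int scan_idle_cpu(int start, int nr)
{
	for (int i = 0; i < nr && i < NR_CPUS; i++) {
		int cpu = (start + i) % NR_CPUS;

		if (cpu_is_idle[cpu])
			return cpu;
	}
	return -1;
}

int main(void)
{
	int target = 3, nr = 2;

	/*
	 * Starting at target re-checks a CPU the caller has just rejected,
	 * so one probe of the limited budget is wasted: nothing found here.
	 */
	printf("start at target:     %d\n", scan_idle_cpu(target, nr));

	/*
	 * Starting one past target spends the whole budget on CPUs that have
	 * not been checked yet, at the cost of touching one extra rq: this
	 * finds CPU 5.
	 */
	printf("start at target + 1: %d\n",
	       scan_idle_cpu((target + 1) % NR_CPUS, nr));

	return 0;
}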

For the LKP test, nr_threads is 50%, so I expect that with two sockets
the machine is fully loaded and would be vulnerable to load-balancing
artifacts as client and server threads move around. Hence, I ended up
thinking that this result was likely a false positive.

--
Mel Gorman
SUSE Labs

2021-09-22 14:35:50

by Peter Zijlstra

Subject: Re: [sched/fair] 56498cfb04: netperf.Throughput_tps -5.4% regression

On Wed, Sep 22, 2021 at 02:42:47PM +0100, Mel Gorman wrote:
> For the LKP test, nr_threads is 50%, so I expect that with two sockets
> the machine is fully loaded and would be vulnerable to load-balancing
> artifacts as client and server threads move around. Hence, I ended up
> thinking that this result was likely a false positive.

Fair enough, let's ignore it for now then :-)