2022-02-28 17:39:27

by kernel test robot

Subject: [mm/page_alloc] 39907a939a: netperf.Throughput_Mbps -18.1% regression



Greetings,

FYI, we noticed a -18.1% regression of netperf.Throughput_Mbps due to commit:


commit: 39907a939a34033eeea112751f0e4330628d3a9a ("mm/page_alloc: Limit number of high-order pages on PCP during bulk free")
https://git.kernel.org/cgit/linux/kernel/git/mel/linux.git mm-pcpllist-v1r2

in testcase: netperf
on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz with 128G memory
with following parameters:

ip: ipv4
runtime: 300s
nr_threads: 1
cluster: cs-localhost
test: UDP_STREAM
cpufreq_governor: performance
ucode: 0xd000331

test-description: Netperf is a benchmark that can be used to measure various aspects of networking performance.
test-url: http://www.netperf.org/netperf/



If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.

=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase/ucode:
cs-localhost/gcc-9/performance/ipv4/x86_64-rhel-8.3/1/debian-10.4-x86_64-20200603.cgz/300s/lkp-icl-2sp4/UDP_STREAM/netperf/0xd000331

commit:
2009ed59ab ("mm/page_alloc: Free pages in a single pass during bulk free")
39907a939a ("mm/page_alloc: Limit number of high-order pages on PCP during bulk free")

2009ed59ab8200e6 39907a939a34033eeea112751f0
---------------- ---------------------------
%stddev %change %stddev
\ | \
122291 -18.1% 100120 netperf.Throughput_Mbps
122291 -18.1% 100120 netperf.Throughput_total_Mbps
90.83 -2.0% 89.00 netperf.time.percent_of_cpu_this_job_got
70006621 -18.1% 57314514 netperf.workload
75331549 ± 3% +14.8% 86505421 cpuidle..usage
29422 +31.0% 38553 ± 2% meminfo.Shmem
9976 ± 36% +111.0% 21053 ± 30% numa-meminfo.node1.Shmem
77114773 +14.8% 88560654 turbostat.IRQ
1.371e+08 -28.6% 97904821 ± 44% numa-numastat.node0.local_node
1.366e+08 -29.0% 96946927 ± 44% numa-numastat.node0.numa_hit
1754 ± 2% +6848.5% 121876 ± 2% vmstat.system.cs
253706 +14.8% 291171 vmstat.system.in
35.54 ± 3% -7.2% 32.98 ± 3% boot-time.boot
17.61 ± 6% -14.1% 15.13 ± 8% boot-time.dhcp
4043 ± 3% -8.1% 3714 ± 4% boot-time.idle
69337635 -28.5% 49607209 ± 43% numa-vmstat.node0.numa_hit
69744503 -28.2% 50050712 ± 43% numa-vmstat.node0.numa_local
2499 ± 36% +112.0% 5297 ± 29% numa-vmstat.node1.nr_shmem
248090 ± 6% +24.4% 308574 ± 5% perf-stat.i.cache-misses
1683 ± 2% +7192.7% 122793 ± 2% perf-stat.i.context-switches
8145 ± 17% +37.1% 11170 ± 12% perf-stat.i.node-loads
35521 ± 18% +35.9% 48285 ± 18% perf-stat.i.node-stores
0.05 ± 14% +0.0 0.06 ± 8% perf-stat.overall.cache-miss-rate%
49509 ± 15% -20.5% 39345 ± 8% perf-stat.overall.cycles-between-cache-misses
18429 +22.1% 22495 perf-stat.overall.path-length
247210 ± 6% +24.4% 307535 ± 5% perf-stat.ps.cache-misses
1677 ± 2% +7194.0% 122383 ± 2% perf-stat.ps.context-switches
8114 ± 17% +37.2% 11131 ± 12% perf-stat.ps.node-loads
35383 ± 18% +36.0% 48111 ± 18% perf-stat.ps.node-stores
71035 +2.8% 73029 proc-vmstat.nr_inactive_anon
9465 +4.4% 9881 ± 2% proc-vmstat.nr_mapped
7362 +30.3% 9592 ± 3% proc-vmstat.nr_shmem
71035 +2.8% 73029 proc-vmstat.nr_zone_inactive_anon
1.371e+08 -14.3% 1.174e+08 ± 2% proc-vmstat.numa_hit
1.375e+08 -14.1% 1.182e+08 proc-vmstat.numa_local
15448 ±110% +207.4% 47492 ± 28% proc-vmstat.numa_pte_updates
8244 ± 4% +1194.8% 106745 ± 6% proc-vmstat.pgactivate
1.352e+08 -14.5% 1.155e+08 proc-vmstat.pgalloc_normal
1059186 +1.2% 1072281 proc-vmstat.pgfault
1.352e+08 -14.5% 1.156e+08 proc-vmstat.pgfree
24.78 ± 8% -5.4 19.42 ± 15% perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg.__sys_recvfrom
24.81 ± 8% -5.4 19.45 ± 15% perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom
23.21 ± 8% -5.0 18.16 ± 15% perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg
22.81 ± 8% -5.0 17.80 ± 15% perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.udp_recvmsg
22.64 ± 8% -5.0 17.68 ± 15% perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter
1.63 ± 18% +0.5 2.16 ± 9% perf-profile.calltrace.cycles-pp.ip_rcv.__netif_receive_skb_one_core.process_backlog.__napi_poll.net_rx_action
1.50 ± 18% +0.5 2.05 ± 9% perf-profile.calltrace.cycles-pp.ip_local_deliver.ip_rcv.__netif_receive_skb_one_core.process_backlog.__napi_poll
1.48 ± 18% +0.6 2.03 ± 9% perf-profile.calltrace.cycles-pp.ip_local_deliver_finish.ip_local_deliver.ip_rcv.__netif_receive_skb_one_core.process_backlog
1.46 ± 18% +0.6 2.02 ± 9% perf-profile.calltrace.cycles-pp.ip_protocol_deliver_rcu.ip_local_deliver_finish.ip_local_deliver.ip_rcv.__netif_receive_skb_one_core
1.37 ± 18% +0.6 1.94 ± 9% perf-profile.calltrace.cycles-pp.__udp4_lib_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.ip_local_deliver.ip_rcv
1.09 ± 19% +0.6 1.70 ± 10% perf-profile.calltrace.cycles-pp.udp_unicast_rcv_skb.__udp4_lib_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.ip_local_deliver
1.03 ± 18% +0.7 1.68 ± 10% perf-profile.calltrace.cycles-pp.udp_queue_rcv_one_skb.udp_unicast_rcv_skb.__udp4_lib_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish
0.00 +0.7 0.66 ± 9% perf-profile.calltrace.cycles-pp.free_pcppages_bulk.free_unref_page.skb_release_data.__consume_stateless_skb.udp_recvmsg
0.00 +0.7 0.69 ± 12% perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.sock_def_readable
0.00 +0.7 0.70 ± 12% perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.sock_def_readable.__udp_enqueue_schedule_skb
0.00 +0.7 0.74 ± 16% perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
0.00 +0.8 0.77 ± 16% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
0.00 +0.8 0.81 ± 13% perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.sock_def_readable.__udp_enqueue_schedule_skb.udp_queue_rcv_one_skb
0.00 +0.8 0.85 ± 12% perf-profile.calltrace.cycles-pp.__wake_up_common_lock.sock_def_readable.__udp_enqueue_schedule_skb.udp_queue_rcv_one_skb.udp_unicast_rcv_skb
0.00 +0.9 0.88 ± 16% perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp
0.00 +0.9 0.90 ± 15% perf-profile.calltrace.cycles-pp.schedule.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg
0.00 +0.9 0.94 ± 12% perf-profile.calltrace.cycles-pp.sock_def_readable.__udp_enqueue_schedule_skb.udp_queue_rcv_one_skb.udp_unicast_rcv_skb.__udp4_lib_rcv
0.00 +1.0 0.97 ± 15% perf-profile.calltrace.cycles-pp.schedule_timeout.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg.inet_recvmsg
0.18 ±141% +1.1 1.28 ± 9% perf-profile.calltrace.cycles-pp.__udp_enqueue_schedule_skb.udp_queue_rcv_one_skb.udp_unicast_rcv_skb.__udp4_lib_rcv.ip_protocol_deliver_rcu
0.00 +1.2 1.24 ± 15% perf-profile.calltrace.cycles-pp.__skb_wait_for_more_packets.__skb_recv_udp.udp_recvmsg.inet_recvmsg.__sys_recvfrom
0.51 ± 45% +1.6 2.09 ± 16% perf-profile.calltrace.cycles-pp.__skb_recv_udp.udp_recvmsg.inet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom
24.80 ± 8% -5.4 19.44 ± 15% perf-profile.children.cycles-pp.__skb_datagram_iter
24.81 ± 8% -5.4 19.46 ± 15% perf-profile.children.cycles-pp.skb_copy_datagram_iter
23.23 ± 8% -5.0 18.18 ± 15% perf-profile.children.cycles-pp._copy_to_iter
22.84 ± 8% -5.0 17.82 ± 15% perf-profile.children.cycles-pp.copyout
0.41 ± 16% -0.2 0.22 ± 27% perf-profile.children.cycles-pp.udp_rmem_release
0.52 ± 8% -0.1 0.39 ± 12% perf-profile.children.cycles-pp.free_pcp_prepare
0.18 ± 18% -0.1 0.06 ± 45% perf-profile.children.cycles-pp.free_unref_page_commit
0.08 ± 19% -0.0 0.04 ± 45% perf-profile.children.cycles-pp.kmem_cache_free
0.10 ± 16% +0.0 0.15 ± 8% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.03 ±100% +0.1 0.08 ± 13% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.00 +0.1 0.06 ± 13% perf-profile.children.cycles-pp.ttwu_do_wakeup
0.00 +0.1 0.06 ± 16% perf-profile.children.cycles-pp.__update_load_avg_se
0.00 +0.1 0.06 ± 16% perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
0.00 +0.1 0.06 ± 21% perf-profile.children.cycles-pp.nohz_run_idle_balance
0.00 +0.1 0.07 ± 15% perf-profile.children.cycles-pp.__switch_to_asm
0.00 +0.1 0.07 ± 21% perf-profile.children.cycles-pp.llist_add_batch
0.00 +0.1 0.07 ± 21% perf-profile.children.cycles-pp.__smp_call_single_queue
0.02 ±141% +0.1 0.10 ± 32% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
0.09 ± 20% +0.1 0.17 ± 7% perf-profile.children.cycles-pp.__list_add_valid
0.00 +0.1 0.08 ± 24% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
0.00 +0.1 0.08 ± 14% perf-profile.children.cycles-pp.prepare_to_wait_exclusive
0.20 ± 19% +0.1 0.29 ± 19% perf-profile.children.cycles-pp.skb_set_owner_w
0.00 +0.1 0.09 ± 27% perf-profile.children.cycles-pp.flush_smp_call_function_queue
0.08 ± 17% +0.1 0.17 ± 26% perf-profile.children.cycles-pp.__sk_mem_reduce_allocated
0.07 ± 80% +0.1 0.17 ± 27% perf-profile.children.cycles-pp._raw_spin_lock_bh
0.00 +0.1 0.14 ± 12% perf-profile.children.cycles-pp.set_next_entity
0.04 ± 72% +0.1 0.18 ± 15% perf-profile.children.cycles-pp.__zone_watermark_ok
0.00 +0.2 0.16 ± 18% perf-profile.children.cycles-pp.enqueue_entity
0.00 +0.2 0.17 ± 24% perf-profile.children.cycles-pp.sched_ttwu_pending
0.00 +0.2 0.18 ± 10% perf-profile.children.cycles-pp.__switch_to
0.00 +0.2 0.18 ± 14% perf-profile.children.cycles-pp.update_load_avg
0.00 +0.2 0.19 ± 16% perf-profile.children.cycles-pp.ttwu_queue_wakelist
0.00 +0.2 0.20 ± 12% perf-profile.children.cycles-pp.enqueue_task_fair
0.30 ± 7% +0.2 0.51 ± 7% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.00 +0.2 0.21 ± 12% perf-profile.children.cycles-pp.ttwu_do_activate
0.00 +0.2 0.22 ± 14% perf-profile.children.cycles-pp.update_curr
0.00 +0.2 0.23 ± 14% perf-profile.children.cycles-pp.pick_next_task_fair
0.00 +0.3 0.25 ± 26% perf-profile.children.cycles-pp.__sysvec_call_function_single
0.00 +0.3 0.31 ± 25% perf-profile.children.cycles-pp.sysvec_call_function_single
0.36 ± 20% +0.3 0.67 ± 9% perf-profile.children.cycles-pp.free_pcppages_bulk
0.00 +0.4 0.36 ± 27% perf-profile.children.cycles-pp.finish_task_switch
0.00 +0.4 0.38 ± 16% perf-profile.children.cycles-pp.dequeue_entity
0.00 +0.4 0.41 ± 16% perf-profile.children.cycles-pp.dequeue_task_fair
0.00 +0.5 0.47 ± 24% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
3.39 ± 4% +0.5 3.87 ± 6% perf-profile.children.cycles-pp.__softirqentry_text_start
1.63 ± 18% +0.5 2.16 ± 9% perf-profile.children.cycles-pp.ip_rcv
1.50 ± 18% +0.5 2.05 ± 9% perf-profile.children.cycles-pp.ip_local_deliver
1.48 ± 18% +0.6 2.04 ± 9% perf-profile.children.cycles-pp.ip_local_deliver_finish
1.47 ± 18% +0.6 2.02 ± 9% perf-profile.children.cycles-pp.ip_protocol_deliver_rcu
1.38 ± 18% +0.6 1.95 ± 9% perf-profile.children.cycles-pp.__udp4_lib_rcv
1.10 ± 19% +0.6 1.71 ± 10% perf-profile.children.cycles-pp.udp_unicast_rcv_skb
1.04 ± 18% +0.7 1.69 ± 10% perf-profile.children.cycles-pp.udp_queue_rcv_one_skb
0.00 +0.7 0.71 ± 12% perf-profile.children.cycles-pp.autoremove_wake_function
0.00 +0.7 0.71 ± 12% perf-profile.children.cycles-pp.try_to_wake_up
0.00 +0.8 0.78 ± 16% perf-profile.children.cycles-pp.schedule_idle
0.00 +0.8 0.82 ± 13% perf-profile.children.cycles-pp.__wake_up_common
0.46 ± 15% +0.8 1.29 ± 9% perf-profile.children.cycles-pp.__udp_enqueue_schedule_skb
0.00 +0.9 0.86 ± 56% perf-profile.children.cycles-pp.poll_idle
0.00 +0.9 0.86 ± 12% perf-profile.children.cycles-pp.__wake_up_common_lock
0.03 ±100% +0.9 0.95 ± 12% perf-profile.children.cycles-pp.sock_def_readable
0.02 ±142% +0.9 0.94 ± 14% perf-profile.children.cycles-pp.schedule
0.00 +1.0 0.98 ± 14% perf-profile.children.cycles-pp.schedule_timeout
0.00 +1.2 1.24 ± 15% perf-profile.children.cycles-pp.__skb_wait_for_more_packets
0.59 ± 13% +1.5 2.11 ± 16% perf-profile.children.cycles-pp.__skb_recv_udp
0.07 ± 16% +1.6 1.68 ± 14% perf-profile.children.cycles-pp.__schedule
0.23 ± 18% -0.2 0.03 ±103% perf-profile.self.cycles-pp.udp_rmem_release
0.52 ± 9% -0.1 0.38 ± 12% perf-profile.self.cycles-pp.free_pcp_prepare
0.15 ± 15% -0.1 0.04 ± 71% perf-profile.self.cycles-pp.free_unref_page_commit
0.29 ± 11% -0.1 0.20 ± 19% perf-profile.self.cycles-pp.__skb_datagram_iter
0.25 ± 13% -0.1 0.16 ± 17% perf-profile.self.cycles-pp.udp_recvmsg
0.14 ± 16% -0.0 0.10 ± 10% perf-profile.self.cycles-pp.__alloc_pages
0.08 ± 20% -0.0 0.03 ± 70% perf-profile.self.cycles-pp.kmem_cache_free
0.00 +0.1 0.06 ± 13% perf-profile.self.cycles-pp.__update_load_avg_se
0.00 +0.1 0.07 ± 15% perf-profile.self.cycles-pp.__switch_to_asm
0.00 +0.1 0.07 ± 23% perf-profile.self.cycles-pp.llist_add_batch
0.02 ±142% +0.1 0.09 ± 23% perf-profile.self.cycles-pp.sock_def_readable
0.00 +0.1 0.07 ± 11% perf-profile.self.cycles-pp.schedule_timeout
0.00 +0.1 0.08 ± 20% perf-profile.self.cycles-pp.enqueue_entity
0.00 +0.1 0.08 ± 26% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.08 ± 21% +0.1 0.16 ± 5% perf-profile.self.cycles-pp.__list_add_valid
0.02 ±141% +0.1 0.10 ± 32% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.20 ± 17% +0.1 0.29 ± 19% perf-profile.self.cycles-pp.skb_set_owner_w
0.00 +0.1 0.09 ± 17% perf-profile.self.cycles-pp.set_next_entity
0.00 +0.1 0.09 ± 16% perf-profile.self.cycles-pp.update_curr
0.17 ± 19% +0.1 0.27 ± 11% perf-profile.self.cycles-pp.skb_page_frag_refill
0.07 ± 12% +0.1 0.17 ± 25% perf-profile.self.cycles-pp.__sk_mem_reduce_allocated
0.06 ± 79% +0.1 0.16 ± 28% perf-profile.self.cycles-pp._raw_spin_lock_bh
0.00 +0.1 0.11 ± 19% perf-profile.self.cycles-pp.__wake_up_common
0.00 +0.1 0.13 ± 24% perf-profile.self.cycles-pp.try_to_wake_up
0.00 +0.1 0.13 ± 21% perf-profile.self.cycles-pp.__skb_wait_for_more_packets
0.05 ± 75% +0.1 0.18 ± 6% perf-profile.self.cycles-pp.update_rq_clock
0.00 +0.2 0.15 ± 27% perf-profile.self.cycles-pp.finish_task_switch
0.17 ± 18% +0.2 0.32 ± 12% perf-profile.self.cycles-pp.skb_release_data
0.02 ±141% +0.2 0.18 ± 18% perf-profile.self.cycles-pp.__zone_watermark_ok
0.00 +0.2 0.17 ± 12% perf-profile.self.cycles-pp.__switch_to
0.28 ± 6% +0.2 0.47 ± 6% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.00 +0.3 0.30 ± 18% perf-profile.self.cycles-pp.__schedule
0.04 ± 72% +0.4 0.40 ± 21% perf-profile.self.cycles-pp.__skb_recv_udp
0.00 +0.7 0.69 ± 70% perf-profile.self.cycles-pp.poll_idle
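
For readers who want to check the %change column, here is a minimal sketch of the arithmetic, assuming it is the relative difference between the two commits' mean values. The helper name pct_change is hypothetical and is not part of lkp-tests. The perf-profile rows in the table report an absolute difference in percentage points instead, so the helper does not apply to them and prints "n/a" when the base value is zero.

```shell
# pct_change BASE NEW (hypothetical helper, not an lkp-tests command):
# print the relative change from BASE to NEW as a signed percentage,
# rounded to one decimal place.
pct_change() {
    awk -v base="$1" -v new="$2" 'BEGIN {
        # A zero base has no meaningful relative change.
        if (base == 0) { print "n/a"; exit }
        printf "%+.1f%%\n", (new - base) / base * 100
    }'
}

pct_change 122291 100120    # netperf.Throughput_Mbps, parent vs. tested commit
pct_change 1754 121876      # vmstat.system.cs, parent vs. tested commit
```

Called with the two values from a table row, the helper should give the same figure as that row's %change entry, up to rounding of the printed means.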




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (18.98 kB)
config-5.17.0-rc5-00006-g39907a939a34 (164.54 kB)
job-script (8.31 kB)
job.yaml (5.68 kB)
reproduce (341.00 B)