2022-04-20 20:41:54

by kernel test robot

Subject: [memcg] a8c49af3be: hackbench.throughput -13.7% regression



Greetings,

FYI, we noticed a -13.7% regression of hackbench.throughput due to commit:


commit: a8c49af3be5f0b4e105ef678bcf14ef102c270be ("memcg: add per-memcg total kernel memory stat")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: hackbench
on test machine: 144 threads 4 sockets Intel(R) Xeon(R) Gold 5318H CPU @ 2.50GHz with 128G memory
with following parameters:

nr_threads: 100%
iterations: 4
mode: process
ipc: socket
cpufreq_governor: performance
ucode: 0x7002402

test-description: Hackbench is both a benchmark and a stress test for the Linux kernel scheduler.
test-url: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/sched/cfs-scheduler/hackbench.c
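
For reference, hackbench is typically invoked as "hackbench [-pipe] <num groups> [process|thread] [loops]", with sockets as the default IPC. The mode/ipc parameters above therefore correspond to an invocation of roughly the following shape (the group and loop counts here are illustrative placeholders, not the exact values LKP derives from nr_threads=100% and iterations=4):

  # process mode over sockets (no -pipe); 50 groups and 1000 loops are illustrative numbers
  hackbench 50 process 1000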



If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if you come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.

=========================================================================================
compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase/ucode:
gcc-11/performance/socket/4/x86_64-rhel-8.3/process/100%/debian-10.4-x86_64-20200603.cgz/lkp-cpl-4sp1/hackbench/0x7002402

commit:
086f694a75 ("memcg: replace in_interrupt() with !in_task()")
a8c49af3be ("memcg: add per-memcg total kernel memory stat")

086f694a75e1a283 a8c49af3be5f0b4e105ef678bcf
---------------- ---------------------------
%stddev %change %stddev
\ | \
146519 -13.7% 126397 hackbench.throughput
465.89 +16.0% 540.43 hackbench.time.elapsed_time
465.89 +16.0% 540.43 hackbench.time.elapsed_time.max
1.365e+08 +134.1% 3.195e+08 ± 4% hackbench.time.involuntary_context_switches
1081515 -1.2% 1068489 hackbench.time.minor_page_faults
64911 +16.3% 75465 hackbench.time.system_time
881.14 +13.8% 1002 ± 2% hackbench.time.user_time
4.932e+08 +64.6% 8.118e+08 ± 2% hackbench.time.voluntary_context_switches
8890038 ± 2% -9.7% 8027307 ± 5% cpuidle..usage
0.92 ± 8% -0.2 0.75 ± 3% mpstat.cpu.all.idle%
520.66 +14.4% 595.64 uptime.boot
4116231 ± 24% -31.2% 2829999 vmstat.memory.cache
2419 +15.2% 2787 vmstat.procs.r
1365633 +53.8% 2100818 ± 3% vmstat.system.cs
474436 +7.1% 508066 vmstat.system.in
6918013 ± 2% -10.6% 6186096 ± 6% turbostat.C1
0.17 ± 2% -0.0 0.13 ± 8% turbostat.C1%
2.219e+08 +24.2% 2.756e+08 turbostat.IRQ
77.06 ± 41% +89.9 166.98 ± 18% turbostat.PKG_%
168785 ± 4% -13.4% 146121 ± 6% turbostat.POLL
12.90 -3.0% 12.51 turbostat.RAMWatt
22559 ± 31% -72.5% 6205 ± 17% numa-vmstat.node1.nr_mapped
44041 ± 72% -83.4% 7326 ± 29% numa-vmstat.node2.nr_mapped
356860 ± 83% -72.4% 98563 ± 35% numa-vmstat.node3.nr_file_pages
298316 ± 98% -81.4% 55524 ± 39% numa-vmstat.node3.nr_inactive_anon
173115 ± 67% -81.7% 31623 ± 31% numa-vmstat.node3.nr_mapped
331661 ± 84% -73.6% 87449 ± 20% numa-vmstat.node3.nr_shmem
298316 ± 98% -81.4% 55524 ± 39% numa-vmstat.node3.nr_zone_inactive_anon
89451 ± 30% -72.7% 24378 ± 16% numa-meminfo.node1.Mapped
646515 ± 44% -40.7% 383140 ± 10% numa-meminfo.node1.MemUsed
176662 ± 74% -83.8% 28692 ± 31% numa-meminfo.node2.Mapped
1412634 ± 82% -72.5% 388045 ± 37% numa-meminfo.node3.FilePages
1177774 ± 97% -81.5% 218044 ± 39% numa-meminfo.node3.Inactive
1177724 ± 97% -81.5% 218020 ± 39% numa-meminfo.node3.Inactive(anon)
683613 ± 66% -82.0% 122794 ± 30% numa-meminfo.node3.Mapped
1311837 ± 83% -73.8% 343589 ± 20% numa-meminfo.node3.Shmem
264986 ± 14% -25.0% 198630 meminfo.AnonHugePages
565103 ± 6% -15.3% 478657 meminfo.AnonPages
3964125 ± 25% -32.5% 2674706 meminfo.Cached
4438897 ± 23% -31.1% 3056960 meminfo.Committed_AS
2072000 ± 50% -67.2% 680434 ± 3% meminfo.Inactive
2071852 ± 50% -67.2% 680284 ± 3% meminfo.Inactive(anon)
1089862 ± 40% -76.3% 258095 ± 8% meminfo.Mapped
6895125 ± 16% -23.8% 5253586 meminfo.Memused
1702379 ± 58% -75.7% 413429 ± 9% meminfo.Shmem
8070861 ± 9% -15.9% 6789345 meminfo.max_used_kB
141199 ± 7% -15.3% 119614 proc-vmstat.nr_anon_pages
991360 ± 25% -32.5% 669416 proc-vmstat.nr_file_pages
518274 ± 50% -67.1% 170500 ± 3% proc-vmstat.nr_inactive_anon
273252 ± 40% -76.2% 65048 ± 7% proc-vmstat.nr_mapped
425922 ± 58% -75.6% 104096 ± 9% proc-vmstat.nr_shmem
171279 -2.1% 167755 proc-vmstat.nr_slab_unreclaimable
518274 ± 50% -67.1% 170500 ± 3% proc-vmstat.nr_zone_inactive_anon
469239 ± 6% -15.6% 396202 ± 6% proc-vmstat.numa_hint_faults
185166 ± 12% -28.6% 132127 ± 6% proc-vmstat.numa_hint_faults_local
3355174 ± 6% -9.3% 3042305 proc-vmstat.numa_hit
808.67 ± 49% -77.9% 178.83 ± 30% proc-vmstat.numa_huge_pte_updates
3159673 ± 7% -10.2% 2838054 proc-vmstat.numa_local
1090825 ± 20% -35.3% 706208 ± 8% proc-vmstat.numa_pte_updates
3354920 ± 6% -9.7% 3030203 proc-vmstat.pgalloc_normal
128080 +5.2% 134798 proc-vmstat.pgreuse
15.63 +17.7% 18.39 ± 2% sched_debug.cfs_rq:/.h_nr_running.avg
29887102 +17.5% 35111287 sched_debug.cfs_rq:/.min_vruntime.avg
31706400 +19.7% 37943439 sched_debug.cfs_rq:/.min_vruntime.max
28371071 +17.7% 33383999 sched_debug.cfs_rq:/.min_vruntime.min
16172 +15.1% 18619 sched_debug.cfs_rq:/.runnable_avg.avg
28777 ± 2% +12.6% 32399 ± 4% sched_debug.cfs_rq:/.runnable_avg.max
4370 ± 3% +21.2% 5295 sched_debug.cfs_rq:/.runnable_avg.stddev
-970533 +134.4% -2274896 sched_debug.cfs_rq:/.spread0.avg
-2495736 +59.8% -3988405 sched_debug.cfs_rq:/.spread0.min
652.14 +17.4% 765.56 ± 2% sched_debug.cfs_rq:/.util_est_enqueued.avg
268143 +12.2% 300732 sched_debug.cpu.clock.avg
268614 +12.2% 301319 sched_debug.cpu.clock.max
267598 +12.1% 300055 sched_debug.cpu.clock.min
265605 +12.2% 297967 sched_debug.cpu.clock_task.avg
266221 +12.2% 298692 sched_debug.cpu.clock_task.max
255412 +13.1% 288952 sched_debug.cpu.clock_task.min
614363 ± 12% -15.2% 521164 ± 3% sched_debug.cpu.max_idle_balance_cost.max
15786 ± 75% -83.1% 2670 ±128% sched_debug.cpu.max_idle_balance_cost.stddev
15.62 +17.6% 18.37 ± 2% sched_debug.cpu.nr_running.avg
2018911 +84.0% 3714095 ± 3% sched_debug.cpu.nr_switches.avg
2458462 +96.3% 4825151 ± 7% sched_debug.cpu.nr_switches.max
1666277 ± 2% +82.2% 3036247 ± 2% sched_debug.cpu.nr_switches.min
223152 ± 5% +55.8% 347737 ± 24% sched_debug.cpu.nr_switches.stddev
267560 +12.1% 300014 sched_debug.cpu_clk
266694 +12.2% 299146 sched_debug.ktime
267988 +12.5% 301408 sched_debug.sched_clk
9.18 +13.0% 10.38 perf-stat.i.MPKI
1.666e+10 -5.8% 1.569e+10 perf-stat.i.branch-instructions
0.66 +0.1 0.74 perf-stat.i.branch-miss-rate%
1.077e+08 +6.1% 1.143e+08 perf-stat.i.branch-misses
29.90 +2.1 31.97 perf-stat.i.cache-miss-rate%
2.342e+08 ± 2% +11.6% 2.612e+08 perf-stat.i.cache-misses
7.854e+08 +4.5% 8.208e+08 perf-stat.i.cache-references
1383443 +50.8% 2085643 ± 2% perf-stat.i.context-switches
5.45 +8.5% 5.92 perf-stat.i.cpi
2018 ± 2% -10.9% 1798 perf-stat.i.cycles-between-cache-misses
0.04 ± 8% +0.0 0.05 ± 5% perf-stat.i.dTLB-load-miss-rate%
10109576 ± 8% +23.4% 12476386 ± 4% perf-stat.i.dTLB-load-misses
2.532e+10 -6.9% 2.356e+10 perf-stat.i.dTLB-loads
1.519e+10 -6.8% 1.415e+10 perf-stat.i.dTLB-stores
78.43 -9.1 69.30 perf-stat.i.iTLB-load-miss-rate%
11711446 +62.4% 19016979 ± 2% perf-stat.i.iTLB-loads
8.626e+10 -6.3% 8.082e+10 perf-stat.i.instructions
2055 ± 3% -7.7% 1896 ± 2% perf-stat.i.instructions-per-iTLB-miss
0.19 -6.3% 0.18 perf-stat.i.ipc
401.73 -6.5% 375.76 perf-stat.i.metric.M/sec
8239 -8.9% 7507 perf-stat.i.minor-faults
54.62 +10.0 64.63 perf-stat.i.node-load-miss-rate%
33956668 +21.8% 41363894 ? 2% perf-stat.i.node-load-misses
28154709 ± 2% -20.2% 22471179 ± 2% perf-stat.i.node-loads
42.25 +6.7 48.93 ± 2% perf-stat.i.node-store-miss-rate%
30479438 -27.6% 22069554 ± 3% perf-stat.i.node-stores
8283 -8.8% 7551 perf-stat.i.page-faults
9.07 +11.1% 10.08 perf-stat.overall.MPKI
0.64 +0.1 0.73 perf-stat.overall.branch-miss-rate%
30.15 +2.0 32.12 perf-stat.overall.cache-miss-rate%
5.43 +6.3% 5.77 perf-stat.overall.cpi
1986 -10.3% 1782 perf-stat.overall.cycles-between-cache-misses
0.04 ± 8% +0.0 0.05 ± 4% perf-stat.overall.dTLB-load-miss-rate%
78.66 -9.2 69.43 perf-stat.overall.iTLB-load-miss-rate%
2032 ± 3% -8.2% 1866 ± 2% perf-stat.overall.instructions-per-iTLB-miss
0.18 -5.9% 0.17 perf-stat.overall.ipc
53.32 +10.4 63.69 perf-stat.overall.node-load-miss-rate%
40.73 +7.3 48.00 ± 2% perf-stat.overall.node-store-miss-rate%
1.66e+10 -5.3% 1.572e+10 perf-stat.ps.branch-instructions
1.065e+08 +7.2% 1.141e+08 perf-stat.ps.branch-misses
2.353e+08 +11.4% 2.62e+08 perf-stat.ps.cache-misses
7.803e+08 +4.6% 8.158e+08 perf-stat.ps.cache-references
1363684 +54.1% 2101320 ± 3% perf-stat.ps.context-switches
9967137 ± 8% +25.5% 12506012 ± 4% perf-stat.ps.dTLB-load-misses
2.525e+10 -6.7% 2.357e+10 perf-stat.ps.dTLB-loads
1.515e+10 -6.6% 1.415e+10 perf-stat.ps.dTLB-stores
11478062 +66.4% 19099333 ± 3% perf-stat.ps.iTLB-loads
8.601e+10 -5.9% 8.091e+10 perf-stat.ps.instructions
6594 -14.3% 5654 perf-stat.ps.minor-faults
32996552 +22.6% 40437559 perf-stat.ps.node-load-misses
28888598 ± 2% -20.2% 23051761 perf-stat.ps.node-loads
31218471 -27.8% 22546119 ± 2% perf-stat.ps.node-stores
6619 -14.2% 5676 perf-stat.ps.page-faults
4.015e+13 +9.1% 4.38e+13 perf-stat.total.instructions
13.60 -13.6 0.00 perf-profile.calltrace.cycles-pp.obj_cgroup_charge_pages.obj_cgroup_charge.slab_pre_alloc_hook.__kmalloc_node_track_caller.kmalloc_reserve
12.93 ? 2% -12.9 0.00 perf-profile.calltrace.cycles-pp.page_counter_charge.obj_cgroup_charge_pages.obj_cgroup_charge.slab_pre_alloc_hook.__kmalloc_node_track_caller
6.85 -6.9 0.00 perf-profile.calltrace.cycles-pp.obj_cgroup_charge_pages.obj_cgroup_charge.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags
6.51 -6.5 0.00 perf-profile.calltrace.cycles-pp.page_counter_charge.obj_cgroup_charge_pages.obj_cgroup_charge.kmem_cache_alloc_node.__alloc_skb
13.00 -0.9 12.06 ? 4% perf-profile.calltrace.cycles-pp.page_counter_uncharge.obj_cgroup_uncharge_pages.memcg_slab_free_hook.kfree.skb_release_data
2.33 ? 4% -0.9 1.40 ? 4% perf-profile.calltrace.cycles-pp.skb_release_head_state.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
2.31 ? 4% -0.9 1.38 ? 4% perf-profile.calltrace.cycles-pp.unix_destruct_scm.skb_release_head_state.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
2.26 ? 4% -0.9 1.35 ? 4% perf-profile.calltrace.cycles-pp.sock_wfree.unix_destruct_scm.skb_release_head_state.consume_skb.unix_stream_read_generic
1.50 ? 12% -0.9 0.64 ? 3% perf-profile.calltrace.cycles-pp.skb_queue_tail.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.new_sync_write
1.47 ? 12% -0.9 0.62 ? 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.skb_queue_tail.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
49.06 -0.8 48.22 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
48.95 -0.8 48.11 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
1.99 ? 8% -0.8 1.17 ? 3% perf-profile.calltrace.cycles-pp._raw_spin_lock.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.new_sync_write
49.90 -0.8 49.10 perf-profile.calltrace.cycles-pp.read
48.36 -0.8 47.57 perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
48.16 -0.8 47.38 perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
47.58 -0.8 46.81 perf-profile.calltrace.cycles-pp.sock_read_iter.new_sync_read.vfs_read.ksys_read.do_syscall_64
47.73 -0.8 46.96 perf-profile.calltrace.cycles-pp.new_sync_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.46 ? 8% -0.8 0.71 ? 5% perf-profile.calltrace.cycles-pp.skb_unlink.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.new_sync_read
1.43 ? 9% -0.7 0.68 ? 5% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.skb_unlink.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
1.68 ? 7% -0.7 0.97 ? 4% perf-profile.calltrace.cycles-pp._raw_spin_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.new_sync_read
1.69 ? 10% -0.7 1.04 ? 5% perf-profile.calltrace.cycles-pp.skb_set_owner_w.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
6.50 -0.6 5.96 ? 4% perf-profile.calltrace.cycles-pp.page_counter_uncharge.obj_cgroup_uncharge_pages.memcg_slab_free_hook.kmem_cache_free.unix_stream_read_generic
1.59 ? 6% -0.5 1.12 ? 6% perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
0.91 ? 8% -0.5 0.44 ? 45% perf-profile.calltrace.cycles-pp.mutex_unlock.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.new_sync_read
2.05 ? 2% -0.4 1.68 ? 2% perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
2.06 ? 2% -0.4 1.70 ? 2% perf-profile.calltrace.cycles-pp.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.new_sync_read
2.03 ? 2% -0.4 1.67 ? 2% perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg
1.20 ? 8% -0.3 0.88 ? 6% perf-profile.calltrace.cycles-pp.__slab_free.kfree.skb_release_data.consume_skb.unix_stream_read_generic
0.85 ? 6% -0.2 0.63 ? 2% perf-profile.calltrace.cycles-pp.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor
0.89 ? 5% -0.2 0.68 ? 2% perf-profile.calltrace.cycles-pp.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic
1.09 -0.1 0.95 ? 2% perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic
0.93 -0.1 0.80 ? 2% perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor
0.87 -0.1 0.75 ? 3% perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter
0.90 ? 2% -0.1 0.84 perf-profile.calltrace.cycles-pp.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.new_sync_write
0.98 ? 6% +0.2 1.18 ? 5% perf-profile.calltrace.cycles-pp.memcg_slab_post_alloc_hook.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
0.99 ? 6% +0.2 1.19 ? 5% perf-profile.calltrace.cycles-pp.memcg_slab_post_alloc_hook.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
0.70 ? 9% +0.2 0.91 ? 6% perf-profile.calltrace.cycles-pp.switch_mm_irqs_off.__schedule.schedule.schedule_timeout.unix_stream_data_wait
0.62 ? 9% +0.2 0.85 ? 7% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.mod_objcg_state.memcg_slab_post_alloc_hook.kmem_cache_alloc_node.__alloc_skb
0.63 ? 9% +0.2 0.86 ? 7% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.mod_objcg_state.memcg_slab_post_alloc_hook.__kmalloc_node_track_caller.kmalloc_reserve
0.71 ? 8% +0.2 0.95 ? 7% perf-profile.calltrace.cycles-pp.mod_objcg_state.memcg_slab_post_alloc_hook.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags
0.71 ? 8% +0.2 0.96 ? 7% perf-profile.calltrace.cycles-pp.mod_objcg_state.memcg_slab_post_alloc_hook.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb
0.60 ? 6% +0.3 0.92 ? 6% perf-profile.calltrace.cycles-pp.mod_objcg_state.memcg_slab_free_hook.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg
0.44 ? 45% +0.4 0.83 ? 7% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.mod_objcg_state.memcg_slab_free_hook.kmem_cache_free.unix_stream_read_generic
1.10 ? 6% +0.5 1.57 ? 6% perf-profile.calltrace.cycles-pp.propagate_protected_usage.page_counter_uncharge.obj_cgroup_uncharge_pages.memcg_slab_free_hook.kmem_cache_free
0.00 +0.5 0.52 ? 2% perf-profile.calltrace.cycles-pp.get_obj_cgroup_from_current.slab_pre_alloc_hook.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb
0.61 ? 46% +0.5 1.13 ? 17% perf-profile.calltrace.cycles-pp.perf_tp_event.perf_trace_sched_switch.__schedule.schedule.exit_to_user_mode_loop
0.63 ? 46% +0.5 1.17 ? 17% perf-profile.calltrace.cycles-pp.perf_trace_sched_switch.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare
0.00 +0.6 0.60 ? 8% perf-profile.calltrace.cycles-pp.__mod_memcg_state.memcg_account_kmem.obj_cgroup_charge.slab_pre_alloc_hook.__kmalloc_node_track_caller
0.00 +0.6 0.62 ? 8% perf-profile.calltrace.cycles-pp.memcg_account_kmem.obj_cgroup_charge.slab_pre_alloc_hook.__kmalloc_node_track_caller.kmalloc_reserve
6.82 +0.6 7.46 ? 4% perf-profile.calltrace.cycles-pp.obj_cgroup_uncharge_pages.memcg_slab_free_hook.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg
6.90 +0.7 7.60 ? 4% perf-profile.calltrace.cycles-pp.obj_cgroup_charge.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
1.31 ? 14% +0.7 2.02 ? 15% perf-profile.calltrace.cycles-pp.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
1.31 ? 14% +0.7 2.04 ? 15% perf-profile.calltrace.cycles-pp.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
1.33 ? 14% +0.7 2.06 ? 15% perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.44 ? 14% +0.7 2.18 ? 15% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
1.48 ? 13% +0.7 2.23 ? 15% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
8.04 +0.8 8.82 ? 4% perf-profile.calltrace.cycles-pp.memcg_slab_free_hook.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
17.07 ? 2% +0.9 17.95 ? 4% perf-profile.calltrace.cycles-pp.kfree.skb_release_data.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
2.22 ? 7% +1.0 3.20 ? 6% perf-profile.calltrace.cycles-pp.propagate_protected_usage.page_counter_uncharge.obj_cgroup_uncharge_pages.memcg_slab_free_hook.kfree
0.00 +1.0 0.98 ? 3% perf-profile.calltrace.cycles-pp.get_obj_cgroup_from_current.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
2.10 ? 18% +1.0 3.13 ? 12% perf-profile.calltrace.cycles-pp.perf_event_output_forward.__perf_event_overflow.perf_tp_event.perf_trace_sched_wakeup_template.try_to_wake_up
2.11 ? 18% +1.0 3.14 ? 12% perf-profile.calltrace.cycles-pp.__perf_event_overflow.perf_tp_event.perf_trace_sched_wakeup_template.try_to_wake_up.autoremove_wake_function
2.02 ? 18% +1.0 3.06 ? 12% perf-profile.calltrace.cycles-pp.perf_prepare_sample.perf_event_output_forward.__perf_event_overflow.perf_tp_event.perf_trace_sched_wakeup_template
2.00 ? 19% +1.1 3.08 ? 13% perf-profile.calltrace.cycles-pp.__perf_event_overflow.perf_tp_event.perf_trace_sched_stat_runtime.update_curr.enqueue_entity
2.06 ? 19% +1.1 3.15 ? 13% perf-profile.calltrace.cycles-pp.perf_tp_event.perf_trace_sched_stat_runtime.update_curr.enqueue_entity.enqueue_task_fair
2.21 ? 18% +1.1 3.31 ? 12% perf-profile.calltrace.cycles-pp.perf_tp_event.perf_trace_sched_wakeup_template.try_to_wake_up.autoremove_wake_function.__wake_up_common
2.29 ? 18% +1.1 3.39 ? 12% perf-profile.calltrace.cycles-pp.perf_trace_sched_wakeup_template.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
2.28 ? 18% +1.1 3.39 ? 13% perf-profile.calltrace.cycles-pp.update_curr.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
2.13 ? 19% +1.1 3.24 ? 13% perf-profile.calltrace.cycles-pp.perf_trace_sched_stat_runtime.update_curr.enqueue_entity.enqueue_task_fair.ttwu_do_activate
2.24 ? 18% +1.2 3.43 ? 13% perf-profile.calltrace.cycles-pp.__perf_event_overflow.perf_tp_event.perf_trace_sched_stat_runtime.update_curr.dequeue_entity
2.66 ? 16% +1.2 3.86 ? 13% perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
2.30 ? 19% +1.2 3.52 ? 13% perf-profile.calltrace.cycles-pp.perf_tp_event.perf_trace_sched_stat_runtime.update_curr.dequeue_entity.dequeue_task_fair
2.36 ? 19% +1.2 3.59 ? 12% perf-profile.calltrace.cycles-pp.perf_trace_sched_stat_runtime.update_curr.dequeue_entity.dequeue_task_fair.__schedule
2.67 ? 17% +1.2 3.92 ? 12% perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_task_fair.__schedule.schedule.schedule_timeout
2.46 ? 25% +1.3 3.71 ? 14% perf-profile.calltrace.cycles-pp.perf_prepare_sample.perf_event_output_forward.__perf_event_overflow.perf_tp_event.perf_trace_sched_switch
2.47 ? 18% +1.3 3.74 ? 12% perf-profile.calltrace.cycles-pp.update_curr.dequeue_entity.dequeue_task_fair.__schedule.schedule
3.05 ? 16% +1.3 4.37 ? 12% perf-profile.calltrace.cycles-pp.dequeue_task_fair.__schedule.schedule.schedule_timeout.unix_stream_data_wait
45.75 +1.3 47.10 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
13.65 +1.4 15.01 ? 4% perf-profile.calltrace.cycles-pp.obj_cgroup_charge.slab_pre_alloc_hook.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb
15.23 +1.4 16.59 ? 4% perf-profile.calltrace.cycles-pp.memcg_slab_free_hook.kfree.skb_release_data.consume_skb.unix_stream_read_generic
3.14 ? 15% +1.4 4.52 ? 12% perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
3.10 ? 15% +1.4 4.49 ? 12% perf-profile.calltrace.cycles-pp.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
45.99 +1.4 47.38 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
45.30 +1.4 46.70 perf-profile.calltrace.cycles-pp.new_sync_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
9.04 +1.4 10.46 ? 4% perf-profile.calltrace.cycles-pp.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
2.01 ? 45% +1.4 3.44 ? 13% perf-profile.calltrace.cycles-pp.perf_callchain_user.get_perf_callchain.perf_callchain.perf_prepare_sample.perf_event_output_forward
44.40 +1.4 45.85 perf-profile.calltrace.cycles-pp.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.new_sync_write.vfs_write
44.71 +1.5 46.17 perf-profile.calltrace.cycles-pp.sock_sendmsg.sock_write_iter.new_sync_write.vfs_write.ksys_write
13.63 +1.5 15.12 ? 4% perf-profile.calltrace.cycles-pp.obj_cgroup_uncharge_pages.memcg_slab_free_hook.kfree.skb_release_data.consume_skb
44.93 +1.5 46.43 perf-profile.calltrace.cycles-pp.sock_write_iter.new_sync_write.vfs_write.ksys_write.do_syscall_64
0.00 +1.6 1.59 ? 7% perf-profile.calltrace.cycles-pp.propagate_protected_usage.page_counter_charge.obj_cgroup_charge.kmem_cache_alloc_node.__alloc_skb
16.08 +1.7 17.75 ? 4% perf-profile.calltrace.cycles-pp.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
16.12 +1.7 17.80 ? 4% perf-profile.calltrace.cycles-pp.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
1.50 ? 52% +1.7 3.19 ? 18% perf-profile.calltrace.cycles-pp.unwind_next_frame.perf_callchain_kernel.get_perf_callchain.perf_callchain.perf_prepare_sample
14.02 +1.7 15.74 ? 4% perf-profile.calltrace.cycles-pp.slab_pre_alloc_hook.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
1.02 ? 76% +1.8 2.82 ? 13% perf-profile.calltrace.cycles-pp.__unwind_start.perf_callchain_kernel.get_perf_callchain.perf_callchain.perf_prepare_sample
28.62 +1.9 30.50 ? 4% perf-profile.calltrace.cycles-pp.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.new_sync_write
47.60 +2.2 49.76 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
48.33 +2.2 50.51 perf-profile.calltrace.cycles-pp.write
47.73 +2.2 49.91 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
4.22 ? 19% +2.3 6.49 ? 13% perf-profile.calltrace.cycles-pp.perf_event_output_forward.__perf_event_overflow.perf_tp_event.perf_trace_sched_stat_runtime.update_curr
4.08 ? 19% +2.3 6.36 ? 13% perf-profile.calltrace.cycles-pp.perf_prepare_sample.perf_event_output_forward.__perf_event_overflow.perf_tp_event.perf_trace_sched_stat_runtime
7.93 ? 14% +2.9 10.81 ? 12% perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.sock_def_readable
26.00 +2.9 28.90 ? 4% perf-profile.calltrace.cycles-pp.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg
26.06 +2.9 28.96 ? 4% perf-profile.calltrace.cycles-pp.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
0.00 +3.1 3.14 ? 6% perf-profile.calltrace.cycles-pp.propagate_protected_usage.page_counter_charge.obj_cgroup_charge.slab_pre_alloc_hook.__kmalloc_node_track_caller
5.15 ? 19% +3.2 8.32 ? 15% perf-profile.calltrace.cycles-pp.perf_callchain_kernel.get_perf_callchain.perf_callchain.perf_prepare_sample.perf_event_output_forward
8.10 ? 20% +4.4 12.51 ? 13% perf-profile.calltrace.cycles-pp.get_perf_callchain.perf_callchain.perf_prepare_sample.perf_event_output_forward.__perf_event_overflow
8.14 ? 20% +4.4 12.56 ? 13% perf-profile.calltrace.cycles-pp.perf_callchain.perf_prepare_sample.perf_event_output_forward.__perf_event_overflow.perf_tp_event
0.00 +6.0 5.98 ? 4% perf-profile.calltrace.cycles-pp.page_counter_charge.obj_cgroup_charge.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags
0.00 +11.9 11.87 ? 5% perf-profile.calltrace.cycles-pp.page_counter_charge.obj_cgroup_charge.slab_pre_alloc_hook.__kmalloc_node_track_caller.kmalloc_reserve
20.47 -20.5 0.00 perf-profile.children.cycles-pp.obj_cgroup_charge_pages
5.80 ? 8% -2.8 3.03 ? 9% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
19.49 -1.6 17.88 ? 4% perf-profile.children.cycles-pp.page_counter_charge
20.00 -1.6 18.44 ? 4% perf-profile.children.cycles-pp.page_counter_uncharge
5.82 -1.3 4.56 ? 5% perf-profile.children.cycles-pp._raw_spin_lock
2.32 ? 4% -0.9 1.38 ? 4% perf-profile.children.cycles-pp.unix_destruct_scm
2.33 ? 4% -0.9 1.40 ? 4% perf-profile.children.cycles-pp.skb_release_head_state
2.27 ? 4% -0.9 1.35 ? 4% perf-profile.children.cycles-pp.sock_wfree
1.50 ? 12% -0.9 0.64 ? 3% perf-profile.children.cycles-pp.skb_queue_tail
50.05 -0.8 49.25 perf-profile.children.cycles-pp.read
48.37 -0.8 47.58 perf-profile.children.cycles-pp.ksys_read
2.80 ? 7% -0.8 2.02 ? 6% perf-profile.children.cycles-pp.__slab_free
48.18 -0.8 47.39 perf-profile.children.cycles-pp.vfs_read
47.58 -0.8 46.82 perf-profile.children.cycles-pp.sock_read_iter
47.74 -0.8 46.98 perf-profile.children.cycles-pp.new_sync_read
1.46 ? 8% -0.8 0.71 ? 5% perf-profile.children.cycles-pp.skb_unlink
0.74 ? 34% -0.7 0.02 ? 99% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
0.74 ? 34% -0.7 0.02 ? 99% perf-profile.children.cycles-pp.cpu_startup_entry
0.74 ? 34% -0.7 0.02 ? 99% perf-profile.children.cycles-pp.do_idle
3.69 ? 6% -0.7 3.01 ? 6% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
1.70 ? 10% -0.7 1.04 ? 5% perf-profile.children.cycles-pp.skb_set_owner_w
0.63 ? 21% -0.5 0.11 ? 7% perf-profile.children.cycles-pp.do_user_addr_fault
0.92 ? 7% -0.4 0.53 ? 5% perf-profile.children.cycles-pp.mutex_unlock
2.07 ? 2% -0.4 1.70 ? 2% perf-profile.children.cycles-pp.unix_stream_read_actor
2.06 ? 2% -0.4 1.69 ? 2% perf-profile.children.cycles-pp.skb_copy_datagram_iter
2.04 ? 2% -0.4 1.67 ? 2% perf-profile.children.cycles-pp.__skb_datagram_iter
1.13 ? 17% -0.4 0.77 ? 11% perf-profile.children.cycles-pp.exc_page_fault
1.31 ? 7% -0.3 0.96 ? 20% perf-profile.children.cycles-pp.___slab_alloc
1.06 ? 10% -0.3 0.74 ? 28% perf-profile.children.cycles-pp.__unfreeze_partials
0.76 ? 6% -0.3 0.48 ? 3% perf-profile.children.cycles-pp.unix_write_space
1.26 ? 5% -0.3 1.00 perf-profile.children.cycles-pp.__check_object_size
0.26 ? 12% -0.2 0.03 ?103% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.84 ? 8% -0.2 0.62 ? 25% perf-profile.children.cycles-pp.get_partial_node
0.89 ? 5% -0.2 0.68 ? 2% perf-profile.children.cycles-pp.simple_copy_to_iter
0.63 ? 5% -0.2 0.42 ? 6% perf-profile.children.cycles-pp.prepare_to_wait
0.24 ? 14% -0.2 0.03 ?101% perf-profile.children.cycles-pp.sysvec_call_function_single
0.23 ? 13% -0.2 0.03 ?101% perf-profile.children.cycles-pp.__sysvec_call_function_single
1.20 ? 2% -0.2 1.02 ? 2% perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
0.65 ? 5% -0.2 0.48 perf-profile.children.cycles-pp.mutex_lock
1.10 -0.1 0.96 ? 2% perf-profile.children.cycles-pp._copy_to_iter
0.93 -0.1 0.81 ? 3% perf-profile.children.cycles-pp.copyout
0.35 ? 13% -0.1 0.23 ? 9% perf-profile.children.cycles-pp.set_task_cpu
0.16 ? 30% -0.1 0.08 ? 17% perf-profile.children.cycles-pp.__generic_file_write_iter
0.16 ? 30% -0.1 0.08 ? 17% perf-profile.children.cycles-pp.generic_perform_write
0.16 ? 30% -0.1 0.08 ? 19% perf-profile.children.cycles-pp.generic_file_write_iter
0.91 ? 2% -0.1 0.84 perf-profile.children.cycles-pp.skb_copy_datagram_from_iter
0.22 ? 8% -0.1 0.16 ? 11% perf-profile.children.cycles-pp.finish_task_switch
0.33 ? 5% -0.1 0.27 perf-profile.children.cycles-pp.copyin
0.42 ? 3% -0.1 0.36 ? 2% perf-profile.children.cycles-pp.aa_sk_perm
0.51 -0.1 0.46 ? 6% perf-profile.children.cycles-pp.page_counter_try_charge
0.54 ? 2% -0.1 0.49 ? 5% perf-profile.children.cycles-pp.try_charge_memcg
0.48 ? 2% -0.1 0.43 perf-profile.children.cycles-pp._copy_from_iter
0.45 ? 2% -0.1 0.40 ? 2% perf-profile.children.cycles-pp.syscall_return_via_sysret
0.35 ? 2% -0.0 0.30 ? 5% perf-profile.children.cycles-pp.apparmor_file_permission
0.48 ? 3% -0.0 0.44 perf-profile.children.cycles-pp.__entry_text_start
0.45 ? 2% -0.0 0.41 ? 5% perf-profile.children.cycles-pp.update_cfs_group
0.44 ? 2% -0.0 0.41 perf-profile.children.cycles-pp.__might_resched
0.30 ? 3% -0.0 0.26 perf-profile.children.cycles-pp.__check_heap_object
0.40 ? 2% -0.0 0.36 ? 4% perf-profile.children.cycles-pp.security_file_permission
0.14 ? 9% -0.0 0.10 ? 6% perf-profile.children.cycles-pp.__list_del_entry_valid
0.28 ? 5% -0.0 0.24 ? 2% perf-profile.children.cycles-pp.__cond_resched
0.25 ? 3% -0.0 0.22 ? 2% perf-profile.children.cycles-pp.security_socket_recvmsg
0.20 ? 2% -0.0 0.17 ? 4% perf-profile.children.cycles-pp.__might_sleep
0.14 ? 3% -0.0 0.12 perf-profile.children.cycles-pp.wait_for_unix_gc
0.16 -0.0 0.14 ? 4% perf-profile.children.cycles-pp.__might_fault
0.09 ? 4% -0.0 0.07 ? 6% perf-profile.children.cycles-pp.check_stack_object
0.10 ? 5% -0.0 0.08 ? 12% perf-profile.children.cycles-pp.finish_wait
0.05 +0.0 0.08 perf-profile.children.cycles-pp.__build_skb_around
0.04 ? 45% +0.0 0.08 ? 14% perf-profile.children.cycles-pp.get_callchain_entry
0.03 ? 70% +0.0 0.07 ? 13% perf-profile.children.cycles-pp.___perf_sw_event
0.10 ? 12% +0.0 0.14 ? 11% perf-profile.children.cycles-pp.error_entry
0.06 ? 46% +0.1 0.11 ? 14% perf-profile.children.cycles-pp.kvm_guest_state
0.10 ? 18% +0.1 0.15 ? 14% perf-profile.children.cycles-pp.bad_get_user
0.00 +0.1 0.06 ? 9% perf-profile.children.cycles-pp.is_ftrace_trampoline
0.01 ?223% +0.1 0.06 ? 11% perf-profile.children.cycles-pp.rb_insert_color
0.00 +0.1 0.06 ? 13% perf-profile.children.cycles-pp.__wrgsbase_inactive
0.26 ? 4% +0.1 0.32 ? 7% perf-profile.children.cycles-pp.__switch_to
0.14 ? 6% +0.1 0.21 ? 6% perf-profile.children.cycles-pp.__switch_to_asm
0.14 ? 20% +0.1 0.22 ? 10% perf-profile.children.cycles-pp.perf_output_begin_forward
0.22 ? 8% +0.1 0.31 ? 9% perf-profile.children.cycles-pp.check_preempt_curr
0.19 ? 10% +0.1 0.27 ? 9% perf-profile.children.cycles-pp.check_preempt_wakeup
0.19 ? 20% +0.1 0.28 ? 12% perf-profile.children.cycles-pp.core_kernel_text
0.24 ? 7% +0.1 0.34 ? 8% perf-profile.children.cycles-pp.ttwu_do_wakeup
0.26 ? 19% +0.1 0.36 ? 10% perf-profile.children.cycles-pp.__perf_event_header__init_id
0.28 ? 11% +0.2 0.43 ? 8% perf-profile.children.cycles-pp.reweight_entity
0.51 ? 17% +0.2 0.73 ? 11% perf-profile.children.cycles-pp.native_irq_return_iret
1.05 ? 7% +0.3 1.32 ? 7% perf-profile.children.cycles-pp.switch_mm_irqs_off
0.72 ? 19% +0.3 0.99 ? 12% perf-profile.children.cycles-pp.__kernel_text_address
0.63 ? 19% +0.3 0.92 ? 12% perf-profile.children.cycles-pp.deref_stack_reg
0.85 ? 19% +0.3 1.18 ? 12% perf-profile.children.cycles-pp.unwind_get_return_address
1.98 ? 6% +0.4 2.37 ? 5% perf-profile.children.cycles-pp.memcg_slab_post_alloc_hook
1.25 ? 18% +0.6 1.83 ? 12% perf-profile.children.cycles-pp.__orc_find
1.60 ? 12% +0.7 2.25 ? 14% perf-profile.children.cycles-pp.exit_to_user_mode_loop
1.92 ? 11% +0.7 2.61 ? 14% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
2.00 ? 11% +0.7 2.70 ? 13% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
2.57 ? 9% +0.8 3.38 ? 7% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
17.08 ? 2% +0.9 17.96 ? 4% perf-profile.children.cycles-pp.kfree
2.90 ? 8% +0.9 3.78 ? 7% perf-profile.children.cycles-pp.mod_objcg_state
2.36 ? 18% +1.0 3.36 ? 13% perf-profile.children.cycles-pp.__unwind_start
2.41 ? 18% +1.1 3.49 ? 12% perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
0.40 ? 3% +1.1 1.51 ? 3% perf-profile.children.cycles-pp.get_obj_cgroup_from_current
96.82 +1.1 97.94 perf-profile.children.cycles-pp.do_syscall_64
97.04 +1.1 98.20 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
3.45 ? 14% +1.2 4.64 ? 11% perf-profile.children.cycles-pp.enqueue_task_fair
3.47 ? 14% +1.2 4.68 ? 11% perf-profile.children.cycles-pp.ttwu_do_activate
2.60 ? 18% +1.2 3.82 ? 12% perf-profile.children.cycles-pp.__get_user_nocheck_8
2.85 ? 17% +1.2 4.08 ? 12% perf-profile.children.cycles-pp.dequeue_entity
3.27 ? 15% +1.3 4.54 ? 12% perf-profile.children.cycles-pp.dequeue_task_fair
2.78 ? 18% +1.3 4.11 ? 12% perf-profile.children.cycles-pp.perf_callchain_user
45.75 +1.4 47.10 perf-profile.children.cycles-pp.vfs_write
45.99 +1.4 47.39 perf-profile.children.cycles-pp.ksys_write
45.31 +1.4 46.71 perf-profile.children.cycles-pp.new_sync_write
9.07 +1.4 10.48 ? 4% perf-profile.children.cycles-pp.kmem_cache_alloc_node
44.42 +1.4 45.86 perf-profile.children.cycles-pp.unix_stream_sendmsg
44.72 +1.5 46.18 perf-profile.children.cycles-pp.sock_sendmsg
44.93 +1.5 46.43 perf-profile.children.cycles-pp.sock_write_iter
16.09 +1.7 17.76 ? 4% perf-profile.children.cycles-pp.__kmalloc_node_track_caller
16.13 +1.7 17.80 ? 4% perf-profile.children.cycles-pp.kmalloc_reserve
0.00 +1.7 1.70 ? 8% perf-profile.children.cycles-pp.__mod_memcg_state
14.03 +1.7 15.76 ? 4% perf-profile.children.cycles-pp.slab_pre_alloc_hook
0.00 +1.7 1.75 ? 8% perf-profile.children.cycles-pp.memcg_account_kmem
28.63 +1.9 30.50 ? 4% perf-profile.children.cycles-pp.sock_alloc_send_pskb
20.56 +2.1 22.62 ? 4% perf-profile.children.cycles-pp.obj_cgroup_charge
20.48 +2.1 22.59 ? 4% perf-profile.children.cycles-pp.obj_cgroup_uncharge_pages
23.29 +2.1 25.42 ? 4% perf-profile.children.cycles-pp.memcg_slab_free_hook
4.74 ? 18% +2.1 6.88 ? 13% perf-profile.children.cycles-pp.unwind_next_frame
48.48 +2.2 50.64 perf-profile.children.cycles-pp.write
5.10 ? 18% +2.3 7.40 ? 12% perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
5.57 ? 18% +2.3 7.91 ? 11% perf-profile.children.cycles-pp.update_curr
7.46 ? 6% +2.7 10.18 ? 6% perf-profile.children.cycles-pp.propagate_protected_usage
6.32 ? 18% +2.8 9.11 ? 12% perf-profile.children.cycles-pp.perf_callchain_kernel
26.06 +2.9 28.96 ? 4% perf-profile.children.cycles-pp.alloc_skb_with_frags
26.01 +2.9 28.91 ? 4% perf-profile.children.cycles-pp.__alloc_skb
9.25 ? 18% +4.2 13.46 ? 12% perf-profile.children.cycles-pp.get_perf_callchain
9.29 ? 18% +4.2 13.51 ? 12% perf-profile.children.cycles-pp.perf_callchain
10.18 ? 18% +4.3 14.47 ? 12% perf-profile.children.cycles-pp.__perf_event_overflow
10.14 ? 18% +4.3 14.42 ? 12% perf-profile.children.cycles-pp.perf_event_output_forward
9.77 ? 18% +4.4 14.14 ? 12% perf-profile.children.cycles-pp.perf_prepare_sample
10.54 ? 18% +4.4 14.96 ? 12% perf-profile.children.cycles-pp.perf_tp_event
16.13 ? 2% -2.9 13.22 ? 3% perf-profile.self.cycles-pp.page_counter_uncharge
15.90 ? 2% -2.9 13.00 ? 4% perf-profile.self.cycles-pp.page_counter_charge
3.68 ? 8% -1.8 1.92 perf-profile.self.cycles-pp._raw_spin_lock_irqsave
4.22 ? 5% -1.6 2.62 perf-profile.self.cycles-pp._raw_spin_lock
2.77 ? 7% -0.8 1.99 ? 6% perf-profile.self.cycles-pp.__slab_free
1.55 ? 8% -0.8 0.79 ? 6% perf-profile.self.cycles-pp.sock_def_readable
3.68 ? 6% -0.7 3.00 ? 6% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
1.68 ? 10% -0.7 1.03 ? 5% perf-profile.self.cycles-pp.skb_set_owner_w
1.48 ? 6% -0.6 0.86 ? 5% perf-profile.self.cycles-pp.sock_wfree
1.21 ? 5% -0.4 0.76 ? 4% perf-profile.self.cycles-pp.unix_stream_read_generic
0.90 ? 8% -0.4 0.52 ? 5% perf-profile.self.cycles-pp.mutex_unlock
1.25 ? 7% -0.4 0.88 ? 3% perf-profile.self.cycles-pp.memcg_slab_free_hook
0.64 ? 10% -0.3 0.37 ? 3% perf-profile.self.cycles-pp.unix_stream_sendmsg
0.68 ? 8% -0.2 0.47 perf-profile.self.cycles-pp.__alloc_skb
0.40 ? 6% -0.2 0.22 ? 4% perf-profile.self.cycles-pp.consume_skb
1.19 ? 2% -0.2 1.01 ? 2% perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
0.66 ? 7% -0.2 0.48 ? 3% perf-profile.self.cycles-pp.__check_object_size
0.51 ? 7% -0.2 0.35 perf-profile.self.cycles-pp.mutex_lock
0.42 ? 8% -0.1 0.28 ? 4% perf-profile.self.cycles-pp.kmem_cache_free
0.32 ? 7% -0.1 0.18 ? 2% perf-profile.self.cycles-pp.unix_write_space
0.46 ? 7% -0.1 0.34 ? 11% perf-profile.self.cycles-pp.___slab_alloc
0.54 ? 2% -0.1 0.45 perf-profile.self.cycles-pp.memcg_slab_post_alloc_hook
0.36 ? 6% -0.1 0.27 ? 3% perf-profile.self.cycles-pp.__kmalloc_node_track_caller
0.37 ? 4% -0.1 0.30 ? 8% perf-profile.self.cycles-pp.update_load_avg
0.24 ? 2% -0.1 0.18 ? 2% perf-profile.self.cycles-pp.sock_read_iter
0.44 ? 2% -0.0 0.39 ? 2% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.45 ? 2% -0.0 0.40 ? 5% perf-profile.self.cycles-pp.update_cfs_group
0.13 ? 7% -0.0 0.08 ? 5% perf-profile.self.cycles-pp.sock_alloc_send_pskb
0.29 ? 3% -0.0 0.25 perf-profile.self.cycles-pp.__check_heap_object
0.32 ? 3% -0.0 0.28 ? 7% perf-profile.self.cycles-pp.skb_release_data
0.22 ? 3% -0.0 0.18 ? 4% perf-profile.self.cycles-pp.aa_sk_perm
0.43 ? 2% -0.0 0.40 perf-profile.self.cycles-pp.__might_resched
0.13 ? 7% -0.0 0.10 ? 4% perf-profile.self.cycles-pp.__list_del_entry_valid
0.23 ? 4% -0.0 0.20 ? 7% perf-profile.self.cycles-pp.apparmor_file_permission
0.20 ? 3% -0.0 0.17 ? 4% perf-profile.self.cycles-pp.kmem_cache_alloc_node
0.16 ? 3% -0.0 0.12 ? 6% perf-profile.self.cycles-pp.page_counter_try_charge
0.22 ? 2% -0.0 0.20 ? 4% perf-profile.self.cycles-pp.vfs_write
0.18 ? 2% -0.0 0.15 ? 4% perf-profile.self.cycles-pp.__might_sleep
0.20 -0.0 0.18 ? 3% perf-profile.self.cycles-pp.new_sync_write
0.10 ? 5% -0.0 0.08 perf-profile.self.cycles-pp.kfree
0.08 ? 5% -0.0 0.07 ? 7% perf-profile.self.cycles-pp.check_stack_object
0.10 ? 4% -0.0 0.09 perf-profile.self.cycles-pp.__cond_resched
0.06 ? 6% +0.0 0.08 ? 6% perf-profile.self.cycles-pp.skb_copy_datagram_from_iter
0.23 ? 3% +0.0 0.25 ? 3% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.10 ? 3% +0.0 0.12 ? 6% perf-profile.self.cycles-pp.do_syscall_64
0.11 ? 6% +0.0 0.14 ? 9% perf-profile.self.cycles-pp.enqueue_task_fair
0.11 ? 10% +0.0 0.14 ? 8% perf-profile.self.cycles-pp.native_sched_clock
0.04 ? 45% +0.0 0.08 ? 14% perf-profile.self.cycles-pp.get_callchain_entry
0.04 ? 44% +0.0 0.08 ? 7% perf-profile.self.cycles-pp.ksys_write
0.04 ? 44% +0.0 0.08 perf-profile.self.cycles-pp.__build_skb_around
0.08 ? 13% +0.0 0.12 ? 12% perf-profile.self.cycles-pp.error_entry
0.05 ? 45% +0.0 0.09 ? 10% perf-profile.self.cycles-pp.__perf_event_header__init_id
0.11 ? 16% +0.0 0.15 ? 11% perf-profile.self.cycles-pp.__kernel_text_address
0.01 ?223% +0.0 0.06 ? 9% perf-profile.self.cycles-pp.select_task_rq_fair
0.10 ? 22% +0.0 0.14 ? 11% perf-profile.self.cycles-pp.__task_pid_nr_ns
0.08 ? 18% +0.0 0.13 ? 13% perf-profile.self.cycles-pp.asm_exc_page_fault
0.21 ? 3% +0.0 0.26 ? 6% perf-profile.self.cycles-pp.sock_write_iter
0.06 ? 46% +0.1 0.11 ? 12% perf-profile.self.cycles-pp.kvm_guest_state
0.01 ?223% +0.1 0.06 ? 13% perf-profile.self.cycles-pp.___perf_sw_event
0.00 +0.1 0.06 ? 9% perf-profile.self.cycles-pp.is_ftrace_trampoline
0.12 ? 17% +0.1 0.18 ? 14% perf-profile.self.cycles-pp.ftrace_graph_ret_addr
0.00 +0.1 0.06 ? 11% perf-profile.self.cycles-pp.rb_insert_color
0.16 ? 9% +0.1 0.22 ? 12% perf-profile.self.cycles-pp.enqueue_entity
0.00 +0.1 0.06 ? 9% perf-profile.self.cycles-pp.schedule
0.11 ? 17% +0.1 0.17 ? 9% perf-profile.self.cycles-pp.check_preempt_wakeup
0.12 ? 19% +0.1 0.18 ? 11% perf-profile.self.cycles-pp.core_kernel_text
0.31 +0.1 0.38 ? 7% perf-profile.self.cycles-pp.mod_objcg_state
0.14 ? 6% +0.1 0.21 ? 7% perf-profile.self.cycles-pp.__switch_to_asm
0.13 ? 22% +0.1 0.20 ? 11% perf-profile.self.cycles-pp.perf_output_begin_forward
0.00 +0.1 0.07 ? 11% perf-profile.self.cycles-pp.refill_stock
0.24 ? 16% +0.1 0.33 ? 8% perf-profile.self.cycles-pp.perf_tp_event
0.20 ? 19% +0.1 0.30 ? 13% perf-profile.self.cycles-pp.__unwind_start
0.24 ? 19% +0.1 0.34 ? 14% perf-profile.self.cycles-pp.kernel_text_address
0.14 ? 19% +0.1 0.24 ? 8% perf-profile.self.cycles-pp.perf_callchain_user
0.13 ? 9% +0.2 0.28 ? 6% perf-profile.self.cycles-pp.reweight_entity
0.51 ? 17% +0.2 0.73 ? 11% perf-profile.self.cycles-pp.native_irq_return_iret
0.53 ? 19% +0.2 0.78 ? 12% perf-profile.self.cycles-pp.perf_callchain_kernel
0.58 ? 18% +0.3 0.84 ? 12% perf-profile.self.cycles-pp.deref_stack_reg
1.04 ? 7% +0.3 1.30 ? 7% perf-profile.self.cycles-pp.switch_mm_irqs_off
1.25 ? 18% +0.6 1.83 ? 12% perf-profile.self.cycles-pp.__orc_find
1.42 ? 18% +0.7 2.12 ? 13% perf-profile.self.cycles-pp.__get_user_nocheck_8
2.54 ? 9% +0.8 3.36 ? 7% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
2.06 ? 18% +0.9 2.99 ? 12% perf-profile.self.cycles-pp.unwind_next_frame
0.39 ? 4% +1.1 1.49 ? 3% perf-profile.self.cycles-pp.get_obj_cgroup_from_current
0.00 +1.7 1.68 ? 8% perf-profile.self.cycles-pp.__mod_memcg_state
7.37 ? 6% +2.7 10.08 ? 6% perf-profile.self.cycles-pp.propagate_protected_usage
0.44 ? 21% +2.8 3.20 ? 8% perf-profile.self.cycles-pp.obj_cgroup_uncharge_pages
0.09 ? 8% +3.2 3.27 ? 8% perf-profile.self.cycles-pp.obj_cgroup_charge




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://01.org/lkp



Attachments:
config-5.17.0-00033-ga8c49af3be5f (163.79 kB)
job-script (8.08 kB)
job.yaml (5.52 kB)
reproduce (543.00 B)

2022-04-27 10:22:47

by Yin, Fengwei

Subject: Re: [memcg] a8c49af3be: hackbench.throughput -13.7% regression

Hi Yosry,

On 4/20/2022 1:58 PM, kernel test robot wrote:
>
>
> Greeting,
>
> FYI, we noticed a -13.7% regression of hackbench.throughput due to commit:
>
>
> commit: a8c49af3be5f0b4e105ef678bcf14ef102c270be ("memcg: add per-memcg total kernel memory stat")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: hackbench
> on test machine: 144 threads 4 sockets Intel(R) Xeon(R) Gold 5318H CPU @ 2.50GHz with 128G memory
> with following parameters:
>
> nr_threads: 100%
> iterations: 4
> mode: process
> ipc: socket
> cpufreq_governor: performance
> ucode: 0x7002402
>
> test-description: Hackbench is both a benchmark and a stress test for the Linux kernel scheduler.
> test-url: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/sched/cfs-scheduler/hackbench.c
>
>
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <[email protected]>
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> sudo bin/lkp install job.yaml # job file is attached in this email
> bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> sudo bin/lkp run generated-yaml-file
>
> # if come across any failure that blocks the test,
> # please remove ~/.lkp and /lkp dir to run from a clean state.
>
> =========================================================================================
> compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase/ucode:
> gcc-11/performance/socket/4/x86_64-rhel-8.3/process/100%/debian-10.4-x86_64-20200603.cgz/lkp-cpl-4sp1/hackbench/0x7002402
>
> commit:
> 086f694a75 ("memcg: replace in_interrupt() with !in_task()")
> a8c49af3be ("memcg: add per-memcg total kernel memory stat")
>
> 086f694a75e1a283 a8c49af3be5f0b4e105ef678bcf
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 146519 -13.7% 126397 hackbench.throughput
> 465.89 +16.0% 540.43 hackbench.time.elapsed_time
> 465.89 +16.0% 540.43 hackbench.time.elapsed_time.max
> 1.365e+08 +134.1% 3.195e+08 ± 4% hackbench.time.involuntary_context_switches
> 1081515 -1.2% 1068489 hackbench.time.minor_page_faults
> 64911 +16.3% 75465 hackbench.time.system_time

Just FYI.

If I comment out one line added by commit a8c49af3be:
static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages)
{
        /* mod_memcg_state(memcg, MEMCG_KMEM, nr_pages); */ <--- comment out this line.
        if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
                if (nr_pages > 0)
                        page_counter_charge(&memcg->kmem, nr_pages);
                else
                        page_counter_uncharge(&memcg->kmem, -nr_pages);
        }
}
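
(For context, a rough sketch of the path behind that one line, paraphrased from the v5.17-era mm/memcontrol.c rather than quoted verbatim: mod_memcg_state() disables interrupts and calls __mod_memcg_state(), which besides the cheap per-cpu counter update also pokes the rstat machinery on every kmem charge/uncharge.)

void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
{
        if (mem_cgroup_disabled())
                return;

        /* cheap part: bump the per-cpu counter for MEMCG_KMEM */
        __this_cpu_add(memcg->vmstats_percpu->state[idx], val);

        /*
         * the part the rest of this thread is about:
         * memcg_rstat_updated() -> cgroup_rstat_updated(), plus the
         * stats_flush_threshold bookkeeping added by 11192d9c124d
         */
        memcg_rstat_updated(memcg, val);
}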

The regression is almost gone:

086f694a75e1a283 9ff9ec89a6dcf39f901ff0a84fd
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
7632:13 -44932% 1791:3 dmesg.timestamp:last
1:13 -8% :3 kmsg.common_interrupt:#No_irq_handler_for_vector
2:13 -20% :3 kmsg.timestamp:common_interrupt:#No_irq_handler_for_vector
4072:13 -24827% 844:3 kmsg.timestamp:last
%stddev %change %stddev
\ | \
144327 ± 3% -1.9% 141594 ± 5% hackbench.throughput
regression dropped to -1.9% from -13.7%

473.44 ± 3% +1.9% 482.23 ± 5% hackbench.time.elapsed_time


Regards
Yin, Fengwei

2022-04-27 10:41:23

by Yosry Ahmed

Subject: Re: [memcg] a8c49af3be: hackbench.throughput -13.7% regression

Hi Yin,

On Tue, Apr 26, 2022 at 7:53 PM Yin Fengwei <[email protected]> wrote:
>
> Hi Yosry,
>
> On 4/20/2022 1:58 PM, kernel test robot wrote:
> >
> >
> > Greeting,
> >
> > FYI, we noticed a -13.7% regression of hackbench.throughput due to commit:
> >
> >
> > commit: a8c49af3be5f0b4e105ef678bcf14ef102c270be ("memcg: add per-memcg total kernel memory stat")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > in testcase: hackbench
> > on test machine: 144 threads 4 sockets Intel(R) Xeon(R) Gold 5318H CPU @ 2.50GHz with 128G memory
> > with following parameters:
> >
> > nr_threads: 100%
> > iterations: 4
> > mode: process
> > ipc: socket
> > cpufreq_governor: performance
> > ucode: 0x7002402
> >
> > test-description: Hackbench is both a benchmark and a stress test for the Linux kernel scheduler.
> > test-url: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/sched/cfs-scheduler/hackbench.c
> >
> >
> >
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot <[email protected]>
> >
> >
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> >
> >
> > To reproduce:
> >
> > git clone https://github.com/intel/lkp-tests.git
> > cd lkp-tests
> > sudo bin/lkp install job.yaml # job file is attached in this email
> > bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> > sudo bin/lkp run generated-yaml-file
> >
> > # if come across any failure that blocks the test,
> > # please remove ~/.lkp and /lkp dir to run from a clean state.
> >
> > =========================================================================================
> > compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase/ucode:
> > gcc-11/performance/socket/4/x86_64-rhel-8.3/process/100%/debian-10.4-x86_64-20200603.cgz/lkp-cpl-4sp1/hackbench/0x7002402
> >
> > commit:
> > 086f694a75 ("memcg: replace in_interrupt() with !in_task()")
> > a8c49af3be ("memcg: add per-memcg total kernel memory stat")
> >
> > 086f694a75e1a283 a8c49af3be5f0b4e105ef678bcf
> > ---------------- ---------------------------
> > %stddev %change %stddev
> > \ | \
> > 146519 -13.7% 126397 hackbench.throughput
> > 465.89 +16.0% 540.43 hackbench.time.elapsed_time
> > 465.89 +16.0% 540.43 hackbench.time.elapsed_time.max
> > 1.365e+08 +134.1% 3.195e+08 ± 4% hackbench.time.involuntary_context_switches
> > 1081515 -1.2% 1068489 hackbench.time.minor_page_faults
> > 64911 +16.3% 75465 hackbench.time.system_time
>
> Just FYI.
>
> If I comment out one line added by the commit <a8c49af3be> :
> static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages)
> {
>         /* mod_memcg_state(memcg, MEMCG_KMEM, nr_pages); */ <--- comment out this line.

Thanks so much for looking into this. Unfortunately, this line is
essentially the core of the commit; all the other changes are more or
less refactoring or adding interfaces to read this stat.

I am not sure why this specific callsite causes a regression; there are
many callsites that modify stats similarly (whether through
mod_memcg_state() directly or through other variants like
mod_lruvec_state()). Maybe the kmem call path is exercised more
heavily in this benchmark than other call paths that update stats.

The only seemingly expensive operation in the mod_memcg_state() path
is the call to cgroup_rstat_updated() (through memcg_rstat_updated()).
One idea off the top of my head is to batch calls to
cgroup_rstat_updated(), similar to what 11192d9c124d ("memcg: flush
stats only if updated") did on the flush side. I am interested to see
what memcg maintainers think about this problem (and the proposed
solution).
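
A minimal sketch of what such batching could look like (illustrative only, not a patch from this thread; it assumes the existing per-cpu stats_updates counter and MEMCG_CHARGE_BATCH introduced around v5.17):

static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
{
        unsigned int x;

        /* accumulate the size of pending updates per cpu ... */
        x = __this_cpu_add_return(stats_updates, abs(val));
        if (x < MEMCG_CHARGE_BATCH)
                return; /* defer the rstat notification until enough updates pile up */

        /* ... and only then tell rstat that this cpu is dirty */
        cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id());
        atomic_add(x / MEMCG_CHARGE_BATCH, &stats_flush_threshold);
        __this_cpu_write(stats_updates, 0);
}

The obvious trade-off is that per-cpu deltas below the batch size are not reported to rstat until a later update crosses the threshold, so readers may see slightly staler numbers between flushes.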

>         if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
>                 if (nr_pages > 0)
>                         page_counter_charge(&memcg->kmem, nr_pages);
>                 else
>                         page_counter_uncharge(&memcg->kmem, -nr_pages);
>         }
> }
>
> The regression is almost gone:
>
> 086f694a75e1a283 9ff9ec89a6dcf39f901ff0a84fd
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> 7632:13 -44932% 1791:3 dmesg.timestamp:last
> 1:13 -8% :3 kmsg.common_interrupt:#No_irq_handler_for_vector
> 2:13 -20% :3 kmsg.timestamp:common_interrupt:#No_irq_handler_for_vector
> 4072:13 -24827% 844:3 kmsg.timestamp:last
> %stddev %change %stddev
> \ | \
> 144327 ± 3% -1.9% 141594 ± 5% hackbench.throughput
> regression dropped to -1.9% from -13.7%
>
> 473.44 ± 3% +1.9% 482.23 ± 5% hackbench.time.elapsed_time
>
>
> Regards
> Yin, Fengwei

2022-04-27 10:41:48

by Shakeel Butt

Subject: Re: [memcg] a8c49af3be: hackbench.throughput -13.7% regression

On Tue, Apr 26, 2022 at 9:34 PM Yosry Ahmed <[email protected]> wrote:
>
[...]
>
> The only seemingly expensive operation in the mod_memcg_state() path
> is the call to cgroup_rstat_updated() (through memcg_rstat_updated()).
> One idea off the top of my head is to batch calls to
> cgroup_rstat_updated(), similar to what 11192d9c124d ("memcg: flush
> stats only if updated") did on the flush side. I am interested to see
> what memcg maintainers think about this problem (and the proposed
> solution).
>

I suspect some code alignment change is causing this difference.
Let's first confirm the issue before thinking about the optimizations.