2023-06-12 07:54:49

by kernel test robot

Subject: [linux-next:master] [splice] 2cb1e08985: stress-ng.sendfile.ops_per_sec 11.6% improvement



Hello,

kernel test robot noticed an 11.6% improvement of stress-ng.sendfile.ops_per_sec on:


commit: 2cb1e08985e3dc59d0a4ebf770a87e3e2410d985 ("splice: Use filemap_splice_read() instead of generic_file_splice_read()")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

nr_threads: 100%
testtime: 60s
class: pipe
test: sendfile
cpufreq_governor: performance
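
Outside the lkp harness, the parameters above can be approximated with a direct stress-ng invocation. This is a sketch of an equivalent command line, not the exact job file (the cpufreq governor is configured separately by lkp):

```shell
# Roughly the job above: the sendfile stressor (class: pipe) with one
# worker per CPU (nr_threads: 100%) for 60 s, printing summary metrics.
stress-ng --sendfile 0 --timeout 60s --metrics-brief
```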






Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if you come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.

=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
pipe/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp8/sendfile/stress-ng/60s

commit:
ab82513126 ("cifs: Use filemap_splice_read()")
2cb1e08985 ("splice: Use filemap_splice_read() instead of generic_file_splice_read()")

ab82513126f8b426 2cb1e08985e3dc59d0a4ebf770a
---------------- ---------------------------
%stddev %change %stddev
\ | \
568180 -1.5% 559667 proc-vmstat.pgalloc_normal
348772 -1.7% 342744 proc-vmstat.pgfault
39953 +11.7% 44609 stress-ng.sendfile.MB_per_sec_sent_to_/dev/null
38320456 +11.6% 42768635 stress-ng.sendfile.ops
638671 +11.6% 712807 stress-ng.sendfile.ops_per_sec
0.18 ± 6% -0.1 0.11 ± 8% perf-stat.i.branch-miss-rate%
61342100 -61.5% 23631851 ± 3% perf-stat.i.branch-misses
0.74 +3.7% 0.77 perf-stat.i.cpi
0.28 ±222% -0.3 0.00 ± 4% perf-stat.i.dTLB-load-miss-rate%
7.958e+11 ±223% -100.0% 105622 ± 6% perf-stat.i.dTLB-load-misses
8.398e+10 -10.4% 7.528e+10 perf-stat.i.dTLB-loads
4.702e+10 -17.3% 3.888e+10 perf-stat.i.dTLB-stores
2.965e+11 -4.7% 2.825e+11 perf-stat.i.instructions
1.36 -4.0% 1.31 perf-stat.i.ipc
1632 -7.4% 1511 perf-stat.i.metric.M/sec
0.11 -0.1 0.04 ± 3% perf-stat.overall.branch-miss-rate%
0.73 +4.8% 0.76 perf-stat.overall.cpi
16.38 ±223% -16.4 0.00 ± 6% perf-stat.overall.dTLB-load-miss-rate%
0.00 ± 3% +0.0 0.00 ± 3% perf-stat.overall.dTLB-store-miss-rate%
1.38 -4.6% 1.32 perf-stat.overall.ipc
60279316 -61.5% 23221084 ± 3% perf-stat.ps.branch-misses
7.58e+11 ±223% -100.0% 104910 ± 6% perf-stat.ps.dTLB-load-misses
8.264e+10 -10.4% 7.408e+10 perf-stat.ps.dTLB-loads
4.628e+10 -17.3% 3.826e+10 perf-stat.ps.dTLB-stores
2.918e+11 -4.7% 2.78e+11 perf-stat.ps.instructions
1.832e+13 -4.8% 1.745e+13 perf-stat.total.instructions
73.32 -73.3 0.00 perf-profile.calltrace.cycles-pp.generic_file_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile.__x64_sys_sendfile64
68.75 -68.7 0.00 perf-profile.calltrace.cycles-pp.filemap_read.generic_file_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile
24.87 ± 3% -24.9 0.00 perf-profile.calltrace.cycles-pp.filemap_get_pages.filemap_read.generic_file_splice_read.splice_direct_to_actor.do_splice_direct
23.72 ± 4% -23.7 0.00 perf-profile.calltrace.cycles-pp.filemap_get_read_batch.filemap_get_pages.filemap_read.generic_file_splice_read.splice_direct_to_actor
20.27 ? 3% -20.3 0.00 perf-profile.calltrace.cycles-pp.copy_page_to_iter_pipe.filemap_read.generic_file_splice_read.splice_direct_to_actor.do_splice_direct
0.58 +0.1 0.65 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_splice_read.splice_direct_to_actor.do_splice_direct
0.80 +0.1 0.89 perf-profile.calltrace.cycles-pp.security_file_permission.vfs_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile
1.78 +0.1 1.88 perf-profile.calltrace.cycles-pp.page_cache_pipe_buf_confirm.__splice_from_pipe.splice_from_pipe.direct_splice_actor.splice_direct_to_actor
1.33 ± 2% +0.1 1.47 perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile
3.07 +0.3 3.39 perf-profile.calltrace.cycles-pp.vfs_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile.__x64_sys_sendfile64
0.00 +0.6 0.58 perf-profile.calltrace.cycles-pp.xas_descend.xas_load.filemap_get_read_batch.filemap_get_pages.filemap_splice_read
0.00 +0.6 0.61 perf-profile.calltrace.cycles-pp.current_time.atime_needs_update.touch_atime.filemap_splice_read.splice_direct_to_actor
0.00 +1.3 1.30 perf-profile.calltrace.cycles-pp.atime_needs_update.touch_atime.filemap_splice_read.splice_direct_to_actor.do_splice_direct
0.00 +1.6 1.58 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_read_batch.filemap_get_pages.filemap_splice_read.splice_direct_to_actor
0.00 +1.7 1.65 perf-profile.calltrace.cycles-pp.touch_atime.filemap_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile
10.13 +1.7 11.83 perf-profile.calltrace.cycles-pp.page_cache_pipe_buf_release.__splice_from_pipe.splice_from_pipe.direct_splice_actor.splice_direct_to_actor
0.00 +2.2 2.20 perf-profile.calltrace.cycles-pp.folio_mark_accessed.filemap_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile
22.17 +2.6 24.73 perf-profile.calltrace.cycles-pp.splice_from_pipe.direct_splice_actor.splice_direct_to_actor.do_splice_direct.do_sendfile
22.48 +2.6 25.04 perf-profile.calltrace.cycles-pp.direct_splice_actor.splice_direct_to_actor.do_splice_direct.do_sendfile.__x64_sys_sendfile64
20.17 +2.8 22.99 perf-profile.calltrace.cycles-pp.__splice_from_pipe.splice_from_pipe.direct_splice_actor.splice_direct_to_actor.do_splice_direct
0.00 +13.8 13.80 perf-profile.calltrace.cycles-pp.release_pages.__pagevec_release.filemap_splice_read.splice_direct_to_actor.do_splice_direct
0.00 +14.4 14.44 perf-profile.calltrace.cycles-pp.__pagevec_release.filemap_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile
0.00 +18.5 18.54 perf-profile.calltrace.cycles-pp.splice_folio_into_pipe.filemap_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile
0.00 +25.9 25.92 perf-profile.calltrace.cycles-pp.filemap_get_read_batch.filemap_get_pages.filemap_splice_read.splice_direct_to_actor.do_splice_direct
0.00 +27.2 27.15 perf-profile.calltrace.cycles-pp.filemap_get_pages.filemap_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile
0.00 +69.0 69.03 perf-profile.calltrace.cycles-pp.filemap_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile.__x64_sys_sendfile64
73.48 -73.5 0.00 perf-profile.children.cycles-pp.generic_file_splice_read
69.92 -69.9 0.00 perf-profile.children.cycles-pp.filemap_read
20.75 -20.8 0.00 perf-profile.children.cycles-pp.copy_page_to_iter_pipe
3.04 -1.3 1.75 perf-profile.children.cycles-pp.touch_atime
2.54 -1.1 1.47 perf-profile.children.cycles-pp.atime_needs_update
1.20 -0.5 0.69 perf-profile.children.cycles-pp.current_time
2.84 -0.2 2.64 perf-profile.children.cycles-pp.folio_mark_accessed
0.34 -0.1 0.19 ± 2% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
0.26 ± 2% -0.1 0.15 ± 4% perf-profile.children.cycles-pp.make_vfsgid
0.25 ± 2% -0.1 0.16 ± 3% perf-profile.children.cycles-pp.make_vfsuid
0.08 +0.0 0.09 perf-profile.children.cycles-pp.pipe_unlock
0.26 +0.0 0.28 perf-profile.children.cycles-pp.__get_task_ioprio
0.26 +0.0 0.29 perf-profile.children.cycles-pp.aa_file_perm
0.30 ± 3% +0.0 0.33 perf-profile.children.cycles-pp.fsnotify_perm
0.18 ± 2% +0.0 0.20 ± 2% perf-profile.children.cycles-pp.rw_verify_area
0.69 +0.0 0.72 perf-profile.children.cycles-pp.xas_descend
0.42 +0.0 0.45 perf-profile.children.cycles-pp.xas_start
0.28 ± 2% +0.0 0.32 perf-profile.children.cycles-pp.splice_from_pipe_next
0.29 +0.0 0.33 ± 2% perf-profile.children.cycles-pp.rcu_all_qs
0.68 +0.1 0.75 perf-profile.children.cycles-pp.apparmor_file_permission
0.95 +0.1 1.04 perf-profile.children.cycles-pp.pipe_to_null
0.95 +0.1 1.06 perf-profile.children.cycles-pp.security_file_permission
1.76 +0.1 1.86 perf-profile.children.cycles-pp.xas_load
0.00 +0.1 0.11 perf-profile.children.cycles-pp.mlock_drain_local
0.76 ± 2% +0.1 0.87 perf-profile.children.cycles-pp.__cond_resched
2.20 +0.1 2.33 perf-profile.children.cycles-pp.page_cache_pipe_buf_confirm
1.43 ± 2% +0.2 1.58 perf-profile.children.cycles-pp.__fsnotify_parent
0.00 +0.2 0.25 perf-profile.children.cycles-pp.free_unref_page_list
0.00 +0.3 0.27 ± 3% perf-profile.children.cycles-pp.lru_add_drain_cpu
3.13 +0.3 3.46 perf-profile.children.cycles-pp.vfs_splice_read
0.00 +0.5 0.47 perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list
10.17 +1.7 11.83 perf-profile.children.cycles-pp.page_cache_pipe_buf_release
23.91 ± 4% +2.2 26.14 perf-profile.children.cycles-pp.filemap_get_read_batch
24.98 ± 3% +2.3 27.29 perf-profile.children.cycles-pp.filemap_get_pages
21.14 +2.4 23.56 perf-profile.children.cycles-pp.__splice_from_pipe
22.34 +2.6 24.91 perf-profile.children.cycles-pp.splice_from_pipe
22.54 +2.6 25.11 perf-profile.children.cycles-pp.direct_splice_actor
0.00 +14.0 13.99 perf-profile.children.cycles-pp.release_pages
0.00 +14.6 14.62 perf-profile.children.cycles-pp.__pagevec_release
0.00 +18.3 18.30 perf-profile.children.cycles-pp.splice_folio_into_pipe
0.00 +70.5 70.52 perf-profile.children.cycles-pp.filemap_splice_read
16.46 -16.5 0.00 perf-profile.self.cycles-pp.filemap_read
16.16 -16.2 0.00 perf-profile.self.cycles-pp.copy_page_to_iter_pipe
0.95 ± 2% -0.4 0.54 perf-profile.self.cycles-pp.atime_needs_update
0.86 -0.4 0.50 perf-profile.self.cycles-pp.current_time
0.50 -0.2 0.25 ? 2% perf-profile.self.cycles-pp.touch_atime
2.32 ± 2% -0.1 2.19 perf-profile.self.cycles-pp.folio_mark_accessed
0.27 -0.1 0.15 ± 3% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
0.20 ± 3% -0.1 0.11 ± 4% perf-profile.self.cycles-pp.make_vfsgid
0.19 ± 3% -0.1 0.12 ± 3% perf-profile.self.cycles-pp.make_vfsuid
0.23 ± 2% +0.0 0.25 perf-profile.self.cycles-pp.__get_task_ioprio
0.23 ± 2% +0.0 0.25 ± 2% perf-profile.self.cycles-pp.aa_file_perm
0.27 ± 3% +0.0 0.29 ± 2% perf-profile.self.cycles-pp.fsnotify_perm
0.56 +0.0 0.58 perf-profile.self.cycles-pp.xas_descend
0.14 ± 2% +0.0 0.16 ± 2% perf-profile.self.cycles-pp.rw_verify_area
0.35 +0.0 0.38 ± 2% perf-profile.self.cycles-pp.xas_start
0.19 ± 2% +0.0 0.22 ± 2% perf-profile.self.cycles-pp.rcu_all_qs
0.33 +0.0 0.35 ± 2% perf-profile.self.cycles-pp.splice_direct_to_actor
0.25 ± 2% +0.0 0.28 perf-profile.self.cycles-pp.splice_from_pipe_next
0.37 +0.0 0.41 perf-profile.self.cycles-pp.apparmor_file_permission
0.48 +0.0 0.52 perf-profile.self.cycles-pp.pipe_to_null
0.72 +0.0 0.76 perf-profile.self.cycles-pp.xas_load
0.31 ± 2% +0.0 0.35 ± 2% perf-profile.self.cycles-pp.security_file_permission
0.50 ± 2% +0.0 0.54 perf-profile.self.cycles-pp.vfs_splice_read
0.48 ± 2% +0.1 0.54 perf-profile.self.cycles-pp.__cond_resched
0.00 +0.1 0.07 ± 5% perf-profile.self.cycles-pp.mlock_drain_local
1.75 +0.1 1.85 perf-profile.self.cycles-pp.page_cache_pipe_buf_confirm
1.02 +0.1 1.14 perf-profile.self.cycles-pp.filemap_get_pages
1.39 ± 2% +0.1 1.54 perf-profile.self.cycles-pp.__fsnotify_parent
1.10 ± 2% +0.2 1.25 perf-profile.self.cycles-pp.splice_from_pipe
0.00 +0.2 0.19 ± 2% perf-profile.self.cycles-pp.free_unref_page_list
0.00 +0.2 0.24 ± 2% perf-profile.self.cycles-pp.lru_add_drain_cpu
0.00 +0.3 0.32 ? 2% perf-profile.self.cycles-pp.__pagevec_release
0.00 +0.4 0.40 perf-profile.self.cycles-pp.__mem_cgroup_uncharge_list
8.70 +0.6 9.31 perf-profile.self.cycles-pp.__splice_from_pipe
9.60 +1.6 11.20 perf-profile.self.cycles-pp.page_cache_pipe_buf_release
22.00 ± 4% +2.1 24.10 perf-profile.self.cycles-pp.filemap_get_read_batch
0.00 +6.5 6.50 perf-profile.self.cycles-pp.filemap_splice_read
0.00 +13.3 13.29 perf-profile.self.cycles-pp.release_pages
0.00 +17.5 17.53 perf-profile.self.cycles-pp.splice_folio_into_pipe




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



Attachments:
(No filename) (15.78 kB)
config-6.4.0-rc2-00028-g2cb1e08985e3 (161.11 kB)
job-script (9.24 kB)
job.yaml (6.50 kB)
reproduce (351.00 B)

2023-06-12 12:31:03

by David Howells

Subject: Re: [linux-next:master] [splice] 2cb1e08985: stress-ng.sendfile.ops_per_sec 11.6% improvement

kernel test robot <[email protected]> wrote:

> kernel test robot noticed an 11.6% improvement of stress-ng.sendfile.ops_per_sec on:

If it's sending to a socket, this is entirely feasible. The
splice_to_socket() function now passes multiple pages in one go to the network
protocol's sendmsg() method instead of using sendpage to send one
page at a time.

David