2022-11-11 09:13:56

by Yujie Liu

Subject: [linus:master] [shmem] 4601e2fc8b: will-it-scale.per_thread_ops 13.7% improvement

Greetings,

FYI, we noticed a 13.7% improvement of will-it-scale.per_thread_ops due to commit:

commit: 4601e2fc8b57840660ce1a1ee98aea873fa15eee ("shmem: convert shmem_file_read_iter() to use shmem_get_folio()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
with following parameters:

nr_task: 100%
mode: thread
test: pread2
cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based variant of each test in order to expose any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale


Details are as below:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/thread/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp5/pread2/will-it-scale

commit:
eff1f906c2 ("shmem: convert shmem_write_begin() to use shmem_get_folio()")
4601e2fc8b ("shmem: convert shmem_file_read_iter() to use shmem_get_folio()")

eff1f906c2dcd83c 4601e2fc8b57840660ce1a1ee98
---------------- ---------------------------
         %stddev      %change          %stddev
             \            |                \
1508791 ± 3% +13.7% 1715505 ± 2% will-it-scale.128.threads
11786 ± 3% +13.7% 13401 ± 2% will-it-scale.per_thread_ops
1508791 ± 3% +13.7% 1715505 ± 2% will-it-scale.workload
2.92 ± 15% +43.7% 4.20 ± 16% turbostat.CPU%c1
58550 ± 4% -16.4% 48936 ± 5% sched_debug.cfs_rq:/.min_vruntime.stddev
0.20 ± 9% +17.5% 0.23 ± 5% sched_debug.cfs_rq:/.nr_running.stddev
58605 ± 5% -16.5% 48957 ± 5% sched_debug.cfs_rq:/.spread0.stddev
191.02 ± 4% +16.1% 221.72 ± 5% sched_debug.cfs_rq:/.util_est_enqueued.stddev
0.23 ± 3% +11.1% 0.25 ± 4% sched_debug.cpu.nr_running.stddev
12.20 -1.1% 12.07 perf-stat.i.cpi
0.00 ± 9% -0.0 0.00 ± 5% perf-stat.i.dTLB-store-miss-rate%
9.003e+08 ± 2% +6.4% 9.582e+08 perf-stat.i.dTLB-stores
82.71 +2.2 84.95 perf-stat.i.node-store-miss-rate%
5815837 +10.2% 6408731 perf-stat.i.node-store-misses
1223798 ± 2% -6.6% 1142824 ± 2% perf-stat.i.node-stores
12.19 -1.0% 12.06 perf-stat.overall.cpi
0.01 ± 3% -0.0 0.00 ± 5% perf-stat.overall.dTLB-store-miss-rate%
82.60 +2.2 84.85 perf-stat.overall.node-store-miss-rate%
6712074 ± 2% -12.0% 5904631 ± 2% perf-stat.overall.path-length
8.981e+08 ± 2% +6.4% 9.558e+08 perf-stat.ps.dTLB-stores
5796378 +10.2% 6387291 perf-stat.ps.node-store-misses
1220724 ± 2% -6.6% 1140426 ± 2% perf-stat.ps.node-stores
41.14 -41.1 0.00 perf-profile.calltrace.cycles-pp.shmem_getpage.shmem_file_read_iter.vfs_read.__x64_sys_pread64.do_syscall_64
41.10 -41.1 0.00 perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_getpage.shmem_file_read_iter.vfs_read.__x64_sys_pread64
41.04 -41.0 0.00 perf-profile.calltrace.cycles-pp.__filemap_get_folio.shmem_get_folio_gfp.shmem_getpage.shmem_file_read_iter.vfs_read
40.18 -40.2 0.00 perf-profile.calltrace.cycles-pp.folio_wait_bit_common.__filemap_get_folio.shmem_get_folio_gfp.shmem_getpage.shmem_file_read_iter
39.18 -39.2 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.folio_wait_bit_common.__filemap_get_folio.shmem_get_folio_gfp.shmem_getpage
0.00 +0.6 0.59 ± 7% perf-profile.calltrace.cycles-pp.io_schedule.folio_wait_bit_common.__filemap_get_folio.shmem_get_folio_gfp.shmem_file_read_iter
0.00 +39.4 39.45 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.folio_wait_bit_common.__filemap_get_folio.shmem_get_folio_gfp.shmem_file_read_iter
0.00 +40.5 40.46 perf-profile.calltrace.cycles-pp.folio_wait_bit_common.__filemap_get_folio.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read
0.00 +41.2 41.24 perf-profile.calltrace.cycles-pp.__filemap_get_folio.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.__x64_sys_pread64
0.00 +41.3 41.30 perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.__x64_sys_pread64.do_syscall_64
41.14 -41.1 0.00 perf-profile.children.cycles-pp.shmem_getpage
0.10 ± 4% +0.0 0.12 ± 4% perf-profile.children.cycles-pp.copyout
0.12 ± 3% +0.0 0.14 ± 3% perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
0.07 +0.0 0.09 perf-profile.children.cycles-pp.folio_unlock
0.12 ± 3% +0.0 0.14 ± 3% perf-profile.children.cycles-pp._copy_to_iter
0.13 ± 2% +0.0 0.15 ± 4% perf-profile.children.cycles-pp.copy_page_to_iter
0.00 +0.1 0.06 ± 9% perf-profile.children.cycles-pp.PageHeadHuge
0.46 -0.1 0.37 ± 3% perf-profile.self.cycles-pp.shmem_file_read_iter
0.82 ± 2% -0.1 0.74 ± 4% perf-profile.self.cycles-pp.__filemap_get_folio
0.12 ± 3% +0.0 0.14 ± 3% perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
0.07 +0.0 0.09 perf-profile.self.cycles-pp.folio_unlock


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp dirs to run from a clean state.


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://01.org/lkp


Attachments:
(No filename) (6.54 kB)
config-6.0.0-rc3-00333-g4601e2fc8b57 (166.92 kB)
job-script (8.00 kB)
job.yaml (5.42 kB)
reproduce (356.00 B)