2024-03-03 13:17:34

by Oliver Sang

Subject: [viro-vfs:work.simple_recursive_removal] [rpc_pipe] 1d8f830b7f: fsmark.app_overhead 17.5% regression


hi, Al Viro,

besides the performance report below, we noticed the following stderr output
"A_dependency_job_for_nfs-idmapd.service_failed.See'journalctl-xe'for_details"
while testing, which we did not observe on the parent commit.

we also noticed messages like the following in the attached kmsg.xz:
kern :warn : [ 61.938032] NFSD: Unable to initialize client recovery tracking! (-110)
kern :info : [ 61.946579] NFSD: starting 90-second grace period (net f0000000)
kern :warn : [ 63.146060] NFSD: Unable to initialize client recovery tracking! (-110)
kern :info : [ 63.154531] NFSD: starting 90-second grace period (net f0000000)

which we did not observe in the parent tests, either.

we are not sure whether these impact the performance results; all of the above is FYI.
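As an aside, the "(-110)" in the NFSD warnings above is the negated Linux errno value ETIMEDOUT, i.e. client recovery tracking initialization timed out. This can be confirmed from Python's errno table on a Linux host:

```python
import errno
import os

# On Linux, errno 110 is ETIMEDOUT ("Connection timed out"), so the
# NFSD warning's (-110) means the recovery-tracking setup timed out.
print(errno.errorcode[110])  # -> 'ETIMEDOUT'
print(os.strerror(110))
```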


Hello,

kernel test robot noticed a 17.5% regression of fsmark.app_overhead on:


commit: 1d8f830b7f27ba92a700a86988695e965f73b15a ("rpc_pipe: get rid of __rpc_lookup_create_exclusive()")
https://git.kernel.org/cgit/linux/kernel/git/viro/vfs.git work.simple_recursive_removal

testcase: fsmark
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
parameters:

iterations: 1x
nr_threads: 32t
disk: 1SSD
fs: xfs
fs2: nfsv4
filesize: 8K
test_size: 400M
sync_method: fsyncBeforeClose
nr_directories: 16d
nr_files_per_directory: 256fpd
cpufreq_governor: performance
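For orientation, the parameter list above maps roughly onto a plain fs_mark invocation. This is a sketch, not the LKP job file: the mount point is hypothetical, and the per-thread file count (~400M total / 8K / 32 threads ≈ 1600) is an assumed split, since the exact mapping LKP uses is not shown here.

```shell
# Hypothetical stand-alone equivalent of the LKP fsmark job above;
# /mnt/nfs is an assumed NFSv4 mount backed by the XFS-formatted SSD.
#   -s 8192 : filesize 8K            -t 32 : nr_threads 32t
#   -n 1600 : ~400M/8K/32 (assumed)  -D 16 : nr_directories 16d
#   -N 256  : 256 files per subdir   -S 1  : fsyncBeforeClose
fs_mark -d /mnt/nfs/fsmark -s 8192 -n 1600 -t 32 -D 16 -N 256 -S 1
```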


In addition to that, the commit also has a significant impact on the following test:

+------------------+-------------------------------------------------------------------------------------------------+
| testcase: change | fsmark: fsmark.app_overhead 20.3% regression |
| test machine | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory |
| test parameters | cpufreq_governor=performance |
| | disk=1SSD |
| | filesize=9B |
| | fs2=nfsv4 |
| | fs=xfs |
| | iterations=1x |
| | nr_directories=16d |
| | nr_files_per_directory=256fpd |
| | nr_threads=32t |
| | sync_method=fsyncBeforeClose |
| | test_size=400M |
+------------------+-------------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240301/[email protected]

=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
gcc-12/performance/1SSD/8K/nfsv4/xfs/1x/x86_64-rhel-8.3/16d/256fpd/32t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-ivb-2ep1/400M/fsmark

commit:
82a439e321 ("rpc_pipe: fold __rpc_mkpipe_dentry() into its sole caller")
1d8f830b7f ("rpc_pipe: get rid of __rpc_lookup_create_exclusive()")

82a439e3213e7095 1d8f830b7f27ba92a700a869886
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:6 100% 6:6 stderr.A_dependency_job_for_nfs-idmapd.service_failed.See'journalctl-xe'for_details
:6 100% 6:6 stderr.has_stderr
%stddev %change %stddev
\ | \
2398327 +12.3% 2692538 cpuidle..usage
107204 +140.2% 257509 ± 16% meminfo.AnonHugePages
304743 +9.2% 332637 ± 2% vmstat.system.cs
413300 ± 6% +14.4% 472827 ± 3% numa-numastat.node0.local_node
445362 ± 5% +12.9% 502812 ± 2% numa-numastat.node0.numa_hit
445128 ± 5% +13.1% 503240 ± 2% numa-vmstat.node0.numa_hit
413066 ± 6% +14.6% 473254 ± 3% numa-vmstat.node0.numa_local
66.15 ± 2% +164.6% 175.04 uptime.boot
2769 ± 2% +188.3% 7983 uptime.idle
65542525 +17.5% 77003387 fsmark.app_overhead
55.33 +4.2% 57.67 fsmark.time.percent_of_cpu_this_job_got
288191 +5.1% 302810 fsmark.time.voluntary_context_switches
74.50 +2.9% 76.68 iostat.cpu.idle
17.01 -10.9% 15.15 ± 2% iostat.cpu.iowait
2.59 -14.6% 2.21 iostat.cpu.user
19.75 -2.3 17.47 ± 2% mpstat.cpu.all.iowait%
2.63 -0.4 2.23 mpstat.cpu.all.usr%
12.34 ± 2% -17.5% 10.18 ± 2% mpstat.max_utilization_pct
53999 ± 2% +201.1% 162576 sched_debug.cpu_clk
53400 ± 2% +203.3% 161979 ± 2% sched_debug.ktime
54525 ± 2% +199.2% 163131 ± 2% sched_debug.sched_clk
52.21 +140.6% 125.62 ? 16% proc-vmstat.nr_anon_transparent_hugepages
8420 -3.2% 8154 proc-vmstat.nr_mapped
41883 +2.1% 42754 proc-vmstat.nr_slab_reclaimable
31420 +8.4% 34074 proc-vmstat.nr_slab_unreclaimable
704537 +7.9% 760346 proc-vmstat.numa_hit
654821 +8.5% 710630 proc-vmstat.numa_local
1039627 +6.1% 1103455 proc-vmstat.pgalloc_normal
729116 ± 6% +16.2% 846939 proc-vmstat.pgfree
1.13 ± 3% +9.8% 1.24 ± 2% perf-stat.i.MPKI
2.134e+09 -5.1% 2.026e+09 perf-stat.i.branch-instructions
4.84 -0.2 4.63 perf-stat.i.branch-miss-rate%
1.03e+08 -8.7% 94072597 perf-stat.i.branch-misses
6.16 ± 2% -0.3 5.86 ± 3% perf-stat.i.cache-miss-rate%
11302040 ± 2% +6.8% 12067382 ± 3% perf-stat.i.cache-misses
2.199e+08 +9.6% 2.409e+08 perf-stat.i.cache-references
360033 +14.8% 413385 perf-stat.i.context-switches
2239 ± 4% +13.0% 2530 ± 4% perf-stat.i.cpu-migrations
1472 ± 2% -6.0% 1384 ± 2% perf-stat.i.cycles-between-cache-misses
1.039e+10 -5.7% 9.795e+09 perf-stat.i.instructions
0.64 -5.0% 0.60 perf-stat.i.ipc
7.68 +14.3% 8.78 perf-stat.i.metric.K/sec
9527 ± 3% -13.3% 8258 ± 7% perf-stat.i.minor-faults
9528 ± 3% -13.3% 8258 ± 7% perf-stat.i.page-faults
1.09 ± 3% +13.1% 1.23 ± 2% perf-stat.overall.MPKI
4.83 -0.2 4.65 perf-stat.overall.branch-miss-rate%
1.59 +6.5% 1.70 perf-stat.overall.cpi
1464 ± 2% -5.9% 1378 ± 2% perf-stat.overall.cycles-between-cache-misses
0.63 -6.1% 0.59 perf-stat.overall.ipc
1.955e+09 -5.0% 1.857e+09 perf-stat.ps.branch-instructions
94404569 -8.6% 86246777 perf-stat.ps.branch-misses
10360850 ± 2% +6.8% 11061496 ± 3% perf-stat.ps.cache-misses
2.017e+08 +9.5% 2.209e+08 perf-stat.ps.cache-references
330250 +14.8% 379036 perf-stat.ps.context-switches
2054 ± 4% +12.9% 2320 ± 4% perf-stat.ps.cpu-migrations
9.517e+09 -5.7% 8.979e+09 perf-stat.ps.instructions
8697 ± 3% -13.1% 7557 ± 7% perf-stat.ps.minor-faults
8697 ± 3% -13.1% 7558 ± 7% perf-stat.ps.page-faults
1.151e+11 -6.0% 1.082e+11 perf-stat.total.instructions


***************************************************************************************************
lkp-ivb-2ep1: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
gcc-12/performance/1SSD/9B/nfsv4/xfs/1x/x86_64-rhel-8.3/16d/256fpd/32t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-ivb-2ep1/400M/fsmark

commit:
82a439e321 ("rpc_pipe: fold __rpc_mkpipe_dentry() into its sole caller")
1d8f830b7f ("rpc_pipe: get rid of __rpc_lookup_create_exclusive()")

82a439e3213e7095 1d8f830b7f27ba92a700a869886
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:6 100% 6:6 stderr.A_dependency_job_for_nfs-idmapd.service_failed.See'journalctl-xe'for_details
:6 100% 6:6 stderr.has_stderr
%stddev %change %stddev
\ | \
4652582 +12.3% 5226642 cpuidle..usage
1.26 ± 74% +2.3 3.56 ± 31% perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
188802 ± 2% +63.3% 308290 ± 19% meminfo.AnonHugePages
140056 +10.9% 155322 meminfo.SUnreclaim
5.50 +0.6 6.05 ± 2% mpstat.cpu.all.sys%
12.67 -16.1% 10.63 mpstat.max_utilization_pct
655519 ± 2% +14.2% 748457 ± 2% numa-numastat.node0.local_node
680860 +14.6% 779999 ± 2% numa-numastat.node0.numa_hit
681047 +14.6% 780819 ± 2% numa-vmstat.node0.numa_hit
655706 ± 2% +14.3% 749281 ± 2% numa-vmstat.node0.numa_local
77.80 ± 3% +133.7% 181.80 uptime.boot
3182 ? 4% +156.6% 8168 uptime.idle
349730 +15.0% 402139 vmstat.system.cs
71199 +5.9% 75404 vmstat.system.in
1.219e+08 +20.3% 1.467e+08 fsmark.app_overhead
55.83 +3.3% 57.67 fsmark.time.percent_of_cpu_this_job_got
574683 +3.9% 597195 fsmark.time.voluntary_context_switches
18.20 -5.0% 17.30 iostat.cpu.iowait
6.17 +8.8% 6.72 iostat.cpu.system
2.48 -11.1% 2.21 iostat.cpu.user
56144 ± 4% +184.2% 159591 sched_debug.cpu_clk
55548 ± 4% +186.2% 158995 sched_debug.ktime
56677 ± 4% +182.5% 160122 sched_debug.sched_clk
91.96 ± 2% +64.1% 150.93 ± 18% proc-vmstat.nr_anon_transparent_hugepages
8527 -4.7% 8128 proc-vmstat.nr_mapped
35029 +10.9% 38832 proc-vmstat.nr_slab_unreclaimable
980307 +11.0% 1087887 proc-vmstat.numa_hit
930596 +11.4% 1036273 proc-vmstat.numa_local
1446464 +7.9% 1560287 proc-vmstat.pgalloc_normal
1.06 ± 2% +8.4% 1.15 ± 3% perf-stat.i.MPKI
5.00 -0.2 4.83 perf-stat.i.branch-miss-rate%
1.016e+08 -2.9% 98670000 perf-stat.i.branch-misses
10378740 ± 2% +9.1% 11323437 ± 3% perf-stat.i.cache-misses
2.348e+08 +9.3% 2.566e+08 perf-stat.i.cache-references
396967 +13.9% 452135 perf-stat.i.context-switches
1.70 +2.9% 1.74 perf-stat.i.cpi
1.647e+10 +4.9% 1.728e+10 perf-stat.i.cpu-cycles
2382 +13.4% 2702 ± 2% perf-stat.i.cpu-migrations
0.61 -4.0% 0.58 perf-stat.i.ipc
8.36 +14.0% 9.52 perf-stat.i.metric.K/sec
1.05 ± 3% +9.2% 1.14 ± 3% perf-stat.overall.MPKI
4.98 -0.2 4.83 perf-stat.overall.branch-miss-rate%
1.66 +5.1% 1.75 perf-stat.overall.cpi
0.60 -4.8% 0.57 perf-stat.overall.ipc
96867729 -2.8% 94186227 perf-stat.ps.branch-misses
9895781 ± 2% +9.2% 10807588 ± 3% perf-stat.ps.cache-misses
2.239e+08 +9.4% 2.449e+08 perf-stat.ps.cache-references
378576 +14.0% 431601 perf-stat.ps.context-switches
1.571e+10 +5.0% 1.65e+10 perf-stat.ps.cpu-cycles
2271 +13.5% 2579 ± 2% perf-stat.ps.cpu-migrations





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Attachments:
kmsg.xz (25.77 kB)