2022-08-26 03:26:29

by kernel test robot

[permalink] [raw]
Subject: [smb3] 5efdd9122e: filebench.sum_operations/s -50.0% regression

Greeting,

FYI, we noticed a -50.0% regression of filebench.sum_operations/s due to commit:


commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3 ("smb3: allow deferred close timeout to be configurable")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: filebench
on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz (Cascade Lake) with 128G memory
with following parameters:

disk: 1HDD
fs: ext4
fs2: cifs
test: filemicro_delete.f
cpufreq_governor: performance
ucode: 0x5003302


=========================================================================================
compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase/ucode:
gcc-11/performance/1HDD/cifs/ext4/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/filemicro_delete.f/filebench/0x5003302

commit:
dcb45fd7f5 ("cifs: Do not use tcon->cfid directly, use the cfid we get from open_cached_dir")
5efdd9122e ("smb3: allow deferred close timeout to be configurable")

dcb45fd7f501f864 5efdd9122eff772eae2feae9f0f
---------------- ---------------------------
%stddev %change %stddev
\ | \
515.95 -50.0% 257.98 filebench.sum_operations/s
4.81 ± 2% +1038.4% 54.78 ± 6% filebench.sum_time_ms/op
29.00 ± 8% +212.1% 90.50 ± 3% filebench.time.percent_of_cpu_this_job_got
24629 +2.7% 25297 filebench.time.voluntary_context_switches
7.685e+08 +19.3% 9.169e+08 ± 4% cpuidle..time
2.53 ± 6% -20.6% 2.00 ± 3% iostat.cpu.iowait
1506141 ± 8% +22.4% 1843256 ± 3% turbostat.IRQ
2.00 -50.0% 1.00 vmstat.procs.b
21969 ± 2% -9.5% 19885 ± 2% vmstat.system.cs
3.06 ± 7% -0.7 2.35 ± 4% mpstat.cpu.all.iowait%
0.79 ± 5% +0.5 1.27 ± 2% mpstat.cpu.all.sys%
0.89 ± 3% -0.1 0.79 ± 3% mpstat.cpu.all.usr%
34.55 ± 14% -34.8% 22.51 ± 27% sched_debug.cfs_rq:/.removed.runnable_avg.avg
119.64 ± 3% -20.0% 95.69 ± 17% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
34.55 ± 14% -34.8% 22.51 ± 27% sched_debug.cfs_rq:/.removed.util_avg.avg
119.64 ± 3% -20.0% 95.69 ± 17% sched_debug.cfs_rq:/.removed.util_avg.stddev
5249 +15.8% 6076 meminfo.Active
3866 ± 2% +17.7% 4552 meminfo.Active(anon)
1382 ± 4% +10.3% 1524 ± 4% meminfo.Active(file)
69791 ± 14% +39.8% 97553 ± 6% meminfo.AnonHugePages
72709 ± 2% +12.5% 81779 ± 3% meminfo.Inactive(file)
23219 +13.5% 26352 ± 3% meminfo.KernelStack
966.50 ± 2% +17.7% 1137 proc-vmstat.nr_active_anon
74302 +6.3% 78977 ± 2% proc-vmstat.nr_anon_pages
81133 +6.0% 85973 proc-vmstat.nr_inactive_anon
18172 ± 2% +12.5% 20442 ± 3% proc-vmstat.nr_inactive_file
23213 +13.5% 26348 ± 3% proc-vmstat.nr_kernel_stack
17983 +2.3% 18400 proc-vmstat.nr_mapped
7446 ± 2% +5.5% 7853 ± 3% proc-vmstat.nr_shmem
26888 +1.6% 27306 proc-vmstat.nr_slab_reclaimable
47220 +3.4% 48803 proc-vmstat.nr_slab_unreclaimable
966.50 ± 2% +17.7% 1137 proc-vmstat.nr_zone_active_anon
81133 +6.0% 85973 proc-vmstat.nr_zone_inactive_anon
18172 ± 2% +12.5% 20442 ± 3% proc-vmstat.nr_zone_inactive_file
361460 +2.5% 370454 proc-vmstat.numa_hit
946.67 +18.6% 1122 proc-vmstat.pgactivate
361562 +2.5% 370553 proc-vmstat.pgalloc_normal
187906 +4.7% 196761 proc-vmstat.pgfault
8189 +2.5% 8395 proc-vmstat.pgreuse
1.097e+09 +15.5% 1.267e+09 ± 7% perf-stat.i.branch-instructions
39079265 ± 6% -20.9% 30915354 ± 4% perf-stat.i.branch-misses
5093263 ± 4% -23.7% 3884752 ± 9% perf-stat.i.cache-misses
29213 -18.7% 23764 ± 5% perf-stat.i.context-switches
7.666e+09 ± 4% +5.7% 8.106e+09 ± 2% perf-stat.i.cpu-cycles
1877 ± 15% +75.1% 3287 ± 12% perf-stat.i.cycles-between-cache-misses
1735450 ± 3% -12.9% 1512060 ± 3% perf-stat.i.iTLB-load-misses
2898 ± 3% +34.4% 3895 ± 7% perf-stat.i.instructions-per-iTLB-miss
1493 -20.3% 1190 ± 7% perf-stat.i.major-faults
0.09 ± 3% +5.8% 0.09 ± 2% perf-stat.i.metric.GHz
48.47 ± 11% +8.4 56.83 ± 7% perf-stat.i.node-store-miss-rate%
283426 ± 4% -21.6% 222190 ± 10% perf-stat.i.node-stores
3.57 ± 7% -1.1 2.44 ± 6% perf-stat.overall.branch-miss-rate%
1508 ± 3% +39.8% 2108 ± 9% perf-stat.overall.cycles-between-cache-misses
3022 ± 3% +23.6% 3736 ± 5% perf-stat.overall.instructions-per-iTLB-miss
9.585e+08 +18.8% 1.138e+09 ± 6% perf-stat.ps.branch-instructions
34151514 ± 6% -18.8% 27725316 ± 4% perf-stat.ps.branch-misses
4450329 ± 5% -21.7% 3486409 ± 9% perf-stat.ps.cache-misses
25524 -16.4% 21333 ± 4% perf-stat.ps.context-switches
77139 +2.5% 79105 perf-stat.ps.cpu-clock
6.704e+09 ± 4% +8.7% 7.287e+09 perf-stat.ps.cpu-cycles
1.06e+09 +11.3% 1.18e+09 ± 5% perf-stat.ps.dTLB-loads
1517349 ± 3% -10.5% 1357716 ± 2% perf-stat.ps.iTLB-load-misses
4.582e+09 +10.8% 5.075e+09 ± 6% perf-stat.ps.instructions
1296 -18.1% 1061 ± 6% perf-stat.ps.major-faults
247613 ± 4% -19.5% 199283 ± 9% perf-stat.ps.node-stores
77139 +2.5% 79105 perf-stat.ps.task-clock
3.697e+10 +35.3% 5.003e+10 perf-stat.total.instructions
8.51 ± 91% -6.9 1.59 ±223% perf-profile.calltrace.cycles-pp.getdents64
8.34 ± 83% -6.7 1.67 ±141% perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.open64
7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getdents64
7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.iterate_dir.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
6.26 ±115% -5.4 0.88 ±223% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.26 ±115% -5.4 0.88 ±223% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare
6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop
6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record
4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist
4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push
4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record
3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.mmput.exit_mm.do_exit
3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.mmput.exit_mm
3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.mmput
3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
8.51 ± 91% -6.9 1.59 ±223% perf-profile.children.cycles-pp.getdents64
8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.exit_to_user_mode_loop
8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
6.25 ±107% -6.2 0.00 perf-profile.children.cycles-pp.open64
7.63 ± 84% -6.0 1.59 ±223% perf-profile.children.cycles-pp.__x64_sys_getdents64
7.63 ± 84% -6.0 1.59 ±223% perf-profile.children.cycles-pp.iterate_dir
6.26 ±115% -4.6 1.67 ±141% perf-profile.children.cycles-pp.arch_do_signal_or_restart
6.26 ±115% -4.6 1.67 ±141% perf-profile.children.cycles-pp.get_signal
4.57 ± 73% -3.8 0.76 ±223% perf-profile.children.cycles-pp.perf_mmap__read_head
3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.unmap_vmas
3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.unmap_page_range
3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.zap_pmd_range
3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.zap_pte_range
8.54 ± 43% +8.6 17.19 ± 38% perf-profile.children.cycles-pp.asm_exc_page_fault


If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


#regzbot introduced: 5efdd9122e


--
0-DAY CI Kernel Test Service
https://01.org/lkp


Attachments:
config-5.19.0-00024-g5efdd9122eff (166.60 kB)
job-script (8.28 kB)
job.yaml (5.72 kB)
reproduce (628.00 B)
Download all attachments

2022-09-04 11:31:25

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: [smb3] 5efdd9122e: filebench.sum_operations/s -50.0% regression

Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

As per recent general discussions with the 0-day folks, I'm dropping
below regression from the list of tracked issues, as this report didn't
gain any traction. That for example can happen if the developers
considered the regression of no practical relevance, as they assume it
only materializes in micro-benchmarks, is due to a broken test case, or
some fluke.

Not sure if that or something else is the reason why this particular
report didn't gain any traction, but I lack the bandwidth to follow-up
on each and every regression some CI system found and reported. At the
same time I don't want to keep these reports in the list of tracked
issues forever, as that creates noise and makes it harder to spot the
important issues in regzbot's reports and lists. That's why I hereby
remove it:

#regzbot invalid: 0-day report that didn't get traction; might be of no
relevance

Ciao, Thorsten

On 26.08.22 04:41, kernel test robot wrote:
> Greeting,
>
> FYI, we noticed a -50.0% regression of filebench.sum_operations/s due to
> commit:
>
>
> commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3 ("smb3: allow deferred
> close timeout to be configurable")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: filebench
> on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @
> 2.10GHz (Cascade Lake) with 128G memory
> with following parameters:
>
>     disk: 1HDD
>     fs: ext4
>     fs2: cifs
>     test: filemicro_delete.f
>     cpufreq_governor: performance
>     ucode: 0x5003302
>
>
> =========================================================================================
> compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase/ucode:
>  
> gcc-11/performance/1HDD/cifs/ext4/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/filemicro_delete.f/filebench/0x5003302
>
> commit:
>   dcb45fd7f5 ("cifs: Do not use tcon->cfid directly, use the cfid we get
> from open_cached_dir")
>   5efdd9122e ("smb3: allow deferred close timeout to be configurable")
>
> dcb45fd7f501f864 5efdd9122eff772eae2feae9f0f
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>     515.95           -50.0%     257.98        filebench.sum_operations/s
>       4.81 ±  2%   +1038.4%      54.78 ±  6%  filebench.sum_time_ms/op
>      29.00 ±  8%    +212.1%      90.50 ±  3% 
> filebench.time.percent_of_cpu_this_job_got
>      24629            +2.7%      25297       
> filebench.time.voluntary_context_switches
>  7.685e+08           +19.3%  9.169e+08 ±  4%  cpuidle..time
>       2.53 ±  6%     -20.6%       2.00 ±  3%  iostat.cpu.iowait
>    1506141 ±  8%     +22.4%    1843256 ±  3%  turbostat.IRQ
>       2.00           -50.0%       1.00        vmstat.procs.b
>      21969 ±  2%      -9.5%      19885 ±  2%  vmstat.system.cs
>       3.06 ±  7%      -0.7        2.35 ±  4%  mpstat.cpu.all.iowait%
>       0.79 ±  5%      +0.5        1.27 ±  2%  mpstat.cpu.all.sys%
>       0.89 ±  3%      -0.1        0.79 ±  3%  mpstat.cpu.all.usr%
>      34.55 ± 14%     -34.8%      22.51 ± 27% 
> sched_debug.cfs_rq:/.removed.runnable_avg.avg
>     119.64 ±  3%     -20.0%      95.69 ± 17% 
> sched_debug.cfs_rq:/.removed.runnable_avg.stddev
>      34.55 ± 14%     -34.8%      22.51 ± 27% 
> sched_debug.cfs_rq:/.removed.util_avg.avg
>     119.64 ±  3%     -20.0%      95.69 ± 17% 
> sched_debug.cfs_rq:/.removed.util_avg.stddev
>       5249           +15.8%       6076        meminfo.Active
>       3866 ±  2%     +17.7%       4552        meminfo.Active(anon)
>       1382 ±  4%     +10.3%       1524 ±  4%  meminfo.Active(file)
>      69791 ± 14%     +39.8%      97553 ±  6%  meminfo.AnonHugePages
>      72709 ±  2%     +12.5%      81779 ±  3%  meminfo.Inactive(file)
>      23219           +13.5%      26352 ±  3%  meminfo.KernelStack
>     966.50 ±  2%     +17.7%       1137        proc-vmstat.nr_active_anon
>      74302            +6.3%      78977 ±  2%  proc-vmstat.nr_anon_pages
>      81133            +6.0%      85973        proc-vmstat.nr_inactive_anon
>      18172 ±  2%     +12.5%      20442 ±  3%  proc-vmstat.nr_inactive_file
>      23213           +13.5%      26348 ±  3%  proc-vmstat.nr_kernel_stack
>      17983            +2.3%      18400        proc-vmstat.nr_mapped
>       7446 ±  2%      +5.5%       7853 ±  3%  proc-vmstat.nr_shmem
>      26888            +1.6%      27306       
> proc-vmstat.nr_slab_reclaimable
>      47220            +3.4%      48803       
> proc-vmstat.nr_slab_unreclaimable
>     966.50 ±  2%     +17.7%       1137       
> proc-vmstat.nr_zone_active_anon
>      81133            +6.0%      85973       
> proc-vmstat.nr_zone_inactive_anon
>      18172 ±  2%     +12.5%      20442 ±  3% 
> proc-vmstat.nr_zone_inactive_file
>     361460            +2.5%     370454        proc-vmstat.numa_hit
>     946.67           +18.6%       1122        proc-vmstat.pgactivate
>     361562            +2.5%     370553        proc-vmstat.pgalloc_normal
>     187906            +4.7%     196761        proc-vmstat.pgfault
>       8189            +2.5%       8395        proc-vmstat.pgreuse
>  1.097e+09           +15.5%  1.267e+09 ±  7% 
> perf-stat.i.branch-instructions
>   39079265 ±  6%     -20.9%   30915354 ±  4%  perf-stat.i.branch-misses
>    5093263 ±  4%     -23.7%    3884752 ±  9%  perf-stat.i.cache-misses
>      29213           -18.7%      23764 ±  5%  perf-stat.i.context-switches
>  7.666e+09 ±  4%      +5.7%  8.106e+09 ±  2%  perf-stat.i.cpu-cycles
>       1877 ± 15%     +75.1%       3287 ± 12% 
> perf-stat.i.cycles-between-cache-misses
>    1735450 ±  3%     -12.9%    1512060 ±  3%  perf-stat.i.iTLB-load-misses
>       2898 ±  3%     +34.4%       3895 ±  7% 
> perf-stat.i.instructions-per-iTLB-miss
>       1493           -20.3%       1190 ±  7%  perf-stat.i.major-faults
>       0.09 ±  3%      +5.8%       0.09 ±  2%  perf-stat.i.metric.GHz
>      48.47 ± 11%      +8.4       56.83 ±  7% 
> perf-stat.i.node-store-miss-rate%
>     283426 ±  4%     -21.6%     222190 ± 10%  perf-stat.i.node-stores
>       3.57 ±  7%      -1.1        2.44 ±  6% 
> perf-stat.overall.branch-miss-rate%
>       1508 ±  3%     +39.8%       2108 ±  9% 
> perf-stat.overall.cycles-between-cache-misses
>       3022 ±  3%     +23.6%       3736 ±  5% 
> perf-stat.overall.instructions-per-iTLB-miss
>  9.585e+08           +18.8%  1.138e+09 ±  6% 
> perf-stat.ps.branch-instructions
>   34151514 ±  6%     -18.8%   27725316 ±  4%  perf-stat.ps.branch-misses
>    4450329 ±  5%     -21.7%    3486409 ±  9%  perf-stat.ps.cache-misses
>      25524           -16.4%      21333 ±  4%  perf-stat.ps.context-switches
>      77139            +2.5%      79105        perf-stat.ps.cpu-clock
>  6.704e+09 ±  4%      +8.7%  7.287e+09        perf-stat.ps.cpu-cycles
>   1.06e+09           +11.3%   1.18e+09 ±  5%  perf-stat.ps.dTLB-loads
>    1517349 ±  3%     -10.5%    1357716 ±  2%  perf-stat.ps.iTLB-load-misses
>  4.582e+09           +10.8%  5.075e+09 ±  6%  perf-stat.ps.instructions
>       1296           -18.1%       1061 ±  6%  perf-stat.ps.major-faults
>     247613 ±  4%     -19.5%     199283 ±  9%  perf-stat.ps.node-stores
>      77139            +2.5%      79105        perf-stat.ps.task-clock
>  3.697e+10           +35.3%  5.003e+10        perf-stat.total.instructions
>       8.51 ± 91%      -6.9        1.59 ±223% 
> perf-profile.calltrace.cycles-pp.getdents64
>       8.34 ± 83%      -6.7        1.67 ±141% 
> perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       6.25 ±107%      -6.2        0.00       
> perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
>       6.25 ±107%      -6.2        0.00       
> perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
>       6.25 ±107%      -6.2        0.00       
> perf-profile.calltrace.cycles-pp.open64
>       7.63 ± 84%      -6.0        1.59 ±223% 
> perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getdents64
>       7.63 ± 84%      -6.0        1.59 ±223% 
> perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
>       7.63 ± 84%      -6.0        1.59 ±223% 
> perf-profile.calltrace.cycles-pp.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
>       7.63 ± 84%      -6.0        1.59 ±223% 
> perf-profile.calltrace.cycles-pp.iterate_dir.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
>       6.26 ±115%      -5.4        0.88 ±223% 
> perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       6.26 ±115%      -5.4        0.88 ±223% 
> perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       6.26 ±115%      -4.6        1.67 ±141% 
> perf-profile.calltrace.cycles-pp.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare
>       6.26 ±115%      -4.6        1.67 ±141% 
> perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop
>       6.26 ±115%      -4.6        1.67 ±141% 
> perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
>       6.26 ±115%      -4.6        1.67 ±141% 
> perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
>       4.57 ± 73%      -3.8        0.76 ±223% 
> perf-profile.calltrace.cycles-pp.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record
>       4.57 ± 73%      -3.8        0.76 ±223% 
> perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist
>       4.57 ± 73%      -3.8        0.76 ±223% 
> perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push
>       4.57 ± 73%      -3.8        0.76 ±223% 
> perf-profile.calltrace.cycles-pp.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record
>       3.38 ±103%      -3.4        0.00       
> perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.mmput.exit_mm.do_exit
>       3.38 ±103%      -3.4        0.00       
> perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.mmput.exit_mm
>       3.38 ±103%      -3.4        0.00       
> perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.mmput
>       3.38 ±103%      -3.4        0.00       
> perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
>       8.51 ± 91%      -6.9        1.59 ±223% 
> perf-profile.children.cycles-pp.getdents64
>       8.34 ± 83%      -6.7        1.67 ±141% 
> perf-profile.children.cycles-pp.exit_to_user_mode_prepare
>       8.34 ± 83%      -6.7        1.67 ±141% 
> perf-profile.children.cycles-pp.exit_to_user_mode_loop
>       8.34 ± 83%      -6.7        1.67 ±141% 
> perf-profile.children.cycles-pp.syscall_exit_to_user_mode
>       6.25 ±107%      -6.2        0.00       
> perf-profile.children.cycles-pp.open64
>       7.63 ± 84%      -6.0        1.59 ±223% 
> perf-profile.children.cycles-pp.__x64_sys_getdents64
>       7.63 ± 84%      -6.0        1.59 ±223% 
> perf-profile.children.cycles-pp.iterate_dir
>       6.26 ±115%      -4.6        1.67 ±141% 
> perf-profile.children.cycles-pp.arch_do_signal_or_restart
>       6.26 ±115%      -4.6        1.67 ±141% 
> perf-profile.children.cycles-pp.get_signal
>       4.57 ± 73%      -3.8        0.76 ±223% 
> perf-profile.children.cycles-pp.perf_mmap__read_head
>       3.38 ±103%      -3.4        0.00       
> perf-profile.children.cycles-pp.unmap_vmas
>       3.38 ±103%      -3.4        0.00       
> perf-profile.children.cycles-pp.unmap_page_range
>       3.38 ±103%      -3.4        0.00       
> perf-profile.children.cycles-pp.zap_pmd_range
>       3.38 ±103%      -3.4        0.00       
> perf-profile.children.cycles-pp.zap_pte_range
>       8.54 ± 43%      +8.6       17.19 ± 38% 
> perf-profile.children.cycles-pp.asm_exc_page_fault
>
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <[email protected]>
>
>
> To reproduce:
>
>         git clone https://github.com/intel/lkp-tests.git
>         cd lkp-tests
>         sudo bin/lkp install job.yaml           # job file is attached
> in this email
>         bin/lkp split-job --compatible job.yaml # generate the yaml file
> for lkp run
>         sudo bin/lkp run generated-yaml-file
>
>         # if come across any failure that blocks the test,
>         # please remove ~/.lkp and /lkp dir to run from a clean state.
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are
> provided
> for informational purposes only. Any difference in system hardware or
> software
> design or configuration may affect actual performance.
>
>
> #regzbot introduced: 5efdd9122e
>
>

2023-03-15 07:43:12

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [smb3] 5efdd9122e: filebench.sum_operations/s -50.0% regression



On 8/26/2022 10:41 AM, kernel test robot wrote:
> Greeting,
>
> FYI, we noticed a -50.0% regression of filebench.sum_operations/s due to commit:
>
>
> commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3 ("smb3: allow deferred close timeout to be configurable")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: filebench
> on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz (Cascade Lake) with 128G memory
> with following parameters:
>
>     disk: 1HDD
>     fs: ext4
>     fs2: cifs
>     test: filemicro_delete.f
>     cpufreq_governor: performance
>     ucode: 0x5003302
Please note, we still could see this regresion on v6.3-rc2. And the regression is related with
the commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3. This commit changed the default timeout
value from 1s to 5s. If change the timeout back to 1s as following:

diff --git a/fs/cifs/fs_context.c b/fs/cifs/fs_context.c
index 6d13f8207e96..6b930fb0c4bd 100644
--- a/fs/cifs/fs_context.c
+++ b/fs/cifs/fs_context.c
@@ -1537,7 +1537,7 @@ int smb3_init_fs_context(struct fs_context *fc)

ctx->acregmax = CIFS_DEF_ACTIMEO;
ctx->acdirmax = CIFS_DEF_ACTIMEO;
- ctx->closetimeo = SMB3_DEF_DCLOSETIMEO;
+ ctx->closetimeo = CIFS_DEF_ACTIMEO;

/* Most clients set timeout to 0, allows server to use its default */
ctx->handle_timeout = 0; /* See MS-SMB2 spec section 2.2.14.2.12 */

The regression is gone:
dcb45fd7f501f864 v6.3-rc2 32715af441411a5a266606be08f
---------------- --------------------------- ---------------------------
fail:runs %reproduction fail:runs %reproduction fail:runs
| | | | |
:25 0% :3 33% 1:3 last_state.booting
:25 0% :3 33% 1:3 last_state.is_incomplete_run
%stddev %change %stddev %change %stddev
\ | \ | \
515.95 -50.0% 257.98 -0.0% 515.92 filebench.sum_operations/s
5.04 ± 7% +833.7% 47.09 ± 2% -2.9% 4.90 ± 2% filebench.sum_time_ms/op
10438 -100.0% 0.33 ±141% -100.0% 0.50 ±100% filebench.time.major_page_faults
167575 -4.1% 160660 -4.4% 160140 filebench.time.maximum_resident_set_size
7138 ± 11% +141.4% 17235 ± 3% +147.6% 17677 filebench.time.minor_page_faults
28.68 ± 9% +199.9% 86.00 ± 7% -2.4% 28.00 filebench.time.percent_of_cpu_this_job_got
2453485 ± 54% -63.0% 907380 -66.2% 830273 ± 6% cpuidle..usage
0.61 ± 38% +0.8 1.41 ± 3% +0.2 0.80 ± 4% mpstat.cpu.all.sys%
142984 ± 13% -45.6% 77725 -47.5% 75106 vmstat.system.in
34.23 ± 7% +27.9% 43.79 +27.8% 43.74 boot-time.boot
17.09 ± 11% +66.0% 28.38 +65.5% 28.28 boot-time.dhcp
2661 ± 7% +37.5% 3659 +37.2% 3651 boot-time.idle
104737 ±185% -87.8% 12762 ± 10% -89.8% 10631 ± 4% turbostat.C1


32715af441411a5a266606be08f is v6.3-rc2 with the change to restore the timeout to 1s. Thanks.

Regards
Yin, Fengwei


>
>
> =========================================================================================
> compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase/ucode:
>   gcc-11/performance/1HDD/cifs/ext4/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/filemicro_delete.f/filebench/0x5003302
>
> commit:
>   dcb45fd7f5 ("cifs: Do not use tcon->cfid directly, use the cfid we get from open_cached_dir")
>   5efdd9122e ("smb3: allow deferred close timeout to be configurable")
>
> dcb45fd7f501f864 5efdd9122eff772eae2feae9f0f
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>     515.95           -50.0%     257.98        filebench.sum_operations/s
>       4.81 ±  2%   +1038.4%      54.78 ±  6%  filebench.sum_time_ms/op
>      29.00 ±  8%    +212.1%      90.50 ±  3%  filebench.time.percent_of_cpu_this_job_got
>      24629            +2.7%      25297        filebench.time.voluntary_context_switches
>  7.685e+08           +19.3%  9.169e+08 ±  4%  cpuidle..time
>       2.53 ±  6%     -20.6%       2.00 ±  3%  iostat.cpu.iowait
>    1506141 ±  8%     +22.4%    1843256 ±  3%  turbostat.IRQ
>       2.00           -50.0%       1.00        vmstat.procs.b
>      21969 ±  2%      -9.5%      19885 ±  2%  vmstat.system.cs
>       3.06 ±  7%      -0.7        2.35 ±  4%  mpstat.cpu.all.iowait%
>       0.79 ±  5%      +0.5        1.27 ±  2%  mpstat.cpu.all.sys%
>       0.89 ±  3%      -0.1        0.79 ±  3%  mpstat.cpu.all.usr%
>      34.55 ± 14%     -34.8%      22.51 ± 27%  sched_debug.cfs_rq:/.removed.runnable_avg.avg
>     119.64 ±  3%     -20.0%      95.69 ± 17%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
>      34.55 ± 14%     -34.8%      22.51 ± 27%  sched_debug.cfs_rq:/.removed.util_avg.avg
>     119.64 ±  3%     -20.0%      95.69 ± 17%  sched_debug.cfs_rq:/.removed.util_avg.stddev
>       5249           +15.8%       6076        meminfo.Active
>       3866 ±  2%     +17.7%       4552        meminfo.Active(anon)
>       1382 ±  4%     +10.3%       1524 ±  4%  meminfo.Active(file)
>      69791 ± 14%     +39.8%      97553 ±  6%  meminfo.AnonHugePages
>      72709 ±  2%     +12.5%      81779 ±  3%  meminfo.Inactive(file)
>      23219           +13.5%      26352 ±  3%  meminfo.KernelStack
>     966.50 ±  2%     +17.7%       1137        proc-vmstat.nr_active_anon
>      74302            +6.3%      78977 ±  2%  proc-vmstat.nr_anon_pages
>      81133            +6.0%      85973        proc-vmstat.nr_inactive_anon
>      18172 ±  2%     +12.5%      20442 ±  3%  proc-vmstat.nr_inactive_file
>      23213           +13.5%      26348 ±  3%  proc-vmstat.nr_kernel_stack
>      17983            +2.3%      18400        proc-vmstat.nr_mapped
>       7446 ±  2%      +5.5%       7853 ±  3%  proc-vmstat.nr_shmem
>      26888            +1.6%      27306        proc-vmstat.nr_slab_reclaimable
>      47220            +3.4%      48803        proc-vmstat.nr_slab_unreclaimable
>     966.50 ±  2%     +17.7%       1137        proc-vmstat.nr_zone_active_anon
>      81133            +6.0%      85973        proc-vmstat.nr_zone_inactive_anon
>      18172 ±  2%     +12.5%      20442 ±  3%  proc-vmstat.nr_zone_inactive_file
>     361460            +2.5%     370454        proc-vmstat.numa_hit
>     946.67           +18.6%       1122        proc-vmstat.pgactivate
>     361562            +2.5%     370553        proc-vmstat.pgalloc_normal
>     187906            +4.7%     196761        proc-vmstat.pgfault
>       8189            +2.5%       8395        proc-vmstat.pgreuse
>  1.097e+09           +15.5%  1.267e+09 ±  7%  perf-stat.i.branch-instructions
>   39079265 ±  6%     -20.9%   30915354 ±  4%  perf-stat.i.branch-misses
>    5093263 ±  4%     -23.7%    3884752 ±  9%  perf-stat.i.cache-misses
>      29213           -18.7%      23764 ±  5%  perf-stat.i.context-switches
>  7.666e+09 ±  4%      +5.7%  8.106e+09 ±  2%  perf-stat.i.cpu-cycles
>       1877 ± 15%     +75.1%       3287 ± 12%  perf-stat.i.cycles-between-cache-misses
>    1735450 ±  3%     -12.9%    1512060 ±  3%  perf-stat.i.iTLB-load-misses
>       2898 ±  3%     +34.4%       3895 ±  7%  perf-stat.i.instructions-per-iTLB-miss
>       1493           -20.3%       1190 ±  7%  perf-stat.i.major-faults
>       0.09 ±  3%      +5.8%       0.09 ±  2%  perf-stat.i.metric.GHz
>      48.47 ± 11%      +8.4       56.83 ±  7%  perf-stat.i.node-store-miss-rate%
>     283426 ±  4%     -21.6%     222190 ± 10%  perf-stat.i.node-stores
>       3.57 ±  7%      -1.1        2.44 ±  6%  perf-stat.overall.branch-miss-rate%
>       1508 ±  3%     +39.8%       2108 ±  9%  perf-stat.overall.cycles-between-cache-misses
>       3022 ±  3%     +23.6%       3736 ±  5%  perf-stat.overall.instructions-per-iTLB-miss
>  9.585e+08           +18.8%  1.138e+09 ±  6%  perf-stat.ps.branch-instructions
>   34151514 ±  6%     -18.8%   27725316 ±  4%  perf-stat.ps.branch-misses
>    4450329 ±  5%     -21.7%    3486409 ±  9%  perf-stat.ps.cache-misses
>      25524           -16.4%      21333 ±  4%  perf-stat.ps.context-switches
>      77139            +2.5%      79105        perf-stat.ps.cpu-clock
>  6.704e+09 ±  4%      +8.7%  7.287e+09        perf-stat.ps.cpu-cycles
>   1.06e+09           +11.3%   1.18e+09 ±  5%  perf-stat.ps.dTLB-loads
>    1517349 ±  3%     -10.5%    1357716 ±  2%  perf-stat.ps.iTLB-load-misses
>  4.582e+09           +10.8%  5.075e+09 ±  6%  perf-stat.ps.instructions
>       1296           -18.1%       1061 ±  6%  perf-stat.ps.major-faults
>     247613 ±  4%     -19.5%     199283 ±  9%  perf-stat.ps.node-stores
>      77139            +2.5%      79105        perf-stat.ps.task-clock
>  3.697e+10           +35.3%  5.003e+10        perf-stat.total.instructions
>       8.51 ± 91%      -6.9        1.59 ±223%  perf-profile.calltrace.cycles-pp.getdents64
>       8.34 ± 83%      -6.7        1.67 ±141%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       6.25 ±107%      -6.2        0.00        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
>       6.25 ±107%      -6.2        0.00        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
>       6.25 ±107%      -6.2        0.00        perf-profile.calltrace.cycles-pp.open64
>       7.63 ± 84%      -6.0        1.59 ±223%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getdents64
>       7.63 ± 84%      -6.0        1.59 ±223%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
>       7.63 ± 84%      -6.0        1.59 ±223%  perf-profile.calltrace.cycles-pp.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
>       7.63 ± 84%      -6.0        1.59 ±223%  perf-profile.calltrace.cycles-pp.iterate_dir.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
>       6.26 ±115%      -5.4        0.88 ±223%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       6.26 ±115%      -5.4        0.88 ±223%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       6.26 ±115%      -4.6        1.67 ±141%  perf-profile.calltrace.cycles-pp.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare
>       6.26 ±115%      -4.6        1.67 ±141%  perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop
>       6.26 ±115%      -4.6        1.67 ±141%  perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
>       6.26 ±115%      -4.6        1.67 ±141%  perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
>       4.57 ± 73%      -3.8        0.76 ±223%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record
>       4.57 ± 73%      -3.8        0.76 ±223%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist
>       4.57 ± 73%      -3.8        0.76 ±223%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push
>       4.57 ± 73%      -3.8        0.76 ±223%  perf-profile.calltrace.cycles-pp.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record
>       3.38 ±103%      -3.4        0.00        perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.mmput.exit_mm.do_exit
>       3.38 ±103%      -3.4        0.00        perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.mmput.exit_mm
>       3.38 ±103%      -3.4        0.00        perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.mmput
>       3.38 ±103%      -3.4        0.00        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
>       8.51 ± 91%      -6.9        1.59 ±223%  perf-profile.children.cycles-pp.getdents64
>       8.34 ± 83%      -6.7        1.67 ±141%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
>       8.34 ± 83%      -6.7        1.67 ±141%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
>       8.34 ± 83%      -6.7        1.67 ±141%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
>       6.25 ±107%      -6.2        0.00        perf-profile.children.cycles-pp.open64
>       7.63 ± 84%      -6.0        1.59 ±223%  perf-profile.children.cycles-pp.__x64_sys_getdents64
>       7.63 ± 84%      -6.0        1.59 ±223%  perf-profile.children.cycles-pp.iterate_dir
>       6.26 ±115%      -4.6        1.67 ±141%  perf-profile.children.cycles-pp.arch_do_signal_or_restart
>       6.26 ±115%      -4.6        1.67 ±141%  perf-profile.children.cycles-pp.get_signal
>       4.57 ± 73%      -3.8        0.76 ±223%  perf-profile.children.cycles-pp.perf_mmap__read_head
>       3.38 ±103%      -3.4        0.00        perf-profile.children.cycles-pp.unmap_vmas
>       3.38 ±103%      -3.4        0.00        perf-profile.children.cycles-pp.unmap_page_range
>       3.38 ±103%      -3.4        0.00        perf-profile.children.cycles-pp.zap_pmd_range
>       3.38 ±103%      -3.4        0.00        perf-profile.children.cycles-pp.zap_pte_range
>       8.54 ± 43%      +8.6       17.19 ± 38%  perf-profile.children.cycles-pp.asm_exc_page_fault
>
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <[email protected]>
>
>
> To reproduce:
>
>         git clone https://github.com/intel/lkp-tests.git
>         cd lkp-tests
>         sudo bin/lkp install job.yaml           # job file is attached in this email
>         bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>         sudo bin/lkp run generated-yaml-file
>
>         # if come across any failure that blocks the test,
>         # please remove ~/.lkp and /lkp dir to run from a clean state.
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> #regzbot introduced: 5efdd9122e
>
>

2023-03-15 14:27:19

by Steve French

[permalink] [raw]
Subject: Re: [smb3] 5efdd9122e: filebench.sum_operations/s -50.0% regression

Can you verify what this perf looks like with "closetime=0" and "closetime=1"

Are there differences in /proc/fs/cifs/Stats when you run the same
steps with "closetimeo=1" vs. the more recent default (5 seconds)?

On Wed, Mar 15, 2023 at 2:46 AM Yin, Fengwei <[email protected]> wrote:
>
>
>
> On 8/26/2022 10:41 AM, kernel test robot wrote:
> > Greeting,
> >
> > FYI, we noticed a -50.0% regression of filebench.sum_operations/s due to commit:
> >
> >
> > commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3 ("smb3: allow deferred close timeout to be configurable")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > in testcase: filebench
> > on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz (Cascade Lake) with 128G memory
> > with following parameters:
> >
> > disk: 1HDD
> > fs: ext4
> > fs2: cifs
> > test: filemicro_delete.f
> > cpufreq_governor: performance
> > ucode: 0x5003302
> Please note, we still could see this regresion on v6.3-rc2. And the regression is related with
> the commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3. This commit changed the default timeout
> value from 1s to 5s. If change the timeout back to 1s as following:
>
> diff --git a/fs/cifs/fs_context.c b/fs/cifs/fs_context.c
> index 6d13f8207e96..6b930fb0c4bd 100644
> --- a/fs/cifs/fs_context.c
> +++ b/fs/cifs/fs_context.c
> @@ -1537,7 +1537,7 @@ int smb3_init_fs_context(struct fs_context *fc)
>
> ctx->acregmax = CIFS_DEF_ACTIMEO;
> ctx->acdirmax = CIFS_DEF_ACTIMEO;
> - ctx->closetimeo = SMB3_DEF_DCLOSETIMEO;
> + ctx->closetimeo = CIFS_DEF_ACTIMEO;
>
> /* Most clients set timeout to 0, allows server to use its default */
> ctx->handle_timeout = 0; /* See MS-SMB2 spec section 2.2.14.2.12 */
>
> The regression is gone:
> dcb45fd7f501f864 v6.3-rc2 32715af441411a5a266606be08f
> ---------------- --------------------------- ---------------------------
> fail:runs %reproduction fail:runs %reproduction fail:runs
> | | | | |
> :25 0% :3 33% 1:3 last_state.booting
> :25 0% :3 33% 1:3 last_state.is_incomplete_run
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 515.95 -50.0% 257.98 -0.0% 515.92 filebench.sum_operations/s
> 5.04 ± 7% +833.7% 47.09 ± 2% -2.9% 4.90 ± 2% filebench.sum_time_ms/op
> 10438 -100.0% 0.33 ±141% -100.0% 0.50 ±100% filebench.time.major_page_faults
> 167575 -4.1% 160660 -4.4% 160140 filebench.time.maximum_resident_set_size
> 7138 ± 11% +141.4% 17235 ± 3% +147.6% 17677 filebench.time.minor_page_faults
> 28.68 ± 9% +199.9% 86.00 ± 7% -2.4% 28.00 filebench.time.percent_of_cpu_this_job_got
> 2453485 ± 54% -63.0% 907380 -66.2% 830273 ± 6% cpuidle..usage
> 0.61 ± 38% +0.8 1.41 ± 3% +0.2 0.80 ± 4% mpstat.cpu.all.sys%
> 142984 ± 13% -45.6% 77725 -47.5% 75106 vmstat.system.in
> 34.23 ± 7% +27.9% 43.79 +27.8% 43.74 boot-time.boot
> 17.09 ± 11% +66.0% 28.38 +65.5% 28.28 boot-time.dhcp
> 2661 ± 7% +37.5% 3659 +37.2% 3651 boot-time.idle
> 104737 ±185% -87.8% 12762 ± 10% -89.8% 10631 ± 4% turbostat.C1
>
>
> 32715af441411a5a266606be08f is v6.3-rc2 with the change to restore the timeout to 1s. Thanks.
>
> Regards
> Yin, Fengwei
>
>
> >
> >
> > =========================================================================================
> > compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase/ucode:
> > gcc-11/performance/1HDD/cifs/ext4/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/filemicro_delete.f/filebench/0x5003302
> >
> > commit:
> > dcb45fd7f5 ("cifs: Do not use tcon->cfid directly, use the cfid we get from open_cached_dir")
> > 5efdd9122e ("smb3: allow deferred close timeout to be configurable")
> >
> > dcb45fd7f501f864 5efdd9122eff772eae2feae9f0f
> > ---------------- ---------------------------
> > %stddev %change %stddev
> > \ | \
> > 515.95 -50.0% 257.98 filebench.sum_operations/s
> > 4.81 ± 2% +1038.4% 54.78 ± 6% filebench.sum_time_ms/op
> > 29.00 ± 8% +212.1% 90.50 ± 3% filebench.time.percent_of_cpu_this_job_got
> > 24629 +2.7% 25297 filebench.time.voluntary_context_switches
> > 7.685e+08 +19.3% 9.169e+08 ± 4% cpuidle..time
> > 2.53 ± 6% -20.6% 2.00 ± 3% iostat.cpu.iowait
> > 1506141 ± 8% +22.4% 1843256 ± 3% turbostat.IRQ
> > 2.00 -50.0% 1.00 vmstat.procs.b
> > 21969 ± 2% -9.5% 19885 ± 2% vmstat.system.cs
> > 3.06 ± 7% -0.7 2.35 ± 4% mpstat.cpu.all.iowait%
> > 0.79 ± 5% +0.5 1.27 ± 2% mpstat.cpu.all.sys%
> > 0.89 ± 3% -0.1 0.79 ± 3% mpstat.cpu.all.usr%
> > 34.55 ± 14% -34.8% 22.51 ± 27% sched_debug.cfs_rq:/.removed.runnable_avg.avg
> > 119.64 ± 3% -20.0% 95.69 ± 17% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
> > 34.55 ± 14% -34.8% 22.51 ± 27% sched_debug.cfs_rq:/.removed.util_avg.avg
> > 119.64 ± 3% -20.0% 95.69 ± 17% sched_debug.cfs_rq:/.removed.util_avg.stddev
> > 5249 +15.8% 6076 meminfo.Active
> > 3866 ± 2% +17.7% 4552 meminfo.Active(anon)
> > 1382 ± 4% +10.3% 1524 ± 4% meminfo.Active(file)
> > 69791 ± 14% +39.8% 97553 ± 6% meminfo.AnonHugePages
> > 72709 ± 2% +12.5% 81779 ± 3% meminfo.Inactive(file)
> > 23219 +13.5% 26352 ± 3% meminfo.KernelStack
> > 966.50 ± 2% +17.7% 1137 proc-vmstat.nr_active_anon
> > 74302 +6.3% 78977 ± 2% proc-vmstat.nr_anon_pages
> > 81133 +6.0% 85973 proc-vmstat.nr_inactive_anon
> > 18172 ± 2% +12.5% 20442 ± 3% proc-vmstat.nr_inactive_file
> > 23213 +13.5% 26348 ± 3% proc-vmstat.nr_kernel_stack
> > 17983 +2.3% 18400 proc-vmstat.nr_mapped
> > 7446 ± 2% +5.5% 7853 ± 3% proc-vmstat.nr_shmem
> > 26888 +1.6% 27306 proc-vmstat.nr_slab_reclaimable
> > 47220 +3.4% 48803 proc-vmstat.nr_slab_unreclaimable
> > 966.50 ± 2% +17.7% 1137 proc-vmstat.nr_zone_active_anon
> > 81133 +6.0% 85973 proc-vmstat.nr_zone_inactive_anon
> > 18172 ± 2% +12.5% 20442 ± 3% proc-vmstat.nr_zone_inactive_file
> > 361460 +2.5% 370454 proc-vmstat.numa_hit
> > 946.67 +18.6% 1122 proc-vmstat.pgactivate
> > 361562 +2.5% 370553 proc-vmstat.pgalloc_normal
> > 187906 +4.7% 196761 proc-vmstat.pgfault
> > 8189 +2.5% 8395 proc-vmstat.pgreuse
> > 1.097e+09 +15.5% 1.267e+09 ± 7% perf-stat.i.branch-instructions
> > 39079265 ± 6% -20.9% 30915354 ± 4% perf-stat.i.branch-misses
> > 5093263 ± 4% -23.7% 3884752 ± 9% perf-stat.i.cache-misses
> > 29213 -18.7% 23764 ± 5% perf-stat.i.context-switches
> > 7.666e+09 ± 4% +5.7% 8.106e+09 ± 2% perf-stat.i.cpu-cycles
> > 1877 ± 15% +75.1% 3287 ± 12% perf-stat.i.cycles-between-cache-misses
> > 1735450 ± 3% -12.9% 1512060 ± 3% perf-stat.i.iTLB-load-misses
> > 2898 ± 3% +34.4% 3895 ± 7% perf-stat.i.instructions-per-iTLB-miss
> > 1493 -20.3% 1190 ± 7% perf-stat.i.major-faults
> > 0.09 ± 3% +5.8% 0.09 ± 2% perf-stat.i.metric.GHz
> > 48.47 ± 11% +8.4 56.83 ± 7% perf-stat.i.node-store-miss-rate%
> > 283426 ± 4% -21.6% 222190 ± 10% perf-stat.i.node-stores
> > 3.57 ± 7% -1.1 2.44 ± 6% perf-stat.overall.branch-miss-rate%
> > 1508 ± 3% +39.8% 2108 ± 9% perf-stat.overall.cycles-between-cache-misses
> > 3022 ± 3% +23.6% 3736 ± 5% perf-stat.overall.instructions-per-iTLB-miss
> > 9.585e+08 +18.8% 1.138e+09 ± 6% perf-stat.ps.branch-instructions
> > 34151514 ± 6% -18.8% 27725316 ± 4% perf-stat.ps.branch-misses
> > 4450329 ± 5% -21.7% 3486409 ± 9% perf-stat.ps.cache-misses
> > 25524 -16.4% 21333 ± 4% perf-stat.ps.context-switches
> > 77139 +2.5% 79105 perf-stat.ps.cpu-clock
> > 6.704e+09 ± 4% +8.7% 7.287e+09 perf-stat.ps.cpu-cycles
> > 1.06e+09 +11.3% 1.18e+09 ± 5% perf-stat.ps.dTLB-loads
> > 1517349 ± 3% -10.5% 1357716 ± 2% perf-stat.ps.iTLB-load-misses
> > 4.582e+09 +10.8% 5.075e+09 ± 6% perf-stat.ps.instructions
> > 1296 -18.1% 1061 ± 6% perf-stat.ps.major-faults
> > 247613 ± 4% -19.5% 199283 ± 9% perf-stat.ps.node-stores
> > 77139 +2.5% 79105 perf-stat.ps.task-clock
> > 3.697e+10 +35.3% 5.003e+10 perf-stat.total.instructions
> > 8.51 ± 91% -6.9 1.59 ±223% perf-profile.calltrace.cycles-pp.getdents64
> > 8.34 ± 83% -6.7 1.67 ±141% perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
> > 6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
> > 6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.open64
> > 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getdents64
> > 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
> > 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
> > 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.iterate_dir.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
> > 6.26 ±115% -5.4 0.88 ±223% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 6.26 ±115% -5.4 0.88 ±223% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare
> > 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop
> > 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
> > 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
> > 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record
> > 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist
> > 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push
> > 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record
> > 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.mmput.exit_mm.do_exit
> > 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.mmput.exit_mm
> > 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.mmput
> > 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
> > 8.51 ± 91% -6.9 1.59 ±223% perf-profile.children.cycles-pp.getdents64
> > 8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
> > 8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.exit_to_user_mode_loop
> > 8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> > 6.25 ±107% -6.2 0.00 perf-profile.children.cycles-pp.open64
> > 7.63 ± 84% -6.0 1.59 ±223% perf-profile.children.cycles-pp.__x64_sys_getdents64
> > 7.63 ± 84% -6.0 1.59 ±223% perf-profile.children.cycles-pp.iterate_dir
> > 6.26 ±115% -4.6 1.67 ±141% perf-profile.children.cycles-pp.arch_do_signal_or_restart
> > 6.26 ±115% -4.6 1.67 ±141% perf-profile.children.cycles-pp.get_signal
> > 4.57 ± 73% -3.8 0.76 ±223% perf-profile.children.cycles-pp.perf_mmap__read_head
> > 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.unmap_vmas
> > 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.unmap_page_range
> > 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.zap_pmd_range
> > 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.zap_pte_range
> > 8.54 ± 43% +8.6 17.19 ± 38% perf-profile.children.cycles-pp.asm_exc_page_fault
> >
> >
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot <[email protected]>
> >
> >
> > To reproduce:
> >
> > git clone https://github.com/intel/lkp-tests.git
> > cd lkp-tests
> > sudo bin/lkp install job.yaml # job file is attached in this email
> > bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> > sudo bin/lkp run generated-yaml-file
> >
> > # if come across any failure that blocks the test,
> > # please remove ~/.lkp and /lkp dir to run from a clean state.
> >
> >
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> >
> >
> > #regzbot introduced: 5efdd9122e
> >
> >



--
Thanks,

Steve

2023-03-21 09:42:37

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [smb3] 5efdd9122e: filebench.sum_operations/s -50.0% regression



On 3/15/2023 10:26 PM, Steve French wrote:
> Can you verify what this perf looks like with "closetime=0" and "closetime=1"
for commit 5efdd9122eff772eae2feae9f0fc0ec02d4846a3, if the closetimeo=0 or
closetimeo=1 is used when mount cifs, the regression is gone.

>
> Are there differences in /proc/fs/cifs/Stats when you run the same
> steps with "closetimeo=1" vs. the more recent default (5 seconds)?
I am still trying to figure out how to dump this file in our automation system.
I will send out the data once I get it work. Thanks.


Regards
Yin, Fengwei

>
> On Wed, Mar 15, 2023 at 2:46 AM Yin, Fengwei <[email protected]> wrote:
>>
>>
>>
>> On 8/26/2022 10:41 AM, kernel test robot wrote:
>>> Greeting,
>>>
>>> FYI, we noticed a -50.0% regression of filebench.sum_operations/s due to commit:
>>>
>>>
>>> commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3 ("smb3: allow deferred close timeout to be configurable")
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>
>>> in testcase: filebench
>>> on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz (Cascade Lake) with 128G memory
>>> with following parameters:
>>>
>>> disk: 1HDD
>>> fs: ext4
>>> fs2: cifs
>>> test: filemicro_delete.f
>>> cpufreq_governor: performance
>>> ucode: 0x5003302
>> Please note, we still could see this regresion on v6.3-rc2. And the regression is related with
>> the commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3. This commit changed the default timeout
>> value from 1s to 5s. If change the timeout back to 1s as following:
>>
>> diff --git a/fs/cifs/fs_context.c b/fs/cifs/fs_context.c
>> index 6d13f8207e96..6b930fb0c4bd 100644
>> --- a/fs/cifs/fs_context.c
>> +++ b/fs/cifs/fs_context.c
>> @@ -1537,7 +1537,7 @@ int smb3_init_fs_context(struct fs_context *fc)
>>
>> ctx->acregmax = CIFS_DEF_ACTIMEO;
>> ctx->acdirmax = CIFS_DEF_ACTIMEO;
>> - ctx->closetimeo = SMB3_DEF_DCLOSETIMEO;
>> + ctx->closetimeo = CIFS_DEF_ACTIMEO;
>>
>> /* Most clients set timeout to 0, allows server to use its default */
>> ctx->handle_timeout = 0; /* See MS-SMB2 spec section 2.2.14.2.12 */
>>
>> The regression is gone:
>> dcb45fd7f501f864 v6.3-rc2 32715af441411a5a266606be08f
>> ---------------- --------------------------- ---------------------------
>> fail:runs %reproduction fail:runs %reproduction fail:runs
>> | | | | |
>> :25 0% :3 33% 1:3 last_state.booting
>> :25 0% :3 33% 1:3 last_state.is_incomplete_run
>> %stddev %change %stddev %change %stddev
>> \ | \ | \
>> 515.95 -50.0% 257.98 -0.0% 515.92 filebench.sum_operations/s
>> 5.04 ± 7% +833.7% 47.09 ± 2% -2.9% 4.90 ± 2% filebench.sum_time_ms/op
>> 10438 -100.0% 0.33 ±141% -100.0% 0.50 ±100% filebench.time.major_page_faults
>> 167575 -4.1% 160660 -4.4% 160140 filebench.time.maximum_resident_set_size
>> 7138 ± 11% +141.4% 17235 ± 3% +147.6% 17677 filebench.time.minor_page_faults
>> 28.68 ± 9% +199.9% 86.00 ± 7% -2.4% 28.00 filebench.time.percent_of_cpu_this_job_got
>> 2453485 ± 54% -63.0% 907380 -66.2% 830273 ± 6% cpuidle..usage
>> 0.61 ± 38% +0.8 1.41 ± 3% +0.2 0.80 ± 4% mpstat.cpu.all.sys%
>> 142984 ± 13% -45.6% 77725 -47.5% 75106 vmstat.system.in
>> 34.23 ± 7% +27.9% 43.79 +27.8% 43.74 boot-time.boot
>> 17.09 ± 11% +66.0% 28.38 +65.5% 28.28 boot-time.dhcp
>> 2661 ± 7% +37.5% 3659 +37.2% 3651 boot-time.idle
>> 104737 ±185% -87.8% 12762 ± 10% -89.8% 10631 ± 4% turbostat.C1
>>
>>
>> 32715af441411a5a266606be08f is v6.3-rc2 with the change to restore the timeout to 1s. Thanks.
>>
>> Regards
>> Yin, Fengwei
>>
>>
>>>
>>>
>>> =========================================================================================
>>> compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase/ucode:
>>> gcc-11/performance/1HDD/cifs/ext4/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/filemicro_delete.f/filebench/0x5003302
>>>
>>> commit:
>>> dcb45fd7f5 ("cifs: Do not use tcon->cfid directly, use the cfid we get from open_cached_dir")
>>> 5efdd9122e ("smb3: allow deferred close timeout to be configurable")
>>>
>>> dcb45fd7f501f864 5efdd9122eff772eae2feae9f0f
>>> ---------------- ---------------------------
>>> %stddev %change %stddev
>>> \ | \
>>> 515.95 -50.0% 257.98 filebench.sum_operations/s
>>> 4.81 ± 2% +1038.4% 54.78 ± 6% filebench.sum_time_ms/op
>>> 29.00 ± 8% +212.1% 90.50 ± 3% filebench.time.percent_of_cpu_this_job_got
>>> 24629 +2.7% 25297 filebench.time.voluntary_context_switches
>>> 7.685e+08 +19.3% 9.169e+08 ± 4% cpuidle..time
>>> 2.53 ± 6% -20.6% 2.00 ± 3% iostat.cpu.iowait
>>> 1506141 ± 8% +22.4% 1843256 ± 3% turbostat.IRQ
>>> 2.00 -50.0% 1.00 vmstat.procs.b
>>> 21969 ± 2% -9.5% 19885 ± 2% vmstat.system.cs
>>> 3.06 ± 7% -0.7 2.35 ± 4% mpstat.cpu.all.iowait%
>>> 0.79 ± 5% +0.5 1.27 ± 2% mpstat.cpu.all.sys%
>>> 0.89 ± 3% -0.1 0.79 ± 3% mpstat.cpu.all.usr%
>>> 34.55 ± 14% -34.8% 22.51 ± 27% sched_debug.cfs_rq:/.removed.runnable_avg.avg
>>> 119.64 ± 3% -20.0% 95.69 ± 17% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
>>> 34.55 ± 14% -34.8% 22.51 ± 27% sched_debug.cfs_rq:/.removed.util_avg.avg
>>> 119.64 ± 3% -20.0% 95.69 ± 17% sched_debug.cfs_rq:/.removed.util_avg.stddev
>>> 5249 +15.8% 6076 meminfo.Active
>>> 3866 ± 2% +17.7% 4552 meminfo.Active(anon)
>>> 1382 ± 4% +10.3% 1524 ± 4% meminfo.Active(file)
>>> 69791 ± 14% +39.8% 97553 ± 6% meminfo.AnonHugePages
>>> 72709 ± 2% +12.5% 81779 ± 3% meminfo.Inactive(file)
>>> 23219 +13.5% 26352 ± 3% meminfo.KernelStack
>>> 966.50 ± 2% +17.7% 1137 proc-vmstat.nr_active_anon
>>> 74302 +6.3% 78977 ± 2% proc-vmstat.nr_anon_pages
>>> 81133 +6.0% 85973 proc-vmstat.nr_inactive_anon
>>> 18172 ± 2% +12.5% 20442 ± 3% proc-vmstat.nr_inactive_file
>>> 23213 +13.5% 26348 ± 3% proc-vmstat.nr_kernel_stack
>>> 17983 +2.3% 18400 proc-vmstat.nr_mapped
>>> 7446 ± 2% +5.5% 7853 ± 3% proc-vmstat.nr_shmem
>>> 26888 +1.6% 27306 proc-vmstat.nr_slab_reclaimable
>>> 47220 +3.4% 48803 proc-vmstat.nr_slab_unreclaimable
>>> 966.50 ± 2% +17.7% 1137 proc-vmstat.nr_zone_active_anon
>>> 81133 +6.0% 85973 proc-vmstat.nr_zone_inactive_anon
>>> 18172 ± 2% +12.5% 20442 ± 3% proc-vmstat.nr_zone_inactive_file
>>> 361460 +2.5% 370454 proc-vmstat.numa_hit
>>> 946.67 +18.6% 1122 proc-vmstat.pgactivate
>>> 361562 +2.5% 370553 proc-vmstat.pgalloc_normal
>>> 187906 +4.7% 196761 proc-vmstat.pgfault
>>> 8189 +2.5% 8395 proc-vmstat.pgreuse
>>> 1.097e+09 +15.5% 1.267e+09 ± 7% perf-stat.i.branch-instructions
>>> 39079265 ± 6% -20.9% 30915354 ± 4% perf-stat.i.branch-misses
>>> 5093263 ± 4% -23.7% 3884752 ± 9% perf-stat.i.cache-misses
>>> 29213 -18.7% 23764 ± 5% perf-stat.i.context-switches
>>> 7.666e+09 ± 4% +5.7% 8.106e+09 ± 2% perf-stat.i.cpu-cycles
>>> 1877 ± 15% +75.1% 3287 ± 12% perf-stat.i.cycles-between-cache-misses
>>> 1735450 ± 3% -12.9% 1512060 ± 3% perf-stat.i.iTLB-load-misses
>>> 2898 ± 3% +34.4% 3895 ± 7% perf-stat.i.instructions-per-iTLB-miss
>>> 1493 -20.3% 1190 ± 7% perf-stat.i.major-faults
>>> 0.09 ± 3% +5.8% 0.09 ± 2% perf-stat.i.metric.GHz
>>> 48.47 ± 11% +8.4 56.83 ± 7% perf-stat.i.node-store-miss-rate%
>>> 283426 ± 4% -21.6% 222190 ± 10% perf-stat.i.node-stores
>>> 3.57 ± 7% -1.1 2.44 ± 6% perf-stat.overall.branch-miss-rate%
>>> 1508 ± 3% +39.8% 2108 ± 9% perf-stat.overall.cycles-between-cache-misses
>>> 3022 ± 3% +23.6% 3736 ± 5% perf-stat.overall.instructions-per-iTLB-miss
>>> 9.585e+08 +18.8% 1.138e+09 ± 6% perf-stat.ps.branch-instructions
>>> 34151514 ± 6% -18.8% 27725316 ± 4% perf-stat.ps.branch-misses
>>> 4450329 ± 5% -21.7% 3486409 ± 9% perf-stat.ps.cache-misses
>>> 25524 -16.4% 21333 ± 4% perf-stat.ps.context-switches
>>> 77139 +2.5% 79105 perf-stat.ps.cpu-clock
>>> 6.704e+09 ± 4% +8.7% 7.287e+09 perf-stat.ps.cpu-cycles
>>> 1.06e+09 +11.3% 1.18e+09 ± 5% perf-stat.ps.dTLB-loads
>>> 1517349 ± 3% -10.5% 1357716 ± 2% perf-stat.ps.iTLB-load-misses
>>> 4.582e+09 +10.8% 5.075e+09 ± 6% perf-stat.ps.instructions
>>> 1296 -18.1% 1061 ± 6% perf-stat.ps.major-faults
>>> 247613 ± 4% -19.5% 199283 ± 9% perf-stat.ps.node-stores
>>> 77139 +2.5% 79105 perf-stat.ps.task-clock
>>> 3.697e+10 +35.3% 5.003e+10 perf-stat.total.instructions
>>> 8.51 ± 91% -6.9 1.59 ±223% perf-profile.calltrace.cycles-pp.getdents64
>>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
>>> 6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
>>> 6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.open64
>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getdents64
>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.iterate_dir.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
>>> 6.26 ±115% -5.4 0.88 ±223% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 6.26 ±115% -5.4 0.88 ±223% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare
>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop
>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
>>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record
>>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist
>>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push
>>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record
>>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.mmput.exit_mm.do_exit
>>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.mmput.exit_mm
>>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.mmput
>>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
>>> 8.51 ± 91% -6.9 1.59 ±223% perf-profile.children.cycles-pp.getdents64
>>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
>>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.exit_to_user_mode_loop
>>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
>>> 6.25 ±107% -6.2 0.00 perf-profile.children.cycles-pp.open64
>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.children.cycles-pp.__x64_sys_getdents64
>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.children.cycles-pp.iterate_dir
>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.children.cycles-pp.arch_do_signal_or_restart
>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.children.cycles-pp.get_signal
>>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.children.cycles-pp.perf_mmap__read_head
>>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.unmap_vmas
>>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.unmap_page_range
>>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.zap_pmd_range
>>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.zap_pte_range
>>> 8.54 ± 43% +8.6 17.19 ± 38% perf-profile.children.cycles-pp.asm_exc_page_fault
>>>
>>>
>>> If you fix the issue, kindly add following tag
>>> Reported-by: kernel test robot <[email protected]>
>>>
>>>
>>> To reproduce:
>>>
>>> git clone https://github.com/intel/lkp-tests.git
>>> cd lkp-tests
>>> sudo bin/lkp install job.yaml # job file is attached in this email
>>> bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>>> sudo bin/lkp run generated-yaml-file
>>>
>>> # if come across any failure that blocks the test,
>>> # please remove ~/.lkp and /lkp dir to run from a clean state.
>>>
>>>
>>> Disclaimer:
>>> Results have been estimated based on internal Intel analysis and are provided
>>> for informational purposes only. Any difference in system hardware or software
>>> design or configuration may affect actual performance.
>>>
>>>
>>> #regzbot introduced: 5efdd9122e
>>>
>>>
>
>
>

2023-03-23 07:49:19

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [smb3] 5efdd9122e: filebench.sum_operations/s -50.0% regression

On 3/15/23 22:26, Steve French wrote:
> Can you verify what this perf looks like with "closetime=0" and "closetime=1"
>
> Are there differences in /proc/fs/cifs/Stats when you run the same
> steps with "closetimeo=1" vs. the more recent default (5 seconds)?
The dump of /proc/fs/cifs/Stats are attached. You can tell which
closetimeo is used from the file name. Thanks.


Regards
Yin, Fengwei

>
> On Wed, Mar 15, 2023 at 2:46 AM Yin, Fengwei <[email protected]> wrote:
>>
>>
>>
>> On 8/26/2022 10:41 AM, kernel test robot wrote:
>>> Greeting,
>>>
>>> FYI, we noticed a -50.0% regression of filebench.sum_operations/s due to commit:
>>>
>>>
>>> commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3 ("smb3: allow deferred close timeout to be configurable")
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>
>>> in testcase: filebench
>>> on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz (Cascade Lake) with 128G memory
>>> with following parameters:
>>>
>>> disk: 1HDD
>>> fs: ext4
>>> fs2: cifs
>>> test: filemicro_delete.f
>>> cpufreq_governor: performance
>>> ucode: 0x5003302
>> Please note, we still could see this regresion on v6.3-rc2. And the regression is related with
>> the commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3. This commit changed the default timeout
>> value from 1s to 5s. If change the timeout back to 1s as following:
>>
>> diff --git a/fs/cifs/fs_context.c b/fs/cifs/fs_context.c
>> index 6d13f8207e96..6b930fb0c4bd 100644
>> --- a/fs/cifs/fs_context.c
>> +++ b/fs/cifs/fs_context.c
>> @@ -1537,7 +1537,7 @@ int smb3_init_fs_context(struct fs_context *fc)
>>
>> ctx->acregmax = CIFS_DEF_ACTIMEO;
>> ctx->acdirmax = CIFS_DEF_ACTIMEO;
>> - ctx->closetimeo = SMB3_DEF_DCLOSETIMEO;
>> + ctx->closetimeo = CIFS_DEF_ACTIMEO;
>>
>> /* Most clients set timeout to 0, allows server to use its default */
>> ctx->handle_timeout = 0; /* See MS-SMB2 spec section 2.2.14.2.12 */
>>
>> The regression is gone:
>> dcb45fd7f501f864 v6.3-rc2 32715af441411a5a266606be08f
>> ---------------- --------------------------- ---------------------------
>> fail:runs %reproduction fail:runs %reproduction fail:runs
>> | | | | |
>> :25 0% :3 33% 1:3 last_state.booting
>> :25 0% :3 33% 1:3 last_state.is_incomplete_run
>> %stddev %change %stddev %change %stddev
>> \ | \ | \
>> 515.95 -50.0% 257.98 -0.0% 515.92 filebench.sum_operations/s
>> 5.04 ± 7% +833.7% 47.09 ± 2% -2.9% 4.90 ± 2% filebench.sum_time_ms/op
>> 10438 -100.0% 0.33 ±141% -100.0% 0.50 ±100% filebench.time.major_page_faults
>> 167575 -4.1% 160660 -4.4% 160140 filebench.time.maximum_resident_set_size
>> 7138 ± 11% +141.4% 17235 ± 3% +147.6% 17677 filebench.time.minor_page_faults
>> 28.68 ± 9% +199.9% 86.00 ± 7% -2.4% 28.00 filebench.time.percent_of_cpu_this_job_got
>> 2453485 ± 54% -63.0% 907380 -66.2% 830273 ± 6% cpuidle..usage
>> 0.61 ± 38% +0.8 1.41 ± 3% +0.2 0.80 ± 4% mpstat.cpu.all.sys%
>> 142984 ± 13% -45.6% 77725 -47.5% 75106 vmstat.system.in
>> 34.23 ± 7% +27.9% 43.79 +27.8% 43.74 boot-time.boot
>> 17.09 ± 11% +66.0% 28.38 +65.5% 28.28 boot-time.dhcp
>> 2661 ± 7% +37.5% 3659 +37.2% 3651 boot-time.idle
>> 104737 ±185% -87.8% 12762 ± 10% -89.8% 10631 ± 4% turbostat.C1
>>
>>
>> 32715af441411a5a266606be08f is v6.3-rc2 with the change to restore the timeout to 1s. Thanks.
>>
>> Regards
>> Yin, Fengwei
>>
>>
>>>
>>>
>>> =========================================================================================
>>> compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase/ucode:
>>> gcc-11/performance/1HDD/cifs/ext4/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/filemicro_delete.f/filebench/0x5003302
>>>
>>> commit:
>>> dcb45fd7f5 ("cifs: Do not use tcon->cfid directly, use the cfid we get from open_cached_dir")
>>> 5efdd9122e ("smb3: allow deferred close timeout to be configurable")
>>>
>>> dcb45fd7f501f864 5efdd9122eff772eae2feae9f0f
>>> ---------------- ---------------------------
>>> %stddev %change %stddev
>>> \ | \
>>> 515.95 -50.0% 257.98 filebench.sum_operations/s
>>> 4.81 ± 2% +1038.4% 54.78 ± 6% filebench.sum_time_ms/op
>>> 29.00 ± 8% +212.1% 90.50 ± 3% filebench.time.percent_of_cpu_this_job_got
>>> 24629 +2.7% 25297 filebench.time.voluntary_context_switches
>>> 7.685e+08 +19.3% 9.169e+08 ± 4% cpuidle..time
>>> 2.53 ± 6% -20.6% 2.00 ± 3% iostat.cpu.iowait
>>> 1506141 ± 8% +22.4% 1843256 ± 3% turbostat.IRQ
>>> 2.00 -50.0% 1.00 vmstat.procs.b
>>> 21969 ± 2% -9.5% 19885 ± 2% vmstat.system.cs
>>> 3.06 ± 7% -0.7 2.35 ± 4% mpstat.cpu.all.iowait%
>>> 0.79 ± 5% +0.5 1.27 ± 2% mpstat.cpu.all.sys%
>>> 0.89 ± 3% -0.1 0.79 ± 3% mpstat.cpu.all.usr%
>>> 34.55 ± 14% -34.8% 22.51 ± 27% sched_debug.cfs_rq:/.removed.runnable_avg.avg
>>> 119.64 ± 3% -20.0% 95.69 ± 17% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
>>> 34.55 ± 14% -34.8% 22.51 ± 27% sched_debug.cfs_rq:/.removed.util_avg.avg
>>> 119.64 ± 3% -20.0% 95.69 ± 17% sched_debug.cfs_rq:/.removed.util_avg.stddev
>>> 5249 +15.8% 6076 meminfo.Active
>>> 3866 ± 2% +17.7% 4552 meminfo.Active(anon)
>>> 1382 ± 4% +10.3% 1524 ± 4% meminfo.Active(file)
>>> 69791 ± 14% +39.8% 97553 ± 6% meminfo.AnonHugePages
>>> 72709 ± 2% +12.5% 81779 ± 3% meminfo.Inactive(file)
>>> 23219 +13.5% 26352 ± 3% meminfo.KernelStack
>>> 966.50 ± 2% +17.7% 1137 proc-vmstat.nr_active_anon
>>> 74302 +6.3% 78977 ± 2% proc-vmstat.nr_anon_pages
>>> 81133 +6.0% 85973 proc-vmstat.nr_inactive_anon
>>> 18172 ± 2% +12.5% 20442 ± 3% proc-vmstat.nr_inactive_file
>>> 23213 +13.5% 26348 ± 3% proc-vmstat.nr_kernel_stack
>>> 17983 +2.3% 18400 proc-vmstat.nr_mapped
>>> 7446 ± 2% +5.5% 7853 ± 3% proc-vmstat.nr_shmem
>>> 26888 +1.6% 27306 proc-vmstat.nr_slab_reclaimable
>>> 47220 +3.4% 48803 proc-vmstat.nr_slab_unreclaimable
>>> 966.50 ± 2% +17.7% 1137 proc-vmstat.nr_zone_active_anon
>>> 81133 +6.0% 85973 proc-vmstat.nr_zone_inactive_anon
>>> 18172 ± 2% +12.5% 20442 ± 3% proc-vmstat.nr_zone_inactive_file
>>> 361460 +2.5% 370454 proc-vmstat.numa_hit
>>> 946.67 +18.6% 1122 proc-vmstat.pgactivate
>>> 361562 +2.5% 370553 proc-vmstat.pgalloc_normal
>>> 187906 +4.7% 196761 proc-vmstat.pgfault
>>> 8189 +2.5% 8395 proc-vmstat.pgreuse
>>> 1.097e+09 +15.5% 1.267e+09 ± 7% perf-stat.i.branch-instructions
>>> 39079265 ± 6% -20.9% 30915354 ± 4% perf-stat.i.branch-misses
>>> 5093263 ± 4% -23.7% 3884752 ± 9% perf-stat.i.cache-misses
>>> 29213 -18.7% 23764 ± 5% perf-stat.i.context-switches
>>> 7.666e+09 ± 4% +5.7% 8.106e+09 ± 2% perf-stat.i.cpu-cycles
>>> 1877 ± 15% +75.1% 3287 ± 12% perf-stat.i.cycles-between-cache-misses
>>> 1735450 ± 3% -12.9% 1512060 ± 3% perf-stat.i.iTLB-load-misses
>>> 2898 ± 3% +34.4% 3895 ± 7% perf-stat.i.instructions-per-iTLB-miss
>>> 1493 -20.3% 1190 ± 7% perf-stat.i.major-faults
>>> 0.09 ± 3% +5.8% 0.09 ± 2% perf-stat.i.metric.GHz
>>> 48.47 ± 11% +8.4 56.83 ± 7% perf-stat.i.node-store-miss-rate%
>>> 283426 ± 4% -21.6% 222190 ± 10% perf-stat.i.node-stores
>>> 3.57 ± 7% -1.1 2.44 ± 6% perf-stat.overall.branch-miss-rate%
>>> 1508 ± 3% +39.8% 2108 ± 9% perf-stat.overall.cycles-between-cache-misses
>>> 3022 ± 3% +23.6% 3736 ± 5% perf-stat.overall.instructions-per-iTLB-miss
>>> 9.585e+08 +18.8% 1.138e+09 ± 6% perf-stat.ps.branch-instructions
>>> 34151514 ± 6% -18.8% 27725316 ± 4% perf-stat.ps.branch-misses
>>> 4450329 ± 5% -21.7% 3486409 ± 9% perf-stat.ps.cache-misses
>>> 25524 -16.4% 21333 ± 4% perf-stat.ps.context-switches
>>> 77139 +2.5% 79105 perf-stat.ps.cpu-clock
>>> 6.704e+09 ± 4% +8.7% 7.287e+09 perf-stat.ps.cpu-cycles
>>> 1.06e+09 +11.3% 1.18e+09 ± 5% perf-stat.ps.dTLB-loads
>>> 1517349 ± 3% -10.5% 1357716 ± 2% perf-stat.ps.iTLB-load-misses
>>> 4.582e+09 +10.8% 5.075e+09 ± 6% perf-stat.ps.instructions
>>> 1296 -18.1% 1061 ± 6% perf-stat.ps.major-faults
>>> 247613 ± 4% -19.5% 199283 ± 9% perf-stat.ps.node-stores
>>> 77139 +2.5% 79105 perf-stat.ps.task-clock
>>> 3.697e+10 +35.3% 5.003e+10 perf-stat.total.instructions
>>> 8.51 ± 91% -6.9 1.59 ±223% perf-profile.calltrace.cycles-pp.getdents64
>>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
>>> 6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
>>> 6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.open64
>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getdents64
>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.iterate_dir.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
>>> 6.26 ±115% -5.4 0.88 ±223% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 6.26 ±115% -5.4 0.88 ±223% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare
>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop
>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
>>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record
>>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist
>>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push
>>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record
>>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.mmput.exit_mm.do_exit
>>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.mmput.exit_mm
>>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.mmput
>>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
>>> 8.51 ± 91% -6.9 1.59 ±223% perf-profile.children.cycles-pp.getdents64
>>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
>>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.exit_to_user_mode_loop
>>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
>>> 6.25 ±107% -6.2 0.00 perf-profile.children.cycles-pp.open64
>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.children.cycles-pp.__x64_sys_getdents64
>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.children.cycles-pp.iterate_dir
>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.children.cycles-pp.arch_do_signal_or_restart
>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.children.cycles-pp.get_signal
>>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.children.cycles-pp.perf_mmap__read_head
>>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.unmap_vmas
>>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.unmap_page_range
>>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.zap_pmd_range
>>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.zap_pte_range
>>> 8.54 ± 43% +8.6 17.19 ± 38% perf-profile.children.cycles-pp.asm_exc_page_fault
>>>
>>>
>>> If you fix the issue, kindly add following tag
>>> Reported-by: kernel test robot <[email protected]>
>>>
>>>
>>> To reproduce:
>>>
>>> git clone https://github.com/intel/lkp-tests.git
>>> cd lkp-tests
>>> sudo bin/lkp install job.yaml # job file is attached in this email
>>> bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>>> sudo bin/lkp run generated-yaml-file
>>>
>>> # if come across any failure that blocks the test,
>>> # please remove ~/.lkp and /lkp dir to run from a clean state.
>>>
>>>
>>> Disclaimer:
>>> Results have been estimated based on internal Intel analysis and are provided
>>> for informational purposes only. Any difference in system hardware or software
>>> design or configuration may affect actual performance.
>>>
>>>
>>> #regzbot introduced: 5efdd9122e
>>>
>>>
>
>
>


Attachments:
cifs_stats_timeo1s.txt (1.31 kB)
cifs_stats_timeo5s.txt (1.31 kB)
Download all attachments

2023-03-23 20:37:17

by Steve French

[permalink] [raw]
Subject: Re: [smb3] 5efdd9122e: filebench.sum_operations/s -50.0% regression

This is VERY useful data. Thank you. It show the SMB3 CLOSE
operation on the wire takes twice as long to complete with
closetimeo=5 rather than 1. Can you do the same stats for
closetimeo=0

On Thu, Mar 23, 2023 at 2:48 AM Yin Fengwei <[email protected]> wrote:
>
> On 3/15/23 22:26, Steve French wrote:
> > Can you verify what this perf looks like with "closetime=0" and "closetime=1"
> >
> > Are there differences in /proc/fs/cifs/Stats when you run the same
> > steps with "closetimeo=1" vs. the more recent default (5 seconds)?
> The dump of /proc/fs/cifs/Stats are attached. You can tell which
> closetimeo is used from the file name. Thanks.
>
>
> Regards
> Yin, Fengwei
>
> >
> > On Wed, Mar 15, 2023 at 2:46 AM Yin, Fengwei <[email protected]> wrote:
> >>
> >>
> >>
> >> On 8/26/2022 10:41 AM, kernel test robot wrote:
> >>> Greeting,
> >>>
> >>> FYI, we noticed a -50.0% regression of filebench.sum_operations/s due to commit:
> >>>
> >>>
> >>> commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3 ("smb3: allow deferred close timeout to be configurable")
> >>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >>>
> >>> in testcase: filebench
> >>> on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz (Cascade Lake) with 128G memory
> >>> with following parameters:
> >>>
> >>> disk: 1HDD
> >>> fs: ext4
> >>> fs2: cifs
> >>> test: filemicro_delete.f
> >>> cpufreq_governor: performance
> >>> ucode: 0x5003302
> >> Please note, we still could see this regresion on v6.3-rc2. And the regression is related with
> >> the commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3. This commit changed the default timeout
> >> value from 1s to 5s. If change the timeout back to 1s as following:
> >>
> >> diff --git a/fs/cifs/fs_context.c b/fs/cifs/fs_context.c
> >> index 6d13f8207e96..6b930fb0c4bd 100644
> >> --- a/fs/cifs/fs_context.c
> >> +++ b/fs/cifs/fs_context.c
> >> @@ -1537,7 +1537,7 @@ int smb3_init_fs_context(struct fs_context *fc)
> >>
> >> ctx->acregmax = CIFS_DEF_ACTIMEO;
> >> ctx->acdirmax = CIFS_DEF_ACTIMEO;
> >> - ctx->closetimeo = SMB3_DEF_DCLOSETIMEO;
> >> + ctx->closetimeo = CIFS_DEF_ACTIMEO;
> >>
> >> /* Most clients set timeout to 0, allows server to use its default */
> >> ctx->handle_timeout = 0; /* See MS-SMB2 spec section 2.2.14.2.12 */
> >>
> >> The regression is gone:
> >> dcb45fd7f501f864 v6.3-rc2 32715af441411a5a266606be08f
> >> ---------------- --------------------------- ---------------------------
> >> fail:runs %reproduction fail:runs %reproduction fail:runs
> >> | | | | |
> >> :25 0% :3 33% 1:3 last_state.booting
> >> :25 0% :3 33% 1:3 last_state.is_incomplete_run
> >> %stddev %change %stddev %change %stddev
> >> \ | \ | \
> >> 515.95 -50.0% 257.98 -0.0% 515.92 filebench.sum_operations/s
> >> 5.04 ± 7% +833.7% 47.09 ± 2% -2.9% 4.90 ± 2% filebench.sum_time_ms/op
> >> 10438 -100.0% 0.33 ±141% -100.0% 0.50 ±100% filebench.time.major_page_faults
> >> 167575 -4.1% 160660 -4.4% 160140 filebench.time.maximum_resident_set_size
> >> 7138 ± 11% +141.4% 17235 ± 3% +147.6% 17677 filebench.time.minor_page_faults
> >> 28.68 ± 9% +199.9% 86.00 ± 7% -2.4% 28.00 filebench.time.percent_of_cpu_this_job_got
> >> 2453485 ± 54% -63.0% 907380 -66.2% 830273 ± 6% cpuidle..usage
> >> 0.61 ± 38% +0.8 1.41 ± 3% +0.2 0.80 ± 4% mpstat.cpu.all.sys%
> >> 142984 ± 13% -45.6% 77725 -47.5% 75106 vmstat.system.in
> >> 34.23 ± 7% +27.9% 43.79 +27.8% 43.74 boot-time.boot
> >> 17.09 ± 11% +66.0% 28.38 +65.5% 28.28 boot-time.dhcp
> >> 2661 ± 7% +37.5% 3659 +37.2% 3651 boot-time.idle
> >> 104737 ±185% -87.8% 12762 ± 10% -89.8% 10631 ± 4% turbostat.C1
> >>
> >>
> >> 32715af441411a5a266606be08f is v6.3-rc2 with the change to restore the timeout to 1s. Thanks.
> >>
> >> Regards
> >> Yin, Fengwei
> >>
> >>
> >>>
> >>>
> >>> =========================================================================================
> >>> compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase/ucode:
> >>> gcc-11/performance/1HDD/cifs/ext4/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/filemicro_delete.f/filebench/0x5003302
> >>>
> >>> commit:
> >>> dcb45fd7f5 ("cifs: Do not use tcon->cfid directly, use the cfid we get from open_cached_dir")
> >>> 5efdd9122e ("smb3: allow deferred close timeout to be configurable")
> >>>
> >>> dcb45fd7f501f864 5efdd9122eff772eae2feae9f0f
> >>> ---------------- ---------------------------
> >>> %stddev %change %stddev
> >>> \ | \
> >>> 515.95 -50.0% 257.98 filebench.sum_operations/s
> >>> 4.81 ± 2% +1038.4% 54.78 ± 6% filebench.sum_time_ms/op
> >>> 29.00 ± 8% +212.1% 90.50 ± 3% filebench.time.percent_of_cpu_this_job_got
> >>> 24629 +2.7% 25297 filebench.time.voluntary_context_switches
> >>> 7.685e+08 +19.3% 9.169e+08 ± 4% cpuidle..time
> >>> 2.53 ± 6% -20.6% 2.00 ± 3% iostat.cpu.iowait
> >>> 1506141 ± 8% +22.4% 1843256 ± 3% turbostat.IRQ
> >>> 2.00 -50.0% 1.00 vmstat.procs.b
> >>> 21969 ± 2% -9.5% 19885 ± 2% vmstat.system.cs
> >>> 3.06 ± 7% -0.7 2.35 ± 4% mpstat.cpu.all.iowait%
> >>> 0.79 ± 5% +0.5 1.27 ± 2% mpstat.cpu.all.sys%
> >>> 0.89 ± 3% -0.1 0.79 ± 3% mpstat.cpu.all.usr%
> >>> 34.55 ± 14% -34.8% 22.51 ± 27% sched_debug.cfs_rq:/.removed.runnable_avg.avg
> >>> 119.64 ± 3% -20.0% 95.69 ± 17% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
> >>> 34.55 ± 14% -34.8% 22.51 ± 27% sched_debug.cfs_rq:/.removed.util_avg.avg
> >>> 119.64 ± 3% -20.0% 95.69 ± 17% sched_debug.cfs_rq:/.removed.util_avg.stddev
> >>> 5249 +15.8% 6076 meminfo.Active
> >>> 3866 ± 2% +17.7% 4552 meminfo.Active(anon)
> >>> 1382 ± 4% +10.3% 1524 ± 4% meminfo.Active(file)
> >>> 69791 ± 14% +39.8% 97553 ± 6% meminfo.AnonHugePages
> >>> 72709 ± 2% +12.5% 81779 ± 3% meminfo.Inactive(file)
> >>> 23219 +13.5% 26352 ± 3% meminfo.KernelStack
> >>> 966.50 ± 2% +17.7% 1137 proc-vmstat.nr_active_anon
> >>> 74302 +6.3% 78977 ± 2% proc-vmstat.nr_anon_pages
> >>> 81133 +6.0% 85973 proc-vmstat.nr_inactive_anon
> >>> 18172 ± 2% +12.5% 20442 ± 3% proc-vmstat.nr_inactive_file
> >>> 23213 +13.5% 26348 ± 3% proc-vmstat.nr_kernel_stack
> >>> 17983 +2.3% 18400 proc-vmstat.nr_mapped
> >>> 7446 ± 2% +5.5% 7853 ± 3% proc-vmstat.nr_shmem
> >>> 26888 +1.6% 27306 proc-vmstat.nr_slab_reclaimable
> >>> 47220 +3.4% 48803 proc-vmstat.nr_slab_unreclaimable
> >>> 966.50 ± 2% +17.7% 1137 proc-vmstat.nr_zone_active_anon
> >>> 81133 +6.0% 85973 proc-vmstat.nr_zone_inactive_anon
> >>> 18172 ± 2% +12.5% 20442 ± 3% proc-vmstat.nr_zone_inactive_file
> >>> 361460 +2.5% 370454 proc-vmstat.numa_hit
> >>> 946.67 +18.6% 1122 proc-vmstat.pgactivate
> >>> 361562 +2.5% 370553 proc-vmstat.pgalloc_normal
> >>> 187906 +4.7% 196761 proc-vmstat.pgfault
> >>> 8189 +2.5% 8395 proc-vmstat.pgreuse
> >>> 1.097e+09 +15.5% 1.267e+09 ± 7% perf-stat.i.branch-instructions
> >>> 39079265 ± 6% -20.9% 30915354 ± 4% perf-stat.i.branch-misses
> >>> 5093263 ± 4% -23.7% 3884752 ± 9% perf-stat.i.cache-misses
> >>> 29213 -18.7% 23764 ± 5% perf-stat.i.context-switches
> >>> 7.666e+09 ± 4% +5.7% 8.106e+09 ± 2% perf-stat.i.cpu-cycles
> >>> 1877 ± 15% +75.1% 3287 ± 12% perf-stat.i.cycles-between-cache-misses
> >>> 1735450 ± 3% -12.9% 1512060 ± 3% perf-stat.i.iTLB-load-misses
> >>> 2898 ± 3% +34.4% 3895 ± 7% perf-stat.i.instructions-per-iTLB-miss
> >>> 1493 -20.3% 1190 ± 7% perf-stat.i.major-faults
> >>> 0.09 ± 3% +5.8% 0.09 ± 2% perf-stat.i.metric.GHz
> >>> 48.47 ± 11% +8.4 56.83 ± 7% perf-stat.i.node-store-miss-rate%
> >>> 283426 ± 4% -21.6% 222190 ± 10% perf-stat.i.node-stores
> >>> 3.57 ± 7% -1.1 2.44 ± 6% perf-stat.overall.branch-miss-rate%
> >>> 1508 ± 3% +39.8% 2108 ± 9% perf-stat.overall.cycles-between-cache-misses
> >>> 3022 ± 3% +23.6% 3736 ± 5% perf-stat.overall.instructions-per-iTLB-miss
> >>> 9.585e+08 +18.8% 1.138e+09 ± 6% perf-stat.ps.branch-instructions
> >>> 34151514 ± 6% -18.8% 27725316 ± 4% perf-stat.ps.branch-misses
> >>> 4450329 ± 5% -21.7% 3486409 ± 9% perf-stat.ps.cache-misses
> >>> 25524 -16.4% 21333 ± 4% perf-stat.ps.context-switches
> >>> 77139 +2.5% 79105 perf-stat.ps.cpu-clock
> >>> 6.704e+09 ± 4% +8.7% 7.287e+09 perf-stat.ps.cpu-cycles
> >>> 1.06e+09 +11.3% 1.18e+09 ± 5% perf-stat.ps.dTLB-loads
> >>> 1517349 ± 3% -10.5% 1357716 ± 2% perf-stat.ps.iTLB-load-misses
> >>> 4.582e+09 +10.8% 5.075e+09 ± 6% perf-stat.ps.instructions
> >>> 1296 -18.1% 1061 ± 6% perf-stat.ps.major-faults
> >>> 247613 ± 4% -19.5% 199283 ± 9% perf-stat.ps.node-stores
> >>> 77139 +2.5% 79105 perf-stat.ps.task-clock
> >>> 3.697e+10 +35.3% 5.003e+10 perf-stat.total.instructions
> >>> 8.51 ± 91% -6.9 1.59 ±223% perf-profile.calltrace.cycles-pp.getdents64
> >>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >>> 6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
> >>> 6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
> >>> 6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.open64
> >>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getdents64
> >>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
> >>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
> >>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.iterate_dir.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
> >>> 6.26 ±115% -5.4 0.88 ±223% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >>> 6.26 ±115% -5.4 0.88 ±223% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare
> >>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop
> >>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
> >>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
> >>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record
> >>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist
> >>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push
> >>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record
> >>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.mmput.exit_mm.do_exit
> >>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.mmput.exit_mm
> >>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.mmput
> >>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
> >>> 8.51 ± 91% -6.9 1.59 ±223% perf-profile.children.cycles-pp.getdents64
> >>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
> >>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.exit_to_user_mode_loop
> >>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> >>> 6.25 ±107% -6.2 0.00 perf-profile.children.cycles-pp.open64
> >>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.children.cycles-pp.__x64_sys_getdents64
> >>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.children.cycles-pp.iterate_dir
> >>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.children.cycles-pp.arch_do_signal_or_restart
> >>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.children.cycles-pp.get_signal
> >>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.children.cycles-pp.perf_mmap__read_head
> >>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.unmap_vmas
> >>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.unmap_page_range
> >>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.zap_pmd_range
> >>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.zap_pte_range
> >>> 8.54 ± 43% +8.6 17.19 ± 38% perf-profile.children.cycles-pp.asm_exc_page_fault
> >>>
> >>>
> >>> If you fix the issue, kindly add following tag
> >>> Reported-by: kernel test robot <[email protected]>
> >>>
> >>>
> >>> To reproduce:
> >>>
> >>> git clone https://github.com/intel/lkp-tests.git
> >>> cd lkp-tests
> >>> sudo bin/lkp install job.yaml # job file is attached in this email
> >>> bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> >>> sudo bin/lkp run generated-yaml-file
> >>>
> >>> # if come across any failure that blocks the test,
> >>> # please remove ~/.lkp and /lkp dir to run from a clean state.
> >>>
> >>>
> >>> Disclaimer:
> >>> Results have been estimated based on internal Intel analysis and are provided
> >>> for informational purposes only. Any difference in system hardware or software
> >>> design or configuration may affect actual performance.
> >>>
> >>>
> >>> #regzbot introduced: 5efdd9122e
> >>>
> >>>
> >
> >
> >



--
Thanks,

Steve

2023-03-23 20:45:54

by Steve French

[permalink] [raw]
Subject: Re: [smb3] 5efdd9122e: filebench.sum_operations/s -50.0% regression

Patch attached to return default back to 1 second from 5 seconds
(Bharath's recent closetimeo on lease break patch may also help)

Let me know if you see any problems with it

On Thu, Mar 23, 2023 at 2:48 AM Yin Fengwei <[email protected]> wrote:
>
> On 3/15/23 22:26, Steve French wrote:
> > Can you verify what this perf looks like with "closetime=0" and "closetime=1"
> >
> > Are there differences in /proc/fs/cifs/Stats when you run the same
> > steps with "closetimeo=1" vs. the more recent default (5 seconds)?
> The dump of /proc/fs/cifs/Stats are attached. You can tell which
> closetimeo is used from the file name. Thanks.
>
>
> Regards
> Yin, Fengwei
>
> >
> > On Wed, Mar 15, 2023 at 2:46 AM Yin, Fengwei <[email protected]> wrote:
> >>
> >>
> >>
> >> On 8/26/2022 10:41 AM, kernel test robot wrote:
> >>> Greeting,
> >>>
> >>> FYI, we noticed a -50.0% regression of filebench.sum_operations/s due to commit:
> >>>
> >>>
> >>> commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3 ("smb3: allow deferred close timeout to be configurable")
> >>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >>>
> >>> in testcase: filebench
> >>> on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz (Cascade Lake) with 128G memory
> >>> with following parameters:
> >>>
> >>> disk: 1HDD
> >>> fs: ext4
> >>> fs2: cifs
> >>> test: filemicro_delete.f
> >>> cpufreq_governor: performance
> >>> ucode: 0x5003302
> >> Please note, we still could see this regresion on v6.3-rc2. And the regression is related with
> >> the commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3. This commit changed the default timeout
> >> value from 1s to 5s. If change the timeout back to 1s as following:
> >>
> >> diff --git a/fs/cifs/fs_context.c b/fs/cifs/fs_context.c
> >> index 6d13f8207e96..6b930fb0c4bd 100644
> >> --- a/fs/cifs/fs_context.c
> >> +++ b/fs/cifs/fs_context.c
> >> @@ -1537,7 +1537,7 @@ int smb3_init_fs_context(struct fs_context *fc)
> >>
> >> ctx->acregmax = CIFS_DEF_ACTIMEO;
> >> ctx->acdirmax = CIFS_DEF_ACTIMEO;
> >> - ctx->closetimeo = SMB3_DEF_DCLOSETIMEO;
> >> + ctx->closetimeo = CIFS_DEF_ACTIMEO;
> >>
> >> /* Most clients set timeout to 0, allows server to use its default */
> >> ctx->handle_timeout = 0; /* See MS-SMB2 spec section 2.2.14.2.12 */
> >>
> >> The regression is gone:
> >> dcb45fd7f501f864 v6.3-rc2 32715af441411a5a266606be08f
> >> ---------------- --------------------------- ---------------------------
> >> fail:runs %reproduction fail:runs %reproduction fail:runs
> >> | | | | |
> >> :25 0% :3 33% 1:3 last_state.booting
> >> :25 0% :3 33% 1:3 last_state.is_incomplete_run
> >> %stddev %change %stddev %change %stddev
> >> \ | \ | \
> >> 515.95 -50.0% 257.98 -0.0% 515.92 filebench.sum_operations/s
> >> 5.04 ± 7% +833.7% 47.09 ± 2% -2.9% 4.90 ± 2% filebench.sum_time_ms/op
> >> 10438 -100.0% 0.33 ±141% -100.0% 0.50 ±100% filebench.time.major_page_faults
> >> 167575 -4.1% 160660 -4.4% 160140 filebench.time.maximum_resident_set_size
> >> 7138 ± 11% +141.4% 17235 ± 3% +147.6% 17677 filebench.time.minor_page_faults
> >> 28.68 ± 9% +199.9% 86.00 ± 7% -2.4% 28.00 filebench.time.percent_of_cpu_this_job_got
> >> 2453485 ± 54% -63.0% 907380 -66.2% 830273 ± 6% cpuidle..usage
> >> 0.61 ± 38% +0.8 1.41 ± 3% +0.2 0.80 ± 4% mpstat.cpu.all.sys%
> >> 142984 ± 13% -45.6% 77725 -47.5% 75106 vmstat.system.in
> >> 34.23 ± 7% +27.9% 43.79 +27.8% 43.74 boot-time.boot
> >> 17.09 ± 11% +66.0% 28.38 +65.5% 28.28 boot-time.dhcp
> >> 2661 ± 7% +37.5% 3659 +37.2% 3651 boot-time.idle
> >> 104737 ±185% -87.8% 12762 ± 10% -89.8% 10631 ± 4% turbostat.C1
> >>
> >>
> >> 32715af441411a5a266606be08f is v6.3-rc2 with the change to restore the timeout to 1s. Thanks.
> >>
> >> Regards
> >> Yin, Fengwei
> >>
> >>
> >>>
> >>>
> >>> =========================================================================================
> >>> compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase/ucode:
> >>> gcc-11/performance/1HDD/cifs/ext4/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/filemicro_delete.f/filebench/0x5003302
> >>>
> >>> commit:
> >>> dcb45fd7f5 ("cifs: Do not use tcon->cfid directly, use the cfid we get from open_cached_dir")
> >>> 5efdd9122e ("smb3: allow deferred close timeout to be configurable")
> >>>
> >>> dcb45fd7f501f864 5efdd9122eff772eae2feae9f0f
> >>> ---------------- ---------------------------
> >>> %stddev %change %stddev
> >>> \ | \
> >>> 515.95 -50.0% 257.98 filebench.sum_operations/s
> >>> 4.81 ± 2% +1038.4% 54.78 ± 6% filebench.sum_time_ms/op
> >>> 29.00 ± 8% +212.1% 90.50 ± 3% filebench.time.percent_of_cpu_this_job_got
> >>> 24629 +2.7% 25297 filebench.time.voluntary_context_switches
> >>> 7.685e+08 +19.3% 9.169e+08 ± 4% cpuidle..time
> >>> 2.53 ± 6% -20.6% 2.00 ± 3% iostat.cpu.iowait
> >>> 1506141 ± 8% +22.4% 1843256 ± 3% turbostat.IRQ
> >>> 2.00 -50.0% 1.00 vmstat.procs.b
> >>> 21969 ± 2% -9.5% 19885 ± 2% vmstat.system.cs
> >>> 3.06 ± 7% -0.7 2.35 ± 4% mpstat.cpu.all.iowait%
> >>> 0.79 ± 5% +0.5 1.27 ± 2% mpstat.cpu.all.sys%
> >>> 0.89 ± 3% -0.1 0.79 ± 3% mpstat.cpu.all.usr%
> >>> 34.55 ± 14% -34.8% 22.51 ± 27% sched_debug.cfs_rq:/.removed.runnable_avg.avg
> >>> 119.64 ± 3% -20.0% 95.69 ± 17% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
> >>> 34.55 ± 14% -34.8% 22.51 ± 27% sched_debug.cfs_rq:/.removed.util_avg.avg
> >>> 119.64 ± 3% -20.0% 95.69 ± 17% sched_debug.cfs_rq:/.removed.util_avg.stddev
> >>> 5249 +15.8% 6076 meminfo.Active
> >>> 3866 ± 2% +17.7% 4552 meminfo.Active(anon)
> >>> 1382 ± 4% +10.3% 1524 ± 4% meminfo.Active(file)
> >>> 69791 ± 14% +39.8% 97553 ± 6% meminfo.AnonHugePages
> >>> 72709 ± 2% +12.5% 81779 ± 3% meminfo.Inactive(file)
> >>> 23219 +13.5% 26352 ± 3% meminfo.KernelStack
> >>> 966.50 ± 2% +17.7% 1137 proc-vmstat.nr_active_anon
> >>> 74302 +6.3% 78977 ± 2% proc-vmstat.nr_anon_pages
> >>> 81133 +6.0% 85973 proc-vmstat.nr_inactive_anon
> >>> 18172 ± 2% +12.5% 20442 ± 3% proc-vmstat.nr_inactive_file
> >>> 23213 +13.5% 26348 ± 3% proc-vmstat.nr_kernel_stack
> >>> 17983 +2.3% 18400 proc-vmstat.nr_mapped
> >>> 7446 ± 2% +5.5% 7853 ± 3% proc-vmstat.nr_shmem
> >>> 26888 +1.6% 27306 proc-vmstat.nr_slab_reclaimable
> >>> 47220 +3.4% 48803 proc-vmstat.nr_slab_unreclaimable
> >>> 966.50 ± 2% +17.7% 1137 proc-vmstat.nr_zone_active_anon
> >>> 81133 +6.0% 85973 proc-vmstat.nr_zone_inactive_anon
> >>> 18172 ± 2% +12.5% 20442 ± 3% proc-vmstat.nr_zone_inactive_file
> >>> 361460 +2.5% 370454 proc-vmstat.numa_hit
> >>> 946.67 +18.6% 1122 proc-vmstat.pgactivate
> >>> 361562 +2.5% 370553 proc-vmstat.pgalloc_normal
> >>> 187906 +4.7% 196761 proc-vmstat.pgfault
> >>> 8189 +2.5% 8395 proc-vmstat.pgreuse
> >>> 1.097e+09 +15.5% 1.267e+09 ± 7% perf-stat.i.branch-instructions
> >>> 39079265 ± 6% -20.9% 30915354 ± 4% perf-stat.i.branch-misses
> >>> 5093263 ± 4% -23.7% 3884752 ± 9% perf-stat.i.cache-misses
> >>> 29213 -18.7% 23764 ± 5% perf-stat.i.context-switches
> >>> 7.666e+09 ± 4% +5.7% 8.106e+09 ± 2% perf-stat.i.cpu-cycles
> >>> 1877 ± 15% +75.1% 3287 ± 12% perf-stat.i.cycles-between-cache-misses
> >>> 1735450 ± 3% -12.9% 1512060 ± 3% perf-stat.i.iTLB-load-misses
> >>> 2898 ± 3% +34.4% 3895 ± 7% perf-stat.i.instructions-per-iTLB-miss
> >>> 1493 -20.3% 1190 ± 7% perf-stat.i.major-faults
> >>> 0.09 ± 3% +5.8% 0.09 ± 2% perf-stat.i.metric.GHz
> >>> 48.47 ± 11% +8.4 56.83 ± 7% perf-stat.i.node-store-miss-rate%
> >>> 283426 ± 4% -21.6% 222190 ± 10% perf-stat.i.node-stores
> >>> 3.57 ± 7% -1.1 2.44 ± 6% perf-stat.overall.branch-miss-rate%
> >>> 1508 ± 3% +39.8% 2108 ± 9% perf-stat.overall.cycles-between-cache-misses
> >>> 3022 ± 3% +23.6% 3736 ± 5% perf-stat.overall.instructions-per-iTLB-miss
> >>> 9.585e+08 +18.8% 1.138e+09 ± 6% perf-stat.ps.branch-instructions
> >>> 34151514 ± 6% -18.8% 27725316 ± 4% perf-stat.ps.branch-misses
> >>> 4450329 ± 5% -21.7% 3486409 ± 9% perf-stat.ps.cache-misses
> >>> 25524 -16.4% 21333 ± 4% perf-stat.ps.context-switches
> >>> 77139 +2.5% 79105 perf-stat.ps.cpu-clock
> >>> 6.704e+09 ± 4% +8.7% 7.287e+09 perf-stat.ps.cpu-cycles
> >>> 1.06e+09 +11.3% 1.18e+09 ± 5% perf-stat.ps.dTLB-loads
> >>> 1517349 ± 3% -10.5% 1357716 ± 2% perf-stat.ps.iTLB-load-misses
> >>> 4.582e+09 +10.8% 5.075e+09 ± 6% perf-stat.ps.instructions
> >>> 1296 -18.1% 1061 ± 6% perf-stat.ps.major-faults
> >>> 247613 ± 4% -19.5% 199283 ± 9% perf-stat.ps.node-stores
> >>> 77139 +2.5% 79105 perf-stat.ps.task-clock
> >>> 3.697e+10 +35.3% 5.003e+10 perf-stat.total.instructions
> >>> 8.51 ± 91% -6.9 1.59 ±223% perf-profile.calltrace.cycles-pp.getdents64
> >>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >>> 6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
> >>> 6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
> >>> 6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.open64
> >>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getdents64
> >>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
> >>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
> >>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.iterate_dir.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
> >>> 6.26 ±115% -5.4 0.88 ±223% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >>> 6.26 ±115% -5.4 0.88 ±223% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare
> >>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop
> >>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
> >>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
> >>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record
> >>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist
> >>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push
> >>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record
> >>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.mmput.exit_mm.do_exit
> >>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.mmput.exit_mm
> >>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.mmput
> >>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
> >>> 8.51 ± 91% -6.9 1.59 ±223% perf-profile.children.cycles-pp.getdents64
> >>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
> >>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.exit_to_user_mode_loop
> >>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> >>> 6.25 ±107% -6.2 0.00 perf-profile.children.cycles-pp.open64
> >>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.children.cycles-pp.__x64_sys_getdents64
> >>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.children.cycles-pp.iterate_dir
> >>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.children.cycles-pp.arch_do_signal_or_restart
> >>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.children.cycles-pp.get_signal
> >>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.children.cycles-pp.perf_mmap__read_head
> >>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.unmap_vmas
> >>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.unmap_page_range
> >>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.zap_pmd_range
> >>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.zap_pte_range
> >>> 8.54 ± 43% +8.6 17.19 ± 38% perf-profile.children.cycles-pp.asm_exc_page_fault
> >>>
> >>>
> >>> If you fix the issue, kindly add following tag
> >>> Reported-by: kernel test robot <[email protected]>
> >>>
> >>>
> >>> To reproduce:
> >>>
> >>> git clone https://github.com/intel/lkp-tests.git
> >>> cd lkp-tests
> >>> sudo bin/lkp install job.yaml # job file is attached in this email
> >>> bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> >>> sudo bin/lkp run generated-yaml-file
> >>>
> >>> # if come across any failure that blocks the test,
> >>> # please remove ~/.lkp and /lkp dir to run from a clean state.
> >>>
> >>>
> >>> Disclaimer:
> >>> Results have been estimated based on internal Intel analysis and are provided
> >>> for informational purposes only. Any difference in system hardware or software
> >>> design or configuration may affect actual performance.
> >>>
> >>>
> >>> #regzbot introduced: 5efdd9122e
> >>>
> >>>
> >
> >
> >



--
Thanks,

Steve


Attachments:
0001-smb3-lower-default-deferred-close-timeout-to-address.patch (1.83 kB)

2023-03-24 05:59:00

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [smb3] 5efdd9122e: filebench.sum_operations/s -50.0% regression


On 3/24/2023 4:00 AM, Steve French wrote:
> This is VERY useful data. Thank you. It show the SMB3 CLOSE
> operation on the wire takes twice as long to complete with
> closetimeo=5 rather than 1. Can you do the same stats for
> closetimeo=0
Please find in attached file for /proc/fs/cifs/Stats dumping with closetimeo=0.
BTW, there is no regression with closetimeo=0.

Regards
Yin, Fengwei

>
> On Thu, Mar 23, 2023 at 2:48 AM Yin Fengwei <[email protected]> wrote:
>>
>> On 3/15/23 22:26, Steve French wrote:
>>> Can you verify what this perf looks like with "closetime=0" and "closetime=1"
>>>
>>> Are there differences in /proc/fs/cifs/Stats when you run the same
>>> steps with "closetimeo=1" vs. the more recent default (5 seconds)?
>> The dump of /proc/fs/cifs/Stats are attached. You can tell which
>> closetimeo is used from the file name. Thanks.
>>
>>
>> Regards
>> Yin, Fengwei
>>
>>>
>>> On Wed, Mar 15, 2023 at 2:46 AM Yin, Fengwei <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> On 8/26/2022 10:41 AM, kernel test robot wrote:
>>>>> Greeting,
>>>>>
>>>>> FYI, we noticed a -50.0% regression of filebench.sum_operations/s due to commit:
>>>>>
>>>>>
>>>>> commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3 ("smb3: allow deferred close timeout to be configurable")
>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>>>
>>>>> in testcase: filebench
>>>>> on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz (Cascade Lake) with 128G memory
>>>>> with following parameters:
>>>>>
>>>>> disk: 1HDD
>>>>> fs: ext4
>>>>> fs2: cifs
>>>>> test: filemicro_delete.f
>>>>> cpufreq_governor: performance
>>>>> ucode: 0x5003302
>>>> Please note, we still could see this regresion on v6.3-rc2. And the regression is related with
>>>> the commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3. This commit changed the default timeout
>>>> value from 1s to 5s. If change the timeout back to 1s as following:
>>>>
>>>> diff --git a/fs/cifs/fs_context.c b/fs/cifs/fs_context.c
>>>> index 6d13f8207e96..6b930fb0c4bd 100644
>>>> --- a/fs/cifs/fs_context.c
>>>> +++ b/fs/cifs/fs_context.c
>>>> @@ -1537,7 +1537,7 @@ int smb3_init_fs_context(struct fs_context *fc)
>>>>
>>>> ctx->acregmax = CIFS_DEF_ACTIMEO;
>>>> ctx->acdirmax = CIFS_DEF_ACTIMEO;
>>>> - ctx->closetimeo = SMB3_DEF_DCLOSETIMEO;
>>>> + ctx->closetimeo = CIFS_DEF_ACTIMEO;
>>>>
>>>> /* Most clients set timeout to 0, allows server to use its default */
>>>> ctx->handle_timeout = 0; /* See MS-SMB2 spec section 2.2.14.2.12 */
>>>>
>>>> The regression is gone:
>>>> dcb45fd7f501f864 v6.3-rc2 32715af441411a5a266606be08f
>>>> ---------------- --------------------------- ---------------------------
>>>> fail:runs %reproduction fail:runs %reproduction fail:runs
>>>> | | | | |
>>>> :25 0% :3 33% 1:3 last_state.booting
>>>> :25 0% :3 33% 1:3 last_state.is_incomplete_run
>>>> %stddev %change %stddev %change %stddev
>>>> \ | \ | \
>>>> 515.95 -50.0% 257.98 -0.0% 515.92 filebench.sum_operations/s
>>>> 5.04 ± 7% +833.7% 47.09 ± 2% -2.9% 4.90 ± 2% filebench.sum_time_ms/op
>>>> 10438 -100.0% 0.33 ±141% -100.0% 0.50 ±100% filebench.time.major_page_faults
>>>> 167575 -4.1% 160660 -4.4% 160140 filebench.time.maximum_resident_set_size
>>>> 7138 ± 11% +141.4% 17235 ± 3% +147.6% 17677 filebench.time.minor_page_faults
>>>> 28.68 ± 9% +199.9% 86.00 ± 7% -2.4% 28.00 filebench.time.percent_of_cpu_this_job_got
>>>> 2453485 ± 54% -63.0% 907380 -66.2% 830273 ± 6% cpuidle..usage
>>>> 0.61 ± 38% +0.8 1.41 ± 3% +0.2 0.80 ± 4% mpstat.cpu.all.sys%
>>>> 142984 ± 13% -45.6% 77725 -47.5% 75106 vmstat.system.in
>>>> 34.23 ± 7% +27.9% 43.79 +27.8% 43.74 boot-time.boot
>>>> 17.09 ± 11% +66.0% 28.38 +65.5% 28.28 boot-time.dhcp
>>>> 2661 ± 7% +37.5% 3659 +37.2% 3651 boot-time.idle
>>>> 104737 ±185% -87.8% 12762 ± 10% -89.8% 10631 ± 4% turbostat.C1
>>>>
>>>>
>>>> 32715af441411a5a266606be08f is v6.3-rc2 with the change to restore the timeout to 1s. Thanks.
>>>>
>>>> Regards
>>>> Yin, Fengwei
>>>>
>>>>
>>>>>
>>>>>
>>>>> =========================================================================================
>>>>> compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase/ucode:
>>>>> gcc-11/performance/1HDD/cifs/ext4/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/filemicro_delete.f/filebench/0x5003302
>>>>>
>>>>> commit:
>>>>> dcb45fd7f5 ("cifs: Do not use tcon->cfid directly, use the cfid we get from open_cached_dir")
>>>>> 5efdd9122e ("smb3: allow deferred close timeout to be configurable")
>>>>>
>>>>> dcb45fd7f501f864 5efdd9122eff772eae2feae9f0f
>>>>> ---------------- ---------------------------
>>>>> %stddev %change %stddev
>>>>> \ | \
>>>>> 515.95 -50.0% 257.98 filebench.sum_operations/s
>>>>> 4.81 ± 2% +1038.4% 54.78 ± 6% filebench.sum_time_ms/op
>>>>> 29.00 ± 8% +212.1% 90.50 ± 3% filebench.time.percent_of_cpu_this_job_got
>>>>> 24629 +2.7% 25297 filebench.time.voluntary_context_switches
>>>>> 7.685e+08 +19.3% 9.169e+08 ± 4% cpuidle..time
>>>>> 2.53 ± 6% -20.6% 2.00 ± 3% iostat.cpu.iowait
>>>>> 1506141 ± 8% +22.4% 1843256 ± 3% turbostat.IRQ
>>>>> 2.00 -50.0% 1.00 vmstat.procs.b
>>>>> 21969 ± 2% -9.5% 19885 ± 2% vmstat.system.cs
>>>>> 3.06 ± 7% -0.7 2.35 ± 4% mpstat.cpu.all.iowait%
>>>>> 0.79 ± 5% +0.5 1.27 ± 2% mpstat.cpu.all.sys%
>>>>> 0.89 ± 3% -0.1 0.79 ± 3% mpstat.cpu.all.usr%
>>>>> 34.55 ± 14% -34.8% 22.51 ± 27% sched_debug.cfs_rq:/.removed.runnable_avg.avg
>>>>> 119.64 ± 3% -20.0% 95.69 ± 17% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
>>>>> 34.55 ± 14% -34.8% 22.51 ± 27% sched_debug.cfs_rq:/.removed.util_avg.avg
>>>>> 119.64 ± 3% -20.0% 95.69 ± 17% sched_debug.cfs_rq:/.removed.util_avg.stddev
>>>>> 5249 +15.8% 6076 meminfo.Active
>>>>> 3866 ± 2% +17.7% 4552 meminfo.Active(anon)
>>>>> 1382 ± 4% +10.3% 1524 ± 4% meminfo.Active(file)
>>>>> 69791 ± 14% +39.8% 97553 ± 6% meminfo.AnonHugePages
>>>>> 72709 ± 2% +12.5% 81779 ± 3% meminfo.Inactive(file)
>>>>> 23219 +13.5% 26352 ± 3% meminfo.KernelStack
>>>>> 966.50 ± 2% +17.7% 1137 proc-vmstat.nr_active_anon
>>>>> 74302 +6.3% 78977 ± 2% proc-vmstat.nr_anon_pages
>>>>> 81133 +6.0% 85973 proc-vmstat.nr_inactive_anon
>>>>> 18172 ± 2% +12.5% 20442 ± 3% proc-vmstat.nr_inactive_file
>>>>> 23213 +13.5% 26348 ± 3% proc-vmstat.nr_kernel_stack
>>>>> 17983 +2.3% 18400 proc-vmstat.nr_mapped
>>>>> 7446 ± 2% +5.5% 7853 ± 3% proc-vmstat.nr_shmem
>>>>> 26888 +1.6% 27306 proc-vmstat.nr_slab_reclaimable
>>>>> 47220 +3.4% 48803 proc-vmstat.nr_slab_unreclaimable
>>>>> 966.50 ± 2% +17.7% 1137 proc-vmstat.nr_zone_active_anon
>>>>> 81133 +6.0% 85973 proc-vmstat.nr_zone_inactive_anon
>>>>> 18172 ± 2% +12.5% 20442 ± 3% proc-vmstat.nr_zone_inactive_file
>>>>> 361460 +2.5% 370454 proc-vmstat.numa_hit
>>>>> 946.67 +18.6% 1122 proc-vmstat.pgactivate
>>>>> 361562 +2.5% 370553 proc-vmstat.pgalloc_normal
>>>>> 187906 +4.7% 196761 proc-vmstat.pgfault
>>>>> 8189 +2.5% 8395 proc-vmstat.pgreuse
>>>>> 1.097e+09 +15.5% 1.267e+09 ± 7% perf-stat.i.branch-instructions
>>>>> 39079265 ± 6% -20.9% 30915354 ± 4% perf-stat.i.branch-misses
>>>>> 5093263 ± 4% -23.7% 3884752 ± 9% perf-stat.i.cache-misses
>>>>> 29213 -18.7% 23764 ± 5% perf-stat.i.context-switches
>>>>> 7.666e+09 ± 4% +5.7% 8.106e+09 ± 2% perf-stat.i.cpu-cycles
>>>>> 1877 ± 15% +75.1% 3287 ± 12% perf-stat.i.cycles-between-cache-misses
>>>>> 1735450 ± 3% -12.9% 1512060 ± 3% perf-stat.i.iTLB-load-misses
>>>>> 2898 ± 3% +34.4% 3895 ± 7% perf-stat.i.instructions-per-iTLB-miss
>>>>> 1493 -20.3% 1190 ± 7% perf-stat.i.major-faults
>>>>> 0.09 ± 3% +5.8% 0.09 ± 2% perf-stat.i.metric.GHz
>>>>> 48.47 ± 11% +8.4 56.83 ± 7% perf-stat.i.node-store-miss-rate%
>>>>> 283426 ± 4% -21.6% 222190 ± 10% perf-stat.i.node-stores
>>>>> 3.57 ± 7% -1.1 2.44 ± 6% perf-stat.overall.branch-miss-rate%
>>>>> 1508 ± 3% +39.8% 2108 ± 9% perf-stat.overall.cycles-between-cache-misses
>>>>> 3022 ± 3% +23.6% 3736 ± 5% perf-stat.overall.instructions-per-iTLB-miss
>>>>> 9.585e+08 +18.8% 1.138e+09 ± 6% perf-stat.ps.branch-instructions
>>>>> 34151514 ± 6% -18.8% 27725316 ± 4% perf-stat.ps.branch-misses
>>>>> 4450329 ± 5% -21.7% 3486409 ± 9% perf-stat.ps.cache-misses
>>>>> 25524 -16.4% 21333 ± 4% perf-stat.ps.context-switches
>>>>> 77139 +2.5% 79105 perf-stat.ps.cpu-clock
>>>>> 6.704e+09 ± 4% +8.7% 7.287e+09 perf-stat.ps.cpu-cycles
>>>>> 1.06e+09 +11.3% 1.18e+09 ± 5% perf-stat.ps.dTLB-loads
>>>>> 1517349 ± 3% -10.5% 1357716 ± 2% perf-stat.ps.iTLB-load-misses
>>>>> 4.582e+09 +10.8% 5.075e+09 ± 6% perf-stat.ps.instructions
>>>>> 1296 -18.1% 1061 ± 6% perf-stat.ps.major-faults
>>>>> 247613 ± 4% -19.5% 199283 ± 9% perf-stat.ps.node-stores
>>>>> 77139 +2.5% 79105 perf-stat.ps.task-clock
>>>>> 3.697e+10 +35.3% 5.003e+10 perf-stat.total.instructions
>>>>> 8.51 ± 91% -6.9 1.59 ±223% perf-profile.calltrace.cycles-pp.getdents64
>>>>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>>> 6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
>>>>> 6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
>>>>> 6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.open64
>>>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getdents64
>>>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
>>>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
>>>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.iterate_dir.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
>>>>> 6.26 ±115% -5.4 0.88 ±223% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>>> 6.26 ±115% -5.4 0.88 ±223% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare
>>>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop
>>>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
>>>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
>>>>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record
>>>>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist
>>>>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push
>>>>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record
>>>>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.mmput.exit_mm.do_exit
>>>>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.mmput.exit_mm
>>>>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.mmput
>>>>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
>>>>> 8.51 ± 91% -6.9 1.59 ±223% perf-profile.children.cycles-pp.getdents64
>>>>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
>>>>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.exit_to_user_mode_loop
>>>>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
>>>>> 6.25 ±107% -6.2 0.00 perf-profile.children.cycles-pp.open64
>>>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.children.cycles-pp.__x64_sys_getdents64
>>>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.children.cycles-pp.iterate_dir
>>>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.children.cycles-pp.arch_do_signal_or_restart
>>>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.children.cycles-pp.get_signal
>>>>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.children.cycles-pp.perf_mmap__read_head
>>>>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.unmap_vmas
>>>>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.unmap_page_range
>>>>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.zap_pmd_range
>>>>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.zap_pte_range
>>>>> 8.54 ± 43% +8.6 17.19 ± 38% perf-profile.children.cycles-pp.asm_exc_page_fault
>>>>>
>>>>>
>>>>> If you fix the issue, kindly add following tag
>>>>> Reported-by: kernel test robot <[email protected]>
>>>>>
>>>>>
>>>>> To reproduce:
>>>>>
>>>>> git clone https://github.com/intel/lkp-tests.git
>>>>> cd lkp-tests
>>>>> sudo bin/lkp install job.yaml # job file is attached in this email
>>>>> bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>>>>> sudo bin/lkp run generated-yaml-file
>>>>>
>>>>> # if come across any failure that blocks the test,
>>>>> # please remove ~/.lkp and /lkp dir to run from a clean state.
>>>>>
>>>>>
>>>>> Disclaimer:
>>>>> Results have been estimated based on internal Intel analysis and are provided
>>>>> for informational purposes only. Any difference in system hardware or software
>>>>> design or configuration may affect actual performance.
>>>>>
>>>>>
>>>>> #regzbot introduced: 5efdd9122e
>>>>>
>>>>>
>>>
>>>
>>>
>
>
>


Attachments:
cifs_stats_timeout0s.txt (1.30 kB)

2023-03-24 10:35:46

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [smb3] 5efdd9122e: filebench.sum_operations/s -50.0% regression



On 3/24/2023 4:36 AM, Steve French wrote:
> Patch attached to return default back to 1 second from 5 seconds
> (Bharath's recent closetimeo on lease break patch may also help)
>
> Let me know if you see any problems with it
With this patch, the regression is gone.

Regards
Yin, Fengwei

>
> On Thu, Mar 23, 2023 at 2:48 AM Yin Fengwei <[email protected]> wrote:
>>
>> On 3/15/23 22:26, Steve French wrote:
>>> Can you verify what this perf looks like with "closetime=0" and "closetime=1"
>>>
>>> Are there differences in /proc/fs/cifs/Stats when you run the same
>>> steps with "closetimeo=1" vs. the more recent default (5 seconds)?
>> The dump of /proc/fs/cifs/Stats are attached. You can tell which
>> closetimeo is used from the file name. Thanks.
>>
>>
>> Regards
>> Yin, Fengwei
>>
>>>
>>> On Wed, Mar 15, 2023 at 2:46 AM Yin, Fengwei <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> On 8/26/2022 10:41 AM, kernel test robot wrote:
>>>>> Greeting,
>>>>>
>>>>> FYI, we noticed a -50.0% regression of filebench.sum_operations/s due to commit:
>>>>>
>>>>>
>>>>> commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3 ("smb3: allow deferred close timeout to be configurable")
>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>>>
>>>>> in testcase: filebench
>>>>> on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz (Cascade Lake) with 128G memory
>>>>> with following parameters:
>>>>>
>>>>> disk: 1HDD
>>>>> fs: ext4
>>>>> fs2: cifs
>>>>> test: filemicro_delete.f
>>>>> cpufreq_governor: performance
>>>>> ucode: 0x5003302
>>>> Please note, we still could see this regresion on v6.3-rc2. And the regression is related with
>>>> the commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3. This commit changed the default timeout
>>>> value from 1s to 5s. If change the timeout back to 1s as following:
>>>>
>>>> diff --git a/fs/cifs/fs_context.c b/fs/cifs/fs_context.c
>>>> index 6d13f8207e96..6b930fb0c4bd 100644
>>>> --- a/fs/cifs/fs_context.c
>>>> +++ b/fs/cifs/fs_context.c
>>>> @@ -1537,7 +1537,7 @@ int smb3_init_fs_context(struct fs_context *fc)
>>>>
>>>> ctx->acregmax = CIFS_DEF_ACTIMEO;
>>>> ctx->acdirmax = CIFS_DEF_ACTIMEO;
>>>> - ctx->closetimeo = SMB3_DEF_DCLOSETIMEO;
>>>> + ctx->closetimeo = CIFS_DEF_ACTIMEO;
>>>>
>>>> /* Most clients set timeout to 0, allows server to use its default */
>>>> ctx->handle_timeout = 0; /* See MS-SMB2 spec section 2.2.14.2.12 */
>>>>
>>>> The regression is gone:
>>>> dcb45fd7f501f864 v6.3-rc2 32715af441411a5a266606be08f
>>>> ---------------- --------------------------- ---------------------------
>>>> fail:runs %reproduction fail:runs %reproduction fail:runs
>>>> | | | | |
>>>> :25 0% :3 33% 1:3 last_state.booting
>>>> :25 0% :3 33% 1:3 last_state.is_incomplete_run
>>>> %stddev %change %stddev %change %stddev
>>>> \ | \ | \
>>>> 515.95 -50.0% 257.98 -0.0% 515.92 filebench.sum_operations/s
>>>> 5.04 ± 7% +833.7% 47.09 ± 2% -2.9% 4.90 ± 2% filebench.sum_time_ms/op
>>>> 10438 -100.0% 0.33 ±141% -100.0% 0.50 ±100% filebench.time.major_page_faults
>>>> 167575 -4.1% 160660 -4.4% 160140 filebench.time.maximum_resident_set_size
>>>> 7138 ± 11% +141.4% 17235 ± 3% +147.6% 17677 filebench.time.minor_page_faults
>>>> 28.68 ± 9% +199.9% 86.00 ± 7% -2.4% 28.00 filebench.time.percent_of_cpu_this_job_got
>>>> 2453485 ± 54% -63.0% 907380 -66.2% 830273 ± 6% cpuidle..usage
>>>> 0.61 ± 38% +0.8 1.41 ± 3% +0.2 0.80 ± 4% mpstat.cpu.all.sys%
>>>> 142984 ± 13% -45.6% 77725 -47.5% 75106 vmstat.system.in
>>>> 34.23 ± 7% +27.9% 43.79 +27.8% 43.74 boot-time.boot
>>>> 17.09 ± 11% +66.0% 28.38 +65.5% 28.28 boot-time.dhcp
>>>> 2661 ± 7% +37.5% 3659 +37.2% 3651 boot-time.idle
>>>> 104737 ±185% -87.8% 12762 ± 10% -89.8% 10631 ± 4% turbostat.C1
>>>>
>>>>
>>>> 32715af441411a5a266606be08f is v6.3-rc2 with the change to restore the timeout to 1s. Thanks.
>>>>
>>>> Regards
>>>> Yin, Fengwei
>>>>
>>>>
>>>>>
>>>>>
>>>>> =========================================================================================
>>>>> compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase/ucode:
>>>>> gcc-11/performance/1HDD/cifs/ext4/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/filemicro_delete.f/filebench/0x5003302
>>>>>
>>>>> commit:
>>>>> dcb45fd7f5 ("cifs: Do not use tcon->cfid directly, use the cfid we get from open_cached_dir")
>>>>> 5efdd9122e ("smb3: allow deferred close timeout to be configurable")
>>>>>
>>>>> dcb45fd7f501f864 5efdd9122eff772eae2feae9f0f
>>>>> ---------------- ---------------------------
>>>>> %stddev %change %stddev
>>>>> \ | \
>>>>> 515.95 -50.0% 257.98 filebench.sum_operations/s
>>>>> 4.81 ± 2% +1038.4% 54.78 ± 6% filebench.sum_time_ms/op
>>>>> 29.00 ± 8% +212.1% 90.50 ± 3% filebench.time.percent_of_cpu_this_job_got
>>>>> 24629 +2.7% 25297 filebench.time.voluntary_context_switches
>>>>> 7.685e+08 +19.3% 9.169e+08 ± 4% cpuidle..time
>>>>> 2.53 ± 6% -20.6% 2.00 ± 3% iostat.cpu.iowait
>>>>> 1506141 ± 8% +22.4% 1843256 ± 3% turbostat.IRQ
>>>>> 2.00 -50.0% 1.00 vmstat.procs.b
>>>>> 21969 ± 2% -9.5% 19885 ± 2% vmstat.system.cs
>>>>> 3.06 ± 7% -0.7 2.35 ± 4% mpstat.cpu.all.iowait%
>>>>> 0.79 ± 5% +0.5 1.27 ± 2% mpstat.cpu.all.sys%
>>>>> 0.89 ± 3% -0.1 0.79 ± 3% mpstat.cpu.all.usr%
>>>>> 34.55 ± 14% -34.8% 22.51 ± 27% sched_debug.cfs_rq:/.removed.runnable_avg.avg
>>>>> 119.64 ± 3% -20.0% 95.69 ± 17% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
>>>>> 34.55 ± 14% -34.8% 22.51 ± 27% sched_debug.cfs_rq:/.removed.util_avg.avg
>>>>> 119.64 ± 3% -20.0% 95.69 ± 17% sched_debug.cfs_rq:/.removed.util_avg.stddev
>>>>> 5249 +15.8% 6076 meminfo.Active
>>>>> 3866 ± 2% +17.7% 4552 meminfo.Active(anon)
>>>>> 1382 ± 4% +10.3% 1524 ± 4% meminfo.Active(file)
>>>>> 69791 ± 14% +39.8% 97553 ± 6% meminfo.AnonHugePages
>>>>> 72709 ± 2% +12.5% 81779 ± 3% meminfo.Inactive(file)
>>>>> 23219 +13.5% 26352 ± 3% meminfo.KernelStack
>>>>> 966.50 ± 2% +17.7% 1137 proc-vmstat.nr_active_anon
>>>>> 74302 +6.3% 78977 ± 2% proc-vmstat.nr_anon_pages
>>>>> 81133 +6.0% 85973 proc-vmstat.nr_inactive_anon
>>>>> 18172 ± 2% +12.5% 20442 ± 3% proc-vmstat.nr_inactive_file
>>>>> 23213 +13.5% 26348 ± 3% proc-vmstat.nr_kernel_stack
>>>>> 17983 +2.3% 18400 proc-vmstat.nr_mapped
>>>>> 7446 ± 2% +5.5% 7853 ± 3% proc-vmstat.nr_shmem
>>>>> 26888 +1.6% 27306 proc-vmstat.nr_slab_reclaimable
>>>>> 47220 +3.4% 48803 proc-vmstat.nr_slab_unreclaimable
>>>>> 966.50 ± 2% +17.7% 1137 proc-vmstat.nr_zone_active_anon
>>>>> 81133 +6.0% 85973 proc-vmstat.nr_zone_inactive_anon
>>>>> 18172 ± 2% +12.5% 20442 ± 3% proc-vmstat.nr_zone_inactive_file
>>>>> 361460 +2.5% 370454 proc-vmstat.numa_hit
>>>>> 946.67 +18.6% 1122 proc-vmstat.pgactivate
>>>>> 361562 +2.5% 370553 proc-vmstat.pgalloc_normal
>>>>> 187906 +4.7% 196761 proc-vmstat.pgfault
>>>>> 8189 +2.5% 8395 proc-vmstat.pgreuse
>>>>> 1.097e+09 +15.5% 1.267e+09 ± 7% perf-stat.i.branch-instructions
>>>>> 39079265 ± 6% -20.9% 30915354 ± 4% perf-stat.i.branch-misses
>>>>> 5093263 ± 4% -23.7% 3884752 ± 9% perf-stat.i.cache-misses
>>>>> 29213 -18.7% 23764 ± 5% perf-stat.i.context-switches
>>>>> 7.666e+09 ± 4% +5.7% 8.106e+09 ± 2% perf-stat.i.cpu-cycles
>>>>> 1877 ± 15% +75.1% 3287 ± 12% perf-stat.i.cycles-between-cache-misses
>>>>> 1735450 ± 3% -12.9% 1512060 ± 3% perf-stat.i.iTLB-load-misses
>>>>> 2898 ± 3% +34.4% 3895 ± 7% perf-stat.i.instructions-per-iTLB-miss
>>>>> 1493 -20.3% 1190 ± 7% perf-stat.i.major-faults
>>>>> 0.09 ± 3% +5.8% 0.09 ± 2% perf-stat.i.metric.GHz
>>>>> 48.47 ± 11% +8.4 56.83 ± 7% perf-stat.i.node-store-miss-rate%
>>>>> 283426 ± 4% -21.6% 222190 ± 10% perf-stat.i.node-stores
>>>>> 3.57 ± 7% -1.1 2.44 ± 6% perf-stat.overall.branch-miss-rate%
>>>>> 1508 ± 3% +39.8% 2108 ± 9% perf-stat.overall.cycles-between-cache-misses
>>>>> 3022 ± 3% +23.6% 3736 ± 5% perf-stat.overall.instructions-per-iTLB-miss
>>>>> 9.585e+08 +18.8% 1.138e+09 ± 6% perf-stat.ps.branch-instructions
>>>>> 34151514 ± 6% -18.8% 27725316 ± 4% perf-stat.ps.branch-misses
>>>>> 4450329 ± 5% -21.7% 3486409 ± 9% perf-stat.ps.cache-misses
>>>>> 25524 -16.4% 21333 ± 4% perf-stat.ps.context-switches
>>>>> 77139 +2.5% 79105 perf-stat.ps.cpu-clock
>>>>> 6.704e+09 ± 4% +8.7% 7.287e+09 perf-stat.ps.cpu-cycles
>>>>> 1.06e+09 +11.3% 1.18e+09 ± 5% perf-stat.ps.dTLB-loads
>>>>> 1517349 ± 3% -10.5% 1357716 ± 2% perf-stat.ps.iTLB-load-misses
>>>>> 4.582e+09 +10.8% 5.075e+09 ± 6% perf-stat.ps.instructions
>>>>> 1296 -18.1% 1061 ± 6% perf-stat.ps.major-faults
>>>>> 247613 ± 4% -19.5% 199283 ± 9% perf-stat.ps.node-stores
>>>>> 77139 +2.5% 79105 perf-stat.ps.task-clock
>>>>> 3.697e+10 +35.3% 5.003e+10 perf-stat.total.instructions
>>>>> 8.51 ± 91% -6.9 1.59 ±223% perf-profile.calltrace.cycles-pp.getdents64
>>>>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>>> 6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
>>>>> 6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
>>>>> 6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.open64
>>>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getdents64
>>>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
>>>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
>>>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.iterate_dir.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
>>>>> 6.26 ±115% -5.4 0.88 ±223% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>>> 6.26 ±115% -5.4 0.88 ±223% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare
>>>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop
>>>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
>>>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
>>>>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record
>>>>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist
>>>>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push
>>>>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record
>>>>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.mmput.exit_mm.do_exit
>>>>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.mmput.exit_mm
>>>>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.mmput
>>>>> 3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
>>>>> 8.51 ± 91% -6.9 1.59 ±223% perf-profile.children.cycles-pp.getdents64
>>>>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
>>>>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.exit_to_user_mode_loop
>>>>> 8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
>>>>> 6.25 ±107% -6.2 0.00 perf-profile.children.cycles-pp.open64
>>>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.children.cycles-pp.__x64_sys_getdents64
>>>>> 7.63 ± 84% -6.0 1.59 ±223% perf-profile.children.cycles-pp.iterate_dir
>>>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.children.cycles-pp.arch_do_signal_or_restart
>>>>> 6.26 ±115% -4.6 1.67 ±141% perf-profile.children.cycles-pp.get_signal
>>>>> 4.57 ± 73% -3.8 0.76 ±223% perf-profile.children.cycles-pp.perf_mmap__read_head
>>>>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.unmap_vmas
>>>>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.unmap_page_range
>>>>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.zap_pmd_range
>>>>> 3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.zap_pte_range
>>>>> 8.54 ± 43% +8.6 17.19 ± 38% perf-profile.children.cycles-pp.asm_exc_page_fault
>>>>>
>>>>>
>>>>> If you fix the issue, kindly add following tag
>>>>> Reported-by: kernel test robot <[email protected]>
>>>>>
>>>>>
>>>>> To reproduce:
>>>>>
>>>>> git clone https://github.com/intel/lkp-tests.git
>>>>> cd lkp-tests
>>>>> sudo bin/lkp install job.yaml # job file is attached in this email
>>>>> bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>>>>> sudo bin/lkp run generated-yaml-file
>>>>>
>>>>> # if come across any failure that blocks the test,
>>>>> # please remove ~/.lkp and /lkp dir to run from a clean state.
>>>>>
>>>>>
>>>>> Disclaimer:
>>>>> Results have been estimated based on internal Intel analysis and are provided
>>>>> for informational purposes only. Any difference in system hardware or software
>>>>> design or configuration may affect actual performance.
>>>>>
>>>>>
>>>>> #regzbot introduced: 5efdd9122e
>>>>>
>>>>>
>>>
>>>
>>>
>
>
>