2021-03-09 07:46:05

by kernel test robot

[permalink] [raw]
Subject: [mm/filemap] cbd59c48ae: fxmark.hdd_ext4_no_jnl_DRBM_9_bufferedio.works/sec -7.6% regression


Greeting,

FYI, we noticed a -7.6% regression of fxmark.hdd_ext4_no_jnl_DRBM_9_bufferedio.works/sec due to commit:


commit: cbd59c48ae2bcadc4a7599c29cf32fd3f9b78251 ("mm/filemap: use head pages in generic_file_buffered_read")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: fxmark
on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory
with following parameters:

disk: 1HDD
media: hdd
test: DRBM
fstype: ext4_no_jnl
directio: bufferedio
cpufreq_governor: performance
ucode: 0x11




If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml
bin/lkp run compatible-job.yaml

=========================================================================================
compiler/cpufreq_governor/directio/disk/fstype/kconfig/media/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/bufferedio/1HDD/ext4_no_jnl/x86_64-rhel-8.3/hdd/debian-10.4-x86_64-20200603.cgz/lkp-knm01/DRBM/fxmark/0x11

commit:
ff993ba130 ("mm/filemap: convert filemap_get_pages to take a pagevec")
cbd59c48ae ("mm/filemap: use head pages in generic_file_buffered_read")

ff993ba130009b1b cbd59c48ae2bcadc4a7599c29cf
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:6 17% 1:6 dmesg.WARNING:missing_R10_value_at__fsnotify_parent/0x
%stddev %change %stddev
\ | \
0.05 ? 5% -10.1% 0.05 ? 3% fxmark.hdd_ext4_no_jnl_DRBM_18_bufferedio.softirq_util
4168491 -7.6% 3849925 fxmark.hdd_ext4_no_jnl_DRBM_9_bufferedio.works/sec
300.00 +2.1% 306.16 fxmark.time.system_time
87.53 -6.7% 81.69 fxmark.time.user_time
784.83 ? 5% +23.6% 970.33 ? 7% perf-sched.wait_and_delay.count.preempt_schedule_common.__cond_resched.copy_page_to_iter.generic_file_buffered_read.new_sync_read
1174 ? 5% +10.1% 1293 ? 4% slabinfo.pool_workqueue.active_objs
19.00 -5.3% 18.00 vmstat.cpu.us
4963 ? 35% +50.1% 7451 ? 18% interrupts.CPU16.CAL:Function_call_interrupts
10193 ? 17% -30.7% 7068 ? 35% interrupts.CPU4.CAL:Function_call_interrupts
43.75 +2.1% 44.67 iostat.cpu.system
19.44 -4.7% 18.52 iostat.cpu.user
38399 ? 9% -12.8% 33473 ? 7% softirqs.CPU14.RCU
44177 ? 5% -12.1% 38823 ? 8% softirqs.CPU3.RCU
15421 ? 9% -13.9% 13275 ? 5% softirqs.CPU50.RCU
7.59 -7.6 0.00 perf-profile.calltrace.cycles-pp.find_get_pages_contig.filemap_get_pages.generic_file_buffered_read.new_sync_read.vfs_read
14.72 -1.2 13.53 perf-profile.calltrace.cycles-pp.copyout.copy_page_to_iter.generic_file_buffered_read.new_sync_read.vfs_read
13.97 -1.1 12.85 perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout.copy_page_to_iter.generic_file_buffered_read.new_sync_read
6.55 -0.9 5.64 perf-profile.calltrace.cycles-pp.touch_atime.generic_file_buffered_read.new_sync_read.vfs_read.ksys_pread64
23.46 -0.9 22.59 perf-profile.calltrace.cycles-pp.copy_page_to_iter.generic_file_buffered_read.new_sync_read.vfs_read.ksys_pread64
8.20 -0.8 7.37 perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_pread64.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.55 -0.7 6.88 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
6.55 -0.5 6.02 perf-profile.calltrace.cycles-pp.__entry_text_start
5.68 -0.5 5.17 perf-profile.calltrace.cycles-pp.atime_needs_update.touch_atime.generic_file_buffered_read.new_sync_read.vfs_read
3.68 -0.5 3.19 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe
4.25 -0.4 3.88 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack
1.56 -0.3 1.25 perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe
3.37 -0.3 3.08 perf-profile.calltrace.cycles-pp.common_file_perm.security_file_permission.vfs_read.ksys_pread64.do_syscall_64
2.91 -0.3 2.65 ? 2% perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_read.ksys_pread64.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.67 ? 2% -0.2 0.43 ? 44% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe
2.04 -0.2 1.83 ? 3% perf-profile.calltrace.cycles-pp.ext4_file_read_iter.new_sync_read.vfs_read.ksys_pread64.do_syscall_64
2.34 -0.2 2.15 perf-profile.calltrace.cycles-pp.make_kuid.atime_needs_update.touch_atime.generic_file_buffered_read.new_sync_read
2.23 -0.2 2.05 perf-profile.calltrace.cycles-pp.map_id_range_down.make_kuid.atime_needs_update.touch_atime.generic_file_buffered_read
1.20 -0.1 1.08 perf-profile.calltrace.cycles-pp.current_time.atime_needs_update.touch_atime.generic_file_buffered_read.new_sync_read
1.37 -0.1 1.25 perf-profile.calltrace.cycles-pp.___might_sleep.copy_page_to_iter.generic_file_buffered_read.new_sync_read.vfs_read
1.45 -0.1 1.34 perf-profile.calltrace.cycles-pp.___might_sleep.__might_fault.copy_page_to_iter.generic_file_buffered_read.new_sync_read
1.35 -0.1 1.24 perf-profile.calltrace.cycles-pp.___might_sleep.generic_file_buffered_read.new_sync_read.vfs_read.ksys_pread64
0.64 ? 2% -0.1 0.58 perf-profile.calltrace.cycles-pp.__x64_sys_pread64.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.80 -0.1 0.74 ? 2% perf-profile.calltrace.cycles-pp.make_kgid.atime_needs_update.touch_atime.generic_file_buffered_read.new_sync_read
0.85 -0.1 0.80 ? 2% perf-profile.calltrace.cycles-pp.aa_file_perm.common_file_perm.security_file_permission.vfs_read.ksys_pread64
0.77 -0.1 0.71 ? 2% perf-profile.calltrace.cycles-pp.map_id_range_down.make_kgid.atime_needs_update.touch_atime.generic_file_buffered_read
0.61 ? 3% -0.0 0.56 ? 2% perf-profile.calltrace.cycles-pp.generic_file_read_iter.new_sync_read.vfs_read.ksys_pread64.do_syscall_64
1.36 +1.4 2.78 perf-profile.calltrace.cycles-pp.mark_page_accessed.generic_file_buffered_read.new_sync_read.vfs_read.ksys_pread64
79.56 +1.7 81.22 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
73.65 +2.3 75.91 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
71.04 +2.5 73.50 perf-profile.calltrace.cycles-pp.ksys_pread64.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +2.7 2.67 ? 8% perf-profile.calltrace.cycles-pp.xas_load.filemap_get_read_batch.filemap_get_pages.generic_file_buffered_read.new_sync_read
67.69 +2.9 70.56 perf-profile.calltrace.cycles-pp.vfs_read.ksys_pread64.do_syscall_64.entry_SYSCALL_64_after_hwframe
10.00 +4.1 14.07 perf-profile.calltrace.cycles-pp.filemap_get_pages.generic_file_buffered_read.new_sync_read.vfs_read.ksys_pread64
53.17 +4.5 57.67 perf-profile.calltrace.cycles-pp.new_sync_read.vfs_read.ksys_pread64.do_syscall_64.entry_SYSCALL_64_after_hwframe
47.84 +5.0 52.86 perf-profile.calltrace.cycles-pp.generic_file_buffered_read.new_sync_read.vfs_read.ksys_pread64.do_syscall_64
0.00 +11.9 11.90 perf-profile.calltrace.cycles-pp.filemap_get_read_batch.filemap_get_pages.generic_file_buffered_read.new_sync_read.vfs_read
7.69 -7.7 0.00 perf-profile.children.cycles-pp.find_get_pages_contig
14.89 -1.2 13.69 perf-profile.children.cycles-pp.copyout
14.13 -1.1 12.98 perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
6.62 -0.9 5.70 perf-profile.children.cycles-pp.touch_atime
8.28 -0.8 7.43 perf-profile.children.cycles-pp.security_file_permission
23.59 -0.8 22.76 perf-profile.children.cycles-pp.copy_page_to_iter
8.87 -0.8 8.10 perf-profile.children.cycles-pp.syscall_return_via_sysret
7.34 -0.6 6.73 perf-profile.children.cycles-pp.__entry_text_start
5.76 -0.5 5.24 perf-profile.children.cycles-pp.atime_needs_update
3.75 -0.4 3.33 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
4.28 -0.3 3.94 perf-profile.children.cycles-pp.___might_sleep
1.58 -0.3 1.27 perf-profile.children.cycles-pp.exit_to_user_mode_prepare
3.51 -0.3 3.21 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
3.41 -0.3 3.11 perf-profile.children.cycles-pp.common_file_perm
0.80 ? 2% -0.3 0.52 ? 3% perf-profile.children.cycles-pp.rw_verify_area
2.97 -0.3 2.71 ? 2% perf-profile.children.cycles-pp.__fsnotify_parent
3.03 -0.2 2.79 perf-profile.children.cycles-pp.map_id_range_down
2.04 -0.2 1.83 ? 3% perf-profile.children.cycles-pp.ext4_file_read_iter
2.37 -0.2 2.18 perf-profile.children.cycles-pp.make_kuid
0.67 ? 2% -0.2 0.52 ? 2% perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
1.27 -0.1 1.14 perf-profile.children.cycles-pp.current_time
0.58 ? 3% -0.1 0.51 perf-profile.children.cycles-pp.timestamp_truncate
0.87 -0.1 0.81 ? 2% perf-profile.children.cycles-pp.make_kgid
0.89 -0.1 0.83 ? 2% perf-profile.children.cycles-pp.aa_file_perm
0.64 ? 2% -0.1 0.59 perf-profile.children.cycles-pp.__x64_sys_pread64
0.73 -0.1 0.68 ? 2% perf-profile.children.cycles-pp.__cond_resched
0.60 ? 4% -0.1 0.55 ? 2% perf-profile.children.cycles-pp.apparmor_file_permission
0.61 ? 3% -0.0 0.56 ? 2% perf-profile.children.cycles-pp.generic_file_read_iter
0.47 ? 3% -0.0 0.42 ? 6% perf-profile.children.cycles-pp.__x86_retpoline_rax
0.51 ? 3% -0.0 0.47 ? 3% perf-profile.children.cycles-pp.rcu_nocb_flush_deferred_wakeup
0.30 ? 2% -0.0 0.27 perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
1.39 ? 2% +1.4 2.78 perf-profile.children.cycles-pp.mark_page_accessed
79.95 +1.6 81.58 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
74.07 +2.2 76.29 perf-profile.children.cycles-pp.do_syscall_64
71.08 +2.5 73.54 perf-profile.children.cycles-pp.ksys_pread64
67.85 +2.9 70.71 perf-profile.children.cycles-pp.vfs_read
10.04 +4.1 14.11 perf-profile.children.cycles-pp.filemap_get_pages
53.27 +4.5 57.75 perf-profile.children.cycles-pp.new_sync_read
48.04 +5.1 53.11 perf-profile.children.cycles-pp.generic_file_buffered_read
0.00 +12.0 11.98 perf-profile.children.cycles-pp.filemap_get_read_batch
13.65 -1.1 12.50 perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
8.87 -0.8 8.10 perf-profile.self.cycles-pp.syscall_return_via_sysret
7.34 -0.6 6.73 perf-profile.self.cycles-pp.__entry_text_start
4.04 -0.5 3.56 ? 2% perf-profile.self.cycles-pp.security_file_permission
1.68 -0.4 1.27 perf-profile.self.cycles-pp.ksys_pread64
0.81 ? 2% -0.4 0.45 ? 3% perf-profile.self.cycles-pp.touch_atime
4.11 -0.3 3.77 perf-profile.self.cycles-pp.___might_sleep
2.24 -0.3 1.93 ? 2% perf-profile.self.cycles-pp.new_sync_read
0.74 ? 2% -0.3 0.47 ? 4% perf-profile.self.cycles-pp.rw_verify_area
1.01 -0.3 0.75 perf-profile.self.cycles-pp.exit_to_user_mode_prepare
2.82 -0.3 2.57 ? 2% perf-profile.self.cycles-pp.__fsnotify_parent
2.53 -0.2 2.29 perf-profile.self.cycles-pp.vfs_read
2.45 -0.2 2.21 perf-profile.self.cycles-pp.common_file_perm
2.90 -0.2 2.67 perf-profile.self.cycles-pp.map_id_range_down
2.27 -0.2 2.03 perf-profile.self.cycles-pp.filemap_get_pages
2.73 -0.2 2.50 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.64 -0.2 0.43 ? 3% perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
1.88 -0.2 1.69 ? 2% perf-profile.self.cycles-pp.ext4_file_read_iter
2.17 -0.2 1.99 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
1.24 ? 2% -0.1 1.11 ? 4% perf-profile.self.cycles-pp.atime_needs_update
0.53 ? 4% -0.1 0.46 ? 2% perf-profile.self.cycles-pp.timestamp_truncate
0.59 ? 2% -0.1 0.53 perf-profile.self.cycles-pp.__x64_sys_pread64
0.81 ? 2% -0.1 0.75 ? 2% perf-profile.self.cycles-pp.do_syscall_64
0.82 -0.1 0.77 ? 2% perf-profile.self.cycles-pp.aa_file_perm
0.74 -0.1 0.69 perf-profile.self.cycles-pp.copyout
0.59 ? 3% -0.0 0.54 ? 2% perf-profile.self.cycles-pp.generic_file_read_iter
0.45 ? 2% -0.0 0.41 ? 2% perf-profile.self.cycles-pp.current_time
0.51 ? 3% -0.0 0.47 ? 3% perf-profile.self.cycles-pp.rcu_nocb_flush_deferred_wakeup
0.47 ? 4% -0.0 0.44 ? 2% perf-profile.self.cycles-pp.apparmor_file_permission
0.42 -0.0 0.38 perf-profile.self.cycles-pp.__cond_resched
0.26 ? 2% -0.0 0.23 perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
0.35 ? 4% +0.1 0.43 ? 4% perf-profile.self.cycles-pp.__might_fault
1.51 ? 2% +0.1 1.61 ? 2% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
4.08 +0.5 4.58 perf-profile.self.cycles-pp.copy_page_to_iter
1.35 ? 2% +1.3 2.62 perf-profile.self.cycles-pp.mark_page_accessed
4.69 ? 3% +1.5 6.15 ? 2% perf-profile.self.cycles-pp.generic_file_buffered_read
0.00 +8.9 8.88 perf-profile.self.cycles-pp.filemap_get_read_batch



fxmark.hdd_ext4_no_jnl_DRBM_9_bufferedio.works_sec

4.4e+06 +-----------------------------------------------------------------+
|.++ + .+.+ +.++.+.++ |
4.3e+06 |-+ : : + : : : |
| : : : .+ : : |
4.2e+06 |-+ + +.+.++. +. .++.+ + +.++. .++. |
| + + +.++ + |
4.1e+06 |-+ |
| |
4e+06 |-+ |
| |
3.9e+06 |-O O O O O O O |
| O O O O O O O O O O OO O O O O O |
3.8e+06 |-+O OO OO O O O O OO O |
| O O O |
3.7e+06 +-----------------------------------------------------------------+


fxmark.hdd_ext4_no_jnl_DRBM_2_bufferedio.works

2.8e+07 +----------------------------------------------------------------+
| + +. |
2.75e+07 |++:+ + ++.++.+.+ |
2.7e+07 |-++ +.++. + : |
| +.++ +. .+. +.+ + + |
2.65e+07 |-+ + .+ ++ + + + :.+ .+.++ |
2.6e+07 |-+ + + + + |
| |
2.55e+07 |-+ |
2.5e+07 |-+ |
| |
2.45e+07 |-+ O O O O O O O O O OO |
2.4e+07 |-+O O O O O O O OO O OO O O O O O OO OO O O |
| O O O O |
2.35e+07 +----------------------------------------------------------------+


fxmark.hdd_ext4_no_jnl_DRBM_2_bufferedio.works_sec

940000 +------------------------------------------------------------------+
| |
920000 |-+ +.+ + .+ |
900000 |++:+ +. : +.+ +.+ |
| + +.++. : + |
880000 |-+ +.+ .+.++.+.++.+.+ + .+. + ++. |
| + + + + + + |
860000 |-+ + |
| |
840000 |-+ |
820000 |-+ |
| OO O O O O O O O O O O OO |
800000 |-OO O O O OO O OO O O O O OO O OO O O O |
| O O O |
780000 +------------------------------------------------------------------+


fxmark.hdd_ext4_no_jnl_DRBM_1_bufferedio.works

1.28e+07 +----------------------------------------------------------------+
1.26e+07 |.+ .+ + ++ + |
| +.+ : :: : : :: |
1.24e+07 |-+ :.++.+.+ .+. : :: : : : |
1.22e+07 |-+ + + + .++ +.+ + + :.+. .+ |
| + + : + ++.++.+ : |
1.2e+07 |-+ +.+ + |
1.18e+07 |-+ |
1.16e+07 |-+ |
| O |
1.14e+07 |-+ O O |
1.12e+07 |-OO O O O O O O OO O OO |
| O OO O O OO O O O O O O O O |
1.1e+07 |-+ O OO O O O O |
1.08e+07 +----------------------------------------------------------------+


fxmark.hdd_ext4_no_jnl_DRBM_1_bufferedio.works_sec

430000 +------------------------------------------------------------------+
| |
420000 |.++.+.+ + +.+ + |
| :.+.+ + :+ :+ : |
410000 |-+ + +.+. +.+. .+ + + : + |
| + ++.+. .+ + + .++. .++ |
400000 |-+ ++ + + + |
| + |
390000 |-+ |
| O |
380000 |-+ O |
| O O O O O O OO O OO |
370000 |-+O O OO O O O O O O O O O O O O O |
| O O O OO O O O |
360000 +------------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (24.38 kB)
config-5.11.0-10234-gcbd59c48ae2b (175.12 kB)
job-script (8.16 kB)
job.yaml (5.73 kB)
reproduce (263.00 B)
Download all attachments

2021-03-09 13:10:28

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [mm/filemap] cbd59c48ae: fxmark.hdd_ext4_no_jnl_DRBM_9_bufferedio.works/sec -7.6% regression

On Tue, Mar 09, 2021 at 03:57:06PM +0800, kernel test robot wrote:
> FYI, we noticed a -7.6% regression of fxmark.hdd_ext4_no_jnl_DRBM_9_bufferedio.works/sec due to commit:
>
> commit: cbd59c48ae2bcadc4a7599c29cf32fd3f9b78251 ("mm/filemap: use head pages in generic_file_buffered_read")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: fxmark
> on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory

Can you send me one of those to test on? ;-)

> %stddev %change %stddev
> \ | \
> 0.05 ? 5% -10.1% 0.05 ? 3% fxmark.hdd_ext4_no_jnl_DRBM_18_bufferedio.softirq_util
> 4168491 -7.6% 3849925 fxmark.hdd_ext4_no_jnl_DRBM_9_bufferedio.works/sec
> 300.00 +2.1% 306.16 fxmark.time.system_time
> 87.53 -6.7% 81.69 fxmark.time.user_time
> 784.83 ? 5% +23.6% 970.33 ? 7% perf-sched.wait_and_delay.count.preempt_schedule_common.__cond_resched.copy_page_to_iter.generic_file_buffered_read.new_sync_read

23% more delay while preempted copying to user? That seems bad, but I
don't see anything in this commit that would cause that.

> 7.59 -7.6 0.00 perf-profile.calltrace.cycles-pp.find_get_pages_contig.filemap_get_pages.generic_file_buffered_read.new_sync_read.vfs_read

That makes sense; we don't call find_get_pages_contig() any more, instead
we call ...

> 0.00 +11.9 11.90 perf-profile.calltrace.cycles-pp.filemap_get_read_batch.filemap_get_pages.generic_file_buffered_read.new_sync_read.vfs_read

filemap_get_read_batch() ... which is more expensive ;-(

if (PageReadahead(head))
break;
+ if (!PageHead(head))
+ continue;
xas.xa_index = head->index + thp_nr_pages(head) - 1;
xas.xa_offset = (xas.xa_index >> xas.xa_shift) & XA_CHUNK_MASK;

might be worth a try, but I have a medical appointment to get to.
I'll test it out later.