Greetings,
FYI, we noticed a -15.0% regression of stress-ng.copy-file.ops_per_sec due to commit:
commit: 0568e6122574dcc1aded2979cd0245038efe22b6 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
in testcase: stress-ng
on test machine: 96 threads 2 sockets Ice Lake with 256G memory
with following parameters:
nr_threads: 10%
disk: 1HDD
testtime: 60s
fs: f2fs
class: filesystem
test: copy-file
cpufreq_governor: performance
ucode: 0xb000280
If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <[email protected]>
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if you come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.
=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
filesystem/gcc-11/performance/1HDD/f2fs/x86_64-rhel-8.3/10%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp1/copy-file/stress-ng/60s/0xb000280
commit:
4cbfca5f77 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit")
0568e61225 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors")
4cbfca5f7750520f 0568e6122574dcc1aded2979cd0
---------------- ---------------------------
%stddev %change %stddev
\ | \
1627 -14.9% 1385 stress-ng.copy-file.ops
27.01 -15.0% 22.96 stress-ng.copy-file.ops_per_sec
8935079 -11.9% 7870629 stress-ng.time.file_system_outputs
14.88 ± 5% -31.8% 10.14 ± 3% stress-ng.time.percent_of_cpu_this_job_got
50912 -14.7% 43413 vmstat.io.bo
93.78 +1.4% 95.10 iostat.cpu.idle
3.89 -31.6% 2.66 iostat.cpu.iowait
4.01 -1.3 2.74 mpstat.cpu.all.iowait%
0.23 ± 9% -0.1 0.17 ± 11% mpstat.cpu.all.sys%
1.66 ± 37% -1.2 0.51 ± 55% perf-profile.calltrace.cycles-pp.f2fs_write_end.generic_perform_write.f2fs_buffered_write_iter.f2fs_file_write_iter.do_iter_readv_writev
1.66 ± 37% -1.1 0.59 ± 25% perf-profile.children.cycles-pp.f2fs_write_end
1.51 ± 40% -1.1 0.45 ± 26% perf-profile.children.cycles-pp.f2fs_dirty_data_folio
1.21 ± 49% -1.0 0.23 ± 33% perf-profile.children.cycles-pp.f2fs_update_dirty_folio
0.88 ± 56% -0.8 0.04 ±111% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
0.14 ± 26% +0.1 0.25 ± 28% perf-profile.children.cycles-pp.page_cache_ra_unbounded
0.88 ± 56% -0.8 0.04 ±112% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
3164876 ± 9% -20.2% 2524713 ± 7% perf-stat.i.cache-misses
4.087e+08 -4.6% 3.899e+08 perf-stat.i.dTLB-loads
313050 ± 10% -18.4% 255410 ± 6% perf-stat.i.node-loads
972573 ± 9% -16.4% 812873 ± 6% perf-stat.i.node-stores
3114748 ± 9% -20.2% 2484807 ± 7% perf-stat.ps.cache-misses
4.022e+08 -4.6% 3.837e+08 perf-stat.ps.dTLB-loads
308178 ± 10% -18.4% 251418 ± 6% perf-stat.ps.node-loads
956996 ± 9% -16.4% 799948 ± 6% perf-stat.ps.node-stores
358486 -8.3% 328694 proc-vmstat.nr_active_file
1121620 -11.9% 987816 proc-vmstat.nr_dirtied
179906 -6.7% 167912 proc-vmstat.nr_dirty
1151201 -1.7% 1131322 proc-vmstat.nr_file_pages
100181 +9.9% 110078 ? 2% proc-vmstat.nr_inactive_file
846362 -14.6% 722471 proc-vmstat.nr_written
358486 -8.3% 328694 proc-vmstat.nr_zone_active_file
100181 +9.9% 110078 ? 2% proc-vmstat.nr_zone_inactive_file
180668 -6.8% 168456 proc-vmstat.nr_zone_write_pending
556469 -3.5% 536985 proc-vmstat.pgactivate
3385454 -14.6% 2889953 proc-vmstat.pgpgout
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://01.org/lkp
On 2022/08/05 1:05, kernel test robot wrote:
>
>
> Greeting,
>
> FYI, we noticed a -15.0% regression of stress-ng.copy-file.ops_per_sec due to commit:
>
>
> commit: 0568e6122574dcc1aded2979cd0245038efe22b6 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> in testcase: stress-ng
> on test machine: 96 threads 2 sockets Ice Lake with 256G memory
> with following parameters:
>
> nr_threads: 10%
> disk: 1HDD
> testtime: 60s
> fs: f2fs
> class: filesystem
> test: copy-file
> cpufreq_governor: performance
> ucode: 0xb000280
Without knowing what the device adapter is, hard to say where the problem is. I
suspect that with the patch applied, we may be ending up with a small default
max_sectors value, causing overhead due to more commands than necessary.
Will check what I see with my test rig.
--
Damien Le Moal
Western Digital Research
On 08/08/2022 15:52, Damien Le Moal wrote:
> On 2022/08/05 1:05, kernel test robot wrote:
>>
>>
>> Greeting,
>>
>> FYI, we noticed a -15.0% regression of stress-ng.copy-file.ops_per_sec due to commit:
>>
>>
>> commit: 0568e6122574dcc1aded2979cd0245038efe22b6 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors")
>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>
>> in testcase: stress-ng
>> on test machine: 96 threads 2 sockets Ice Lake with 256G memory
>> with following parameters:
>>
>> nr_threads: 10%
>> disk: 1HDD
>> testtime: 60s
>> fs: f2fs
>> class: filesystem
>> test: copy-file
>> cpufreq_governor: performance
>> ucode: 0xb000280
>
> Without knowing what the device adapter is, hard to say where the problem is. I
> suspect that with the patch applied, we may be ending up with a small default
> max_sectors value, causing overhead due to more commands than necessary.
>
> Will check what I see with my test rig.
As far as I can see, this patch should not make a difference unless the
ATA shost driver is setting the max_sectors value unnecessarily low.
On 09/08/2022 10:58, John Garry wrote:
>>>
>>> commit: 0568e6122574dcc1aded2979cd0245038efe22b6 ("ata: libata-scsi:
>>> cap ata_device->max_sectors according to shost->max_sectors")
>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>
>>> in testcase: stress-ng
>>> on test machine: 96 threads 2 sockets Ice Lake with 256G memory
>>> with following parameters:
>>>
>>> nr_threads: 10%
>>> disk: 1HDD
>>> testtime: 60s
>>> fs: f2fs
>>> class: filesystem
>>> test: copy-file
>>> cpufreq_governor: performance
>>> ucode: 0xb000280
>>
>> Without knowing what the device adapter is, hard to say where the
>> problem is. I
>> suspect that with the patch applied, we may be ending up with a small
>> default
>> max_sectors value, causing overhead due to more commands than necessary.
>>
>> Will check what I see with my test rig.
>
> As far as I can see, this patch should not make a difference unless the
> ATA shost driver is setting the max_sectors value unnecessarily low.
For __ATA_BASE_SHT, we don't set max_sectors. As such, we default
shost->max_sectors = SCSI_DEFAULT_MAX_SECTORS (=1024) in
scsi_host_alloc(). I assume no shost dma mapping limit applied.
Then - for example - we could select dev->max_sectors =
ATA_MAX_SECTORS_LBA48 (=65535) in ata_dev_configure().
So with commit 0568e6122574 we would have final max sectors = 1024, as
opposed to 65535 previously. I guess that the problem is something like
this.
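
Roughly, the capping added by 0568e6122574 in ata_scsi_dev_config() amounts
to something like this (simplified sketch, not the verbatim patch):

	/*
	 * dev->max_sectors may be ATA_MAX_SECTORS_LBA48 (= 65535) at this
	 * point, while shost->max_sectors is SCSI_DEFAULT_MAX_SECTORS
	 * (= 1024) when the SHT sets no max_sectors of its own, so the
	 * device would end up limited to 1024 sectors (512 KB) per command.
	 */
	dev->max_sectors = min(dev->max_sectors, sdev->host->max_sectors);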
If so, it seems that we would need to apply the shost dma mapping limit
separately in ata_scsi_dev_config() and not use shost->max_sectors.
thanks,
John
On 2022/08/09 2:58, John Garry wrote:
> On 08/08/2022 15:52, Damien Le Moal wrote:
>> On 2022/08/05 1:05, kernel test robot wrote:
>>>
>>>
>>> Greeting,
>>>
>>> FYI, we noticed a -15.0% regression of stress-ng.copy-file.ops_per_sec due to commit:
>>>
>>>
>>> commit: 0568e6122574dcc1aded2979cd0245038efe22b6 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors")
>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>
>>> in testcase: stress-ng
>>> on test machine: 96 threads 2 sockets Ice Lake with 256G memory
>>> with following parameters:
>>>
>>> nr_threads: 10%
>>> disk: 1HDD
>>> testtime: 60s
>>> fs: f2fs
>>> class: filesystem
>>> test: copy-file
>>> cpufreq_governor: performance
>>> ucode: 0xb000280
>>
>> Without knowing what the device adapter is, hard to say where the problem is. I
>> suspect that with the patch applied, we may be ending up with a small default
>> max_sectors value, causing overhead due to more commands than necessary.
>>
>> Will check what I see with my test rig.
>
> As far as I can see, this patch should not make a difference unless the
> ATA shost driver is setting the max_sectors value unnecessarily low.
That is my hunch too, hence my question about which host driver is being used
for this test... That is not apparent from the problem report.
--
Damien Le Moal
Western Digital Research
On 2022/08/09 7:16, John Garry wrote:
> On 09/08/2022 10:58, John Garry wrote:
>>>>
>>>> commit: 0568e6122574dcc1aded2979cd0245038efe22b6 ("ata: libata-scsi:
>>>> cap ata_device->max_sectors according to shost->max_sectors")
>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>>
>>>> in testcase: stress-ng
>>>> on test machine: 96 threads 2 sockets Ice Lake with 256G memory
>>>> with following parameters:
>>>>
>>>> nr_threads: 10%
>>>> disk: 1HDD
>>>> testtime: 60s
>>>> fs: f2fs
>>>> class: filesystem
>>>> test: copy-file
>>>> cpufreq_governor: performance
>>>> ucode: 0xb000280
>>>
>>> Without knowing what the device adapter is, hard to say where the
>>> problem is. I
>>> suspect that with the patch applied, we may be ending up with a small
>>> default
>>> max_sectors value, causing overhead due to more commands than necessary.
>>>
>>> Will check what I see with my test rig.
>>
>> As far as I can see, this patch should not make a difference unless the
>> ATA shost driver is setting the max_sectors value unnecessarily low.
>
> For __ATA_BASE_SHT, we don't set max_sectors. As such, we default
> shost->max_sectors = SCSI_DEFAULT_MAX_SECTORS (=1024) in
> scsi_host_alloc(). I assume no shost dma mapping limit applied.
>
> Then - for example - we could select dev->max_sectors =
> ATA_MAX_SECTORS_LBA48 (=65535) in ata_dev_configure().
>
> So with commit 0568e6122574 we would have final max sectors = 1024, as
> opposed to 65535 previously. I guess that the problem is something like
> this.
>
> If so, it seems that we would need to apply the shost dma mapping limit
> separately in ata_scsi_dev_config() and not use shost->max_sectors.
OK. Will have a look at that.
>
> thanks,
> John
>
--
Damien Le Moal
Western Digital Research
...
> >> Without knowing what the device adapter is, hard to say where the problem is. I
> >> suspect that with the patch applied, we may be ending up with a small default
> >> max_sectors value, causing overhead due to more commands than necessary.
> >>
> >> Will check what I see with my test rig.
> >
> > As far as I can see, this patch should not make a difference unless the
> > ATA shost driver is setting the max_sectors value unnecessarily low.
>
> That is my hunch too, hence my question about which host driver is being used
> for this test... That is not apparent from the problem report.
No one's fallen over the old problem and managed to limit
the number of sectors in a read to the number of sectors
in (IIRC) a 'multi sector' read that uses a single DMA burst?
Was always a good way of killing disk performance.
IIRC the maximum number of sectors for an ATA disk transfer is 255.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On 09/08/2022 15:57, Damien Le Moal wrote:
>>> As far as I can see, this patch should not make a difference unless the
>>> ATA shost driver is setting the max_sectors value unnecessarily low.
>> For __ATA_BASE_SHT, we don't set max_sectors. As such, we default
>> shost->max_sectors = SCSI_DEFAULT_MAX_SECTORS (=1024) in
>> scsi_host_alloc(). I assume no shost dma mapping limit applied.
>>
>> Then - for example - we could select dev->max_sectors =
>> ATA_MAX_SECTORS_LBA48 (=65535) in ata_dev_configure().
>>
>> So with commit 0568e6122574 we would have final max sectors = 1024, as
>> opposed to 65535 previously. I guess that the problem is something like
>> this.
>>
>> If so, it seems that we would need to apply the shost dma mapping limit
>> separately in ata_scsi_dev_config() and not use shost->max_sectors.
> OK. Will have a look at that.
>
We may need to introduce something like shost->max_hw_sectors, which is
set according to sht max sectors and dma mapping limits. That could be
also used in USB scsiglue slave_configure()
Or else set max_sectors value for __ATA_BASE_SHT, but I don't know a
sane value there considering ATA_MAX_SECTORS_LBA48 gives max_sectors of
65535.
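
For the first option, something like the following - just a sketch, the
field name and where it gets set are assumptions rather than an actual
patch:

	/* hypothetical: in scsi_host_alloc(), record the raw SHT limit */
	shost->max_hw_sectors = sht->max_sectors ?
			sht->max_sectors : SCSI_DEFAULT_MAX_SECTORS;

	/* hypothetical: once the DMA device is known, fold in the mapping limit */
	if (dma_dev->dma_mask)
		shost->max_hw_sectors = min_t(unsigned int,
				shost->max_hw_sectors,
				dma_max_mapping_size(dma_dev) >> SECTOR_SHIFT);

	/* libata (and USB scsiglue) would then cap against this instead */
	dev->max_sectors = min(dev->max_sectors, sdev->host->max_hw_sectors);

That way the SCSI_DEFAULT_MAX_SECTORS value would not leak into
dev->max_sectors, while a real DMA mapping limit still would.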
Damien, please let me know if you need help now. I am just waiting for
you to test to prove this theory about dev->max_sectors being capped. I
don't have an AHCI setup readily-available for testing - just SAS cards
or QEMU.
Thanks,
John
On 2022/08/10 1:33, John Garry wrote:
> On 09/08/2022 15:57, Damien Le Moal wrote:
>>>> As far as I can see, this patch should not make a difference unless the
>>>> ATA shost driver is setting the max_sectors value unnecessarily low.
>>> For __ATA_BASE_SHT, we don't set max_sectors. As such, we default
>>> shost->max_sectors = SCSI_DEFAULT_MAX_SECTORS (=1024) in
>>> scsi_host_alloc(). I assume no shost dma mapping limit applied.
>>>
>>> Then - for example - we could select dev->max_sectors =
>>> ATA_MAX_SECTORS_LBA48 (=65535) in ata_dev_configure().
>>>
>>> So with commit 0568e6122574 we would have final max sectors = 1024, as
>>> opposed to 65535 previously. I guess that the problem is something like
>>> this.
>>>
>>> If so, it seems that we would need to apply the shost dma mapping limit
>>> separately in ata_scsi_dev_config() and not use shost->max_sectors.
>> OK. Will have a look at that.
>>
>
> We may need to introduce something like shost->max_hw_sectors, which is
> set according to sht max sectors and dma mapping limits. That could be
> also used in USB scsiglue slave_configure()
>
> Or else set max_sectors value for __ATA_BASE_SHT, but I don't know a
> sane value there considering ATA_MAX_SECTORS_LBA48 gives max_sectors of
> 65535.
>
> Damien, please let me know if you need help now. I am just waiting for
> you to test to prove this theory about dev->max_sectors being capped. I
> don't have an AHCI setup readily-available for testing - just SAS cards
> or QEMU.
I am on it.
>
> Thanks,
> John
--
Damien Le Moal
Western Digital Research
On 2022/08/09 8:16, David Laight wrote:
> ...
>>>> Without knowing what the device adapter is, hard to say where the problem is. I
>>>> suspect that with the patch applied, we may be ending up with a small default
>>>> max_sectors value, causing overhead due to more commands than necessary.
>>>>
>>>> Will check what I see with my test rig.
>>>
>>> As far as I can see, this patch should not make a difference unless the
>>> ATA shost driver is setting the max_sectors value unnecessarily low.
>>
>> That is my hunch too, hence my question about which host driver is being used
>> for this test... That is not apparent from the problem report.
>
> No one's fallen over the old problem and managed to limit
> the number of sectors in a read to the number of sectors
> in (IIRC) a 'multi sector' read that uses a single DMA burst?
>
> Was always a good way of killing disk performance.
>
> IIRC the maximum number of sectors for an ATA disk transfer is 255.
That is for super old pata/ide devices. Modern SATA/AHCI disks can do a lot more
than that with NCQ/DMA. The default max_sectors_kb for AHCI connected devices is
1280 KB = 2560 sectors, max_segments = 168 and max_segment_size = 65536 B, which
gives a reliable command size of at least 168 * 4096 = 688128 B (1344 sectors)
with 4KB page size architectures.
--
Damien Le Moal
Western Digital Research
hi, Damien Le Moal,
On Tue, Aug 09, 2022 at 07:55:53AM -0700, Damien Le Moal wrote:
> On 2022/08/09 2:58, John Garry wrote:
> > On 08/08/2022 15:52, Damien Le Moal wrote:
> >> On 2022/08/05 1:05, kernel test robot wrote:
> >>>
> >>>
> >>> Greeting,
> >>>
> >>> FYI, we noticed a -15.0% regression of stress-ng.copy-file.ops_per_sec due to commit:
> >>>
> >>>
> >>> commit: 0568e6122574dcc1aded2979cd0245038efe22b6 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors")
> >>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >>>
> >>> in testcase: stress-ng
> >>> on test machine: 96 threads 2 sockets Ice Lake with 256G memory
> >>> with following parameters:
> >>>
> >>> nr_threads: 10%
> >>> disk: 1HDD
> >>> testtime: 60s
> >>> fs: f2fs
> >>> class: filesystem
> >>> test: copy-file
> >>> cpufreq_governor: performance
> >>> ucode: 0xb000280
> >>
> >> Without knowing what the device adapter is, hard to say where the problem is. I
> >> suspect that with the patch applied, we may be ending up with a small default
> >> max_sectors value, causing overhead due to more commands than necessary.
> >>
> >> Will check what I see with my test rig.
> >
> > As far as I can see, this patch should not make a difference unless the
> > ATA shost driver is setting the max_sectors value unnecessarily low.
>
> That is my hunch too, hence my question about which host driver is being used
> for this test... That is not apparent from the problem report.
We noticed the commit is already in mainline now, and in our tests there is
still a similar regression, also on other platforms.
Could you guide us on how to check "which host driver is being used for this
test"? We hope to supply some useful information.
On 12/08/2022 06:01, Oliver Sang wrote:
> hi, Damien Le Moal,
>
> On Tue, Aug 09, 2022 at 07:55:53AM -0700, Damien Le Moal wrote:
>> On 2022/08/09 2:58, John Garry wrote:
>>> On 08/08/2022 15:52, Damien Le Moal wrote:
>>>> On 2022/08/05 1:05, kernel test robot wrote:
>>>>>
>>>>>
>>>>> Greeting,
>>>>>
>>>>> FYI, we noticed a -15.0% regression of stress-ng.copy-file.ops_per_sec due to commit:
>>>>>
>>>>>
>>>>> commit: 0568e6122574dcc1aded2979cd0245038efe22b6 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors")
>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>>>
>>>>> in testcase: stress-ng
>>>>> on test machine: 96 threads 2 sockets Ice Lake with 256G memory
>>>>> with following parameters:
>>>>>
>>>>> nr_threads: 10%
>>>>> disk: 1HDD
>>>>> testtime: 60s
>>>>> fs: f2fs
>>>>> class: filesystem
>>>>> test: copy-file
>>>>> cpufreq_governor: performance
>>>>> ucode: 0xb000280
>>>>
>>>> Without knowing what the device adapter is, hard to say where the problem is. I
>>>> suspect that with the patch applied, we may be ending up with a small default
>>>> max_sectors value, causing overhead due to more commands than necessary.
>>>>
>>>> Will check what I see with my test rig.
>>>
>>> As far as I can see, this patch should not make a difference unless the
>>> ATA shost driver is setting the max_sectors value unnecessarily low.
>>
>> That is my hunch too, hence my question about which host driver is being used
>> for this test... That is not apparent from the problem report.
>
> we noticed the commit is already in mainline now, and in our tests, there is
> still similar regression and also on other platforms.
> could you guide us how to check "which host driver is being used for this
> test"? hope to supply some useful information.
>
For me, a complete kernel log may help.
On 12/08/2022 12:13, John Garry wrote:
>> On Tue, Aug 09, 2022 at 07:55:53AM -0700, Damien Le Moal wrote:
>>> On 2022/08/09 2:58, John Garry wrote:
>>>> On 08/08/2022 15:52, Damien Le Moal wrote:
>>>>> On 2022/08/05 1:05, kernel test robot wrote:
>>>>>>
>>>>>>
>>>>>> Greeting,
>>>>>>
>>>>>> FYI, we noticed a -15.0% regression of
>>>>>> stress-ng.copy-file.ops_per_sec due to commit:
>>>>>>
>>>>>>
>>>>>> commit: 0568e6122574dcc1aded2979cd0245038efe22b6 ("ata:
>>>>>> libata-scsi: cap ata_device->max_sectors according to
>>>>>> shost->max_sectors")
>>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git
>>>>>> master
>>>>>>
>>>>>> in testcase: stress-ng
>>>>>> on test machine: 96 threads 2 sockets Ice Lake with 256G memory
>>>>>> with following parameters:
>>>>>>
>>>>>> nr_threads: 10%
>>>>>> disk: 1HDD
>>>>>> testtime: 60s
>>>>>> fs: f2fs
>>>>>> class: filesystem
>>>>>> test: copy-file
>>>>>> cpufreq_governor: performance
>>>>>> ucode: 0xb000280
>>>>>
>>>>> Without knowing what the device adapter is, hard to say where the
>>>>> problem is. I
>>>>> suspect that with the patch applied, we may be ending up with a
>>>>> small default
>>>>> max_sectors value, causing overhead due to more commands than
>>>>> necessary.
>>>>>
>>>>> Will check what I see with my test rig.
>>>>
>>>> As far as I can see, this patch should not make a difference unless the
>>>> ATA shost driver is setting the max_sectors value unnecessarily low.
>>>
>>> That is my hunch too, hence my question about which host driver is
>>> being used
>>> for this test... That is not apparent from the problem report.
>>
>> we noticed the commit is already in mainline now, and in our tests,
>> there is
>> still similar regression and also on other platforms.
>> could you guide us how to check "which host driver is being used for this
>> test"? hope to supply some useful information.
>>
>
> For me, a complete kernel log may help.
And since there is only 1 HDD, the output of the following would be helpful:
/sys/block/sda/queue/max_sectors_kb
/sys/block/sda/queue/max_hw_sectors_kb
And for 5.19, if possible.
Thanks!
On 2022/08/12 4:13, John Garry wrote:
> On 12/08/2022 06:01, Oliver Sang wrote:
>> hi, Damien Le Moal,
>>
>> On Tue, Aug 09, 2022 at 07:55:53AM -0700, Damien Le Moal wrote:
>>> On 2022/08/09 2:58, John Garry wrote:
>>>> On 08/08/2022 15:52, Damien Le Moal wrote:
>>>>> On 2022/08/05 1:05, kernel test robot wrote:
>>>>>>
>>>>>>
>>>>>> Greeting,
>>>>>>
>>>>>> FYI, we noticed a -15.0% regression of stress-ng.copy-file.ops_per_sec due to commit:
>>>>>>
>>>>>>
>>>>>> commit: 0568e6122574dcc1aded2979cd0245038efe22b6 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors")
>>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>>>>
>>>>>> in testcase: stress-ng
>>>>>> on test machine: 96 threads 2 sockets Ice Lake with 256G memory
>>>>>> with following parameters:
>>>>>>
>>>>>> nr_threads: 10%
>>>>>> disk: 1HDD
>>>>>> testtime: 60s
>>>>>> fs: f2fs
>>>>>> class: filesystem
>>>>>> test: copy-file
>>>>>> cpufreq_governor: performance
>>>>>> ucode: 0xb000280
>>>>>
>>>>> Without knowing what the device adapter is, hard to say where the problem is. I
>>>>> suspect that with the patch applied, we may be ending up with a small default
>>>>> max_sectors value, causing overhead due to more commands than necessary.
>>>>>
>>>>> Will check what I see with my test rig.
>>>>
>>>> As far as I can see, this patch should not make a difference unless the
>>>> ATA shost driver is setting the max_sectors value unnecessarily low.
>>>
>>> That is my hunch too, hence my question about which host driver is being used
>>> for this test... That is not apparent from the problem report.
>>
>> we noticed the commit is already in mainline now, and in our tests, there is
>> still similar regression and also on other platforms.
>> could you guide us how to check "which host driver is being used for this
>> test"? hope to supply some useful information.
>>
>
> For me, a complete kernel log may help.
I had a look yesterday with my test rig. I did not see any difference in the
default max_sectors_kb values for various drives between 5.18 and 5.19 (current
linus tree). The test machine has 2 AHCI adapters: Intel and Marvell. Both use
the regular AHCI driver. I have another rig with different ATA adapters but it
is powered down and I am traveling... So cannot test that right now.
--
Damien Le Moal
Western Digital Research
On 12/08/2022 16:41, Damien Le Moal wrote:
>>> we noticed the commit is already in mainline now, and in our tests, there is
>>> still similar regression and also on other platforms.
>>> could you guide us how to check "which host driver is being used for this
>>> test"? hope to supply some useful information.
>>>
>> For me, a complete kernel log may help.
> I had a look yesterday with my test rig. I did not see any difference in the
> default max_sectors_kb values for various drives between 5.18 and 5.19 (current
> linus tree). The test machine has 2 AHCI adapters: Intel and Marvell. Both use
> the regular AHCI driver. I have another rig with different ATA adapters but it
> is powered down and I am traveling... So cannot test that right now.
>
FWIW, on QEMU I get a difference for an IDE disk on the ata_piix host.
Interestingly, the ATA dev max_sectors_kb also gets capped from 32MB (LBA48)
-> 256KB due to the swiotlb max mapping size. (It would be capped by the
shost default max sectors of 512KB without that swiotlb limit.) I assume the
capping due to the swiotlb limit is not occurring on Oliver's machine.
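
For reference, that swiotlb cap comes in via the DMA mapping size limit
which the SCSI core applies when setting up the queue - roughly the
following (from memory, simplified):

	/* roughly what __scsi_init_queue() does: */
	if (dev->dma_mask)
		shost->max_sectors = min_t(unsigned int, shost->max_sectors,
				dma_max_mapping_size(dev) >> SECTOR_SHIFT);

	/*
	 * With swiotlb active, dma_max_mapping_size() is bounded by
	 * swiotlb_max_mapping_size(): IO_TLB_SEGSIZE (128) slots of 2 KB
	 * = 256 KB = 512 sectors, which matches the 256KB cap seen on the
	 * QEMU ata_piix setup above.
	 */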
thanks,
John
[ 1.497233] ata7: found unknown device (class 0)
[ 1.498341] ata7.00: ATA-7: QEMU HARDDISK, 2.5+, max UDMA/100
[ 1.499030] ata7.00: 209716 sectors, multi 16: LBA48
[ 1.623795] ata2: SATA link down (SStatus 0 SControl 300)
[ 1.624633] ata1: SATA link down (SStatus 0 SControl 300)
[ 1.633395] ata6: SATA link down (SStatus 0 SControl 300)
[ 1.634200] ata5: SATA link down (SStatus 0 SControl 300)
[ 1.635094] ata4: SATA link down (SStatus 0 SControl 300)
[ 1.635887] ata3: SATA link down (SStatus 0 SControl 300)
[ 1.636748] scsi 6:0:0:0: Direct-Access ATA QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
[ 1.641298] sd 6:0:0:0: [sda] 209716 512-byte logical blocks: (107 MB/102 MiB)
[ 1.642188] sd 6:0:0:0: [sda] Write Protect is off
[ 1.642770] sd 6:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 1.642783] sd 6:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 1.644149] sd 6:0:0:0: [sda] Preferred minimum I/O size 512 bytes
[ 1.645142] sd 6:0:0:0: Attached scsi generic sg0 type 0
[ 1.655145] sd 6:0:0:0: [sda] Attached SCSI disk
On 2022/08/12 10:17, John Garry wrote:
> On 12/08/2022 16:41, Damien Le Moal wrote:
>>>> we noticed the commit is already in mainline now, and in our tests, there is
>>>> still similar regression and also on other platforms.
>>>> could you guide us how to check "which host driver is being used for this
>>>> test"? hope to supply some useful information.
>>>>
>>> For me, a complete kernel log may help.
>> I had a look yesterday with my test rig. I did not see any difference in the
>> default max_sectors_kb values for various drives between 5.18 and 5.19 (current
>> linus tree). The test machine has 2 AHCI adapters: Intel and Marvell. Both use
>> the regular AHCI driver. I have another rig with different ATA adapters but it
>> is powered down and I am traveling... So cannot test that right now.
>>
>
> FWIW, on QEMU I get a difference for IDE disk for ata_piix host.
>
> Interestingly ata dev max_sectors kb also gets capped from 32MB (LBA48)
> -> 256KB due to swiotlb max mapping size. (It would be capped by shost
> default max sectors 512KB without that swiotlb limit). I assume capping
> due to swiotlb limit is not occuring on Oliver's machine.
Yes, I was suspecting that we may be seeing a difference for anything that is
not AHCI, e.g. with other drivers.
But that seems to be the correct thing to do, no? How was this working before
without applying the swiotlb limit?
--
Damien Le Moal
Western Digital Research
On 12/08/2022 19:27, Damien Le Moal wrote:
>> Interestingly ata dev max_sectors kb also gets capped from 32MB (LBA48)
>> -> 256KB due to swiotlb max mapping size. (It would be capped by shost
>> default max sectors 512KB without that swiotlb limit). I assume capping
>> due to swiotlb limit is not occuring on Oliver's machine.
> Yes, I was suspecting that we may be seeing a difference for anything that is
> not AHCI, e.g. with other drivers.
>
> But that seems to be the correct thing to do, no ?
Yes, this should be the correct thing to do.
> How was this working before
> without applying the swiotlb limit ?
>
Not sure. I would need to check the libata code for how it handles DMA
mapping errors, which I assume would occur when we exceed the
swiotlb limit.
Having said this, from limited testing, whenever I check megaraid sas or
mpt3sas for performance, the length of the request data never/rarely comes
close to max sectors. That is why I am surprised by the regression which
Oliver reports.
Thanks,
John
Hi John,
On Fri, Aug 12, 2022 at 12:13:20PM +0100, John Garry wrote:
> On 12/08/2022 06:01, Oliver Sang wrote:
> > hi, Damien Le Moal,
> >
> > On Tue, Aug 09, 2022 at 07:55:53AM -0700, Damien Le Moal wrote:
> > > On 2022/08/09 2:58, John Garry wrote:
> > > > On 08/08/2022 15:52, Damien Le Moal wrote:
> > > > > On 2022/08/05 1:05, kernel test robot wrote:
> > > > > >
> > > > > >
> > > > > > Greeting,
> > > > > >
> > > > > > FYI, we noticed a -15.0% regression of stress-ng.copy-file.ops_per_sec due to commit:
> > > > > >
> > > > > >
> > > > > > commit: 0568e6122574dcc1aded2979cd0245038efe22b6 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors")
> > > > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > > > > >
> > > > > > in testcase: stress-ng
> > > > > > on test machine: 96 threads 2 sockets Ice Lake with 256G memory
> > > > > > with following parameters:
> > > > > >
> > > > > > nr_threads: 10%
> > > > > > disk: 1HDD
> > > > > > testtime: 60s
> > > > > > fs: f2fs
> > > > > > class: filesystem
> > > > > > test: copy-file
> > > > > > cpufreq_governor: performance
> > > > > > ucode: 0xb000280
> > > > >
> > > > > Without knowing what the device adapter is, hard to say where the problem is. I
> > > > > suspect that with the patch applied, we may be ending up with a small default
> > > > > max_sectors value, causing overhead due to more commands than necessary.
> > > > >
> > > > > Will check what I see with my test rig.
> > > >
> > > > As far as I can see, this patch should not make a difference unless the
> > > > ATA shost driver is setting the max_sectors value unnecessarily low.
> > >
> > > That is my hunch too, hence my question about which host driver is being used
> > > for this test... That is not apparent from the problem report.
> >
> > we noticed the commit is already in mainline now, and in our tests, there is
> > still similar regression and also on other platforms.
> > could you guide us how to check "which host driver is being used for this
> > test"? hope to supply some useful information.
> >
>
> For me, a complete kernel log may help.
dmesg.xz is attached. Also attached is the dmesg from the parent commit (4cbfca5f77).
> > > Western Digital Research
> > .
>
Hi John,
On Fri, Aug 12, 2022 at 03:58:14PM +0100, John Garry wrote:
> On 12/08/2022 12:13, John Garry wrote:
> > > On Tue, Aug 09, 2022 at 07:55:53AM -0700, Damien Le Moal wrote:
> > > > On 2022/08/09 2:58, John Garry wrote:
> > > > > On 08/08/2022 15:52, Damien Le Moal wrote:
> > > > > > On 2022/08/05 1:05, kernel test robot wrote:
> > > > > > >
> > > > > > >
> > > > > > > Greeting,
> > > > > > >
> > > > > > > FYI, we noticed a -15.0% regression of
> > > > > > > stress-ng.copy-file.ops_per_sec due to commit:
> > > > > > >
> > > > > > >
> > > > > > > commit: 0568e6122574dcc1aded2979cd0245038efe22b6
> > > > > > > ("ata: libata-scsi: cap ata_device->max_sectors
> > > > > > > according to shost->max_sectors")
> > > > > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git
> > > > > > > master
> > > > > > >
> > > > > > > in testcase: stress-ng
> > > > > > > on test machine: 96 threads 2 sockets Ice Lake with 256G memory
> > > > > > > with following parameters:
> > > > > > >
> > > > > > > ????nr_threads: 10%
> > > > > > > ????disk: 1HDD
> > > > > > > ????testtime: 60s
> > > > > > > ????fs: f2fs
> > > > > > > ????class: filesystem
> > > > > > > ????test: copy-file
> > > > > > > ????cpufreq_governor: performance
> > > > > > > ????ucode: 0xb000280
> > > > > >
> > > > > > Without knowing what the device adapter is, hard to say
> > > > > > where the problem is. I
> > > > > > suspect that with the patch applied, we may be ending up
> > > > > > with a small default
> > > > > > max_sectors value, causing overhead due to more commands
> > > > > > than necessary.
> > > > > >
> > > > > > Will check what I see with my test rig.
> > > > >
> > > > > As far as I can see, this patch should not make a difference unless the
> > > > > ATA shost driver is setting the max_sectors value unnecessarily low.
> > > >
> > > > That is my hunch too, hence my question about which host driver
> > > > is being used
> > > > for this test... That is not apparent from the problem report.
> > >
> > > we noticed the commit is already in mainline now, and in our tests,
> > > there is
> > > still similar regression and also on other platforms.
> > > could you guide us how to check "which host driver is being used for this
> > > test"? hope to supply some useful information.
> > >
> >
> > For me, a complete kernel log may help.
>
> and since only 1HDD, the output of the following would be helpful:
>
> /sys/block/sda/queue/max_sectors_kb
> /sys/block/sda/queue/max_hw_sectors_kb
>
> And for 5.19, if possible.
for commit
0568e61225 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors")
root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_sectors_kb
512
root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_hw_sectors_kb
512
for both commit
4cbfca5f77 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit")
and v5.19
root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_sectors_kb
1280
root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_hw_sectors_kb
32767
>
> Thanks!
>
> >
> > > >
> > > > >
>
On 16/08/2022 07:57, Oliver Sang wrote:
>>> For me, a complete kernel log may help.
>> and since only 1HDD, the output of the following would be helpful:
>>
>> /sys/block/sda/queue/max_sectors_kb
>> /sys/block/sda/queue/max_hw_sectors_kb
>>
>> And for 5.19, if possible.
> for commit
> 0568e61225 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors")
>
> root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_sectors_kb
> 512
> root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_hw_sectors_kb
> 512
>
> for both commit
> 4cbfca5f77 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit")
> and v5.19
>
> root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_sectors_kb
> 1280
> root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_hw_sectors_kb
> 32767
>
thanks, I appreciate this.
From the dmesg, I see 2x SATA disks - I was under the impression that
the system only had 1x.
Anyway, both drives show LBA48, which means the large max hw sectors value of
32767 KB:
[ 31.129629][ T1146] ata6.00: 1562824368 sectors, multi 1: LBA48 NCQ (depth 32)
So this is what I suspected: we are being capped by the default shost
max_sectors (1024 sectors).
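To spell out where that cap is applied, roughly (a sketch only, not the full
code): dev->max_sectors is set up in ata_dev_configure() from the drive type
and capabilities, and ata_scsi_dev_config() then clamps it to the host limit,
which defaults to 1024 sectors (512 KB) when the SHT does not set
.max_sectors:

	/* sketch of the clamp in ata_scsi_dev_config() */
	dev->max_sectors = min(dev->max_sectors, sdev->host->max_sectors);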
This seems like the simplest fix for you:
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -1382,7 +1382,8 @@ extern const struct attribute_group *ata_common_sdev_groups[];
.proc_name = drv_name, \
.slave_destroy = ata_scsi_slave_destroy, \
.bios_param = ata_std_bios_param, \
- .unlock_native_capacity = ata_scsi_unlock_native_capacity
+ .unlock_native_capacity = ata_scsi_unlock_native_capacity,\
+ .max_sectors = ATA_MAX_SECTORS_LBA48
A concern is that other drivers which use libata may have similar
issues, as they use the SCSI_DEFAULT_MAX_SECTORS default for max_sectors:
hisi_sas
pm8001
aic94xx
mvsas
isci
So they may be needlessly hobbled for some SATA disks. However, I have a
system with a hisi_sas controller and an attached LBA48 disk. I tested
performance for v5.19 vs 6.0 and it was about the same for fio rw=read @
~120K IOPS. I can test this further.
Thanks,
John
On 2022/08/16 3:35, John Garry wrote:
> On 16/08/2022 07:57, Oliver Sang wrote:
>>>> For me, a complete kernel log may help.
>>> and since only 1HDD, the output of the following would be helpful:
>>>
>>> /sys/block/sda/queue/max_sectors_kb
>>> /sys/block/sda/queue/max_hw_sectors_kb
>>>
>>> And for 5.19, if possible.
>> for commit
>> 0568e61225 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors")
>>
>> root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_sectors_kb
>> 512
>> root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_hw_sectors_kb
>> 512
>>
>> for both commit
>> 4cbfca5f77 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit")
>> and v5.19
>>
>> root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_sectors_kb
>> 1280
>> root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_hw_sectors_kb
>> 32767
>>
>
> thanks, I appreciate this.
>
> From the dmesg, I see 2x SATA disks - I was under the impression that
> the system only has 1x.
>
> Anyway, both drives show LBA48, which means the large max hw sectors at
> 32767KB:
> [ 31.129629][ T1146] ata6.00: 1562824368 sectors, multi 1: LBA48 NCQ
> (depth 32)
>
> So this is what I suspected: we are capped from the default shost max
> sectors (1024 sectors).
>
> This seems like the simplest fix for you:
>
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -1382,7 +1382,8 @@ extern const struct attribute_group
> *ata_common_sdev_groups[];
> .proc_name = drv_name, \
> .slave_destroy = ata_scsi_slave_destroy, \
> .bios_param = ata_std_bios_param, \
> - .unlock_native_capacity = ata_scsi_unlock_native_capacity
> + .unlock_native_capacity = ata_scsi_unlock_native_capacity,\
> + .max_sectors = ATA_MAX_SECTORS_LBA48
This is crazy large (65535 x 512 B sectors) and will never result in that being
exposed as the actual max_sectors_kb, since other limits will apply first
(mapping size).
The regression may come not from commands becoming tiny, but from the fact that
after the patch, max_sectors_kb is too large, causing a lot of overhead with
qemu swiotlb mapping and slowing down IO processing.
Above, it can be seen that we ended up with max_sectors_kb being 1280, which is the
default for most scsi disks (including ATA drives). That is normal. But before
that, it was 512, which likely better fits qemu swiotlb and does not generate
overhead. So the above fix will not change anything I think...
> A concern is that other drivers which use libata may have similar
> issues, as they use default in SCSI_DEFAULT_MAX_SECTORS for max_sectors:
> hisi_sas
> pm8001
> aic9xxx
> mvsas
> isci
>
> So they may be needlessly hobbled for some SATA disks. However I have a
> system with hisi_sas controller and attached LBA48 disk. I tested
> performance for v5.19 vs 6.0 and it was about the same for fio rw=read @
> ~120K IOPS. I can test this further.
>
> Thanks,
> John
--
Damien Le Moal
Western Digital Research
On 16/08/2022 16:42, Damien Le Moal wrote:
> On 2022/08/16 3:35, John Garry wrote:
>> On 16/08/2022 07:57, Oliver Sang wrote:
>>>>> For me, a complete kernel log may help.
>>>> and since only 1HDD, the output of the following would be helpful:
>>>>
>>>> /sys/block/sda/queue/max_sectors_kb
>>>> /sys/block/sda/queue/max_hw_sectors_kb
>>>>
>>>> And for 5.19, if possible.
>>> for commit
>>> 0568e61225 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors")
>>>
>>> root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_sectors_kb
>>> 512
>>> root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_hw_sectors_kb
>>> 512
>>>
>>> for both commit
>>> 4cbfca5f77 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit")
>>> and v5.19
>>>
>>> root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_sectors_kb
>>> 1280
>>> root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_hw_sectors_kb
>>> 32767
>>>
>>
>> thanks, I appreciate this.
>>
>> From the dmesg, I see 2x SATA disks - I was under the impression that
>> the system only has 1x.
>>
>> Anyway, both drives show LBA48, which means the large max hw sectors at
>> 32767KB:
>> [ 31.129629][ T1146] ata6.00: 1562824368 sectors, multi 1: LBA48 NCQ
>> (depth 32)
>>
>> So this is what I suspected: we are capped from the default shost max
>> sectors (1024 sectors).
>>
>> This seems like the simplest fix for you:
>>
>> --- a/include/linux/libata.h
>> +++ b/include/linux/libata.h
>> @@ -1382,7 +1382,8 @@ extern const struct attribute_group
>> *ata_common_sdev_groups[];
>> .proc_name = drv_name, \
>> .slave_destroy = ata_scsi_slave_destroy, \
>> .bios_param = ata_std_bios_param, \
>> - .unlock_native_capacity = ata_scsi_unlock_native_capacity
>> + .unlock_native_capacity = ata_scsi_unlock_native_capacity,\
>> + .max_sectors = ATA_MAX_SECTORS_LBA48
>
> This is crazy large (65535 x 512 B sectors) and never result in that being
> exposed as the actual max_sectors_kb since other limits will apply first
> (mapping size).
Here is how I read values from above for max_sectors_kb and
max_hw_sectors_kb:
v5.19 + 0568e61225 : 512/512
v5.19 + 0568e61225 + 4cbfca5f77 : 512/512
v5.19: 1280/32767
They are what makes sense to me, at least.
Oliver, can you confirm this? Thanks!
On this basis, it appears that max_hw_sectors_kb is getting capped at the
scsi default @ 1024 sectors by commit 0568e61225. If it were getting
capped by the swiotlb mapping limit then that would give us 512 sectors -
that value is fixed.
So with my SHT change proposal I am just trying to revert to the previous
behaviour in 5.19 - make max_hw_sectors_kb crazy big again.
>
> The regression may come not from commands becoming tiny, but from the fact that
> after the patch, max_sectors_kb is too large,
I don't think it is, but need confirmation.
>causing a lot of overhead with
> qemu swiotlb mapping and slowing down IO processing.
>
> > Above, it can be seen that we ended up with max_sectors_kb being 1280, which is the
> default for most scsi disks (including ATA drives). That is normal. But before
> that, it was 512, which likely better fits qemu swiotlb and does not generate
Again, I don't think this is the case. Need confirmation.
> overhead. So the above fix will not change anything I think...
Thanks,
John
On 2022/08/16 9:38, John Garry wrote:
> On 16/08/2022 16:42, Damien Le Moal wrote:
>> On 2022/08/16 3:35, John Garry wrote:
>>> On 16/08/2022 07:57, Oliver Sang wrote:
>>>>>> For me, a complete kernel log may help.
>>>>> and since only 1HDD, the output of the following would be helpful:
>>>>>
>>>>> /sys/block/sda/queue/max_sectors_kb
>>>>> /sys/block/sda/queue/max_hw_sectors_kb
>>>>>
>>>>> And for 5.19, if possible.
>>>> for commit
>>>> 0568e61225 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors")
>>>>
>>>> root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_sectors_kb
>>>> 512
>>>> root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_hw_sectors_kb
>>>> 512
>>>>
>>>> for both commit
>>>> 4cbfca5f77 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit")
>>>> and v5.19
>>>>
>>>> root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_sectors_kb
>>>> 1280
>>>> root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_hw_sectors_kb
>>>> 32767
>>>>
>>>
>>> thanks, I appreciate this.
>>>
>>> From the dmesg, I see 2x SATA disks - I was under the impression that
>>> the system only has 1x.
>>>
>>> Anyway, both drives show LBA48, which means the large max hw sectors at
>>> 32767KB:
>>> [ 31.129629][ T1146] ata6.00: 1562824368 sectors, multi 1: LBA48 NCQ
>>> (depth 32)
>>>
>>> So this is what I suspected: we are capped from the default shost max
>>> sectors (1024 sectors).
>>>
>>> This seems like the simplest fix for you:
>>>
>>> --- a/include/linux/libata.h
>>> +++ b/include/linux/libata.h
>>> @@ -1382,7 +1382,8 @@ extern const struct attribute_group
>>> *ata_common_sdev_groups[];
>>> .proc_name = drv_name, \
>>> .slave_destroy = ata_scsi_slave_destroy, \
>>> .bios_param = ata_std_bios_param, \
>>> - .unlock_native_capacity = ata_scsi_unlock_native_capacity
>>> + .unlock_native_capacity = ata_scsi_unlock_native_capacity,\
>>> + .max_sectors = ATA_MAX_SECTORS_LBA48
>>
>> This is crazy large (65535 x 512 B sectors) and never result in that being
>> exposed as the actual max_sectors_kb since other limits will apply first
>> (mapping size).
>
> Here is how I read values from above for max_sectors_kb and
> max_hw_sectors_kb:
>
> v5.19 + 0568e61225 : 512/512
> v5.19 + 0568e61225 + 4cbfca5f77 : 512/512
> v5.19: 1280/32767
>
> They are what makes sense to me, at least.
>
> Oliver, can you confirm this? Thanks!
>
> On this basis, it appears that max_hw_sectors_kb is getting capped from
> scsi default @ 1024 sectors by commit 0568e61225. If it were getting
> capped by swiotlb mapping limit then that would give us 512 sectors -
> this value is fixed.
>
> So for my SHT change proposal I am just trying to revert to previous
> behaviour in 5.19 - make max_hw_sectors_kb crazy big again.
I reread the entire thing and I think I had things reversed here. The perf
regression happens with the 512/512 settings, while the original 1280/32767
before your patches was OK. So it is your patch defining the optimal mapping
size that causes the reduction to 512/512. It would mean that for ATA, we need
a sane default mapping instead of the SCSI default of 1024 sectors. Now I
understand your proposed change using ATA_MAX_SECTORS_LBA48.
However, that would be correct only for LBA48-capable drives.
ata_dev_configure() already sets dev->max_sectors correctly according to the
drive type, capabilities and eventual quirks. So the problem comes from the
libata-scsi change:
dev->max_sectors = min(dev->max_sectors, sdev->host->max_sectors);
when sdev->host->max_sectors is 0 (not initialized). So maybe simply changing
this line to:
dev->max_sectors = min_not_zero(dev->max_sectors, sdev->host->max_sectors);
would do the trick ? Any particular adapter driver that needs a mapping cap on
the adapter max mapping size can still set sdev->host->max_sectors as needed, and
we keep the same defaults as before when it is not set. Thoughts ? Or am I
missing something else ?
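To illustrate the intended semantics, here is a stand-alone sketch of
min_not_zero() (a user-space re-implementation with made-up values, not the
kernel macro itself):

#include <stdio.h>

/* Re-implementation of the kernel's min_not_zero() semantics for two
 * unsigned ints: the smaller value, except that a zero operand is ignored. */
static unsigned int min_not_zero(unsigned int a, unsigned int b)
{
	if (!a)
		return b;
	if (!b)
		return a;
	return a < b ? a : b;
}

int main(void)
{
	/* Hypothetical limits, in 512-byte sectors. */
	unsigned int dev_max = 65535;	/* e.g. an LBA48-capable drive */

	printf("%u\n", min_not_zero(dev_max, 0));	/* 65535: an unset host cap is ignored */
	printf("%u\n", min_not_zero(dev_max, 1024));	/* 1024: a real host cap still applies */
	return 0;
}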
>
>>
>> The regression may come not from commands becoming tiny, but from the fact that
>> after the patch, max_sectors_kb is too large,
>
> I don't think it is, but need confirmation.
>
>> causing a lot of overhead with
>> qemu swiotlb mapping and slowing down IO processing.
>
>>
>> Above, it can be seen that we ended up with max_sectors_kb being 1280, which is the
>> default for most scsi disks (including ATA drives). That is normal. But before
>> that, it was 512, which likely better fits qemu swiotlb and does not generate
>
> Again, I don't think this is the case. Need confirmation.
>
>> overhead. So the above fix will not change anything I think...
>
>
> Thanks,
> John
--
Damien Le Moal
Western Digital Research
On 16/08/2022 21:02, Damien Le Moal wrote:
>> ou confirm this? Thanks!
>>
>> On this basis, it appears that max_hw_sectors_kb is getting capped from
>> scsi default @ 1024 sectors by commit 0568e61225. If it were getting
>> capped by swiotlb mapping limit then that would give us 512 sectors -
>> this value is fixed.
>>
>> So for my SHT change proposal I am just trying to revert to previous
>> behaviour in 5.19 - make max_hw_sectors_kb crazy big again.
> I reread the entire thing and I think I got things reverted here. The perf
> regression happens with the 512/512 settings, while the original 1280/32767
> before your patches was OK.
Right, that's as I read it. It would be useful for Oliver to confirm the
results.
> So is your patch defining the optimal mapping size
> cause the reduction to 512/512.
The optimal mapping size only specifically affects SAS controllers, so I
think that we can ignore that one for now. The reduction to 512/512
comes from the change in ata_scsi_dev_config().
> It would mean that for ATA, we need a sane
> default mapping instead of SCSI default 1024 sectors.
Right
> Now I understand your
> proposed change using ATA_MAX_SECTORS_LBA48.
>
> However, that would be correct only for LBA48 capable drives.
> ata_dev_configure() already sets dev->max_sectors correctly according to the
> drive type, capabilities and eventual quirks. So the problem comes from the
> libata-scsi change:
>
> dev->max_sectors = min(dev->max_sectors, sdev->host->max_sectors);
>
> when sdev->host->max_sectors is 0 (not initialized).
That cannot happen. If sht->max_sectors is 0, then we set
shost->max_sectors to the SCSI default of 1024 sectors in scsi_host_alloc().
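Roughly, the fallback looks like this (paraphrasing scsi_host_alloc() from
memory, so take it as a sketch rather than an exact quote of hosts.c):

	if (sht->max_sectors)
		shost->max_sectors = sht->max_sectors;
	else
		shost->max_sectors = SCSI_DEFAULT_MAX_SECTORS;	/* 1024 sectors */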
For my proposed change, dev->max_sectors would still be initialized in
ata_dev_configure() according to drive type, etc. And it should be <=
LBA48 max sectors (=65535).
So then in ata_scsi_dev_config():
dev->max_sectors = min(dev->max_sectors, sdev->host->max_sectors)
this only has an impact for AHCI controllers if sdev->host->max_sectors
was capped according to the host DMA device's max mapping size.
I will admit that this is not ideal. An alternative approach is to change
ata_scsi_dev_config() to cap the dev max_sectors only according to the shost
DMA device mapping limit (similar to scsi_add_host_with_dma()), but that
would not work for a controller like ipr, which does have a genuine
max_sectors limit (which we should respect).
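Roughly, that alternative would look something like this (a sketch only,
mirroring the DMA-device clamp that scsi_add_host_with_dma() applies; not a
proposed patch):

	/* In ata_scsi_dev_config(): cap only by the host DMA device's max
	 * mapping size rather than by shost->max_sectors. */
	if (sdev->host->dma_dev)
		dev->max_sectors = min_t(unsigned int, dev->max_sectors,
				dma_max_mapping_size(sdev->host->dma_dev) >> SECTOR_SHIFT);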
Thanks,
John
> So maybe simply changing
> this line to:
>
> dev->max_sectors = min_not_zero(dev->max_sectors, sdev->host->max_sectors);
>
> would do the trick ? Any particular adapter driver that needs a mapping cap on
> the adpter max mapping size can still set sdev->host->max_sectors as needed, and
> we keep the same defaults as before when it is not set. Thoughts ? Or am I
> missing something else ?
>
>
>>> The regression may come not from commands becoming tiny, but from the fact that
>>> after the patch, max_sectors_kb is too large,
>> I don't think it is, but need confirmation.
>>
>>> causing a lot of overhead with
>>> qemu swiotlb mapping and slowing down IO processing.
>>> Above, it can be seen that we ended up with max_sectors_kb being 1280, which is the
>>> default for most scsi disks (including ATA drives). That is normal. But before
>>> that, it was 512, which likely better fits qemu swiotlb and does not generate
>> Again, I don't think this is the case. Need confirmation.
>>
>>> overhead. So the above fix will not change anything I think...
hi John,
On Tue, Aug 16, 2022 at 05:38:43PM +0100, John Garry wrote:
> On 16/08/2022 16:42, Damien Le Moal wrote:
> > On 2022/08/16 3:35, John Garry wrote:
> > > On 16/08/2022 07:57, Oliver Sang wrote:
> > > > > > For me, a complete kernel log may help.
> > > > > and since only 1HDD, the output of the following would be helpful:
> > > > >
> > > > > /sys/block/sda/queue/max_sectors_kb
> > > > > /sys/block/sda/queue/max_hw_sectors_kb
> > > > >
> > > > > And for 5.19, if possible.
> > > > for commit
> > > > 0568e61225 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors")
> > > >
> > > > root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_sectors_kb
> > > > 512
> > > > root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_hw_sectors_kb
> > > > 512
> > > >
> > > > for both commit
> > > > 4cbfca5f77 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit")
> > > > and v5.19
> > > >
> > > > root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_sectors_kb
> > > > 1280
> > > > root@lkp-icl-2sp1 ~# cat /sys/block/sda/queue/max_hw_sectors_kb
> > > > 32767
> > > >
> > >
> > > thanks, I appreciate this.
> > >
> > > From the dmesg, I see 2x SATA disks - I was under the impression that
> > > the system only has 1x.
> > >
> > > Anyway, both drives show LBA48, which means the large max hw sectors at
> > > 32767KB:
> > > [ 31.129629][ T1146] ata6.00: 1562824368 sectors, multi 1: LBA48 NCQ
> > > (depth 32)
> > >
> > > So this is what I suspected: we are capped from the default shost max
> > > sectors (1024 sectors).
> > >
> > > This seems like the simplest fix for you:
> > >
> > > --- a/include/linux/libata.h
> > > +++ b/include/linux/libata.h
> > > @@ -1382,7 +1382,8 @@ extern const struct attribute_group
> > > *ata_common_sdev_groups[];
> > > .proc_name = drv_name, \
> > > .slave_destroy = ata_scsi_slave_destroy, \
> > > .bios_param = ata_std_bios_param, \
> > > - .unlock_native_capacity = ata_scsi_unlock_native_capacity
> > > + .unlock_native_capacity = ata_scsi_unlock_native_capacity,\
> > > + .max_sectors = ATA_MAX_SECTORS_LBA48
> >
> > This is crazy large (65535 x 512 B sectors) and never result in that being
> > exposed as the actual max_sectors_kb since other limits will apply first
> > (mapping size).
>
> Here is how I read values from above for max_sectors_kb and
> max_hw_sectors_kb:
>
> v5.19 + 0568e61225 : 512/512
> v5.19 + 0568e61225 + 4cbfca5f77 : 512/512
> v5.19: 1280/32767
>
> They are what makes sense to me, at least.
>
> Oliver, can you confirm this? Thanks!
I confirm the below two:
v5.19 + 0568e61225 : 512/512
v5.19: 1280/32767 (as last already reported)
but the below failed to build:
v5.19 + 0568e61225 + 4cbfca5f77
build_errors:
- "drivers/scsi/scsi_transport_sas.c:242:33: error: implicit declaration of function 'dma_opt_mapping_size'; did you mean 'dma_max_mapping_size'? [-Werror=implicit-function-declaration]"
- "drivers/scsi/scsi_transport_sas.c:241:24: error: 'struct Scsi_Host' has no member named 'opt_sectors'; did you mean 'max_sectors'?"
not sure if I understand this correctly?
for this, I just cherry-picked 0568e61225 on top of v5.19,
then cherry-picked 4cbfca5f77 again.
so my branch looks like:
a11d8b97c3ecb8 v5.19 + 0568e61225 + 4cbfca5f77
1b59440cf71f99 v5.19 + 0568e61225
3d7cb6b04c3f31 (tag: v5.19,
did I do the right thing?
>
> On this basis, it appears that max_hw_sectors_kb is getting capped from scsi
> default @ 1024 sectors by commit 0568e61225. If it were getting capped by
> swiotlb mapping limit then that would give us 512 sectors - this value is
> fixed.
>
> So for my SHT change proposal I am just trying to revert to previous
> behaviour in 5.19 - make max_hw_sectors_kb crazy big again.
>
> >
> > The regression may come not from commands becoming tiny, but from the fact that
> > after the patch, max_sectors_kb is too large,
>
> I don't think it is, but need confirmation.
>
> > causing a lot of overhead with
> > qemu swiotlb mapping and slowing down IO processing.
>
> >
> > Above, it can be seen that we ended up with max_sectors_kb being 1280, which is the
> > default for most scsi disks (including ATA drives). That is normal. But before
> > that, it was 512, which likely better fits qemu swiotlb and does not generate
>
> Again, I don't think this is the case. Need confirmation.
>
> > overhead. So the above fix will not change anything I think...
>
>
> Thanks,
> John
On 17/08/2022 14:51, Oliver Sang wrote:
Hi Oliver,
>> v5.19 + 0568e61225 : 512/512
>> v5.19 + 0568e61225 + 4cbfca5f77 : 512/512
>> v5.19: 1280/32767
>>
>> They are what makes sense to me, at least.
>>
>> Oliver, can you confirm this? Thanks!
> I confirm below two:
> v5.19 + 0568e61225 : 512/512
> v5.19: 1280/32767 (as last already reported)
ack
>
> but below failed to build:
> v5.19 + 0568e61225 + 4cbfca5f77
>
> build_errors:
> - "drivers/scsi/scsi_transport_sas.c:242:33: error: implicit declaration of function 'dma_opt_mapping_size'; did you mean 'dma_max_mapping_size'? [-Werror=implicit-function-declaration]"
> - "drivers/scsi/scsi_transport_sas.c:241:24: error: 'struct Scsi_Host' has no member named 'opt_sectors'; did you mean 'max_sectors'?"
>
> not sure if I understand this correctly?
> for this, I just cherry-pick 0568e61225 upon v5.19,
> then cherry-pick 4cbfca5f77 again.
> so my branch looks like:
>
> a11d8b97c3ecb8 v5.19 + 0568e61225 + 4cbfca5f77
> 1b59440cf71f99 v5.19 + 0568e61225
> 3d7cb6b04c3f31 (tag: v5.19,
>
> did I do the right thing?
Sorry but I was not really interested in 4cbfca5f77, and I see where that
build error is coming from, but don't be concerned with it. However, for the
avoidance of doubt, if you have results for vanilla v6.0-rc1 then that
would be appreciated.
I will also send a separate patch for testing if you don't mind.
thanks,
John
On 2022/08/16 13:44, John Garry wrote:
> On 16/08/2022 21:02, Damien Le Moal wrote:
>>> ou confirm this? Thanks!
>>>
>>> On this basis, it appears that max_hw_sectors_kb is getting capped from
>>> scsi default @ 1024 sectors by commit 0568e61225. If it were getting
>>> capped by swiotlb mapping limit then that would give us 512 sectors -
>>> this value is fixed.
>>>
>>> So for my SHT change proposal I am just trying to revert to previous
>>> behaviour in 5.19 - make max_hw_sectors_kb crazy big again.
>> I reread the entire thing and I think I got things reverted here. The perf
>> regression happens with the 512/512 settings, while the original 1280/32767
>> before your patches was OK.
>
> Right, that's as I read it. It would be useful for Oliver to confirm the
> results.
>
>> So is your patch defining the optimal mapping size
>> cause the reduction to 512/512.
>
> The optimal mapping size only affects specifically sas controllers, so I
> think that we can ignore that one for now. The reduction to 512/512
> comes from the change in ata_scsi_dev_config().
>
>> It would mean that for ATA, we need a sane
>> default mapping instead of SCSI default 1024 sectors.
>
> Right
>
>> Now I understand your
>> proposed change using ATA_MAX_SECTORS_LBA48.
>>
>> However, that would be correct only for LBA48 capable drives.
>> ata_dev_configure() already sets dev->max_sectors correctly according to the
>> drive type, capabilities and eventual quirks. So the problem comes from the
>> libata-scsi change:
>>
>> dev->max_sectors = min(dev->max_sectors, sdev->host->max_sectors);
>>
>> when sdev->host->max_sectors is 0 (not initialized).
>
> That cannot happen. If sht->max_sectors is 0, then we set
> shost->max_sectors at SCSI default 1024 sectors in scsi_host_alloc()
>
> For my proposed change, dev->max_sectors would still be initialized in
> ata_dev_configure() according to drive type, etc. And it should be <=
> LBA48 max sectors (=65535).
>
> So then in ata_scsi_dev_config():
>
> dev->max_sectors = min(dev->max_sectors, sdev->host->max_sectors)
>
> this only has an impact for ahci controllers if sdev->host->max_sectors
> was capped according to host dma dev max mapping size.
Got it. I think your fix is fine then. It brings the defaults back to what
they were before the DMA max mapping patches.
>
> I will admit that this is not ideal. An alt approach is to change
> ata_scsi_dev_config() to cap the dev max_sectors only according to shost
> dma dev mapping limit (similar to scsi_add_host_with_dma()), but that
> would not work for a controller like ipr, which does have a geniune
> max_sectors limit (which we should respect).
>
> Thanks,
> John
>
>
>> So maybe simply changing
>> this line to:
>>
>> dev->max_sectors = min_not_zero(dev->max_sectors, sdev->host->max_sectors);
>>
>> would do the trick ? Any particular adapter driver that needs a mapping cap on
>> the adapter max mapping size can still set sdev->host->max_sectors as needed, and
>> we keep the same defaults as before when it is not set. Thoughts ? Or am I
>> missing something else ?
>>
>>
>>>> The regression may come not from commands becoming tiny, but from the fact that
>>>> after the patch, max_sectors_kb is too large,
>>> I don't think it is, but need confirmation.
>>>
>>>> causing a lot of overhead with
>>>> qemu swiotlb mapping and slowing down IO processing.
>>>> Above, it can be seen that we ended up with max_sectors_kb being 1280, which is the
>>>> default for most scsi disks (including ATA drives). That is normal. But before
>>>> that, it was 512, which likely better fits qemu swiotlb and does not generate
>>> Again, I don't think this is the case. Need confirmation.
>>>
>>>> overhead. So the above fix will not change anything I think...
>
--
Damien Le Moal
Western Digital Research
hi John,
On Wed, Aug 17, 2022 at 03:04:06PM +0100, John Garry wrote:
> On 17/08/2022 14:51, Oliver Sang wrote:
>
> Hi Oliver,
>
> > > v5.19 + 0568e61225 : 512/512
> > > v5.19 + 0568e61225 + 4cbfca5f77 : 512/512
> > > v5.19: 1280/32767
> > >
> > > They are what makes sense to me, at least.
> > >
> > > Oliver, can you confirm this? Thanks!
> > I confirm below two:
> > v5.19 + 0568e61225 : 512/512
> > v5.19: 1280/32767 (as last already reported)
>
> ack
>
> >
> > but below failed to build:
> > v5.19 + 0568e61225 + 4cbfca5f77
> >
> > build_errors:
> > - "drivers/scsi/scsi_transport_sas.c:242:33: error: implicit declaration of function 'dma_opt_mapping_size'; did you mean 'dma_max_mapping_size'? [-Werror=implicit-function-declaration]"
> > - "drivers/scsi/scsi_transport_sas.c:241:24: error: 'struct Scsi_Host' has no member named 'opt_sectors'; did you mean 'max_sectors'?"
> >
> > not sure if I understand this correctly?
> > for this, I just cherry-pick 0568e61225 upon v5.19,
> > then cherry-pick 4cbfca5f77 again.
> > so my branch looks like:
> >
> > a11d8b97c3ecb8 v5.19 + 0568e61225 + 4cbfca5f77
> > 1b59440cf71f99 v5.19 + 0568e61225
> > 3d7cb6b04c3f31 (tag: v5.19,
> >
> > did I do the right thing?
>
> Sorry but I was not really interested in 4cbfca5f77 and I see where that
> build error is coming, but don't be concerned with it. However, for
> avoidance of doubt, if you have results for vanilla v6.0-rc1 then that would
> be appreciated.
for v6.0-rc1, it's still 512/512
>
> I will also send a separate patch for testing if you don't mind.
sure! we are very glad that we could help.
>
> thanks,
> John
>
On 18/08/2022 03:06, Oliver Sang wrote:
Hi Oliver,
>>> did I do the right thing?
>> Sorry but I was not really interested in 4cbfca5f77 and I see where that
>> build error is coming, but don't be concerned with it. However, for
>> avoidance of doubt, if you have results for vanilla v6.0-rc1 then that would
>> be appreciated.
> for v6.0-rc1, it's still 512/512
>
>> I will also send a separate patch for testing if you don't mind.
> sure! we are very glad that we could help.
As you probably saw, I sent "[RFT PATCH] ata: libata: Set __ATA_BASE_SHT
max_sectors" for testing on top of v6.0-rc1, and I hope that we can then
get the same performance as v5.19.
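The idea is presumably the same SHT hunk as discussed earlier in the thread,
roughly (a sketch of the intent, not necessarily the exact posted patch):

	/* include/linux/libata.h, __ATA_BASE_SHT: advertise the LBA48 limit so
	 * the SCSI default of 1024 sectors no longer caps ATA devices. */
	.max_sectors		= ATA_MAX_SECTORS_LBA48,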
Thanks,
John
On Thu, Aug 18, 2022 at 10:28:30AM +0100, John Garry wrote:
> On 18/08/2022 03:06, Oliver Sang wrote:
>
> Hi Oliver,
>
> > > > did I do the right thing?
> > > Sorry but I was not really interested in 4cbfca5f77 and I see where that
> > > build error is coming, but don't be concerned with it. However, for
> > > avoidance of doubt, if you have results for vanilla v6.0-rc1 then that would
> > > be appreciated.
> > for v6.0-rc1, it's still 512/512
> >
> > > I will also send a separate patch for testing if you don't mind.
> > sure! we are very glad that we could help.
>
> As you probably saw, I sent "[RFT PATCH] ata: libata: Set __ATA_BASE_SHT
> max_sectors" for testing on top of v6.0-rc1, and I hope that then we can get
> same performance as v5.19
yeah, our test confirmed your expectation:
stress-ng.copy-file.ops_per_sec
v5.19 - 26.85
v6.0-rc1 - 23.03
v6.0-rc1 + your patch - 26.94
>
> Thanks,
> John
On 19/08/2022 07:24, Oliver Sang wrote:
>> As you probably saw, I sent "[RFT PATCH] ata: libata: Set __ATA_BASE_SHT
>> max_sectors" for testing on top of v6.0-rc1, and I hope that then we can get
Based on the result below, I wonder if SAS HBAs (which use libata) should
also not use the default SCSI max_sectors. Every libsas HBA driver does
today, IIRC.
>> same performance as v5.19
> yeah, our test confirmed your expectation:
>
> stress-ng.copy-file.ops_per_sec
> v5.19 - 26.85
> v6.0-rc1 - 23.03
> v6.0-rc1 + your patch - 26.94
>
great, thanks
Thanks,
John
On 2022/08/18 23:24, Oliver Sang wrote:
> On Thu, Aug 18, 2022 at 10:28:30AM +0100, John Garry wrote:
>> On 18/08/2022 03:06, Oliver Sang wrote:
>>
>> Hi Oliver,
>>
>>>>> did I do the right thing?
>>>> Sorry but I was not really interested in 4cbfca5f77 and I see where that
>>>> build error is coming, but don't be concerned with it. However, for
>>>> avoidance of doubt, if you have results for vanilla v6.0-rc1 then that would
>>>> be appreciated.
>>> for v6.0-rc1, it's still 512/512
>>>
>>>> I will also send a separate patch for testing if you don't mind.
>>> sure! we are very glad that we could help.
>>
>> As you probably saw, I sent "[RFT PATCH] ata: libata: Set __ATA_BASE_SHT
>> max_sectors" for testing on top of v6.0-rc1, and I hope that then we can get
>> same performance as v5.19
>
> yeah, our test confirmed your expectation:
>
> stress-ng.copy-file.ops_per_sec
> v5.19 - 26.85
> v6.0-rc1 - 23.03
> v6.0-rc1 + your patch - 26.94
Thanks for testing, Oliver. I pushed the fix and sent it out as an rc2 fix.
>
>>
>> Thanks,
>> John
--
Damien Le Moal
Western Digital Research