2021-09-02 07:13:03

by Oliver Sang

[permalink] [raw]
Subject: [xfs] bad77c375e: stress-ng.fallocate.ops_per_sec -10.0% regression



Greeting,

FYI, we noticed a -10.0% regression of stress-ng.fallocate.ops_per_sec due to commit:


commit: bad77c375e8de6c776c848e443f7dc2d0d909be5 ("xfs: CIL checkpoint flushes caches unconditionally")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: stress-ng
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 512G memory
with following parameters:

nr_threads: 10%
disk: 1HDD
testtime: 60s
fs: xfs
class: filesystem
test: fallocate
cpufreq_governor: performance
ucode: 0x5003006


In addition to that, the commit also has significant impact on the following tests:



If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
bin/lkp run generated-yaml-file

=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
filesystem/gcc-9/performance/1HDD/xfs/x86_64-rhel-8.3/10%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp7/fallocate/stress-ng/60s/0x5003006

commit:
0431d926b3 ("xfs: async blkdev cache flush")
bad77c375e ("xfs: CIL checkpoint flushes caches unconditionally")

0431d926b399d74f bad77c375e8de6c776c848e443f
---------------- ---------------------------
%stddev %change %stddev
\ | \
618.67 -10.0% 556.67 ? 2% stress-ng.fallocate.ops
10.31 -10.0% 9.28 ? 2% stress-ng.fallocate.ops_per_sec
111618 -5.7% 105270 stress-ng.time.voluntary_context_switches
502.67 ? 2% +52.0% 763.83 ? 47% interrupts.CPU44.CAL:Function_call_interrupts
2.10 ? 2% -25.1% 1.57 iostat.cpu.iowait
2.16 ? 2% -0.5 1.62 mpstat.cpu.all.iowait%
0.09 ? 11% +45.0% 0.14 ? 34% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork
151.00 ? 19% +20.2% 181.50 ? 7% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork
6749 +6.4% 7178 perf-stat.i.context-switches
6640 +6.4% 7063 perf-stat.ps.context-switches
353877 -1.4% 348924 proc-vmstat.pgfree
39122 -15.1% 33221 proc-vmstat.pgpgout
1225 ? 14% -9.0% 1115 turbostat.Bzy_MHz
86.82 ? 17% -10.4% 77.83 turbostat.PkgWatt
602.83 -15.1% 511.83 vmstat.io.bo
6953 +6.1% 7376 vmstat.system.cs
1.33 ? 6% +0.4 1.70 ? 13% perf-profile.calltrace.cycles-pp.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
0.08 ? 22% +0.1 0.17 ? 23% perf-profile.children.cycles-pp.wait_for_completion
0.07 ? 17% +0.1 0.19 ? 64% perf-profile.children.cycles-pp.submit_bio
0.07 ? 17% +0.1 0.19 ? 64% perf-profile.children.cycles-pp.submit_bio_noacct
1.40 ? 8% +0.3 1.74 ? 12% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler



stress-ng.fallocate.ops_per_sec

12 +----------------------------------------------------------------------+
| |
10 |.+.++.+.+.+ +.+.+.++.+.+.++.+.+.+.++.+.+.++.+.+.++.+.+.+.++.+.+.++.+.|
| O : : O O OO O O O OO |
| OO O :O : O O O O |
8 |-+ O : : O O OO O O O O O O OO O |
| : : |
6 |-+ : : |
| :: |
4 |-+ :: |
| :: |
| :: |
2 |-+ : |
| : |
0 +----------------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (5.72 kB)
config-5.13.0-rc4-00090-gbad77c375e8d (176.80 kB)
job-script (8.48 kB)
job.yaml (5.78 kB)
reproduce (574.00 B)
Download all attachments

2021-09-03 10:34:46

by Xing Zhengjun

[permalink] [raw]
Subject: Re: [xfs] bad77c375e: stress-ng.fallocate.ops_per_sec -10.0% regression

Hi Dave,

Do you have time to look at this? Thanks.

On 9/2/2021 3:27 PM, kernel test robot wrote:
>
>
> Greeting,
>
> FYI, we noticed a -10.0% regression of stress-ng.fallocate.ops_per_sec due to commit:
>
>
> commit: bad77c375e8de6c776c848e443f7dc2d0d909be5 ("xfs: CIL checkpoint flushes caches unconditionally")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
>
> in testcase: stress-ng
> on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 512G memory
> with following parameters:
>
> nr_threads: 10%
> disk: 1HDD
> testtime: 60s
> fs: xfs
> class: filesystem
> test: fallocate
> cpufreq_governor: performance
> ucode: 0x5003006
>
>
> In addition to that, the commit also has significant impact on the following tests:
>
>
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <[email protected]>
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml # job file is attached in this email
> bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> bin/lkp run generated-yaml-file
>
> =========================================================================================
> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
> filesystem/gcc-9/performance/1HDD/xfs/x86_64-rhel-8.3/10%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp7/fallocate/stress-ng/60s/0x5003006
>
> commit:
> 0431d926b3 ("xfs: async blkdev cache flush")
> bad77c375e ("xfs: CIL checkpoint flushes caches unconditionally")
>
> 0431d926b399d74f bad77c375e8de6c776c848e443f
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 618.67 -10.0% 556.67 ? 2% stress-ng.fallocate.ops
> 10.31 -10.0% 9.28 ? 2% stress-ng.fallocate.ops_per_sec
> 111618 -5.7% 105270 stress-ng.time.voluntary_context_switches
> 502.67 ? 2% +52.0% 763.83 ? 47% interrupts.CPU44.CAL:Function_call_interrupts
> 2.10 ? 2% -25.1% 1.57 iostat.cpu.iowait
> 2.16 ? 2% -0.5 1.62 mpstat.cpu.all.iowait%
> 0.09 ? 11% +45.0% 0.14 ? 34% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork
> 151.00 ? 19% +20.2% 181.50 ? 7% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork
> 6749 +6.4% 7178 perf-stat.i.context-switches
> 6640 +6.4% 7063 perf-stat.ps.context-switches
> 353877 -1.4% 348924 proc-vmstat.pgfree
> 39122 -15.1% 33221 proc-vmstat.pgpgout
> 1225 ? 14% -9.0% 1115 turbostat.Bzy_MHz
> 86.82 ? 17% -10.4% 77.83 turbostat.PkgWatt
> 602.83 -15.1% 511.83 vmstat.io.bo
> 6953 +6.1% 7376 vmstat.system.cs
> 1.33 ? 6% +0.4 1.70 ? 13% perf-profile.calltrace.cycles-pp.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
> 0.08 ? 22% +0.1 0.17 ? 23% perf-profile.children.cycles-pp.wait_for_completion
> 0.07 ? 17% +0.1 0.19 ? 64% perf-profile.children.cycles-pp.submit_bio
> 0.07 ? 17% +0.1 0.19 ? 64% perf-profile.children.cycles-pp.submit_bio_noacct
> 1.40 ? 8% +0.3 1.74 ? 12% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
>
>
>
> stress-ng.fallocate.ops_per_sec
>
> 12 +----------------------------------------------------------------------+
> | |
> 10 |.+.++.+.+.+ +.+.+.++.+.+.++.+.+.+.++.+.+.++.+.+.++.+.+.+.++.+.+.++.+.|
> | O : : O O OO O O O OO |
> | OO O :O : O O O O |
> 8 |-+ O : : O O OO O O O O O O OO O |
> | : : |
> 6 |-+ : : |
> | :: |
> 4 |-+ :: |
> | :: |
> | :: |
> 2 |-+ : |
> | : |
> 0 +----------------------------------------------------------------------+
>
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> ---
> 0DAY/LKP+ Test Infrastructure Open Source Technology Center
> https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation
>
> Thanks,
> Oliver Sang
>

--
Zhengjun Xing

2021-11-30 06:46:13

by Xing Zhengjun

[permalink] [raw]
Subject: Re: [LKP] Re: [xfs] bad77c375e: stress-ng.fallocate.ops_per_sec -10.0% regression

Hi Dave,

Do you have time to look at this? It still existed in v5.16-rc3. Thanks

On 9/3/2021 3:40 PM, Xing Zhengjun wrote:
> Hi Dave,
>
>    Do you have time to look at this? Thanks.
>
> On 9/2/2021 3:27 PM, kernel test robot wrote:
>>
>>
>> Greeting,
>>
>> FYI, we noticed a -10.0% regression of stress-ng.fallocate.ops_per_sec
>> due to commit:
>>
>>
>> commit: bad77c375e8de6c776c848e443f7dc2d0d909be5 ("xfs: CIL checkpoint
>> flushes caches unconditionally")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>
>>
>> in testcase: stress-ng
>> on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @
>> 2.10GHz with 512G memory
>> with following parameters:
>>
>>     nr_threads: 10%
>>     disk: 1HDD
>>     testtime: 60s
>>     fs: xfs
>>     class: filesystem
>>     test: fallocate
>>     cpufreq_governor: performance
>>     ucode: 0x5003006
>>
>>
>> In addition to that, the commit also has significant impact on the
>> following tests:
>>
>>
>>
>> If you fix the issue, kindly add following tag
>> Reported-by: kernel test robot <[email protected]>
>>
>>
>> Details are as below:
>> -------------------------------------------------------------------------------------------------->
>>
>>
>>
>> To reproduce:
>>
>>          git clone https://github.com/intel/lkp-tests.git
>>          cd lkp-tests
>>          bin/lkp install                job.yaml  # job file is
>> attached in this email
>>          bin/lkp split-job --compatible job.yaml  # generate the yaml
>> file for lkp run
>>          bin/lkp run                    generated-yaml-file
>>
>> =========================================================================================
>>
>> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
>>
>>
>> filesystem/gcc-9/performance/1HDD/xfs/x86_64-rhel-8.3/10%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp7/fallocate/stress-ng/60s/0x5003006
>>
>>
>> commit:
>>    0431d926b3 ("xfs: async blkdev cache flush")
>>    bad77c375e ("xfs: CIL checkpoint flushes caches unconditionally")
>>
>> 0431d926b399d74f bad77c375e8de6c776c848e443f
>> ---------------- ---------------------------
>>           %stddev     %change         %stddev
>>               \          |                \
>>      618.67           -10.0%     556.67 ±  2%  stress-ng.fallocate.ops
>>       10.31           -10.0%       9.28 ±  2%
>> stress-ng.fallocate.ops_per_sec
>>      111618            -5.7%     105270
>> stress-ng.time.voluntary_context_switches
>>      502.67 ±  2%     +52.0%     763.83 ± 47%
>> interrupts.CPU44.CAL:Function_call_interrupts
>>        2.10 ±  2%     -25.1%       1.57        iostat.cpu.iowait
>>        2.16 ±  2%      -0.5        1.62        mpstat.cpu.all.iowait%
>>        0.09 ± 11%     +45.0%       0.14 ± 34%
>> perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork
>>      151.00 ± 19%     +20.2%     181.50 ±  7%
>> perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork
>>        6749            +6.4%       7178
>> perf-stat.i.context-switches
>>        6640            +6.4%       7063
>> perf-stat.ps.context-switches
>>      353877            -1.4%     348924        proc-vmstat.pgfree
>>       39122           -15.1%      33221        proc-vmstat.pgpgout
>>        1225 ± 14%      -9.0%       1115        turbostat.Bzy_MHz
>>       86.82 ± 17%     -10.4%      77.83        turbostat.PkgWatt
>>      602.83           -15.1%     511.83        vmstat.io.bo
>>        6953            +6.1%       7376        vmstat.system.cs
>>        1.33 ±  6%      +0.4        1.70 ± 13%
>> perf-profile.calltrace.cycles-pp.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
>>
>>        0.08 ± 22%      +0.1        0.17 ± 23%
>> perf-profile.children.cycles-pp.wait_for_completion
>>        0.07 ± 17%      +0.1        0.19 ± 64%
>> perf-profile.children.cycles-pp.submit_bio
>>        0.07 ± 17%      +0.1        0.19 ± 64%
>> perf-profile.children.cycles-pp.submit_bio_noacct
>>        1.40 ±  8%      +0.3        1.74 ± 12%
>> perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
>>
>>
>>                           stress-ng.fallocate.ops_per_sec
>>    12
>> +----------------------------------------------------------------------+
>>
>> |                                                                      |
>>    10 |.+.++.+.+.+
>> +.+.+.++.+.+.++.+.+.+.++.+.+.++.+.+.++.+.+.+.++.+.+.++.+.|
>>       | O        :  :                             O O   OO O O O
>> OO          |
>>       |   OO O   :O :            O             O O
>> O                      |
>>     8 |-+      O :  : O O OO O O  O O O O OO
>> O                               |
>>       |          :
>> :                                                         |
>>     6 |-+        :
>> :                                                         |
>>       |
>> ::                                                         |
>>     4 |-+
>> ::                                                         |
>>       |
>> ::                                                         |
>>       |
>> ::                                                         |
>>     2 |-+
>> :                                                          |
>>       |
>> :                                                          |
>>     0
>> +----------------------------------------------------------------------+
>> [*] bisect-good sample
>> [O] bisect-bad  sample
>>
>>
>>
>>
>> Disclaimer:
>> Results have been estimated based on internal Intel analysis and are
>> provided
>> for informational purposes only. Any difference in system hardware or
>> software
>> design or configuration may affect actual performance.
>>
>>
>> ---
>> 0DAY/LKP+ Test Infrastructure                   Open Source Technology
>> Center
>> https://lists.01.org/hyperkitty/list/[email protected]       Intel
>> Corporation
>>
>> Thanks,
>> Oliver Sang
>>
>

--
Zhengjun Xing

2021-12-02 22:05:05

by Dave Chinner

[permalink] [raw]
Subject: Re: [LKP] Re: [xfs] bad77c375e: stress-ng.fallocate.ops_per_sec -10.0% regression

On Tue, Nov 30, 2021 at 02:46:06PM +0800, Xing Zhengjun wrote:
> Hi Dave,
>
> Do you have time to look at this? It still existed in v5.16-rc3. Thanks

AFAIC, it's a "don't care" issue.

The series of commits around this one:

> > > FYI, we noticed a -10.0% regression of
> > > stress-ng.fallocate.ops_per_sec due to commit:
> > >
> > >
> > > commit: bad77c375e8de6c776c848e443f7dc2d0d909be5 ("xfs: CIL
> > > checkpoint flushes caches unconditionally")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

changed how we manage FUA/cache flushes for the journal changed
performance across a wide range of workloads. Many went a lot
faster, some (like this one) went slightly slower. Overall it was a
net win, especially on storage stacks with really slow cache flushes
(e.g. dm-thinp) and workloads that do a lot of concurrent metadata
modification.

Overall, fallocate is not a performance critical path - it's a
slowpath because it serialises all IO to that file while the
fallocate call runs. Hence performance characteristics for fallocate
aren't really a major concern to begin with...

-Dave.
--
Dave Chinner
[email protected]