Greetings,
FYI, we noticed a -10.6% regression of fio.read_iops due to commit:
commit: 8c5035dfbb9475b67c82b3fdb7351236525bf52b ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: fio-basic
on test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz (Cascade Lake) with 192G memory
with the following parameters (a roughly equivalent raw fio invocation is sketched below the list):
runtime: 300s
nr_task: 8t
disk: 1SSD
fs: btrfs
rw: randread
bs: 2M
ioengine: sync
test_size: 256g
cpufreq_governor: performance
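A sketch of an approximately equivalent standalone fio command, reconstructed from the parameters above (the authoritative configuration is the attached job.yaml; the mount point below is hypothetical):
----------------------------------------------------------------
# Sketch only: reconstructed from the parameters above.
# /mnt/btrfs-1ssd is a hypothetical mount point for the
# btrfs-formatted SSD; the real test is driven by job.yaml.
fio --name=randread --directory=/mnt/btrfs-1ssd \
    --rw=randread --bs=2M --ioengine=sync --numjobs=8 \
    --size=256g --runtime=300
----------------------------------------------------------------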
test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio
Details are as below:
=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase:
2M/gcc-11/performance/1SSD/btrfs/sync/x86_64-rhel-8.3/8t/debian-11.1-x86_64-20220510.cgz/300s/randread/lkp-csl-2ap4/256g/fio-basic
commit:
f7de4886fe ("rnbd-srv: remove struct rnbd_dev")
8c5035dfbb ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
f7de4886fe8f008a 8c5035dfbb9475b67c82b3fdb73
---------------- ---------------------------
%stddev %change %stddev
\ | \
0.03 ±106% +0.2 0.22 ± 80% fio.latency_20ms%
0.02 ± 33% -0.0 0.01 ± 12% fio.latency_4ms%
2508 -10.6% 2243 fio.read_bw_MBps
6717440 +17.6% 7897088 fio.read_clat_90%_us
6892202 +19.0% 8202922 fio.read_clat_95%_us
7602176 ± 4% +18.4% 9000277 ± 3% fio.read_clat_99%_us
6374238 +11.8% 7127450 fio.read_clat_mean_us
363825 ± 10% +74.9% 636378 ± 5% fio.read_clat_stddev
1254 -10.6% 1121 fio.read_iops
104.97 +11.8% 117.32 fio.time.elapsed_time
104.97 +11.8% 117.32 fio.time.elapsed_time.max
13731 +5.6% 14498 ± 4% fio.time.maximum_resident_set_size
116.00 -8.2% 106.50 fio.time.percent_of_cpu_this_job_got
1.998e+10 +11.4% 2.226e+10 cpuidle..time
3.27 ± 3% +4.6% 3.42 iostat.cpu.iowait
4.49 ± 68% -2.1 2.38 ±152% perf-profile.children.cycles-pp.number
4.49 ± 68% -2.5 1.98 ±175% perf-profile.self.cycles-pp.number
557763 +5.4% 587781 proc-vmstat.pgfault
25488 +3.1% 26274 proc-vmstat.pgreuse
2459048 -10.1% 2209482 vmstat.io.bi
184649 ± 5% -10.4% 165526 ± 7% vmstat.system.cs
111733 ± 30% +61.8% 180770 ± 21% numa-meminfo.node0.AnonPages
113221 ± 30% +60.2% 181416 ± 21% numa-meminfo.node0.Inactive(anon)
11301 ± 24% +164.5% 29888 ±117% numa-meminfo.node2.Active(file)
104911 ± 39% -80.5% 20456 ±100% numa-meminfo.node3.AnonHugePages
131666 ± 27% -67.9% 42297 ± 82% numa-meminfo.node3.AnonPages
132698 ± 26% -67.5% 43158 ± 81% numa-meminfo.node3.Inactive(anon)
27934 ± 30% +61.8% 45196 ± 21% numa-vmstat.node0.nr_anon_pages
28306 ± 30% +60.2% 45358 ± 21% numa-vmstat.node0.nr_inactive_anon
28305 ± 30% +60.2% 45357 ± 21% numa-vmstat.node0.nr_zone_inactive_anon
6291 ± 24% +68.0% 10567 ± 26% numa-vmstat.node2.workingset_nodes
32925 ± 27% -67.9% 10571 ± 82% numa-vmstat.node3.nr_anon_pages
33182 ± 26% -67.5% 10786 ± 81% numa-vmstat.node3.nr_inactive_anon
33182 ± 26% -67.5% 10786 ± 81% numa-vmstat.node3.nr_zone_inactive_anon
161.78 ± 4% -28.2% 116.10 ± 30% sched_debug.cfs_rq:/.runnable_avg.avg
161.46 ± 4% -28.2% 115.85 ± 30% sched_debug.cfs_rq:/.util_avg.avg
426382 +11.0% 473345 ± 6% sched_debug.cpu.clock.avg
426394 +11.0% 473357 ± 6% sched_debug.cpu.clock.max
426370 +11.0% 473331 ± 6% sched_debug.cpu.clock.min
426139 +10.9% 472586 ± 6% sched_debug.cpu.clock_task.avg
426368 +11.0% 473130 ± 6% sched_debug.cpu.clock_task.max
416196 +11.1% 462228 ± 6% sched_debug.cpu.clock_task.min
1156 ± 7% -10.8% 1031 ± 6% sched_debug.cpu.curr->pid.stddev
426372 +11.0% 473334 ± 6% sched_debug.cpu_clk
425355 +11.0% 472318 ± 6% sched_debug.ktime
426826 +11.0% 473787 ± 6% sched_debug.sched_clk
1.263e+09 -7.9% 1.164e+09 ± 3% perf-stat.i.branch-instructions
190886 ± 5% -10.8% 170290 ± 7% perf-stat.i.context-switches
1.979e+09 -8.8% 1.804e+09 ± 2% perf-stat.i.dTLB-loads
8.998e+08 -8.2% 8.257e+08 ± 2% perf-stat.i.dTLB-stores
6.455e+09 -8.0% 5.938e+09 ± 3% perf-stat.i.instructions
21.78 -8.4% 19.95 perf-stat.i.metric.M/sec
7045315 ± 4% -14.0% 6057863 ± 6% perf-stat.i.node-load-misses
2658563 ± 7% -21.9% 2077647 ± 12% perf-stat.i.node-loads
414822 ± 4% -12.9% 361455 ± 3% perf-stat.i.node-store-misses
1.251e+09 -7.8% 1.154e+09 ± 3% perf-stat.ps.branch-instructions
189082 ± 5% -10.7% 168849 ± 7% perf-stat.ps.context-switches
1.96e+09 -8.8% 1.789e+09 ± 2% perf-stat.ps.dTLB-loads
8.912e+08 -8.1% 8.187e+08 ± 2% perf-stat.ps.dTLB-stores
6.393e+09 -7.9% 5.888e+09 ± 3% perf-stat.ps.instructions
6978485 ± 4% -13.9% 6006510 ± 6% perf-stat.ps.node-load-misses
2633627 ± 7% -21.8% 2060033 ± 12% perf-stat.ps.node-loads
410822 ± 4% -12.8% 358289 ± 3% perf-stat.ps.node-store-misses
If you fix the issue, kindly add the following tags:
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/r/[email protected]
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://01.org/lkp
Hi,
On 2022/10/08 10:50, kernel test robot wrote:
> Greetings,
>
> FYI, we noticed a -10.6% regression of fio.read_iops due to commit:
I don't know how this is happening, but I'm *sure* this commit won't affect
performance. Please take a look at the commit: only wbt initialization
is touched, which is done while creating the device:
device_add_disk
blk_register_queue
wbt_enable_default
wbt_init
And the I/O path is the same with or without this commit.
By the way, wbt should only work for writes.
Thanks,
Kuai
Hi Kuai,
On 10/8/22 16:00, Yu Kuai wrote:
> Hi,
>
> On 2022/10/08 10:50, kernel test robot wrote:
>> Greetings,
>>
>> FYI, we noticed a -10.6% regression of fio.read_iops due to commit:
>
> I don't know how this is happening, but I'm *sure* this commit won't affect
> performance. Please take a look at the commit: only wbt initialization
> is touched, which is done while creating the device:
>
> device_add_disk
> blk_register_queue
> wbt_enable_default
> wbt_init
>
> And the I/O path is the same with or without this commit.
>
> By the way, wbt should only work for writes.
Some information here:
It looks like the line
wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
matters.
If we move only this line back to its original position, on top of
8c5035dfbb, the regression is gone.
If we move only this line before ret = rq_qos_add() (just like your
patch did, but with only this line), on top of 8c5035dfbb, the
regression can be reproduced. The two orderings are sketched below.
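To make the two orderings concrete (a paraphrased sketch of the
relevant part of wbt_init(), not the exact kernel code):
----------------------------------------------------------------
/* Paraphrased sketch, not the exact kernel code. */

/* Ordering A: line at its original position -- regression gone */
ret = rq_qos_add(q, &rwb->rqos);
...
wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));

/* Ordering B: line moved before rq_qos_add(), as after 8c5035dfbb --
 * regression reproduced
 */
wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
...
ret = rq_qos_add(q, &rwb->rqos);
----------------------------------------------------------------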
Regards
Yin, Fengwei
On Sat, Oct 08, 2022 at 04:00:10PM +0800, Yu Kuai wrote:
> Hi,
>
> On 2022/10/08 10:50, kernel test robot wrote:
> > Greetings,
> >
> > FYI, we noticed a -10.6% regression of fio.read_iops due to commit:
>
> I don't know how this is happening, but I'm *sure* this commit won't affect
It looks like it is wrong to move
wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
before rq_qos_add() in wbt_init().
Without the wbt rq_qos added, wbt_set_write_cache() is just a nop.
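That nop can be modeled with a self-contained userspace toy (hypothetical
names, not the kernel code): the lookup that wbt_set_write_cache() does
returns NULL until the rq_qos is attached, so the write to wc is dropped.
----------------------------------------------------------------
/*
 * Self-contained userspace toy (hypothetical names, not the kernel
 * code): the lookup returns NULL until the rq_qos has been attached,
 * so the store to wc is silently dropped.
 */
#include <stdio.h>
#include <stdbool.h>

struct toy_rwb {
        bool wc;
};

static struct toy_rwb *queue_rqos;      /* stands in for q->rq_qos */

static struct toy_rwb *toy_wbt_rq_qos(void)
{
        return queue_rqos;              /* NULL before toy_rq_qos_add() */
}

static void toy_rq_qos_add(struct toy_rwb *rwb)
{
        queue_rqos = rwb;
}

static void toy_set_write_cache(bool on)
{
        struct toy_rwb *rwb = toy_wbt_rq_qos();

        if (rwb)                        /* nop while not attached */
                rwb->wc = on;
}

int main(void)
{
        struct toy_rwb rwb = { .wc = true };    /* default before setup */

        toy_set_write_cache(false);     /* before attach: dropped */
        printf("before attach: wc=%d\n", rwb.wc);

        toy_rq_qos_add(&rwb);
        toy_set_write_cache(false);     /* after attach: takes effect */
        printf("after attach:  wc=%d\n", rwb.wc);
        return 0;
}
----------------------------------------------------------------
It prints wc=1 before the attach (the stale default survives) and wc=0
after.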
thanks,
Ming
Hi,
On 2022/10/09 16:43, Ming Lei wrote:
> On Sat, Oct 08, 2022 at 04:00:10PM +0800, Yu Kuai wrote:
>> Hi,
>>
>> On 2022/10/08 10:50, kernel test robot wrote:
>>> Greetings,
>>>
>>> FYI, we noticed a -10.6% regression of fio.read_iops due to commit:
>>
>> I don't know how this is happening, but I'm *sure* this commit won't affect
>
> It looks like it is wrong to move
>
> wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
>
> before rq_qos_add() in wbt_init().
>
> Without the wbt rq_qos added, wbt_set_write_cache() is just a nop.
Yes, I get it now; I was being foolish here.
I missed that "rwb->wc" is looked up through the rq_qos in
wbt_set_write_cache(), which is NULL before rq_qos_add(). By the way,
it's interesting how read performance is affected; I still don't know
why yet...
Thanks,
Kuai
On Sun, Oct 09, 2022 at 05:32:34PM +0800, Yu Kuai wrote:
> Hi,
>
> On 2022/10/09 16:43, Ming Lei wrote:
> > On Sat, Oct 08, 2022 at 04:00:10PM +0800, Yu Kuai wrote:
> >> Hi,
> >>
> >> On 2022/10/08 10:50, kernel test robot wrote:
> >>> Greetings,
> >>>
> >>> FYI, we noticed a -10.6% regression of fio.read_iops due to commit:
> >>
> >> I don't know how this is happening, but I'm *sure* this commit won't affect
> >
> > It looks like it is wrong to move
> >
> > wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
> >
> > before rq_qos_add() in wbt_init().
> >
> > Without the wbt rq_qos added, wbt_set_write_cache() is just a nop.
>
> Yes, I get it now; I was being foolish here.
>
> I missed that "rwb->wc" is looked up through the rq_qos in
> wbt_set_write_cache(), which is NULL before rq_qos_add(). By the way,
> it's interesting how read performance is affected; I still don't know
> why yet...
Indeed, we are confused too. So we did some further checking, and found
it could be related to fewer calls of wake_up_all(), due to the changed
'rwb->wc' value.
I'm not familiar with the block layer and VFS, and only checked
'blk-wbt.c'. Before commit 8c5035dfbb, 'rwb->wc' is 0 in 0Day's
test env, while it's 1 after the commit.
Inside wbt_rqw_done(), 'rwb->wc' is used to decide whether to wake up
other waiters in the system, so we added some debug code to count the
wakeups and skipped wakeups:
----------------------------------------------------------------
@@ -130,6 +130,8 @@ static void wbt_rqw_done(struct rq_wb *rwb, struct rq_wait *rqw,
        /*
         * Don't wake anyone up if we are above the normal limit.
         */
-       if (inflight && inflight >= limit)
+       if (inflight && inflight >= limit) {
+               skip_wakeup++;
                return;
+       }
        if (wq_has_sleeper(&rqw->wait)) {
                int diff = limit - inflight;
-               if (!inflight || diff >= rwb->wb_background / 2)
+               if (!inflight || diff >= rwb->wb_background / 2) {
+                       wakeup++;
                        wake_up_all(&rqw->wait);
+               }
----------------------------------------------------------------
And after the fio run, the 'skip_wakeup' number is much bigger with
the patch applied:
before patch:
422.274394: wbt_rqw_done: wakeup_skip=19408 wakup_all=1944759
after patch:
433.753345: wbt_rqw_done: wakeup_skip=2090585 wakup_all=13630
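If we read wbt_rqw_done() correctly, that flip matches its limit
selection: with 'rwb->wc' set (and no recent wait), the limit used by
the check instrumented above becomes 0, so any non-zero inflight count
takes the 'skip_wakeup' early return. A self-contained userspace toy
model of just that check (made-up depth value, not the kernel code):
----------------------------------------------------------------
/*
 * Toy model (made-up depth, not the kernel code) of the limit
 * selection: with wc set and no recent wait, the limit is 0, so
 * every completion with inflight > 0 skips the wake_up_all().
 */
#include <stdio.h>
#include <stdbool.h>

static int toy_limit(bool wc, bool recent_wait, int max_depth)
{
        if (wc && !recent_wait)
                return 0;
        return max_depth;
}

int main(void)
{
        int max_depth = 16;             /* made-up depth for illustration */

        for (int wc = 0; wc <= 1; wc++) {
                int limit = toy_limit(wc, false, max_depth);

                for (int inflight = 1; inflight <= 3; inflight++) {
                        bool skip = inflight && inflight >= limit;

                        printf("wc=%d inflight=%d limit=%d -> %s\n",
                               wc, inflight, limit,
                               skip ? "skip wakeup" : "may wake up");
                }
        }
        return 0;
}
----------------------------------------------------------------
With wc=0 it always considers waking up; with wc=1 it always skips,
which lines up with the counters above.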
Hope this can help with the root-causing.
Thanks,
Feng