2024-03-06 18:46:44

by Harshit Mogalapalli

Subject: [bug-report] Performance regression with fio sequential-write on a multipath setup.

Hi,

We have noticed a kernel performance regression with an fio sequential-write
job.

Notes and observations:
======================
1. This is observed on recent kernels (6.6) when compared with 5.15.y;
the bisection points to commit d47f9717e5cf ("block/mq-deadline: use
correct way to throttling write requests").
2. Reverting the above commit improves the performance.
3. This regression can also be seen on 6.8-rc7, and a revert on top of
that fixes the regression.
4. The commit appears to be directly related to the cause of the regression.
5. Note that this happens only with a multipath setup, and it is
reproducible with as few as 2 block devices.

Test details:
============
(A) fio.write job

fio-3.19 -- fio version

[global]
ioengine=libaio
rw=write
bs=128k
iodepth=64
numjobs=24
direct=1
fsync=1
runtime=600
group_reporting

[job]
filename=/dev/dm-0
[job]
filename=/dev/dm-1

Each disk is 600G in size.
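As a quick sanity check on the job file, the short Python sketch below (not part of the original report) derives how many fio processes it starts and the configured ceiling on outstanding I/Os. It uses only the parameters listed above; iodepth is an upper bound, so the number of I/Os actually in flight can be lower.

```python
# Parameters copied from the fio job file above.
NUMJOBS = 24       # numjobs=24 in [global], applied to each [job] section
JOB_SECTIONS = 2   # one [job] for /dev/dm-0, one for /dev/dm-1
IODEPTH = 64       # iodepth=64: max outstanding I/Os per process
BS_KIB = 128       # bs=128k

processes = NUMJOBS * JOB_SECTIONS
max_in_flight_total = processes * IODEPTH
max_in_flight_per_dm = NUMJOBS * IODEPTH
max_mib_in_flight = max_in_flight_total * BS_KIB / 1024

print(processes, max_in_flight_total, max_in_flight_per_dm, max_mib_in_flight)
```

The process count matches fio's "Starting 48 processes" line in the results below. For comparison, the storage device's queue depth is reported as 254 later in this thread.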

(B) Test results

6.8-rc7: 2 block devices with multi-path
-------

job: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=64
...
job: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=64
...
fio-3.19
Starting 48 processes

job: (groupid=0, jobs=48): err= 0: pid=6164: Wed Mar 6 17:58:33 2024
write: IOPS=1884, BW=236MiB/s (247MB/s)(138GiB/600319msec); 0 zone resets
slat (usec): min=2, max=540462, avg=25445.35, stdev=24181.85
clat (msec): min=9, max=4941, avg=1602.56, stdev=339.05
lat (msec): min=9, max=4973, avg=1628.00, stdev=342.19
clat percentiles (msec):
| 1.00th=[ 986], 5.00th=[ 1167], 10.00th=[ 1250], 20.00th=[ 1368],
| 30.00th=[ 1435], 40.00th=[ 1502], 50.00th=[ 1569], 60.00th=[ 1636],
| 70.00th=[ 1703], 80.00th=[ 1804], 90.00th=[ 1955], 95.00th=[ 2140],
| 99.00th=[ 2869], 99.50th=[ 3239], 99.90th=[ 3842], 99.95th=[ 4010],
| 99.99th=[ 4329]
bw ( KiB/s): min=47229, max=516492, per=100.00%, avg=241546.47, stdev=1326.92, samples=57259
iops : min= 322, max= 3996, avg=1843.17, stdev=10.39, samples=57259
lat (msec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=0.02%
lat (msec) : 500=0.06%, 750=0.14%, 1000=0.93%, 2000=90.41%, >=2000=8.42%
fsync/fdatasync/sync_file_range:
sync (nsec): min=10, max=57940, avg=104.23, stdev=498.86
sync percentiles (nsec):
| 1.00th=[ 13], 5.00th=[ 19], 10.00th=[ 26], 20.00th=[ 61],
| 30.00th=[ 68], 40.00th=[ 72], 50.00th=[ 75], 60.00th=[ 78],
| 70.00th=[ 87], 80.00th=[ 167], 90.00th=[ 175], 95.00th=[ 177],
| 99.00th=[ 221], 99.50th=[ 231], 99.90th=[ 318], 99.95th=[15680],
| 99.99th=[17792]
cpu : usr=0.08%, sys=0.16%, ctx=1096948, majf=0, minf=1995
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=199.5%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=0,1131018,0,1127994 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
WRITE: bw=236MiB/s (247MB/s), 236MiB/s-236MiB/s (247MB/s-247MB/s), io=138GiB (148GB), run=600319-600319msec

Disk stats (read/write):
dm-0: ios=50/533034, merge=0/27056, ticks=16/113070163, in_queue=113070180, util=100.00%, aggrios=43/266595, aggrmerge=0/0, aggrticks=156/56542549, aggrin_queue=56542706, aggrutil=100.00%
sdac: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sde: ios=86/533191, merge=0/0, ticks=313/113085099, in_queue=113085413, util=100.00%
dm-1: ios=5/534381, merge=0/36389, ticks=240/113110344, in_queue=113110584, util=100.00%, aggrios=7/267191, aggrmerge=0/0, aggrticks=153/56543654, aggrin_queue=56543807, aggrutil=100.00%
sdf: ios=14/534382, merge=0/0, ticks=306/113087308, in_queue=113087615, util=100.00%
sdad: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

Throughput Results:
WRITE:247:1884:0


6.8-rc7+ Revert : 2 block devices with multi-path
-------

job: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=64
...
job: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=64
...
fio-3.19
Starting 48 processes

job: (groupid=0, jobs=48): err= 0: pid=6104: Wed Mar 6 18:29:13 2024
write: IOPS=2518, BW=315MiB/s (330MB/s)(185GiB/600339msec); 0 zone resets
slat (usec): min=2, max=923472, avg=6789.22, stdev=20329.20
clat (msec): min=4, max=6020, avg=1212.68, stdev=714.90
lat (msec): min=4, max=6020, avg=1219.47, stdev=718.40
clat percentiles (msec):
| 1.00th=[ 203], 5.00th=[ 309], 10.00th=[ 384], 20.00th=[ 535],
| 30.00th=[ 709], 40.00th=[ 911], 50.00th=[ 1133], 60.00th=[ 1334],
| 70.00th=[ 1519], 80.00th=[ 1754], 90.00th=[ 2198], 95.00th=[ 2601],
| 99.00th=[ 3171], 99.50th=[ 3608], 99.90th=[ 4329], 99.95th=[ 4597],
| 99.99th=[ 5134]
bw ( KiB/s): min=12237, max=1834896, per=100.00%, avg=413187.52, stdev=6322.04, samples=44948
iops : min= 48, max=14314, avg=3186.68, stdev=49.49, samples=44948
lat (msec) : 10=0.01%, 20=0.01%, 50=0.09%, 100=0.02%, 250=2.28%
lat (msec) : 500=15.45%, 750=14.26%, 1000=11.83%, 2000=42.52%, >=2000=13.55%
fsync/fdatasync/sync_file_range:
sync (nsec): min=10, max=76066, avg=57.85, stdev=299.52
sync percentiles (nsec):
| 1.00th=[ 13], 5.00th=[ 14], 10.00th=[ 15], 20.00th=[ 16],
| 30.00th=[ 17], 40.00th=[ 20], 50.00th=[ 28], 60.00th=[ 47],
| 70.00th=[ 65], 80.00th=[ 80], 90.00th=[ 103], 95.00th=[ 175],
| 99.00th=[ 237], 99.50th=[ 241], 99.90th=[ 262], 99.95th=[ 318],
| 99.99th=[16512]
cpu : usr=0.06%, sys=0.07%, ctx=531434, majf=0, minf=728
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=199.6%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=0,1511918,0,1508894 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
WRITE: bw=315MiB/s (330MB/s), 315MiB/s-315MiB/s (330MB/s-330MB/s), io=185GiB (198GB), run=600339-600339msec

Disk stats (read/write):
dm-0: ios=0/246318, merge=0/493981, ticks=0/142584585, in_queue=142584586, util=99.17%, aggrios=6/181454, aggrmerge=0/0, aggrticks=112/70608689, aggrin_queue=70608801, aggrutil=84.92%
sdac: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sde: ios=12/362908, merge=0/0, ticks=224/141217379, in_queue=141217603, util=84.92%
dm-1: ios=0/233211, merge=0/538097, ticks=0/142579042, in_queue=142579043, util=99.15%, aggrios=8/174475, aggrmerge=0/0, aggrticks=128/70654686, aggrin_queue=70654814, aggrutil=85.20%
sdf: ios=16/348951, merge=0/0, ticks=256/141309372, in_queue=141309628, util=85.20%
sdad: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

Throughput Results:
WRITE:330:2518:0

(C) performance difference:

Reverting the commit improves sequential-write IOPS by roughly 33.65%
(2518 vs. 1884), and the regression is reproducible with a higher number
of block devices as well.
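The percentage depends on which kernel is taken as the baseline. The Python sketch below (an illustrative calculation, not from the original report) reproduces the 33.65% figure from the IOPS values, shows the equivalent drop measured against the reverted kernel, and cross-checks fio's bandwidth against IOPS x 128 KiB.

```python
def pct_change(before: float, after: float) -> float:
    """Percentage change when going from `before` to `after`."""
    return (after - before) / before * 100.0


# IOPS and MB/s taken from the two "Throughput Results" lines above.
iops_regressed, iops_reverted = 1884, 2518
mbs_regressed, mbs_reverted = 247, 330

# Baseline = 6.8-rc7 (regressed): how much the revert gains.
gain_from_revert = pct_change(iops_regressed, iops_reverted)
# Baseline = 6.8-rc7 + revert: how much 6.8-rc7 loses.
loss_vs_revert = -pct_change(iops_reverted, iops_regressed)
# Same comparison using MB/s instead of IOPS.
gain_mbs = pct_change(mbs_regressed, mbs_reverted)

# Cross-check: IOPS x 128 KiB block size should match fio's MiB/s figures.
bw_regressed_mib = iops_regressed * 128 / 1024  # fio reports 236MiB/s
bw_reverted_mib = iops_reverted * 128 / 1024    # fio reports 315MiB/s

print(f"{gain_from_revert:.2f}% {loss_vs_revert:.2f}% {gain_mbs:.2f}%")
print(bw_regressed_mib, bw_reverted_mib)
```

It prints "33.65% 25.18% 33.60%" and "235.5 314.75": the same gap reads as a ~33.7% gain from reverting, or a ~25.2% loss on 6.8-rc7, and the computed bandwidths agree with fio's rounded 236MiB/s and 315MiB/s.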



Thanks to Paul Webb for identifying this regression and sharing the details.
We would be happy to test any patches to check their effect on performance,
and to follow up on any suggestions.


Thanks,
Harshit


2024-03-07 02:56:33

by Zhiguo Niu

Subject: Re: [bug-report] Performance regression with fio sequential-write on a multipath setup.

Hi Harshit Mogalapalli,

What is the queue_depth of your storage device's queue?
Under the same test conditions, what are the results for sequential reads?

Thanks!
-----Original Message-----
From: Harshit Mogalapalli <[email protected]>
Sent: March 7, 2024 2:46
To: 牛志国 (Zhiguo Niu) <[email protected]>; [email protected]; Jens Axboe <[email protected]>; [email protected]
Cc: LKML <[email protected]>; Ramanan Govindarajan <[email protected]>; Paul Webb <[email protected]>; [email protected]
Subject: [bug-report] Performance regression with fio sequential-write on a multipath setup.



2024-03-12 10:18:21

by Harshit Mogalapalli

Subject: Re: Re: [bug-report] Performance regression with fio sequential-write on a multipath setup.

Hi Zhiguo,


On 07/03/24 08:25, 牛志国 (Zhiguo Niu) wrote:
> Hi Harshit Mogalapalli
>
> What is the queue_depth of your storage device's queue?
> Under the same test conditions, what are the results for sequential reads?
>

Thanks for the response.

Queue depth of the storage device is 254.

And here is the sequential-read data:

6.8-rc7: 2 block devices with multi-path:
----------------------------------------
Run status group 0 (all jobs):
READ: bw=448MiB/s (470MB/s), 448MiB/s-448MiB/s (470MB/s-470MB/s), io=263GiB (282GB), run=600311-600311msec

Disk stats (read/write):
dm-1: ios=418480/0, merge=642066/0, ticks=143492597/0, in_queue=143492597, util=98.28%, aggrios=287904/0, aggrmerge=0/0, aggrticks=71063414/0, aggrin_queue=71063414, aggrutil=86.71%
sdf: ios=575809/0, merge=0/0, ticks=142126829/0, in_queue=142126829, util=86.71%
sdad: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
dm-12: ios=422296/0, merge=667474/0, ticks=143680598/0, in_queue=143680598, util=98.95%, aggrios=288787/0, aggrmerge=0/0, aggrticks=71153453/0, aggrin_queue=71153453, aggrutil=86.72%
sdae: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdg: ios=577574/0, merge=0/0, ticks=142306906/0, in_queue=142306906, util=86.72%

Throughput Results:
READ:470:3582:0



6.8-rc7+ Revert : 2 block devices with multi-path:
-------------------------------------------------
Run status group 0 (all jobs):
READ: bw=462MiB/s (484MB/s), 462MiB/s-462MiB/s (484MB/s-484MB/s), io=271GiB (291GB), run=600298-600298msec

Disk stats (read/write):
dm-1: ios=421574/0, merge=692148/0, ticks=143444547/0, in_queue=143444547, util=99.19%, aggrios=288316/0, aggrmerge=0/0, aggrticks=71080370/0, aggrin_queue=71080370, aggrutil=87.08%
sdf: ios=576633/0, merge=0/0, ticks=142160740/0, in_queue=142160740, util=87.08%
sdad: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
dm-12: ios=432589/0, merge=672001/0, ticks=142976262/0, in_queue=142976262, util=99.03%, aggrios=293051/0, aggrmerge=0/0, aggrticks=70886007/0, aggrin_queue=70886007, aggrutil=87.03%
sdae: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdg: ios=586102/0, merge=0/0, ticks=141772015/0, in_queue=141772015, util=87.03%

Throughput Results:
READ:484:3695:0


Averaged over 4 iterations (read IOPS):

on 6.8-rc7 : 3571
on 6.8-rc7 + revert : 3634

There is almost no regression for sequential reads, while there is a
significant regression for sequential writes.
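To put "almost no regression" in numbers, the sketch below applies the same percentage-change calculation to the averaged read IOPS and, for contrast, to the single-run write IOPS from the original report. Because the read figure is a 4-iteration average and the write figures come from one run each, the comparison is indicative only.

```python
def pct_change(before: float, after: float) -> float:
    """Percentage change when going from `before` to `after`."""
    return (after - before) / before * 100.0


# Sequential read: 4-iteration average IOPS, 6.8-rc7 vs. 6.8-rc7 + revert.
read_gain = pct_change(3571, 3634)
# Sequential write: single-run IOPS from the original report, same kernels.
write_gain = pct_change(1884, 2518)

print(f"read: {read_gain:.2f}%  write: {write_gain:.2f}%")
```

This prints "read: 1.76%  write: 33.65%", which is why the read difference is treated as noise-level while the write difference is not.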


Thanks,
Harshit


2024-03-13 10:31:40

by Zhiguo Niu

Subject: Re: Re: [bug-report] Performance regression with fio sequential-write on a multipath setup.

Hi Harshit,
This patch did not modify any of the block layer's important limiting mechanisms; it only corrects mq-deadline's depth-limiting method, because the original method was incorrect. In fact, other I/O schedulers, such as BFQ and Kyber, also have limiting mechanisms.
I think mq-deadline's depth limit is meant to throttle write requests when the I/O load is heavy and give priority to reads, so that reads, which are synchronous, are not blocked for too long. My test results are also consistent with this.
IMO, if you want to solve this sequential-write performance regression, you need to check whether the place where the depth limit is applied can be bypassed, or see whether the maintainers have any suggestions.
Thanks!
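The policy described above (cap how many asynchronous writes can hold scheduler tags at once, so that some tags stay free for synchronous reads) can be illustrated with a toy model. The Python sketch below is hypothetical: ToyTagPool, nr_tags and async_depth are illustrative names, and the real mq-deadline code enforces its limit during tag allocation in a different, more detailed way.

```python
class ToyTagPool:
    """Hypothetical, simplified model of write throttling by a depth limit.

    Not kernel code: reads (synchronous) may take any free tag, while
    writes (asynchronous) may hold at most `async_depth` tags at once,
    so the remaining tags stay available for reads.
    """

    def __init__(self, nr_tags: int, async_depth: int) -> None:
        if not 0 < async_depth <= nr_tags:
            raise ValueError("need 0 < async_depth <= nr_tags")
        self.nr_tags = nr_tags
        self.async_depth = async_depth
        self.in_flight = {"read": 0, "write": 0}

    def try_get_tag(self, op: str) -> bool:
        """Return True and account the request if it may be dispatched."""
        if sum(self.in_flight.values()) >= self.nr_tags:
            return False  # every tag is busy, for reads and writes alike
        if op == "write" and self.in_flight["write"] >= self.async_depth:
            return False  # writes throttled: leftover tags are kept for reads
        self.in_flight[op] += 1
        return True

    def put_tag(self, op: str) -> None:
        """Release a tag when a request completes."""
        self.in_flight[op] -= 1


# Flood the pool with writes, then check that reads still get through.
pool = ToyTagPool(nr_tags=8, async_depth=6)
writes_ok = sum(pool.try_get_tag("write") for _ in range(10))
reads_ok = sum(pool.try_get_tag("read") for _ in range(3))
print(writes_ok, reads_ok)  # 6 writes admitted, then 2 reads fill the pool
```

Even in this toy, lowering async_depth protects reads but also caps how many writes can be in flight at once, which is the trade-off a write-only workload such as the fio job in this report would feel.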

-----Original Message-----
From: Harshit Mogalapalli <[email protected]>
Sent: March 12, 2024 18:18
To: 牛志国 (Zhiguo Niu) <[email protected]>; [email protected]; Jens Axboe <[email protected]>; [email protected]
Cc: LKML <[email protected]>; Ramanan Govindarajan <[email protected]>; Paul Webb <[email protected]>; [email protected]; 邢云龙 (Yunlong Xing) <[email protected]>; 金红宇 (Hongyu Jin) <[email protected]>; Darren Kenny <[email protected]>
Subject: Re: Re: [bug-report] Performance regression with fio sequential-write on a multipath setup.


注意: 这封邮件来自于外部。除非你确定邮件内容安全,否则不要点击任何链接和附件。
CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.



Hi Zhiguo,


On 07/03/24 08:25, 牛志国 (Zhiguo Niu) wrote:
> Hi Harshit Mogalapalli
>
> What is the queue_depth of queue of your storage device?
> In the same test conditions, what are the the results of sequential reading?
>

Thanks for the response.

Queue depth of the storage device is 254.

And here are sequential read data:

6.8-rc7: 2 block devices with multi-path:
----------------------------------------
Run status group 0 (all jobs):
READ: bw=448MiB/s (470MB/s), 448MiB/s-448MiB/s (470MB/s-470MB/s), io=263GiB (282GB), run=600311-600311msec

Disk stats (read/write):
dm-1: ios=418480/0, merge=642066/0, ticks=143492597/0, in_queue=143492597, util=98.28%, aggrios=287904/0, aggrmerge=0/0, aggrticks=71063414/0, aggrin_queue=71063414, aggrutil=86.71%
sdf: ios=575809/0, merge=0/0, ticks=142126829/0, in_queue=142126829, util=86.71%
sdad: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
dm-12: ios=422296/0, merge=667474/0, ticks=143680598/0, in_queue=143680598, util=98.95%, aggrios=288787/0, aggrmerge=0/0, aggrticks=71153453/0, aggrin_queue=71153453, aggrutil=86.72%
sdae: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdg: ios=577574/0, merge=0/0, ticks=142306906/0, in_queue=142306906, util=86.72%

Throughput Results:
READ:470:3582:0



6.8-rc7+ Revert : 2 block devices with multi-path:
-------------------------------------------------
Run status group 0 (all jobs):
READ: bw=462MiB/s (484MB/s), 462MiB/s-462MiB/s (484MB/s-484MB/s), io=271GiB (291GB), run=600298-600298msec

Disk stats (read/write):
dm-1: ios=421574/0, merge=692148/0, ticks=143444547/0, in_queue=143444547, util=99.19%, aggrios=288316/0, aggrmerge=0/0, aggrticks=71080370/0, aggrin_queue=71080370, aggrutil=87.08%
sdf: ios=576633/0, merge=0/0, ticks=142160740/0, in_queue=142160740, util=87.08%
sdad: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
dm-12: ios=432589/0, merge=672001/0, ticks=142976262/0, in_queue=142976262, util=99.03%, aggrios=293051/0, aggrmerge=0/0, aggrticks=70886007/0, aggrin_queue=70886007, aggrutil=87.03%
sdae: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
sdg: ios=586102/0, merge=0/0, ticks=141772015/0, in_queue=141772015, util=87.03%

Throughput Results:
READ:484:3695:0


On an average over 4 iterations:

on 6.8-rc7 : 3571
on 6.8-rc7 + revert : 3634

Almost there is no regression on sequential-read while there is a significant regression in sequential write


Thanks,
Harshit
> Thanks!
> -----邮件原件-----
> 发件人: Harshit Mogalapalli <[email protected]>
> 发送时间: 2024年3月7日 2:46
> 收件人: 牛志国 (Zhiguo Niu) <[email protected]>; [email protected];
> Jens Axboe <[email protected]>; [email protected]
> 抄送: LKML <[email protected]>; Ramanan Govindarajan
> <[email protected]>; Paul Webb <[email protected]>;
> [email protected]
> 主题: [bug-report] Performance regression with fio sequential-write on a multipath setup.
>
>
> 注意: 这封邮件来自于外部。除非你确定邮件内容安全,否则不要点击任何链接和附件。
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.
>
>
>
> Hi,
>
> We have noticed a performance regression in kernel with fio sequential write job.
>
> Notes and observations:
> ======================
> 1. This is observed on recent kernels(6.6) when compared with 5.15.y, the bisection points to commit d47f9717e5cf ("block/mq-deadline: use correct way to throttling write requests") 2. Reverting the above commit improves the performance.
> 3. This regression can also be seen on 6.8-rc7 and a revert on top of that fixes the regression.
> 4. The commit looks very much related to the cause of regression.
> 5. Note that this happens only with multi-path setup even with 2 block devices.
>
> Test details:
> ============
> (A) fio.write job
>
> fio-3.19 -- fio version
>
> [global]
> ioengine=libaio
> rw=write
> bs=128k
> iodepth=64
> numjobs=24
> direct=1
> fsync=1
> runtime=600
> group_reporting
>
> [job]
> filename=/dev/dm-0
> [job]
> filename=/dev/dm-1
>
> Each disk is of 600G size.
>
> (B) Test results
>
> 6.8-rc7: 2 block devices with multi-path
> -------
>
> job: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=64 ...
> job: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=64 ...
> fio-3.19
> Starting 48 processes
>
> job: (groupid=0, jobs=48): err= 0: pid=6164: Wed Mar 6 17:58:33 2024
> write: IOPS=1884, BW=236MiB/s (247MB/s)(138GiB/600319msec); 0 zone resets
> slat (usec): min=2, max=540462, avg=25445.35, stdev=24181.85
> clat (msec): min=9, max=4941, avg=1602.56, stdev=339.05
> lat (msec): min=9, max=4973, avg=1628.00, stdev=342.19
> clat percentiles (msec):
> | 1.00th=[ 986], 5.00th=[ 1167], 10.00th=[ 1250], 20.00th=[ 1368],
> | 30.00th=[ 1435], 40.00th=[ 1502], 50.00th=[ 1569], 60.00th=[ 1636],
> | 70.00th=[ 1703], 80.00th=[ 1804], 90.00th=[ 1955], 95.00th=[ 2140],
> | 99.00th=[ 2869], 99.50th=[ 3239], 99.90th=[ 3842], 99.95th=[ 4010],
> | 99.99th=[ 4329]
> bw ( KiB/s): min=47229, max=516492, per=100.00%, avg=241546.47, stdev=1326.92, samples=57259
> iops : min= 322, max= 3996, avg=1843.17, stdev=10.39, samples=57259
> lat (msec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=0.02%
> lat (msec) : 500=0.06%, 750=0.14%, 1000=0.93%, 2000=90.41%, >=2000=8.42%
> fsync/fdatasync/sync_file_range:
> sync (nsec): min=10, max=57940, avg=104.23, stdev=498.86
> sync percentiles (nsec):
> | 1.00th=[ 13], 5.00th=[ 19], 10.00th=[ 26], 20.00th=[ 61],
> | 30.00th=[ 68], 40.00th=[ 72], 50.00th=[ 75], 60.00th=[ 78],
> | 70.00th=[ 87], 80.00th=[ 167], 90.00th=[ 175], 95.00th=[ 177],
> | 99.00th=[ 221], 99.50th=[ 231], 99.90th=[ 318], 99.95th=[15680],
> | 99.99th=[17792]
> cpu : usr=0.08%, sys=0.16%, ctx=1096948, majf=0, minf=1995
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=199.5%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
> issued rwts: total=0,1131018,0,1127994 short=0,0,0,0 dropped=0,0,0,0
> latency : target=0, window=0, percentile=100.00%, depth=64
>
> Run status group 0 (all jobs):
> WRITE: bw=236MiB/s (247MB/s), 236MiB/s-236MiB/s (247MB/s-247MB/s),
> io=138GiB (148GB), run=600319-600319msec
>
> Disk stats (read/write):
> dm-0: ios=50/533034, merge=0/27056, ticks=16/113070163, in_queue=113070180, util=100.00%, aggrios=43/266595, aggrmerge=0/0, aggrticks=156/56542549, aggrin_queue=56542706, aggrutil=100.00%
> sdac: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
> sde: ios=86/533191, merge=0/0, ticks=313/113085099, in_queue=113085413, util=100.00%
> dm-1: ios=5/534381, merge=0/36389, ticks=240/113110344, in_queue=113110584, util=100.00%, aggrios=7/267191, aggrmerge=0/0, aggrticks=153/56543654, aggrin_queue=56543807, aggrutil=100.00%
> sdf: ios=14/534382, merge=0/0, ticks=306/113087308, in_queue=113087615, util=100.00%
> sdad: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
>
> Throughput Results:
> WRITE:247:1884:0
>
>
> 6.8-rc7 + revert: 2 block devices with multi-path
> -------
>
> job: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=64 ...
> job: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=64 ...
> fio-3.19
> Starting 48 processes
>
> job: (groupid=0, jobs=48): err= 0: pid=6104: Wed Mar 6 18:29:13 2024
> write: IOPS=2518, BW=315MiB/s (330MB/s)(185GiB/600339msec); 0 zone resets
> slat (usec): min=2, max=923472, avg=6789.22, stdev=20329.20
> clat (msec): min=4, max=6020, avg=1212.68, stdev=714.90
> lat (msec): min=4, max=6020, avg=1219.47, stdev=718.40
> clat percentiles (msec):
> | 1.00th=[ 203], 5.00th=[ 309], 10.00th=[ 384], 20.00th=[ 535],
> | 30.00th=[ 709], 40.00th=[ 911], 50.00th=[ 1133], 60.00th=[ 1334],
> | 70.00th=[ 1519], 80.00th=[ 1754], 90.00th=[ 2198], 95.00th=[ 2601],
> | 99.00th=[ 3171], 99.50th=[ 3608], 99.90th=[ 4329], 99.95th=[ 4597],
> | 99.99th=[ 5134]
> bw ( KiB/s): min=12237, max=1834896, per=100.00%, avg=413187.52, stdev=6322.04, samples=44948
> iops : min= 48, max=14314, avg=3186.68, stdev=49.49, samples=44948
> lat (msec) : 10=0.01%, 20=0.01%, 50=0.09%, 100=0.02%, 250=2.28%
> lat (msec) : 500=15.45%, 750=14.26%, 1000=11.83%, 2000=42.52%, >=2000=13.55%
> fsync/fdatasync/sync_file_range:
> sync (nsec): min=10, max=76066, avg=57.85, stdev=299.52
> sync percentiles (nsec):
> | 1.00th=[ 13], 5.00th=[ 14], 10.00th=[ 15], 20.00th=[ 16],
> | 30.00th=[ 17], 40.00th=[ 20], 50.00th=[ 28], 60.00th=[ 47],
> | 70.00th=[ 65], 80.00th=[ 80], 90.00th=[ 103], 95.00th=[ 175],
> | 99.00th=[ 237], 99.50th=[ 241], 99.90th=[ 262], 99.95th=[ 318],
> | 99.99th=[16512]
> cpu : usr=0.06%, sys=0.07%, ctx=531434, majf=0, minf=728
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=199.6%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
> issued rwts: total=0,1511918,0,1508894 short=0,0,0,0 dropped=0,0,0,0
> latency : target=0, window=0, percentile=100.00%, depth=64
>
> Run status group 0 (all jobs):
> WRITE: bw=315MiB/s (330MB/s), 315MiB/s-315MiB/s (330MB/s-330MB/s),
> io=185GiB (198GB), run=600339-600339msec
>
> Disk stats (read/write):
> dm-0: ios=0/246318, merge=0/493981, ticks=0/142584585, in_queue=142584586, util=99.17%, aggrios=6/181454, aggrmerge=0/0, aggrticks=112/70608689, aggrin_queue=70608801, aggrutil=84.92%
> sdac: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
> sde: ios=12/362908, merge=0/0, ticks=224/141217379, in_queue=141217603, util=84.92%
> dm-1: ios=0/233211, merge=0/538097, ticks=0/142579042, in_queue=142579043, util=99.15%, aggrios=8/174475, aggrmerge=0/0, aggrticks=128/70654686, aggrin_queue=70654814, aggrutil=85.20%
> sdf: ios=16/348951, merge=0/0, ticks=256/141309372, in_queue=141309628, util=85.20%
> sdad: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
>
> Throughput Results:
> WRITE:330:2518:0
>
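The two runs can be sanity-checked with a little arithmetic. This sketch assumes that the issued write count divided by the runtime approximates fio's reported IOPS, and applies Little's law (average in-flight I/Os = IOPS × average total latency, using fio's `lat` average). The helper name `check` is illustrative only.

```python
# Figures copied from the two fio runs quoted above:
# (issued write count, runtime in seconds, average total latency in ms).
RUNS = {
    "6.8-rc7": (1131018, 600.319, 1628.00),
    "6.8-rc7 + revert": (1511918, 600.339, 1219.47),
}
# 2 job sections x numjobs=24 processes x iodepth=64.
MAX_IN_FLIGHT = 2 * 24 * 64


def check(writes: int, runtime_s: float, avg_lat_ms: float) -> tuple[float, float]:
    """Return (approximate IOPS, average in-flight I/Os via Little's law)."""
    iops = writes / runtime_s
    in_flight = iops * (avg_lat_ms / 1000.0)  # L = lambda * W
    return iops, in_flight


for name, args in RUNS.items():
    iops, in_flight = check(*args)
    print(f"{name}: ~{iops:.0f} IOPS, ~{in_flight:.0f} of {MAX_IN_FLIGHT} I/Os in flight")
```

In both runs the estimate lands within a fraction of a percent of 3072, which suggests the queues stay saturated in both cases; with a fixed number of outstanding I/Os, the higher average latency on 6.8-rc7 is then largely the mirror image of its lower throughput.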
> (C) Performance difference
>
> That is roughly a 33.65% performance difference (1884 vs. 2518 IOPS), and it is also reproducible with a larger number of block devices.
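The 33.65% figure is the IOPS gain of the revert relative to 6.8-rc7. A minimal sketch of that arithmetic, using the numbers from the two "Throughput Results" lines (the helper name `pct_change` is just illustrative):

```python
def pct_change(before: float, after: float) -> float:
    """Percentage change when going from `before` to `after`."""
    return (after - before) / before * 100.0


# IOPS and MB/s from the two "Throughput Results" lines above.
print(f"IOPS gain with revert: {pct_change(1884, 2518):.2f}%")  # 33.65%
print(f"MB/s gain with revert: {pct_change(247, 330):.2f}%")    # 33.60%
print(f"6.8-rc7 vs. revert:    {pct_change(2518, 1884):.2f}%")  # -25.18%
```

Measured the other way, relative to the reverted kernel, 6.8-rc7 delivers about 25% fewer IOPS; both views describe the same gap.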
>
>
>
> Thanks to Paul Webb for identifying this regression and sharing the details.
> We would be happy to test any patches to measure their effect on performance, and to follow up on any suggestions.
>
>
> Thanks,
> Harshit