2020-04-24 15:07:42

by Konstantin Kharlamov

Subject: Changing a workload results in performance drop

* SSDs are used for testing, so random access is not a concern. I also tried the
"steps to reproduce" against the raw block device, and there IOPS consistently holds
at about 9k for me (a sketch of that invocation follows this list).
* "Direct" IO is used to bypass file-system cache.
* The issue is way less visible on XFS, so it looks specific to file systems.
* The biggest difference I've seen is with a 70% read / 30% write workload, but for
simplicity the "steps to reproduce" below use 100% writes.
* Performance seems to improve over time (perhaps after a day), so for best results
re-create the ext4 file system from scratch before testing.
* in "steps to reproduce" I grep fio stdout. That suppresses interactive
output. Interactive output may be interesting though, I've often seen workload
drops to 600-700 IOPS while average was 5-6k
* The original problem I was working on: https://github.com/openzfs/zfs/issues/10231
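
A minimal sketch of the raw-block-device run mentioned in the first bullet (not the
exact job file I used; it assumes the same /dev/sdw1 device as in the steps below,
the device must not be mounted, and the run destroys any data on it):

$ cat fio_rawdev_jobfile
[job-section]
name=temp-fio-raw
bs=8k
ioengine=libaio
rw=randrw
rwmixread=0
rwmixwrite=100
filename=/dev/sdw1
iodepth=1
numjobs=1
group_reporting
time_based
runtime=1m
direct=1
$ fio fio_rawdev_jobfile | grep -i IOPS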

# Steps to reproduce (in terms of terminal commands)

$ cat fio_jobfile
[job-section]
name=temp-fio
bs=8k
ioengine=libaio
rw=randrw
rwmixread=0
rwmixwrite=100
filename=/mnt/test/file1
iodepth=1
numjobs=1
group_reporting
time_based
runtime=1m
direct=1
filesize=4G
$ mkfs.ext4 /dev/sdw1
$ mount /dev/sdw1 /mnt/test
$ truncate -s 100G /mnt/test/file1
$ fio fio_jobfile | grep -i IOPS
write: IOPS=12.5k, BW=97.0MiB/s (103MB/s)(5879MiB/60001msec)
iops : min=10966, max=14730, avg=12524.20, stdev=1240.27, samples=119
$ sed -i 's/4G/100G/' fio_jobfile
$ fio fio_jobfile | grep -i IOPS
write: IOPS=5880, BW=45.9MiB/s (48.2MB/s)(2756MiB/60001msec)
iops : min= 4084, max= 6976, avg=5879.31, stdev=567.58, samples=119
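
To capture the transient drops mentioned above without watching the interactive
output, fio can also write a per-interval IOPS log. A sketch, assuming the job file
from above (the temp-fio prefix is arbitrary; with log_avg_msec=1000 each log line
is a one-second average, which should make the 600-700 IOPS dips visible):

$ cat >> fio_jobfile <<EOF
write_iops_log=temp-fio
log_avg_msec=1000
EOF
$ fio fio_jobfile > /dev/null
$ head temp-fio_iops*.log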
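
For the XFS comparison mentioned above, the same device can be reformatted and the
two fio runs (filesize=4G and filesize=100G) repeated unchanged. A sketch:

$ umount /mnt/test
$ mkfs.xfs -f /dev/sdw1
$ mount /dev/sdw1 /mnt/test
$ truncate -s 100G /mnt/test/file1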

## Expected

Performance should be more or less the same in both runs.

## Actual

The second run is roughly twice as slow (~5.9k IOPS vs. ~12.5k IOPS).

# Versions

* Kernel version: 5.6.2-050602-generic

It seems, however, that the problem is present at least in 4.19 and 5.4 as well, so it is not a recent regression.