Following this email are the results of a number of tests of various I/O
schedulers:
- Anticipatory Scheduler (AS) (from 2.5.61-mm1 approx)
- CFQ (as in 2.5.61-mm1)
- 2.5.61+hacks (Basically 2.5.61 plus everything before the anticipatory
scheduler - tweaks which fix the writes-starve-reads problem via a
scheduling storm)
- 2.4.21-pre4
All these tests are simple things from the command line.
I stayed away from the standard benchmarks because they do not really touch
on areas where the Linux I/O scheduler has traditionally been bad. (If they
did, perhaps it wouldn't have been so bad..)
Plus, all the I/O schedulers perform similarly with the usual benchmarks,
with the exception of some tiobench phases, where AS does very well.
Executive summary: the anticipatory scheduler is wiping the others off the
map, and 2.4 is a disaster.
I really have not sought to make the AS look good - I mainly concentrated on
things which we have traditionally been bad at. If anyone wants to suggest
other tests, please let me know.
The known regressions from the anticipatory scheduler are:
1) 15% (ish) slowdown in David Mansfield's database run. This appeared to
go away in later versions of the scheduler.
2) 5% dropoff in single-threaded qsbench swapstorms
3) 30% dropoff in write bandwidth when there is a streaming read (this is
actually good).
The test machine is a fast P4-HT with 256MB of memory. Testing was against a
single fast IDE disk, using ext2.
Here we start a large streaming read on the test box and see how long it
then takes to pop up an X client running on that box with
time ssh testbox xterm -e true
2.4.21-4: 45 seconds
2.5.61+hacks: 5 seconds
2.5.61+CFQ: 8 seconds
2.5.61+AS: 9 seconds
It peeves me that if a machine is writing heavily, it takes *ages* to get a
login prompt.
Here we start a large streaming write, wait for that to reach steady state
and then see how long it takes to pop up an xterm from the machine under
test with
time ssh testbox xterm -e true
There is quite a lot of variability here.
2.4.21-4: 62 seconds
2.5.61+hacks: 14 seconds
2.5.61+CFQ: 11 seconds
2.5.61+AS: 12 seconds
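The streaming I/O itself isn't shown for these two xterm tests; presumably it
is the same sort of loop used in the later tests, i.e. something like the
following (an assumption, not commands taken from the original report):
# background streaming read
while true
do
cat 512M-file > /dev/null
done
# background streaming write
while true
do
dd if=/dev/zero of=foo bs=1M count=512 conv=notrunc
done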
Here we see how well the scheduler can cope with multiple processes reading
multiple large files. We read ten well laid out 100 megabyte files in
parallel (ten readers):
for i in $(seq 0 9)
do
time cat 100-meg-file-$i > /dev/null &
done
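The ten input files need to exist before the readers start; one way to
prepare them (an assumption about the setup - writing them one at a time
keeps each file well laid out, as the test requires) is:
# create ten 100MB files, one at a time
for i in $(seq 0 9)
do
dd if=/dev/zero of=100-meg-file-$i bs=1M count=100
done
sync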
2.4.21-pre4:
0.00s user 0.18s system 2% cpu 6.115 total
0.02s user 0.22s system 1% cpu 14.312 total
0.01s user 0.19s system 1% cpu 14.812 total
0.00s user 0.14s system 0% cpu 20.462 total
0.02s user 0.19s system 0% cpu 23.887 total
0.06s user 0.14s system 0% cpu 27.085 total
0.01s user 0.26s system 0% cpu 32.367 total
0.00s user 0.22s system 0% cpu 34.844 total
0.01s user 0.21s system 0% cpu 35.233 total
0.01s user 0.16s system 0% cpu 37.007 total
2.5.61+hacks:
0.01s user 0.16s system 0% cpu 2:12.00 total
0.01s user 0.15s system 0% cpu 2:12.12 total
0.00s user 0.14s system 0% cpu 2:12.34 total
0.01s user 0.15s system 0% cpu 2:12.68 total
0.00s user 0.15s system 0% cpu 2:12.93 total
0.01s user 0.17s system 0% cpu 2:13.06 total
0.01s user 0.14s system 0% cpu 2:13.18 total
0.01s user 0.17s system 0% cpu 2:13.31 total
0.01s user 0.16s system 0% cpu 2:13.49 total
0.01s user 0.19s system 0% cpu 2:13.51 total
2.5.61+CFQ:
0.01s user 0.16s system 0% cpu 50.778 total
0.01s user 0.16s system 0% cpu 51.067 total
0.01s user 0.16s system 0% cpu 52.854 total
0.01s user 0.17s system 0% cpu 53.303 total
0.01s user 0.17s system 0% cpu 54.565 total
0.01s user 0.18s system 0% cpu 1:07.39 total
0.01s user 0.17s system 0% cpu 1:19.96 total
0.00s user 0.17s system 0% cpu 1:28.74 total
0.01s user 0.18s system 0% cpu 1:31.28 total
0.01s user 0.18s system 0% cpu 1:32.34 total
2.5.61+AS
0.01s user 0.17s system 0% cpu 27.995 total
0.01s user 0.18s system 0% cpu 30.550 total
0.00s user 0.17s system 0% cpu 31.413 total
0.00s user 0.18s system 0% cpu 32.381 total
0.01s user 0.17s system 0% cpu 33.273 total
0.01s user 0.18s system 0% cpu 33.389 total
0.01s user 0.15s system 0% cpu 34.534 total
0.01s user 0.17s system 0% cpu 34.481 total
0.00s user 0.17s system 0% cpu 34.694 total
0.01s user 0.16s system 0% cpu 34.832 total
AS and 2.4 almost achieved full disk bandwidth. 2.4 does quite well here,
although it was unfair.
As an aside, I reran this test with the VM readahead wound down from the
usual 128k to just 8k:
2.5.61+CFQ:
0.01s user 0.25s system 0% cpu 7:48.39 total
0.01s user 0.23s system 0% cpu 7:48.72 total
0.02s user 0.26s system 0% cpu 7:48.93 total
0.02s user 0.25s system 0% cpu 7:48.93 total
0.01s user 0.26s system 0% cpu 7:49.08 total
0.02s user 0.25s system 0% cpu 7:49.22 total
0.02s user 0.26s system 0% cpu 7:49.25 total
0.02s user 0.25s system 0% cpu 7:50.35 total
0.02s user 0.26s system 0% cpu 8:19.82 total
0.02s user 0.28s system 0% cpu 8:19.83 total
2.5.61 base:
0.01s user 0.25s system 0% cpu 8:10.53 total
0.01s user 0.27s system 0% cpu 8:11.96 total
0.02s user 0.26s system 0% cpu 8:14.95 total
0.02s user 0.26s system 0% cpu 8:17.33 total
0.02s user 0.25s system 0% cpu 8:18.05 total
0.01s user 0.24s system 0% cpu 8:19.03 total
0.02s user 0.27s system 0% cpu 8:19.66 total
0.02s user 0.25s system 0% cpu 8:20.00 total
0.02s user 0.26s system 0% cpu 8:20.10 total
0.02s user 0.25s system 0% cpu 8:20.11 total
2.5.61+AS
0.02s user 0.23s system 0% cpu 28.640 total
0.01s user 0.23s system 0% cpu 28.066 total
0.02s user 0.23s system 0% cpu 28.525 total
0.01s user 0.20s system 0% cpu 28.925 total
0.01s user 0.22s system 0% cpu 28.835 total
0.02s user 0.21s system 0% cpu 29.014 total
0.02s user 0.23s system 0% cpu 29.093 total
0.01s user 0.20s system 0% cpu 29.175 total
0.01s user 0.23s system 0% cpu 29.233 total
0.01s user 0.21s system 0% cpu 29.285 total
We see here that the anticipatory scheduler is not dependent upon large
readahead to get good performance.
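The post doesn't say how the readahead was wound down; one userspace way to
do it on the test disk (an assumption - the device name /dev/hda is made up,
and the change may equally have been made inside the kernel) is blockdev,
which takes the value in 512-byte sectors:
# drop readahead from 128k (256 sectors) to 8k (16 sectors)
blockdev --setra 16 /dev/hda
blockdev --getra /dev/hda    # verify: should print 16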
Here we take a look at the impact which a streaming write has upon streaming
read bandwidth.
A single streaming write was set up with:
while true
do
dd if=/dev/zero of=foo bs=1M count=512 conv=notrunc
done
and we measure how long it takes to read a 100 megabyte file from the same
filesystem with
time cat 100m-file > /dev/null
I'll include `vmstat 1' snippets here as well.
2.4.21-pre4: 42 seconds
1 3 276 4384 2144 222300 0 0 80 26480 520 743 0 6 94 0
0 3 276 4344 2144 222240 0 0 76 25224 512 492 0 4 96 0
0 3 276 4340 2148 222220 0 0 124 25584 520 536 0 3 97 0
0 3 276 4404 2152 222132 0 0 44 26604 538 533 0 5 95 0
0 4 276 4464 2160 221928 0 0 60 25040 516 559 0 4 96 0
0 4 276 4460 2160 221900 0 0 612 27456 560 621 0 4 96 0
0 4 276 4392 2156 221972 0 0 708 23872 488 566 0 4 95 0
0 4 276 4420 2168 221852 0 0 688 26668 545 653 0 4 96 0
0 4 276 4204 2164 221912 0 0 696 21588 492 884 0 5 95 0
0 4 276 4448 2164 221668 0 0 396 21376 423 833 0 4 96 0
0 4 276 4432 2160 221688 0 0 784 26368 544 705 0 4 96 0
0 4 276 4400 2168 221608 0 0 560 27640 563 596 0 5 95 0
4 1 276 4324 2188 221616 0 0 12476 12996 538 908 0 4 96 0
0 4 276 3516 2196 222408 0 0 12320 16048 529 971 0 2 98 0
0 4 276 3468 2212 222424 0 0 12704 14428 540 1039 0 4 96 0
0 4 276 4112 2208 221700 0 0 552 20824 474 539 0 4 96 0
3 2 276 3768 2208 222040 0 0 524 25428 503 612 0 3 97 0
0 4 276 4452 2216 221344 0 0 536 19548 437 1241 0 3 97 0
2.5.61+hacks: 48 seconds
0 5 0 2140 1296 227700 0 0 0 22236 1213 126 0 4 0 96
0 5 0 2252 1296 227664 0 0 0 23340 1219 123 0 3 0 97
0 6 0 4044 1288 225904 0 0 1844 13632 1183 236 0 2 0 98
0 6 0 4100 1268 225788 0 0 1920 13780 1173 217 0 2 0 98
0 6 0 4156 1248 225908 0 0 2184 14828 1184 236 0 3 0 97
0 6 0 4100 1244 226012 0 0 2176 13720 1173 237 0 2 0 98
0 6 0 4212 1240 225980 0 0 1924 13900 1175 236 0 2 0 98
0 5 0 5444 1192 224824 0 0 2304 11820 1164 206 0 2 0 98
0 6 0 2196 1180 228088 0 0 2308 14460 1180 269 0 3 0 97
2.5.61+CFQ: 27 seconds
1 3 0 6196 2060 222852 0 0 0 23840 1247 220 0 4 4 92
0 2 0 4404 1820 224880 0 0 0 22208 1237 271 0 3 8 89
2 4 0 2884 1680 226588 0 0 1496 26944 1263 355 0 4 2 94
0 4 0 4332 1312 225388 0 0 4592 14692 1244 414 0 3 0 97
0 4 0 4268 1012 225764 0 0 1408 29540 1308 671 0 5 0 95
0 4 0 3316 1016 226752 0 0 2820 27500 1306 668 0 5 0 95
0 4 0 4212 992 225924 0 0 3076 22148 1255 508 0 3 0 97
2.5.61+AS: 3.8 seconds
0 4 0 2236 1320 227548 0 0 0 36684 1335 136 0 5 0 95
0 4 0 2236 1296 227636 0 0 0 37736 1334 134 0 5 0 95
0 5 0 3348 1088 226604 0 0 1232 30040 1320 174 0 4 0 96
0 5 0 2284 1056 227920 0 0 29088 5488 1536 855 0 4 0 96
0 5 0 4916 1080 225672 0 0 26904 8452 1517 993 0 5 0 95
0 5 120 2228 1108 228732 0 120 29472 6752 1545 940 0 3 1 96
0 4 120 4196 1060 226984 0 0 16164 15740 1426 627 0 3 3 93
This test simply measures how long it takes to copy a large number of files
within the same filesystem. It creates a lot of small, competing read and
write I/Os. Changes which were made to the VFS dirty memory handling early
in the 2.5 cycle tend to make 2.5 a bit slower at this.
Three copies of the 2.4.19 kernel tree were placed on an ext2 filesystem.
Measure the time it takes to copy them all to the same filesystem, and to
then sync the system. This is just
cp -a ./dir-with-three-kernel-trees/ ./new-dir
sync
The anticipatory scheduler doesn't help here. It could, but we haven't got
there yet, and it may need VFS help.
2.4.21-pre4: 70 seconds
2.5.61+hacks: 72 seconds
2.5.61+CFQ: 69 seconds
2.5.61+AS: 66 seconds
Here we look at what effect a large streaming write has upon an operation
which reads many small files from the same disk.
A single streaming write was set up with:
while true
do
dd if=/dev/zero of=foo bs=1M count=512 conv=notrunc
done
and we measure how long it takes to read all the files from a 2.4.19 kernel
tree off the same disk with
time (find kernel-tree -type f | xargs cat > /dev/null)
As a reference, the time to read the kernel tree with no competing I/O is 7.9
seconds.
2.4.21-pre4:
Don't know. I killed it after 15 minutes. Judging from the vmstat
output it would have taken many hours.
2.5.61+hacks: 7 minutes 27 seconds
r b swpd free buff cache si so bi bo in cs us sy id wa
0 8 0 2188 1200 226692 0 0 852 17664 1204 253 0 3 0 97
0 8 0 4148 1212 224804 0 0 1940 16208 1187 245 0 2 0 98
0 7 0 4260 1128 224756 0 0 324 20228 1226 298 0 3 0 97
0 8 0 4204 1048 224944 0 0 500 20856 1227 313 0 3 0 97
1 7 0 2300 1040 226840 0 0 348 20272 1227 313 0 3 0 97
0 8 0 4204 1044 224952 0 0 212 21564 1230 320 0 3 0 97
2.5.61+CFQ: 9 minutes 55 seconds
r b swpd free buff cache si so bi bo in cs us sy id wa
1 2 0 4308 1028 224660 0 0 180 38368 1250 357 0 3 6 91
0 4 0 2180 1020 226852 0 0 324 25196 1266 408 0 4 1 95
0 4 0 2236 1016 226744 0 0 252 26948 1276 449 0 4 2 93
0 4 0 4196 1020 224816 0 0 380 23204 1250 454 0 3 4 93
0 3 0 4356 1036 224632 0 0 2616 25824 1271 490 0 4 0 96
0 4 0 4140 968 224996 0 0 496 29416 1304 609 0 4 0 96
0 4 0 2180 948 226972 0 0 352 29364 1300 688 0 5 0 95
0 3 0 4364 928 224796 0 0 344 22100 1281 656 0 4 22 74
(CFQ had a strange 20-second pause in which it performed no reads at all,
then a later 4-second one, and then a 10-second one.)
2.5.61+AS: 17 seconds
r b swpd free buff cache si so bi bo in cs us sy id wa
0 6 0 2280 2716 226112 0 0 0 22388 1205 151 0 3 0 97
0 6 0 4296 2596 224168 0 0 0 21968 1213 148 0 3 0 97
1 6 0 3872 2516 224408 0 0 296 19552 1223 249 0 3 0 97
0 9 0 2176 2584 225324 0 0 5112 14588 1573 1424 0 5 0 94
0 8 0 3364 2668 223116 0 0 17512 8500 3059 6065 0 8 0 92
1 8 0 4156 2708 221340 0 0 12812 9560 2695 4863 0 9 0 91
0 8 0 3740 2956 221188 0 0 17216 7200 2406 4045 0 6 0 94
0 9 0 3828 2668 221192 0 0 9712 8972 1615 1540 0 5 0 94
1 6 0 2060 2924 222272 0 0 8428 17784 1713 1718 0 5 0 95
This test is very approximately the "busy web server" workload. We set up a
number of processes, each of which is reading many small files from different
parts of the disk.
Set up six separate copies of the 2.4.19 kernel tree, and then run, in
parallel, six processes which are reading them:
for i in 1 2 3 4 5 6
do
time (find kernel-tree-$i -type f | xargs cat > /dev/null ) &
done
With this test we have six read requests in the queue all the time. It's
what the anticipatory scheduler was designed for.
2.4.21-pre4:
6m57.537s
6m57.620s
6m57.741s
6m57.891s
6m57.909s
6m57.916s
2.5.61+hacks:
3m40.188s
3m51.332s
3m55.110s
3m56.186s
3m56.757s
3m56.791s
2.5.61+CFQ:
5m15.932s
5m16.219s
5m16.386s
5m17.407s
5m50.233s
5m50.602s
2.5.61+AS:
0m44.573s
0m45.119s
0m46.559s
0m49.202s
0m51.884s
0m53.087s
This was a little unfair to 2.4 because three of the trees were laid out by
the pre-Orlov ext2. So I reran the test with 2.4.21-pre4 when all six trees
were laid out by 2.5's Orlov allocator:
6m12.767s
6m12.974s
6m13.001s
6m13.045s
6m13.062s
6m13.085s
Not much difference there, although Orlov is worth a 4x speedup in this test
when there is only a single reader (or multiple readers + anticipatory
scheduler).
Here we look at what effect a large streaming read has upon an operation
which reads many small files from the same disk.
A single streaming read was set up with:
while true
do
cat 512M-file > /dev/null
done
and we measure how long it takes to read all the files from a 2.4.19 kernel
tree off the same disk with
time (find kernel-tree -type f | xargs cat > /dev/null)
2.4.21-pre4: 31 minutes 30 seconds
2.5.61+hacks: 3 minutes 39 seconds
2.5.61+CFQ: 5 minutes 7 seconds (*)
2.5.61+AS: 17 seconds
* CFQ performed very strangely here. Tremendous amount of seeking and a
big drop in aggregate bandwidth. See the vmstat 1 output from when the
kernel tree read started up:
r b swpd free buff cache si so bi bo in cs us sy id wa
0 1 1240 125260 1176 109488 0 0 40744 0 1672 725 0 3 49 47
0 1 1240 85892 1220 148788 0 0 39344 0 1651 693 0 3 49 48
0 1 1240 45124 1260 189492 0 0 40744 0 1663 683 0 3 49 47
1 1 1240 4544 1300 230068 0 0 40616 0 1661 837 0 4 49 47
0 2 1348 3468 944 231696 0 108 40488 148 1671 800 0 4 4 91
0 2 1348 2180 936 232920 0 0 40612 64 1668 789 0 4 0 96
0 3 1348 4220 996 230648 0 0 11348 0 1256 352 0 2 0 98
0 3 1348 4052 1064 230472 0 0 9012 0 1207 305 0 1 0 98
0 4 1348 3596 1148 230580 0 0 6756 0 1171 247 0 1 0 99
0 4 1348 4044 1148 229888 0 0 6344 0 1165 237 0 1 0 99
1 3 1348 3708 1160 230212 0 0 7800 0 1187 255 0 1 21 78
Here we look at how much damage a streaming read can do to writeout
performance. Start a streaming read with:
while true
do
cat 512M-file > /dev/null
done
and measure how long it takes to write out and fsync a 100 megabyte file:
time write-and-fsync -f -m 100 outfile
2.4.21-pre4: 6.4 seconds
2.5.61+hacks: 7.7 seconds
2.5.61+CFQ: 8.4 seconds
2.5.61+AS: 11.9 seconds
This is the one where the anticipatory scheduler could show its downside.
It's actually not too bad - the read stream steals 2/3rds of the disk
bandwidth. Dirty memory will reach the VM threshold and writers will
throttle. This is usually what we want to happen.
Here is the vmstat 1 trace for the anticipatory scheduler:
r b swpd free buff cache si so bi bo in cs us sy id wa
1 1 8728 2268 2620 233412 0 0 40360 0 1658 802 0 4 0 96
0 2 8728 3780 2508 231924 0 0 40616 4 1668 874 0 5 0 95
0 2 8728 3668 2276 232416 0 0 40740 20 1668 978 0 4 0 96
0 3 8728 3660 2192 232668 40 0 35296 12 1603 904 0 4 0 95
0 5 8728 3612 1964 231672 0 0 26220 18572 1497 1381 0 15 0 85
0 5 8728 2100 1732 233584 0 0 25232 8696 1497 867 0 3 16 81
0 5 8728 3664 1204 232424 0 0 27668 8696 1533 787 0 3 0 97
1 4 8728 2432 792 234108 0 0 27160 8696 1527 965 0 3 0 97
0 6 8728 2208 760 234436 0 0 25904 9584 1513 856 0 3 0 97
2 6 8728 3776 760 233148 0 0 27776 8716 1537 880 0 3 0 97
0 6 8728 2204 624 234968 0 0 27924 8812 1541 991 0 4 0 96
0 4 8716 2508 600 234740 0 0 28188 8216 1537 1038 0 4 0 96
0 4 8716 4072 532 233316 0 16 25624 9644 1515 896 0 3 0 97
0 4 8716 3740 548 233624 0 0 27548 8696 1528 908 0 3 0 97
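write-and-fsync is a small helper from Andrew's test tools; if it isn't to
hand, a rough stand-in for the 100 megabyte write-plus-fsync timing (a sketch
which assumes a dd supporting the GNU conv=fsync flag; it is not the binary
used above) is:
# write 100MB, fsync it before dd exits, and time the whole thing
time dd if=/dev/zero of=outfile bs=1M count=100 conv=fsync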
One other useful test would be the time to copy a large (multi-gig) file.
Currently this takes forever and uses very little of the disk bandwidth; I
suspect that the AS would give more preference to reads and therefore would
go faster.
For a real-world example, mozilla downloads files to a temp directory and
then copies them to the permanent location. When I download a video from my
TiVo it takes ~20 min to download a 1G video, during which time the system
is perfectly responsive; then after the download completes, when mozilla
copies it to the real destination (on a separate disk so it is a copy, not
just a move), the system becomes completely unresponsive to anything
requiring disk I/O for several minutes.
David Lang
David Lang <[email protected]> wrote:
>
> One other useful test would be the time to copy a large (multi-gig) file.
> Currently this takes forever and uses very little of the disk bandwidth; I
> suspect that the AS would give more preference to reads and therefore would
> go faster.
Yes, that's a test.
time (cp 1-gig-file foo ; sync)
2.5.62-mm2,AS: 1:22.36
2.5.62-mm2,CFQ: 1:25.54
2.5.62-mm2,deadline: 1:11.03
2.4.21-pre4: 1:07.69
Well gee.
> For a real-world example, mozilla downloads files to a temp directory and
> then copies them to the permanent location. When I download a video from my
> TiVo it takes ~20 min to download a 1G video, during which time the system
> is perfectly responsive; then after the download completes, when mozilla
> copies it to the real destination (on a separate disk so it is a copy, not
> just a move), the system becomes completely unresponsive to anything
> requiring disk I/O for several minutes.
Well, 2.4 is unresponsive, period. That's due to problems in the VM - processes
which are trying to allocate memory get continually DoS'ed by `cp' in page
reclaim.
For the reads-starved-by-writes problem which you describe, you'll see that
quite a few of the tests did cover that. contest does as well.
On Fri, Feb 21, 2003 at 12:16:24AM -0800, Andrew Morton wrote:
> Yes, that's a test.
>
> time (cp 1-gig-file foo ; sync)
>
> 2.5.62-mm2,AS: 1:22.36
> 2.5.62-mm2,CFQ: 1:25.54
> 2.5.62-mm2,deadline: 1:11.03
> 2.4.21-pre4: 1:07.69
>
> Well gee.
It's pointless to benchmark CFQ in a workload like that IMHO. If you
read and write to the same hard disk you want lots of unfairness to go
faster. Your latency is a mixture of reads and writes, and the writes
are likely run by the kernel, so CFQ will likely generate more seeks (it
also depends on whether you have the magic for the current->mm == NULL case).
You should run something along these lines to measure the difference:
dd if=/dev/zero of=readme bs=1M count=2000
sync
cp /dev/zero . & time cp readme /dev/null
And the best CFQ benchmark really is to run the tiobench read test with a
single thread during the `cp /dev/zero .`. That will measure the worst
case latency that `read` provided during the benchmark, and it should
make the most difference because that is definitely the only thing one
can care about if you need CFQ or SFQ. You don't care that much about
throughput if you enable CFQ, so it's not even correct to benchmark in
terms of real time; only the worst case `read` latency matters.
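For anyone without tiobench to hand, a crude way to eyeball the same thing
from the shell (a sketch, not what was actually run - tiobench reports the
worst case latency directly; the loop assumes bash for $RANDOM) is to time
small scattered reads of the 2000MB file while the write flood runs:
# start the write flood (runs until the disk fills or it is killed)
cp /dev/zero zero-flood &
flood_pid=$!
# sample read latency: one 128k chunk from a pseudo-random offset, once a second
for i in $(seq 1 60)
do
time dd if=readme of=/dev/null bs=128k count=1 skip=$((RANDOM % 16000))
sleep 1
done
kill $flood_pid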
> > For a real-world example, mozilla downloads files to a temp directory and
> > then copies them to the permanent location. When I download a video from my
> > TiVo it takes ~20 min to download a 1G video, during which time the system
> > is perfectly responsive; then after the download completes, when mozilla
> > copies it to the real destination (on a separate disk so it is a copy, not
> > just a move), the system becomes completely unresponsive to anything
> > requiring disk I/O for several minutes.
>
> Well, 2.4 is unresponsive, period. That's due to problems in the VM - processes
> which are trying to allocate memory get continually DoS'ed by `cp' in page
> reclaim.
This depends on the workload; you may not have that many allocations,
and an echo 1 >/proc/sys/vm/bdflush will fix it should your workload be hurt
by too much dirty cache. Furthermore, elevator-lowlatency makes
the blkdev layer much more fair under load.
Andrea
On Thu, Feb 20, 2003 at 09:27:58PM -0800, Andrew Morton wrote:
>
> Here we look at what effect a large streaming read has upon an operation
> which reads many small files from the same disk.
>
> A single streaming read was set up with:
>
> while true
> do
> cat 512M-file > /dev/null
> done
>
> and we measure how long it takes to read all the files from a 2.4.19 kernel
> tree off the same disk with
>
> time (find kernel-tree -type f | xargs cat > /dev/null)
>
>
>
> 2.4.21-pre4: 31 minutes 30 seconds
>
> 2.5.61+hacks: 3 minutes 39 seconds
>
> 2.5.61+CFQ: 5 minutes 7 seconds (*)
>
> 2.5.61+AS: 17 seconds
>
>
>
>
>
> * CFQ performed very strangely here. Tremendous amount of seeking and a
Strangely? This is the *feature*. Benchmarking CFQ in terms of real
time is pointless; apparently you don't understand the whole point of
CFQ, and you keep benchmarking as if CFQ was designed for a database
workload. The only thing you care about if you run CFQ is the worst case
latency of reads, never the throughput; 128k/sec is more than enough as
long as you never wait 2 seconds before you can get the next 128k.
Take tiobench with a single thread in read mode, keep it running in the
background, and collect the worst case latency; only *then* will you have
a chance to see a benefit. CFQ is anything but a general purpose elevator.
You must never use CFQ if your objective is throughput and you benchmark
the global workload rather than the worst case latency of every single read
or write-sync syscall.
CFQ is made for multimedia desktop usage only: you want to be sure
mplayer or xmms will never skip frames, not for parallel cp reading
floods of data at max speed like a database with a zillion threads. For
multimedia not to skip frames, 1M/sec is more than enough bandwidth;
it doesn't matter if the huge database in the background runs much slower as
long as you never skip a frame.
If you don't mind skipping frames you shouldn't use CFQ and everything
will run faster, period.
Andrea
Andrea Arcangeli wrote:
>On Thu, Feb 20, 2003 at 09:27:58PM -0800, Andrew Morton wrote:
>>* CFQ performed very strangely here. Tremendous amount of seeking and a
>>big drop in aggregate bandwidth.
>
>Strangely? This is the *feature*. Benchmarking CFQ in terms of real
>time is pointless; apparently you don't understand the whole point of
>CFQ, and you keep benchmarking as if CFQ was designed for a database
>workload. The only thing you care about if you run CFQ is the worst case
>latency of reads, never the throughput; 128k/sec is more than enough as
>long as you never wait 2 seconds before you can get the next 128k.
>
>If you don't mind skipping frames you shouldn't use CFQ and everything
>will run faster, period.
>
There is actually a point, when you have a number of other I/O streams
going on, where your decreased throughput means *maximum* latency goes
up because the round robin doesn't go round fast enough. I guess desktop
loads won't often have a lot of different I/O streams.
The anticipatory scheduler isn't so strict about fairness; however, it
will make as good an attempt as CFQ at keeping maximum read latency
below read_expire (actually read_expire*2 in the current implementation).
On Fri, Feb 21, 2003 at 12:16:24AM -0800, Andrew Morton wrote:
>> Well, 2.4 is unresponsive, period. That's due to problems in the VM -
>> processes which are trying to allocate memory get continually DoS'ed
>> by `cp' in page reclaim.
On Fri, Feb 21, 2003 at 11:31:40AM +0100, Andrea Arcangeli wrote:
> This depends on the workload; you may not have that many allocations,
> and an echo 1 >/proc/sys/vm/bdflush will fix it should your workload be hurt
> by too much dirty cache. Furthermore, elevator-lowlatency makes
> the blkdev layer much more fair under load.
Restricting io in flight doesn't actually repair the issues raised by
it, but rather avoids them by limiting functionality.
The issue raised here is streaming io competing with processes working
within bounded memory. It's unclear to me how 2.5.x mitigates this but
the effects are far less drastic there. The "fix" you're suggesting is
clamping off the entire machine's io just to contain the working set of
a single process that generates unbounded amounts of dirty data and
inadvertently penalizes other processes via page reclaim, where instead
it should be forced to fairly wait its turn for memory.
-- wli
On Fri, Feb 21, 2003 at 02:51:46AM -0800, William Lee Irwin III wrote:
> On Fri, Feb 21, 2003 at 12:16:24AM -0800, Andrew Morton wrote:
> >> Well, 2.4 is unresponsive, period. That's due to problems in the VM -
> >> processes which are trying to allocate memory get continually DoS'ed
> >> by `cp' in page reclaim.
>
> On Fri, Feb 21, 2003 at 11:31:40AM +0100, Andrea Arcangeli wrote:
> > This depends on the workload; you may not have that many allocations,
> > and an echo 1 >/proc/sys/vm/bdflush will fix it should your workload be hurt
> > by too much dirty cache. Furthermore, elevator-lowlatency makes
> > the blkdev layer much more fair under load.
>
> Restricting io in flight doesn't actually repair the issues raised by
The amount of I/O that we allow in flight is purely random; there is no
point in allowing several dozen megabytes of I/O in flight on a 64M machine.
My patch fixes that and nothing more.
> it, but rather avoids them by limiting functionality.
If you can show a (throughput) benchmark where you see this limited
functionality I'd be very interested.
Alternatively I can also claim that 2.4 and 2.5 are limiting
functionality too by limiting the I/O in flight to some hundred megabytes,
right?
It's like the DMA ring buffer size of a soundcard: if you want low latency
it has to be small, it's as simple as that. It's a tradeoff between
latency and performance, but the point here is that apparently you gain
nothing with such a huge amount of I/O in flight. This has nothing to
do with the number of requests; the requests have to be plentiful, or seeks
won't be reordered aggressively, but when everything merges, using all
the requests is pointless and only has the effect of locking
everything in RAM, and this screws the write throttling too, because we
do write throttling on the dirty stuff, not on the locked stuff, and
this is what elevator-lowlatency addresses.
You may argue about the amount of in-flight I/O limit I chose, but really
the default in mainline looks overkill to me for generic hardware.
> The issue raised here is streaming io competing with processes working
> within bounded memory. It's unclear to me how 2.5.x mitigates this but
> the effects are far less drastic there. The "fix" you're suggesting is
> clamping off the entire machine's io just to contain the working set of
Show me this clamping off please. Take 2.4.21pre4aa3 and trash it
compared to 2.4.21pre4 with the minimum 32M queue; I'd be very
interested. If I've a problem I must fix it ASAP, but all the benchmarks
are in the green so far and the behaviour was very bad before these fixes,
so go ahead and show me red and you'll do me a big favour. Either that or
you're wrong that I'm clamping off anything.
Just to be clear, this whole thing has nothing to do with the elevator,
or CFQ or whatever; it is only related to the worthwhile amount of
in-flight I/O needed to keep the disk always running.
> a single process that generates unbounded amounts of dirty data and
> inadvertently penalizes other processes via page reclaim, where instead
> it should be forced to fairly wait its turn for memory.
>
> -- wli
Andrea
On Fri, Feb 21, 2003 at 09:55:00PM +1100, Nick Piggin wrote:
> There is actually a point when you have a number of other IO streams
> going on where your decreased throughput means *maximum* latency goes
> up because robin doesn't go round fast enough. I guess desktop loads
This is why it would be nice to set a prctl in the task structure that
marks the latency-sensitive tasks, so you could leave CFQ enabled all
the time and only xmms and mplayer would take advantage of it (unless you
run them with --skip-frame-is-ok). CFQ as a function of pid is the
simplest, closest transparent approximation of that.
Andrea
Andrea Arcangeli wrote:
>it's like a dma ring buffer size of a soundcard, if you want low latency
>it has to be small, it's as simple as that. It's a tradeoff between
>
Although the dma buffer is strictly FIFO, so the situation isn't
quite so simple for disk IO.
On Fri, Feb 21, 2003 at 02:51:46AM -0800, William Lee Irwin III wrote:
>> Restricting io in flight doesn't actually repair the issues raised by
On Fri, Feb 21, 2003 at 12:08:07PM +0100, Andrea Arcangeli wrote:
> The amount of I/O that we allow in flight is purely random; there is no
> point in allowing several dozen megabytes of I/O in flight on a 64M machine.
> My patch fixes that and nothing more.
I was arguing against having any preset limit whatsoever.
On Fri, Feb 21, 2003 at 02:51:46AM -0800, William Lee Irwin III wrote:
>> it, but rather avoids them by limiting functionality.
On Fri, Feb 21, 2003 at 12:08:07PM +0100, Andrea Arcangeli wrote:
> If you can show a (throughput) benchmark where you see this limited
> functionality I'd be very interested.
> Alternatively I can also claim that 2.4 and 2.5 are limiting
> functionality too by limiting the I/O in flight to some hundred megabytes,
> right?
This has nothing to do with benchmarks.
Counterexample: suppose the process generating dirty data is the only
one running. The machine's effective RAM capacity is then limited to
the dirty data limit plus some small constant by this io in flight
limitation.
This functionality is not to be dismissed lightly: changing the /proc/
business is root-only, hence it may not be within the power of a victim
of a poor setting to adjust it.
On Fri, Feb 21, 2003 at 12:08:07PM +0100, Andrea Arcangeli wrote:
> It's like the DMA ring buffer size of a soundcard: if you want low latency
> it has to be small, it's as simple as that. It's a tradeoff between
> latency and performance, but the point here is that apparently you gain
> nothing with such a huge amount of I/O in flight. This has nothing to
> do with the number of requests; the requests have to be plentiful, or seeks
> won't be reordered aggressively, but when everything merges, using all
> the requests is pointless and only has the effect of locking
> everything in RAM, and this screws the write throttling too, because we
> do write throttling on the dirty stuff, not on the locked stuff, and
> this is what elevator-lowlatency addresses.
> You may argue about the amount of in-flight I/O limit I chose, but really
> the default in mainline looks overkill to me for generic hardware.
It's not a question of gain but rather immunity to reconfigurations.
Redoing it for all the hardware raises a tuning issue, and in truth
all I've ever wound up doing is turning it off because I've got so
much RAM that various benchmarks could literally be done in-core as a
first pass, then sorted, then sprayed out to disk in block-order. And
a bunch of open benchmarks are basically just in-core spinlock exercise.
(Ignore the fact there was a benchmark mentioned.)
Amortizing seeks and incrementally sorting and so on generally require
large buffers, and if you have the RAM, the kernel should use it.
But more seriously, global io in flight limits are truly worthless, if
anything it should be per-process, but even that's inadequate as it
requires retuning for varying io speeds. Limit enforcement needs to be
(1) localized
(2) self-tuned via block layer feedback
If I understand the code properly, 2.5.x has (2) but not (1).
On Fri, Feb 21, 2003 at 02:51:46AM -0800, William Lee Irwin III wrote:
>> The issue raised here is streaming io competing with processes working
>> within bounded memory. It's unclear to me how 2.5.x mitigates this but
>> the effects are far less drastic there. The "fix" you're suggesting is
>> clamping off the entire machine's io just to contain the working set of
On Fri, Feb 21, 2003 at 12:08:07PM +0100, Andrea Arcangeli wrote:
> Show me this clamping off please. Take 2.4.21pre4aa3 and trash it
> compared to 2.4.21pre4 with the minimum 32M queue; I'd be very
> interested. If I've a problem I must fix it ASAP, but all the benchmarks
> are in the green so far and the behaviour was very bad before these fixes,
> so go ahead and show me red and you'll do me a big favour. Either that or
> you're wrong that I'm clamping off anything.
> Just to be clear, this whole thing has nothing to do with the elevator,
> or CFQ or whatever; it is only related to the worthwhile amount of
> in-flight I/O needed to keep the disk always running.
You named the clamping off yourself. A dozen MB on a 64MB box, 32MB on
2.4.21pre4. Some limit that's a hard upper bound but resettable via a
sysctl or /proc/ or something. Testing 2.4.x-based trees might be a
little painful since I'd have to debug why 2.4.x stopped booting on my
boxen, which would take me a bit far afield from my current hacking.
On Fri, Feb 21, 2003 at 02:51:46AM -0800, William Lee Irwin III wrote:
>> a single process that generates unbounded amounts of dirty data and
>> inadvertently penalizes other processes via page reclaim, where instead
>> it should be forced to fairly wait its turn for memory.
I believe I said something important here. =)
The reason why this _should_ be the case is because processes stealing
from each other is the kind of mutual interference that leads to things
like Mozilla taking ages to swap in because other things were running
for a while and it wasn't and so on.
-- wli
On Fri, Feb 21, 2003 at 10:17:55PM +1100, Nick Piggin wrote:
> Andrea Arcangeli wrote:
>
> >it's like a dma ring buffer size of a soundcard, if you want low latency
> >it has to be small, it's as simple as that. It's a tradeoff between
> >
> Although the dma buffer is strictly FIFO, so the situation isn't
> quite so simple for disk IO.
In general (without CFQ, or its opposite extreme, an unfair starving
elevator where you're stuck regardless of the size of the queue), a
larger queue will mean higher latencies in the presence of a flood of
async load, just like in a DMA buffer. This is obvious for the noop
elevator, for example.
I'm speaking about a stable, non-starving, fast, default elevator
(something like in 2.4 mainline, incidentally) and for that the
similarity with a DMA buffer definitely applies; there will be a latency
effect coming from the size of the queue (even ignoring the other issues
that the load of locked buffers introduces).
The whole idea of CFQ is to make some workloads run with low latency
independently of the size of the async queue. But still (even with CFQ)
you have all the other problems with write throttling, the worthless
amount of locked RAM, and even the time wasted on lots of full, merely
ordered requests in the elevator (yeah, I know that if you use the noop
elevator it won't waste much time, but again that is not what most people
will use). I don't buy Andrew complaining about the write throttling when
he still allows several dozen megabytes of RAM in flight and invisible to
the VM; I mean, before complaining about write throttling the excessive,
worthless amount of locked buffers must be fixed, and so I did, and it
works very well from the feedback I have had so far.
You can take 2.4.21pre4aa3 and benchmark it as you want if you think I'm
totally wrong; the elevator-lowlatency patch should be trivial to apply and
back out (benchmarking against pre4 would be unfair).
Andrea
On Fri, Feb 21, 2003 at 03:34:36AM -0800, William Lee Irwin III wrote:
> On Fri, Feb 21, 2003 at 02:51:46AM -0800, William Lee Irwin III wrote:
> >> Restricting io in flight doesn't actually repair the issues raised by
>
> On Fri, Feb 21, 2003 at 12:08:07PM +0100, Andrea Arcangeli wrote:
> > The amount of I/O that we allow in flight is purely random; there is no
> > point in allowing several dozen megabytes of I/O in flight on a 64M machine.
> > My patch fixes that and nothing more.
>
> I was arguing against having any preset limit whatsoever.
The preset limit exists in every Linux kernel out there. It should be
mandated by the low-level device driver; I don't allow that yet, but it
should be trivial to extend with just an additional per-queue int. It's
just an implementation matter.
> On Fri, Feb 21, 2003 at 02:51:46AM -0800, William Lee Irwin III wrote:
> >> it, but rather avoids them by limiting functionality.
>
> On Fri, Feb 21, 2003 at 12:08:07PM +0100, Andrea Arcangeli wrote:
> > If you can show a (throughput) benchmark where you see this limited
> > functionality I'd be very interested.
> > Alternatively I can also claim that 2.4 and 2.5 are limiting
> > functionality too by limiting the I/O in flight to some hundred megabytes,
> > right?
>
> This has nothing to do with benchmarks.
It has to; you claimed I limited functionality. If you can't measure it
in any way (or at least demonstrate it with math), it doesn't exist.
> Counterexample: suppose the process generating dirty data is the only
> one running. The machine's effective RAM capacity is then limited to
> the dirty data limit plus some small constant by this io in flight
> limitation.
Only the free memory and cache are accounted here; while this task allocates
RAM with malloc, the amount of dirty RAM will be reduced accordingly, so
what you said is far from reality. We aren't 100% accurate in the cache
level accounting, true, but we're 100% accurate in the anonymous memory
accounting.
> This functionality is not to be dismissed lightly: changing the /proc/
> business is root-only, hence it may not be within the power of a victim
> of a poor setting to adjust it.
>
>
> On Fri, Feb 21, 2003 at 12:08:07PM +0100, Andrea Arcangeli wrote:
> > It's like the DMA ring buffer size of a soundcard: if you want low latency
> > it has to be small, it's as simple as that. It's a tradeoff between
> > latency and performance, but the point here is that apparently you gain
> > nothing with such a huge amount of I/O in flight. This has nothing to
> > do with the number of requests; the requests have to be plentiful, or seeks
> > won't be reordered aggressively, but when everything merges, using all
> > the requests is pointless and only has the effect of locking
> > everything in RAM, and this screws the write throttling too, because we
> > do write throttling on the dirty stuff, not on the locked stuff, and
> > this is what elevator-lowlatency addresses.
> > You may argue about the amount of in-flight I/O limit I chose, but really
> > the default in mainline looks overkill to me for generic hardware.
>
> It's not a question of gain but rather immunity to reconfigurations.
You mean immunity to reconfigurations of machines with more than 4G of
RAM, maybe, and you are OK with completely ignoring the latency effects of
the overkill queue size. Everything smaller can be affected by it, not
only in terms of latency, especially if you have multiple spindles
that literally multiply the fixed maximum amount of in-flight I/O.
> Redoing it for all the hardware raises a tuning issue, and in truth
> all I've ever wound up doing is turning it off because I've got so
> much RAM that various benchmarks could literally be done in-core as a
> first pass, then sorted, then sprayed out to disk in block-order. And
> a bunch of open benchmarks are basically just in-core spinlock exercise.
> (Ignore the fact there was a benchmark mentioned.)
>
> Amortizing seeks and incrementally sorting and so on generally require
> large buffers, and if you have the RAM, the kernel should use it.
>
> But more seriously, global io in flight limits are truly worthless, if
> anything it should be per-process, but even that's inadequate as it
This doesn't make any sense; the limit always exists, it has to. If you
drop it the machine will die, deadlocking in a few milliseconds; the
whole plugging and write throttling logic that completely drives the
I/O subsystem totally depends on a limit on the in-flight I/O.
> requires retuning for varying io speeds. Limit enforcement needs to be
> (1) localized
> (2) self-tuned via block layer feedback
>
> If I understand the code properly, 2.5.x has (2) but not (1).
2.5 has the unplugging logic, so it definitely has a high limit on
in-flight I/O too, no matter what the elevator; without the fixed limit
2.5 will die too, like any other Linux kernel out there I have ever seen.
>
> On Fri, Feb 21, 2003 at 02:51:46AM -0800, William Lee Irwin III wrote:
> >> The issue raised here is streaming io competing with processes working
> >> within bounded memory. It's unclear to me how 2.5.x mitigates this but
> >> the effects are far less drastic there. The "fix" you're suggesting is
> >> clamping off the entire machine's io just to contain the working set of
>
> On Fri, Feb 21, 2003 at 12:08:07PM +0100, Andrea Arcangeli wrote:
> > Show me this clamping off please. Take 2.4.21pre4aa3 and trash it
> > compared to 2.4.21pre4 with the minimum 32M queue; I'd be very
> > interested. If I've a problem I must fix it ASAP, but all the benchmarks
> > are in the green so far and the behaviour was very bad before these fixes,
> > so go ahead and show me red and you'll do me a big favour. Either that or
> > you're wrong that I'm clamping off anything.
> > Just to be clear, this whole thing has nothing to do with the elevator,
> > or CFQ or whatever; it is only related to the worthwhile amount of
> > in-flight I/O needed to keep the disk always running.
>
> You named the clamping off yourself. A dozen MB on a 64MB box, 32MB on
> 2.4.21pre4. Some limit that's a hard upper bound but resettable via a
> sysctl or /proc/ or something. Testing 2.4.x-based trees might be a
> little painful since I'd have to debug why 2.4.x stopped booting on my
> boxen, which would take me a bit far afield from my current hacking.
2.4.21pre4aa3 has to boot on it.
> On Fri, Feb 21, 2003 at 02:51:46AM -0800, William Lee Irwin III wrote:
> >> a single process that generates unbounded amounts of dirty data and
> >> inadvertently penalizes other processes via page reclaim, where instead
> >> it should be forced to fairly wait its turn for memory.
>
> I believe I said something important here. =)
You're arguing about the async flushing heuristic, which should be made
smarter instead of taking 50% of the freeable memory (not anonymous
memory). This isn't black-and-white stuff and you shouldn't mix issues;
it has nothing to do with the blkdev plugging logic driven by the limit
on in-flight I/O (in every l-k out there, ever).
> The reason why this _should_ be the case is because processes stealing
> from each other is the kind of mutual interference that leads to things
> like Mozilla taking ages to swap in because other things were running
> for a while and it wasn't and so on.
>
>
> -- wli
Andrea
Andrea Arcangeli <[email protected]> wrote:
>
> CFQ is made for multimedia desktop usage only: you want to be sure
> mplayer or xmms will never skip frames, not for parallel cp reading
> floods of data at max speed like a database with a zillion threads. For
> multimedia not to skip frames, 1M/sec is more than enough bandwidth;
> it doesn't matter if the huge database in the background runs much slower as
> long as you never skip a frame.
These applications are broken. The kernel shouldn't be bending over
backwards trying to fix them up. Because this will never ever work as well
as fixing the applications.
The correct way to design such an application is to use an RT thread to
perform the display/audio device I/O and a non-RT thread to perform the disk
I/O. The disk IO thread keeps the shared 8 megabyte buffer full. The RT
thread mlocks that buffer.
The deadline scheduler will handle that OK. The anticipatory scheduler
(which is also deadline) will handle it better.
If an RT thread performs disk I/O it is bust, and we should not try to fix
it. The only place where VFS/VM/block needs to care for RT tasks is in the
page allocator. Because even well-designed RT tasks need to allocate pages.
The 2.4 page allocator has a tendency to cause 5-10 second stalls for a
single page allocation when the system is under writeout load. That is fixed
in 2.5, but special-casing RT tasks in the allocator would make sense.
Andrea Arcangeli <[email protected]> wrote:
>
> I don't
> buy Andrew complaining about the write throttling when he still allows
> several dozen mbytes of ram in flight and invisible to the VM,
The 2.5 VM accounts for these pages (/proc/meminfo:Writeback) and throttling
decisions are made upon the sum of dirty+writeback pages.
The 2.5 VFS limits the amount of dirty+writeback memory, not just the amount
of dirty memory.
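The accounting described here can be watched from userspace while the tests
run; a trivial way to do it (a sketch, assuming a 2.5 kernel that exposes the
Dirty and Writeback fields in /proc/meminfo) is:
# print dirty and writeback totals once a second
while true
do
grep -E 'Dirty|Writeback' /proc/meminfo
sleep 1
done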
Throttling in both write() and the page allocator is fully decoupled from the
queue size. An 8192-slot (4 gigabyte) queue on a 32M machine has been
tested.
The only tasks which block in get_request_wait() are the ones which we want
to block there: heavy writers.
Page reclaim will never block page allocators in get_request_wait(). That
causes terrible latency if the writer is still active.
Page reclaim will never block a page-allocating process on I/O against a
particular disk block. Allocators are instead throttled against _any_ write
I/O completion. (This is broken in several ways, but it works well enough to
leave it alone I think).
On Fri, Feb 21, 2003 at 01:25:49PM -0800, Andrew Morton wrote:
> Andrea Arcangeli <[email protected]> wrote:
> >
> > I don't
> > buy Andrew complaining about the write throttling when he still allows
> > several dozen mbytes of ram in flight and invisible to the VM,
>
> The 2.5 VM accounts for these pages (/proc/meminfo:Writeback) and throttling
> decisions are made upon the sum of dirty+writeback pages.
>
> The 2.5 VFS limits the amount of dirty+writeback memory, not just the amount
> of dirty memory.
>
> Throttling in both write() and the page allocator is fully decoupled from the
> queue size. An 8192-slot (4 gigabyte) queue on a 32M machine has been
> tested.
The 32M case is probably fine with it; you moved the limit of in-flight
I/O into the writeback layer, and the write throttling will limit the
amount of RAM in flight to 16M or so. It would be much more interesting
to see some latency benchmarks on an 8G machine with 4G simultaneously
locked in the I/O queue. A 4G queue on an IDE disk can only waste lots of
CPU and memory resources, increasing the latency too, without providing
any benefit. Your 4G queue thing provides only disadvantages as far as I
can tell.
>
> The only tasks which block in get_request_wait() are the ones which we want
> to block there: heavy writers.
>
> Page reclaim will never block page allocators in get_request_wait(). That
> causes terrible latency if the writer is still active.
>
> Page reclaim will never block a page-allocating process on I/O against a
> particular disk block. Allocators are instead throttled against _any_ write
> I/O completion. (This is broken in several ways, but it works well enough to
> leave it alone I think).
2.4 on desktop boxes could fill all RAM with locked and dirty stuff
because of the excessive size of the queue, so any comparison with 2.4
in terms of page reclaim should be repeated on 2.4.21pre4aa3 IMHO, where
the VM has a chance not to find the machine in a collapsed state where the
only thing it can do is either wait or panic(); feel free to choose
what you prefer.
Andrea
On Fri, Feb 21, 2003 at 01:11:58PM -0800, Andrew Morton wrote:
> Andrea Arcangeli <[email protected]> wrote:
> >
> > CFQ is made for multimedia desktop usage only: you want to be sure
> > mplayer or xmms will never skip frames, not for parallel cp reading
> > floods of data at max speed like a database with a zillion threads. For
> > multimedia not to skip frames, 1M/sec is more than enough bandwidth;
> > it doesn't matter if the huge database in the background runs much slower as
> > long as you never skip a frame.
>
> These applications are broken. The kernel shouldn't be bending over
> backwards trying to fix them up. Because this will never ever work as well
> as fixing the applications.
Disagree: if the kernel doesn't provide a low-latency elevator of some
sort there's no way to work around it in userspace with just a
partial-memory buffer (unless you do [1]).
> The correct way to design such an application is to use an RT thread to
> perform the display/audio device I/O and a non-RT thread to perform the disk
> I/O. The disk IO thread keeps the shared 8 megabyte buffer full. The RT
> thread mlocks that buffer.
Having such huge buffering introduces an 8M latency during startup,
which is very annoying if the machine is under high load (especially if
you want to apply realtime effects to the audio - ever tried the xmms
equalizer with an 8M buffer?), and it still doesn't guarantee that 8 megs
are enough. Secondly, 8 megabytes mlocked is quite a lot for a 128M
desktop. Third, applications are already doing what you suggest and you
can still hear occasional skips during heavy I/O, i.e. buffering is not
enough if the elevator only cares about global throughput or if the
queue is very huge (and incidentally you're not using SFQ/CFQ). It is
also possible you don't know what you want to read until the last
minute.
[1] Along your lines you can also buy some gigs of RAM and copy the
whole multimedia file into ramfs before playback ;) I mean, I agree it's a
problem that can be solved by throwing money at the hardware.
> The deadline scheduler will handle that OK. The anticipatory scheduler
> (which is also deadline) will handle it better.
>
>
>
> If an RT thread performs disk I/O it is bust, and we should not try to fix
> it. The only place where VFS/VM/block needs to care for RT tasks is in the
> page allocator. Because even well-designed RT tasks need to allocate pages.
>
> The 2.4 page allocator has a tendency to cause 5-10 second stalls for a
> single page allocation when the system is under writeout load. That is fixed
> in 2.5, but special-casing RT tasks in the allocator would make sense.
The main issue that matters here is not the VM but the blkdev layer,
and there you never know if the I/O was submitted by an RT task or not.
And BTW, the right design for such an app is really to use async I/O, not
to fork off a worthless thread for the I/O.
Andrea
Hi!
> > mplayer or xmms will never skip frames, not for parallel cp reading
> > floods of data at max speed like a database with a zillion threads. For
> > multimedia not to skip frames, 1M/sec is more than enough bandwidth;
> > it doesn't matter if the huge database in the background runs much slower as
> > long as you never skip a frame.
>
> These applications are broken. The kernel shouldn't be bending over
> backwards trying to fix them up. Because this will never ever work as well
> as fixing the applications.
>
> The correct way to design such an application is to use an RT thread to
> perform the display/audio device I/O and a non-RT thread to perform the disk
I do not think this can be done easily.
For the mplayer case you'd need to mlock
the X server...
And emacs/vi/all interactive tasks
are in a similar situation (latency matters);
are you going to make them all realtime?
--
Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...
Executive question: Why does 2.5.62-mm2 have higher sequential
write latency than 2.5.61-mm1?
tiobench numbers on uniprocessor single disk IDE:
The cfq scheduler (2.5.62-mm2 and 2.5.61-cfq) has a big latency
regression.
2.5.61-mm1        (default scheduler (anticipatory?))
2.5.61-mm1-cfq    elevator=cfq
2.5.62-mm2-as     anticipatory scheduler
2.5.62-mm2-dline  elevator=deadline
2.5.62-mm2        elevator=cfq
                  Thr  MB/sec   CPU%  avg lat  max latency
2.5.61-mm1          8   15.68  54.42%  5.87 ms  2.7 seconds
2.5.61-mm1-cfq      8    9.60  15.07%  7.54     393.0
2.5.62-mm2-as       8   14.76  52.04%  6.14       4.5
2.5.62-mm2-dline    8    9.91  13.90%  9.41        .8
2.5.62-mm2          8    9.83  15.62%  7.38     408.9
2.4.21-pre3         8   10.34  27.66%  8.80       1.0
2.4.21-pre3-ac4     8   10.53  28.41%  8.83        .6
2.4.21-pre3aa1      8   18.55  71.95%  3.25      87.6
For most thread counts (8 - 128), the anticipatory scheduler has roughly
45% higher ext2 sequential read throughput. Latency was higher than
deadline, but a lot lower than cfq.
For tiobench sequential writes, the max latency numbers for 2.4.21-pre3
are notably lower than 2.5.62-mm2 (but not as good as 2.5.61-mm1).
This is with 16 threads.
                  Thr  MB/sec   CPU%  avg lat   max latency
2.5.61-mm1         16   18.30  81.12%  9.159 ms  6.1 seconds
2.5.61-mm1-cfq     16   18.03  80.71%  9.086     6.1
2.5.62-mm2-as      16   18.84  84.25%  8.620    47.7
2.5.62-mm2-dline   16   18.53  84.10%  8.967    53.4
2.5.62-mm2         16   18.46  83.28%  8.521    40.8
2.4.21-pre3        16   16.20  65.13%  9.566     8.7
2.4.21-pre3-ac4    16   18.50  83.68%  8.774    11.6
2.4.21-pre3aa1     16   18.49  88.10%  8.455     7.5
Recent uniprocessor benchmarks:
http://home.earthlink.net/~rwhron/kernel/latest.html
More uniprocessor benchmarks:
http://home.earthlink.net/~rwhron/kernel/k6-2-475.html
--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html
latest quad xeon benchmarks:
http://home.earthlink.net/~rwhron/kernel/blatest.html
[email protected] wrote:
>
> Executive question: Why does 2.5.62-mm2 have higher sequential
> write latency than 2.5.61-mm1?
Well bear in mind that we sometimes need to perform reads to be able to
perform writes. So the way tiobench measures it, you could be seeing
read-vs-write latencies here.
And there are various odd interactions in, at least, ext3. You did not
specify which filesystem was used.
> ...
> Thr MB/sec CPU% avg lat max latency
> 2.5.62-mm2-as 8 14.76 52.04% 6.14 4.5
> 2.5.62-mm2-dline 8 9.91 13.90% 9.41 .8
> 2.5.62-mm2 8 9.83 15.62% 7.38 408.9
Fishiness. 2.5.62-mm2 _is_ 2.5.62-mm2-as. Why the 100x difference?
That 408 seconds looks suspect.
I don't know what tiobench is doing in there, really. I find it more useful
to test simple things, which I can understand. If you want to test write
latency, do this:
while true
do
write-and-fsync -m 200 -O -f foo
done
Maybe run a few of these. This command will cause a continuous streaming
file overwrite.
Then do:
time write-and-fsync -m1 -f foo
This will simply write a megabyte file, fsync it and exit.
You need to be careful with this - get it wrong and most of the runtime is
actually paging the executables back in. That is why the above background
load is just reusing the same pagecache over and over.
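Concretely, "a few of these" might look like the following (a sketch reusing
the write-and-fsync flags quoted above; the foo-$i and probe file names are
made up for illustration):
# three competing streaming overwriters, each reusing its own pagecache
for i in 1 2 3
do
(while true; do write-and-fsync -m 200 -O -f foo-$i; done) &
done
# then measure: write one megabyte, fsync it, and time it
time write-and-fsync -m1 -f probe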
The latency which I see for the one megabyte write and fsync varies a lot -
from one second to ten. That's with the deadline scheduler.
There is a place in the VFS where one writing task could accidentally hammer a
different one. I cannot trigger that, but I'll fix it up in the next -mm.
>> Why does 2.5.62-mm2 have higher sequential
>> write latency than 2.5.61-mm1?
> And there are various odd interactions in, at least, ext3. You did not
> specify which filesystem was used.
ext2
>> Thr MB/sec CPU% avg lat max latency
>> 2.5.62-mm2-as 8 14.76 52.04% 6.14 4.5
>> 2.5.62-mm2-dline 8 9.91 13.90% 9.41 .8
>> 2.5.62-mm2 8 9.83 15.62% 7.38 408.9
> Fishiness. 2.5.62-mm2 _is_ 2.5.62-mm2-as. Why the 100x difference?
Bad EXTRAVERSION naming on my part. 2.5.62-mm2 _was_ booted with
elevator=cfq.
How it happened:
2.5.61-mm1 tested
2.5.61-mm1-cfq tested and elevator=cfq added to boot flags
2.5.62-mm1 tested (elevator=cfq still in lilo boot flags)
Then to test the other two schedulers I changed extraversion and boot
flags.
> That 408 seconds looks suspect.
AFAICT, that's the one request in over 500,000 that took the longest.
The numbers are fairly consistent. How relevant they are is debatable.
> If you want to test write latency, do this:
Your approach is more realistic than tiobench.
> There is a place in VFS where one writing task could accidentally hammer a
> different one. I cannot trigger that, but I'll fix it up in next -mm.
2.5.62-mm3 or 2.5.63-mm1? (-mm3 is running now)
--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html
[email protected] wrote:
>
> >> Why does 2.5.62-mm2 have higher sequential
> >> write latency than 2.5.61-mm1?
>
> > And there are various odd interactions in, at least, ext3. You did not
> > specify which filesystem was used.
>
> ext2
>
> >> Thr MB/sec CPU% avg lat max latency
> >> 2.5.62-mm2-as 8 14.76 52.04% 6.14 4.5
> >> 2.5.62-mm2-dline 8 9.91 13.90% 9.41 .8
> >> 2.5.62-mm2 8 9.83 15.62% 7.38 408.9
>
> > Fishiness. 2.5.62-mm2 _is_ 2.5.62-mm2-as. Why the 100x difference?
>
> Bad EXTRAVERSION naming on my part. 2.5.62-mm2 _was_ booted with
> elevator=cfq.
>
> ...
> > That 408 seconds looks suspect.
>
> AFAICT, that's the one request in over 500,000 that took the longest.
> The numbers are fairly consistent. How relevant they are is debatable.
OK. When I was testing CFQ I saw some odd behaviour, such as a 100%
cessation of reads for periods of up to ten seconds.
So there is some sort of bug in there, and until that is understood we should
not conclude anything at all about CFQ from this testing.
> 2.5.62-mm3 or 2.5.63-mm1? (-mm3 is running now)
Well I'm showing about seven more AS patches since 2.5.63-mm1 already, so
this is a bit of a moving target. Sorry.
> Why does 2.5.62-mm2 have higher sequential
> write latency than 2.5.61-mm1?
Anticipatory scheduler tiobench profile on uniprocessor:
2.5.61-mm1 2.5.62-mm2
total 1993387 1933241
default_idle 1873179 1826650
system_call 49838 43036
get_offset_tsc 21905 20883
do_schedule 13893 10344
do_gettimeofday 8478 6044
sys_gettimeofday 8077 5153
current_kernel_time 4904 12165
syscall_exit 4047 1243
__wake_up 1274 1000
io_schedule 1166 1039
prepare_to_wait 1093 792
schedule_timeout 612 366
delay_tsc 502 443
get_fpu_cwd 473 376
syscall_call 389 378
math_state_restore 354 271
restore_fpu 329 287
del_timer 325 200
device_not_available 290 377
finish_wait 257 181
add_timer 218 137
io_schedule_timeout 195 72
cpu_idle 193 218
run_timer_softirq 137 33
remove_wait_queue 121 188
eligible_child 106 154
sys_wait4 105 162
work_resched 104 110
ret_from_intr 97 74
dup_task_struct 75 48
add_wait_queue 67 124
__cond_resched 59 69
do_page_fault 55 0
do_softirq 53 12
pte_alloc_one 51 67
release_task 44 55
get_signal_to_deliver 38 43
get_wchan 16 10
mod_timer 15 0
old_mmap 14 19
prepare_to_wait_exclusive 10 32
mm_release 7 0
release_x86_irqs 7 8
sys_getppid 6 5
handle_IRQ_event 4 0
schedule_tail 4 0
kill_proc_info 3 0
device_not_available_emulate 2 0
task_prio 1 1
__down 0 33
__down_failed_interruptible 0 3
init_fpu 0 12
pgd_ctor 0 3
process_timeout 0 2
restore_all 0 2
sys_exit 0 2
--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html