2001-11-17 08:04:27

by Randy Hron

Subject: I/O tests using elvtune to improve interactive performance

Kernel: 2.4.15-pre5

Test: Run growfiles tests from the Linux Test Project that really hurt
interactive performance. Simultaneously run "ls -laR /".
Change the elevator read latency value with elvtune.
Also run mp3blaster tests.
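
elvtune sets latencies per block device; on this box the disk is /dev/hda,
so the invocations look roughly like:

elvtune /dev/hda          (show current read/write latency)
elvtune -r 32 /dev/hda    (set read latency to 32)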


Summary: Smaller values for the I/O elevator read latency
have a significant positive impact on interactive
performance, and throughput is as good as or better
than with the default value of 8192.

The idea for this came from Andrea Arcangeli's excellent doc at
http://tux.u-strasbg.fr/jl3/features-2.3-1.html . That page shows
that dbench throughput can be good with low values for read
latency too.

My initial tests were just to run growfiles and issue commands that
were slow to respond in the past: things like "ls -l", "login",
"ps aux", etc. I didn't time these tests, but it was amazing what
a difference setting the read latency to 128 or 32 with elvtune made.
Each growfiles test prints the number of iterations for a 120 second
interval, and I was happy to see that the number of iterations went
up while interactive performance was dramatically better.

Of course, running ls -l in big directories isn't exactly scientific,
so I tried to come up with something to measure interactive performance.

For these tests, ls -laR / runs at the same time as the growfiles
tests. I picked ls for a few reasons:

1) It's slow to respond when I/O is high.
2) It's easy to measure and repeat.
3) My disk has 5 partitions and lots of files spread on each partition,
which will require some seeking on the disk.

ls -laR / is not ideal though; it isn't interactive.

Total time for the 4 growfiles tests is 8 minutes (120 seconds per test).
The ls command finished before the last growfiles test completed in
each run.

I rebooted between each of these tests.
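
The ls timings below are GNU time's verbose output. A rough sketch of one
run (assuming GNU time is installed as /usr/bin/time; the four growfiles
commands appear with their results):

elvtune -r 2 /dev/hda
/usr/bin/time -v ls -laR / > /var/tmp/ls-laR2 &
growfiles -b -e 1 -i 0 -L 120 -u -g 4090 -T 100 -t 408990 -l -C 10 -c 1000 -S 10 -f Lgf02_
... (the other three growfiles tests follow in turn)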

read_latency = 2
----------------
ls was the slowest here, and none of the growfiles tests had its fastest result.

ls -laR / > /var/tmp/ls-laR2
Elapsed (wall clock) time (h:mm:ss or m:ss): 7:40.52
Percent of CPU this job got: 4%

growfiles -b -e 1 -i 0 -L 120 -u -g 4090 -T 100 -t 408990 -l -C 10 -c 1000 -S 10 -f Lgf02_
13969 iterations to 10 files. Hit time value of 120

growfiles -b -e 1 -i 0 -L 120 -u -g 5000 -T 100 -t 499990 -l -C 10 -c 1000 -S 10 -f Lgf03_
12252 iterations to 10 files. Hit time value of 120

growfiles -b -e 1 -u -r 1-49600 -I r -u -i 0 -L 120 Lgfile1
48352 iterations to 1 files. Hit time value of 120

growfiles -b -e 1 -i 0 -L 120 -w -u -r 10-5000 -I r -T 10 -l -S 2 -f Lgf04_
59807 iterations to 2 files. Hit time value of 120


read_latency = 32
-----------------
This value had 3 of the 4 best growfiles results. ls was 16% slower than
with the default read latency, but interactive performance was great.

ls -laR / > /var/tmp/ls-laR32
Elapsed (wall clock) time (h:mm:ss or m:ss): 5:08.23
Percent of CPU this job got: 6%

growfiles -b -e 1 -i 0 -L 120 -u -g 4090 -T 100 -t 408990 -l -C 10 -c 1000 -S 10 -f Lgf02_
14181 iterations to 10 files. Hit time value of 120

growfiles -b -e 1 -i 0 -L 120 -u -g 5000 -T 100 -t 499990 -l -C 10 -c 1000 -S 10 -f Lgf03_
11691 iterations to 10 files. Hit time value of 120

growfiles -b -e 1 -u -r 1-49600 -I r -u -i 0 -L 120 Lgfile1
54768 iterations to 1 files. Hit time value of 120

growfiles -b -e 1 -i 0 -L 120 -w -u -r 10-5000 -I r -T 10 -l -S 2 -f Lgf04_
68342 iterations to 2 files. Hit time value of 120


read_latency = 8192 (default)
-----------------------------
ls -laR / > /var/tmp/ls-laR8192
Elapsed (wall clock) time (h:mm:ss or m:ss): 4:26.13
Percent of CPU this job got: 7%

growfiles -b -e 1 -i 0 -L 120 -u -g 4090 -T 100 -t 408990 -l -C 10 -c 1000 -S 10 -f Lgf02_
11085 iterations to 10 files. Hit time value of 120

growfiles -b -e 1 -i 0 -L 120 -u -g 5000 -T 100 -t 499990 -l -C 10 -c 1000 -S 10 -f Lgf03_
13797 iterations to 10 files. Hit time value of 120

growfiles -b -e 1 -u -r 1-49600 -I r -u -i 0 -L 120 Lgfile1
53198 iterations to 1 files. Hit time value of 120

growfiles -b -e 1 -i 0 -L 120 -w -u -r 10-5000 -I r -T 10 -l -S 2 -f Lgf04_
63542 iterations to 2 files. Hit time value of 120


mtest01 and mmap001
-------------------

I also ran the mtest01 and mmap001 tests while playing mp3blaster, with
various elevator settings. These are the same tests I've run before. Below
is just the total time for each test and the percentage of the time the
mp3 played.

read_latency = 16 was best here: the test was fastest and had the highest
mp3 playtime.
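
(Playtime percentage is seconds played over total run time; e.g. for
read_latency = 16, 280 / 309 is about 91%.)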

read_latency = 2
mtest01 - mp3 played 280 seconds of 316 second run. (88%)
mmap001 not run because changing elvtune didn't seem to affect this test.

read_latency = 16
mtest01 - mp3 played 280 seconds of 309 second run. (91%)
mmap001 - mp3 played 908 seconds of 908 second run.

read_latency = 64
mtest01 - mp3 played 280 seconds of 309 second run. (80%)
mmap001 - mp3 played 908 seconds of 908 second run.

read_latency = 8192
mtest01 - mp3 played 262 seconds of 314 second run. (83%)
mmap001 - mp3 played 901 seconds of 901 second run.


Hardware
--------
Athlon 1333
512 MB RAM
(1) 40 GB IDE hard drive with 5 partitions


It's exciting to see Linux have good interactive performance under heavy
disk load.

Have fun!
--
Randy Hron


2001-11-19 07:10:25

by Jens Axboe

Subject: Re: I/O tests using elvtune to improve interactive performance

On Sat, Nov 17 2001, [email protected] wrote:
> Kernel: 2.4.15-pre5
>
> Test: Run growfiles tests from Linux Test Project that really hurt
> interactive performance. Simultaneously run "ls -laR /".
> Change the elevator read latency value with elvtune.
> Also run mp3blaster tests.

Interesting tests, thanks. I wonder if you could be convinced to do
bonnie++ and dbench tests with the same read_latency values used? Also,
I'm assuming you kept write latency at its default of 16384?

--
Jens Axboe

2001-11-19 15:24:05

by Randy Hron

Subject: Re: I/O tests using elvtune to improve interactive performance

On Mon, Nov 19, 2001 at 08:09:22AM +0100, Jens Axboe wrote:
> > Test: Run growfiles tests from Linux Test Project that really hurt
> > interactive performance. Simultaneously run "ls -laR /".
> > Change the elevator read latency value with elvtune.
> > Also run mp3blaster tests.
>
> Interesting tests, thanks. I wonder if you could be convinced to do
> bonnie++ and dbench tests with the same read_latency values used? Also,
> I'm assuming you kept write latency at its default of 16384?
> --
> Jens Axboe
>

Thanks for the feedback. Write latency was 16384 for all tests.

I'm downloading dbench and bonnie++ now. I'll check them out.

I'm still not sure how to measure/quantify interactive performance.

My ideal test will have these components:

1) Simulate and measure user interactive response time.
2) Disk I/O patterns capable of making interactive performance slow.
3) Measurement of I/O throughput.
4) Note how changes with elvtune affect throughput and response time.
5) It's not too boring (i.e., no typing commands by hand with a stopwatch).

It's the "measure interactive response time" that I haven't got a handle
on yet. I'm looking at the SSBA benchmarks for something that simulates
users. I don't know if it measures response time.

I could resort to a stopwatch to test interactive response, but
hopefully, something better will come to mind.
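
One crude idea, just a sketch I haven't tried: sample the latency of a
small command in a loop while the I/O load runs, then look at the
distribution of times afterward:

while true
do
        /usr/bin/time -f "%e" ls -l /usr/bin > /dev/null
        sleep 5
done 2> response-times.log
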
--
Randy Hron

2001-11-20 07:30:29

by Randy Hron

Subject: Re: I/O tests using elvtune to improve interactive performance

On Mon, Nov 19, 2001 at 08:09:22AM +0100, Jens Axboe wrote:
> Interesting tests, thanks. I wonder if you could be convinced to do
> bonnie++ and dbench tests with the same read_latency values used? Also,
> --
> Jens Axboe

Jens,

I'm sure this isn't what you had in mind, but ... :)

Kernel: 2.4.15-pre6

Test: dbench 775 on 5 partitions. Time ls -l on big directories.
Test with read_latency set to 8192 (default) and 32.
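
(The 775 is five simultaneous dbench instances, one per partition, for a
total of 50 + 50 + 75 + 150 + 450 = 775 processes. Reconstructed roughly
from the working directories shown in the results below:)

cd /home/dbench && ./dbench 50 &
cd /usr/local/dbench && ./dbench 50 &
cd /dbench && ./dbench 75 &
cd /usr/src/sources/d/dbench && ./dbench 150 &
cd /opt/dbench && ./dbench 450 &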

Summary: Even at a load average of 775, console IRC clients perform great.
Lower read latency reduces throughput, but big directory listings
are faster.

This is really a crazy test, but it's a testament to the amazing work
of the kernel hackers.

I was looking for the I/O load that makes interactive response poor.
There are a couple growfiles tests in the Linux Test Project that
do that with a load average of less than 5. dbench is different.
The dbench load the kernel can handle is remarkable.

Hardware:
1 Athlon 1333
1 GB RAM
1 GB swap
1 40 GB IDE disk

A more reasonable test may be dbench with 8, 36, or 144 processes, which return:
Throughput 90.636 MB/sec (NB=113.295 MB/sec 906.36 MBit/sec) 8 procs
Throughput 56.0331 MB/sec (NB=70.0413 MB/sec 560.331 MBit/sec) 36 procs
Throughput 25.7869 MB/sec (NB=32.2336 MB/sec 257.869 MBit/sec) 144 procs

Instead, I figured out roughly how many simultaneous dbench processes would
run with the amount of free disk space I have.

8:01pm up 53 min, 12 users, load average: 779.12, 778.68, 737.68

I had 3 console IRC sessions up. Occasionally there was a very slight
delay. "ls -l", on the other hand, was very slow on big directories;
timings are below.

Summary:
With read_latency=8192, relative to read_latency=32:

dbench 50    10% more throughput
dbench 50     6% more throughput
dbench 75    22% more throughput
dbench 150   25% more throughput
dbench 450   24% more throughput
ls -l time   48% longer
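
(Each percentage is the ratio of the corresponding throughput figures
below; e.g. for the first dbench 50 pair, 1.91472 / 1.74518 is about 1.10,
i.e. 10% more.)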

ls times are interspersed with dbench results in chronological order.

read_latency = 8192
-------------------
/usr/share/man/man3 real 30m48.908s

# /home/dbench$ ./dbench 50 completes
Throughput 1.91472 MB/sec (NB=2.39339 MB/sec 19.1472 MBit/sec) 50 procs

# /usr/local/dbench$ ./dbench 50 completes
Throughput 1.84434 MB/sec (NB=2.30543 MB/sec 18.4434 MBit/sec) 50 procs

# /dbench$ ./dbench 75 completes
Throughput 2.50039 MB/sec (NB=3.12548 MB/sec 25.0039 MBit/sec) 75 procs

/usr/src/linux ls -laR real 10m11.953s

# /usr/src/sources/d/dbench$ ./dbench 150 completes
Throughput 3.51881 MB/sec (NB=4.39852 MB/sec 35.1881 MBit/sec) 150 procs

/usr/X11R6/lib/X11/fonts/75dpi real 28m22.315s
/usr/X11R6/lib/X11/fonts/100dpi real 12m27.915s

# /opt/dbench$ ./dbench 450 completes
Throughput 4.64194 MB/sec (NB=5.80242 MB/sec 46.4194 MBit/sec) 450 procs


read_latency = 32
-----------------
/usr/share/man/man3 real 10m8.684s

# /home/dbench$ ./dbench 50 completes
Throughput 1.74518 MB/sec (NB=2.18147 MB/sec 17.4518 MBit/sec) 50 procs

# /usr/local/dbench$ ./dbench 50 completes
Throughput 1.73985 MB/sec (NB=2.17481 MB/sec 17.3985 MBit/sec) 50 procs

/usr/src/linux ls -laR real 5m57.340s

# /dbench$ ./dbench 75 completes
Throughput 2.0441 MB/sec (NB=2.55513 MB/sec 20.441 MBit/sec) 75 procs

/usr/X11R6/lib/X11/fonts/75dpi real 13m32.822s

# /usr/src/sources/d/dbench$ ./dbench 150 completes
Throughput 2.8047 MB/sec (NB=3.50587 MB/sec 28.047 MBit/sec) 150 procs

/usr/X11R6/lib/X11/fonts/100dpi real 14m14.336s

# /opt/dbench$ ./dbench 450 completes
Throughput 3.74463 MB/sec (NB=4.68079 MB/sec 37.4463 MBit/sec) 450 procs


Filesystems (test not running)
------------------------------
Filesystem Type Size Used Avail Use% Mounted on
/dev/hda12 reiserfs 4.2G 1.2G 3.0G 27% /
/dev/hda11 reiserfs 15G 3.9G 11G 26% /opt
/dev/hda5 reiserfs 10G 5.6G 4.9G 53% /usr/src
/dev/hda6 reiserfs 5.2G 3.4G 1.8G 64% /home
/dev/hda8 reiserfs 2.1G 200M 1.8G 10% /usr/local

Conclusion:
Load average 775!
Box is solid.
IRC clients perform great.
Total throughput goes down as load goes up.


It may have made more sense to do a shorter test with fewer processes and
more values for read_latency, but it turned out this way. Hopefully it's
entertaining nonetheless. :)

--
Randy Hron