2003-01-16 01:36:22

by Randy Hron

Subject: big ext3 sequential write improvement in 2.5.51-mm1 gone in 2.5.53-mm1?

On a quad xeon running tiobench...
The throughput and max latency for ext3 sequential writes
looks very good when threads >= 2 on 2.5.51-mm1.

Did 2.5.51-mm1 mount ext3 as ext2? I have ext2 logs for
2.5.51-mm1 and they look similar to the ext3 results.
The other 2.5 kernels from around that time look more
like 2.5.53-mm1.

File size = 8192 MB
Block size = 4096 bytes
Rate in MB/second
Latency in milliseconds

Sequential Writes
             Num                      Avg     Maximum     Lat%     Lat%  CPU
Identifier   Thr    Rate  (CPU%)  Latency     Latency      >2s     >10s  Eff
------------ ---  ------  ------  -------  ----------  -------  -------  ---
2.5.51-mm1     1   55.58  46.70%    0.234    18461.07  0.00834  0.00000  119
2.5.51-mm1     2   38.00  31.54%    0.563    18460.95  0.00896  0.00000  120
2.5.51-mm1     4   35.28  32.69%    1.041    84910.77  0.01306  0.00057  108
2.5.51-mm1     8   34.86  32.97%    2.090   113261.88  0.02433  0.00387  106
2.5.51-mm1    16   34.79  32.80%    3.786   216278.13  0.02923  0.01054  106
2.5.51-mm1    32   33.25  32.31%    7.083   331456.04  0.03152  0.01411  103
2.5.51-mm1    64   31.77  32.14%   14.020   604095.22  0.03772  0.02094   99
2.5.51-mm1   128   30.59  31.60%   25.436   653761.04  0.04019  0.02298   97
2.5.51-mm1   256   32.45  34.83%   47.633   598925.79  0.06914  0.04615   93

Sequential Writes
             Num                      Avg     Maximum     Lat%     Lat%  CPU
Identifier   Thr    Rate  (CPU%)  Latency     Latency      >2s     >10s  Eff
------------ ---  ------  ------  -------  ----------  -------  -------  ---
2.5.53-mm1     1   52.74  68.60%    0.604    19544.48  0.01731  0.00000   77
2.5.53-mm1     2    2.70  4.951%    7.589    54571.99  0.12716  0.00739   55
2.5.53-mm1     4    2.78  34.78%   13.966   467805.71  0.16842  0.03018    8
2.5.53-mm1     8    2.93  59.73%   26.819  1008655.17  0.19922  0.04420    5
2.5.53-mm1    16    3.14  26.13%   45.610  1939797.82  0.14705  0.05607   12
2.5.53-mm1    32    3.35  19.17%   80.421  3055837.66  0.12188  0.04888   17
2.5.53-mm1    64    3.43  15.13%  163.323  4284106.34  0.11868  0.05264   23
2.5.53-mm1   128    3.66  20.04%  260.372  5148947.62  0.12889  0.04530   18
2.5.53-mm1   256    4.26  20.30%  382.981  3094442.29  0.20232  0.06323   21

There is another odd thing in some of the 2.5 ext3 results. Several of the
kernels show a jump in throughput at 256 threads.

Sequential Writes
             Num                      Avg     Maximum     Lat%     Lat%  CPU
Identifier   Thr    Rate  (CPU%)  Latency     Latency      >2s     >10s  Eff
------------ ---  ------  ------  -------  ----------  -------  -------  ---
2.5.56         1   53.17  69.99%    0.194    36612.36  0.00029  0.00005   76
2.5.56         2    2.53  4.728%    7.549  1219600.59  0.05112  0.00205   53
2.5.56         4    2.58  73.97%   15.141   823168.02  0.05078  0.01531    3
2.5.56         8    2.67  179.9%   29.981   641722.67  0.07091  0.04382    1
2.5.56        16    3.34  136.6%   47.075  1416051.92  0.11807  0.09304    2
2.5.56        32    2.93  124.8%  100.112  1842078.09  0.18826  0.14262    2
2.5.56        64    3.66  37.46%  147.693  4216304.67  0.12394  0.06661   10
2.5.56       128    4.01  17.11%  237.592  4194864.65  0.10777  0.05642   23
2.5.56       256   12.64  48.78%  353.895  3741404.43  0.10434  0.05335   26

2.4 has a more gentle degradation in throughput and max latency for seq writes
on ext3:

Sequential Writes
             Num                      Avg     Maximum     Lat%     Lat%  CPU
Identifier   Thr    Rate  (CPU%)  Latency     Latency      >2s     >10s  Eff
------------ ---  ------  ------  -------  ----------  -------  -------  ---
2.4.20-pre10   1   37.71  56.08%    0.288     4315.58  0.00000  0.00000   67
2.4.20-pre10   2   33.01  98.65%    0.592     5517.10  0.00010  0.00000   33
2.4.20-pre10   4   30.83  153.3%    1.162     3684.74  0.00000  0.00000   20
2.4.20-pre10   8   24.86  126.9%    2.523     7436.22  0.00058  0.00000   20
2.4.20-pre10  16   21.21  104.0%    4.893     9132.94  0.00992  0.00000   20
2.4.20-pre10  32   18.14  97.27%   10.394    13451.42  0.09843  0.00000   19
2.4.20-pre10  64   15.63  90.39%   22.679    18888.44  0.39897  0.00000   17
2.4.20-pre10 128   12.03  78.06%   54.387    31156.69  1.12638  0.00038   15
2.4.20-pre10 256    9.94  71.13%  134.323    61604.97  2.87437  0.03022   14

--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html


2003-01-16 06:21:18

by Andrew Morton

Subject: Re: big ext3 sequential write improvement in 2.5.51-mm1 gone in 2.5.53-mm1?

[email protected] wrote:
>
> On a quad xeon running tiobench...
> The throughput and max latency for ext3 sequential writes
> looks very good when threads >= 2 on 2.5.51-mm1.
>
> Did 2.5.51-mm1 mount ext3 as ext2? I have ext2 logs for
> 2.5.51-mm1 and they look similar to the ext3 results.
> The other 2.5 kernels from around that time look more
> like 2.5.53-mm1.
>

Dunno. There have been about 7,000 different versions of the I/O scheduler
in that time and it seems a bit prone to butterfly effects.

Or maybe you accidentally ran the 2.5.51-mm1 tests on uniprocessor?
Multithreaded tiobench on SMP brings out the worst behaviour in the ext2 and
ext3 block allocators. Look:

<start tiobench>
<wait a while>
<kill it all off>

quad:/mnt/sde5/tiobench> ls -ltr
...
-rw------- 1 akpm akpm 860971008 Jan 15 22:02 _956_tiotest.0
-rw------- 1 akpm akpm 840470528 Jan 15 22:03 _956_tiotest.1

OK, 800 megs.

quad:/mnt/sde5/tiobench> 0 bmap _956_tiotest.0|wc
199224 597671 6751187

wtf? It's taking 200,000 separate chunks of disk.

quad:/mnt/sde5/tiobench> expr 860971008 / 199224
4321

so the average chunk size is a little over 4k.

quad:/mnt/sde5/tiobench> 0 bmap _956_tiotest.0 | tail -50000 | head -10
149770-149770: 1845103-1845103 (1)
149771-149771: 1845105-1845105 (1)
149772-149772: 1845107-1845107 (1)
149773-149773: 1845109-1845109 (1)
149774-149774: 1845111-1845111 (1)
149775-149775: 1845113-1845113 (1)
149776-149776: 1845115-1845115 (1)
149777-149777: 1845117-1845117 (1)
149778-149778: 1845119-1845119 (1)
149779-149779: 1845121-1845121 (1)

lovely. These two files have perfectly intermingled blocks. Writeback
bandwidth goes from 20 megabytes per second to about 0.5.
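
For reference, `bmap' above is a local tool; the same chunk count can be
approximated with the stock FIBMAP ioctl. A minimal illustrative sketch
(it needs root and a filesystem that implements FIBMAP, and it is not the
tool used above):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <linux/fs.h>           /* FIBMAP, FIGETBSZ */

int main(int argc, char **argv)
{
        struct stat st;
        int fd, blocksize, prev = -1;
        long i, nblocks, chunks = 0;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <file>\n", argv[0]);
                return 1;
        }
        fd = open(argv[1], O_RDONLY);
        if (fd < 0 || fstat(fd, &st) < 0 ||
            ioctl(fd, FIGETBSZ, &blocksize) < 0) {
                perror(argv[1]);
                return 1;
        }
        nblocks = (st.st_size + blocksize - 1) / blocksize;

        for (i = 0; i < nblocks; i++) {
                int blk = (int)i;       /* logical block in, physical block out */

                if (ioctl(fd, FIBMAP, &blk) < 0) {
                        perror("FIBMAP");
                        return 1;
                }
                if (prev == -1 || blk != prev + 1)
                        chunks++;       /* discontiguity starts a new chunk */
                prev = blk;
        }
        printf("%ld blocks of %d bytes in %ld chunks (avg %ld bytes/chunk)\n",
               nblocks, blocksize, chunks,
               chunks ? (long)(st.st_size / chunks) : 0);
        close(fd);
        return 0;
}

On files laid out like the two above, the chunk count comes out in the
hundreds of thousands and the average chunk size is close to a single 4k
block.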

It doesn't happen on uniprocessor because each tiobench instance gets to run
for a timeslice, during which it is able to allocate a decent number of
contiguous blocks.

ext2 has block preallocation and will intermingle in 32k units, not 4k units.
So it's still crap, only not so smelly.

Does it matter much in practice? Sometimes, not often.

Is it crap? Yes.

Do I have time to do anything about it? Probably not.


2003-01-24 01:11:25

by Randy Hron

Subject: Re: big ext3 sequential write improvement in 2.5.51-mm1 gone in 2.5.53-mm1?

>> >lovely. These two files have perfectly intermingled blocks.

>> Writeback? or read?

> Both.

> The filesystems need fixing...

Did you add a secret sauce to 2.5.59-mm2? 10x sequential
write improvement on ext3 for multiple tiobench threads.

Quad P3 Xeon (4 GB RAM)
8 GB files
4 KB block size
32 threads
Rate in MB/sec
Latency in milliseconds

Sequential Writes ext3
                                  Avg     Maximum     Lat%     Lat%  CPU
Kernel        Rate  (CPU%)  Latency     Latency      >2s     >10s  Eff
----------  ------  ------  -------  ----------  -------  -------  ---
2.4.20aa1    11.85  72.77%   11.814    21802.73  0.05036  0.00000   16
2.5.59        3.42  17.36%   83.976  3109518.52  0.11253  0.05088   20
2.5.59-mm2   32.39  34.28%    7.742   340597.62  0.04287  0.01765   94

Similar improvement for seq writes for 2, 4, 8, 16, 64, 128, 256 threads.

Sequential reads on ext3 with 2.5.59-mm2 improve around 3x for various
thread counts. Below is the 32-thread case.

Sequential Reads ext3
                                  Avg     Maximum     Lat%     Lat%  CPU
Kernel        Rate  (CPU%)  Latency     Latency      >2s     >10s  Eff
----------  ------  ------  -------  ----------  -------  -------  ---
2.4.20aa1     8.24   7.21%   28.587   449134.11  0.10395  0.07086  114
2.5.59        9.50   5.50%   36.703     4310.62  0.00000  0.00000  173
2.5.59-mm2   35.28  17.69%   10.173    18950.56  0.01010  0.00000  199


--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html

2003-01-24 02:01:17

by Andrew Morton

Subject: Re: big ext3 sequential write improvement in 2.5.51-mm1 gone in 2.5.53-mm1?

[email protected] wrote:
>
> Did you add a secret sauce to 2.5.59-mm2?

I have not been paying any attention to the I/O scheduler changes for a
couple of months, so I can't say exactly what caused this. Possibly Nick's
batch expiry logic which causes the scheduler to alternate between reading
and writing with fairly coarse granularity.

> 10x sequential write improvement on ext3 for multiple tiobench threads.

OK...

I _have_ been paying attention to the IO scheduler for the past few days.
-mm5 will have the first draft of the anticipatory IO scheduler. This of
course is yielding tremendous improvements in bandwidth when there are
competing reads and writes.

I expect it will take another week or two to get the I/O scheduler changes
really settled down. Your assistance in thoroughly benching that would be
appreciated.

> 2.4.20aa1 8.24 7.21% 28.587 449134.11 0.10395 0.07086 114
> 2.5.59 9.50 5.50% 36.703 4310.62 0.00000 0.00000 173
> 2.5.59-mm2 35.28 17.69% 10.173 18950.56 0.01010 0.00000 199

boggle.


2003-01-24 02:23:36

by Nick Piggin

Subject: Re: big ext3 sequential write improvement in 2.5.51-mm1 gone in 2.5.53-mm1?

Andrew Morton wrote:

>[email protected] wrote:
>
>>Did you add a secret sauce to 2.5.59-mm2?
>>
>
>I have not been paying any attention to the I/O scheduler changes for a
>couple of months, so I can't say exactly what caused this. Possibly Nick's
>batch expiry logic which causes the scheduler to alternate between reading
>and writing with fairly coarse granularity.
>
Yes, but tiobench doesn't mix the two. The batch_expire setting probably
helps by giving longer batches between servicing expired requests.
The deadline-np-42 patch also eliminates corner cases in which requests
could be starved for a long time. A large batch_expire as in mm2 is not
a good solution without my anticipatory scheduling work, though, since
writes really starve reads.
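
To make the trade-off concrete, here is a toy user-space sketch of the
batching idea only; it is not the kernel's deadline code, and BATCH and the
deadline numbers are made up. Requests are streamed in one direction for a
whole batch, and expired deadlines in the other direction are only checked
between batches.

#include <stdio.h>

#define NREQ    32
#define BATCH    8              /* made-up stand-in for a batch-size tunable */

int main(void)
{
        /* one FIFO of deadlines per direction: index 0 = reads, 1 = writes */
        int deadline[2][NREQ];
        int head[2] = {0, 0};
        int i, n, now = 0, dir = 0;

        for (i = 0; i < NREQ; i++) {
                deadline[0][i] = i * 4 + 8;     /* reads: tighter deadlines */
                deadline[1][i] = i * 4 + 16;    /* writes: laxer deadlines */
        }

        while (head[0] < NREQ || head[1] < NREQ) {
                int other = 1 - dir;

                /* only between batches: switch direction if this one is
                 * empty or the other FIFO's head has blown its deadline */
                if (head[other] < NREQ &&
                    (head[dir] >= NREQ || deadline[other][head[other]] <= now))
                        dir = other;

                /* stream up to BATCH requests in the chosen direction */
                for (n = 0; n < BATCH && head[dir] < NREQ; n++, now++) {
                        printf("t=%2d  %-5s request %2d (deadline %d)\n",
                               now, dir ? "write" : "read",
                               head[dir], deadline[dir][head[dir]]);
                        head[dir]++;
                }
        }
        return 0;
}

Raising BATCH in the toy gives longer runs of back-to-back requests per
direction at the cost of the other direction waiting longer, which is the
starvation trade-off described above.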

>
>
>> 10x sequential write improvement on ext3 for multiple tiobench threads.
>>
>
>OK...
>
>I _have_ been paying attention to the IO scheduler for the past few days.
>-mm5 will have the first draft of the anticipatory IO scheduler. This of
>course is yielding tremendous improvements in bandwidth when there are
>competing reads and writes.
>
>I expect it will take another week or two to get the I/O scheduler changes
>really settled down. Your assistance in thoroughly benching that would be
>appreciated.
>
>
>>2.4.20aa1 8.24 7.21% 28.587 449134.11 0.10395 0.07086 114
>>2.5.59 9.50 5.50% 36.703 4310.62 0.00000 0.00000 173
>>2.5.59-mm2 35.28 17.69% 10.173 18950.56 0.01010 0.00000 199
>>
>
>boggle.
>
I'm happy with that as long as they aren't too dependent on the phase of
the moon. The initial deadline scheduler had quite a lot of problems with
these workloads.

2003-01-24 21:04:08

by Randy Hron

Subject: Re: big ext3 sequential write improvement in 2.5.51-mm1 gone in 2.5.53-mm1?

> qsbench isn't really a thing which should be optimised for.

The way I run qsbench simulates an uncommon workload.

> It is important to specify how much memory you have, and how you are
> invoking qsbench.

There is 3.75 GB of ram. I grab MemTotal from /proc/meminfo, and run
4 qsbench processes. Each qsbench uses 30% of MemTotal (1089 megs).
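
For reference, a minimal sketch of that sizing calculation, reading MemTotal
from /proc/meminfo and taking 30% of it per process (the actual qsbench
command line is omitted here):

#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/proc/meminfo", "r");
        char line[128];
        long memtotal_kb = 0;

        if (!f) {
                perror("/proc/meminfo");
                return 1;
        }
        while (fgets(line, sizeof(line), f))
                if (sscanf(line, "MemTotal: %ld kB", &memtotal_kb) == 1)
                        break;
        fclose(f);

        /* 30% of MemTotal, in megabytes, for each of the 4 processes */
        printf("MemTotal = %ld kB; each qsbench process gets ~%ld MB\n",
               memtotal_kb, memtotal_kb * 30 / 100 / 1024);
        return 0;
}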

--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html

2003-01-24 21:11:43

by Andrew Morton

Subject: Re: big ext3 sequential write improvement in 2.5.51-mm1 gone in 2.5.53-mm1?

[email protected] wrote:
>
> > It is important to specify how much memory you have, and how you are
> > invoking qsbench.
>
> There is 3.75 GB of ram. I grab MemTotal from /proc/meminfo, and run
> 4 qsbench processes. Each qsbench uses 30% of MemTotal (1089 megs).

Yes, 2.5 sucks at that. Run `top' and observe how, in 2.4, one qsbench instance
grabs all the CPU time, then exits. The remaining three can then complete
with no swapout at all.
with no swapout at all..