2003-06-05 02:35:15

by Randy Hron

Subject: [BENCHMARK] AIM7 fserver regressed in 2.5.70*


Summary:
AIM7 fileserver workload behaviour changed with 2.5.70.
At low task counts (load average), 2.5.70* takes 40%
longer than 2.5.69. As task count increases, regression
disappears.

Hardware has (4) 700 MHz P3 Xeons.
3.75 GB RAM
RAID 0 LUN (hardware raid)

Background:
AIM7 fserver is the only regressed workload. In general,
2.5.70* has better numbers than 2.5.69* for a variety of
benchmarks.

Part of the improvement in 2.5.70 I/O benchmarks is
from a fiber channel configuration change. 2.5.70* has
two online fiber channels. Earlier kernels had only one
fiber channel online.

Tiobench and bonnie++ show about 10% improvement.
LMbench microbenchmarks are generally improving or stable
in recent 2.5.x.

So, it's strange that AIM7 fserver is regressed.

The kernels below are listed in chronological order.

AIM7 database workloads show a nice 27-30% improvement
with 2.5.70*.

Real and CPU are time in seconds.

AIM7 dbase workload
kernel Tasks Jobs/Min Real CPU
2.5.69 32 477.6 398.0 149.3
2.5.69-bk1 32 476.3 399.0 143.1
2.5.69-mm3 32 560.5 339.1 159.3
2.5.69-mm5 32 560.9 338.9 164.0
2.5.70 32 611.1 311.0 153.0
2.5.70-mjb1 32 606.4 313.5 159.2
2.5.70-mm3 32 685.8 277.2 161.8


2.5.69 256 769.9 1975.2 982.2
2.5.69-bk1 256 768.0 1979.9 977.9
2.5.69-mm3 256 906.5 1677.5 1132.8
2.5.69-mm5 256 909.5 1671.9 1088.4
2.5.70 256 1042.7 1458.4 1036.3
2.5.70-mjb1 256 1030.0 1476.4 1049.0
2.5.70-mm3 256 1186.4 1281.8 1066.8
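The Jobs/Min and Real columns look like two views of the same measurement. A minimal sketch, assuming Jobs/Min = total jobs * 60 / Real and that each task runs a fixed number of jobs; that relationship is inferred from the numbers in the table, not stated in the report, and the names below are illustrative:

```python
# Rows copied from the dbase table above: (kernel, tasks, jobs_per_min, real_seconds).
DBASE_ROWS = [
    ("2.5.69", 32, 477.6, 398.0),
    ("2.5.70", 32, 611.1, 311.0),
    ("2.5.70-mm3", 32, 685.8, 277.2),
    ("2.5.69", 256, 769.9, 1975.2),
    ("2.5.70", 256, 1042.7, 1458.4),
    ("2.5.70-mm3", 256, 1186.4, 1281.8),
]


def jobs_per_task(tasks, jobs_per_min, real_seconds):
    """Total jobs implied by throughput and elapsed time, divided by task count."""
    total_jobs = jobs_per_min * real_seconds / 60.0
    return total_jobs / tasks


for kernel, tasks, jpm, real in DBASE_ROWS:
    print(f"{kernel:12s} {tasks:4d} tasks  {jobs_per_task(tasks, jpm, real):6.2f} jobs/task")
```

Every row works out to roughly 99 jobs per task, so a change in Real translates directly into the inverse change in Jobs/Min.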


AIM7 fileserver throughput (Jobs/Min) is down 31-41% at 4 tasks
relative to 2.5.69. As the task load increases, the regression shrinks.
At 32 tasks, 2.5.70* is even with or ahead of 2.5.69*.

AIM7 fserver workload
kernel Tasks Jobs/Min Real CPU
2.5.69 4 120.9 200.5 32.8
2.5.69-bk1 4 122.3 198.2 33.8
2.5.69-mm3 4 122.3 198.3 37.9
2.5.69-mm5 4 124.0 195.5 38.0
2.5.70 4 79.0 306.9 34.2
2.5.70-mjb1 4 83.4 290.8 33.6
2.5.70-mm3 4 71.7 338.0 34.9
2.5.70-mm4 4 73.9 328.0 33.9


2.5.69 8 174.7 277.5 61.1
2.5.69-bk1 8 175.8 275.8 64.1
2.5.69-mm3 8 179.4 270.2 65.7
2.5.69-mm5 8 184.3 263.0 66.3
2.5.70 8 136.6 354.9 58.8
2.5.70-mjb1 8 137.3 353.0 57.0
2.5.70-mm3 8 123.9 391.3 58.7
2.5.70-mm4 8 118.4 409.4 57.5


2.5.69 32 234.3 827.6 221.8
2.5.69-bk1 32 236.0 821.8 220.8
2.5.69-mm3 32 253.8 764.0 246.6
2.5.69-mm5 32 254.8 761.0 248.3
2.5.70 32 239.7 809.1 219.4
2.5.70-mjb1 32 248.9 779.2 226.3
2.5.70-mm3 32 231.3 838.4 224.9
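To put numbers on how the gap closes with load, here is a small sketch that compares plain 2.5.70 against plain 2.5.69 at each task count. The values are copied from the fserver table; the function and variable names are only illustrative.

```python
# Plain 2.5.69 and 2.5.70 rows from the fserver table:
# tasks -> (jobs_per_min, real_seconds).
FSERVER_2_5_69 = {4: (120.9, 200.5), 8: (174.7, 277.5), 32: (234.3, 827.6)}
FSERVER_2_5_70 = {4: (79.0, 306.9), 8: (136.6, 354.9), 32: (239.7, 809.1)}


def percent_change(new, old):
    """Signed change of new relative to old, in percent."""
    return (new - old) / old * 100.0


def compare(baseline, candidate):
    """Return tasks -> (Jobs/Min change %, Real change %) versus the baseline."""
    result = {}
    for tasks in sorted(baseline):
        base_jpm, base_real = baseline[tasks]
        cand_jpm, cand_real = candidate[tasks]
        result[tasks] = (
            percent_change(cand_jpm, base_jpm),
            percent_change(cand_real, base_real),
        )
    return result


for tasks, (jpm_pct, real_pct) in compare(FSERVER_2_5_69, FSERVER_2_5_70).items():
    print(f"{tasks:3d} tasks: Jobs/Min {jpm_pct:+6.1f}%  Real {real_pct:+6.1f}%")
```

At 4 tasks plain 2.5.70 completes about 35% fewer jobs per minute, which is about 53% more elapsed time. At 32 tasks it is about 2% ahead.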

AIM7 shared behaves similarly. At 64 tasks, 2.5.69*
is close to or a little ahead of 2.5.70*.

AIM7 shared workload
kernel Tasks Jobs/Min Real CPU
2.5.69 64 2121.2 175.6 167.5
2.5.69-bk1 64 2096.7 177.7 168.6
2.5.69-mm3 64 2422.3 153.8 179.2
2.5.69-mm5 64 2429.0 153.3 178.3
2.5.70 64 2123.1 175.4 170.0
2.5.70-mjb1 64 2163.9 172.1 175.5
2.5.70-mm3 64 2186.9 170.3 175.7

2.5.69 128 2257.8 329.9 333.9
2.5.69-bk1 128 2269.9 328.2 333.1
2.5.69-mm3 128 2700.6 275.9 352.9
2.5.69-mm5 128 2697.8 276.1 352.8
2.5.70 128 2410.4 309.1 338.2
2.5.70-mjb1 128 2580.1 288.7 354.7
2.5.70-mm3 128 2705.2 275.4 350.6

By 512 tasks, 2.5.70* on AIM7 shared is ahead of 2.5.69*
by 14-20%.

2.5.69 512 2314.7 1287.3 1369.2
2.5.69-bk1 512 2319.4 1284.7 1370.5
2.5.69-mm3 512 2574.1 1157.6 1457.7
2.5.69-mm5 512 2698.3 1104.3 1481.0
2.5.70 512 2788.0 1068.8 1399.1
2.5.70-mjb1 512 2607.6 1142.8 1670.8
2.5.70-mm3 512 3075.9 968.8 1462.8


--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html


2003-06-05 03:21:40

by Andrew Morton

Subject: Re: [BENCHMARK] AIM7 fserver regressed in 2.5.70*

Randy Hron wrote:
>
> Summary:
> AIM7 fileserver workload behaviour changed with 2.5.70.
> At low task counts (load average), 2.5.70* takes 40%
> longer than 2.5.69. As task count increases, regression
> disappears.
>
> Hardware has (4) 700 mhz P3 Xeons.
> 3.75 GB RAM
> RAID 0 LUN (hardware raid)
>
> Background:
> AIM7 fserver is the only regressed workload. In general,
> 2.5.70* has better numbers than 2.5.69* for a variety of
> benchmarks.
>
> Part of the improvement in 2.5.70 I/O benchmarks is
> from a fiber channel configuration change. 2.5.70* has
> two online fiber channels. Earlier kernels had only one
> fiber channel online.
>
> Tiobench and bonnie++ show about 10% improvement.
> LMbench microbenchmarks are generally improving or stable
> in recent 2.5.x.

I'd assume that the improvements would be wholly due to the
IO controller changes? Are you saying that there is something
else involved?

If you could share the jobfile and means-to-reproduce I can
take a look, thanks.

2003-06-05 04:47:25

by Nick Piggin

Subject: Re: [BENCHMARK] AIM7 fserver regressed in 2.5.70*



Randy Hron wrote:

>Summary:
>AIM7 fileserver workload behaviour changed with 2.5.70.
>At low task counts (load average), 2.5.70* takes 40%
>longer than 2.5.69. As task count increases, regression
>disappears.
>
>Hardware has (4) 700 mhz P3 Xeons.
>3.75 GB RAM
>RAID 0 LUN (hardware raid)
>
>Background:
>AIM7 fserver is the only regressed workload. In general,
>2.5.70* has better numbers than 2.5.69* for a variety of
>benchmarks.
>
>
[snip]

>AIM7 fserver workload
>kernel Tasks Jobs/Min Real CPU
>2.5.69 4 120.9 200.5 32.8
>2.5.69-bk1 4 122.3 198.2 33.8
>2.5.69-mm3 4 122.3 198.3 37.9
>2.5.69-mm5 4 124.0 195.5 38.0
>
^^^^^^
I think this was the last kernel Joel tested before a
similar magnitude dropoff in WimMark performance.

>
>2.5.70 4 79.0 306.9 34.2
>2.5.70-mjb1 4 83.4 290.8 33.6
>2.5.70-mm3 4 71.7 338.0 34.9
>2.5.70-mm4 4 73.9 328.0 33.9
>
>

I don't know what sort of disk IO fserver does, but it
could be the same problem.

2003-06-05 10:32:41

by Randy Hron

Subject: Re: [BENCHMARK] AIM7 fserver regressed in 2.5.70*

On Wed, 4 Jun 2003 20:35:09 -0700 Andrew Morton wrote:
> I'd assume that the improvements would be wholly due to the
> IO controller changes? Are you saying that there is something
> else involved?

I don't know. The IO controller changes could be for redundancy,
or for performance. It may be the kernel too. I mention
the controller changes because it's a real hardware difference.

On Thu, 05 Jun 2003 15:00:22 +1000 Nick Piggin wrote:
> I don't know what sort of disk IO fserver does, but it
> could be the same problem.

AIM7 dbase (improved in 2.5.70*) is about 11% synchronous IO,
AIM7 fserver is about 2.5% synchronous IO.
AIM7 dbase is about 15% regular IO, fserver is 30% regular IO.
AIM7 fserver has the additional creat-clo, dir_rtns_1 loads.
AIM7 fserver may be more seeky than the dbase load.

dbase workload:
load percentage of workload
add_int 3.636
add_long 3.636
add_short 3.636
disk_rd 7.273
disk_rr 7.273
div_int 1.818
div_long 1.818
div_short 1.818
jmp_test 1.818
mem_rtns_1 7.273
mem_rtns_2 7.273
mul_int 1.818
mul_long 1.818
mul_short 1.818
page_test 7.273
ram_copy 3.636
shared_memory 7.273
sieve 5.455
sort_rtns_1 5.455
stream_pipe 1.818
string_rtns 5.455
sync_disk_rw 5.455
sync_disk_update 5.455

fserver workload
load percentage of workload
add_int 3.361
add_long 3.361
add_short 3.361
creat-clo 3.361
dir_rtns_1 3.361
disk_cp 5.042
disk_rd 5.042
disk_rr 5.042
disk_rw 5.042
disk_src 5.042
disk_wrt 5.042
div_int 1.681
div_long 1.681
div_short 1.681
jmp_test 1.681
link_test 3.361
mem_rtns_1 6.723
mem_rtns_2 1.681
misc_rtns_1 3.361
mul_int 1.681
mul_long 1.681
mul_short 1.681
ram_copy 3.361
signal_test 1.681
sort_rtns_1 5.042
string_rtns 5.042
sync_disk_cp 0.840
sync_disk_rw 0.840
sync_disk_wrt 0.840
tcp_test 1.681
udp_test 6.723
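The synchronous and regular IO shares quoted above can be reproduced by summing rows of the two load tables. This sketch assumes that "synchronous IO" means the sync_disk_* loads and "regular IO" means the other disk_* loads; that grouping is a guess that happens to match the quoted figures.

```python
# Disk-related percentages copied from the dbase and fserver load tables above.
DBASE_IO = {
    "disk_rd": 7.273, "disk_rr": 7.273,
    "sync_disk_rw": 5.455, "sync_disk_update": 5.455,
}
FSERVER_IO = {
    "disk_cp": 5.042, "disk_rd": 5.042, "disk_rr": 5.042,
    "disk_rw": 5.042, "disk_src": 5.042, "disk_wrt": 5.042,
    "sync_disk_cp": 0.840, "sync_disk_rw": 0.840, "sync_disk_wrt": 0.840,
}


def io_share(loads):
    """Return (synchronous %, regular %) under the assumed name-prefix grouping."""
    sync = sum(p for name, p in loads.items() if name.startswith("sync_disk_"))
    regular = sum(p for name, p in loads.items() if name.startswith("disk_"))
    return round(sync, 2), round(regular, 2)


for label, loads in (("dbase", DBASE_IO), ("fserver", FSERVER_IO)):
    sync, regular = io_share(loads)
    print(f"{label:8s} sync {sync:5.2f}%  regular {regular:5.2f}%")
```

Under that grouping dbase comes out near 11% synchronous and 15% regular, and fserver near 2.5% synchronous and 30% regular.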




> If you could share the jobfile and means-to-reproduce I can
> take a look, thanks.

The underlying filesystem was ext2.


This is the AIM7 fserver workload:
# @(#) workfile.fserver:1.3 1/22/96 00:00:00
# Fileserver Mix
FILESIZE: 10M
POOLSIZE: 20M
20 add_int
20 add_long
20 add_short
20 creat-clo
20 dir_rtns_1
30 disk_cp
30 disk_rd
30 disk_rr
30 disk_rw
30 disk_src
30 disk_wrt
10 div_int
10 div_long
10 div_short
10 jmp_test
20 link_test
40 mem_rtns_1
10 mem_rtns_2
20 misc_rtns_1
10 mul_int
10 mul_long
10 mul_short
20 ram_copy
10 signal_test
30 sort_rtns_1
30 string_rtns
5 sync_disk_cp
5 sync_disk_rw
5 sync_disk_wrt
10 tcp_test
40 udp_test
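The leading numbers in the workfile appear to be relative weights: dividing each by their sum reproduces the fserver percentage table earlier in this message. Here is a sketch of that conversion; the parser is an illustration, not AIM7's own code.

```python
WORKFILE = """\
# Fileserver Mix
FILESIZE: 10M
POOLSIZE: 20M
20 add_int
20 add_long
20 add_short
20 creat-clo
20 dir_rtns_1
30 disk_cp
30 disk_rd
30 disk_rr
30 disk_rw
30 disk_src
30 disk_wrt
10 div_int
10 div_long
10 div_short
10 jmp_test
20 link_test
40 mem_rtns_1
10 mem_rtns_2
20 misc_rtns_1
10 mul_int
10 mul_long
10 mul_short
20 ram_copy
10 signal_test
30 sort_rtns_1
30 string_rtns
5 sync_disk_cp
5 sync_disk_rw
5 sync_disk_wrt
10 tcp_test
40 udp_test
"""


def load_percentages(text):
    """Map each load name to its share of the total weight, in percent."""
    weights = {}
    for line in text.splitlines():
        parts = line.split()
        # Only "<integer> <name>" lines are loads; comments and settings are skipped.
        if len(parts) == 2 and parts[0].isdigit():
            weights[parts[1]] = int(parts[0])
    total = sum(weights.values())
    return {name: 100.0 * weight / total for name, weight in weights.items()}


for name, pct in load_percentages(WORKFILE).items():
    print(f"{name:15s} {pct:6.3f}")
```

The weights sum to 595, so a weight of 20 is 3.361% and a weight of 5 is 0.840%, matching the table.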


The script to run AIM7 looks like this:

#!/bin/bash
# Run fserver once with load 4-32 increment by 4
export input=input.fserver
cat >$input<<EOF
$(hostname)
$(uname -r)-4xP3/3.75GB/SCSI-raid5
1
4
2
32
4
EOF



for w in fserver
do
    echo "AIM7 $w workload"
    cp -p "workfile.${w}" workfile
    # suite7.ss is appended to by default.
    # zero it out before run
    >suite7.ss
    # -N newtonian adapter for crossover
    # -t normal timing adapter
    # -nl no log
    # multitask -N -nl < $input
    SECONDS=0
    multitask -t < "input.${w}"
    echo "AIM7 $w completed in $SECONDS seconds"
    cp -p suite7.ss "aimresults/suite7-${w}-$(uname -r).ss"
done


iozone write throughput for a 64 KB file shows a small regression in 2.5.70.
Rewrite, read, and reread are improved in 2.5.70 at the smallest record sizes.

kernel KB reclen write rewrite read reread
2.5.70 64 4 227773 504003 695900 876083
2.5.70 64 8 309248 556588 810228 914439
2.5.70 64 16 321593 587073 790141 864828
2.5.70 64 32 304694 604016 820049 876206
2.5.70 64 64 239686 537800 842702 941127

kernel KB reclen write rewrite read reread
2.5.69-bk1 64 4 292251 488642 659659 820706
2.5.69-bk1 64 8 316922 547154 810303 913709
2.5.69-bk1 64 16 328129 586978 770969 877370
2.5.69-bk1 64 32 316906 621315 809981 941515
2.5.69-bk1 64 64 287051 639902 852688 970454
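To see the size of the write difference per record size, here is a short sketch that computes the percentage change in the write column between the two 64 KB tables above. The values are copied from those tables; the names are only illustrative.

```python
# Write throughput for the 64 KB file, copied from the tables above: reclen -> value.
WRITE_2_5_70 = {4: 227773, 8: 309248, 16: 321593, 32: 304694, 64: 239686}
WRITE_2_5_69_BK1 = {4: 292251, 8: 316922, 16: 328129, 32: 316906, 64: 287051}


def write_change_percent(new, old):
    """Return reclen -> signed percent change of new relative to old."""
    return {
        reclen: (new[reclen] - old[reclen]) / old[reclen] * 100.0
        for reclen in sorted(old)
    }


for reclen, pct in write_change_percent(WRITE_2_5_70, WRITE_2_5_69_BK1).items():
    print(f"reclen {reclen:3d}: write {pct:+6.1f}%")
```

Write is lower in 2.5.70 at every record size here, from about 2% at reclen 16 to about 22% at reclen 4.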


For a 512 MB file (524288 KB), iozone write regression is smaller.

kernel KB reclen write rewrite read reread
2.5.70 524288 4 153271 236749 311157 320316
2.5.70 524288 8 174955 241503 326604 336004
2.5.70 524288 16 180028 242359 342079 352546
2.5.70 524288 32 176625 235474 350006 360833
2.5.70 524288 64 165770 215484 354577 365812
2.5.70 524288 128 152281 191098 357021 368434
2.5.70 524288 256 141154 173406 353768 365583
2.5.70 524288 512 137654 168346 294428 302423
2.5.70 524288 1024 136510 167359 229127 233676
2.5.70 524288 2048 136923 167848 187121 190085
2.5.70 524288 4096 136880 167633 175851 178419
2.5.70 524288 8192 136938 167755 174985 177523
2.5.70 524288 16384 136868 167449 174699 177208


kernel KB reclen write rewrite read reread
2.5.69-bk1 524288 4 162156 229090 301785 310414
2.5.69-bk1 524288 8 172918 236818 321115 330003
2.5.69-bk1 524288 16 175961 238366 337755 348203
2.5.69-bk1 524288 32 172919 229970 346889 357655
2.5.69-bk1 524288 64 163741 211673 351416 362329
2.5.69-bk1 524288 128 150270 188487 354342 365391
2.5.69-bk1 524288 256 138487 170444 354947 366019
2.5.69-bk1 524288 512 136009 166210 288743 296204
2.5.69-bk1 524288 1024 134791 164132 222318 226508
2.5.69-bk1 524288 2048 134851 163984 184880 187753
2.5.69-bk1 524288 4096 134929 164422 173763 176235
2.5.69-bk1 524288 8192 134810 164287 173071 175598
2.5.69-bk1 524288 16384 134854 164087 173027 175449



--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html