2002-11-24 21:16:32

by Paolo Ciarrocchi

Subject: [Benchmark] AIM results

Hi all,
I've run the AIM benchmark against 2.4.19 and 2.5.49 on my laptop (PIII@800, Reiserfs).
In each pair of lines below, the first line is 2.4.19 and the second is 2.5.49.

add_double 10010 29.4705 530469.53 Thousand Double Precision Additions/second
add_double 10040 29.1833 525298.80 Thousand Double Precision Additions/second

add_float 10000 44.2 530400.00 Thousand Single Precision Additions/second
add_float 10000 43.8 525600.00 Thousand Single Precision Additions/second

add_long 10010 27.2727 1636363.64 Thousand Long Integer Additions/second
add_long 10020 27.0459 1622754.49 Thousand Long Integer Additions/second

add_int 10000 27.2 1632000.00 Thousand Integer Additions/second
add_int 10030 26.9192 1615154.54 Thousand Integer Additions/second

add_short 10000 68.2 1636800.00 Thousand Short Integer Additions/second
add_short 10000 67.6 1622400.00 Thousand Short Integer Additions/second

creat-clo 10020 23.7525 23752.50 File Creations and Closes/second
creat-clo 10030 18.9432 18943.17 File Creations and Closes/second
^^^^^ Here 2.4.19 is faster than 2.5.49

page_test 10000 152.9 259930.00 System Allocations & Pages/second
page_test 10010 128.971 219250.75 System Allocations & Pages/second
^^^^^ Here 2.4.19 is faster than 2.5.49

brk_test 10000 63.5 1079500.00 System Memory Allocations/second
brk_test 10010 53.9461 917082.92 System Memory Allocations/second
^^^^^ Here 2.4.19 is faster than 2.5.49

jmp_test 10000 5308.5 5308500.00 Non-local gotos/second
jmp_test 10000 5261.6 5261600.00 Non-local gotos/second

signal_test 10000 204.5 204500.00 Signal Traps/second
signal_test 10000 146.5 146500.00 Signal Traps/second
^^^^^ Here 2.4.19 is faster than 2.5.49

exec_test 10010 16.6833 83.42 Program Loads/second
exec_test 10030 15.5533 77.77 Program Loads/second
^^^^^ Here 2.4.19 is faster than 2.5.49

fork_test 10000 56.2 5620.00 Task Creations/second
fork_test 10010 32.967 3296.70 Task Creations/second
^^^^^ Here 2.4.19 is faster than 2.5.49

link_test 10000 224.1 14118.30 Link/Unlink Pairs/second
link_test 10000 125.7 7919.10 Link/Unlink Pairs/second
^^^^^ Here 2.4.19 is faster than 2.5.49

disk_rr 10110 7.71513 39501.48 Random Disk Reads (K)/second
disk_rr 10010 7.89211 40407.59 Random Disk Reads (K)/second

disk_rw 10090 6.44202 32983.15 Random Disk Writes (K)/second
disk_rw 10030 7.17846 36753.74 Random Disk Writes (K)/second
^^^^^ Here 2.5.49 is faster than 2.4.19

disk_rd 10010 38.1618 195388.61 Sequential Disk Reads (K)/second
disk_rd 10000 38 194560.00 Sequential Disk Reads (K)/second

disk_wrt 10070 9.63257 49318.77 Sequential Disk Writes (K)/second
disk_wrt 10060 9.94036 50894.63 Sequential Disk Writes (K)/second

disk_cp 10040 8.26693 42326.69 Disk Copies (K)/second
disk_cp 10040 8.06773 41306.77 Disk Copies (K)/second

sync_disk_rw 15080 0.066313 169.76 Sync Random Disk Writes (K)/second
sync_disk_rw 14310 0.0698812 178.90 Sync Random Disk Writes (K)/second
^^^^^ Here 2.5.49 is faster than 2.4.19

sync_disk_wrt 11170 0.0895255 229.19 Sync Sequential Disk Writes (K)/second
sync_disk_wrt 10100 0.0990099 253.47 Sync Sequential Disk Writes (K)/second
^^^^^ Here 2.5.49 is faster than 2.4.19

sync_disk_cp 11020 0.0907441 232.30 Sync Disk Copies (K)/second
sync_disk_cp 10010 0.0999001 255.74 Sync Disk Copies (K)/second
^^^^^ Here 2.5.49 is faster than 2.4.19

disk_src 10000 154.2 11565.00 Directory Searches/second
disk_src 10000 141.8 10635.00 Directory Searches/second

div_double 10000 30 90000.00 Thousand Double Precision Divides/second
div_double 10010 29.7702 89310.69 Thousand Double Precision Divides/second

div_float 10020 30.0399 90119.76 Thousand Single Precision Divides/second
div_float 10020 29.7405 89221.56 Thousand Single Precision Divides/second

div_long 10020 24.5509 22095.81 Thousand Long Integer Divides/second
div_long 10020 24.3513 21916.17 Thousand Long Integer Divides/second

div_int 10020 24.5509 22095.81 Thousand Integer Divides/second
div_int 10030 24.327 21894.32 Thousand Integer Divides/second

div_short 10010 24.5754 22117.88 Thousand Short Integer Divides/second
div_short 10020 24.3513 21916.17 Thousand Short Integer Divides/second

fun_cal 10000 76.9 39372800.00 Function Calls (no arguments)/second
fun_cal 10010 76.2238 39026573.43 Function Calls (no arguments)/second

fun_cal1 10000 209.7 107366400.00 Function Calls (1 argument)/second
fun_cal1 10010 207.792 106389610.39 Function Calls (1 argument)/second

fun_cal2 10000 138.4 70860800.00 Function Calls (2 arguments)/second
fun_cal2 10000 137.2 70246400.00 Function Calls (2 arguments)/second

fun_cal15 10010 41.958 21482517.48 Function Calls (15 arguments)/second
fun_cal15 10000 41.6 21299200.00 Function Calls (15 arguments)/second

sieve 10330 0.871249 4.36 Integer Sieves/second
sieve 10590 0.849858 4.25 Integer Sieves/second

mul_double 10030 26.5204 318245.26 Thousand Double Precision Multiplies/second
mul_double 10030 25.5234 306281.16 Thousand Double Precision Multiplies/second

mul_float 10030 26.5204 318245.26 Thousand Single Precision Multiplies/second
mul_float 10010 25.7742 309290.71 Thousand Single Precision Multiplies/second

mul_long 10000 1166.4 279936.00 Thousand Long Integer Multiplies/second
mul_long 10000 1156.9 277656.00 Thousand Long Integer Multiplies/second

mul_int 10000 1171.6 281184.00 Thousand Integer Multiplies/second
mul_int 10000 1162.1 278904.00 Thousand Integer Multiplies/second

mul_short 10000 934.3 280290.00 Thousand Short Integer Multiplies/second
mul_short 10000 927.4 278220.00 Thousand Short Integer Multiplies/second

num_rtns_1 10000 575.2 57520.00 Numeric Functions/second
num_rtns_1 10000 571.5 57150.00 Numeric Functions/second

trig_rtns 10010 35.2647 352647.35 Trigonometric Functions/second
trig_rtns 10020 34.9301 349301.40 Trigonometric Functions/second

matrix_rtns 10000 7337.6 733760.00 Point Transformations/second
matrix_rtns 10000 7274.4 727440.00 Point Transformations/second

array_rtns 10030 16.8495 336.99 Linear Systems Solved/second
array_rtns 10050 16.6169 332.34 Linear Systems Solved/second

string_rtns 10050 11.1443 1114.43 String Manipulations/second
string_rtns 10060 11.0338 1103.38 String Manipulations/second

mem_rtns_1 10020 32.3353 970059.88 Dynamic Memory Operations/second
mem_rtns_1 10030 30.5085 915254.24 Dynamic Memory Operations/second

mem_rtns_2 10000 2009.5 200950.00 Block Memory Operations/second
mem_rtns_2 10000 1992.3 199230.00 Block Memory Operations/second

sort_rtns_1 10000 41.4 414.00 Sort Operations/second
sort_rtns_1 10010 40.8591 408.59 Sort Operations/second

misc_rtns_1 10000 951.9 9519.00 Auxiliary Loops/second
misc_rtns_1 10000 870 8700.00 Auxiliary Loops/second

dir_rtns_1 10000 105.5 1055000.00 Directory Operations/second
dir_rtns_1 10000 91.1 911000.00 Directory Operations/second
^^^^^ Here 2.4.49 is faster than 2.4.19

shell_rtns_1 10000 31.3 31.30 Shell Scripts/second
shell_rtns_1 10010 28.5714 28.57 Shell Scripts/second

shell_rtns_2 10010 31.3686 31.37 Shell Scripts/second
shell_rtns_2 10030 28.6142 28.61 Shell Scripts/second

shell_rtns_3 10010 31.3686 31.37 Shell Scripts/second
shell_rtns_3 10020 28.6427 28.64 Shell Scripts/second

series_1 10000 39306.1 3930610.00 Series Evaluations/second
series_1 10000 38976.2 3897620.00 Series Evaluations/second

shared_memory 10000 2742.2 274220.00 Shared Memory Operations/second
shared_memory 10000 2360.6 236060.00 Shared Memory Operations/second
^^^^^ Here 2.4.19 is faster than 2.5.49

tcp_test 10000 805.5 72495.00 TCP/IP Messages/second
tcp_test 10000 660.7 59463.00 TCP/IP Messages/second
^^^^^ Here 2.4.19 is faster than 2.5.49 (debug?)

udp_test 10000 1448.6 144860.00 UDP/IP DataGrams/second
udp_test 10000 1115.7 111570.00 UDP/IP DataGrams/second
^^^^^ Here 2.4.19 is faster than 2.5.49 (debug?)

fifo_test 10000 1568.7 156870.00 FIFO Messages/second
fifo_test 10000 1041.6 104160.00 FIFO Messages/second
^^^^^ Here 2.4.19 is faster than 2.5.49

stream_pipe 10000 2807.4 280740.00 Stream Pipe Messages/second
stream_pipe 10000 2602.3 260230.00 Stream Pipe Messages/second
^^^^^ Here 2.4.19 is faster than 2.5.49

dgram_pipe 10000 2756.9 275690.00 DataGram Pipe Messages/second
dgram_pipe 10000 2460.5 246050.00 DataGram Pipe Messages/second
^^^^^ Here 2.4.19 is faster than 2.5.49

pipe_cpy 10000 4164.8 416480.00 Pipe Messages/second
pipe_cpy 10000 3736.4 373640.00 Pipe Messages/second
^^^^^ Here 2.4.19 is faster than 2.5.49

ram_copy 10000 23801.6 595516032.00 Memory to Memory Copy/second
ram_copy 10000 23583 590046660.00 Memory to Memory Copy/second
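
To put numbers on the annotations above, here is a minimal sketch that computes the relative change between the two kernels for a handful of the tests. It assumes that the first line of each pair is 2.4.19, the second is 2.5.49, and that the fourth column is the operations-per-second figure; the function names and the 5% threshold are illustrative choices, not part of AIM.

```python
# Ops/sec (fourth column) for a few tests copied from the results above.
# Assumption: in each pair the first value is 2.4.19, the second 2.5.49.
RESULTS = {
    "creat-clo": (23752.50, 18943.17),
    "fork_test": (5620.00, 3296.70),
    "link_test": (14118.30, 7919.10),
    "disk_rw": (32983.15, 36753.74),
    "sync_disk_cp": (232.30, 255.74),
    "tcp_test": (72495.00, 59463.00),
}


def percent_change(old, new):
    """Relative change of `new` versus `old`, in percent."""
    return (new - old) / old * 100.0


def summarize(results, threshold=5.0):
    """Return (test, change) pairs whose change is at least `threshold` percent."""
    rows = []
    for name, (v24, v25) in results.items():
        change = percent_change(v24, v25)
        if abs(change) >= threshold:
            rows.append((name, change))
    # Largest regressions first, largest improvements last.
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for name, change in summarize(RESULTS):
        print(f"{name:14s} {change:+6.1f}%")
```

A negative value means 2.5.49 is slower than 2.4.19 on that test; a single run per kernel says nothing about run-to-run noise.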

Ciao,
Paolo
--
______________________________________________
http://www.linuxmail.org/
Now with POP3/IMAP access for only US$19.95/yr

Powered by Outblaze


2002-11-25 10:38:49

by Andrew Morton

Subject: Re: [Benchmark] AIM results

Paolo Ciarrocchi wrote:
>
> Hi all,
> I've run the AIM benchmark against 2.4.19 and 2.5.49 on my laptop (PIII@800, Reiserfs)

AIM9, I assume.

It's a rather dumb benchmark, but fun. Lots of really tiny
microbenchmarks, easy to see what's going on.

Some of the codepaths which it touches in 2.5 have had new stuff
added to them - things like the pagecache read and write functions
have additional setup for readv and writev (which got sped up tons,
but is not tested here). And additional layering for AIO support.

These things hurt when your benchmark is timing how long it takes to
write to a file in 1 kbyte chunks. This is not really a thing which
we should optimise for.
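
As a rough illustration of why small writes expose per-call overhead, the sketch below writes the same amount of data in 1 kbyte chunks and in larger chunks and times both. It is not AIM9 code; `timed_write` and the sizes used are assumptions chosen for the example, and the Python interpreter adds its own overhead on top of the kernel's.

```python
import os
import tempfile
import time


def timed_write(total_bytes, chunk_size):
    """Write total_bytes to a scratch file in chunk_size pieces.

    Returns (bytes_written, elapsed_seconds).
    """
    chunk = b"x" * chunk_size
    fd, path = tempfile.mkstemp()
    written = 0
    try:
        start = time.perf_counter()
        while written < total_bytes:
            # os.write returns the number of bytes actually written.
            written += os.write(fd, chunk[: total_bytes - written])
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return written, elapsed


if __name__ == "__main__":
    total = 8 * 1024 * 1024
    for size in (1024, 64 * 1024):
        _, seconds = timed_write(total, size)
        calls = -(-total // size)  # ceiling division
        print(f"chunk={size:6d}  write calls={calls:5d}  {seconds:.4f}s")
```

With 1 kbyte chunks the fixed cost of each write call is paid 64 times as often as with 64 kbyte chunks, which is why extra setup in the write path shows up so clearly in this kind of test.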

But still, there's some stuff here which can be fixed; let's go through
it.

> creat-clo 10020 23.7525 23752.50 File Creations and Closes/second
> creat-clo 10030 18.9432 18943.17 File Creations and Closes/second
> ^^^^^ Here 2.4.19 is faster than 2.5.49

This loops, creating and deleting a file. Adding some optimisation
to handle the removal of a zero-length file seems worthwhile. That
sped it up by 30%-odd.
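
For readers who have not looked at the test, a loop of this shape can be sketched as follows. This is only an approximation of what such a test does, not the AIM9 source; the function name, file name and iteration count are assumptions.

```python
import os
import tempfile
import time


def creat_close_loop(directory, iterations):
    """Create, close and unlink one zero-length file `iterations` times.

    Returns the achieved rate in create/close/unlink cycles per second.
    """
    path = os.path.join(directory, "scratch")
    start = time.perf_counter()
    for _ in range(iterations):
        fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o600)
        os.close(fd)
        # The file never receives data, so this removes a zero-length file.
        os.unlink(path)
    elapsed = time.perf_counter() - start
    return iterations / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(f"{creat_close_loop(workdir, 10000):.0f} cycles/second")
```

Every iteration removes a file that never held any data, which is the case the zero-length optimisation targets.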

> page_test 10000 152.9 259930.00 System Allocations & Pages/second
> page_test 10010 128.971 219250.75 System Allocations & Pages/second
> ^^^^^ Here 2.4.19 is faster than 2.5.49

This found a bug in 2.5. The deferred-lru-addition queues were preventing
the hot-n-cold page queues from working right. I fixed that up and it's
almost running at the same speed as 2.4. But the code paths are longer...

This is an area where smp optimisations cost uniprocessors a little
bit. We're slightly slower on UP, but when running four instances of
this test on 4-CPU, 2.5 is more than twice the speed of 2.4.

We are still taking a very big hit in vm_enough_memory's call to
get_page_state(). Changing the overcommit mode can further speed
up 2.5, but we need to fix this properly somehow.

> brk_test 10000 63.5 1079500.00 System Memory Allocations/second
> brk_test 10010 53.9461 917082.92 System Memory Allocations/second
> ^^^^^ Here 2.4.19 is faster than 2.5.49

Same deal as page_test.

> signal_test 10000 204.5 204500.00 Signal Traps/second
> signal_test 10000 146.5 146500.00 Signal Traps/second
> ^^^^^ Here 2.4.19 is faster than 2.5.49

Mostly fixed in 2.5.49-mm1

> exec_test 10010 16.6833 83.42 Program Loads/second
> exec_test 10030 15.5533 77.77 Program Loads/second
> ^^^^^ Here 2.4.19 is faster than 2.5.49

Can't do a lot about that.

> fork_test 10000 56.2 5620.00 Task Creations/second
> fork_test 10010 32.967 3296.70 Task Creations/second
> ^^^^^ Here 2.4.19 is faster than 2.5.49

Or that.

> link_test 10000 224.1 14118.30 Link/Unlink Pairs/second
> link_test 10000 125.7 7919.10 Link/Unlink Pairs/second
> ^^^^^ Here 2.4.19 is faster than 2.5.49

I was seeing that difference for a while, then it went away. I
suspect something is broken in the test. In particular, it seems
to leave files behind in the test directory, so subsequent runs
have to search past more files.
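
One way to rule out leftover files as a factor is to run the link/unlink pairs in a fresh directory and check what remains afterwards. The sketch below does that; it is a hypothetical stand-in for the test, not the AIM9 code, and the names used are assumptions.

```python
import os
import tempfile


def link_unlink_pairs(iterations):
    """Run link/unlink pairs in a private directory; return leftover names."""
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "target")
        alias = os.path.join(workdir, "alias")
        # Create the file that every hard link will point at.
        with open(target, "w"):
            pass
        for _ in range(iterations):
            os.link(target, alias)
            os.unlink(alias)
        # If the loop cleans up after itself, only the target remains.
        leftovers = sorted(os.listdir(workdir))
    return leftovers


if __name__ == "__main__":
    print(link_unlink_pairs(1000))
```

Starting every run from an empty directory keeps the directory size constant, so later runs are not penalised by having to search past files from earlier ones.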

I sped ext2 up a bit for this, but there's not much difference
from 2.4.

> ...
>
> dir_rtns_1 10000 105.5 1055000.00 Directory Operations/second
> dir_rtns_1 10000 91.1 911000.00 Directory Operations/second
> ^^^^^ Here 2.4.49 is faster than 2.4.19

No, 2.4 was faster than 2.5. The uninlining of the usercopy
functions costs a bit here.

The code in fs/readdir.c was quite inefficient, and buggy - lots
of unchecked copy_to_users. I fixed all that up and sped it up
by 50%.


> shell_rtns_1 10000 31.3 31.30 Shell Scripts/second
> shell_rtns_1 10010 28.5714 28.57 Shell Scripts/second
>
> shell_rtns_2 10010 31.3686 31.37 Shell Scripts/second
> shell_rtns_2 10030 28.6142 28.61 Shell Scripts/second
>
> shell_rtns_3 10010 31.3686 31.37 Shell Scripts/second
> shell_rtns_3 10020 28.6427 28.64 Shell Scripts/second
>
> series_1 10000 39306.1 3930610.00 Series Evaluations/second
> series_1 10000 38976.2 3897620.00 Series Evaluations/second
>
> shared_memory 10000 2742.2 274220.00 Shared Memory Operations/second
> shared_memory 10000 2360.6 236060.00 Shared Memory Operations/second
> ^^^^^ Here 2.4.19 is faster than 2.5.49

20% of the cost here is just the syscall entry code. This is
a nanobenchmark.

We added some scalability changes here and yes, uniprocessors
have taken a 5% hit from that. But running four instances of
this test on 4-way, 2.5 is 250% faster than 2.4.

> tcp_test 10000 805.5 72495.00 TCP/IP Messages/second
> tcp_test 10000 660.7 59463.00 TCP/IP Messages/second
> ^^^^^ Here 2.4.19 is faster than 2.5.49 (debug?)
>
> udp_test 10000 1448.6 144860.00 UDP/IP DataGrams/second
> udp_test 10000 1115.7 111570.00 UDP/IP DataGrams/second
> ^^^^^ Here 2.4.19 is faster than 2.5.49 (debug?)

Not sure what's going on here, really. Lots of tiny TCP and UDP
copies to localhost. The profiles are splattered all over the place.
Networking just generally seems to have increased its cache footprint.

> fifo_test 10000 1568.7 156870.00 FIFO Messages/second
> fifo_test 10000 1041.6 104160.00 FIFO Messages/second
> ^^^^^ Here 2.4.19 is faster than 2.5.49

Not on my machine. 2.5.49-mm1 is 20% faster than 2.4. This is the
sort of thing which shows much variability.
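
When a test is this noisy, it helps to repeat each measurement and look at the spread before calling one kernel faster. The sketch below is a crude heuristic under stated assumptions: the sample values are made up for illustration, and `clearly_different` with its two-sigma cutoff is not a proper statistical test.

```python
import statistics


def spread(samples):
    """Return (mean, relative standard deviation) for repeated runs."""
    mean = statistics.mean(samples)
    return mean, statistics.stdev(samples) / mean


def clearly_different(runs_a, runs_b, sigmas=2.0):
    """True if the means differ by more than `sigmas` times the larger stdev."""
    gap = abs(statistics.mean(runs_a) - statistics.mean(runs_b))
    noise = max(statistics.stdev(runs_a), statistics.stdev(runs_b))
    return gap > sigmas * noise


if __name__ == "__main__":
    # Hypothetical ops/sec figures from three runs on each kernel.
    kernel_a = [90.0, 100.0, 110.0]
    kernel_b = [95.0, 105.0, 115.0]
    print(spread(kernel_a))
    print(clearly_different(kernel_a, kernel_b))
```

With samples like these the difference between the means is smaller than the run-to-run noise, so a single run per kernel could point either way.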

2002-11-25 10:47:33

by bert hubert

Subject: Re: [Benchmark] AIM results

On Mon, Nov 25, 2002 at 02:45:54AM -0800, Andrew Morton wrote:


> > tcp_test 10000 805.5 72495.00 TCP/IP Messages/second
> > tcp_test 10000 660.7 59463.00 TCP/IP Messages/second
> > ^^^^^ Here 2.4.19 is faster than 2.5.49 (debug?)
> >
> > udp_test 10000 1448.6 144860.00 UDP/IP DataGrams/second
> > udp_test 10000 1115.7 111570.00 UDP/IP DataGrams/second
> > ^^^^^ Here 2.4.19 is faster than 2.5.49 (debug?)
>
> Not sure what's going on here, really. Lots of tiny TCP and UDP
> copies to localhost. The profiles are splattered all over the place.
> Networking just generally seems to have increased its cache footprint.

Dave has said there is debugging code in 2.5 that slows down traffic on
localhost.

Regards,

bert

--
http://www.PowerDNS.com Versatile DNS Software & Services
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO

2002-11-25 10:59:37

by Andrew Morton

Subject: Re: [Benchmark] AIM results

bert hubert wrote:
>
> On Mon, Nov 25, 2002 at 02:45:54AM -0800, Andrew Morton wrote:
>
> > > tcp_test 10000 805.5 72495.00 TCP/IP Messages/second
> > > tcp_test 10000 660.7 59463.00 TCP/IP Messages/second
> > > ^^^^^ Here 2.4.19 is faster than 2.5.49 (debug?)
> > >
> > > udp_test 10000 1448.6 144860.00 UDP/IP DataGrams/second
> > > udp_test 10000 1115.7 111570.00 UDP/IP DataGrams/second
> > > ^^^^^ Here 2.4.19 is faster than 2.5.49 (debug?)
> >
> > Not sure what's going on here, really. Lots of tiny TCP and UDP
> > copies to localhost. The profiles are splattered all over the place.
> > Networking just generally seems to have increased its cache footprint.
>
> Dave has said there is debugging code in 2.5 that slows down traffic on
> localhost.

Yes, but that is not apparent in the profiles with this test. It
really is just all over the map. It looks like just longer code
paths touching more memory.

Here's the uniproc profile for tcp_test. This sends small packets
to localhost - various sizes up to 512 bytes. The instruction-level
profile showed nothing obvious either.



c026cc5c 81 0.952493 tcp_v4_send_check
c011f06c 83 0.976011 mod_timer
c02593e0 84 0.98777 ip_output
c0248344 86 1.01129 skb_clone
c026870c 86 1.01129 __tcp_select_window
c026d934 88 1.03481 tcp_v4_do_rcv
c010ef20 95 1.11712 do_gettimeofday
c013ea2c 95 1.11712 vfs_write
c024c13c 95 1.11712 net_rx_action
c026bb78 97 1.14064 __tcp_v4_lookup_established
c01cad6c 100 1.17592 __copy_user_intel
c0247f68 106 1.24647 skb_head_to_pool
c02480ec 106 1.24647 skb_headerinit
c0256e90 109 1.28175 ip_local_deliver
c011c5e0 111 1.30527 do_softirq
c025d6b4 111 1.30527 tcp_push
c0247fb8 118 1.38758 alloc_skb
c01cae10 119 1.39934 __copy_user_zeroing_intel
c0259148 119 1.39934 ip_finish_output2
c0264288 120 1.4111 tcp_ack
c02684d0 127 1.49341 tcp_write_xmit
c010a99c 137 1.61101 system_call
c024b998 137 1.61101 dev_queue_xmit
c0247f18 151 1.77563 skb_head_from_pool
c0250fac 151 1.77563 eth_type_trans
c024bf14 153 1.79915 netif_receive_skb
c01fff88 163 1.91675 loopback_xmit
c0263d18 167 1.96378 tcp_clean_rtx_queue
c0130b4c 178 2.09313 kmalloc
c024bc8c 178 2.09313 netif_rx
c024c03c 182 2.14017 process_backlog
c0266290 203 2.38711 tcp_rcv_established
c0256fec 239 2.81044 ip_rcv
c026da7c 251 2.95155 tcp_v4_rcv
c0259458 287 3.37488 ip_queue_xmit
c025f46c 309 3.63358 tcp_recvmsg
c025e27c 317 3.72766 tcp_sendmsg
c02674c4 376 4.42145 tcp_transmit_skb
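
The "all over the map" observation can be checked against the listing above with a short script. This sketch assumes the four columns are address, sample count, percentage of all samples, and symbol name; the grouping by name prefix is an illustrative choice, and only the symbols shown above are included.

```python
PROFILE = """\
c026cc5c 81 0.952493 tcp_v4_send_check
c011f06c 83 0.976011 mod_timer
c02593e0 84 0.98777 ip_output
c0248344 86 1.01129 skb_clone
c026870c 86 1.01129 __tcp_select_window
c026d934 88 1.03481 tcp_v4_do_rcv
c010ef20 95 1.11712 do_gettimeofday
c013ea2c 95 1.11712 vfs_write
c024c13c 95 1.11712 net_rx_action
c026bb78 97 1.14064 __tcp_v4_lookup_established
c01cad6c 100 1.17592 __copy_user_intel
c0247f68 106 1.24647 skb_head_to_pool
c02480ec 106 1.24647 skb_headerinit
c0256e90 109 1.28175 ip_local_deliver
c011c5e0 111 1.30527 do_softirq
c025d6b4 111 1.30527 tcp_push
c0247fb8 118 1.38758 alloc_skb
c01cae10 119 1.39934 __copy_user_zeroing_intel
c0259148 119 1.39934 ip_finish_output2
c0264288 120 1.4111 tcp_ack
c02684d0 127 1.49341 tcp_write_xmit
c010a99c 137 1.61101 system_call
c024b998 137 1.61101 dev_queue_xmit
c0247f18 151 1.77563 skb_head_from_pool
c0250fac 151 1.77563 eth_type_trans
c024bf14 153 1.79915 netif_receive_skb
c01fff88 163 1.91675 loopback_xmit
c0263d18 167 1.96378 tcp_clean_rtx_queue
c0130b4c 178 2.09313 kmalloc
c024bc8c 178 2.09313 netif_rx
c024c03c 182 2.14017 process_backlog
c0266290 203 2.38711 tcp_rcv_established
c0256fec 239 2.81044 ip_rcv
c026da7c 251 2.95155 tcp_v4_rcv
c0259458 287 3.37488 ip_queue_xmit
c025f46c 309 3.63358 tcp_recvmsg
c025e27c 317 3.72766 tcp_sendmsg
c02674c4 376 4.42145 tcp_transmit_skb
"""


def parse_profile(text):
    """Parse 'address samples percent symbol' lines into tuples."""
    rows = []
    for line in text.strip().splitlines():
        _address, samples, percent, symbol = line.split()
        rows.append((symbol, int(samples), float(percent)))
    return rows


def share_by_group(rows):
    """Sum the percentage column per name prefix (tcp_, ip_, skb_, other)."""
    groups = {"tcp": 0.0, "ip": 0.0, "skb": 0.0, "other": 0.0}
    for symbol, _samples, percent in rows:
        name = symbol.lstrip("_")
        for prefix in ("tcp", "ip", "skb"):
            if name.startswith(prefix + "_"):
                groups[prefix] += percent
                break
        else:
            groups["other"] += percent
    return groups


if __name__ == "__main__":
    rows = parse_profile(PROFILE)
    top = max(rows, key=lambda row: row[2])
    print(f"hottest symbol: {top[0]} at {top[2]:.2f}%")
    for group, percent in share_by_group(rows).items():
        print(f"{group:6s} {percent:6.2f}%")
```

No single symbol in the listing reaches 5% of the samples, which is consistent with the cost being spread across many functions rather than concentrated in one hot spot.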

2002-11-25 11:34:15

by Paolo Ciarrocchi

Subject: Re: [Benchmark] AIM results

From: Andrew Morton <[email protected]>
> Paolo Ciarrocchi wrote:
> >
> > Hi all,
> > I've run the AIM benchmark against 2.4.19 and 2.5.49 on my laptop (PIII@800, Reiserfs)
>
> AIM9, I assume.

Yes

> It's a rather dumb benchmark, but fun. Lots of really tiny
> microbenchmarks, easy to see what's going on.

I can run it for every 2.5.* kernel Linus releases.
Do you think that is a good idea, or just a waste of time?

Paolo



2002-11-25 17:30:35

by Timothy D. Witham

Subject: Re: [Benchmark] AIM results

I think that as a point of comparison it has value. That is
one of the reasons that these are automatically run on STP.


http://www.osdl.org/cgi-bin/stp.cgi?modulename=stp&command=search&sort=%21Completion_date&d_Status=completed&d_Distro_tag_uid=&d_Patch_tag=&d_Test_uid=aim9&d_Host_uid=&d_Project_uid=&op=Search

(Sorry for the long URL)


However, it would be nice to see the code updated for a
different environment than the assumed VAX 11/780, which
was the original baseline for the test: block sizes, and
maybe things like changing fakeh.tar to something like
the 2.4.19 kernel. This has been discussed a couple of
times; I think it will just take somebody to lead
it. If you need equipment to test this on larger machines,
we have some available.

Tim

On Mon, 2002-11-25 at 03:41, Paolo Ciarrocchi wrote:
> From: Andrew Morton <[email protected]>
> > Paolo Ciarrocchi wrote:
> > >
> > > Hi all,
> > > I've run the AIM benchmark against 2.4.19 and 2.5.49 on my laptop (PIII@800, Reiserfs)
> >
> > AIM9, I assume.
>

> Yes
>
> > It's a rather dumb benchmark, but fun. Lots of really tiny
> > microbenchmarks, easy to see what's going on.
>
> I can run it for every 2.5.* kernel Linus releases.
> Do you think that is a good idea, or just a waste of time?
>
> Paolo
--
Timothy D. Witham - Lab Director - [email protected]
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x11 (office) (503)-702-2871 (cell)
(503)-626-2436 (fax)

2002-11-28 17:11:53

by Bill Davidsen

Subject: Re: [Benchmark] AIM results

On Mon, 25 Nov 2002, Paolo Ciarrocchi wrote:

> I can run it for every 2.5.* kernel Linus releases.
> Do you think that is a good idea, or just a waste of time?

As someone who worries about IPC latency (more than speed) I think these
are useful numbers, if only to give some suggestions to kernel developers
who want to get the last bit out and will take them as a challenge.

The VM stuff in 2.5 is slightly slower, and there is not much to be done
there; hopefully, in the real world, that is balanced by more stable
performance under heavy load.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.