My previous testing with unix sockets prompted me to do a few lmbench runs with
2.4.19 and 2.5.65. The results have me a bit concerned, as there is no area
where 2.5 is faster and several where it is significantly slower.
In particular:
stat is 8 times worse
open/close are 7 times worse
fork is twice as expensive
tcp latency is 5 times worse
file deletion and mmap are both twice as expensive
tcp bandwidth is 5 times worse
Optimizing for multiple processors and heavy loads is nice, but this looks like
it's happening at the cost of basic performance. Is this really the route we
should be taking?
L M B E N C H 2 . 0 S U M M A R Y
------------------------------------
Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host      OS            Mhz null null      open selct sig  sig  fork exec sh
                            call I/O  stat clos TCP   inst hndl proc proc proc
--------- ------------- --- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
doug      Linux 2.5.65  750 0.38 0.61 39.8 42.1       1.07 5.29 424. 2378 20.K
doug      Linux 2.5.65  750 0.38 0.54 40.2 44.2       1.07 5.31 439. 2386 20.K
doug      Linux 2.4.19  750 0.37 0.52 5.21 6.78  36.7 0.93 3.59 197. 1472 15.K
Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host      OS            2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
doug      Linux 2.5.65  1.790 3.0300  118.7   46.1  158.3    46.5   158.2
doug      Linux 2.5.65  1.950 2.9800  122.6   46.3  159.5    47.1   158.7
doug      Linux 2.4.19  1.690 2.6700   92.9   44.4  155.2    45.0   155.8
*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host      OS            2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
doug      Linux 2.5.65  1.790 8.926 16.3  29.7  60.6 171.5 204.6 216.
doug      Linux 2.5.65  1.950 9.695 18.1  28.6  59.8 173.4 207.0 212.
doug      Linux 2.4.19  1.690 6.146 12.4  17.8  44.2  26.2  66.6 101.
File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host      OS            0K File       10K File      Mmap    Prot  Page
                        Create Delete Create Delete Latency Fault Fault
--------- ------------- ------ ------ ------ ------ ------- ----- -----
doug      Linux 2.5.65   110.2   65.0  242.5  100.7  3130.0 0.621 4.00000
doug      Linux 2.5.65   110.1   63.5  237.2   96.6  3284.0 0.741 4.00000
doug      Linux 2.4.19    82.5   32.4  187.5   47.9  1660.0 1.177 3.00000
*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host      OS            Pipe AF   TCP  File   Mmap   Bcopy  Bcopy  Mem  Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
doug      Linux 2.5.65  167. 94.7 14.3  212.5  354.8  214.5  215.9 474. 328.4
doug      Linux 2.5.65  175. 86.3 14.2  216.3  354.1  211.4  210.9 474. 328.8
doug      Linux 2.4.19  220. 108. 86.4  238.2  369.1  215.5  215.0 496. 328.0
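For anyone who wants to sanity-check the stat and open/close columns above
without doing a full lmbench run, a hand-rolled loop along the following lines
should show the same difference between the two kernels. This is only an
illustrative sketch, not lmbench's own code; the default path and the
iteration count are arbitrary.

/* syslat.c - rough sketch of a stat()/open()+close() latency loop,
 * in the spirit of lmbench's "stat" and "open clos" columns.
 * Illustrative only; not lmbench code.  Build: gcc -O2 -o syslat syslat.c
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>

static double now_usec(void)
{
	struct timeval tv;
	gettimeofday(&tv, NULL);
	return tv.tv_sec * 1e6 + tv.tv_usec;
}

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "/etc/hosts";
	const int iters = 100000;
	struct stat st;
	double t0;
	int i, fd;

	t0 = now_usec();
	for (i = 0; i < iters; i++)
		stat(path, &st);                 /* stat() latency */
	printf("stat:       %.2f usec\n", (now_usec() - t0) / iters);

	t0 = now_usec();
	for (i = 0; i < iters; i++) {
		fd = open(path, O_RDONLY);       /* open()+close() latency */
		close(fd);
	}
	printf("open+close: %.2f usec\n", (now_usec() - t0) / iters);
	return 0;
}

Run it a few times on each kernel and take the best result; the per-call
numbers should land in the same ballpark as the table above.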
--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]
On Sat, 2003-03-22 at 16:11, Chris Friesen wrote:
> My previous testing with unix sockets prompted me to do a few lmbench runs with
> 2.4.19 and 2.5.65. The results have me a bit concerned, as there is no area
> where 2.5 is faster and several where it is significantly slower.
Are you building both with SMP off and preempt off? Also both with APM/ACPI off?
On Sat, Mar 22, 2003 at 11:11:14AM -0500, Chris Friesen wrote:
> My previous testing with unix sockets prompted me to do a few lmbench runs
> with 2.4.19 and 2.5.65. The results have me a bit concerned, as there is
> no area where 2.5 is faster and several where it is significantly slower.
> In particular:
> stat is 8 times worse
> open/close are 7 times worse
> fork is twice as expensive
> tcp latency is 5 times worse
> file deletion and mmap are both twice as expensive
> tcp bandwidth is 5 times worse
> Optimizing for multiple processors and heavy loads is nice, but this looks
> like it's happening at the cost of basic performance. Is this really the
> route we should be taking?
These aren't terribly informative without profiles (esp. cache perfctrs).
TCP to localhost was explained to me as some excess checksumming that
will eventually get removed before 2.6.0.
It's unclear why open()/close()/stat()/unlink() should be any different.
fork() is just rmap stuff. Try 2.5.65-mm2 and 2.5.65-mm3.
-- wli
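To isolate just the fork cost being discussed above, a bare fork+exit+wait
loop is enough. The sketch below is illustrative only (it is not lmbench's
lat_proc), and the iteration count is arbitrary.

/* forklat.c - bare fork+exit+wait loop, roughly what lmbench's
 * "fork proc" column measures.  Illustrative sketch only.
 * Build: gcc -O2 -o forklat forklat.c
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/time.h>

int main(void)
{
	const int iters = 2000;
	struct timeval t0, t1;
	double usec;
	int i;

	gettimeofday(&t0, NULL);
	for (i = 0; i < iters; i++) {
		pid_t pid = fork();
		if (pid < 0) {
			perror("fork");
			return 1;
		}
		if (pid == 0)
			_exit(0);               /* child exits immediately */
		waitpid(pid, NULL, 0);          /* parent reaps it */
	}
	gettimeofday(&t1, NULL);

	usec = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
	printf("fork+exit+wait: %.1f usec\n", usec / iters);
	return 0;
}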
> My previous testing with unix sockets prompted me to do a few lmbench
> runs with 2.4.19 and 2.5.65. The results have me a bit concerned, as
> there is no area where 2.5 is faster and several where it is
> significantly slower.
>
> In particular:
>
> stat is 8 times worse
> open/close are 7 times worse
> fork is twice as expensive
> tcp latency is 5 times worse
> file deletion and mmap are both twice as expensive
> tcp bandwidth is 5 times worse
>
> Optimizing for multiple processors and heavy loads is nice, but this
> looks like it's happening at the cost of basic performance. Is this
> really the route we should be taking?
I think you're jumping to conclusions about what causes this - let's
actually try to find the real root cause. These things have many different
causes ... for instance, rmap has been found to be a problem in some
workloads (especially things like the fork stuff). If you want to
try 65-mjb1 with and without the shared pagetable stuff, you
may get some different results. (if you have stability problems, try
doing a patch -p1 -R of 400-shpte, it seems a little fragile right now).
http://www.kernel.org/pub/linux/kernel/people/mbligh/2.5.65/
Also, if you can get kernel profiles for each test, that'd help to work
out the root cause.
M.
If someone wants to go through individual lmbench metrics
and find regression points, I have some data that I believe
is mostly very good.
There is lmbench info for a lot of 2.4 and 2.5 kernels in
these pages:
http://home.earthlink.net/~rwhron/kernel/k6-2-475.html
http://home.earthlink.net/~rwhron/kernel/old-k6-2-475.html
They are from 2 different Linux OS's, but the same piece
of hardware. It would be best not to combine them because
of the OS differences.
If anyone feels like grabbing any of the data in my web
pages and graphing it, feel free to do so.
If you have any specific questions or want even more
data/background let me know. I'd love for the data
to be more useful.
There is another page with a slew of quad xeon benchmarks.
http://home.earthlink.net/~rwhron/kernel/bigbox.html
--
Randy Hron
In article <[email protected]>,
Chris Friesen <[email protected]> wrote:
>
>My previous testing with unix sockets prompted me to do a few lmbench runs with
>2.4.19 and 2.5.65. The results have me a bit concerned, as there is no area
>where 2.5 is faster and several where it is significantly slower.
Try it with a modern library (like the one in the RH phoebe beta), and
you'll see that system calls have sped up by a factor of two. Even on UP.
But there's certainly something wrong with your open/close/stat numbers.
I don't see anywhere _near_ those kinds of differences, and there are no
real SMP locking issues there either. Are you sure you're testing the
same setup?
Oh, and the TCP bandwidth thing is at least partly due to the fact that TCP
loopback does extra copies due to debugging code being enabled.
Linus
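For the loopback TCP bandwidth point, something as simple as the following
makes it easy to compare kernels with and without the extra loopback
copies/checksumming mentioned above. It is an illustrative sketch only; the
port number, chunk size, and transfer size are arbitrary, and it is nowhere
near as careful as lmbench's bw_tcp.

/* tcpbw.c - quick loopback TCP bandwidth check, loosely analogous to
 * lmbench's "TCP" bandwidth column: a forked child streams data to the
 * parent over 127.0.0.1 and we report MB/s.  Illustrative sketch only.
 * Build: gcc -O2 -o tcpbw tcpbw.c
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define PORT     5001              /* arbitrary test port   */
#define CHUNK    (64 * 1024)       /* per-write/read size   */
#define TOTAL    (256LL << 20)     /* bytes to move: 256 MB */

static char buf[CHUNK];

int main(void)
{
	struct sockaddr_in addr;
	struct timeval t0, t1;
	long long moved;
	int lfd, sfd, one = 1;
	double secs;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(PORT);
	addr.sin_addr.s_addr = inet_addr("127.0.0.1");

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lfd, 1) < 0) {
		perror("bind/listen");
		return 1;
	}

	if (fork() == 0) {
		/* child: connect and stream TOTAL bytes */
		int cfd = socket(AF_INET, SOCK_STREAM, 0);
		if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
			_exit(1);
		for (moved = 0; moved < TOTAL; ) {
			ssize_t n = write(cfd, buf, CHUNK);
			if (n <= 0)
				_exit(1);
			moved += n;
		}
		close(cfd);
		_exit(0);
	}

	/* parent: accept the connection and time the reads */
	sfd = accept(lfd, NULL, NULL);
	gettimeofday(&t0, NULL);
	for (moved = 0; moved < TOTAL; ) {
		ssize_t n = read(sfd, buf, CHUNK);
		if (n <= 0)
			break;
		moved += n;
	}
	gettimeofday(&t1, NULL);
	wait(NULL);

	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
	printf("loopback TCP: %.1f MB/s\n", moved / (1024.0 * 1024.0) / secs);
	return 0;
}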