2002-04-02 17:16:03

by Ed Sweetman

Subject: some more nifty benchmarks

benchmark url: http://www.gardena.net/benno/linux/audio/

The jam2 patch: http://giga.cps.unizar.es/~magallon/linux/kernel/

command used: ./do_tests none 3 256 0 1400000000

System ram: Mem: 661925888

Both kernels use the same config on the same system, run with the same
apps open at the time of the test.

Only one real issue is present. Patching preempt onto jam2 had big issues
with the new scheduler, so I had to remove all the preempting in sched.c.
This sounds like it would disable preemption altogether, but I did it
anyway in hopes that something still preempts. Either way it didn't
hurt anything, and the worst-case scenario is that it acts just like jam2
without any preempt patch applied.

The results are quite interesting.
http://safemode.homeip.net/2.4.19-pre4-ac3-preempt/3x256.html

http://safemode.homeip.net/2.4.19-pre5-jam2-preempt/3x256.html

Max Latency:

As you can see, procfs latency has increased 2x with the jam2 patch.
The jam2 patch uses AA's new VM patches and low-latency patches. Since
AA's patches are mostly scheduler and VM changes, it seems more likely
that something in pre5 is hurting procfs performance, although the
changelog is so cluttered with the email addresses of every single
submission that it's difficult to tell at a glance whether any fs/procfs
changes were made.

Anyway, besides that anomaly, we see the expected improvements over
non-low-latency kernels with the jam2 patch: 696ms max latency drops to
80ms. That's quite amazing, especially on ext3, which creates even more
overhead.

Disk copy also shows nearly a 3x improvement with the low-latency
patches found in jam2.

And here's the one everyone probably already expects to be the best: the
read-lowlatency-2 patch takes 738ms down to 16.3ms.

Basically this benchmark tells us nothing surprising or unexpected,
except for the procfs slowdown. Perhaps that was just a fluke in the
couple of times I ran it; I'll run it again. Can anyone comment on this
change?


Avg. Latency:

We usually only care about max latency because that's what causes
noticeable latency for users.

What we see with average latency is that, across the board, the
pre4-ac3-preempt kernel spends more of its time within +/- 1ms latency
than the pre5-jam2-preempt kernel. So even though the ac3 kernel has a
higher peak than the jam2 kernel in most places, it deviates from
+/- 1ms less often than jam2 does. Obviously those needing realtime only
care about their max latency, but there has to be something said for the
non-low-latency kernel spending more of its time within +/- 1ms than
the low-latency kernel.
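
For clarity, those are the two numbers the result pages report per test:
the worst single delay (max latency) and the percentage of samples that
stayed within 1ms/2ms (the "factor=" lines). A minimal sketch of how
both fall out of a series of per-fragment latency samples (this is not
Benno's actual latencytest source, and the sample values are made up):

#include <stdio.h>

int main(void)
{
    /* hypothetical per-fragment latency samples, in milliseconds */
    double samples[] = { 0.4, 0.9, 1.2, 0.3, 16.3, 0.7, 1.8, 0.5 };
    int n = sizeof(samples) / sizeof(samples[0]);
    double max = 0.0;
    int within_1ms = 0, within_2ms = 0;
    int i;

    for (i = 0; i < n; i++) {
        if (samples[i] > max)       /* worst-case (max) latency */
            max = samples[i];
        if (samples[i] <= 1.0)      /* samples within +/- 1ms */
            within_1ms++;
        if (samples[i] <= 2.0)      /* samples within +/- 2ms */
            within_2ms++;
    }

    printf("max latency = %.1fms\n", max);
    printf("1MS factor = %f\n", 100.0 * within_1ms / n);
    printf("2MS factor = %f\n", 100.0 * within_2ms / n);
    return 0;
}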




2002-04-05 03:49:46

by Dieter Nützel

Subject: Re: some more nifty benchmarks

On Tuesday, 2002-04-02 17:15:40, Ed Sweetman wrote:
> benchmark url: http://www.gardena.net/benno/linux/audio/
>
> The jam2 patch: http://giga.cps.unizar.es/~magallon/linux/kernel/
>
> command used: ./do_tests none 3 256 0 1400000000
>
> System ram: Mem: 661925888
>
> Both kernels use the same config on the same system, run with the same
> apps open at the time of the test.
>
> Only one real issue is present. Patching preempt onto jam2 had big issues
> with the new scheduler, so I had to remove all the preempting in sched.c.
> This sounds like it would disable preemption altogether, but I did it
> anyway in hopes that something still preempts. Either way it didn't
> hurt anything, and the worst-case scenario is that it acts just like jam2
> without any preempt patch applied.
>
> The results are quite interesting.
> http://safemode.homeip.net/2.4.19-pre4-ac3-preempt/3x256.html
>
> http://safemode.homeip.net/2.4.19-pre5-jam2-preempt/3x256.html
>
> Max Latency:
>
> As you can see, procfs latency has increased 2x with the jam2 patch.
> The jam2 patch uses AA's new VM patches and low-latency patches. Since
> AA's patches are mostly scheduler and VM changes, it seems more likely
> that something in pre5 is hurting procfs performance, although the
> changelog is so cluttered with the email addresses of every single
> submission that it's difficult to tell at a glance whether any fs/procfs
> changes were made.

Hello Ed and others,

It must be something in the "latest" O(1) scheduler for 2.4.
I have found weird latency numbers since 2.4.19-pre2-ac2 (Alan had the
O(1) scheduler included). See what I found below.
2.4.19-pre5 + vm33 + preemption shows the same ;-(

I had the best latency numbers with 2.4.17/2.4.18-pre-something together
with O(1) and preemption+lock-break (max ~2ms).
Maybe Robert has some of my numbers in his maildir.

I think someone should keep an eye on it.

-Dieter

BTW, is Ingo OK? I haven't seen a post from him for some weeks now.


---------- Forwarded Message ----------

Subject: Re: latencytest0.42-png looks weird for 2.4.19-pre2-ac2-prlo
Date: Thu, 7 Mar 2002 03:41:27 +0100
From: Dieter Nützel <[email protected]>
To: Robert Love <[email protected]>
Cc: Ingo Molnar <[email protected]>, George Anzinger <[email protected]>, Alan Cox <[email protected]>, Linux Kernel List <[email protected]>

On Tuesday, 5 March 2002 03:29:03, you wrote:
> On Mon, 2002-03-04 at 21:22, Dieter Nützel wrote:
> > This is really weird.
> > I get results, and my feeling before was that it _is_ running with
> > preemption on, 'cause it is smooth and speedy.
> >
> > preempt-kernel-rml-2.4.19-pre2-ac2-3.patch
> > Applied.
> >
> > But the numbers for latencytest0.42-png look ugly.
> > I'll enable DEBUG. Hope I find something.
>
> Let me know ... I really need to see comparisons. The above vs
> 2.4.19-pre2-ac2 with no preemption. Or 2.4.19-pre2 with just O(1) or
> 2.4.19-pre2 with rmap, etc ... I need to see a baseline (nothing) and
> then find out if it is rmap or O(1) causing the problem.

2.4.18 clean runs OK, apart from the inherent slowness... ;-)

> From your results, preemption is definitely working. It must be
> something else causing a bad mix...

Yep, FOUND it.
Ingo's latest sched-O1-2.4.18-pre8-K3 is the culprit!!!
Even with -ac (2.4.19-pre2-ac2) and together with -aa (latest here is
2.4.18-pre8-K3-VM-24-preempt-lock).

Below are the numbers for 2.4.18+sched-O1-2.4.18-pre8-K3.
Have a look into the attachment, too.

Hopefully you or Ingo will find something out.

See yah.
Dieter

SunWave1 dbench/latencytest0.42-png# time ./do_tests none 3 256 0 350000000
x11perf - X11 performance program, version 1.5
The XFree86 Project, Inc server version 40200000 on :0.0
from SunWave1
Thu Mar 7 03:23:44 2002

Sync time adjustment is 0.1117 msecs.

3000 reps @ 1.7388 msec ( 575.0/sec): Scroll 500x500 pixels
3000 reps @ 1.7427 msec ( 574.0/sec): Scroll 500x500 pixels
3000 reps @ 1.7416 msec ( 574.0/sec): Scroll 500x500 pixels
3000 reps @ 1.7401 msec ( 575.0/sec): Scroll 500x500 pixels
3000 reps @ 1.7434 msec ( 574.0/sec): Scroll 500x500 pixels
15000 trep @ 1.7413 msec ( 574.0/sec): Scroll 500x500 pixels

800 reps @ 7.4185 msec ( 135.0/sec): ShmPutImage 500x500 square
800 reps @ 7.4216 msec ( 135.0/sec): ShmPutImage 500x500 square
800 reps @ 7.4239 msec ( 135.0/sec): ShmPutImage 500x500 square
800 reps @ 7.4210 msec ( 135.0/sec): ShmPutImage 500x500 square
800 reps @ 7.4219 msec ( 135.0/sec): ShmPutImage 500x500 square
4000 trep @ 7.4214 msec ( 135.0/sec): ShmPutImage 500x500 square

fragment latency = 1.451247 ms
cpu latency = 1.160998 ms
13.5ms ( 13)|
1MS num_time_samples=43483 num_times_within_1ms=35936 factor=82.643792
2MS num_time_samples=43483 num_times_within_2ms=43447 factor=99.917209
PIXEL_PER_MS=103
fragment latency = 1.451247 ms
cpu latency = 1.160998 ms
321.2ms ( 16)|
1MS num_time_samples=19656 num_times_within_1ms=18006 factor=91.605617
2MS num_time_samples=19656 num_times_within_2ms=19563 factor=99.526862
PIXEL_PER_MS=103
fragment latency = 1.451247 ms
cpu latency = 1.160998 ms
79.1ms ( 36)|
1MS num_time_samples=15681 num_times_within_1ms=11212 factor=71.500542
2MS num_time_samples=15681 num_times_within_2ms=15595 factor=99.451566
PIXEL_PER_MS=103
-rw-r--r-- 1 root root 350000000 Mar 7 03:25 tmpfile
fragment latency = 1.451247 ms
cpu latency = 1.160998 ms
147.3ms (158)|
1MS num_time_samples=19290 num_times_within_1ms=18423 factor=95.505443
2MS num_time_samples=19290 num_times_within_2ms=19030 factor=98.652151
PIXEL_PER_MS=103
-rw-r--r-- 1 root root 350000000 Mar 7 03:25 tmpfile
-rw-r--r-- 1 root root 350000000 Mar 7 03:26 tmpfile2
fragment latency = 1.451247 ms
cpu latency = 1.160998 ms
484.1ms ( 64)|
1MS num_time_samples=14912 num_times_within_1ms=13493 factor=90.484174
2MS num_time_samples=14912 num_times_within_2ms=14783 factor=99.134925
PIXEL_PER_MS=103
-rw-r--r-- 1 root root 350000000 Mar 7 03:25 tmpfile
-rw-r--r-- 1 root root 350000000 Mar 7 03:26 tmpfile2
66.180u 17.240s 3:21.28 41.4% 0+0k 0+0io 10374pf+0w

-------------------------------------------------------


--
Dieter Nützel
Graduate Student, Computer Science

University of Hamburg
Department of Computer Science
@home: [email protected]


Attachments:
2.4.18-sched-O1-2.4.18-pre8-K3.tar.gz (35.68 kB)

2002-04-05 10:23:39

by Ed Sweetman

Subject: Re: some more nifty benchmarks

On Thu, 2002-04-04 at 22:49, Dieter Nützel wrote:
> On Tuesday, 2002-04-02 17:15:40, Ed Sweetman wrote:
> > benchmark url: http://www.gardena.net/benno/linux/audio/
> >
> > The jam2 patch: http://giga.cps.unizar.es/~magallon/linux/kernel/
> >
> > command used: ./do_tests none 3 256 0 1400000000
> >
> > System ram: Mem: 661925888
> >
> > Both kernels use the same config on the same system, run with the same
> > apps open at the time of the test.
> >
> > Only one real issue is present. Patching preempt onto jam2 had big issues
> > with the new scheduler, so I had to remove all the preempting in sched.c.
> > This sounds like it would disable preemption altogether, but I did it
> > anyway in hopes that something still preempts. Either way it didn't
> > hurt anything, and the worst-case scenario is that it acts just like jam2
> > without any preempt patch applied.
> >
> > The results are quite interesting.
> > http://safemode.homeip.net/2.4.19-pre4-ac3-preempt/3x256.html
> >
> > http://safemode.homeip.net/2.4.19-pre5-jam2-preempt/3x256.html
> >
> > Max Latency:
> >
> > As you can see, procfs latency has increased 2x with the jam2 patch.
> > The jam2 patch uses AA's new VM patches and low-latency patches. Since
> > AA's patches are mostly scheduler and VM changes, it seems more likely
> > that something in pre5 is hurting procfs performance, although the
> > changelog is so cluttered with the email addresses of every single
> > submission that it's difficult to tell at a glance whether any fs/procfs
> > changes were made.
>
> Hello Ed and others,
>
> It must be something in the "latest" O(1) scheduler for 2.4.
> I have found weird latency numbers since 2.4.19-pre2-ac2 (Alan had the
> O(1) scheduler included). See what I found below.
> 2.4.19-pre5 + vm33 + preemption shows the same ;-(
>
> I had the best latency numbers with 2.4.17/2.4.18-pre-something together
> with O(1) and preemption+lock-break (max ~2ms).
> Maybe Robert has some of my numbers in his maildir.
>
> I think someone should keep an eye on it.
>
> -Dieter
>
> BTW, is Ingo OK? I haven't seen a post from him for some weeks now.
>
>
> ---------- Forwarded Message ----------
>
> Subject: Re: latencytest0.42-png looks weird for 2.4.19-pre2-ac2-prlo
> Date: Thu, 7 Mar 2002 03:41:27 +0100
> From: Dieter Nützel <[email protected]>
> To: Robert Love <[email protected]>
> Cc: Ingo Molnar <[email protected]>, George Anzinger <[email protected]>, Alan Cox <[email protected]>, Linux Kernel List <[email protected]>
>
> On Tuesday, 5 March 2002 03:29:03, you wrote:
> > On Mon, 2002-03-04 at 21:22, Dieter Nützel wrote:
> > > This is really weird.
> > > I get results, and my feeling before was that it _is_ running with
> > > preemption on, 'cause it is smooth and speedy.
> > >
> > > preempt-kernel-rml-2.4.19-pre2-ac2-3.patch
> > > Applied.
> > >
> > > But the numbers for latencytest0.42-png look ugly.
> > > I'll enable DEBUG. Hope I find something.
> >
> > Let me know ... I really need to see comparisons. The above vs
> > 2.4.19-pre2-ac2 with no preemption. Or 2.4.19-pre2 with just O(1) or
> > 2.4.19-pre2 with rmap, etc ... I need to see a baseline (nothing) and
> > then find out if it is rmap or O(1) causing the problem.
>
> 2.4.18 clean runs OK, apart from the inherent slowness... ;-)
>
> > From your results, preemption is definitely working. It must be
> > something else causing a bad mix...
>
> Yep, FOUND it.
> Ingo's latest sched-O1-2.4.18-pre8-K3 is the culprit!!!
> Even with -ac (2.4.19-pre2-ac2) and together with -aa (latest here is
> 2.4.18-pre8-K3-VM-24-preempt-lock).
>
> Below are the numbers for 2.4.18+sched-O1-2.4.18-pre8-K3.
> Have a look into the attachment, too.
>
> Hopefully you or Ingo will find something out.

I seem to have lost your earlier emails. Did you get a max latency of
around <2ms before this O(1) scheduler patch? 2.2 with the low-latency
patch gets that. 2.4 with the low-latency patch is many, many times
worse. The high-latency areas of the kernel are already known. It's
just a matter of deciding how to deal with them that's the problem.
There seems to be a general consensus that it can't be dealt with in
2.4 mainstream.

As you've implied before, though, the scheduler is much more important
than latency to the average user. As most people would know from
2.2, audio would skip unless it was running at -20 nice and the highest
priority, etc. With 2.4's scheduler and preempt, you don't have to
worry about skips and you can leave the player at a normal nice and
priority value.


I'll continue to look at them over the weekend. Right now I'm playing
with software suspend.


> See yah.
> Dieter
>
> SunWave1 dbench/latencytest0.42-png# time ./do_tests none 3 256 0 350000000
> x11perf - X11 performance program, version 1.5
> The XFree86 Project, Inc server version 40200000 on :0.0
> from SunWave1
> Thu Mar 7 03:23:44 2002
>
> Sync time adjustment is 0.1117 msecs.
>
> 3000 reps @ 1.7388 msec ( 575.0/sec): Scroll 500x500 pixels
> 3000 reps @ 1.7427 msec ( 574.0/sec): Scroll 500x500 pixels
> 3000 reps @ 1.7416 msec ( 574.0/sec): Scroll 500x500 pixels
> 3000 reps @ 1.7401 msec ( 575.0/sec): Scroll 500x500 pixels
> 3000 reps @ 1.7434 msec ( 574.0/sec): Scroll 500x500 pixels
> 15000 trep @ 1.7413 msec ( 574.0/sec): Scroll 500x500 pixels
>
> 800 reps @ 7.4185 msec ( 135.0/sec): ShmPutImage 500x500 square
> 800 reps @ 7.4216 msec ( 135.0/sec): ShmPutImage 500x500 square
> 800 reps @ 7.4239 msec ( 135.0/sec): ShmPutImage 500x500 square
> 800 reps @ 7.4210 msec ( 135.0/sec): ShmPutImage 500x500 square
> 800 reps @ 7.4219 msec ( 135.0/sec): ShmPutImage 500x500 square
> 4000 trep @ 7.4214 msec ( 135.0/sec): ShmPutImage 500x500 square
>
> fragment latency = 1.451247 ms
> cpu latency = 1.160998 ms
> 13.5ms ( 13)|
> 1MS num_time_samples=43483 num_times_within_1ms=35936 factor=82.643792
> 2MS num_time_samples=43483 num_times_within_2ms=43447 factor=99.917209
> PIXEL_PER_MS=103
> fragment latency = 1.451247 ms
> cpu latency = 1.160998 ms
> 321.2ms ( 16)|
> 1MS num_time_samples=19656 num_times_within_1ms=18006 factor=91.605617
> 2MS num_time_samples=19656 num_times_within_2ms=19563 factor=99.526862
> PIXEL_PER_MS=103
> fragment latency = 1.451247 ms
> cpu latency = 1.160998 ms
> 79.1ms ( 36)|
> 1MS num_time_samples=15681 num_times_within_1ms=11212 factor=71.500542
> 2MS num_time_samples=15681 num_times_within_2ms=15595 factor=99.451566
> PIXEL_PER_MS=103
> -rw-r--r-- 1 root root 350000000 Mar 7 03:25 tmpfile
> fragment latency = 1.451247 ms
> cpu latency = 1.160998 ms
> 147.3ms (158)|
> 1MS num_time_samples=19290 num_times_within_1ms=18423 factor=95.505443
> 2MS num_time_samples=19290 num_times_within_2ms=19030 factor=98.652151
> PIXEL_PER_MS=103
> -rw-r--r-- 1 root root 350000000 Mar 7 03:25 tmpfile
> -rw-r--r-- 1 root root 350000000 Mar 7 03:26 tmpfile2
> fragment latency = 1.451247 ms
> cpu latency = 1.160998 ms
> 484.1ms ( 64)|
> 1MS num_time_samples=14912 num_times_within_1ms=13493 factor=90.484174
> 2MS num_time_samples=14912 num_times_within_2ms=14783 factor=99.134925
> PIXEL_PER_MS=103
> -rw-r--r-- 1 root root 350000000 Mar 7 03:25 tmpfile
> -rw-r--r-- 1 root root 350000000 Mar 7 03:26 tmpfile2
> 66.180u 17.240s 3:21.28 41.4% 0+0k 0+0io 10374pf+0w
>
> -------------------------------------------------------


2002-04-05 21:00:39

by Ed Sweetman

Subject: Re: some more nifty benchmarks

On Fri, 2002-04-05 at 15:37, Dieter Nützel wrote:
> On Friday, 5 April 2002 :22, Ed Sweetman wrote:
> > On Thu, 2002-04-04 at 22:49, Dieter Nützel wrote:
> > > On Tuesday, 2002-04-02 17:15:40, Ed Sweetman wrote:
>
> [-]
>
> > > Yep, FOUND it.
> > > Ingo's latest sched-O1-2.4.18-pre8-K3 is the culprit!!!
> > > Even with -ac (2.4.19-pre2-ac2) and together with -aa (latest here is
> > > 2.4.18-pre8-K3-VM-24-preempt-lock).
> > >
> > > Below are the numbers for 2.4.18+sched-O1-2.4.18-pre8-K3.
> > > Have a look into the attachment, too.
> > >
> > > Hopefully you or Ingo will find something out.
> >
> > I seem to have lost your earlier emails. Did you get a max latency of
> > around <2ms before this O(1) scheduler patch?
>
> In short:
>
> YES, with 2.4 and with preemption+lock-break.
> I repeated it for 2.4.19-pre5+vm33. See results below.
>
> It is NOT in any case an -aa VM or preemption+lock-break bug.
> Ingo's latest sched-O1-2.4.18-pre8-K3.patch for 2.4 is the culprit, so all
> the latest -ac kernels are broken in this sense, too.
>
> > 2.2 with the low-latency patch gets that. 2.4 with the low-latency patch
> > is many, many times worse. The high-latency areas of the kernel are
> > already known.
>
> I know :-)
> Too bad; we badly need a newer lock-break for 2.4 from Robert (sorry, Andrew :-).
> I will do some "stats data collection" with my next boot.
>
> > It's just a matter of deciding how to deal with them that's the problem.
> > There seems to be a general consensus that it can't be dealt with in
> > 2.4 mainstream.
>
> No, I think it is not.
> If we can eliminate the remaining bugs from O(1) and use preemption,
> everything should be smooth.
>
> > As you've implied before, though, the scheduler is much more important
> > than latency to the average user.
>
> The O(1) scheduler is great but broken (latency-wise) in the current 2.4
> version. Does any of you have some older versions from Ingo around?
>
> > As most people would know from
> > 2.2, audio would skip unless it was running at -20 nice and the highest
> > priority, etc. With 2.4's scheduler and preempt, you don't have to
> > worry about skips and you can leave the player at a normal nice and
> > priority value.
>
> That's not true with the O(1) scheduler.
> In most of my tests (Ingo got my results) you have to renice the audio daemon
> to something like -16 (the first "real time" class) and X to -10 (for good
> interactivity) during "heavy" background stuff (40 gcc and 40 g++ processes
> reniced to +19, for example). This load results in ~350 processes, 80-85
> running in parallel, with sound playing, on my "old" 1 GHz Athlon II with
> 640 MB... ;-)

You realize that if you run enough processes, the timeslice for all
processes sharing that priority is decreased. Decoding and playing
audio needs a timeslice of at least some length N, and from the sound
of it you're simply running so many processes that the length of that
priority's timeslice is below N. There is nothing a scheduler can do
about that; it's being fair. Of course, if you want something to run
well alongside 350 other running processes you'll have to give it a
higher priority; if you let it be fair, then its timeslice is just too
small with that many divisions of your CPU. You can't expect the kernel
to autodetect the functionality of the programs it's running and
auto-tune for usable performance; that's the user's job.
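
To put rough numbers on that argument: with N runnable CPU-bound tasks
at one priority, round-robin scheduling can make any one of them wait
on the order of (N-1) timeslices between runs, and once that wait
exceeds the player's buffer, the audio skips. A back-of-the-envelope
sketch (the 100ms timeslice and 350ms buffer are assumed values for
illustration, not the scheduler's actual constants):

#include <stdio.h>

int main(void)
{
    double timeslice_ms = 100.0; /* assumed default timeslice */
    double buffer_ms = 350.0;    /* assumed audio buffer length */
    int nr_running;

    for (nr_running = 2; nr_running <= 85; nr_running *= 2) {
        /* worst case: every other task burns its full slice first */
        double worst_wait = (nr_running - 1) * timeslice_ms;
        printf("%3d runnable tasks: worst-case wait ~%6.0fms -> %s\n",
               nr_running, worst_wait,
               worst_wait > buffer_ms ? "audio skips" : "OK");
    }
    return 0;
}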

What people complain about is a couple of processes having the same effect
as running 350 processes. I don't see that at all with preempt anymore.
There is no need to renice anything in a preempt kernel unless you know
you'll be running so many processes that your timeslice is just going to
be too small for applications that care. That used to be the case in
2.2, back when I used it, before the preempt and new scheduler patches
for 2.4.x.

Running a couple of apps and having it affect audio playback is something
you shouldn't expect to occur. But running hundreds of programs and
having it affect audio playback is perfectly acceptable if they're all at
the same priority. Would you say it's the kernel's fault for skipping
on your 486 66MHz CPU? No, you just don't have the processing time per
second to do what needs to be done; that's all that's happening with
your 350 processes at once on your 1GHz CPU. It's preemption and
scheduling the way it's supposed to be; before, you'd only need one
process hogging the CPU in kernel space to bring your audio to a halt.

An alternative fix would be to run your forkbombs and/or massively
threaded apps at +10 or so, since it's a rare case that running so many
processes is "normal use".



> But that's not so good for the "normal" user. We need some "auto renicing".
>
> BTW, my former 2.4.17/2.4.18-pre numbers were much better for throughput
> and somewhat better for latency.
>
> I used Andrea's -aa VM and Robert's preemption and lock-break on ReiserFS
> all the time, but together with bootmem-2.4.17-pre6 and waitq-2.4.17-mainline-1.
> Does anyone know where I can get newer versions of them?


2002-04-05 20:38:06

by Dieter Nützel

Subject: Re: some more nifty benchmarks

On Friday, 5 April 2002 :22, Ed Sweetman wrote:
> On Thu, 2002-04-04 at 22:49, Dieter Nützel wrote:
> > On Tuesday, 2002-04-02 17:15:40, Ed Sweetman wrote:

[-]

> > Yep, FOUND it.
> > Ingo's latest sched-O1-2.4.18-pre8-K3 is the culprit!!!
> > Even with -ac (2.4.19-pre2-ac2) and together with -aa (latest here is
> > 2.4.18-pre8-K3-VM-24-preempt-lock).
> >
> > Below are the numbers for 2.4.18+sched-O1-2.4.18-pre8-K3.
> > Have a look into the attachment, too.
> >
> > Hopefully you or Ingo will find something out.
>
> I seem to have lost your earlier emails. Did you get a max latency of
> around <2ms before this O(1) scheduler patch?

In short:

YES, with 2.4 and with preemption+lock-break.
I repeated it for 2.4.19-pre5+vm33. See results below.

It is NOT in any case an -aa VM or preemption+lock-break bug.
Ingo's latest sched-O1-2.4.18-pre8-K3.patch for 2.4 is the culprit, so all
the latest -ac kernels are broken in this sense, too.

> 2.2 with the low-latency patch gets that. 2.4 with the low-latency patch
> is many, many times worse. The high-latency areas of the kernel are
> already known.

I know :-)
Too bad; we badly need a newer lock-break for 2.4 from Robert (sorry, Andrew :-).
I will do some "stats data collection" with my next boot.

> It's just a matter of deciding how to deal with them that's the problem.
> There seems to be a general consensus that it can't be dealt with in
> 2.4 mainstream.

No, I think it is not.
If we can eliminate the remaining bugs from O(1) and use preemption,
everything should be smooth.

> As you've implied before, though, the scheduler is much more important
> than latency to the average user.

The O(1) scheduler is great but broken (latency-wise) in the current 2.4
version. Does any of you have some older versions from Ingo around?

> As most people would know from
> 2.2, audio would skip unless it was running at -20 nice and the highest
> priority, etc. With 2.4's scheduler and preempt, you don't have to
> worry about skips and you can leave the player at a normal nice and
> priority value.

That's not true with the O(1) scheduler.
In most of my tests (Ingo got my results) you have to renice the audio daemon
to something like -16 (the first "real time" class) and X to -10 (for good
interactivity) during "heavy" background stuff (40 gcc and 40 g++ processes
reniced to +19, for example). This load results in ~350 processes, 80-85
running in parallel, with sound playing, on my "old" 1 GHz Athlon II with
640 MB... ;-)
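
(For reference, that kind of hand-tuning is just renice(8), or
setpriority(2) from a program. A minimal sketch, with the -16 value
from above and the PID taken from the command line; lowering nice below
0 needs root:)

#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>

int main(int argc, char *argv[])
{
    pid_t pid;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    pid = (pid_t)atoi(argv[1]);

    /* equivalent to `renice -16 -p <pid>` */
    if (setpriority(PRIO_PROCESS, pid, -16) < 0) {
        perror("setpriority");
        return 1;
    }
    return 0;
}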

But that's not so good for the "normal" user. We need some "auto renicing".

BTW, my former 2.4.17/2.4.18-pre numbers were much better for throughput
and somewhat better for latency.

I used Andrea's -aa VM and Robert's preemption and lock-break on ReiserFS
all the time, but together with bootmem-2.4.17-pre6 and waitq-2.4.17-mainline-1.
Does anyone know where I can get newer versions of them?

Best dbench 32 numbers were:
Throughput: ~55 MB/sec
real ~1:15

One last thing:

Where is Ingo? --- I hope he is fine!

Regards,
Dieter

2.4.19-pre5-vm33

32 clients started
..........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................+..............+......................+........+...................................................................................................................+..+...........++.+....+++..+++++.+++++++..++++++++********************************
Throughput 40.4878 MB/sec (NB=50.6098 MB/sec 404.878 MBit/sec)
14.440u 50.650s 1:45.35 61.7% 0+0k 0+0io 939pf+0w

SunWave1 dbench/latencytest0.42-png# time ./do_tests none 3 256 0 350000000
x11perf - X11 performance program, version 1.5
The XFree86 Project, Inc server version 40200000 on :0.0
from SunWave1
Fri Apr 5 20:06:34 2002

Sync time adjustment is 0.2107 msecs.

3000 reps @ 2.2644 msec ( 442.0/sec): Scroll 500x500 pixels
3000 reps @ 2.2663 msec ( 441.0/sec): Scroll 500x500 pixels
3000 reps @ 2.2635 msec ( 442.0/sec): Scroll 500x500 pixels
3000 reps @ 2.2654 msec ( 441.0/sec): Scroll 500x500 pixels
3000 reps @ 2.2714 msec ( 440.0/sec): Scroll 500x500 pixels
15000 trep @ 2.2662 msec ( 441.0/sec): Scroll 500x500 pixels

800 reps @ 11.6017 msec ( 86.2/sec): ShmPutImage 500x500 square
800 reps @ 11.6358 msec ( 85.9/sec): ShmPutImage 500x500 square
800 reps @ 11.6463 msec ( 85.9/sec): ShmPutImage 500x500 square
800 reps @ 11.6122 msec ( 86.1/sec): ShmPutImage 500x500 square
800 reps @ 11.6322 msec ( 86.0/sec): ShmPutImage 500x500 square
4000 trep @ 11.6257 msec ( 86.0/sec): ShmPutImage 500x500 square

fragment latency = 1.451247 ms
cpu latency = 1.160998 ms
4.2ms ( 0)|
1MS num_time_samples=63551 num_times_within_1ms=61215 factor=96.324212
2MS num_time_samples=63551 num_times_within_2ms=63546 factor=99.992132
PIXEL_PER_MS=103
fragment latency = 1.451247 ms
cpu latency = 1.160998 ms
3.8ms ( 0)|
1MS num_time_samples=20758 num_times_within_1ms=19668 factor=94.749012
2MS num_time_samples=20758 num_times_within_2ms=20693 factor=99.686868
PIXEL_PER_MS=103
fragment latency = 1.451247 ms
cpu latency = 1.160998 ms
30.0ms ( 3)|
1MS num_time_samples=17604 num_times_within_1ms=16825 factor=95.574869
2MS num_time_samples=17604 num_times_within_2ms=17591 factor=99.926153
PIXEL_PER_MS=103
-rw-r--r-- 1 root root 350000000 Apr 5 20:09 tmpfile
fragment latency = 1.451247 ms
cpu latency = 1.160998 ms
14.8ms ( 12)|
1MS num_time_samples=24448 num_times_within_1ms=23863 factor=97.607166
2MS num_time_samples=24448 num_times_within_2ms=24425 factor=99.905923
PIXEL_PER_MS=103
-rw-r--r-- 1 root root 350000000 Apr 5 20:09 tmpfile
-rw-r--r-- 1 root root 350000000 Apr 5 20:10 tmpfile2
fragment latency = 1.451247 ms
cpu latency = 1.160998 ms
4.5ms ( 1)|
1MS num_time_samples=16142 num_times_within_1ms=15463 factor=95.793582
2MS num_time_samples=16142 num_times_within_2ms=16134 factor=99.950440
PIXEL_PER_MS=103
-rw-r--r-- 1 root root 350000000 Apr 5 20:09 tmpfile
-rw-r--r-- 1 root root 350000000 Apr 5 20:10 tmpfile2
122.970u 18.150s 4:09.80 56.4% 0+0k 0+0io 10418pf+0w

*******************************************************************************

2.4.19-pre5-vm33-rml

32 clients started
...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................++.....+.........+.........+....+.++.....+.+.+...+.+.....++.+...+++++...++.+.++++++++********************************
Throughput 39.637 MB/sec (NB=49.5463 MB/sec 396.37 MBit/sec)
14.370u 53.580s 1:47.59 63.1% 0+0k 0+0io 939pf+0w

SunWave1 dbench/latencytest0.42-png# time ./do_tests none 3 256 0 350000000
x11perf - X11 performance program, version 1.5
The XFree86 Project, Inc server version 40200000 on :0.0
from SunWave1
Fri Apr 5 21:29:15 2002

Sync time adjustment is 0.2172 msecs.

3000 reps @ 2.2866 msec ( 437.0/sec): Scroll 500x500 pixels
3000 reps @ 2.2899 msec ( 437.0/sec): Scroll 500x500 pixels
3000 reps @ 2.2885 msec ( 437.0/sec): Scroll 500x500 pixels
3000 reps @ 2.2847 msec ( 438.0/sec): Scroll 500x500 pixels
3000 reps @ 2.2958 msec ( 436.0/sec): Scroll 500x500 pixels
15000 trep @ 2.2891 msec ( 437.0/sec): Scroll 500x500 pixels

400 reps @ 11.7923 msec ( 84.8/sec): ShmPutImage 500x500 square
400 reps @ 11.8264 msec ( 84.6/sec): ShmPutImage 500x500 square
400 reps @ 11.8240 msec ( 84.6/sec): ShmPutImage 500x500 square
400 reps @ 11.8370 msec ( 84.5/sec): ShmPutImage 500x500 square
400 reps @ 11.8484 msec ( 84.4/sec): ShmPutImage 500x500 square
2000 trep @ 11.8256 msec ( 84.6/sec): ShmPutImage 500x500 square

fragment latency = 1.451247 ms
cpu latency = 1.160998 ms
4.2ms ( 0)|
1MS num_time_samples=48986 num_times_within_1ms=47284 factor=96.525538
2MS num_time_samples=48986 num_times_within_2ms=48979 factor=99.985710
PIXEL_PER_MS=103
fragment latency = 1.451247 ms
cpu latency = 1.160998 ms
3.8ms ( 0)|
1MS num_time_samples=20764 num_times_within_1ms=20537 factor=98.906762
2MS num_time_samples=20764 num_times_within_2ms=20762 factor=99.990368
PIXEL_PER_MS=103
fragment latency = 1.451247 ms
cpu latency = 1.160998 ms
3.8ms ( 0)|
1MS num_time_samples=20603 num_times_within_1ms=20109 factor=97.602291
2MS num_time_samples=20603 num_times_within_2ms=20602 factor=99.995146
PIXEL_PER_MS=103
-rw-r--r-- 1 root root 350000000 Apr 5 21:31 tmpfile
fragment latency = 1.451247 ms
cpu latency = 1.160998 ms
6.8ms ( 2)|
1MS num_time_samples=25283 num_times_within_1ms=24655 factor=97.516118
2MS num_time_samples=25283 num_times_within_2ms=25280 factor=99.988134
PIXEL_PER_MS=103
-rw-r--r-- 1 root root 350000000 Apr 5 21:31 tmpfile
-rw-r--r-- 1 root root 350000000 Apr 5 21:32 tmpfile2
fragment latency = 1.451247 ms
cpu latency = 1.160998 ms
5.3ms ( 1)|
1MS num_time_samples=16210 num_times_within_1ms=15669 factor=96.662554
2MS num_time_samples=16210 num_times_within_2ms=16203 factor=99.956817
PIXEL_PER_MS=103
-rw-r--r-- 1 root root 350000000 Apr 5 21:31 tmpfile
-rw-r--r-- 1 root root 350000000 Apr 5 21:32 tmpfile2
116.600u 19.040s 3:54.15 57.9% 0+0k 0+0io 10418pf+0w


Attachments:
2.4.19-pre5-vm33.tar.gz (25.49 kB)
2.4.19-pre5-vm33-rml.tar.gz (20.33 kB)

2002-04-06 01:51:28

by Lincoln Dale

Subject: Re: some more nifty benchmarks

At 10:37 PM 5/04/2002 +0200, Dieter Nützel wrote:
>That's not true with the O(1) scheduler.
>In most of my tests (Ingo got my results) you have to renice the audio daemon
>to something like -16 (the first "real time" class) and X to -10 (for good
>interactivity) during "heavy" background stuff (40 gcc and 40 g++ processes
>reniced to +19, for example). This load results in ~350 processes, 80-85
>running in parallel, with sound playing, on my "old" 1 GHz Athlon II with
>640 MB... ;-)

You've completely missed the point.

For "CPU-intensive" tasks (which gcc will be when compiling large
files), each task will want to use its entire timeslice.
With HZ set at 100 for x86, that means a task can run for up to 10msec
without being preempted (if it's not performing any system calls, I/O,
or other things that can cause a context switch).

With 40 of these running, I have no doubt that you'll get skips in your
audio.
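
A quick illustration of that arithmetic (the 1.45ms fragment latency is
the figure latencytest reported earlier in this thread; both numbers
are illustrative, and this is not kernel code):

#include <stdio.h>

int main(void)
{
    int hz = 100;                  /* x86 default in 2.4 */
    double tick_ms = 1000.0 / hz;  /* max uninterrupted run per task */
    double fragment_ms = 1.451247; /* audio fragment latency (latencytest) */

    printf("scheduler tick: %.1fms\n", tick_ms);
    printf("one CPU hog can delay audio by ~%.1f fragments\n",
           tick_ms / fragment_ms);
    return 0;
}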

Are you using xmms? If so, this has been discussed to death previously,
and the fault lies with the userspace application.


cheers,

lincoln.