2007-05-17 23:25:37

by Bill Davidsen

Subject: Scheduling tests on IPC methods, fc6, sd0.48, cfs12

I have posted the results of my initial testing, measuring IPC rates
using various schedulers under no load, limited nice load, and heavy
load at nice 0.

http://www.tmr.com/~davidsen/ctxbench_testing.html

--
bill davidsen <[email protected]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979


2007-05-21 07:30:28

by Ingo Molnar

Subject: Re: Scheduling tests on IPC methods, fc6, sd0.48, cfs12


* Bill Davidsen <[email protected]> wrote:

> I have posted the results of my initial testing, measuring IPC rates
> using various schedulers under no load, limited nice load, and heavy
> load at nice 0.
>
> http://www.tmr.com/~davidsen/ctxbench_testing.html

nice! For this to become really representative though i'd like to ask
for a real workload function to be used after the task gets the
lock/message. The reason is that there is an inherent balancing conflict
in this area: should the scheduler 'spread' tasks to other CPUs or not?
In general, for all workloads that matter, the answer is almost always:
'yes, it should'.

But in your ctxbench results the work a task performs after doing IPC is
not reflected (the benchmark simply goes on to the next IPC - thus
penalizing scheduling strategies that move tasks to other CPUs) - hence
the bonus of a scheduler properly spreading out tasks is not measured
fairly. A real-life IPC workload is rarely just about messaging around
(a single task could do that itself) - some real workload function is
used. You can see this effect yourself: do a "taskset -p 01 $$" before
running ctxbench and you'll see the numbers improve significantly on all
of the schedulers.

As a solution i'd suggest to add a workload function with a 100 or 200
usecs (or larger) cost (as a fixed-length loop or something like that)
so that the 'spreading' effect/benefit gets measured fairly too.
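
Something along these lines would do - an untested sketch, not existing
ctxbench code, with a made-up WORK_LOOPS constant that has to be
calibrated so one call costs roughly 100-200 usecs on the test box:

/*
 * fixed-cost workload function, to be called right after each IPC
 * receive, before issuing the next IPC
 */
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

#define WORK_LOOPS 100000UL             /* placeholder: calibrate per machine */

static volatile uint64_t work_sink;     /* keeps the loop from being optimized away */

static void workload(void)
{
        uint64_t acc = 0;
        unsigned long i;

        for (i = 0; i < WORK_LOOPS; i++)
                acc += i * i;           /* cheap, fixed-length busy work */
        work_sink = acc;
}

/* quick calibration check: print the cost of one workload() call */
int main(void)
{
        struct timeval t0, t1;

        gettimeofday(&t0, NULL);
        workload();
        gettimeofday(&t1, NULL);
        printf("one call: %ld usecs\n",
               (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec));
        return 0;
}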

Ingo

2007-05-21 09:39:17

by William Lee Irwin III

Subject: Re: Scheduling tests on IPC methods, fc6, sd0.48, cfs12

On Thu, May 17, 2007 at 07:26:38PM -0400, Bill Davidsen wrote:
> I have posted the results of my initial testing, measuring IPC rates
> using various schedulers under no load, limited nice load, and heavy
> load at nice 0.
> http://www.tmr.com/~davidsen/ctxbench_testing.html

Kernel compiles are not how to stress these. The way to stress them is
to have multiple simultaneous independent chains of communicators and
deeper chains of communicators.

Kernel compiles are little but background cpu/memory load for these
sorts of tests. Something expected to have some sort of mutual
interference depending on quality of implementation would be a better
sort of competing load, one vastly more reflective of real workloads.
For instance, another set of processes communicating using the same
primitive.
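
As a rough illustration of that kind of competing load - a hypothetical,
untested sketch, not existing code - a few independent pairs of
processes ping-ponging one byte over pipes (pipes here just as an
example of one primitive), started alongside the benchmark and killed
afterwards:

#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define NPAIRS 4                        /* arbitrary; scale to taste */

static void ping_pong(int rfd, int wfd)
{
        char c = 0;

        for (;;) {                      /* message forever, same primitive */
                if (write(wfd, &c, 1) != 1 || read(rfd, &c, 1) != 1)
                        _exit(0);
        }
}

int main(void)
{
        int i;

        for (i = 0; i < NPAIRS; i++) {
                int a[2], b[2];

                if (pipe(a) || pipe(b))
                        exit(1);
                if (fork() == 0)        /* one end of the pair */
                        ping_pong(a[0], b[1]);
                if (fork() == 0)        /* the other end */
                        ping_pong(b[0], a[1]);
        }
        for (;;)
                wait(NULL);             /* run until killed */
}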

Perhaps best of all would be a macrobenchmark utilizing a variety of
the primitives under consideration. Unsurprisingly, major commercial
databases do so for major benchmarks.


-- wli

2007-05-22 22:21:31

by Bill Davidsen

Subject: Re: Scheduling tests on IPC methods, fc6, sd0.48, cfs12

Ingo Molnar wrote:
> * Bill Davidsen <[email protected]> wrote:
>
>> I have posted the results of my initial testing, measuring IPC rates
>> using various schedulers under no load, limited nice load, and heavy
>> load at nice 0.
>>
>> http://www.tmr.com/~davidsen/ctxbench_testing.html
>
> nice! For this to become really representative though i'd like to ask
> for a real workload function to be used after the task gets the
> lock/message. The reason is that there is an inherent balancing conflict
> in this area: should the scheduler 'spread' tasks to other CPUs or not?
> In general, for all workloads that matter, the answer is almost always:
> 'yes, it should'.
>
Added to the short to-do list. Note that this was originally simply a
check to see which IPC method works best (or at all) on a given o/s. It
has been useful for some other things, and an option to add a workload
function will be forthcoming.

> But in your ctxbench results the work a task performs after doing IPC is
> not reflected (the benchmark simply goes on to the next IPC - thus
> penalizing scheduling strategies that move tasks to other CPUs) - hence
> the bonus of a scheduler properly spreading out tasks is not measured
> fairly. A real-life IPC workload is rarely just about messaging around
> (a single task could do that itself) - some real workload function is
> used. You can see this effect yourself: do a "taskset -p 01 $$" before
> running ctxbench and you'll see the numbers improve significantly on all
> of the schedulers.
>
> As a solution i'd suggest to add a workload function with a 100 or 200
> usecs (or larger) cost (as a fixed-length loop or something like that)
> so that the 'spreading' effect/benefit gets measured fairly too.
>
Can do.

--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

2007-05-22 22:34:44

by Bill Davidsen

Subject: Re: Scheduling tests on IPC methods, fc6, sd0.48, cfs12

William Lee Irwin III wrote:
> On Thu, May 17, 2007 at 07:26:38PM -0400, Bill Davidsen wrote:
>> I have posted the results of my initial testing, measuring IPC rates
>> using various schedulers under no load, limited nice load, and heavy
>> load at nice 0.
>> http://www.tmr.com/~davidsen/ctxbench_testing.html
>
> Kernel compiles are not how to stress these. The way to stress them is
> to have multiple simultaneous independent chains of communicators and
> deeper chains of communicators.
>
> Kernel compiles are little but background cpu/memory load for these
> sorts of tests.

Just so. What is being quantified is the rate of slowdown due to
external load. I would hope that each IPC method would slow by some
similar factor.

> ... Something expected to have some sort of mutual
> interference depending on quality of implementation would be a better
> sort of competing load, one vastly more reflective of real workloads.
> For instance, another set of processes communicating using the same
> primitive.
>
The original intent was purely to measure IPC speed under no-load
conditions; since fairness is in vogue, I also attempted to look for
surprising behavior. Corresponding values under equal load may be useful
in relation to one another, but this isn't (and hopefully doesn't claim
to be) a benchmark. It may or may not be useful viewed in that light,
but that's not the target.

> Perhaps best of all would be a macrobenchmark utilizing a variety of
> the primitives under consideration. Unsurprisingly, major commercial
> databases do so for major benchmarks.
>
And that's a very good point. Either multiple copies or more forked
processes might be useful, and I do intend to add threaded tests in the
next upgrade, but perhaps a whole new program might be better for
generating the load you suggest.
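
For the threaded case, the sort of thing intended is sketched below -
untested, and not part of ctxbench yet: two pthreads ping-ponging a
token through a mutex/condvar pair, the thread analogue of the existing
process-based tests.

/* build with: gcc -O2 -o thr_pingpong thr_pingpong.c -lpthread */
#include <pthread.h>
#include <stdio.h>

#define ITERATIONS 100000               /* arbitrary round-trip count */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int turn;                        /* 0: main's turn, 1: peer's turn */

static void *peer(void *unused)
{
        int i;

        for (i = 0; i < ITERATIONS; i++) {
                pthread_mutex_lock(&lock);
                while (turn != 1)
                        pthread_cond_wait(&cond, &lock);
                turn = 0;               /* hand the token back */
                pthread_cond_signal(&cond);
                pthread_mutex_unlock(&lock);
        }
        return NULL;
}

int main(void)
{
        pthread_t tid;
        int i;

        pthread_create(&tid, NULL, peer, NULL);
        for (i = 0; i < ITERATIONS; i++) {
                pthread_mutex_lock(&lock);
                turn = 1;               /* pass the token to the peer */
                pthread_cond_signal(&cond);
                while (turn != 0)
                        pthread_cond_wait(&cond, &lock);
                pthread_mutex_unlock(&lock);
        }
        pthread_join(tid, NULL);
        printf("%d round trips\n", ITERATIONS);
        return 0;
}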

--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot