2002-12-19 21:40:43

by Con Kolivas

Subject: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Contest results, OSDL hardware, scheduler tunable prio_bonus_ratio; the default
value (in 2.5.52-mm1) is 25. These results are interesting.

noload:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.5.52-mm1 [8] 39.7 180 0 0 1.10
pri_bon00 [3] 40.6 180 0 0 1.12
pri_bon10 [3] 40.2 180 0 0 1.11
pri_bon30 [3] 39.7 181 0 0 1.10
pri_bon50 [3] 40.0 179 0 0 1.10

cacherun:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.5.52-mm1 [7] 36.9 194 0 0 1.02
pri_bon00 [3] 37.6 194 0 0 1.04
pri_bon10 [3] 37.2 194 0 0 1.03
pri_bon30 [3] 36.9 194 0 0 1.02
pri_bon50 [3] 36.7 195 0 0 1.01

process_load:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.5.52-mm1 [7] 49.0 144 10 50 1.35
pri_bon00 [3] 47.5 152 9 41 1.31
pri_bon10 [3] 48.2 147 10 47 1.33
pri_bon30 [3] 50.1 141 12 53 1.38
pri_bon50 [3] 46.2 154 8 39 1.28
Seems to subtly affect the balance here.


ctar_load:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.5.52-mm1 [7] 55.5 156 1 10 1.53
pri_bon00 [3] 44.6 165 0 5 1.23
pri_bon10 [3] 45.5 164 0 7 1.26
pri_bon30 [3] 52.0 154 1 10 1.44
pri_bon50 [3] 57.5 158 1 10 1.59
Seems to be a direct relationship; pb up, time up


xtar_load:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.5.52-mm1 [7] 77.4 122 1 8 2.14
pri_bon00 [3] 60.6 125 0 7 1.67
pri_bon10 [3] 61.7 125 1 8 1.70
pri_bon30 [3] 74.8 128 1 9 2.07
pri_bon50 [3] 74.5 130 1 8 2.06
when pb goes up, time goes up, but maxes out

io_load:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.5.52-mm1 [7] 80.5 108 10 19 2.22
pri_bon00 [3] 120.3 94 22 24 3.32
pri_bon10 [3] 123.6 91 20 23 3.41
pri_bon30 [3] 95.8 84 14 20 2.65
pri_bon50 [3] 76.8 114 11 21 2.12
when pb goes up, time goes down (large effect)

io_other:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.5.52-mm1 [7] 60.1 131 7 18 1.66
pri_bon00 [3] 142.8 94 27 26 3.94
pri_bon10 [3] 116.5 93 22 26 3.22
pri_bon30 [3] 72.8 115 8 19 2.01
pri_bon50 [3] 99.8 97 15 22 2.76
similar to io_load, not quite linear


read_load:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.5.52-mm1 [7] 49.9 149 5 6 1.38
pri_bon00 [3] 48.3 154 2 3 1.33
pri_bon10 [3] 49.5 150 5 6 1.37
pri_bon30 [3] 50.7 148 5 6 1.40
pri_bon50 [3] 49.8 149 5 6 1.38

list_load:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.5.52-mm1 [7] 43.8 167 0 9 1.21
pri_bon00 [3] 43.7 168 0 7 1.21
pri_bon10 [3] 44.0 167 0 8 1.22
pri_bon30 [3] 44.0 166 0 9 1.22
pri_bon50 [3] 43.8 167 0 9 1.21

mem_load:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.5.52-mm1 [7] 71.1 123 36 2 1.96
pri_bon00 [3] 78.8 98 33 2 2.18
pri_bon10 [3] 94.0 82 35 2 2.60
pri_bon30 [3] 108.6 74 36 2 3.00
pri_bon50 [3] 106.2 75 36 2 2.93
in the opposite direction to io_load; as pb goes up, time goes up, but
mem_load achieves no more work.


Changing this tunable seems to shift the balance in either direction depending
on the load. Most of the disk writing loads have shorter times as pb goes up,
but under heavy mem_load the time goes up (without an increase in the amount
of work done by the mem_load itself). The effect is quite large.

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.0 (GNU/Linux)

iD8DBQE+Aj8nF6dfvkL3i1gRAuJOAKCYVUsr4tii1akA996c/XVqdCizuQCfQi+a
QtX8sg1Q1KA2VI6eY+X5GtM=
=QlX7
-----END PGP SIGNATURE-----


2002-12-19 22:41:46

by Robert Love

Subject: Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio

On Thu, 2002-12-19 at 16:50, Con Kolivas wrote:

> Changing this tunable seems to shift the balance in either direction depending
> on the load. Most of the disk writing loads have shorter times as pb goes up,
> but under heavy mem_load the time goes up (without an increase in the amount
> of work done by the mem_load itself). The effect is quite large.

This is one of the most interesting tests. Thanks, Con.

prio_bonus_ratio determines how big a bonus we give to interactive
tasks, as a percentage of the full -20 to +19 nice range. Setting it to
zero means we scale the bonuses/penalties by zero percent, i.e. we do
not give a bonus or penalty at all. 25% means 25% of the range is used
(i.e. -/+5 priority points), and so on.
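
To make the arithmetic concrete, here is a rough standalone model of that
scaling (an illustrative sketch only, not the actual 2.5.52-mm source; the
constant values and names are assumptions):

#include <stdio.h>

#define MAX_USER_PRIO   40      /* the full nice -20 .. +19 range */

static int prio_bonus_ratio = 25;       /* the tunable, in percent */
static int max_sleep_avg = 2000;        /* roughly 2*HZ ticks at HZ=1000 */

/*
 * Map how much of its recent history a task spent sleeping (sleep_avg,
 * 0..max_sleep_avg) onto a priority bonus.  Positive means an interactive
 * boost, negative means a cpu-hog penalty.
 */
static int interactivity_bonus(int sleep_avg)
{
        int max_bonus = MAX_USER_PRIO * prio_bonus_ratio / 100;

        return sleep_avg * max_bonus / max_sleep_avg - max_bonus / 2;
}

int main(void)
{
        printf("pure cpu hog : %+d\n", interactivity_bonus(0));
        printf("50/50 sleeper: %+d\n", interactivity_bonus(max_sleep_avg / 2));
        printf("pure sleeper : %+d\n", interactivity_bonus(max_sleep_avg));
        return 0;
}

With prio_bonus_ratio=25 the bonus spans roughly -5 to +5 priority levels;
with 0 it collapses to nothing, whatever the sleep history.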

I suspect tests where you see an improvement as the value increases are
ones in which the test is more interactive than the background load. In
that case, the larger bonus helps the test more than it helps the load,
and the test completes more quickly.

When you see a decrease associated with a larger value, the test is less
interactive than the load. Thus the load is scheduled to the detriment
of the test, and the test takes longer to complete.

Not too sure what to make of it. It shows the interactivity estimator
does indeed help... but only if what you consider "important" is what is
considered "interactive" by the estimator. Andrew will say that is too
often not the case.

Robert Love

P.S. This setting is also useful for end users to test. Setting
prio_bonus_ratio to zero effectively disables the interactivity
estimator, so users can test without that feature enabled. It should
fix e.g. Andrew's X wiggle issue.

2002-12-19 23:11:50

by Andrew Morton

Subject: Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio

Robert Love wrote:
>
> ...
> Not too sure what to make of it. It shows the interactivity estimator
> does indeed help... but only if what you consider "important" is what is
> considered "interactive" by the estimator. Andrew will say that is too
> often not the case.
>

That is too often not the case.

I can get the desktop machine working about as comfortably
as 2.4.19 with:

# echo 10 > max_timeslice
# echo 0 > prio_bonus_ratio

ie: disabling all the fancy new scheduler features :(

Dropping max_timeslice fixes the enormous stalls which happen
when an interactive process gets incorrectly identified as a
cpu hog. (OK, that's expected)

But when switching virtual desktops some windows still take a
large fraction of a second to redraw themselves. Disabling the
interactivity estimator fixes that up too. (Not OK. That's bad)

hm. It's actually quite nice. I'd be prepared to throw away
a few cycles for this.

I don't expect the interactivity/cpuhog estimator will ever work
properly on the desktop, frankly. There will always be failure
cases when a sudden swing in load causes it to make the wrong
decision.

So it appears that to stem my stream of complaints we need to
merge scheduler_tunables.patch and edit my /etc/rc.local.

2002-12-19 23:32:39

by Con Kolivas

Subject: Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


>Robert Love wrote:
>> ...
>> Not too sure what to make of it. It shows the interactivity estimator
>> does indeed help... but only if what you consider "important" is what is
>> considered "interactive" by the estimator. Andrew will say that is too
>> often not the case.
>
>That is too often not the case.
>
>I can get the desktop machine working about as comfortably
>as 2.4.19 with:
>
># echo 10 > max_timeslice
># echo 0 > prio_bonus_ratio
>
>ie: disabling all the fancy new scheduler features :(
>
>Dropping max_timeslice fixes the enormous stalls which happen
>when an interactive process gets incorrectly identified as a
>cpu hog. (OK, that's expected)
>
>But when switching virtual desktops some windows still take a
>large fraction of a second to redraw themselves. Disabling the
>interactivity estimator fixes that up too. (Not OK. That's bad)
>
>hm. It's actually quite nice. I'd be prepared to throw away
>a few cycles for this.
>
>I don't expect the interactivity/cpuhog estimator will ever work
>properly on the desktop, frankly. There will always be failure
>cases when a sudden swing in load causes it to make the wrong
>decision.
>
>So it appears that to stem my stream of complaints we need to
>merge scheduler_tunables.patch and edit my /etc/rc.local.

I guess this explains why my variable timeslice thingy in -ck helps on the
desktop. Basically by shortening the timeslice it is masking the effect of
the interactivity estimator under load. That is, it is treating the symptoms
of having an interactivity estimator rather than tackling the cause.

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.0 (GNU/Linux)

iD8DBQE+AllzF6dfvkL3i1gRAurrAJ97s1tW96zf+C6NfF2aDpdQM5iUkwCgjxc9
9uNvOEBjvsYIiQxc6yBZcks=
=pvhz
-----END PGP SIGNATURE-----

2002-12-19 23:33:35

by Robert Love

Subject: Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio

On Thu, 2002-12-19 at 18:18, Andrew Morton wrote:

> That is too often not the case.

I knew you would say that!

> I can get the desktop machine working about as comfortably
> as 2.4.19 with:
>
> # echo 10 > max_timeslice
> # echo 0 > prio_bonus_ratio
>
> ie: disabling all the fancy new scheduler features :(
>
> Dropping max_timeslice fixes the enormous stalls which happen
> when an interactive process gets incorrectly identified as a
> cpu hog. (OK, that's expected)

Curious why you need to drop max_timeslice, too. Did you do that
_before_ changing the interactivity estimator? Dropping max_timeslice
closer to min_timeslice would do away with a lot of the effect of the
interactivity estimator, since bonuses and penalties would be less
apparent.

There would still be (a) the improved priority given to interactive
processes and (b) the reinsertion into the active array done for
interactive processes.

Setting prio_bonus_ratio to zero would finish off (a) and (b). It would
also accomplish the effect of setting max_timeslice low, without
actually doing it.
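
For illustration, the end-of-timeslice decision works roughly like this (a
simplified sketch of the idea, not the exact 2.5.52 code; the task values
below are made up):

#include <stdbool.h>
#include <stdio.h>

struct task {
        const char *name;
        int static_prio;        /* from the nice level, no bonus */
        int prio;               /* dynamic priority = static_prio - bonus */
};

static int interactive_delta = 2;       /* the tunable */

/* a task qualifies only if it earned a bonus of at least interactive_delta */
static bool task_interactive(const struct task *p)
{
        return p->prio <= p->static_prio - interactive_delta;
}

int main(void)
{
        /* hypothetical tasks: one with a +5 bonus, one with none */
        struct task xterm = { "xterm", 120, 115 };
        struct task cc1   = { "cc1",   120, 120 };
        const struct task *tasks[] = { &xterm, &cc1 };

        for (int i = 0; i < 2; i++)
                printf("%-5s -> %s array\n", tasks[i]->name,
                       task_interactive(tasks[i]) ? "active" : "expired");
        return 0;
}

With prio_bonus_ratio at zero no bonus is ever earned, so (in this model)
prio always equals static_prio, the test never passes, and every task goes
to the expired array when its slice runs out.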

Thus, can you try putting max_timeslice back to 300? You would never
actually use that range, mind you, except for niced/real-time
processes. But at least then the default timeslice would be a saner
100ms.

> I don't expect the interactivity/cpuhog estimator will ever work
> properly on the desktop, frankly. There will always be failure
> cases when a sudden swing in load causes it to make the wrong
> decision.
>
> So it appears that to stem my stream of complaints we need to
> merge scheduler_tunables.patch and edit my /etc/rc.local.

I am glad sched-tune helped identify and fix the issue. I would have no
problem merging this to Linus. I actually have a 2.5.52 patch out which
is a bit cleaner - it removes the defines completely and uses the new
variables. More proper for the long term. Feel free to push what you
have, too.

But that in no way precludes fixing what we have, because good
algorithms should not require tuning for common cases. Period.

Robert Love

2002-12-19 23:45:12

by Robert Love

Subject: Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio

On Thu, 2002-12-19 at 18:42, Con Kolivas wrote:

> I guess this explains why my variable timeslice thingy in -ck helps on the
> desktop. Basically by shortening the timeslice it is masking the effect of
> the interactivity estimator under load. That is, it is treating the symptoms
> of having an interactivity estimator rather than tackling the cause.

You would probably get the same effect or better by setting
prio_bonus_ratio lower (or off).

Setting it lower will also give less priority bonus/penalty and not
reinsert the tasks so readily into the active array.

Something like the attached patch may help...

Robert Love

--- linux-2.5.52/kernel/sched.c 2002-12-19 18:47:53.000000000 -0500
+++ linux/kernel/sched.c 2002-12-19 18:48:05.000000000 -0500
@@ -66,8 +66,8 @@
int child_penalty = 95;
int parent_penalty = 100;
int exit_weight = 3;
-int prio_bonus_ratio = 25;
-int interactive_delta = 2;
+int prio_bonus_ratio = 5;
+int interactive_delta = 1;
int max_sleep_avg = 2 * HZ;
int starvation_limit = 2 * HZ;




2002-12-19 23:54:16

by Con Kolivas

Subject: Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


>On Thu, 2002-12-19 at 18:42, Con Kolivas wrote:
>> I guess this explains why my variable timeslice thingy in -ck helps on the
>> desktop. Basically by shortening the timeslice it is masking the effect of
>> the interactivity estimator under load. That is, it is treating the
>> symptoms of having an interactivity estimator rather than tackling the
>> cause.
>
>You would probably get the same effect or better by setting
>prio_bonus_ratio lower (or off).
>
>Setting it lower will also give less priority bonus/penalty and not
>reinsert the tasks so readily into the active array.
>
>Something like the attached patch may help...
>
> Robert Love

Thanks. That looks fair enough. My only concern is that io_load performance is
worse with lower prio_bonus_ratio settings, and io loads are the ones users feel most.

I was thinking of changing what it varied. I was going to leave the timeslice
fixed and use it to change the prio_bonus_ratio under load. Although that
kind of defeats the purpose of having it in the first place since it is
supposed to decide what is interactive under load?

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.0 (GNU/Linux)

iD8DBQE+Al6IF6dfvkL3i1gRAo6mAKColJKXyGNaa0dcwot4EvElpHqkawCeORLm
ZSyRVx1w76qWBEgkjbRZWmw=
=ckYA
-----END PGP SIGNATURE-----

2002-12-19 23:55:38

by Andrew Morton

Subject: Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio

Robert Love wrote:
>
> On Thu, 2002-12-19 at 18:18, Andrew Morton wrote:
>
> > That is too often not the case.
>
> I knew you would say that!
>
> > I can get the desktop machine working about as comfortably
> > as 2.4.19 with:
> >
> > # echo 10 > max_timeslice
> > # echo 0 > prio_bonus_ratio
> >
> > ie: disabling all the fancy new scheduler features :(
> >
> > Dropping max_timeslice fixes the enormous stalls which happen
> > when an interactive process gets incorrectly identified as a
> > cpu hog. (OK, that's expected)
>
> Curious why you need to drop max_timeslice, too.

What Con said. When the scheduler makes an inappropriate decision,
shortening the timeslice minimises its impact.

> Did you do that _before_ changing the interactivity estimator?

I disabled the estimator first. The result was amazingly bad ;)

> Dropping max_timeslice
> closer to min_timeslice would do away with a lot of the effect of the
> interactivity estimator, since bonuses and penalties would be less
> apparent.

Yup. One good test is to keep rebuilding a kernel all the time,
then just *use* the system. Setting max_timeslice=10, prio_bonus=10
works better still. prio_bonus=25 has small-but-odd lags.

> There would still be (a) the improved priority given to interactive
> processes and (b) the reinsertion into the active array done for
> interactive processes.
>
> Setting prio_bonus_ratio to zero would finish off (a) and (b). It would
> also accomplish the effect of setting max_timeslice low, without
> actually doing it.
>
> Thus, can you try putting max_timeslice back to 300? You would never
> actually use that range, mind you, except for niced/real-time
> processes. But at least then the default timeslice would be a saner
> 100ms.

prio_bonus=0, max_timeslice=300 is awful. Try it...

> ...
> But that in no way precludes fixing what we have, because good
> algorithms should not require tuning for common cases. Period.

hm. Good luck ;)

This is a situation in which one is prepared to throw away some cycles
to achieve a desired effect.

2002-12-20 00:07:02

by Robert Love

Subject: Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio

On Thu, 2002-12-19 at 19:02, Andrew Morton wrote:

> What Con said. When the scheduler makes an inappropriate decision,
> shortening the timeslice minimises its impact.

OK, I tried it. It does suck.

I wonder why, though, because with the estimator off the scheduler
should not be making "bad" decisions.

> > But that in no way precludes fixing what we have, because good
> > algorithms should not require tuning for common cases. Period.
>
> hm. Good luck ;)
>
> This is a situation in which one is prepared to throw away some cycles
> to achieve a desired effect.

Well one option would be no algorithm at all :)

But if you can find good values that make things run nice, then perhaps
we just need to change the defaults.

I think we should merge sched-tune..

Robert Love

2002-12-20 00:08:11

by Robert Love

Subject: Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio

On Thu, 2002-12-19 at 19:04, Con Kolivas wrote:

> Thanks. That looks fair enough. My only concern is that io_load performance is
> worse with lower prio_bonus_ratio settings, and io loads are the ones users feel most.
>
> I was thinking of changing what it varied. I was going to leave the timeslice
> fixed and use it to change the prio_bonus_ratio under load. Although that
> kind of defeats the purpose of having it in the first place since it is
> supposed to decide what is interactive under load?

Yep.

You want to find good defaults that just work.

Robert Love

2002-12-20 00:12:47

by Con Kolivas

Subject: Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio


>On Thu, 2002-12-19 at 19:02, Andrew Morton wrote:
>> What Con said. When the scheduler makes an inappropriate decision,
>> shortening the timeslice minimises its impact.
>
>OK, I tried it. It does suck.
>
>I wonder why, though, because with the estimator off the scheduler
>should not be making "bad" decisions.

Is it just because the base timeslices are longer than in the old scheduler?

Con

2002-12-20 00:21:11

by Robert Love

Subject: Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio

On Thu, 2002-12-19 at 19:22, Con Kolivas wrote:

> Is it just because the base timeslices are longer than in the old scheduler?

Could be. The default timeslice was around 50ms in 2.4. The default in
2.5 with a min of 10 and a max of 300 is about 100ms.

It could be that without the priority boost, 100ms is too long and
capable of starving tasks (which, without the priority boost, are all at
the same level and thus scheduled round-robin).
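
Back of the envelope (the job count and slice lengths here are just assumed
examples, not measurements):

#include <stdio.h>

/* worst-case wait for a freshly woken task stuck behind the cpu hogs */
static long worst_case_wait_ms(int nr_runnable, int timeslice_ms)
{
        return (long)(nr_runnable - 1) * timeslice_ms;
}

int main(void)
{
        int jobs = 8;   /* e.g. a make -j8 keeping ~8 tasks runnable */

        printf("50ms slices (2.4-ish)       : %ld ms\n",
               worst_case_wait_ms(jobs, 50));
        printf("~100ms slices (2.5 default) : %ld ms\n",
               worst_case_wait_ms(jobs, 100));
        printf("max_timeslice=10            : %ld ms\n",
               worst_case_wait_ms(jobs, 10));
        return 0;
}

So without the bonus pulling interactive tasks ahead, longer slices translate
directly into longer worst-case waits, which is what the desktop feels as lag.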

Robert Love

2002-12-20 00:20:22

by Andrew Morton

Subject: Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio

Robert Love wrote:
>
> I actually have a 2.5.52 patch out which
> is a bit cleaner - it removes the defines completely and uses the new
> variables.

I actually don't mind the

#define TUNABLE (tunable)

thing, because when you look at the code, it tells you that
TUNABLE is "special". Not a local variable, not a formal arg,
not (really) a global variable. It aids comprehension.

Prefixing all the names with "tune_" would suit, too.

2002-12-20 02:34:10

by Robert Love

Subject: Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio

On Thu, 2002-12-19 at 19:27, Andrew Morton wrote:

> Prefixing all the names with "tune_" would suit, too.

I have no problem with this. Keeping the names in all caps rings
"preprocessor define" to me, which in fact they are - but only insomuch
as they point to a real variable. So I dislike that.

Personally I like them as normal variable names... don't you do the same
in the VM code as well? But tune_foo is fine..

Robert Love

2002-12-20 02:40:16

by Andrew Morton

Subject: Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio

Robert Love wrote:
>
> On Thu, 2002-12-19 at 19:27, Andrew Morton wrote:
>
> > Prefixing all the names with "tune_" would suit, too.
>
> I have no problem with this. Keeping the names in all caps rings
> "preprocessor define" to me, which in fact they are - but only insomuch
> as they point to a real variable. So I dislike that.

Think of them as "runtime controllable constants" :)

> Personally I like them as normal variable names... don't you do the same
> in the VM code as well? But tune_foo is fine..

Convention there is to use sysctl_foo, which is fine. We haven't
been consistent in that, and it's a thing I regret. But not
enough to bother changing it.

2002-12-20 11:10:01

by Marc-Christian Petersen

Subject: Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio

On Friday 20 December 2002 00:53, Robert Love wrote:

Hi Robert,

> You would probably get the same effect or better by setting
> prio_bonus_ratio lower (or off).
> Setting it lower will also give less priority bonus/penalty and not
> reinsert the tasks so readily into the active array.
> Something like the attached patch may help...

> --- linux-2.5.52/kernel/sched.c 2002-12-19 18:47:53.000000000 -0500
> +++ linux/kernel/sched.c 2002-12-19 18:48:05.000000000 -0500
> @@ -66,8 +66,8 @@
> int child_penalty = 95;
> int parent_penalty = 100;
> int exit_weight = 3;
> -int prio_bonus_ratio = 25;
> -int interactive_delta = 2;
> +int prio_bonus_ratio = 5;
> +int interactive_delta = 1;
> int max_sleep_avg = 2 * HZ;
> int starvation_limit = 2 * HZ;

FYI: These changes cause a horrible slowdown of all apps during kernel
compilation.

ciao, Marc

2002-12-20 17:46:16

by Robert Love

Subject: Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio

On Fri, 2002-12-20 at 06:17, Marc-Christian Petersen wrote:

> FYI: These changes cause a horrible slowdown of all apps during kernel
> compilation.

Try leaving interactive_delta at 2. If there are still issues, then it
may just be the normal "lack" of interactivity bonus. A heavy compile
is never pleasant to the other applications.

You could also try Andrew's numbers (max_timeslice=10 and
prio_bonus_ratio=0 or 10), but I would prefer not to decrease
max_timeslice so much.

Robert Love

2002-12-24 22:18:17

by Scott Thomason

Subject: Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio

On Thursday 19 December 2002 05:41 pm, Robert Love wrote:
> On Thu, 2002-12-19 at 18:18, Andrew Morton wrote:
> > That is too often not the case.
>
> I knew you would say that!
>
> > I can get the desktop machine working about as comfortably
> > as 2.4.19 with:
> >
> > # echo 10 > max_timeslice
> > # echo 0 > prio_bonus_ratio
> >
> > ie: disabling all the fancy new scheduler features :(
> >
> > Dropping max_timeslice fixes the enormous stalls which happen
> > when an interactive process gets incorrectly identified as a
> > cpu hog. (OK, that's expected)

My experiences to add to the pot...I started by booting 2.5.52-mm2 and
launching KDE3. I have a dual AMD MP2000+, 1GB RAM, with most of the
data used below on striped/RAID0 ATA/133 drives. Taking Andrew's
advice, I created a continuous load with:

while [ 1 ]; do ( make -j4 clean; make -j4 bzImage ); done

...in a kernel tree, then sat down for a leisurely email and web
cruising session. After about fifteen minutes, it became apparent I
wasn't suffering any interactive slowdown. So I increased the load:

while [ 1 ]; do ( make -j8 clean; make -j8 bzImage ); done
while [ 1 ]; do ( cp dump1 dump2; rm dump2; sync ); done

...where file "dump1" is 100MB. Now we're seeing some impact :)

To combat this I tried:

echo 3000 > starvation_limit
echo 4 > interactive_delta
echo 200 > max_timeslice
echo 20 > min_timeslice

This works pretty well. The "spinning envelope" on the email monitor
of gkrellm actually corresponds quite nicely with the actual feel of
my system, so after a while, I just sat back and observed it. Both the
tactile response and the gkrellm observations show this: it's common
to experience maybe a .1--.3 second lag every 2 or 3 seconds with
this load, with maybe the odd .5 second lag occurring once or twice a
minute. Watching the compile job in the background scroll by, I
noticed that there are times when it comes to a dead stop. The next
step, I guess, needs to be a ConTest with the final settings...

child_penalty: 95
exit_weight: 3
interactive_delta: 4
max_sleep_avg: 2000
max_timeslice: 300
min_timeslice: 10
parent_penalty: 100
prio_bonus_ratio: 25
starvation_limit: 3000
---scott

2002-12-25 07:21:28

by Con Kolivas

Subject: Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, 25 Dec 2002 09:26 am, scott thomason wrote:
> On Thursday 19 December 2002 05:41 pm, Robert Love wrote:
> > On Thu, 2002-12-19 at 18:18, Andrew Morton wrote:
> > > That is too often not the case.
> >
> > I knew you would say that!
> >
> > > I can get the desktop machine working about as comfortably
> > > as 2.4.19 with:
> > >
> > > # echo 10 > max_timeslice
> > > # echo 0 > prio_bonus_ratio
> > >
> > > ie: disabling all the fancy new scheduler features :(
> > >
> > > Dropping max_timeslice fixes the enormous stalls which happen
> > > when an interactive process gets incorrectly identified as a
> > > cpu hog. (OK, that's expected)
>
> My experiences to add to the pot...I started by booting 2.5.52-mm2 and
> launching KDE3. I have a dual AMD MP2000+, 1GB RAM, with most of the
> data used below on striped/RAID0 ATA/133 drives. Taking Andrew's
> advice, I created a continuous load with:
>
> while [ 1 ]; do ( make -j4 clean; make -j4 bzImage ); done
>
> ...in a kernel tree, then sat down for a leisurely email and web
> cruising session. After about fifteen minutes, it became apparent I
> wasn't suffering any interactive slowdown. So I increased the load:
>
> while [ 1 ]; do ( make -j8 clean; make -j8 bzImage ); done
> while [ 1 ]; do ( cp dump1 dump2; rm dump2; sync ); done
>
> ...where file "dump1" is 100MB. Now we're seeing some impact :)
>
> To combat this I tried:
>
> echo 3000 > starvation_limit
> echo 4 > interactive_delta
> echo 200 > max_timeslice
> echo 20 > min_timeslice
>
> This works pretty well. The "spinning envelope" on the email monitor
> of gkrellm actually corresponds quite nicely with the actual feel of
> my system, so after a while, I just sat back and observed it. Both the
> tactile response and the gkrellm observations show this: it's common
> to experience maybe a .1--.3 second lag every 2 or 3 seconds with
> this load, with maybe the odd .5 second lag occurring once or twice a
> minute. Watching the compile job in the background scroll by, I
> noticed that there are times when it comes to a dead stop. The next
> step, I guess, needs to be a ConTest with the final settings...
>
> child_penalty: 95
> exit_weight: 3
> interactive_delta: 4
> max_sleep_avg: 2000
> max_timeslice: 300
> min_timeslice: 10
> parent_penalty: 100
> prio_bonus_ratio: 25
> starvation_limit: 3000

Scott

These don't correspond to your values listed above. Typo?

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.0 (GNU/Linux)

iD8DBQE+CV5XF6dfvkL3i1gRAn+7AJ0Qq0oEo0LE2GG1jpju4cHqH+k6/QCfV1AU
/7JI1ApZoQYwyBmFpH/50FY=
=MAZ5
-----END PGP SIGNATURE-----

2002-12-25 16:09:30

by Scott Thomason

Subject: Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio

On Wednesday 25 December 2002 01:29 am, Con Kolivas wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Wed, 25 Dec 2002 09:26 am, scott thomason wrote:
> > My experiences to add to the pot...I started by booting
> > 2.5.52-mm2 and launching KDE3. I have a dual AMD MP2000+, 1GB
> > RAM, with most of the data used below on striped/RAID0 ATA/133
> > drives. Taking Andrew's advice, I created a continuous load with:
> >
> > while [ 1 ]; do ( make -j4 clean; make -j4 bzImage ); done
> >
> > ...in a kernel tree, then sat down for a leisurely email and web
> > cruising session. After about fifteen minutes, it became apparent
> > I wasn't suffering any interactive slowdown. So I increased the
> > load:
> >
> > while [ 1 ]; do ( make -j8 clean; make -j8 bzImage ); done
> > while [ 1 ]; do ( cp dump1 dump2; rm dump2; sync ); done
> >
> > ...where file "dump1" is 100MB. Now we're seeing some impact :)
> >
> > To combat this I tried:
> >
> > echo 3000 > starvation_limit
> > echo 4 > interactive_delta
> > echo 200 > max_timeslice
> > echo 20 > min_timeslice
> >
> > This works pretty well. The "spinning envelope" on the email
> > monitor of gkrellm actually corresponds quite nicely with the
> > actual feel of my system, so after a while, I just sat back and
> > observed it. Both the tactile response and the gkrellm
> > observations show this: it's common to experience maybe a .1--.3
> > second lag every 2 or 3 seconds with this load, with maybe the
> > odd .5 second lag occurring once or twice a minute. Watching the
> > compile job in the background scroll by, I noticed that there are
> > times when it comes to a dead stop. The next step, I guess, needs
> > to be a ConTest with the final settings...
> >
> > child_penalty: 95
> > exit_weight: 3
> > interactive_delta: 4
> > max_sleep_avg: 2000
> > max_timeslice: 300
> > min_timeslice: 10
> > parent_penalty: 100
> > prio_bonus_ratio: 25
> > starvation_limit: 3000
>
> Scott
>
> These don't correspond to your values listed above. Typo?
>
> Con

Yes, sorry. The values listed concisely are correct, IOW:

child_penalty: 95
exit_weight: 3
interactive_delta: 4
max_sleep_avg: 2000
max_timeslice: 200
min_timeslice: 20
parent_penalty: 100
prio_bonus_ratio: 25
starvation_limit: 3000

Now I need to fire up a ConTest, then off to Christmas with Grandma
and the kids! Merry Christmas to all!
---scott

2002-12-26 14:53:42

by Scott Thomason

Subject: Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio

On Wednesday 25 December 2002 10:17 am, scott thomason wrote:
> Yes, sorry. The values listed concisely are correct, IOW:
>
> child_penalty: 95
> exit_weight: 3
> interactive_delta: 4
> max_sleep_avg: 2000
> max_timeslice: 200
> min_timeslice: 20
> parent_penalty: 100
> prio_bonus_ratio: 25
> starvation_limit: 3000
>
> Now I need to fire up a ConTest, then off to Christmas with Grandma
> and the kids! Merry Christmas to all!

Here is the detailed ConTest data, first for 2.5.52-mm2 with the
default tunables, then with the settings listed above. Two comments
about how I use ConTest so you don't wig out about low numbers: 1) I
compile qmail with a tailored Makefile instead of the kernel; 2) I
limit the size of the tempfile to 100MB (instead of the 1GB my setup
would normally yield). On with the detailed data (sorry, I don't have
the original *.logs to build the -r report):

Default tunables

noload Time: 18.82 CPU: 171% LoadRuns: 0 LoadCPU%: 0 Major Faults: 368526 Minor Faults: 181377
process_load Time: 25.12 CPU: 125% LoadRuns: 18 LoadCPU%: 74% Major Faults: 368526 Minor Faults: 181375
ctar_load Time: 32.73 CPU: 139% LoadRuns: 206 LoadCPU%: 31% Major Faults: 368527 Minor Faults: 181378
xtar_load Time: 30.51 CPU: 152% LoadRuns: 79 LoadCPU%: 26% Major Faults: 368528 Minor Faults: 181376
io_load Time: 37.13 CPU: 118% LoadRuns: 125 LoadCPU%: 40% Major Faults: 368524 Minor Faults: 181376
read_load Time: 31.20 CPU: 99% LoadRuns: 1147 LoadCPU%: 99% Major Faults: 368524 Minor Faults: 181377
list_load Time: 22.33 CPU: 151% LoadRuns: 0 LoadCPU%: 4% Major Faults: 368526 Minor Faults: 181376
mem_load Time: 39.70 CPU: 106% LoadRuns: 88 LoadCPU%: 14% Major Faults: 368533 Minor Faults: 183364
noload Time: 18.67 CPU: 171% LoadRuns: 0 LoadCPU%: 0 Major Faults: 368527 Minor Faults: 181378
process_load Time: 20.57 CPU: 154% LoadRuns: 10 LoadCPU%: 49% Major Faults: 368527 Minor Faults: 181377
ctar_load Time: 24.48 CPU: 195% LoadRuns: 164 LoadCPU%: 20% Major Faults: 368527 Minor Faults: 181381
xtar_load Time: 31.33 CPU: 141% LoadRuns: 83 LoadCPU%: 24% Major Faults: 368526 Minor Faults: 181376
io_load Time: 32.06 CPU: 154% LoadRuns: 159 LoadCPU%: 55% Major Faults: 368526 Minor Faults: 181377
read_load Time: 30.11 CPU: 103% LoadRuns: 1044 LoadCPU%: 95% Major Faults: 368527 Minor Faults: 181376
list_load Time: 19.49 CPU: 172% LoadRuns: 0 LoadCPU%: 8% Major Faults: 368527 Minor Faults: 181377
mem_load Time: 47.46 CPU: 102% LoadRuns: 97 LoadCPU%: 11% Major Faults: 369220 Minor Faults: 184244
noload Time: 18.56 CPU: 172% LoadRuns: 0 LoadCPU%: 0 Major Faults: 368528 Minor Faults: 181376
process_load Time: 26.16 CPU: 120% LoadRuns: 20 LoadCPU%: 77% Major Faults: 368525 Minor Faults: 181375
ctar_load Time: 68.20 CPU: 68% LoadRuns: 428 LoadCPU%: 14% Major Faults: 368525 Minor Faults: 181381
xtar_load Time: 28.97 CPU: 163% LoadRuns: 85 LoadCPU%: 25% Major Faults: 368527 Minor Faults: 181376
io_load Time: 29.25 CPU: 164% LoadRuns: 126 LoadCPU%: 48% Major Faults: 368526 Minor Faults: 181377
read_load Time: 31.03 CPU: 100% LoadRuns: 1115 LoadCPU%: 97% Major Faults: 368525 Minor Faults: 181377
list_load Time: 21.81 CPU: 155% LoadRuns: 0 LoadCPU%: 8% Major Faults: 368526 Minor Faults: 181377
mem_load Time: 47.05 CPU: 88% LoadRuns: 96 LoadCPU%: 11% Major Faults: 368552 Minor Faults: 181433


Tweaked tunables

noload Time: 18.62 CPU: 172% LoadRuns: 0 LoadCPU%: 0 Major Faults: 368528 Minor Faults: 181384
process_load Time: 27.54 CPU: 114% LoadRuns: 25 LoadCPU%: 81% Major Faults: 368527 Minor Faults: 181383
ctar_load Time: 27.53 CPU: 171% LoadRuns: 181 LoadCPU%: 30% Major Faults: 368528 Minor Faults: 181388
xtar_load Time: 32.61 CPU: 127% LoadRuns: 74 LoadCPU%: 20% Major Faults: 368525 Minor Faults: 181385
io_load Time: 42.57 CPU: 113% LoadRuns: 204 LoadCPU%: 46% Major Faults: 368527 Minor Faults: 181384
read_load Time: 22.66 CPU: 141% LoadRuns: 312 LoadCPU%: 44% Major Faults: 368527 Minor Faults: 181384
list_load Time: 21.63 CPU: 158% LoadRuns: 0 LoadCPU%: 4% Major Faults: 368526 Minor Faults: 181383
mem_load Time: 44.48 CPU: 96% LoadRuns: 93 LoadCPU%: 12% Major Faults: 368528 Minor Faults: 181402
noload Time: 18.70 CPU: 172% LoadRuns: 0 LoadCPU%: 0 Major Faults: 368525 Minor Faults: 181385
process_load Time: 27.37 CPU: 115% LoadRuns: 25 LoadCPU%: 82% Major Faults: 368527 Minor Faults: 181382
ctar_load Time: 73.84 CPU: 62% LoadRuns: 498 LoadCPU%: 19% Major Faults: 368525 Minor Faults: 181388
xtar_load Time: 25.15 CPU: 163% LoadRuns: 58 LoadCPU%: 24% Major Faults: 368527 Minor Faults: 181385
io_load Time: 35.49 CPU: 118% LoadRuns: 112 LoadCPU%: 33% Major Faults: 368525 Minor Faults: 181385
read_load Time: 23.11 CPU: 137% LoadRuns: 517 LoadCPU%: 63% Major Faults: 368525 Minor Faults: 181383
list_load Time: 21.61 CPU: 157% LoadRuns: 0 LoadCPU%: 4% Major Faults: 368528 Minor Faults: 181383
mem_load Time: 42.15 CPU: 130% LoadRuns: 90 LoadCPU%: 13% Major Faults: 368524 Minor Faults: 181466
noload Time: 18.95 CPU: 169% LoadRuns: 0 LoadCPU%: 0 Major Faults: 368526 Minor Faults: 181384
process_load Time: 27.44 CPU: 114% LoadRuns: 24 LoadCPU%: 83% Major Faults: 368527 Minor Faults: 181384
ctar_load Time: 23.68 CPU: 191% LoadRuns: 155 LoadCPU%: 16% Major Faults: 368527 Minor Faults: 181388
xtar_load Time: 31.76 CPU: 137% LoadRuns: 65 LoadCPU%: 23% Major Faults: 368527 Minor Faults: 181386
io_load Time: 33.40 CPU: 123% LoadRuns: 150 LoadCPU%: 45% Major Faults: 368527 Minor Faults: 181385
read_load Time: 21.26 CPU: 147% LoadRuns: 424 LoadCPU%: 57% Major Faults: 368526 Minor Faults: 181384
list_load Time: 20.29 CPU: 167% LoadRuns: 0 LoadCPU%: 4% Major Faults: 368527 Minor Faults: 181383
mem_load Time: 51.10 CPU: 130% LoadRuns: 98 LoadCPU%: 12% Major Faults: 368596 Minor Faults: 181598


Finally, it crossed my mind that completely subjective monitoring of X
jerkiness perhaps wasn't the most scientific way of measuring the
interactive impact of the tunables. I'm no Evil Scientist, but I
whipped up a perl script that I think accomplishes something close to
capturing those statistics. It captures 1000 samples of what should
be a precise .2 second delay (on an idle system it is, with a tiny
bit of noise).

Here's the script, along with some output produced while the system
was under considerable load. Would something like this be worth
developing further to help rigorously measure the interactive impact
of the tunables? Or is there a flaw in the approach?


#!/usr/bin/perl

use strict;
use warnings;

use Time::HiRes qw/sleep time/;

my %pause = ();

for (my $x = 0; $x < 1000; $x++) {
    my $start = time();
    sleep(.2);
    my $stop = time();
    my $elapsed = $stop - $start;

    $pause{sprintf('%01.3f', $elapsed)}++;
}

foreach (sort(keys(%pause))) {
    print "$_: $pause{$_}\n";
}

exit 0;


Sample output

time ./int_resp_timer.pl
0.192: 1
0.199: 1
0.200: 10
0.201: 201
0.202: 53
0.203: 25
0.204: 22
0.205: 21
0.206: 34
0.207: 29
0.208: 29
0.209: 100
0.210: 250
0.211: 120
0.212: 35
0.213: 16
0.214: 17
0.215: 14
0.216: 9
0.217: 1
0.218: 3
0.219: 3
0.220: 1
0.222: 1
0.233: 1
0.303: 1
0.304: 1
0.385: 1

real 3m28.568s
user 0m0.329s
sys 0m1.260s

2003-01-01 00:23:05

by Scott Thomason

Subject: Impact of scheduler tunables on interactive response (was Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio)

Around mid-December, Con, rml, & akpm had a discussion about whether
or not the scheduler tunables were a good thing for interactive
responsiveness. Andrew was of the opinion that the interactivity
estimator judged poorly too often and introduced noticeable lags to
the interactive experience. To combat this, he fiddled with the
tunable knobs in an attempt to basically turn off the interactivity
estimator.

I wrote a program that emulates a varying but constant set of loads
with a fixed amount of sleep() time in the hopes that it would appear
"interactive" to the estimator. The program measures the time it
takes to process each iteration (minus the time it spends sleeping).
Then I tried seven different configurations of the tunables while the
system was under load. The kernel was 2.5.53-mm2. The load was a
continuously looping kernel make -j4 clean/make -j4 bzImage, and a
continuously looping copy of a 100MB file. My system is a dual AMD
MP2000 with 1GB RAM.

*IF* the test program is valid--something I would like feedback
on!--the results show that you can attack the background load with
aggressive tunable settings to achieve low interactive response
times, contrary to the direction Andrew had suggested taking for
tunable settings.

The seven tunable configurations, a graph of the results, and the raw
data are here:

http://www.thomasons.org/int_res.html

Tab-delimited text and OpenOffice spreadsheets of the data are here:

http://www.thomasons.org/int_res.txt
http://www.thomasons.org/int_res.sxc

I would like to assemble a small suite of tools that can be used to
measure the impact of kernel changes on interactive performance,
starting with Mark Hahn's/Andrew's "realfeel" microbenchmark and
moving up thru whatever else may be necessary to gauge real-life
impact. Your comments and direction are very welcome.

This test program is:

#!/usr/bin/perl

use strict;
use warnings;

use Time::HiRes qw/sleep time/;
use IO::File;

use constant OBS => 5000;
use constant SLEEP => 0.3;
use constant MEMLOW => 04 * 1024 * 1024;
use constant MEMINC => 2 * 1024 * 1024;
use constant MEMHI => 30 * 1024 * 1024;

my $m = MEMHI;

for (my $x = 0; $x < OBS; $x++) {
    my $start = time();

    $m += MEMINC;
    if ($m > MEMHI) {
        $m = MEMLOW;
    }
    my $mem = 'x' x $m; ## Touch a little memory

    sleep(SLEEP);

    $mem = undef; ## Release the memory
    my $fh = IO::File->new_tmpfile or die "Can't get temp file handle!";
    my $m2 = $m * .02; ## Write 2% of the memory allocation to disk
    print $fh 'x' x $m2;
    $fh = undef;

    my $elapsed = (time() - $start) - SLEEP;
    printf("%07.4f\n", $elapsed); ## Capture to tenths of ms - sleep
}

exit 0;



On Thursday 19 December 2002 05:41 pm, Robert Love wrote:
> On Thu, 2002-12-19 at 18:18, Andrew Morton wrote:
> > That is too often not the case.
>
> I knew you would say that!
>
> > I can get the desktop machine working about as comfortably
> > as 2.4.19 with:
> >
> > # echo 10 > max_timeslice
> > # echo 0 > prio_bonus_ratio
> >
> > ie: disabling all the fancy new scheduler features :(
> >
> > Dropping max_timeslice fixes the enormous stalls which happen
> > when an interactive process gets incorrectly identified as a
> > cpu hog. (OK, that's expected)
>
> Curious why you need to drop max_timeslice, too. Did you do that
> _before_ changing the interactivity estimator? Dropping
> max_timeslice closer to min_timeslice would do away with a lot of
> the effect of the interactivity estimator, since bonuses and penalties
> would be less apparent.
>
> There would still be (a) the improved priority given to interactive
> processes and (b) the reinsertion into the active array done for
> interactive processes.
>
> Setting prio_bonus_ratio to zero would finish off (a) and (b). It
> would also accomplish the effect of setting max_timeslice low,
> without actually doing it.
>
> Thus, can you try putting max_timeslice back to 300? You would
> never actually use that range, mind you, except for niced/real-time
> processes. But at least then the default timeslice would be a
> saner 100ms.
>
> > I don't expect the interactivity/cpuhog estimator will ever work
> > properly on the desktop, frankly. There will always be failure
> > cases when a sudden swing in load causes it to make the wrong
> > decision.
> >
> > So it appears that to stem my stream of complaints we need to
> > merge scheduler_tunables.patch and edit my /etc/rc.local.
>
> I am glad sched-tune helped identify and fix the issue. I would
> have no problem merging this to Linus. I actually have a 2.5.52
> patch out which is a bit cleaner - it removes the defines
> completely and uses the new variables. More proper for the long
> term. Feel free to push what you have, too.
>
> But that in no way precludes fixing what we have, because good
> algorithms should not require tuning for common cases. Period.
>
> Robert Love

2003-01-01 15:59:32

by Bill Davidsen

Subject: Re: Impact of scheduler tunables on interactive response (was Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio)

On Wed, 1 Jan 2003, scott thomason wrote:

> I wrote a program that emulates a varying but constant set of loads
> with a fixed amount of sleep() time in the hopes that it would appear
> "interactive" to the estimator. The program measures the time it
> takes to process each iteration (minus the time it spends sleeping).
> Then I tried seven different configurations of the tunables while the
> system was under load. The kernel was 2.5.53-mm2. The load was a
> continuously looping kernel make -j4 clean/make -j4 bzImage, and a
> continuously looping copy of a 100MB file. My system is a dual AMD
> MP2000 with 1GB RAM.

This sounds very like my resp2 (http://www.unyuug.org/benchmarks/) program I
announced on this list some months ago, but resp2 generates loads of a
specific type so that you can determine of changes affect i/o load,
swapping load, CPU load, etc.

>
> *IF* the test program is valid--something I would like feedback
> on!--the results show that you can attack the background load with
> aggressive tunable settings to achieve low interactive response
> times, contrary to the direction Andrew had suggested taking for
> tunable settings.
>
> The seven tunable configurations, a graph of the results, and the raw
> data are here:
>
> http://www.thomasons.org/int_res.html
>
> Tab-delimited text and OpenOffice spreadsheets of the data are here:
>
> http://www.thomasons.org/int_res.txt
> http://www.thomasons.org/int_res.sxc
>
> I would like to assemble a small suite of tools that can be used to
> measure the impact of kernel changes on interactive performance,
> starting with Mark Hahn's/Andrew's "realfeel" microbenchmark and
> moving up thru whatever else may be necessary to gauge real-life
> impact. Your comments and direction are very welcome.

Note: the context switching benchmark is at the same URL. I have posted
some output recently, but haven't had a lot of feedback other than some folks
mailing results to me without copying the list.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2003-01-01 17:06:46

by Scott Thomason

Subject: Re: Impact of scheduler tunables on interactive response (was Re: [BENCHMARK] scheduler tunables with contest - prio_bonus_ratio)

On Wednesday 01 January 2003 10:05 am, Bill Davidsen wrote:
> This sounds very like my resp2 (http://www.unyuug.org/benchmarks/) program
> I announced on this list some months ago, but resp2 generates loads
> of a specific type so that you can determine if changes affect i/o
> load, swapping load, CPU load, etc

Have you (or anyone else) used resp2 to measure the impact of the
scheduler tunables on interactive responsiveness yet, and if so, what
are your conclusions?

Also, your benchmark page doesn't list a URL for resp2. Sure, I could
Google for one, but wouldn't it be nice to update your page, too?
---scott