2001-04-04 16:14:03

by SodaPop

Subject: [QUESTION] 2.4.x nice level

I too have noticed that nicing processes does not work nearly as
effectively as I'd like it to. I run on an underpowered machine,
and have had to stop running things such as seti because it steals too
much cpu time, even when maximally niced.

As an example, I can run mpg123 and a kernel build concurrently without
trouble; but if I add a single maximally niced seti process, mpg123 runs
out of gas and will start to skip while decoding.

Is there any way we can make nice levels stronger than they currently are
in 2.4? Or is this perhaps a timeslice problem, where once seti gets cpu
time it runs longer than it should since it makes relatively few system
calls?

-dennis T


2001-04-05 17:25:17

by tor

Subject: Re: [QUESTION] 2.4.x nice level

LA Walsh <[email protected]> writes:
> I was running 2 copies of setiathome on a 4 CPU server
> @ work. The two processes ran nice'd -19. The builds we were
> running still took 20-30% longer as opposed to when setiathome wasn't
> running (went from 45 minutes up to about an hour). This machine
> has 1G, so I don't think it was hurting from swapping.

It would be nice to have IRIX weightless processes on Linux...
setiathome on SGI computers doesn't affect anything else except
in extreme cases.

-Tor

2001-04-10 03:38:29

by George Anzinger

Subject: Re: [QUESTION] 2.4.x nice level

SodaPop wrote:
>
> I too have noticed that nicing processes does not work nearly as
> effectively as I'd like it to. I run on an underpowered machine,
> and have had to stop running things such as seti because it steals too
> much cpu time, even when maximally niced.
>
> As an example, I can run mpg123 and a kernel build concurrently without
> trouble; but if I add a single maximally niced seti process, mpg123 runs
> out of gas and will start to skip while decoding.
>
> Is there any way we can make nice levels stronger than they currently are
> in 2.4? Or is this perhaps a timeslice problem, where once seti gets cpu
> time it runs longer than it should since it makes relatively few system
> calls?
>
In kernel/sched.c for HZ < 200 an adjustment of nice to tick is set up
to be nice>>2 (i.e. nice/4). This gives the ratio of nice to time
slice. Adjustments are made to make the MOST nice yield 1 jiffy, so
using this scale, and remembering that nice ranges from -20 to 19, the
least nice is 40/4 or 10 ticks. This implies that if only two tasks
are running and they are most and least niced, one will get 1/11 of
the processor and the other 10/11 (about 10% and 90%). If one is niced
and the other is not, you get 1 and 5 for the time slices, or 1/6 and
5/6 (17% and 83%).

In 2.2.x systems the full range of nice was used one to one, giving
time slices of 1 and 39 or 40, i.e. about 2.5% and 97.5% for max nice
versus min. For most nice versus normal you would get 1 and 20, or
4.7% and 95.3%.

The comments say the objective is to come up with a time slice of 50 ms,
presumably for the normal nice value of zero. After translating the
range this would be a value of 20 and, yep, 20/4 gives 5 jiffies or 50
ms. Sure puts a crimp in the min-to-max range, however.
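
For concreteness, here is that arithmetic as a small standalone sketch.
The TICK_SCALE()/NICE_TO_TICKS() definitions below are the HZ < 200
ones as I read them in 2.4 kernel/sched.c; the main() driver is
illustration only, not kernel code. (Note the +1 in the macro: the
exact slices are 11, 6 and 1 ticks, slightly off the round 10 and 5
above.)

#include <stdio.h>

/* The 2.4 nice-to-timeslice mapping for HZ < 200 (kernel/sched.c). */
#define TICK_SCALE(x)           ((x) >> 2)
#define NICE_TO_TICKS(nice)     (TICK_SCALE(20 - (nice)) + 1)

int main(void)
{
        int nice;

        /* One tick is one jiffy: 10 ms at HZ = 100. */
        for (nice = -20; nice <= 19; nice++)
                printf("nice %3d -> %2d ticks (%3d ms)\n",
                       nice, NICE_TO_TICKS(nice), NICE_TO_TICKS(nice) * 10);
        return 0;
}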

George

2001-04-10 16:18:26

by Rik van Riel

Subject: Re: [QUESTION] 2.4.x nice level

On Mon, 9 Apr 2001, george anzinger wrote:
> SodaPop wrote:
> >
> > I too have noticed that nicing processes does not work nearly as
> > effectively as I'd like it to. I run on an underpowered machine,
> > and have had to stop running things such as seti because it steals too
> > much cpu time, even when maximally niced.

> In kernel/sched.c for HZ < 200 an adjustment of nice to tick is set up
> to be nice>>2 (i.e. nice /4). This gives the ratio of nice to time
> slice. Adjustments are made to make the MOST nice yield 1 jiffy, so
[snip 2.4 nice scale is too limited]

I'll try to come up with a recalculation change that will make
this thing behave better, while still retaining the short time
slices for multiple normal-priority tasks and the cache footprint
schedule() and friends currently have...

[I've got some vague ideas ... give me a few hours to put them
into code ;)]

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-04-10 16:40:07

by George Anzinger

Subject: Re: [QUESTION] 2.4.x nice level

Rik van Riel wrote:
>
> On Mon, 9 Apr 2001, george anzinger wrote:
> > SodaPop wrote:
> > >
> > > I too have noticed that nicing processes does not work nearly as
> > > effectively as I'd like it to. I run on an underpowered machine,
> > > and have had to stop running things such as seti because it steals too
> > > much cpu time, even when maximally niced.
>
> > In kernel/sched.c for HZ < 200 an adjustment of nice to tick is set up
> > to be nice>>2 (i.e. nice /4). This gives the ratio of nice to time
> > slice. Adjustments are made to make the MOST nice yield 1 jiffy, so
> [snip 2.4 nice scale is too limited]
>
> I'll try to come up with a recalculation change that will make
> this thing behave better, while still retaining the short time
> slices for multiple normal-priority tasks and the cache footprint
> schedule() and friends currently have...
>
> [I've got some vague ideas ... give me a few hours to put them
> into code ;)]

You might check out this:

http://rtsched.sourceforge.net/

I did some work on leveling out the recalculation overhead. I think, as
the code shows, that it can be done without dropping the run queue lock.

I wonder if the wave nature of the recalculation cycle is a problem. By
this I mean that just after a recalculation, tasks run for relatively
long times (50 ms today), but as the next recalculation approaches, the
slices shrink toward 10 ms. Gets one to thinking about a way to come up
with a more uniform mix over time.

George

2001-04-11 10:37:29

by Rik van Riel

Subject: [test-PATCH] Re: [QUESTION] 2.4.x nice level

On Tue, 10 Apr 2001, Rik van Riel wrote:

> I'll try to come up with a recalculation change that will make
> this thing behave better, while still retaining the short time
> slices for multiple normal-priority tasks and the cache footprint
> schedule() and friends currently have...

OK, here it is. It's nothing like montavista's singing-dancing
scheduler patch that does all, just a really minimal change that
should stretch the nice levels to yield the following CPU usage:

Nice    0    5   10   15   19
%CPU  100   56   25    6    1

Note that the code doesn't change the actual scheduling code,
just the recalculation. Care has also been taken to not increase
the cache footprint of the scheduling and recalculation code.
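
As a rough sanity check on that table (an approximation, not the patch
itself): a task at positive nice gets its counter refilled in only
(20 - nice) of every 20 recalculations, and gets NICE_TO_TICKS(nice)
ticks when it does. Ignoring the (counter >> 1) carry-over, the
expected share relative to a nice-0 task comes out like this:

#include <stdio.h>

/* 2.4 definitions for HZ < 200, as used by the patch below. */
#define TICK_SCALE(x)           ((x) >> 2)
#define NICE_TO_TICKS(nice)     (TICK_SCALE(20 - (nice)) + 1)

int main(void)
{
        int nice;
        double base = NICE_TO_TICKS(0);  /* nice 0 is refilled every time */

        for (nice = 0; nice <= 19; nice++) {
                /* Expected ticks gained per recalculation at this nice. */
                double share = NICE_TO_TICKS(nice) * (20.0 - nice) / 20.0;
                printf("nice %2d -> ~%2.0f%%\n", nice, 100.0 * share / base);
        }
        return 0;
}

This prints roughly 100, 50, 25, 8 and 1 percent for nice 0, 5, 10, 15
and 19; the 56 and 6 in the table come from the (counter >> 1)
carry-over that this approximation drops.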

I'd love to hear some test results from people who are interested
in wider nice levels. How does this run on your system? Can you
trigger bad behaviour in any way?

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/



--- linux-2.4.3-ac4/kernel/sched.c.orig Tue Apr 10 21:04:06 2001
+++ linux-2.4.3-ac4/kernel/sched.c Wed Apr 11 06:18:46 2001
@@ -686,8 +686,26 @@
                 struct task_struct *p;
                 spin_unlock_irq(&runqueue_lock);
                 read_lock(&tasklist_lock);
-                for_each_task(p)
+                for_each_task(p) {
+                        if (p->nice <= 0) {
+                                /* The normal case... */
                         p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
+                        } else {
+                                /*
+                                 * Niced tasks get less CPU less often, leading to
+                                 * the following distribution of CPU time:
+                                 *
+                                 * Nice    0    5   10   15   19
+                                 * %CPU  100   56   25    6    1
+                                 */
+                                short prio = 20 - p->nice;
+                                p->nice_calc += prio;
+                                if (p->nice_calc >= 20) {
+                                        p->nice_calc -= 20;
+                                        p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
+                                }
+                        }
+                }
                 read_unlock(&tasklist_lock);
                 spin_lock_irq(&runqueue_lock);
         }
--- linux-2.4.3-ac4/include/linux/sched.h.orig Tue Apr 10 21:04:13 2001
+++ linux-2.4.3-ac4/include/linux/sched.h Wed Apr 11 06:26:47 2001
@@ -303,7 +303,8 @@
          * the goodness() loop in schedule().
          */
         long counter;
-        long nice;
+        short nice_calc;
+        short nice;
         unsigned long policy;
         struct mm_struct *mm;
         int has_cpu, processor;

2001-04-11 15:54:59

by Rik van Riel

Subject: Re: [test-PATCH] Re: [QUESTION] 2.4.x nice level

On Wed, 11 Apr 2001, Rik van Riel wrote:

> OK, here it is. It's nothing like montavista's singing-dancing
> scheduler patch that does all, just a really minimal change that
> should stretch the nice levels to yield the following CPU usage:
>
> Nice    0    5   10   15   19
> %CPU  100   56   25    6    1

 PID USER     PRI  NI  SIZE SWAP  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 980 riel      17   0   296    0  296   240 R    54.1  0.5  54:19 loop
1005 riel      16   5   296    0  296   240 R N  27.0  0.5   0:34 loop
1006 riel      17  10   296    0  296   240 R N  13.5  0.5   0:16 loop
1007 riel      18  15   296    0  296   240 R N   4.5  0.5   0:05 loop
 987 riel      20  19   296    0  296   240 R N   0.4  0.5   0:25 loop

... is what I got when testing it here. It seems that nice levels
REALLY mean something with the patch applied ;)

You can get it at http://www.surriel.com/patches/2.4/2.4.3ac4-largenice

Since there seems to be quite a bit of demand for this feature,
please test it and try to make it break. If it doesn't break we
can try to put it in the kernel...

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-04-11 16:29:20

by George Anzinger

Subject: Re: [test-PATCH] Re: [QUESTION] 2.4.x nice level

One rule of optimization is to move any code you can outside the loop.
Why isn't the nice_to_ticks calculation done when nice is changed,
instead of at EVERY recalculation? I guess another way to ask this is:
who needs to see the original nice? Would it be worth another
task_struct entry to move this calculation out of the loop?
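
Something along these lines is what I mean -- a hypothetical sketch
(the names counter_refill, set_task_nice and task_struct_sketch are
invented here, and locking is ignored), not a tested patch:

/*
 * Translate nice once, when it changes, and let the recalculation
 * loop use the cached value.
 */
#define TICK_SCALE(x)           ((x) >> 2)
#define NICE_TO_TICKS(nice)     (TICK_SCALE(20 - (nice)) + 1)

struct task_struct_sketch {
        long counter;
        short nice;             /* kept around so top/ps still see it */
        short counter_refill;   /* cached NICE_TO_TICKS(nice) */
};

/* Done once per nice change, e.g. from sys_setpriority(): */
static void set_task_nice(struct task_struct_sketch *p, short nice)
{
        p->nice = nice;
        p->counter_refill = NICE_TO_TICKS(nice);
}

/* The per-task recalculation work then shrinks to: */
static void recalculate_one(struct task_struct_sketch *p)
{
        p->counter = (p->counter >> 1) + p->counter_refill;
}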

George

Rik van Riel wrote:
>
> On Tue, 10 Apr 2001, Rik van Riel wrote:
>
> > I'll try to come up with a recalculation change that will make
> > this thing behave better, while still retaining the short time
> > slices for multiple normal-priority tasks and the cache footprint
> > schedule() and friends currently have...
>
> OK, here it is. It's nothing like montavista's singing-dancing
> scheduler patch that does all, just a really minimal change that
> should stretch the nice levels to yield the following CPU usage:
>
> Nice    0    5   10   15   19
> %CPU  100   56   25    6    1
>
> Note that the code doesn't change the actual scheduling code,
> just the recalculation. Care has also been taken to not increase
> the cache footprint of the scheduling and recalculation code.
>
> I'd love to hear some test results from people who are interested
> in wider nice levels. How does this run on your system? Can you
> trigger bad behaviour in any way?
>
> regards,
>
> Rik
> --
> Virtual memory is like a game you can't win;
> However, without VM there's truly nothing to lose...
>
> http://www.surriel.com/
> http://www.conectiva.com/ http://distro.conectiva.com.br/
>
> --- linux-2.4.3-ac4/kernel/sched.c.orig Tue Apr 10 21:04:06 2001
> +++ linux-2.4.3-ac4/kernel/sched.c Wed Apr 11 06:18:46 2001
> @@ -686,8 +686,26 @@
>                  struct task_struct *p;
>                  spin_unlock_irq(&runqueue_lock);
>                  read_lock(&tasklist_lock);
> -                for_each_task(p)
> +                for_each_task(p) {
> +                        if (p->nice <= 0) {
> +                                /* The normal case... */
>                          p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
> +                        } else {
> +                                /*
> +                                 * Niced tasks get less CPU less often, leading to
> +                                 * the following distribution of CPU time:
> +                                 *
> +                                 * Nice    0    5   10   15   19
> +                                 * %CPU  100   56   25    6    1
> +                                 */
> +                                short prio = 20 - p->nice;
> +                                p->nice_calc += prio;
> +                                if (p->nice_calc >= 20) {
> +                                        p->nice_calc -= 20;
> +                                        p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
> +                                }
> +                        }
> +                }
>                  read_unlock(&tasklist_lock);
>                  spin_lock_irq(&runqueue_lock);
>          }
> --- linux-2.4.3-ac4/include/linux/sched.h.orig Tue Apr 10 21:04:13 2001
> +++ linux-2.4.3-ac4/include/linux/sched.h Wed Apr 11 06:26:47 2001
> @@ -303,7 +303,8 @@
>           * the goodness() loop in schedule().
>           */
>          long counter;
> -        long nice;
> +        short nice_calc;
> +        short nice;
>          unsigned long policy;
>          struct mm_struct *mm;
>          int has_cpu, processor;

2001-04-12 22:52:14

by Pozsar Balazs

Subject: Re: [test-PATCH] Re: [QUESTION] 2.4.x nice level

On Wed, Apr 11, 2001 at 12:53:16PM -0300, Rik van Riel wrote:
> On Wed, 11 Apr 2001, Rik van Riel wrote:
>
> > OK, here it is. It's nothing like montavista's singing-dancing
> > scheduler patch that does all, just a really minimal change that
> > should stretch the nice levels to yield the following CPU usage:
> >
> > Nice    0    5   10   15   19
> > %CPU  100   56   25    6    1
>
>  PID USER     PRI  NI  SIZE SWAP  RSS SHARE STAT %CPU %MEM   TIME COMMAND
>  980 riel      17   0   296    0  296   240 R    54.1  0.5  54:19 loop
> 1005 riel      16   5   296    0  296   240 R N  27.0  0.5   0:34 loop
> 1006 riel      17  10   296    0  296   240 R N  13.5  0.5   0:16 loop
> 1007 riel      18  15   296    0  296   240 R N   4.5  0.5   0:05 loop
>  987 riel      20  19   296    0  296   240 R N   0.4  0.5   0:25 loop

How does this scale to negative nice levels? AFAIK it should, in some
way. (I don't mean that it's wrong in its current state, I'm just
asking.)

regards,
Balazs.

2001-04-16 12:04:58

by Pavel Machek

Subject: Re: [test-PATCH] Re: [QUESTION] 2.4.x nice level

Hi!

> One rule of optimization is to move any code you can outside the loop.
> Why isn't the nice_to_ticks calculation done when nice is changed,
> instead of at EVERY recalculation? I guess another way to ask this is:
> who needs

This way the change is localized very nicely, and it is "obviously right".

--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.

2001-04-16 14:19:48

by Rik van Riel

Subject: Re: [test-PATCH] Re: [QUESTION] 2.4.x nice level

On Thu, 12 Apr 2001, Pavel Machek wrote:

> > One rule of optimization is to move any code you can outside the loop.
> > Why isn't the nice_to_ticks calculation done when nice is changed,
> > instead of at EVERY recalculation? I guess another way to ask this is:
> > who needs
>
> This way the change is localized very nicely, and it is "obviously right".

Except for two obvious things:

1. we need to load the nice level anyway
2. a shift takes fewer cycles than a load on most
CPUs

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-04-16 17:52:16

by George Anzinger

Subject: Re: [test-PATCH] Re: [QUESTION] 2.4.x nice level

Rik van Riel wrote:
>
> On Thu, 12 Apr 2001, Pavel Machek wrote:
>
> > > One rule of optimization is to move any code you can outside the loop.
> > > Why isn't the nice_to_ticks calculation done when nice is changed,
> > > instead of at EVERY recalculation? I guess another way to ask this is:
> > > who needs
> >
> > This way the change is localized very nicely, and it is "obviously right".
>
> Except for two obvious things:
>
> 1. we need to load the nice level anyway
> 2. a shift takes fewer cycles than a load on most
> CPUs
>
Gosh, what am I missing here? I think "top" and "ps" want to see the
"nice" value, so it needs to be available, and since the NICE_TO_TICKS()
function loses information (i.e. is not reversible) we cannot compute
it back from ticks. Still, yes, we need to load something, but does it
have to be nice? Why not the result of NICE_TO_TICKS()?
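
For example, with the HZ < 200 scaling the mapping collapses several
nice levels onto the same tick count, so there is no way back:

/* Why NICE_TO_TICKS() cannot be reversed (HZ < 200 scaling): */
#define TICK_SCALE(x)           ((x) >> 2)
#define NICE_TO_TICKS(nice)     (TICK_SCALE(20 - (nice)) + 1)

/*
 * NICE_TO_TICKS(17) == NICE_TO_TICKS(18) == NICE_TO_TICKS(19) == 1,
 * so a stored count of 1 tick could have come from nice 17, 18 or 19.
 */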

A shift and a subtract are fast, yes, but this loop runs over all tasks
(not just the run queue). This loop can put a real dent in preemption
times, AND the notion of turning on interrupts while it runs can lead
to some interesting race conditions. (This is why the MontaVista
scheduler does the loop without dropping the lock, AFTER optimizing the
h... out of it.)

What am I missing?

George