LinuxLists.cc - Problem with the O(1) scheduler in 2.4.19

2002-09-01 21:48:57

by Tobias Ringstrom

Subject: Problem with the O(1) scheduler in 2.4.19

While the O(1) scheduler has performed very well for me in most
situations, I have one big problem with it. When running a Counter-Strike
game server on Linux 2.4.19 with the sched-2.4.19-rc2-A4 patch applied,
the server process is niced from the default value of 15 (interactive) to
25 (background). This means that every time crond wakes up or a mail
arrives the game latency becomes extremely bad and the users experience
lag.

The process takes around 70% CPU on these occasions, so I'm surprised that
the task is not considered to be interactive.

This does not happen with stock 2.4.19. Do you have any ideas why this
regression is happening?

/Tobias

2002-09-02 13:01:27

by Alan Cox

Subject: Re: Problem with the O(1) scheduler in 2.4.19

On Sun, 2002-09-01 at 22:53, Tobias Ringstrom wrote:
> While the O(1) scheduler has performed very well for me in most
> situations, I have one big problem with it. When running a Counter-Strike
> game server on Linux 2.4.19 with the sched-2.4.19-rc2-A4 patch applied,
> the server process is niced from the default value of 15 (interactive) to
> 25 (background). This means that every time crond wakes up or a mail
> arrives the game latency becomes extremely bad and the users experience
> lag.
>
> The process takes around 70% CPU on these occasions, so I'm surprised that
> the task is not considered to be interactive.
>
> This does not happen with stock 2.4.19. Do you have any ideas why this
> regression is happening?

It isn't a regression, it's a bug fix. The nice value is now being
honoured properly.

2002-09-02 13:30:19

by Ingo Molnar

Subject: Re: Problem with the O(1) scheduler in 2.4.19

On Sun, 1 Sep 2002, Tobias Ringstrom wrote:

> While the O(1) scheduler has performed very well for me in most
> situations, I have one big problem with it. When running a
> Counter-Strike game server on Linux 2.4.19 with the sched-2.4.19-rc2-A4
> patch applied, the server process is niced from the default value of 15
> (interactive) to 25 (background). This means that every time crond
> wakes up or a mail arrives the game latency becomes extremely bad and
> the users experience lag.

does the same problem happen if you renice the game server to -10 or -15?

Ingo

2002-09-02 13:37:36

by Tobias Ringstrom

Subject: Re: Problem with the O(1) scheduler in 2.4.19

On 2 Sep 2002, Alan Cox wrote:

> It isn't a regression, it's a bug fix. The nice value is now being
> honoured properly.

The problem is that the kernel decided to nice the process (by changing
the priority, not the nice value) as if it was a background task, but it's
not a background task. On the contrary, it's highly interactive.

/Tobias

2002-09-02 13:49:37

by Tobias Ringstrom

Subject: Re: Problem with the O(1) scheduler in 2.4.19

On Mon, 2 Sep 2002, Ingo Molnar wrote:

> On Sun, 1 Sep 2002, Tobias Ringstrom wrote:
>
> > While the O(1) scheduler has performed very well for me in most
> > situations, I have one big problem with it. When running a
> > Counter-Strike game server on Linux 2.4.19 with the sched-2.4.19-rc2-A4
> > patch applied, the server process is niced from the default value of 15
> > (interactive) to 25 (background). This means that every time crond
> > wakes up or a mail arrives the game latency becomes extremely bad and
> > the users experience lag.
>
> does the same problem happen if you renice the game server to -10 or -15?

The process was at nice level 0, which I think corresponds to prio 15-25
for interactive to background tasks if I understand things correctly.
When I used top to renice the process to -10, the prio became 15, i.e. it
was still considered non-interactive. I even tried -20 (or maybe -19),
and it was still at the non-interactive prio.

In other words: For all nice values I tried (-20, -10, 0), the prio was
20+nice+5. When the server is lightly loaded, the prio is 20+nice-5.

Note that even when the server was loaded, it only used 70% CPU, which I
suppose means that it does not use up its time slices. I thought that
should make the kernel treat the process as interactive. Is there a
description of the criteria somewhere (other than in the source code)?

/Tobias

2002-09-02 21:40:25

by Tobias Ringstrom

Subject: Re: Problem with the O(1) scheduler in 2.4.19

On Mon, 2 Sep 2002, Tobias Ringstrom wrote:

> On 2 Sep 2002, Alan Cox wrote:
>
> > It isn't a regression, it's a bug fix. The nice value is now being
> > honoured properly.
>
> The problem is that the kernel decided to nice the process (by changing
> the priority, not the nice value) as if it was a background task, but it's
> not a background task. On the contrary, it's highly interactive.

I think I will have to take this back. It looks like even the old kernel
treats the game server as a background process, but as you said, it does
not make such a big difference. Another change is that the prio value
varies very quickly over time (as seen in top). I do not recall seeing
that using the O(1)-scheduler.

But I still do not understand why the process is classified as
non-interactive... Around 20 times per second it does a nanosleep for
1 ms which takes around 40 ms in reality. (Seeing this makes me believe
that I should try to increase HZ, but that is a separate issue.)

/Tobias

2002-09-03 05:46:20

by Ingo Molnar

Subject: Re: Problem with the O(1) scheduler in 2.4.19

On Mon, 2 Sep 2002, Tobias Ringstrom wrote:

> But I still do not understand why the process is classified as
> non-interactive... Around 20 times per second it does a nanosleep for 1
> ms which takes around 40 ms in reality. (Seeing this makes me believe
> that I should try to increase HZ, but that is a separate issue.)

what CPU usage does it have? 70% CPU usage is not interactive.

well, even 70% CPU usage can be interactive if you lower its priority to
-20. But with the default nice value a task will lose its interactivity
much quicker.

also, could you increase HZ to 1000 (in asm/param.h, full recompile of the
kernel is needed), does it make a difference?
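For reference, on i386 the value lives in include/asm-i386/param.h. In the stock 2.4 tree the relevant excerpt looks roughly like the fragment below (layout assumed from that tree; other architectures have their own param.h). The experiment amounts to changing the stock 100 to 1000 and rebuilding everything, since HZ is compiled into every object that uses jiffies.

```c
/* include/asm-i386/param.h (2.4, excerpt) */
#ifndef HZ
#define HZ 1000     /* stock value is 100 */
#endif
```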

Ingo

2002-09-03 10:09:09

by Tobias Ringstrom

Subject: Re: Problem with the O(1) scheduler in 2.4.19

On Tue, 3 Sep 2002, Ingo Molnar wrote:

> On Mon, 2 Sep 2002, Tobias Ringstrom wrote:
>
> > But I still do not understand why the process is classified as
> > non-interactive... Around 20 times per second it does a nanosleep for 1
> > ms which takes around 40 ms in reality. (Seeing this makes me believe
> > that I should try to increase HZ, but that is a separate issue.)
>
> what CPU usage does it have? 70% CPU usage is not interactive.
>
> well, even 70% CPU usage can be interactive if you lower its priority to
> -20. But with the default nice value a task will lose its interactivity
> much quicker.

If I understand the code in sched.c correctly, the dynamic prio [-5...5]
is calculated using sleep_avg, but the name is deceiving, it's more like
the edge of a knife. If a process is sleeping, its sleep_avg is
incremented by one per timer tick, and if it is running it is decremented
by one per timer tick. This means (for a periodic task) that if it sleeps
for less than 50% of the timer ticks, it will get a sleep_avg of zero
(dynamic prio +5), and if it is sleeping for more than 50%, it will get a
sleep_avg of MAX_SLEEP_AVG (dynamic prio -5).

For the case of a game server, this means that when the CPU utilization
gets above 50% (roughly), it will switch from -5 to +5 in dynamic priority
in a few seconds and stay there until the CPU utilization drops under 50%.

Is my analysis correct, and is this what we want?

Have you experimented with other averaging algorithms?

> also, could you increase HZ to 1000 (in asm/param.h, full recompile of the
> kernel is needed), does it make a difference?

I tried that yesterday (without the O(1) scheduler), and it does wonders
for the in-game latency (i.e. ping). I suppose that the dynamic prio will
still be +5 at 70% CPU utilization even with a HZ of 1000 using the O(1)
scheduler. Why would it make a difference?

/Tobias

2002-09-03 10:20:12

by Ingo Molnar

Subject: Re: Problem with the O(1) scheduler in 2.4.19

On Tue, 3 Sep 2002, Tobias Ringstrom wrote:

> For the case of a game server, this means that when the CPU utilization
> gets above 50% (roughly), it will switch from -5 to +5 in dynamic
> priority in a few seconds and stay there until the CPU utilization drops
> under 50%.
>
> Is my analysis correct, and is this what we want?

do you expect a task that uses up 50% CPU time over an extended period of
time to be rated 'interactive'?

we might make the '50%' rule to be '100% / nr_running_avg', so that if
your task is the only one in the system then it gets rated interactive -
but i suspect it will still be rated a CPU hog if it keeps trying to use
up 50% of CPU time even during busier periods. I have tried the
(1/nr_running) rule in earlier incarnations of the scheduler, and it didn't
make much difference, but we obviously need a boundary case like yours to
see the differences.

> I tried that yesterday (without the O(1) scheduler), and it does wonders
> for the in-game latency (i.e. ping). I suppose that the dynamic prio
> will still be +5 at 70% CPU utilization even with a HZ of 1000 using the
> O(1) scheduler. Why would it make a difference?

(it could in theory make a difference in some rare cases, in which the
frequency of sampling resonates with internal timings of the application -
i asked for this only to make sure there are no interactions.)

Ingo

2002-09-03 12:19:29

by Tobias Ringstrom

Subject: Re: Problem with the O(1) scheduler in 2.4.19

On Tue, 3 Sep 2002, Ingo Molnar wrote:

> do you expect a task that uses up 50% CPU time over an extended period of
> time to be rated 'interactive'?

Interactive is not the best word, but I would not expect a process like
the one I described to be considered a CPU hog. It's a deadline driven
semi realtime process.

> we might make the '50%' rule to be '100% / nr_running_avg', so that if
> your task is the only one in the system then it gets rated interactive -
> but i suspect it will still be rated a CPU hog if it keeps trying to use
> up 50% of CPU time even during busier periods. I have tried the
> (1/nr_running) rule in earlier incarnations of the scheduler, and it didn't
> make much difference, but we obviously need a boundary case like yours to
> see the differences.

I think the problem I have (that I lose a lot of performance to processes
such as crond, httpd, etc.) is common to the whole class of semi-realtime
processes, at least if they use >50% CPU. This means that CPU intensive
audio and video (e.g. DVD) playback programs might have the same problem.

I see three simple ways to solve the problem without changing the
scheduler. Either run the process with nice -20, use SCHED_RR, or use a
dedicated server with no other processes (such as crond, httpd, etc).
The first two might be OK, but you need root privileges to run renice and
to change the scheduler policy. The third one is not an option for all
users, and definitely not for the video playback case.

A problem is that this new scheduler behaviour will hit people running
semi realtime processes as a regression when they switch to 2.6. It would
be nice to avoid that.

One solution might be to teach the scheduler how to detect these deadline
driven semi-realtime processes, and not punish them. It is not obvious to
me how to do that.

Another, much simpler solution that might work just as well is to change
the CPU utilization threshold from 50% to 90%.

You're the expert of course. I'm only fumbling in the dark... :-)

> (it could in theory make a difference in some rare cases, in which the
> frequency of sampling resonates with internal timings of the application -
> i asked for this only to make sure there are no interactions.)

I'll try it out and let you know if it does make a difference.

/Tobias

2002-09-03 15:55:18

by Mark Mielke

Subject: Re: Problem with the O(1) scheduler in 2.4.19

I wonder if it does not make sense to just give the process real time
priority? No scheduler will be excellent in all situations. I would not
consider a game, or game server, to be a standard application.

mark

On Tue, Sep 03, 2002 at 02:23:49PM +0200, Tobias Ringstrom wrote:
> On Tue, 3 Sep 2002, Ingo Molnar wrote:
>
> > do you expect a task that uses up 50% CPU time over an extended period of
> > time to be rated 'interactive'?
>
> Interactive is not the best word, but I would not expect a process like
> the one I described to be considered a CPU hog. It's a deadline driven
> semi realtime process.
>
> > we might make the '50%' rule to be '100% / nr_running_avg', so that if
> > your task is the only one in the system then it gets rated interactive -
> > but i suspect it will still be rated a CPU hog if it keeps trying to use
> > up 50% of CPU time even during busier periods. I have tried the
> > (1/nr_running) rule in earlier incarnations of the scheduler, and it didn't
> > make much difference, but we obviously need a boundary case like yours to
> > see the differences.
>
> I think the problem I have (that I lose a lot of performance to processes
> such as crond, httpd, etc.) is common to the whole class of semi-realtime
> processes, at least if they use >50% CPU. This means that CPU intensive
> audio and video (e.g. DVD) playback programs might have the same problem.
>
> I see three simple ways to solve the problem without changing the
> scheduler. Either run the process with nice -20, use SCHED_RR, or use a
> dedicated server with no other processes (such as crond, httpd, etc).
> The first two might be OK, but you need root privileges to run renice and
> to change the scheduler policy. The third one is not an option for all
> users, and definitely not for the video playback case.
>
> A problem is that this new scheduler behaviour will hit people running
> semi realtime processes as a regression when they switch to 2.6. It would
> be nice to avoid that.
>
> One solution might be to teach the scheduler how to detect these deadline
> driven semi-realtime processes, and not punish them. It is not obvious to
> me how to do that.
>
> Another, much simpler solution that might work just as well is to change
> the CPU utilization threshold from 50% to 90%.
>
> You're the expert of course. I'm only fumbling in the dark... :-)
>
> > (it could in theory make a difference in some rare cases, in which the
> > frequency of sampling resonates with internal timings of the application -
> > i asked for this only to make sure there are no interactions.)
>
> I'll try it out and let you know if it does make a difference.
>
> /Tobias

--
Neighbourhood Coder, Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

2002-09-03 16:41:56

by John Alvord

Subject: Re: Problem with the O(1) scheduler in 2.4.19

On Tue, 3 Sep 2002 12:28:18 +0200 (CEST), Ingo Molnar wrote:

>
>On Tue, 3 Sep 2002, Tobias Ringstrom wrote:
>
>> For the case of a game server, this means that when the CPU utilization
>> gets above 50% (roughly), it will switch from -5 to +5 in dynamic
>> priority in a few seconds and stay there until the CPU utilization drops
>> under 50%.
>>
>> Is my analysis correct, and is this what we want?
>
>do you expect a task that uses up 50% CPU time over an extended period of
>time to be rated 'interactive'?
>
>we might make the '50%' rule to be '100% / nr_running_avg', so that if
>your task is the only one in the system then it gets rated interactive -
>but i suspect it will still be rated a CPU hog if it keeps trying to use
>up 50% of CPU time even during busier periods. I have tried the
>(1/nr_running) rule in earlier incarnations of the scheduler, and it didn't
>make much difference, but we obviously need a boundary case like yours to
>see the differences.
>
>> I tried that yesterday (without the O(1) scheduler), and it does wonders
>> for the in-game latency (i.e. ping). I suppose that the dynamic prio
>> will still be +5 at 70% CPU utilization even with a HZ of 1000 using the
>> O(1) scheduler. Why would it make a difference?
>
>(it could in theory make a difference in some rare cases, in which the
>frequency of sampling resonates with internal timings of the application -
>i asked for this only to make sure there are no interactions.)
>

It seems to me that this condition could arise for any server process
which is used by many interactive processes. Imagine 300 users and a
server process which needs 70% of the CPU to do the work. This could be a
database server as well as the current game server.

john

2002-09-03 16:43:54

by Ingo Molnar

Subject: Re: Problem with the O(1) scheduler in 2.4.19

On Tue, 3 Sep 2002, Tobias Ringstrom wrote:

> [...] It's a deadline driven semi realtime process.

> [...] I see three simple ways to solve the problem without changing the
> scheduler. Either run the process with nice -20, use SCHED_RR, or use a
> dedicated server with no other processes (such as crond, httpd, etc).
> The first two might be OK, but you need root privileges to run renice
> and to change the scheduler policy. The third one is not an option for
> all users, and definitely not for the video playback case.

do you see the conflict between your two statements?

if it's a "semi-realtime" process that needs more CPU time and needs it
sooner than other 'unimportant' processes in the system like httpd or
remote shells, then give it a higher priority.

under the O(1) scheduler this will now do something meaningful. Yes, this
needs root privileges, otherwise it could be abused to lift priority and
effectively lock out eg. the root shell.

under the old scheduler the nice levels were just a rough mechanism to
determine how CPU hogs use the CPU - interactiveness-wise it did not make
a big difference.

but, i have a spare plan for this, mentioned previously: to enable
unprivileged processes to lower their priority to -5 if they want to.
Could you please test your game server, does it feel interactive enough at
-5?

(allowing -10 might be too much of a stretch.)

Ingo

2002-09-03 16:51:40

by Ingo Molnar

Subject: Re: Problem with the O(1) scheduler in 2.4.19

On Tue, 3 Sep 2002, John Alvord wrote:

> It seems to me that this condition could arise for any server process
> which is used by many interactive processes. Imagine 300 users and a
> server process which needs 70% to do the work. This could be a database
> server as well as the current game server.

well, if there is enough CPU power around then there is no problem -
everyone gets enough CPU time.

if CPU power becomes scarce then the kernel will do what it does for every
other resource: it starts to partition the resource, and no-one will get
the absolute maximum it has asked for.

the 2.5 scheduler adds another thing to the mix: if a task behaves in an
'interactive' way then it will get more CPU time than what it got in 2.4 -
if it behaves like a 'CPU hog' then it will get less CPU time than what it
used to get in 2.4.

the penalty is at most +-5 priority levels, so you can always offset (much
of) this effect by moving the task 10 priority levels lower. (Hence the
magic '-10' priority level i keep suggesting, and hence the magic -5
priority levels i'd like to allow ordinary tasks to lower their priority.)

[the scheduler also has other code to ensure fairness in highly loaded
situations, it makes sure that no task waits CPU-less for more than 3
seconds due to the interactiveness bonuses. This effect does not play in
this current situation, it needs a couple of tens of currently running
aggressive tasks to trigger on most normal boxes.]

those tasks that need a disproportionate amount of CPU time need to be
reniced, so that the penalty for being an 'unfair' CPU user is offset.
There is no way the scheduler could figure out how important a task is -
some people would give a game server higher priority, other people would
give httpd (or remote shells) a higher priority. Since this information is
only available in the administrator's head, it needs help from the
administrator to handle the situation. The kernel has a good default, but
it cannot work in every case, this is why we have the ability to renice
tasks.

Ingo

2002-09-03 16:53:39

by Tobias Ringstrom

Subject: Re: Problem with the O(1) scheduler in 2.4.19

On Tue, 3 Sep 2002, Mark Mielke wrote:

> I wonder if it does not make sense to just give the process real time
> priority? No scheduler will be excellent in all situations. I would not
> consider a game, or game server, to be a standard application.

If you are talking about SCHED_RR, I think it would lock up the server,
since it sleeps for only 1 ms, which is done as a busy-wait for SCHED_RR
tasks. The game server would have to be designed to use SCHED_RR in a
sensible way, in that case. The source code is not available... :-(

/Tobias

2002-09-03 17:50:44

by Tobias Ringstrom

Subject: Re: Problem with the O(1) scheduler in 2.4.19

On Tue, 3 Sep 2002, Ingo Molnar wrote:

> On Tue, 3 Sep 2002, Tobias Ringstrom wrote:
>
> > [...] It's a deadline driven semi realtime process.
>
> > [...] I see three simple ways to solve the problem without changing the
> > scheduler. Either run the process with nice -20, use SCHED_RR, or use a
> > dedicated server with no other processes (such as crond, httpd, etc).
> > The first two might be OK, but you need root privileges to run renice
> > and to change the scheduler policy. The third one is not an option for
> > all users, and definitely not for the video playback case.
>
> do you see the conflict between your two statements?

Certainly, it's very hard for the kernel to do the right thing. Perhaps
the only viable solution is for the user to solve the problem.

Would it really be so unfair to give the user a way to state that a
process is interactive? The kernel obviously makes mistakes. The system
is not fair for users anyway. If a user wants to compete with other
users, he can create more processes to get more CPU.

I'm really concerned about the video decompression/playback situation,
which is quite similar, and can easily take >50% CPU. It is also very
inconvenient to need superuser privileges to get good frame rate
stability. A way to define a process as interactive is one way to solve
that problem. Another solution is to let ordinary users use negative nice
values, as you mention below.

> but, i have a spare plan for this, mentioned previously: to enable
> unprivileged processes to lower their priority to -5 if they want to.
> Could you please test your game server, does it feel interactive enough at
> -5?

It helps a little, but the problem is still very visible.

> (allowing -10 might be too much of a stretch.)

Why? If it's using more than 50% CPU, the prio will be the same as a
zero-niced interactive process.

The minimum user nice value might be a good candidate for a new rlimit...

/Tobias

2002-09-03 17:57:46

by Ingo Molnar

Subject: Re: Problem with the O(1) scheduler in 2.4.19

On Tue, 3 Sep 2002, Tobias Ringstrom wrote:

> > (allowing -10 might be too much of a stretch.)
>
> Why? If it's using more than 50% CPU, the prio will be the same as a
> zero-niced interactive process.

well, perhaps -10 could also be allowed.

does -10 make it equivalent to the 2.4 behavior? Could you somehow measure
the priority where it's still acceptable? Ie. -8 or -9?

> The minimum user nice value might be a good candidate for a new
> rlimit...

yes.

Ingo

2002-09-04 00:31:44

by Roger Larsson

Subject: [SOURCE] RT monitor (Was: Re: Problem with the O(1) scheduler in 2.4.19)

On Tuesday 03 September 2002 14.23, Tobias Ringstrom wrote:
> I see three simple ways to solve the problem without changing the
> scheduler. Either run the process with nice -20, use SCHED_RR, or use a
> dedicated server with no other processes (such as crond, httpd, etc).
> The first two might be OK, but you need root privileges to run renice and
> to change the scheduler policy. The third one is not an option for all
> users, and definitely not for the video playback case.
>

Here is some code that works as an RT requester/monitor, and
a small utility to try it out.

With this monitor any process can request RT priorities.
If those (or other) processes overload the system,
all will be returned to normal priorities.

Note:
* this code is still experimental. I had a situation where
a previous monitor reduced its own priority... (rendering it useless)
* It probably does not work on SMP - I have not given that
much of a thought yet...

compile the source:
gcc -Wall rt.c -o rt
gcc -Wall rt_monitor.c -o rt_monitor

then as root:
mkfifo -m 622 /var/named/rt-request
./rt_monitor

start another shell (as a normal user - not root)
to check the function of the monitor (sleeps 3 s then loops,
the monitor should reduce the priority in about 4 seconds)
./rt -c

to set RT priority on any process do
(note: this should be quite safe since the monitor does the raising
so it has to be running :-)
./rt -p anypid

/RogerL
--
Roger Larsson
Skellefteå
Sweden

Attachments:

(No filename) (1.49 kB)
rt.c (2.54 kB)
rt_monitor.c (8.83 kB)

2002-09-04 20:17:25

by Bill Davidsen

Subject: Re: Problem with the O(1) scheduler in 2.4.19

On Tue, 3 Sep 2002, Ingo Molnar wrote:

>
> On Tue, 3 Sep 2002, John Alvord wrote:
>
> > It seems to me that this condition could arise for any server process
> > which is used by many interactive processes. Imagine 300 users and a
> > server process which needs 70% to do the work. This could be a database
> > server as well as the current game server.

As I see it, there are two possibilities here: either the game server is
being run on dedicated server hardware, or at least with the blessing of
the root user. In that case root can adjust nice appropriately. The other
possibility is that root is counting on the scheduler to protect the other
processes in the system from being starved. In that case it's working
nicely.

> the 2.5 scheduler adds another thing to the mix: if a task behaves in an
> 'interactive' way then it will get more CPU time than what it got in 2.4 -
> if it behaves like a 'CPU hog' then it will get less CPU time than what it
> used to get in 2.4.

Yes, and it works really well! Job mixes which used to result in poor
response now work just fine, nice actually does something, and processes
which are intended to get resources can be given a negative nice value
(nasty?) to make them run well.

> the penalty is at most +-5 priority levels, so you can always offset (much
> of) this effect by moving the task 10 priority levels lower. (Hence the
> magic '-10' priority level i keep suggesting, and hence the magic -5
> priority levels i'd like to allow ordinary tasks to lower their priority.)

Seems to defeat all the wonderful work which went into this. On any shared
system there will be people who know how to trick the scheduler into
running their jobs faster. Used to do that myself when machines were really
slow ;-) Actually if I understand the way the scheduler works, and I think
I do at the high level, if this server was a well-behaved threaded app
individual threads would show as interactive, they could have various
priority depending on the behaviour of the threads, and things would run
pretty well. If that server is doing a huge select or poll of 300 users I
bet all the CPU is in the system call anyway.

> [the scheduler also has other code to ensure fairness in highly loaded
> situations, it makes sure that no task waits CPU-less for more than 3
> seconds due to the interactiveness bonuses. This effect does not play in
> this current situation, it needs a couple of tens of currently running
> aggressive tasks to trigger on most normal boxes.]
>
> those tasks that need a disproportionate amount of CPU time need to be
> reniced, so that the penalty for being an 'unfair' CPU user is offset.
> There is no way the scheduler could figure out how important a task is -
> some people would give a game server higher priority, other people would
> give httpd (or remote shells) a higher priority. Since this information is
> only available in the administrator's head, it needs help from the
> administrator to handle the situation. The kernel has a good default, but
> it cannot work in every case, this is why we have the ability to renice
> tasks.

And I suspect that if users can push their own jobs, they will. I really
don't think the scheduler is doing the wrong thing, and there is a well
defined way to make the process have higher priority.

This isn't a kernel issue, it's an administration issue.

--
bill davidsen
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-09-10 22:54:04

by Tobias Ringstrom

Subject: Re: Problem with the O(1) scheduler in 2.4.19

On Tue, 3 Sep 2002, Ingo Molnar wrote:

> does -10 make it equivalent to the 2.4 behavior? Could you somehow measure
> the priority where it's still acceptable? Ie. -8 or -9?

I've done some more experimenting, and I've found something interesting.
I've attached two very simple CPU hog programs.

The program latency runs in a tight loop calling gettimeofday, and prints
the loop time if it exceeds 8 ms. This program simulates a game server,
video decoding program or whatever.

The program hog sleeps for five seconds, and then runs in a tight loop.
This program simulates a cron job. This program is always run at the
default nice level (0).

I will now run the latency program at three different nice levels: -20
(high prio), 0 (normal) and 19 (low prio). A few seconds after latency is
started, hog is started. Note that there is no visible latency when the hog
program is started; the latency comes from its loop five seconds after
the start:

[root@boris Prog]# nice -n -20 ./latency
00:22:16: dt = 608.864 ms
00:22:17: dt = 150.978 ms
00:22:18: dt = 150.983 ms
00:22:19: dt = 150.979 ms
00:22:20: dt = 150.981 ms

[root@boris Prog]# nice -n 0 ./latency
00:22:49: dt = 604.865 ms
00:22:50: dt = 150.966 ms
00:22:50: dt = 150.964 ms
00:22:51: dt = 150.963 ms
00:22:51: dt = 152.981 ms

[root@boris Prog]# nice -n 19 ./latency
00:23:44: dt = 678.848 ms
00:23:44: dt = 150.964 ms
00:23:44: dt = 150.978 ms
00:23:44: dt = 150.978 ms
00:23:45: dt = 150.978 ms

Here we can see that the time slice for hog is stabilized at 150 ms, and
that as the latency program is niced, the hog program gets its time slices
more often. I think this is what's supposed to happen, but the problem is
the >600 ms timeslice that hog gets when it starts to run. Comments?

One could also argue that 150 ms is a bit too much. For video playback at
25 FPS, that means three lost frames. I do understand the benefits of
long timeslices, of course. It's a hard choice...

This is on a HZ=1000 2.4.19+sched-2.4.19-rc2-A4 kernel.

/Tobias

Attachments:

hog.c (67.00 B)
latency.c (557.00 B)

2002-09-11 21:09:27

by Tobias Ringstrom

Subject: Re: Problem with the O(1) scheduler in 2.4.19

On Wed, 11 Sep 2002, Tobias Ringstrom wrote:

> On Tue, 3 Sep 2002, Ingo Molnar wrote:
>
> > does -10 make it equivalent to the 2.4 behavior? Could you somehow measure
> > the priority where it's still acceptable? Ie. -8 or -9?
>
> I've done some more experimenting, and I've found something interesting.
> I've attached two very simple CPU hog programs.

...and now I've done some code study. I think the following is what
happens:

1. hog is sleeping, and is interactive
2. latency is running and is non-interactive
3. hog becomes runnable
4. latency is preempted and put on the expired list
5. hog runs and uses up its timeslice (151 ms), but since
it is interactive it stays on the active list and
continues to run.
6. after 4/11*2 s ≈ 0.7 s (and a few expired timeslices)
hog is no longer interactive and is moved to the
expired list
7. latency runs after a 0.7 s break.
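
Steps 3 to 7 can be checked with a small simulation. This is a simplified model, not the kernel code. All constants are assumed from the 2.5-era O(1) scheduler: MAX_SLEEP_AVG = 2*HZ, a bonus of -5 to +5 priority levels, an interactivity threshold of 2 at nice 0, and timeslices from 10 to 300 ms. The sched-2.4.19-rc2-A4 backport may use different values. The model also ignores the starvation check and everything else on the runqueue. Function names are hypothetical.

```c
/* ASSUMED constants (2.5-era O(1) scheduler); HZ = 1000, so 1 tick = 1 ms. */
#define HZ                1000
#define MAX_USER_PRIO     40
#define PRIO_BONUS_RATIO  25
#define INTERACTIVE_DELTA 2
#define MAX_SLEEP_AVG     (2 * HZ)
#define MIN_TIMESLICE     10     /* ms */
#define MAX_TIMESLICE     300    /* ms */

/* Timeslice for a nice level: 300 ms at nice -20 down to 10 ms at nice 19. */
static int timeslice(int nice)
{
    return MIN_TIMESLICE + (MAX_TIMESLICE - MIN_TIMESLICE) * (19 - nice) / 39;
}

/* Priority bonus in [-5, +5], proportional to sleep_avg and centred on 0. */
static int bonus(int sleep_avg)
{
    return MAX_USER_PRIO * PRIO_BONUS_RATIO * sleep_avg / MAX_SLEEP_AVG / 100
           - MAX_USER_PRIO * PRIO_BONUS_RATIO / 100 / 2;
}

/* Interactive while the bonus reaches a nice-dependent threshold
 * (2 at nice 0). */
static int interactive(int nice, int sleep_avg)
{
    int delta = nice * (MAX_USER_PRIO * PRIO_BONUS_RATIO / 100) / 40
                + INTERACTIVE_DELTA;
    return bonus(sleep_avg) >= delta;
}

/* A task wakes with a full sleep_avg and runs whole timeslices. Each running
 * tick lowers sleep_avg by one; at the end of every slice the task goes back
 * on the active array while still interactive, otherwise to the expired one.
 * Returns the ms it runs before expiring, i.e. how long a task already on
 * the expired array must wait. */
static int ms_until_expired(int nice)
{
    int sleep_avg = MAX_SLEEP_AVG, ran = 0;

    for (;;) {
        int slice = timeslice(nice);
        ran += slice;
        sleep_avg = sleep_avg > slice ? sleep_avg - slice : 0;
        if (!interactive(nice, sleep_avg))
            return ran;
    }
}
```

Under these assumptions the 151 ms timeslice from step 5 falls out of the formula, and `ms_until_expired(0)` returns 604. In other words, hog keeps the CPU for four full timeslices before it is expired. That is in the same range as the 605 to 680 ms first gaps measured earlier, and a little below the 4/11*2 s estimate in step 6.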

Do you agree?

In other words: Any nice-0 task that has been sleeping for two seconds or
more will be able to monopolize the CPU for up to 0.7 seconds. Do you
agree that this is a problem, or am I being too narrow-minded? :-)

/Tobias

2002-09-12 07:56:41

by Ingo Molnar

Subject: Re: Problem with the O(1) scheduler in 2.4.19

On Wed, 11 Sep 2002, Tobias Ringstrom wrote:

> In other words: Any nice-0 task that has been sleeping for two seconds
> or more will be able to monololize the CPU for up to 0.7 seconds. Do
> you agree that this is a problem, or am I being too narrow-minded? :-)

well, 'monopolize' the CPU from CPU-hogs - yes. Take the CPU from other
interactive tasks: no.

Ingo

2002-09-12 08:58:38

by Tobias Ringstrom

Subject: Re: Problem with the O(1) scheduler in 2.4.19

On Thu, 12 Sep 2002, Ingo Molnar wrote:

> On Wed, 11 Sep 2002, Tobias Ringstrom wrote:
>
> > In other words: Any nice-0 task that has been sleeping for two seconds
> > or more will be able to monololize the CPU for up to 0.7 seconds. Do
> > you agree that this is a problem, or am I being too narrow-minded? :-)
>
> well, 'monopolize' the CPU from CPU-hogs - yes. Take the CPU from other
> interactive tasks: no.

(Thanks Ingo for your quick answers!)

I don't mind that interactive processes can take the CPU from CPU hogs,
but I do think that there is room for classification improvements.

A few observations (with suggested solutions):

1. The nice levels are not symmetric. When competing with a nice 0 process,
a nice 19 process gets 6% of the CPU, but when competing with a nice -20
process, a nice 0 process gets 33%. This can be solved by scaling the
conversion from nice level to priority in a different way. The
drawback of this is shorter time slices for nice 0 processes.

2. Nice -20 is nearly powerless. On top of the asymmetry above, the
interactivity classification is what really undermines it: that a nice -20
process loses 0.7 seconds to a nice 0 task says it all.
How about making nice -20 processes interactive unconditionally?

3. More than 90% of all tasks in a system are classified as interactive at
any given time (since they are sleeping). For example, all cron jobs
are classified as interactive, which sounds really strange. IMHO, it's
a good example of a non-interactive background job. (I'll run my crond
at nice 19 for now.)

I'm curious, why are you using the process's average sleep time to
determine interactivity and not the presence of prematurely abandoned
timeslices?

4. Using SCHED_RR is one way out, but I suspect that the busy-loop
nanosleep implementation for "realtime" processes (which spins instead of
sleeping for delays of up to 2 ms) will lock up the machine in my case.
I suggest that the 2 ms limit be removed. Applications that care can
implement short delays in userspace as a gettimeofday loop.
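
The percentages in point 1 follow from the timeslice lengths under one assumption: two CPU hogs share one CPU and both are non-interactive, so each runs one full timeslice per array switch, and its share is its timeslice divided by the sum of the two. The sketch below uses the 2.5-era timeslice formula, which is also an assumption for this backport. Function names are hypothetical.

```c
/* ASSUMED timeslice range (ms at HZ=1000), as in the 2.5-era O(1) scheduler. */
#define MIN_TIMESLICE 10
#define MAX_TIMESLICE 300

/* Linear map from nice -20 (300 ms) to nice 19 (10 ms). */
static int timeslice_ms(int nice)
{
    return MIN_TIMESLICE + (MAX_TIMESLICE - MIN_TIMESLICE) * (19 - nice) / 39;
}

/* CPU fraction task a gets when competing only with task b, if each
 * runs one full timeslice per round. */
static double cpu_share(int nice_a, int nice_b)
{
    int a = timeslice_ms(nice_a), b = timeslice_ms(nice_b);
    return (double)a / (a + b);
}
```

Here `cpu_share(19, 0)` is 10/161, about 6%, and `cpu_share(0, -20)` is 151/451, about 33%, matching the figures in point 1.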

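The userspace alternative in point 4 can be sketched as a delay that polls gettimeofday() until the requested time has passed. This is the userspace counterpart of the kernel's busy-wait for short nanosleep() calls by realtime tasks. The names `spin_delay_us` and `usec_between` are hypothetical.

```c
#include <stddef.h>
#include <sys/time.h>

/* Microseconds from sample a to sample b. */
static long long usec_between(const struct timeval *a, const struct timeval *b)
{
    return (long long)(b->tv_sec - a->tv_sec) * 1000000LL
           + (b->tv_usec - a->tv_usec);
}

/* Busy-wait for at least usec microseconds by polling gettimeofday().
 * Burns CPU the whole time, so it only makes sense for very short delays.
 * Returns the number of microseconds actually waited. */
static long long spin_delay_us(long long usec)
{
    struct timeval start, now;

    gettimeofday(&start, NULL);
    do {
        gettimeofday(&now, NULL);
    } while (usec_between(&start, &now) < usec);
    return usec_between(&start, &now);
}
```

Like any busy-wait, this keeps the CPU for the whole delay, so a SCHED_RR task that uses it for long delays would starve everything below it. gettimeofday() also follows wall-clock adjustments; clock_gettime(CLOCK_MONOTONIC) avoids that on systems that provide it.
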
I'll continue thinking about this to see if I can come up with something
constructive, but it would be extremely valuable to get your view since
you are the expert and you have been working on this for a long time.

/Tobias

2002-09-13 12:04:19

by Bill Davidsen

Subject: Re: Problem with the O(1) scheduler in 2.4.19

On Thu, 12 Sep 2002, Tobias Ringstrom wrote:

> 3. More than 90% of all tasks in a system are classified as interactive at
> any given time (since they are sleeping). For example all cron jobs
> are classified as interactive, which sounds really strange. IMHO, it's
> a good example of a non-interactive background job. (I'll run my crond
> at nice 19 for now.)
>
> I'm curious, why are you using the process average sleep time to
> determine interactiveness and not the presense of prematurely abandoned
> timeslices?

I'll ask that, too. Not because I doubt you have a good reason, but
because it doesn't jump out at me. I would like the CPU to go to the
process most likely to start an i/o and block, so the CPU hog can run
while the i/o takes place, because that seems to get the highest overlap
of CPU and i/o. I assume the current scheduler has that as one of its
goals, though clearly not the only one.

A few words of clarification would be educational.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.