2009-01-30 05:49:56

by Nathanael Hoyle

[permalink] [raw]
Subject: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

All (though perhaps of special interest to a few such as Ingo, Peter,
and David),

I am posting regarding an issue I have been dealing with recently,
though this post is not really a request for troubleshooting. Instead
I'd like to ramble for just a moment about my understanding of the
current 2.6 scheduler, describe the behavior I'm seeing, and discuss a
couple of the architectural solutions I've considered, as well as pose
the question whether anyone else views this as a general-case problem
worthy of being addressed, or whether this is something that gets
ignored by and large. It is my hope that this is not too off-topic for
this group.

First, let me explain the issue I encountered. I am running a relatively
powerful system for a home desktop, an Intel Core 2 Quad Q9450 with 4 GB
of RAM. If it matters for the discussion, it also has 4 drives in an
mdraid raid-5 array, and decent I/O throughput. In normal circumstances
it is quite responsive as a desktop (KDE 3.5.4 at the moment). The
kernel is also very carefully configured, including only those things
which I truly need and excluding everything else. I often use the
machine to watch DVD movies, and have had no trouble with performance in
general.

Recently I installed the Folding@Home client, which many of you may be
familiar with, intended to utilize spare CPU cycles to perform protein
folding simulations in order to further medical research. It is not a
multi-threaded client at this point, so it simply runs four instances on
my system, since it has four cores. It is configured to run at
nice-level 19.

Because it is heavily optimized, and needs little external data to
perform its work, it spends almost all of its time cpu-bound, with
little to no io-wait or blocking on network calls, etc. I had been
using it for about a week with no real difficulty until I went to watch
another DVD and found that the video was slightly stuttery/jerky so long
as foldingathome was running in the background. Once I shut it down,
the video playback resumed its normal smooth form.

There are a couple of simple workarounds for this:

Substantially boosting the priority of the mplayer process returns the
video to smooth playback, but this is undesirable in that it requires
manual intervention each time, and root privileges. It also fails to
achieve what I want, which is for the foldingathome computation not to
interfere with anything else I may try to do. I want my compiles to be
exactly as fast as they were without it, etc.

Stopping foldingathome before I do something performance sensitive is
also possible, but again smacks of workaround rather than solution. The
scheduler should be able to resolve the goal without me stopping the
other work.

I have done a bit of research on how the kernel scheduler works, and why
I am seeing this behavior. I had previously, apparently ignorantly,
equated 'nice 19' with being akin to Microsoft Windows' 'idle' thread
priority, and assumed it would never steal CPU cycles from a process
with a higher (or lower, depending on nomenclature) priority.

It is my current understanding that while mplayer is running (it is
also typically CPU-bound, though occasionally it becomes briefly
I/O-bound), the instance of foldingathome sharing the CPU core with
mplayer starts getting starved, and the scheduler dynamically rewards it
with up to four additional priority levels based on the time remaining
in the quantum for which it was not allowed to execute.

At this point, when mplayer blocks for just a moment, say to page in the
data for the next video frame, foldingathome gets scheduled again, and
gets to run for at least MIN_TIMESLICE (plus, due to the lack of kernel
pre-emptibility, possibly longer). It appears that it takes too long to
switch back to mplayer and the result is the stuttering picture I
observe.

I have tried adjusting CONFIG_HZ from 300 (where I had it) to 1000, and
noted some improvement, but not a complete remedy.

In my prior searching on this, I found only one poster with the same
essential problem (from 2004, and regarding distributed.net in the
background, which is essentially the same problem). The only technical
answer he was given was to try tuning the MIN_TIMESLICE value downward.
It is my understanding that this parameter is relatively important for
avoiding cache thrashing, so I have not altered it.

Given all of the above, I am unconvinced that I see a good overall
solution. However, one thing that seems to me a glaring weakness of the
scheduler is that only realtime priority threads can be given static
priorities. What I really want for foldingathome, and similar tasks, is
static, low priority. Something that would not boost up, no matter how
well behaved it was or how much it had been starved, or how close to the
same memory segments the needed code was.

I think that there are probably (at least) three approaches here. One I
consider unacceptable at the outset: altering the semantics of nice 19
such that it does not boost. Since this would break existing assumptions
and code, I do not think it is feasible.

Secondly, one could add additional nice levels corresponding to new
static priorities below the bottom of the current user range. As I
understand it, this should not interfere with the O(1) scheduler
implementation: I believe five 32-bit words are currently used to flag
queue usage, and 140 priorities leave 20 bits available for new ones.
This has its own problems, however, in that existing tools which examine
process priorities could break on priorities outside the known 'nice'
range of -20 to 19.

Finally, new scheduling classes could be introduced, together with new
system calls so that applications could select a different scheduling
class at startup. In this way, applications could volunteer to use a
scheduling class which never received dynamic 'reward' boosts that would
raise their priorities. I believe Solaris has done this since Solaris
9, with the 'FX' scheduling class.

Stepping back:

1) Is my problem 'expected' based on others' understanding of the
current design of the scheduler, or do I have a one-off problem to
troubleshoot here?

2) Am I overlooking obvious alternative (but clean) fixes?

3) Does anyone else see the need for static, but low process priorities?

4) What is the view of introducing a new scheduler class to handle this?

I welcome any further feedback on this. I will try to follow replies
on-list, but would appreciate being CC'd off-list as well. Please make
the obvious substitution to my email address in order to bypass the
spam-killer.

Thanks,
Nathanael Hoyle


2009-01-30 06:16:56

by Jan Engelhardt

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling


On Friday 2009-01-30 06:49, Nathanael Hoyle wrote:
>
>I have done a bit of research on how the kernel scheduler works, and
>why I am seeing this behavior. I had previously, apparently
>ignorantly, equated 'nice 19' with being akin to Microsoft Windows'
>'idle' thread priority, and assumed it would never steal CPU cycles
>from a process with a higher (or lower, depending on nomenclature)
>priority. [...]
>
>One[...] is to alter the semantics of nice 19 such that it does not
>boost. Since this would break existing assumptions and code, I do
>not think it is feasible. [...] Finally, new scheduling classes
>could be introduced[...]

Surprise. There is already SCHED_BATCH (intended for computing tasks
as I gathered) and SCHED_IDLE (for idle stuff).



>Please make the obvious substitution to my email address in order to
>bypass the spam-killer.

(Obviously this is not obvious... there are no 'nospam' keywords or
similar in it that could be removed.)

2009-01-30 06:24:20

by Mike Galbraith

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

On Fri, 2009-01-30 at 00:49 -0500, Nathanael Hoyle wrote:

> Recently I installed the Folding@Home client, which many of you may be
> familiar with, intended to utilize spare CPU cycles to perform protein
> folding simulations in order to further medical research. It is not a
> multi-threaded client at this point, so it simply runs four instances on
> my system, since it has four cores. It is configured to run at
> nice-level 19.
>
> Because it is heavily optimized, and needs little external data to
> perform its work, it spends almost all of its time cpu-bound, with
> little to no io-wait or blocking on network calls, etc. I had been
> using it for about a week with no real difficulty until I went to watch
> another DVD and found that the video was slightly stuttery/jerky so long
> as foldingathome was running in the background. Once I shut it down,
> the video playback resumed its normal smooth form.

Sounds like a problem was recently fixed. Can you try 2.6.29-rc3 or
2.6.28.2?

-Mike

2009-01-30 06:40:48

by Nathanael Hoyle

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

On Fri, 2009-01-30 at 07:16 +0100, Jan Engelhardt wrote:
> On Friday 2009-01-30 06:49, Nathanael Hoyle wrote:
> > [...]
>
> Surprise. There is already SCHED_BATCH (intended for computing tasks
> as I gathered) and SCHED_IDLE (for idle stuff).
>

The one discussion I saw referencing SCHED_BATCH seemed to imply that it
was a non-standard kernel patch by Con Kolivas in one of his -ck
variants that never made it into mainline and is not being maintained.
Is this inaccurate?

I was unfamiliar with SCHED_IDLE. Having done a little googling, I
finally found the man page for sched_setscheduler(2). It appears to be
what I wanted.

I think the information I had been able to find was somewhat out of
date. It had indicated that the only static priority levels were the
realtime ones.

Is there currently a standardized userspace tool to run a command under
an altered scheduling class? Obviously writing one would be trivial, but
I didn't know whether something like:

$ runidle ./foldingathome

would be available.

Thanks for your helpful reply.

>
>
> >Please make the obvious substitution to my email address in order to
> >bypass the spam-killer.
>
> (Obviously this is not obvious... there are no 'nospam' keywords or
> similar in it that could be removed.)

I made a failed attempt to post earlier in the evening, which included
the address '[email protected]'. When that one didn't
make it to the list (though I'm unsure it had to do with the address I
used) I retried with the clean address. I forgot to remove the note at
the bottom of the posting.

Sincerely,
Nathanael Hoyle

2009-01-30 06:48:59

by Nathanael Hoyle

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

On Fri, 2009-01-30 at 11:47 +0530, V.Radhakrishnan wrote:
> Clear description of a "problem" which may not be directly "solvable"
> unless one becomes philosophical.
>
> The Linux scheduler is by default a "fair" scheduler, which means that
> every runnable process ready to occupy CPU time will at some point run
> and never be denied CPU time. This is possible by boosting the dynamic
> priority of runnable processes so that one day they rule the roost.
>
> However, the kernel also supports SCHED_FIFO and SCHED_RR, which
> provide real-time capabilities, albeit as root.
>
> The DVD player is a soft real-time application where the display gets
> jittery whenever the frame display rate is not achievable.
>
> If you wish 100% smooth display, you could make it run as SCHED_FIFO,
> which means that your foldingathome would wait quietly for the movie to
> complete. What's "wrong" with that approach, which is essentially what
> you want?

My view of what's "wrong" with that approach is that it requires root
privileges to boost the scheduling priority of each and every process
(although in this case mplayer is the issue) that I want unaffected by
foldingathome's CPU usage. While I happen to be root on this system,
since it is my desktop, I imagine there are instances where the root
user/administrator of a system wants to run items which have no impact
on other users, while still allowing those users fast and responsive
applications. Aside from that, it's a PITA to start mplayer playing, go
renice -19 it, and resume watching my movie every time.

>
> The key question is whether YOU want the foldingathome application to
> run in parallel with the dvd player or not.
>
I want it to be able to run in the background without my having to
intervene, but I want the core shared with mplayer to starve
foldingathome until my movie is over :-).

> Let's wait for more light from the gurus...!!
>
> V. Radhakrishnan
>
>
> On Fri, 2009-01-30 at 00:49 -0500, Nathanael Hoyle wrote:
> > [...]
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/

2009-01-30 06:52:18

by Nathanael Hoyle

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

On Fri, 2009-01-30 at 07:24 +0100, Mike Galbraith wrote:
> On Fri, 2009-01-30 at 00:49 -0500, Nathanael Hoyle wrote:
> > [...]
>
> Sounds like a problem was recently fixed. Can you try 2.6.29-rc3 or
> 2.6.28.2?
>
> -Mike
>

I will try to do so as soon as I get the chance. Do you have any
specific info on the problem that you believe was fixed and/or the fix
applied?

Thanks,
-Nathanael

2009-01-30 07:09:19

by Mike Galbraith

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

On Fri, 2009-01-30 at 01:52 -0500, Nathanael Hoyle wrote:
> On Fri, 2009-01-30 at 07:24 +0100, Mike Galbraith wrote:

> > Sounds like a problem was recently fixed. Can you try 2.6.29-rc3 or
> > 2.6.28.2?
> >
>
> I will try to do so as soon as I get the chance. Do you have any
> specific info on the problem that you believe was fixed and/or the fix
> applied?

The commit text below also applies to +nice tasks.

commit 046e7f77d734778a3b2e7d51ce63da3dbe7a8168
Author: Peter Zijlstra <[email protected]>
Date: Thu Jan 15 14:53:39 2009 +0100

sched: fix update_min_vruntime

commit e17036dac189dd034c092a91df56aa740db7146d upstream.

Impact: fix SCHED_IDLE latency problems

OK, so we have 1 running task A (which is obviously curr and the tree is
equally obviously empty).

'A' nicely chugs along, doing its thing, carrying min_vruntime along as it
goes.

Then some whacko speed freak SCHED_IDLE task gets inserted due to SMP
balancing, which is very likely far right, in that case

update_curr
update_min_vruntime
cfs_rq->rb_leftmost := true (the crazy task sitting in a tree)
vruntime = se->vruntime

and voila, min_vruntime is waaay right of where it ought to be.

OK, so why did I write it like that to begin with...

Aah, yes.

Say we've just dequeued current

schedule
deactivate_task(prev)
dequeue_entity
update_min_vruntime

Then we'll set

vruntime = cfs_rq->min_vruntime;

we find !cfs_rq->curr, but do find someone in the tree. Then we _must_
do vruntime = se->vruntime, because

vruntime = min_vruntime(vruntime := cfs_rq->min_vruntime, se->vruntime)

will not advance vruntime, and cause lags the other way around (which we
fixed with that initial patch: 1af5f730fc1bf7c62ec9fb2d307206e18bf40a69
(sched: more accurate min_vruntime accounting).

Signed-off-by: Peter Zijlstra <[email protected]>
Tested-by: Mike Galbraith <[email protected]>
Acked-by: Mike Galbraith <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 98345e4..06a68c4 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -283,7 +283,7 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
struct sched_entity,
run_node);

- if (vruntime == cfs_rq->min_vruntime)
+ if (!cfs_rq->curr)
vruntime = se->vruntime;
else
vruntime = min_vruntime(vruntime, se->vruntime);

2009-01-30 07:17:50

by V.Radhakrishnan

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

Clear description of a "problem" which may not be directly "solvable"
unless one becomes philosophical.

The Linux scheduler is by default a "fair" scheduler, which means that
every runnable process ready to occupy CPU time will at some point run
and never be denied CPU time. This is possible by boosting the dynamic
priority of runnable processes so that one day they rule the roost.

However, the kernel also supports SCHED_FIFO and SCHED_RR, which
provide real-time capabilities, albeit as root.

The DVD player is a soft real-time application where the display gets
jittery whenever the frame display rate is not achievable.

If you wish 100% smooth display, you could make it run as SCHED_FIFO,
which means that your foldingathome would wait quietly for the movie to
complete. What's "wrong" with that approach, which is essentially what
you want?

The key question is whether YOU want the foldingathome application to
run in parallel with the dvd player or not.

Let's wait for more light from the gurus...!!

V. Radhakrishnan


On Fri, 2009-01-30 at 00:49 -0500, Nathanael Hoyle wrote:
> [...]

2009-01-30 07:21:39

by Jan Engelhardt

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling


On Friday 2009-01-30 07:40, Nathanael Hoyle wrote:
>On Fri, 2009-01-30 at 07:16 +0100, Jan Engelhardt wrote:
>
>The one discussion I saw referencing SCHED_BATCH seemed to imply that it
>was a non-standard kernel patch by Con Kolivas in one of his -ck
>variants that never made it into mainline and is not being maintained.
>Is this inaccurate?

The presence of SCHED_BATCH in linux/sched.h tells me it is available
(on the other hand, SCHED_ISO, also from -ck, is only listed as a comment.)

>I was unfamiliar with SCHED_IDLE. Having done a little Googling now, I
>finally find reference to the man page for sched_setscheduler(2). This
>appears that it is likely what I wanted.
>
>I think the information I had been able to find was somewhat out of
>date.

The manpage does say it, but if your local distro does not
mention SCHED_BATCH/SCHED_IDLE, then that's a pretty sad distro.

The doc in sched_setscheduler(2) seems complete to me as of man-pages 3.13.

>Is there currently a standardized userspace tool to use to run a command
>in order to alter its scheduling class? Obviously writing one would be
>trivial, but didn't know if something like:

man chrt

2009-01-30 07:59:24

by Nathanael Hoyle

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

On Fri, 2009-01-30 at 08:21 +0100, Jan Engelhardt wrote:
> On Friday 2009-01-30 07:40, Nathanael Hoyle wrote:
> >On Fri, 2009-01-30 at 07:16 +0100, Jan Engelhardt wrote:
> >
> >The one discussion I saw referencing SCHED_BATCH seemed to imply that it
> >was a non-standard kernel patch by Con Kolivas in one of his -ck
> >variants that never made it into mainline and is not being maintained.
> >Is this inaccurate?
>
> The presence of SCHED_BATCH in linux/sched.h tells me it is available
> (on the other hand, SCHED_ISO, also from -ck, is only listed as a comment.)
>

Fair enough. I should have re-checked recent sources after your mention
rather than going on the old thread I found.

> >I was unfamiliar with SCHED_IDLE. Having done a little Googling now, I
> >finally find reference to the man page for sched_setscheduler(2). This
> >appears that it is likely what I wanted.
> >
> >I think the information I had been able to find was somewhat out of
> >date.
>
> The manpage does say it, but if your local distro does not
> mention SCHED_BATCH/SCHED_IDLE, then that's a pretty sad distro.
>
> The doc in sched_setscheduler(2) seems complete to me as of man-pages 3.13.
>
> >Is there currently a standardized userspace tool to use to run a command
> >in order to alter its scheduling class? Obviously writing one would be
> >trivial, but didn't know if something like:
>
> man chrt

The latest version of man chrt that I can find implies that it handles
SCHED_BATCH but not SCHED_IDLE. To that end, if anyone else is
interested, I have thrown together the above-suggested 'runidle' which
will invoke the passed command using the SCHED_IDLE scheduler; it's
nothing fancy.

I am running foldingathome under it at the moment, and it seems to be
improving the situation somewhat, but I still need/want to test with
Mike's referenced patches.

runidle.c:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sched.h>
#include <linux/sched.h>

extern char **environ;

int
main(int argc, char **argv) {
	if (argc < 2) {
		fprintf(stderr, "Must specify at least one argument: the path "
			"to the program to execute. Additional arguments may "
			"be specified, which will be passed to the called "
			"program.\n");
		return EXIT_FAILURE;
	}

	sched_setscheduler(0, SCHED_IDLE, NULL);

	if (argc == 2) {
		if (execve(argv[1], NULL, environ) == -1) {
			perror("Failed to execve target!");
		}
	} else {
		if (execve(argv[1], argv+1, environ) == -1) {
			perror("Failed to execve target!");
		}
	}

	/* should be unreachable */
	return EXIT_FAILURE;
}

-Nathanael

2009-01-30 08:07:54

by Mike Galbraith

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

On Fri, 2009-01-30 at 02:59 -0500, Nathanael Hoyle wrote:

> I am running foldingathome under it at the moment, and it seems to be
> improving the situation somewhat, but I still need/want to test with
> Mike's referenced patches.

You will most definitely encounter evilness running SCHED_IDLE tasks in
a kernel without the SCHED_IDLE fixes.

-Mike

2009-01-30 08:16:50

by Nathanael Hoyle

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

On Fri, 2009-01-30 at 02:59 -0500, Nathanael Hoyle wrote:
> On Fri, 2009-01-30 at 08:21 +0100, Jan Engelhardt wrote:
> > On Friday 2009-01-30 07:40, Nathanael Hoyle wrote:
> > >On Fri, 2009-01-30 at 07:16 +0100, Jan Engelhardt wrote:
> > >
> > >The one discussion I saw referencing SCHED_BATCH seemed to imply that it
> > >was a non-standard kernel patch by Con Kolivas in one of his -ck
> > >variants that never made it into mainline and is not being maintained.
> > >Is this inaccurate?
> >
> > The presence of SCHED_BATCH in linux/sched.h tells me it is available
> > (on the other hand, SCHED_ISO, also from -ck, is only listed as a comment.)
> >
>
> Fair enough. I should have re-checked recent sources after your mention
> rather than going on the old thread I found.
>
> > >I was unfamiliar with SCHED_IDLE. Having done a little Googling now, I
> > >finally find reference to the man page for sched_setscheduler(2). This
> > >appears that it is likely what I wanted.
> > >
> > >I think the information I had been able to find was somewhat out of
> > >date.
> >
> > The manpage does say it, but if your local distro does not
> > mention SCHED_BATCH/SCHED_IDLE, then that's a pretty sad distro.
> >
> > The doc in sched_setscheduler(2) seems complete to me as of man-pages 3.13.
> >
> > >Is there currently a standardized userspace tool to use to run a command
> > >in order to alter its scheduling class? Obviously writing one would be
> > >trivial, but didn't know if something like:
> >
> > man chrt
>
> The latest version of man chrt that I can find implies that it handles
> SCHED_BATCH but not SCHED_IDLE. To that end, if anyone else is
> interested, I have thrown together the above-suggested 'runidle' which
> will invoke the passed command using the SCHED_IDLE scheduler; it's
> nothing fancy.
>
> I am running foldingathome under it at the moment, and it seems to be
> improving the situation somewhat, but I still need/want to test with
> Mike's referenced patches.
>

<snipped old version, because of a fixed goof, and this has better
formatting for mail client>


#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sched.h>
#include <linux/sched.h>

extern char **environ;

int
main(int argc, char **argv) {
	struct sched_param sp;
	sp.sched_priority = 0;

	if (argc < 2) {
		fprintf(stderr, "Must specify at least one argument: the path "
			"to the program to execute. Additional arguments may "
			"be specified, which will be passed to the called "
			"program.\n");
		return EXIT_FAILURE;
	}

	if (sched_setscheduler(0, SCHED_IDLE, &sp) == -1) {
		perror("Failed to alter scheduling class!");
		return EXIT_FAILURE;
	}

	if (argc == 2) {
		if (execve(argv[1], NULL, environ) == -1) {
			perror("Failed to execve target!");
		}
	} else {
		if (execve(argv[1], argv+1, environ) == -1) {
			perror("Failed to execve target!");
		}
	}

	/* should be unreachable */
	return EXIT_FAILURE;
}

> -Nathanael

2009-01-30 08:50:49

by Peter Zijlstra

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

On Fri, 2009-01-30 at 00:49 -0500, Nathanael Hoyle wrote:
>
> 1) Is my problem 'expected' based on others' understanding of the
> current design of the scheduler, or do I have a one-off problem to
> troubleshoot here?

What kernel are you running (or did my eye glance over that detail in
your longish email) ?

> 2) Am I overlooking obvious alternative (but clean) fixes?

Maybe, we fixed a glaring bug in this department recently (or more even,
if you're on older than .28).

> 3) Does anyone else see the need for static, but low process priorities?

Yep, it's rather common.

> 4) What is the view of introducing a new scheduler class to handle this?

We should have plenty available, SCHED_IDLE should just work -- as
should nice 19 for that matter.

2009-01-30 08:56:06

by Nathanael Hoyle

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

On Fri, 2009-01-30 at 09:07 +0100, Mike Galbraith wrote:
> On Fri, 2009-01-30 at 02:59 -0500, Nathanael Hoyle wrote:
>
> > I am running foldingathome under it at the moment, and it seems to be
> > improving the situation somewhat, but I still need/want to test with
> > Mike's referenced patches.
>
> You will most definitely encounter evilness running SCHED_IDLE tasks in
> a kernel without the SCHED_IDLE fixes.
>
> -Mike
>

Mike,

Any reason not to apply this fairly simple patch against the 2.6.27
series kernel I'm running now? Are there other relevant changes you're
aware of in the later kernel revs for this problem?

Thanks,
-Nathanael

2009-01-30 09:00:43

by Nathanael Hoyle

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

On Fri, 2009-01-30 at 09:50 +0100, Peter Zijlstra wrote:
> On Fri, 2009-01-30 at 00:49 -0500, Nathanael Hoyle wrote:
> >
> > 1) Is my problem 'expected' based on others' understanding of the
> > current design of the scheduler, or do I have a one-off problem to
> > troubleshoot here?
>
> What kernel are you running (or did my eye glance over that detail in
> your longish email) ?
>

I didn't include it, I should have:

$ uname -a
Linux nightmare 2.6.27-gentoo-r7-nhoyle #2 SMP Wed Jan 28 19:04:37 EST
2009 x86_64 Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz GenuineIntel
GNU/Linux

> > 2) Am I overlooking obvious alternative (but clean) fixes?
>
> Maybe, we fixed a glaring bug in this department recently (or more even,
> if you're on older than .28).
>

Yep, .27 atm.

> > 3) Does anyone else see the need for static, but low process priorities?
>
> Yep, it's rather common.
>
> > 4) What is the view of introducing a new scheduler class to handle this?
>
> We should have plenty available, SCHED_IDLE should just work -- as
> should nice 19 for that matter.
>

Thanks!
-Nathanael

2009-01-30 09:04:06

by Peter Zijlstra

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

On Fri, 2009-01-30 at 04:00 -0500, Nathanael Hoyle wrote:
> On Fri, 2009-01-30 at 09:50 +0100, Peter Zijlstra wrote:
> > On Fri, 2009-01-30 at 00:49 -0500, Nathanael Hoyle wrote:
> > >
> > > 1) Is my problem 'expected' based on others' understanding of the
> > > current design of the scheduler, or do I have a one-off problem to
> > > troubleshoot here?
> >
> > What kernel are you running (or did my eye glance over that detail in
> > your longish email) ?
> >
>
> I didn't include it, I should have:
>
> $ uname -a
> Linux nightmare 2.6.27-gentoo-r7-nhoyle #2 SMP Wed Jan 28 19:04:37 EST
> 2009 x86_64 Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz GenuineIntel
> GNU/Linux

Ah, then please do as Mike suggested, try 28.2 or 29-rc3, if you still
have trouble with those, please let us know.

2009-01-30 09:29:22

by Mike Galbraith

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

On Fri, 2009-01-30 at 03:55 -0500, Nathanael Hoyle wrote:

> Any reason not to apply this fairly simple patch against the 2.6.27
> series kernel I'm running now? Are there other relevant changes you're
> aware of in the later kernel revs for this problem?

You'd have to dig out a few other changes in order to apply it to 27.

Yes, there are other relevant changes, but unless you're familiar with
the source, you'll be better off just trying 28.stable or the latest rc.
You'll have to find, extract, and back-port them otherwise.

-Mike

2009-01-30 10:23:31

by Nathanael Hoyle

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

On Fri, 2009-01-30 at 10:03 +0100, Peter Zijlstra wrote:
> On Fri, 2009-01-30 at 04:00 -0500, Nathanael Hoyle wrote:
> > On Fri, 2009-01-30 at 09:50 +0100, Peter Zijlstra wrote:
> > > On Fri, 2009-01-30 at 00:49 -0500, Nathanael Hoyle wrote:
> > > >
> > > > 1) Is my problem 'expected' based on others' understanding of the
> > > > current design of the scheduler, or do I have a one-off problem to
> > > > troubleshoot here?
> > >
> > > What kernel are you running (or did my eye glance over that detail in
> > > your longish email) ?
> > >
> >
> > I didn't include it, I should have:
> >
> > $ uname -a
> > Linux nightmare 2.6.27-gentoo-r7-nhoyle #2 SMP Wed Jan 28 19:04:37 EST
> > 2009 x86_64 Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz GenuineIntel
> > GNU/Linux
>
> Ah, then please do as Mike suggested, try 28.2 or 29-rc3, if you still
> have trouble with those, please let us know.
>

Ok, I'm now running:

Linux nightmare 2.6.28.2-nhoyle #1 SMP Fri Jan 30 04:50:03 EST 2009
x86_64 Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz GenuineIntel
GNU/Linux

Initial conclusion is that whatever defects were corrected (non
SCHED_IDLE specific defects that is), the newer kernel version does the
trick. Video playback is as smooth as ever when running foldingathome
at simple nice 19 priority.

I am not sure that I can perceive a difference so far in testing that
versus using SCHED_IDLE. I will probably continue to use the latter
anyhow, as that represents more accurately the semantics that I'm trying
to achieve.

I had previously tried upgrading my kernel version in case that would
fix it, but even the latest available kernels in the portage tree for
Gentoo are older than 2.6.28.2. The 'stable' ones were all still 2.6.27.
I have had some concerns about Gentoo as a distro for some time, but it
still allows me more freedom and performance optimization than do most
other distros. I'll leave it at that for now to avoid starting any
religious wars over distros.

Once I downloaded and built the latest vanilla 2.6.28.2 sources from
kernel.org though, everything seems improved, as mentioned above.

Thanks to each of you who responded for all the help. I will continue to
experiment over the next week or so and provide feedback if I see
anything further unusual, but so far things seem good.

-Nathanael

2009-01-30 10:32:17

by Mike Galbraith

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

On Fri, 2009-01-30 at 05:18 -0500, Nathanael Hoyle wrote:

> Ok, I'm now running:
>
> Linux nightmare 2.6.28.2-nhoyle #1 SMP Fri Jan 30 04:50:03 EST 2009
> x86_64 Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz GenuineIntel
> GNU/Linux
>
> Initial conclusion is that whatever defects were corrected (non
> SCHED_IDLE specific defects that is), the newer kernel version does the
> trick. Video playback is as smooth as ever when running foldingathome
> at simple nice 19 priority.

Good to hear, thanks for testing.

Peter, since 27 is a long term maintenance kernel, do you think 1af5f73
and 046e7f7 (at least) are 27.stable candidates?

-Mike


2009-01-30 10:40:57

by Peter Zijlstra

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

On Fri, 2009-01-30 at 11:31 +0100, Mike Galbraith wrote:
> On Fri, 2009-01-30 at 05:18 -0500, Nathanael Hoyle wrote:
>
> > Ok, I'm now running:
> >
> > Linux nightmare 2.6.28.2-nhoyle #1 SMP Fri Jan 30 04:50:03 EST 2009
> > x86_64 Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz GenuineIntel
> > GNU/Linux
> >
> > Initial conclusion is that whatever defects were corrected (non
> > SCHED_IDLE specific defects that is), the newer kernel version does the
> > trick. Video playback is as smooth as ever when running foldingathome
> > at simple nice 19 priority.
>
> Good to hear, thanks for testing.
>
> Peter, since 27 is a long term maintenance kernel, do you think 1af5f73
> and 046e7f7 (at least) are 27.stable candidates?

1af5f730fc1bf7c62ec9fb2d307206e18bf40a69 and
e17036dac189dd034c092a91df56aa740db7146d you mean?

I guess that makes sense.

2009-01-30 10:51:18

by Mike Galbraith

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

On Fri, 2009-01-30 at 11:40 +0100, Peter Zijlstra wrote:
> On Fri, 2009-01-30 at 11:31 +0100, Mike Galbraith wrote:
> > On Fri, 2009-01-30 at 05:18 -0500, Nathanael Hoyle wrote:
> >
> > > Ok, I'm now running:
> > >
> > > Linux nightmare 2.6.28.2-nhoyle #1 SMP Fri Jan 30 04:50:03 EST 2009
> > > x86_64 Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz GenuineIntel
> > > GNU/Linux
> > >
> > > Initial conclusion is that whatever defects were corrected (non
> > > SCHED_IDLE specific defects that is), the newer kernel version does the
> > > trick. Video playback is as smooth as ever when running foldingathome
> > > at simple nice 19 priority.
> >
> > Good to hear, thanks for testing.
> >
> > Peter, since 27 is a long term maintenance kernel, do you think 1af5f73
> > and 046e7f7 (at least) are 27.stable candidates?
>
> 1af5f730fc1bf7c62ec9fb2d307206e18bf40a69 and
> e17036dac189dd034c092a91df56aa740db7146d you mean?

Yeah.

> I guess that makes sense.

(adds cc)

-Mike

2009-01-30 13:56:48

by Jan Engelhardt

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling


On Friday 2009-01-30 09:16, Nathanael Hoyle wrote:
><snipped old version, because of a fixed goof, and this has better
>formatting for mail client>
>
>extern char **environ;
>[...]
> if(argc==2) {
> if(execve(argv[1], NULL, environ) == -1) {
> perror("Failed to execve target!");
> }
> } else {
> if(execve(argv[1], argv+1, environ) == -1) {
> perror("Failed to execve target!");
> }
> }

Are you sure your first execve even works? I would have used

if (execvp(argv[1], &argv[1]) < 0)
...

just so (a) I do not have to deal with the ugly 'extern char **environ'
[such should have been in a libc header imho] or use
int main(int argc, char **argv, char **envp); (b) execvp so that
it looks through $PATH, just as /bin/su (resp. the shell it starts) would do.

2009-01-30 14:15:26

by Jan Engelhardt

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling


On Friday 2009-01-30 07:48, Nathanael Hoyle wrote:
>On Fri, 2009-01-30 at 11:47 +0530, V.Radhakrishnan wrote:
>>
>> However, the kernel also supports SCHED_FIFO and SCHED_RR which supports
>> Real Time capabilities, albeit as root.
>> [...]
>> If you wish 100% smooth display, you could make it run as SCHED_FIFO
>> which means that your foldingathome would wait quietly for the movie to
>> get completed fully. What's "wrong" with that aproach, which is
>> essentially what you want ?
>
>My view of what's "wrong" with that approach is that it requires root
>privileges to boost the scheduling priority of each and every process
>(although in this case, mplayer is the issue) which I want to not be
>affected by foldingathome's CPU usage.

SCHED_FIFO is dangerous: it is easy to essentially lock up your box,
simply because the process in question (e.g. the video decoder) just
runs forever (e.g. a bug causing a busyloop), and other processes in
the same or lower priority classes do not get to run. Or they (X.org,
for displaying your video and handling user input) run only for short
amounts of time, giving a borked responsiveness experience to
the user.

It was about time SCHED_{BATCH,IDLE} came along ;-)

>While I happen to be root on
>this system, since it is my desktop, I would imagine there are instances
>where the root user/administrator of a system wanted to be able to run
>items which had no impact on other users, including allowing them to run
>fast and responsive applications. Aside from that, it's a PITA to start
>mplayer playing, go renice -19 it and resume watching my movie every
>time.

Even renicing mplayer to -19 will not cause FAH to get zero CPU
time if the regular desktop processes everybody needs already
max out the CPU.

2009-01-30 14:16:07

by Jan Engelhardt

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling


On Friday 2009-01-30 08:59, Nathanael Hoyle wrote:
>> >Is there currently a standardized userspace tool to use to run a command
>> >in order to alter its scheduling class? Obviously writing one would be
>> >trivial, but didn't know if something like:
>>
>> man chrt
>
>The latest version of man chrt that I can find implies that it handles
>SCHED_BATCH but not SCHED_IDLE. To that end, if anyone else is
>interested, I have thrown together the above-suggested 'runidle' which
>will invoke the passed command using the SCHED_IDLE scheduler; it's
>nothing fancy.

You should have added -i to chrt instead and submitted it ;-)

2009-01-30 22:11:43

by Brian Rogers

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

Mike Galbraith wrote:
> On Fri, 2009-01-30 at 02:59 -0500, Nathanael Hoyle wrote:
>
>> I am running foldingathome under it at the moment, and it seems to be
>> improving the situation somewhat, but I still need/want to test with
>> Mike's referenced patches.
>>
> You will most definitely encounter evilness running SCHED_IDLE tasks in
> a kernel without the SCHED_IDLE fixes.
>
Speaking of SCHED_IDLE fixes, is
6bc912b71b6f33b041cfde93ca3f019cbaa852bc going to be put into the next
stable 2.6.28 release? Without it on 2.6.28.2, I can still produce
minutes-long freezes with BOINC or other idle processes.

With the above commit on top of 2.6.28.2 and also
cce7ade803699463ecc62a065ca522004f7ccb3d, the problem is solved, though
I assume cce7ad isn't actually required to fix that, and I can test that
if desired.

2009-01-31 05:38:19

by Mike Galbraith

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

On Fri, 2009-01-30 at 14:12 -0800, Brian Rogers wrote:
> Mike Galbraith wrote:
> > On Fri, 2009-01-30 at 02:59 -0500, Nathanael Hoyle wrote:
> >
> >> I am running foldingathome under it at the moment, and it seems to be
> >> improving the situation somewhat, but I still need/want to test with
> >> Mike's referenced patches.
> >>
> > You will most definitely encounter evilness running SCHED_IDLE tasks in
> > a kernel without the SCHED_IDLE fixes.
> >
> Speaking of SCHED_IDLE fixes, is
> 6bc912b71b6f33b041cfde93ca3f019cbaa852bc going to be put into the next
> stable 2.6.28 release? Without it on 2.6.28.2, I can still produce
> minutes-long freezes with BOINC or other idle processes.
>
> With the above commit on top of 2.6.28.2 and also
> cce7ade803699463ecc62a065ca522004f7ccb3d, the problem is solved, though
> I assume cce7ad isn't actually required to fix that, and I can test that
> if desired.

I think they both should go to stable, but dunno if they're headed that
direction or not.

One way to find out, CCs added.

-Mike

2009-01-31 09:08:46

by Mike Galbraith

[permalink] [raw]
Subject: Re: scheduler nice 19 versus 'idle' behavior / static low-priority scheduling

On Sat, 2009-01-31 at 06:38 +0100, Mike Galbraith wrote:
> On Fri, 2009-01-30 at 14:12 -0800, Brian Rogers wrote:
> > Mike Galbraith wrote:
> > > On Fri, 2009-01-30 at 02:59 -0500, Nathanael Hoyle wrote:
> > >
> > >> I am running foldingathome under it at the moment, and it seems to be
> > >> improving the situation somewhat, but I still need/want to test with
> > >> Mike's referenced patches.
> > >>
> > > You will most definitely encounter evilness running SCHED_IDLE tasks in
> > > a kernel without the SCHED_IDLE fixes.
> > >
> > Speaking of SCHED_IDLE fixes, is
> > 6bc912b71b6f33b041cfde93ca3f019cbaa852bc going to be put into the next
> > stable 2.6.28 release? Without it on 2.6.28.2, I can still produce
> > minutes-long freezes with BOINC or other idle processes.
> >
> > With the above commit on top of 2.6.28.2 and also
> > cce7ade803699463ecc62a065ca522004f7ccb3d, the problem is solved, though
> > I assume cce7ad isn't actually required to fix that, and I can test that
> > if desired.
>
> I think they both should go to stable, but dunno if they're headed that
> direction or not.
>
> One way to find out, CCs added.

For those who may want to run SCHED_IDLE tasks in .27, I've integrated
and lightly tested the fixes required to do so. One additional commit
was needed to get SCHED_IDLE vs nice 19 working right, namely f9c0b09.
Without that, SCHED_IDLE tasks received more CPU than nice 19 tasks.

Since .27 is in long-term maintenance, I'd integrate into stable, but
that's not my decision. Anyone who applies the below to their stable
kernel gets to keep all the pieces should something break ;-)

commit f9c0b0950d5fd8c8c5af39bc061f27ea8fddcac3
Author: Peter Zijlstra <[email protected]>
Date: Fri Oct 17 19:27:04 2008 +0200

sched: revert back to per-rq vruntime

Vatsa rightly points out that having the runqueue weight in the vruntime
calculations can cause unfairness in the face of task joins/leaves.

Suppose: dv = dt * rw / w

Then take 10 tasks t_n, each of similar weight. If the first runs for 1
unit, its vruntime will increase by 10. Now, if the next 8 tasks leave after
having run for their 1 unit, then the last task will get a vruntime increase
of 2 after having run for 1 unit.

Which will leave us with 2 tasks of equal weight and equal runtime, of which
one will not be scheduled for 8/2=4 units of time.

Ergo, we cannot do that and must use: dv = dt / w.

This means we cannot have a global vruntime based on effective priority, but
must instead go back to the vruntime per rq model we started out with.

This patch was lightly tested by doing starting while loops on each nice level
and observing their execution time, and a simple group scenario of 1:2:3 pinned
to a single cpu.

Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

---
kernel/sched_fair.c | 32 +++++++++++++++-----------------
1 file changed, 15 insertions(+), 17 deletions(-)

Index: linux-2.6.27/kernel/sched_fair.c
===================================================================
--- linux-2.6.27.orig/kernel/sched_fair.c
+++ linux-2.6.27/kernel/sched_fair.c
@@ -334,7 +334,7 @@ int sched_nr_latency_handler(struct ctl_
#endif

/*
- * delta *= w / rw
+ * delta *= P[w / rw]
*/
static inline unsigned long
calc_delta_weight(unsigned long delta, struct sched_entity *se)
@@ -348,15 +348,13 @@ calc_delta_weight(unsigned long delta, s
}

/*
- * delta *= rw / w
+ * delta /= w
*/
static inline unsigned long
calc_delta_fair(unsigned long delta, struct sched_entity *se)
{
- for_each_sched_entity(se) {
- delta = calc_delta_mine(delta,
- cfs_rq_of(se)->load.weight, &se->load);
- }
+ if (unlikely(se->load.weight != NICE_0_LOAD))
+ delta = calc_delta_mine(delta, NICE_0_LOAD, &se->load);

return delta;
}
@@ -386,26 +384,26 @@ static u64 __sched_period(unsigned long
* We calculate the wall-time slice from the period by taking a part
* proportional to the weight.
*
- * s = p*w/rw
+ * s = p*P[w/rw]
*/
static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
- return calc_delta_weight(__sched_period(cfs_rq->nr_running), se);
+ unsigned long nr_running = cfs_rq->nr_running;
+
+ if (unlikely(!se->on_rq))
+ nr_running++;
+
+ return calc_delta_weight(__sched_period(nr_running), se);
}

/*
* We calculate the vruntime slice of a to be inserted task
*
- * vs = s*rw/w = p
+ * vs = s/w
*/
-static u64 sched_vslice_add(struct cfs_rq *cfs_rq, struct sched_entity *se)
+static u64 sched_vslice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
- unsigned long nr_running = cfs_rq->nr_running;
-
- if (!se->on_rq)
- nr_running++;
-
- return __sched_period(nr_running);
+ return calc_delta_fair(sched_slice(cfs_rq, se), se);
}

/*
@@ -683,7 +681,7 @@ place_entity(struct cfs_rq *cfs_rq, stru
* stays open at the end.
*/
if (initial && sched_feat(START_DEBIT))
- vruntime += sched_vslice_add(cfs_rq, se);
+ vruntime += sched_vslice(cfs_rq, se);

if (!initial) {
/* sleeps upto a single latency don't count. */
commit 1af5f730fc1bf7c62ec9fb2d307206e18bf40a69
Author: Peter Zijlstra <[email protected]>
Date: Fri Oct 24 11:06:13 2008 +0200

sched: more accurate min_vruntime accounting

Mike noticed the current min_vruntime tracking can go wrong and skip the
current task. If the only remaining task in the tree is a nice 19 task
with huge vruntime, new tasks will be inserted too far to the right too,
causing some interactivity issues.

min_vruntime can only change due to the leftmost entry disappearing
(dequeue_entity()), or by the leftmost entry being incremented past the
next entry, which elects a new leftmost (__update_curr())

Due to the current entry not being part of the actual tree, we have to
compare the leftmost tree entry with the current entry, and take the
leftmost of these two.

So create an update_min_vruntime() function that computes the
leftmost vruntime in the system (either tree or current) and increases
the cfs_rq->min_vruntime if the computed value is larger than the
previously found min_vruntime. And call this from the two sites we've
identified that can change min_vruntime.

Reported-by: Mike Galbraith <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Mike Galbraith <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

---
kernel/sched_fair.c | 49 +++++++++++++++++++++++++------------------------
1 file changed, 25 insertions(+), 24 deletions(-)

Index: linux-2.6.27/kernel/sched_fair.c
===================================================================
--- linux-2.6.27.orig/kernel/sched_fair.c
+++ linux-2.6.27/kernel/sched_fair.c
@@ -221,6 +221,27 @@ static inline s64 entity_key(struct cfs_
return se->vruntime - cfs_rq->min_vruntime;
}

+static void update_min_vruntime(struct cfs_rq *cfs_rq)
+{
+ u64 vruntime = cfs_rq->min_vruntime;
+
+ if (cfs_rq->curr)
+ vruntime = cfs_rq->curr->vruntime;
+
+ if (cfs_rq->rb_leftmost) {
+ struct sched_entity *se = rb_entry(cfs_rq->rb_leftmost,
+ struct sched_entity,
+ run_node);
+
+ if (vruntime == cfs_rq->min_vruntime)
+ vruntime = se->vruntime;
+ else
+ vruntime = min_vruntime(vruntime, se->vruntime);
+ }
+
+ cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
+}
+
/*
* Enqueue an entity into the rb-tree:
*/
@@ -254,15 +275,8 @@ static void __enqueue_entity(struct cfs_
* Maintain a cache of leftmost tree entries (it is frequently
* used):
*/
- if (leftmost) {
+ if (leftmost)
cfs_rq->rb_leftmost = &se->run_node;
- /*
- * maintain cfs_rq->min_vruntime to be a monotonic increasing
- * value tracking the leftmost vruntime in the tree.
- */
- cfs_rq->min_vruntime =
- max_vruntime(cfs_rq->min_vruntime, se->vruntime);
- }

rb_link_node(&se->run_node, parent, link);
rb_insert_color(&se->run_node, &cfs_rq->tasks_timeline);
@@ -272,18 +286,9 @@ static void __dequeue_entity(struct cfs_
{
if (cfs_rq->rb_leftmost == &se->run_node) {
struct rb_node *next_node;
- struct sched_entity *next;

next_node = rb_next(&se->run_node);
cfs_rq->rb_leftmost = next_node;
-
- if (next_node) {
- next = rb_entry(next_node,
- struct sched_entity, run_node);
- cfs_rq->min_vruntime =
- max_vruntime(cfs_rq->min_vruntime,
- next->vruntime);
- }
}

if (cfs_rq->next == se)
@@ -480,6 +485,7 @@ __update_curr(struct cfs_rq *cfs_rq, str
schedstat_add(cfs_rq, exec_clock, delta_exec);
delta_exec_weighted = calc_delta_fair(delta_exec, curr);
curr->vruntime += delta_exec_weighted;
+ update_min_vruntime(cfs_rq);
}

static void update_curr(struct cfs_rq *cfs_rq)
@@ -666,13 +672,7 @@ static void check_spread(struct cfs_rq *
static void
place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
{
- u64 vruntime;
-
- if (first_fair(cfs_rq)) {
- vruntime = min_vruntime(cfs_rq->min_vruntime,
- __pick_next_entity(cfs_rq)->vruntime);
- } else
- vruntime = cfs_rq->min_vruntime;
+ u64 vruntime = cfs_rq->min_vruntime;

/*
* The 'current' period is already promised to the current tasks,
@@ -749,6 +749,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
if (se != cfs_rq->curr)
__dequeue_entity(cfs_rq, se);
account_entity_dequeue(cfs_rq, se);
+ update_min_vruntime(cfs_rq);
}

/*
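To make the invariant concrete, here is a self-contained user-space sketch of the update rule this patch introduces. The struct and field names are simplified stand-ins for illustration, not the kernel's types; it mirrors the logic of update_min_vruntime() as added above, including the `vruntime == min_vruntime` test that a follow-up commit later revises.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the cfs_rq fields involved; not kernel code. */
struct cfs_model {
	uint64_t min_vruntime;
	int has_curr;              /* is there a current (off-tree) entity? */
	uint64_t curr_vruntime;
	int has_leftmost;          /* is the rb-tree non-empty? */
	uint64_t leftmost_vruntime;
};

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Model of update_min_vruntime() as introduced by this patch. */
static void update_min_vruntime_model(struct cfs_model *rq)
{
	uint64_t vruntime = rq->min_vruntime;

	if (rq->has_curr)
		vruntime = rq->curr_vruntime;

	if (rq->has_leftmost) {
		if (vruntime == rq->min_vruntime)
			vruntime = rq->leftmost_vruntime;
		else
			vruntime = min_u64(vruntime, rq->leftmost_vruntime);
	}

	/* only ever advance; min_vruntime is monotonic */
	rq->min_vruntime = max_u64(rq->min_vruntime, vruntime);
}
```

The final max_vruntime() step is what keeps min_vruntime monotonically increasing even when the computed leftmost vruntime would move it backwards.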
commit e17036dac189dd034c092a91df56aa740db7146d
Author: Peter Zijlstra <[email protected]>
Date: Thu Jan 15 14:53:39 2009 +0100

sched: fix update_min_vruntime

Impact: fix SCHED_IDLE latency problems

OK, so we have 1 running task A (which is obviously curr and the tree is
equally obviously empty).

'A' nicely chugs along, doing its thing, carrying min_vruntime along as it
goes.

Then some whacko speed freak SCHED_IDLE task gets inserted due to SMP
balancing, which is very likely far right, in which case

update_curr
update_min_vruntime
cfs_rq->rb_leftmost := true (the crazy task sitting in a tree)
vruntime = se->vruntime

and voila, min_vruntime is waaay right of where it ought to be.

OK, so why did I write it like that to begin with...

Aah, yes.

Say we've just dequeued current

schedule
deactivate_task(prev)
dequeue_entity
update_min_vruntime

Then we'll set

vruntime = cfs_rq->min_vruntime;

we find !cfs_rq->curr, but do find someone in the tree. Then we _must_
do vruntime = se->vruntime, because

vruntime = min_vruntime(vruntime := cfs_rq->min_vruntime, se->vruntime)

will not advance vruntime, and causes lag the other way around (which we
fixed with the initial patch: 1af5f730fc1bf7c62ec9fb2d307206e18bf40a69,
"sched: more accurate min_vruntime accounting").

Signed-off-by: Peter Zijlstra <[email protected]>
Tested-by: Mike Galbraith <[email protected]>
Acked-by: Mike Galbraith <[email protected]>
Cc: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

---
kernel/sched_fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.27/kernel/sched_fair.c
===================================================================
--- linux-2.6.27.orig/kernel/sched_fair.c
+++ linux-2.6.27/kernel/sched_fair.c
@@ -233,7 +233,7 @@ static void update_min_vruntime(struct c
struct sched_entity,
run_node);

- if (vruntime == cfs_rq->min_vruntime)
+ if (!cfs_rq->curr)
vruntime = se->vruntime;
else
vruntime = min_vruntime(vruntime, se->vruntime);
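Both the failure scenario and the one-line fix can be modeled outside the kernel. Below is an illustrative user-space sketch (simplified stand-in types, not kernel code) of update_min_vruntime() with the corrected `!cfs_rq->curr` test: with a current task chugging along and a far-right SCHED_IDLE task as the tree's only (leftmost) entry, min_vruntime no longer jumps right.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the cfs_rq fields involved; not kernel code. */
struct cfs_model {
	uint64_t min_vruntime;
	int has_curr;              /* is there a current (off-tree) entity? */
	uint64_t curr_vruntime;
	int has_leftmost;          /* is the rb-tree non-empty? */
	uint64_t leftmost_vruntime;
};

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Corrected update: only take the leftmost entry's vruntime outright
 * when there is no current entity (the dequeue-current case); when a
 * current entity exists, take the minimum of the two. */
static void update_min_vruntime_fixed(struct cfs_model *rq)
{
	uint64_t vruntime = rq->min_vruntime;

	if (rq->has_curr)
		vruntime = rq->curr_vruntime;

	if (rq->has_leftmost) {
		if (!rq->has_curr)
			vruntime = rq->leftmost_vruntime;
		else
			vruntime = min_u64(vruntime, rq->leftmost_vruntime);
	}

	rq->min_vruntime = max_u64(rq->min_vruntime, vruntime);
}
```

With the original `vruntime == min_vruntime` test, the first scenario below would have set min_vruntime to the idle task's far-right vruntime; the `!has_curr` test keeps it pinned to the running task instead.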
commit 6bc912b71b6f33b041cfde93ca3f019cbaa852bc
Author: Peter Zijlstra <[email protected]>
Date: Thu Jan 15 14:53:38 2009 +0100

sched: SCHED_OTHER vs SCHED_IDLE isolation

Stronger SCHED_IDLE isolation:

- no SCHED_IDLE buddies
- never let SCHED_IDLE preempt on wakeup
- always preempt SCHED_IDLE on wakeup
- limit SLEEPER fairness for SCHED_IDLE.

Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

---
kernel/sched_fair.c | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)

Index: linux-2.6.27/kernel/sched_fair.c
===================================================================
--- linux-2.6.27.orig/kernel/sched_fair.c
+++ linux-2.6.27/kernel/sched_fair.c
@@ -689,9 +689,13 @@ place_entity(struct cfs_rq *cfs_rq, stru
unsigned long thresh = sysctl_sched_latency;

/*
- * convert the sleeper threshold into virtual time
+ * Convert the sleeper threshold into virtual time.
+ * SCHED_IDLE is a special sub-class. We care about
+ * fairness only relative to other SCHED_IDLE tasks,
+ * all of which have the same weight.
*/
- if (sched_feat(NORMALIZED_SLEEPER))
+ if (sched_feat(NORMALIZED_SLEEPER) &&
+ task_of(se)->policy != SCHED_IDLE)
thresh = calc_delta_fair(thresh, se);

vruntime -= thresh;
@@ -1347,15 +1351,22 @@ static void check_preempt_wakeup(struct
if (unlikely(se == pse))
return;

- cfs_rq_of(pse)->next = pse;
+ if (likely(task_of(se)->policy != SCHED_IDLE))
+ cfs_rq_of(pse)->next = pse;

/*
- * Batch tasks do not preempt (their preemption is driven by
+ * Batch and idle tasks do not preempt (their preemption is driven by
* the tick):
*/
- if (unlikely(p->policy == SCHED_BATCH))
+ if (unlikely(p->policy != SCHED_NORMAL))
return;

+ /* Idle tasks are by definition preempted by everybody. */
+ if (unlikely(curr->policy == SCHED_IDLE)) {
+ resched_task(curr);
+ return;
+ }
+
if (!sched_feat(WAKEUP_PREEMPT))
return;
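The policy-based part of these wakeup rules amounts to a small decision function. Here is an illustrative user-space sketch (hypothetical names, not the kernel's check_preempt_wakeup(); the remaining granularity/buddy logic is modeled simply as "no preemption"):

```c
#include <assert.h>

/* Illustrative policy values; not the kernel's SCHED_* constants. */
enum policy { POLICY_NORMAL, POLICY_BATCH, POLICY_IDLE };

/* Model of the policy checks above: returns 1 if the waking task
 * should preempt the currently running task, 0 otherwise. */
static int wakeup_should_preempt(enum policy curr, enum policy wakee)
{
	/* Batch and idle tasks do not preempt on wakeup; their
	 * preemption is driven by the tick. */
	if (wakee != POLICY_NORMAL)
		return 0;

	/* Idle tasks are by definition preempted by everybody. */
	if (curr == POLICY_IDLE)
		return 1;

	/* Otherwise fall through to the usual wakeup-preemption
	 * heuristics (granularity checks etc.), modeled here as "no". */
	return 0;
}
```

Note the asymmetry the patch establishes: a SCHED_IDLE wakee never preempts, while a SCHED_IDLE current task is always preempted by a SCHED_NORMAL wakeup.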