This patch makes processes with nice values >= 20 (as set through
setpriority(2)) stop completely while there are other runnable
processes with smaller nice values.
Try running something with `nice -n 30' (which does a `setpriority' to 20).
NOTE: this patch should not be used on a production machine unless
you know what you are doing.
Only tested on a uniprocessor Pentium II.
I don't know whether this breaks the standard, but it should
break no programs.
Applies to 2.4.16.
Note: this patch file is hand-modified from rcsdiff output.
========= cut here ==========
diff -Nur linux-2.4.16/kernel/sched.c linux/kernel/sched.c
--- linux-2.4.16/kernel/sched.c 2001/11/23 09:15:55
+++ linux/kernel/sched.c 2001/11/23 09:42:46
@@ -19,6 +19,12 @@
* current-task
*/
+/*
+ private patch by WQC: we use nice values 20 and beyond for `idle priority' processes,
+ so that they take absolutely no process time when there are higher priority processes
+ running.
+*/
+
#include <linux/config.h>
#include <linux/mm.h>
#include <linux/init.h>
@@ -68,7 +74,8 @@
#define TICK_SCALE(x) ((x) << 2)
#endif
-#define NICE_TO_TICKS(nice) (TICK_SCALE(20-(nice))+1)
+#define NICE_TO_TICKS(nice)\
+ (((nice) < 20) ? (TICK_SCALE(20-(nice))+1) : (TICK_SCALE(1)+1))
/*
@@ -150,7 +157,8 @@
* runnable process, but before the idle thread.
* Also, dont trigger a counter recalculation.
*/
- weight = -1;
+ /* If it yields, it ranks below the nicest processes */
+ weight = -999;
if (p->policy & SCHED_YIELD)
goto out;
@@ -166,9 +174,18 @@
* over..
*/
weight = p->counter;
+ /* If this is included, the oh-so-large addition below
+ will not work
+
if (!weight)
goto out;
-
+ However, to be consistent, we should give such a process a disadvantage.
+ */
+ if (! weight) {
+ if (p->nice < 20) weight = 800;
+ else if (p->nice < 27) weight = (27 - p->nice) * 100;
+ goto out;
+ }
#ifdef CONFIG_SMP
/* Give a largish advantage to the same processor... */
/* (this is equivalent to penalizing other processors) */
@@ -179,7 +196,12 @@
/* .. and a slight advantage to the current MM */
if (p->mm == this_mm || !p->mm)
weight += 1;
- weight += 20 - p->nice;
+ if (p->nice < 20)
+ weight += 800 + 20 - p->nice;
+ else if (p->nice < 27)
+ /* For super-nice ones, they are scheduled if no one else wants the CPU */
+ weight += (27 - p->nice) * 100;
+ if (weight > 999) weight = 999;
goto out;
}
@@ -537,7 +559,7 @@
struct task_struct *prev, *next, *p;
struct list_head *tmp;
int this_cpu, c;
-
+ int need_recalc;
spin_lock_prefetch(&runqueue_lock);
@@ -590,17 +612,20 @@
*/
next = idle_task(this_cpu);
c = -1000;
+ need_recalc = 0;
list_for_each(tmp, &runqueue_head) {
p = list_entry(tmp, struct task_struct, run_list);
if (can_schedule(p, this_cpu)) {
int weight = goodness(p, this_cpu, prev->active_mm);
- if (weight > c)
+ if (weight > c) {
c = weight, next = p;
+ need_recalc = (p->counter == 0);
+ }
}
}
/* Do we need to re-calculate counters? */
- if (unlikely(!c)) {
+ if (unlikely(need_recalc)) {
struct task_struct *p;
spin_unlock_irq(&runqueue_lock);
@@ -857,17 +882,17 @@
if (increment < 0) {
if (!capable(CAP_SYS_NICE))
return -EPERM;
- if (increment < -40)
- increment = -40;
+ if (increment < -50)
+ increment = -50;
}
- if (increment > 40)
- increment = 40;
+ if (increment > 50)
+ increment = 50;
newprio = current->nice + increment;
if (newprio < -20)
newprio = -20;
- if (newprio > 19)
- newprio = 19;
+ if (newprio > 29)
+ newprio = 29;
current->nice = newprio;
return 0;
}
diff -Nur linux-2.4.16/kernel/sys.c linux/kernel/sys.c
--- linux-2.4.16/kernel/sys.c 2001/11/17 11:42:42
+++ linux/kernel/sys.c 2001/11/17 11:44:50
@@ -206,8 +206,8 @@
error = -ESRCH;
if (niceval < -20)
niceval = -20;
- if (niceval > 19)
- niceval = 19;
+ if (niceval > 29)
+ niceval = 29;
read_lock(&tasklist_lock);
for_each_task(p) {
@@ -249,7 +249,7 @@
long niceval;
if (!proc_sel(p, which, who))
continue;
- niceval = 20 - p->nice;
+ niceval = 30 - p->nice;
if (niceval > retval)
retval = niceval;
}
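For anyone who wants to check the arithmetic, the weight that the patched goodness() assigns to a SCHED_OTHER task can be modelled in user space. This is only a sketch: model_goodness() is a made-up name, the SMP bonus and the +1 same-mm bonus are left out, and real-time policies are ignored.

```c
#include <assert.h>

/*
 * User-space model of the patched goodness() for SCHED_OTHER tasks.
 * Hypothetical helper, not kernel code: the SMP bonus and the +1
 * same-mm bonus are omitted, and real-time policies are ignored.
 */
static int model_goodness(int counter, int nice, int yielded)
{
	int weight;

	assert(counter >= 0);
	if (yielded)
		return -999;	/* a yielding task ranks below everything */

	weight = counter;
	if (!weight) {
		/* time slice used up: keep only the nice-based offset */
		if (nice < 20)
			weight = 800;
		else if (nice < 27)
			weight = (27 - nice) * 100;
		return weight;
	}

	if (nice < 20)
		weight += 800 + 20 - nice;
	else if (nice < 27)
		weight += (27 - nice) * 100;
	if (weight > 999)
		weight = 999;
	return weight;
}
```

In this model model_goodness(5, 0, 0) is 825 and model_goodness(5, 20, 0) is 705, while a normal task with an exhausted counter still scores 800. A nice-20 task would need a counter of about 100 to catch up, which the small slices handed out by the patched NICE_TO_TICKS should not reach.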
On Fri, 2001-12-07 at 23:38, root wrote:
> This patch makes processes with nice values >= 20 (as set through
> setpriority(2)) stop completely while there are other runnable
> processes with smaller nice values.
> Try running something with `nice -n 30' (which does a `setpriority' to 20).
What do you think will happen when an "idle" task holds a resource or is
otherwise a producer for something a higher priority, running, task
needs?
Robert Love
On Friday, December 7, 2001, at 11:39 , Robert Love wrote:
> What do you think will happen when an "idle" task holds a
> resource or is
> otherwise a producer for something a higher priority, running, task
> needs?
One of two things:
1) The higher priority task will no longer be runnable; or
2) We gave enough rope to hang yourself, and, well, you did.
On Sun, 2001-12-09 at 17:31, Anthony DeRobertis wrote:
> One of two things:
> 1) The higher priority task will no longer be runnable; or
> 2) We gave enough rope to hang yourself, and, well, you did.
and (3) the lower priority task won't be runnable either.
Unless this is addressed (and it is addressable, see below), this feature
won't make it into the kernel. "We gave you the rope and you took it" is
not an argument: if I make some random application an idle task because
it deserves little CPU time, I shouldn't have to think about which
resources or kernel semantics it and another task might get into a
priority inversion fight over.
I've seen a few solutions. The easiest is to just give idle tasks a
"boost" on occasion to give them a chance to prevent the deadlock. You
then, however, have the problem where the tasks can take advantage of
the boost... Or, we could fix in-kernel deadlocks by doing priority
inheriting on locks held by A and wanted by B (i.e., if A holds
something B wants, boost A's priority temporarily to that of B's). But
that is probably overkill ... note to do any of these it is probably
cleanest to make a SCHED_IDLE scheduling class.
maybe I'll put a patch together ...
Robert Love
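The first option above, an occasional boost for idle tasks, can be modelled in a few lines of user-space C. This is a sketch under invented assumptions: BOOST_INTERVAL, struct toy_task and pick_next() are illustrative names, the interval of 100 ticks is arbitrary, and none of it comes from an actual patch.

```c
#include <assert.h>
#include <stddef.h>

#define BOOST_INTERVAL 100	/* assumed: one preferred idle slot per 100 ticks */

struct toy_task {
	const char *name;
	int idle;	/* 1 = idle-class task, 0 = normal task */
	int runnable;	/* 1 = on the run queue */
};

/*
 * Pick the task to run at a given tick.  Normal tasks always win,
 * except on every BOOST_INTERVAL-th tick, when a runnable idle task
 * is chosen first so that it can make progress and release whatever
 * it holds.  Returns NULL if nothing is runnable.
 */
static struct toy_task *pick_next(struct toy_task *tasks, size_t n,
				  unsigned long tick)
{
	int boost = (tick % BOOST_INTERVAL) == 0;
	struct toy_task *fallback = NULL;
	size_t i;

	assert(tasks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		struct toy_task *t = &tasks[i];

		if (!t->runnable)
			continue;
		/* preferred class: idle on a boost tick, normal otherwise */
		if (t->idle == boost)
			return t;
		if (!fallback)
			fallback = t;
	}
	return fallback;
}
```

The sketch does nothing about the drawback mentioned above: a task that knows about the boost can still take advantage of it.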
On Sun, Dec 09, 2001 at 06:05:13PM -0500, Robert Love wrote:
> the boost... Or, we could fix in-kernel deadlocks by doing priority
> inheriting on locks held by A and wanted by B (i.e., if A holds
> something B wants, boost A's priority temporarily to that of B's). But
> that is probably overkill ... note to do any of these it is probably
> cleanest to make a SCHED_IDLE scheduling class.
Even better would be to keep the process at low priority while in userland
and revert it to normal "nice" priority while in kernelspace.
-ben
On Sun, 2001-12-09 at 18:16, Benjamin LaHaise wrote:
> Even better would be to keep the process at low priority while in userland
> and revert it to normal "nice" priority while in kernelspace.
But the point of a SCHED_IDLE would be to only run them while idle, so
they can still never even get the CPU.
Ahh ... wait, do you mean periodically run them, but only give them the
boost while they are in kernel space? Very good idea. Can you see an
easy way to do this?
Robert Love
On Sun, Dec 09, 2001 at 06:21:05PM -0500, Robert Love wrote:
> Ahh ... wait, do you mean periodically run them, but only give them the
> boost while they are in kernel space? Very good idea. Can you see an
> easy way to do this?
Actually, yes: in entry.S the ret_from_syscall path which calls schedule
can be changed to pass a parameter indicating it is returning to userspace
afterwards which would let schedule know the bump is not needed.
-ben
--
Fish.
On Sun, 2001-12-09 at 18:46, Benjamin LaHaise wrote:
> Actually, yes: in entry.S the ret_from_syscall path which calls schedule
> can be changed to pass a parameter indicating it is returning to userspace
> afterwards which would let schedule know the bump is not needed.
Hmm, what if we only boosted it based on something like this:
    if (p->policy == SCHED_IDLE) {
            weight = p->counter;
            if (p->lock_depth >= 0 || signal_pending(p))
                    /* boost somehow ... */
    }
(I'm writing the patch now :>)
Would it still make sense to only boost it in kernel space ?
Robert Love
On 9 Dec 2001, Robert Love wrote:
> Hmm, what if we only boosted it based on something like this:
>
> if (p->policy == SCHED_IDLE) {
> weight = p->counter;
> if (p->lock_depth >= 0 || signal_pending(p))
> /* boost somehow ... */
> }
Now what if the process is holding an inode or superblock
semaphore ?
Rik
--
Shortwave goes a long way: irc.starchat.net #swl
http://www.surriel.com/ http://distro.conectiva.com/
On Mon, Dec 10, 2001 at 12:46:23AM -0200, Rik van Riel wrote:
> On 9 Dec 2001, Robert Love wrote:
>
> > Hmm, what if we only boosted it based on something like this:
> >
> > if (p->policy == SCHED_IDLE) {
> > weight = p->counter;
> > if (p->lock_depth >= 0 || signal_pending(p))
> > /* boost somehow ... */
> > }
>
> Now what if the process is holding an inode or superblock
> semaphore ?
Even better:
What if the SCHED_IDLE task holds a POSIX read lock on a file ?
Say we have three processes:
A is SCHED_IDLE holding read lock on /foo/bar
B is SCHED_OTHER wanting to acquire write lock /foo/bar
C is SCHED_OTHER computing fractals and eating up every cycle it can get
What we want is A to get B's priority until it releases the lock on
/foo/bar and then revert it to SCHED_IDLE policy. Otherwise B would get
deadlocked with A while C (or any other CPU hog) is running.
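The behaviour wanted for A and B can be sketched as a toy priority-inheritance lock in user-space C. Everything here is an assumption made for illustration: the toy_* names are invented, a larger number means a higher priority, and a single exclusive lock stands in for the POSIX read/write lock on /foo/bar.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: a larger prio value means the task is scheduled first. */
struct toy_task {
	const char *name;
	int base_prio;	/* priority from the task's own policy */
	int eff_prio;	/* priority the scheduler actually uses */
};

struct toy_lock {
	struct toy_task *owner;	/* NULL when the lock is free */
};

/*
 * Try to take the lock.  Returns 1 on success.  On contention it
 * returns 0 and lends the waiter's priority to the owner (priority
 * inheritance), so the owner can run, finish and release the lock.
 */
static int toy_lock_acquire(struct toy_lock *l, struct toy_task *t)
{
	assert(l != NULL && t != NULL);
	if (!l->owner) {
		l->owner = t;
		return 1;
	}
	if (l->owner->eff_prio < t->eff_prio)
		l->owner->eff_prio = t->eff_prio;
	return 0;
}

/* Release the lock and drop any inherited priority. */
static void toy_lock_release(struct toy_lock *l)
{
	assert(l != NULL && l->owner != NULL);
	l->owner->eff_prio = l->owner->base_prio;
	l->owner = NULL;
}
```

With A holding the lock at priority 0 and B failing to acquire it at priority 20, A runs at 20 until it calls toy_lock_release(), so a CPU hog like C at B's priority can no longer keep A off the CPU indefinitely.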
I know this is a userspace problem (similar to real-time processes vs.
normal processes), but I think it would be nice to make SCHED_IDLE a
non-privileged policy.
--
Kind regards,
Robert Varga
------------------------------------------------------------------------------
[email protected] http://hq.sk/~nite/gpgkey.txt
I'm not a kernel hacker by any stretch of the imagination, but I do
follow lkml. While SCHED_IDLE would be really nice, it seems that whenever
this topic comes up there is always a (deadlock) catch that keeps it
from getting into the main kernel tree. While it
would be really great if Robert's patch turns out to be the magic bullet
this time, as a Linux user and SetiAtHome supporter I would be happy
just to have a "nice -19" that was truly nice. Running setiathome at
nice -19 on my box and then firing up a processor intensive process at
normal priority, I find that seti still uses 14% processor according to
top. I like to run 2 setiathome processes because I get better overall
throughput that way, but the 2 combined then get ~25% of the processor.
Not really all that nice. Would it be difficult to make "19" a special
case that was really, really nice, but not *totally* nice? Avoiding
deadlock but still following the spirit of SCHED_IDLE?
In article <1007939114.878.1.camel@phantasy>,
Robert Love writes:
> I've seen a few solutions. The easiest is to just give idle tasks a
> "boost" on occasion to give them a chance to prevent the deadlock. You
> then, however, have the problem where the tasks can take advantage of
> the boost... Or, we could fix in-kernel deadlocks by doing priority
> inheriting on locks held by A and wanted by B (i.e., if A holds
Please don't. Whenever you think you need priority inheritance, it's a sign your
system has got too complicated. The simplest solution is to simply have no
priorities when a task is in-kernel (or at least none that can completely
exclude a task).
On Mon, 2001-12-10 at 20:36, Ton Hospel wrote:
> Please don't. Whenever you think you need priority inheritance, it's a sign your
> system has got too complicated. The simplest solution is to simply have no
> priorities when a task is in-kernel (or at least none that can completely
> exclude a task).
I agree, I said it was overkill.
My solution is going to be to schedule the task as a SCHED_OTHER task
when in the kernel, and as a SCHED_IDLE task otherwise.
Robert Love
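The rule described above (SCHED_OTHER while in the kernel, SCHED_IDLE otherwise) can be sketched in user-space C as follows. It is not the patch being written: the toy_* names, the in_kernel flag and the two hook functions are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum toy_policy { TOY_SCHED_OTHER, TOY_SCHED_IDLE };

struct toy_task {
	enum toy_policy policy;	/* policy the user asked for */
	int in_kernel;		/* 1 while executing in kernel mode */
};

/* Hypothetical hooks for kernel entry and for return to user space. */
static void toy_enter_kernel(struct toy_task *t) { t->in_kernel = 1; }
static void toy_exit_kernel(struct toy_task *t) { t->in_kernel = 0; }

/*
 * Policy the scheduler should apply right now: an idle-class task is
 * treated as SCHED_OTHER for as long as it executes in the kernel,
 * so it cannot be starved while holding kernel resources.
 */
static enum toy_policy toy_effective_policy(const struct toy_task *t)
{
	assert(t != NULL);
	if (t->policy == TOY_SCHED_IDLE && t->in_kernel)
		return TOY_SCHED_OTHER;
	return t->policy;
}
```

The design choice is that the scheduler never sees an idle-class task that might hold a kernel lock or semaphore; the idle policy applies only to time spent in user space.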
Hi!
> > Please don't. Whenever you think you need priority inheritance, it's a sign your
> > system has got too complicated. The simplest solution is to simply have no
> > priorities when a task is in-kernel (or at least none that can completely
> > exclude a task).
>
> I agree, I said it was overkill.
>
> My solution is going to be to schedule the task as a SCHED_OTHER task
> when in the kernel, and as a SCHED_IDLE task otherwise.
Yep, and you can do it without making syscalls any slower, and a patch
was already posted on l-k.
Use ptrace-hooks for branching into your priority-promoting code, and
you'll have 0 impact on the fast path.
Pavel
--
"I do not steal MS software. It is not worth it."
-- Pavel Kankovsky
Hi!
> > Even better would be to keep the process at low priority while in userland
> > and revert it to normal "nice" priority while in kernelspace.
>
> But the point of a SCHED_IDLE would be to only run them while idle, so
> they can still never even get the CPU.
>
> Ahh ... wait, do you mean periodically run them, but only give them the
> boost while they are in kernel space? Very good idea. Can you see an
> easy way to do this?
This was done before... As I wrote... Make it a flag similar to "this is
being ptraced" to get out of the fast path, and the rest is easy. Unset
"low_priority" on entry to the kernel, and set it back on exit from the
kernel.
Pavel