2003-08-03 10:09:42

by Con Kolivas

Subject: [PATCH] O12.2int for interactivity

I've continued development on the interactivity patches against Ingo's new
infrastructure changes.

This is a complete resync of the work so far with changes to suit the
nanosecond timing.

Rather than just a changelog, here is a complete list of what this patch
does on top of Ingo's:

Interactivity credit determines whether a task is interactive or not and
treats them differently during priority changing.

Some #defines are used to help conversion from ns to jiffies, and the tuning
knobs have been altered slightly to work with these changes.

Tasks do not earn sleep_avg on their initial wakeup.

User tasks sleeping for a long period get just enough sleep_avg to be rated
interactive, so they stay on the active queue but cannot starve others if they
turn into cpu hogs.

Tasks with interactive credit elevate their sleep_avg proportionately more as
their priority bonus starts dropping off.

Tasks without interactive credit are limited to one timeslice worth of
sleep_avg as a bonus.

Accumulated sleep_avg of over MAX_SLEEP_AVG starts earning tasks interactive
credits.

Running out of sleep_avg makes a task lose interactive credits.

On their very first wakeup new forks don't get any sleep_avg bonus.

Only tasks with interactive credit gain sleep_avg for time waiting on the
runqueue.

New forks get the lowest value of sleep_avg compatible with the dynamic
priority they will wake up with.

Reverted Ingo's EXPIRED_STARVING definition; it was making interactive tasks
expire faster as cpu load increased.

Only user tasks get requeued after TIMESLICE_GRANULARITY, and only if they
have at least MIN_TIMESLICE remaining.

Tasks with interactive credit lose proportionately less sleep_avg as their
dynamic priority drops.

Using up a full timeslice makes a task lose interactive credits.

The patch patch-A3-O12.2int is available against 2.6.0-test2-mm3 here:

http://kernel.kolivas.org/2.5

Ingo's patch-test2-A3 patch against 2.6.0-test2 is also available there, along
with a patch-test2-A3-O12.2 to go on top of it.

Please note that if you do test the interactivity, it would be most valuable
to test Ingo's A3 patch _first_, looking for improvements and problems
compared to vanilla, and then to test my O12.2 patch on top of it. Hopefully
there has been no regression and only improvement in moving to Ingo's new
infrastructure.

Con

patch-A3-O12.2int against 2.6.0-test2-mm3:

--- linux-2.6.0-test2-mm3/include/linux/sched.h 2003-08-03 09:51:49.000000000 +1000
+++ linux-2.6.0-test2-mm3-O12.2/include/linux/sched.h 2003-08-03 09:53:32.000000000 +1000
@@ -341,6 +341,7 @@ struct task_struct {
prio_array_t *array;

unsigned long sleep_avg;
+ long interactive_credit;
unsigned long long timestamp;
int activated;

--- linux-2.6.0-test2-mm3/kernel/sched.c 2003-08-03 09:51:49.000000000 +1000
+++ linux-2.6.0-test2-mm3-O12.2/kernel/sched.c 2003-08-03 10:19:22.000000000 +1000
@@ -58,6 +58,14 @@
#define USER_PRIO(p) ((p)-MAX_RT_PRIO)
#define TASK_USER_PRIO(p) USER_PRIO((p)->static_prio)
#define MAX_USER_PRIO (USER_PRIO(MAX_PRIO))
+#define AVG_TIMESLICE (MIN_TIMESLICE + ((MAX_TIMESLICE - MIN_TIMESLICE) *\
+ (MAX_PRIO-1-NICE_TO_PRIO(0))/(MAX_USER_PRIO - 1)))
+
+/*
+ * Some helpers for converting nanosecond timing to jiffy resolution
+ */
+#define NS_TO_JIFFIES(TIME) (TIME / (1000000000 / HZ))
+#define JIFFIES_TO_NS(TIME) (TIME * (1000000000 / HZ))

/*
* These are the 'tuning knobs' of the scheduler:
@@ -70,13 +78,15 @@
#define MAX_TIMESLICE (200 * HZ / 1000)
#define TIMESLICE_GRANULARITY (HZ/40 ?: 1)
#define ON_RUNQUEUE_WEIGHT 30
-#define CHILD_PENALTY 95
+#define CHILD_PENALTY 90
#define PARENT_PENALTY 100
#define EXIT_WEIGHT 3
#define PRIO_BONUS_RATIO 25
+#define MAX_BONUS (MAX_USER_PRIO * PRIO_BONUS_RATIO / 100)
#define INTERACTIVE_DELTA 2
-#define MAX_SLEEP_AVG (1*1000000000)
-#define STARVATION_LIMIT HZ
+#define MAX_SLEEP_AVG (AVG_TIMESLICE * MAX_BONUS)
+#define STARVATION_LIMIT (MAX_SLEEP_AVG)
+#define NS_MAX_SLEEP_AVG (JIFFIES_TO_NS(MAX_SLEEP_AVG))
#define NODE_THRESHOLD 125

/*
@@ -117,6 +127,9 @@
#define TASK_INTERACTIVE(p) \
((p)->prio <= (p)->static_prio - DELTA(p))

+#define JUST_INTERACTIVE_SLEEP(p) \
+ (MAX_SLEEP_AVG - (DELTA(p) * AVG_TIMESLICE))
+
#define TASK_PREEMPTS_CURR(p, rq) \
((p)->prio < (rq)->curr->prio || \
((p)->prio == (rq)->curr->prio && \
@@ -326,8 +339,9 @@ static int effective_prio(task_t *p)
if (rt_task(p))
return p->prio;

- bonus = MAX_USER_PRIO*PRIO_BONUS_RATIO*(p->sleep_avg/1024)/(MAX_SLEEP_AVG/1024)/100;
- bonus -= MAX_USER_PRIO*PRIO_BONUS_RATIO/100/2;
+ bonus = MAX_USER_PRIO * PRIO_BONUS_RATIO *
+ NS_TO_JIFFIES(p->sleep_avg) / MAX_SLEEP_AVG / 100;
+ bonus -= MAX_USER_PRIO * PRIO_BONUS_RATIO / 100 / 2;

prio = p->static_prio - bonus;
if (prio < MAX_RT_PRIO)
@@ -351,37 +365,65 @@ static void recalc_task_prio(task_t *p,
unsigned long long __sleep_time = now - p->timestamp;
unsigned long sleep_time;

- if (__sleep_time > MAX_SLEEP_AVG)
- sleep_time = MAX_SLEEP_AVG;
+ if (unlikely(!p->timestamp))
+ __sleep_time = 0;
+
+ if (__sleep_time > NS_MAX_SLEEP_AVG)
+ sleep_time = NS_MAX_SLEEP_AVG;
else
sleep_time = (unsigned long)__sleep_time;

- if (sleep_time > 0) {
- unsigned long long sleep_avg;
+ if (likely(sleep_time > 0)) {

/*
- * This code gives a bonus to interactive tasks.
- *
- * The boost works by updating the 'average sleep time'
- * value here, based on ->timestamp. The more time a task
- * spends sleeping, the higher the average gets - and the
- * higher the priority boost gets as well.
+ * User tasks that sleep a long time are categorised as
+ * idle and will get just interactive status to stay active &
+ * prevent them suddenly becoming cpu hogs and starving
+ * other processes.
*/
- sleep_avg = p->sleep_avg + sleep_time;
+ if (p->mm && sleep_time >
+ JIFFIES_TO_NS(JUST_INTERACTIVE_SLEEP(p)))
+ p->sleep_avg =
+ JIFFIES_TO_NS(JUST_INTERACTIVE_SLEEP(p));
+ else {
+ /*
+ * Tasks with interactive credits get boosted more
+ * rapidly if their bonus has dropped off. Other
+ * tasks are limited to one timeslice worth of
+ * sleep avg.
+ */
+ if (p->interactive_credit > 0)
+ sleep_time *= (MAX_BONUS + 1 -
+ (NS_TO_JIFFIES(p->sleep_avg) *
+ MAX_BONUS / MAX_SLEEP_AVG));
+ else if (sleep_time > JIFFIES_TO_NS(task_timeslice(p)))
+ sleep_time = JIFFIES_TO_NS(task_timeslice(p));

- /*
- * 'Overflow' bonus ticks go to the waker as well, so the
- * ticks are not lost. This has the effect of further
- * boosting tasks that are related to maximum-interactive
- * tasks.
- */
- if (sleep_avg > MAX_SLEEP_AVG)
- sleep_avg = MAX_SLEEP_AVG;
- if (p->sleep_avg != sleep_avg) {
- p->sleep_avg = sleep_avg;
- p->prio = effective_prio(p);
+ /*
+ * This code gives a bonus to interactive tasks.
+ *
+ * The boost works by updating the 'average sleep time'
+ * value here, based on ->timestamp. The more time a task
+ * spends sleeping, the higher the average gets - and the
+ * higher the priority boost gets as well.
+ */
+ p->sleep_avg += sleep_time;
+
+ /*
+ * 'Overflow' bonus ticks go to the waker as well, so the
+ * ticks are not lost. This has the effect of further
+ * boosting tasks that are related to maximum-interactive
+ * tasks.
+ */
+ if (p->sleep_avg > NS_MAX_SLEEP_AVG){
+ p->sleep_avg = NS_MAX_SLEEP_AVG;
+ p->interactive_credit++;
+ }
}
- }
+ } else if (!p->sleep_avg)
+ p->interactive_credit--;
+
+ p->prio = effective_prio(p);
}

/*
@@ -411,6 +453,10 @@ static inline void activate_task(task_t
* but it will be weighted down:
*/
p->activated = 1;
+
+ if (unlikely(!p->timestamp))
+ p->activated = 0;
+
p->timestamp = now;

__activate_task(p, rq);
@@ -579,7 +625,7 @@ int wake_up_state(task_t *p, unsigned in
*/
void wake_up_forked_process(task_t * p)
{
- unsigned long flags;
+ unsigned long flags, sleep_avg;
runqueue_t *rq = task_rq_lock(current, &flags);

p->state = TASK_RUNNING;
@@ -588,8 +634,18 @@ void wake_up_forked_process(task_t * p)
* and children as well, to keep max-interactive tasks
* from forking tasks that are max-interactive.
*/
- current->sleep_avg = current->sleep_avg / 100 * PARENT_PENALTY;
- p->sleep_avg = p->sleep_avg / 100 * CHILD_PENALTY;
+ sleep_avg = NS_TO_JIFFIES(current->sleep_avg) * MAX_BONUS /
+ MAX_SLEEP_AVG * PARENT_PENALTY / 100 *
+ MAX_SLEEP_AVG / MAX_BONUS;
+ current->sleep_avg = JIFFIES_TO_NS(sleep_avg);
+
+ sleep_avg = NS_TO_JIFFIES(p->sleep_avg) * MAX_BONUS / MAX_SLEEP_AVG *
+ CHILD_PENALTY / 100 * MAX_SLEEP_AVG / MAX_BONUS;
+ p->sleep_avg = JIFFIES_TO_NS(sleep_avg);
+
+ p->interactive_credit = 0;
+ p->timestamp = 0;
+
p->prio = effective_prio(p);
set_task_cpu(p, smp_processor_id());

@@ -630,7 +686,9 @@ void sched_exit(task_t * p)
* the sleep_avg of the parent as well.
*/
if (p->sleep_avg < p->parent->sleep_avg)
- p->parent->sleep_avg = p->parent->sleep_avg / (EXIT_WEIGHT + 1) * EXIT_WEIGHT + p->sleep_avg / (EXIT_WEIGHT + 1);
+ p->parent->sleep_avg = p->parent->sleep_avg /
+ (EXIT_WEIGHT + 1) * EXIT_WEIGHT + p->sleep_avg /
+ (EXIT_WEIGHT + 1);
}

/**
@@ -1204,7 +1262,8 @@ EXPORT_PER_CPU_SYMBOL(kstat);
*/
#define EXPIRED_STARVING(rq) \
(STARVATION_LIMIT && ((rq)->expired_timestamp && \
- (jiffies - (rq)->expired_timestamp >= STARVATION_LIMIT)))
+ (jiffies - (rq)->expired_timestamp >= \
+ STARVATION_LIMIT * ((rq)->nr_running) + 1)))

/*
* This function gets called by the timer code, with HZ frequency.
@@ -1275,6 +1334,7 @@ void scheduler_tick(int user_ticks, int
p->prio = effective_prio(p);
p->time_slice = task_timeslice(p);
p->first_time_slice = 0;
+ p->interactive_credit--;

if (!rq->expired_timestamp)
rq->expired_timestamp = jiffies;
@@ -1296,11 +1356,12 @@ void scheduler_tick(int user_ticks, int
* level, which is in essence a round-robin of tasks with
* equal priority.
*/
- if (!((task_timeslice(p) - p->time_slice) % TIMESLICE_GRANULARITY) &&
- (p->array == rq->active)) {
+ if (p->mm && !((task_timeslice(p) - p->time_slice) %
+ TIMESLICE_GRANULARITY) && (p->time_slice > MIN_TIMESLICE) &&
+ (p->array == rq->active)) {
+
dequeue_task(p, rq->active);
set_tsk_need_resched(p);
- p->prio = effective_prio(p);
enqueue_task(p, rq->active);
}
}
@@ -1344,10 +1405,21 @@ need_resched:

release_kernel_lock(prev);
now = sched_clock();
- if (likely(now - prev->timestamp < MAX_SLEEP_AVG))
+ if (likely(now - prev->timestamp < NS_MAX_SLEEP_AVG))
run_time = now - prev->timestamp;
else
- run_time = MAX_SLEEP_AVG;
+ run_time = NS_MAX_SLEEP_AVG;
+
+ /*
+ * Tasks with interactive credits get charged less run_time
+ * as their sleep_avg decreases to slow them losing their
+ * priority bonus
+ */
+ if (prev->interactive_credit > 0)
+ run_time /= (MAX_BONUS + 1 -
+ (NS_TO_JIFFIES(prev->sleep_avg) * MAX_BONUS /
+ MAX_SLEEP_AVG));
+
spin_lock_irq(&rq->lock);

/*
@@ -1395,7 +1467,7 @@ pick_next_task:
queue = array->queue + idx;
next = list_entry(queue->next, task_t, run_list);

- if (next->activated) {
+ if (next->activated && next->interactive_credit > 0) {
unsigned long long delta = now - next->timestamp;

if (next->activated == 1)


2003-08-03 11:31:20

by Ingo Molnar

Subject: Re: [PATCH] O12.2int for interactivity


On Sun, 3 Aug 2003, Con Kolivas wrote:

> Reverted Ingo's EXPIRED_STARVING definition; it was making interactive
> tasks expire faster as cpu load increased.

hm, that bit was needed for fairness, otherwise 'interactive' tasks can
monopolize the CPU. [ Have you tried lots of copies of thud.c (and
Davide's irman2 with the most evil settings), do they still produce no
starvation with this bit restored to the original logic? ]

Ingo

2003-08-03 11:37:29

by Felipe Alfaro Solana

Subject: Re: [PATCH] O12.2int for interactivity

On Sun, 2003-08-03 at 12:14, Con Kolivas wrote:

> Please note if you do test the interactivity it would be most valuable if you
> test Ingo's A3 patch first looking for improvements and problems compared to
> vanilla _first_, and then test my O12.2 patch on top of it. Hopefully there
> has been no regression and only improvement in going to Ingo's new
> infrastructure.

I'm currently testing both schedulers, Ingo's and yours, on top of
2.6.0-test2-mm3. I'll need a little time to study some annoyances I've
seen. For example, I've seen XMMS skipping with patch-A3-O12.2int,
although I've managed to find a way to consistently reproduce the skips.
The Evolution 1.4.4 main window also feels a little "heavy", that is,
moving it around the screen sometimes makes the movement jumpy.

The scheduler seems to be getting into good health, and it's getting
smoother with each release. However, the smoothest scheduler I've
seen so far was O10.

I'll keep you all informed about my experiences with both schedulers.

2003-08-03 11:36:49

by Con Kolivas

Subject: Re: [PATCH] O12.2int for interactivity

Quoting Ingo Molnar <[email protected]>:

>
> On Sun, 3 Aug 2003, Con Kolivas wrote:
>
> > Reverted Ingo's EXPIRED_STARVING definition; it was making interactive
> > tasks expire faster as cpu load increased.
>
> hm, that bit was needed for fairness, otherwise 'interactive' tasks can
> monopolize the CPU. [ Have you tried lots of copies of thud.c (and
> Davide's irman2 with the most evil settings), do they still produce no
> starvation with this bit restored to the original logic? ]

Ooh, hang on, I've not tried lots of copies of thud.c at once. I can't try
for a little while though.

Con

2003-08-03 21:17:56

by Voluspa

Subject: Re: [PATCH] O12.2int for interactivity


On 2003-08-03 10:14:00 Con Kolivas wrote:

> Please note if you do test the interactivity it would be most valuable
> if you test Ingo's A3 patch first looking for improvements and
> problems compared to vanilla _first_, and then test my O12.2 patch on
> top of it. Hopefully there has been no regression and only
> improvement in going to Ingo's new infrastructure.

I can hardly spot any difference between 2.6.0-test2, A3 and A3-O12.2
while running the test as outlined in:

http://marc.theaimsgroup.com/?l=linux-kernel&m=105956126017752&w=2

I'll never do such a three hours again... All was equally well, which
probably just means that my environment is too lightweight during
"normal" operation. So I had to focus on the taxing game scenario.

XFree86 Version 4.3.99.9 compiled against my glibc 2.2.4, and the same
story with winex 3.1 running "Baldurs Gate I":

_2.6.0-test2_

Animations stutter cyclically like --- . --- . --- . (dots are pauses).
Ambient sound stutters in concert, with an almost oscillating quality.
The mouse pointer is impossible to place since a movement can mean a few
pixels or a full screen. Panning the play area is, of course, a real
pain. Playability = 0 (on a scale from 0 to 10)

_2.6.0-test2-A3_

All problems gone. Just slight bumps (very short graphic pauses) when
doing a long play area panning. Sound is not perceptibly disturbed by
any graphic bumps. Playability = 9.

_2.6.0-test2-A3-O12.2_

Regression compared to plain A3, both in sound and animations. Mouse and
panning seem less regressed. For example, a character can take four
steps, freeze for half a second, take another four steps, freeze etc.
Playability = 8.

I did the game-test twice each with A3 and A3-O12.2 (alternating
boots) and the results were consistent. Looking at a "top" terminal
while in the game, I couldn't spot any striking differences between the
numbers. Just one oddity - and now I'm uncertain of its validity - as
can be seen on the screencaps below. In A3 wine had a PRI of 25,
wineserver had 15. In A3-O12.2 the numbers were reversed. I can redo the
test if necessary.

( the niced process is foldingathome, always running on my system)

--A3--
20:15:01 up 23 min, 4 users, load average: 2.86, 3.09, 2.46
58 processes: 53 sleeping, 4 running, 0 zombie, 1 stopped
CPU states: 53.7% user 42.1% system 4.1% nice 0.0% iowait 0.0% idle
Mem: 127032k av, 123384k used, 3648k free, 0k shrd, 732k buff
86876k active, 28660k inactive
Swap: 489940k av, 0k used, 489940k free 49436k cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
209 loke 25 0 75316 46M 14284 R 56.6 37.2 10:40 0 wine
216 loke 15 0 2712 992 2388 S 34.4 0.7 5:42 0 wineserver
147 loke 34 19 33220 2388 24144 S N 4.3 1.8 2:02 0 FahCore_ca.ex
252 loke 16 0 1820 968 1664 R 1.7 0.7 0:01 0 top
172 root 16 0 27652 16M 13884 R 1.3 13.1 4:04 0 X
178 loke 15 0 5752 3428 5200 S 1.1 2.6 0:13 0 gkrellm
175 loke 15 0 8056 5936 4528 S 0.3 4.6 0:06 0 enlightenment
--

--A3-O12.2--
20:53:19 up 14 min, 4 users, load average: 3.57, 2.52, 1.27
58 processes: 52 sleeping, 5 running, 0 zombie, 1 stopped
CPU states: 60.3% user 36.5% system 3.1% nice 0.0% iowait 0.0% idle
Mem: 127032k av, 123672k used, 3360k free, 0k shrd, 640k buff
87996k active, 27860k inactive
Swap: 489940k av, 0k used, 489940k free 50664k cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
193 loke 15 0 66548 37M 14284 S 60.3 30.2 2:36 0 wine
200 loke 25 0 2704 984 2388 R 31.7 0.7 1:24 0 wineserver
150 loke 39 19 37248 10M 28172 S N 3.1 8.2 8:49 0 FahCore_ca.ex
220 loke 18 0 1820 968 1664 R 1.9 0.7 0:03 0 top
175 root 15 0 27492 16M 13872 S 1.3 13.0 0:31 0 X
181 loke 16 0 5752 3428 5200 S 0.9 2.6 0:03 0 gkrellm
178 loke 16 0 8232 6084 4520 S 0.3 4.7 0:03 0 enlightenment
--

Mvh
Mats Johannesson

2003-08-04 02:29:27

by Con Kolivas

Subject: Re: [PATCH] O12.2int for interactivity

On Mon, 4 Aug 2003 07:19, Voluspa wrote:
> I can hardly spot any difference between 2.6.0-test2, A3 and A3-O12.2
> while running the test as outlined in:

Thanks for testing. The difference should only be noticeable as X and other
interactive tasks remaining responsive under heavy load. Audio skips are much
less relevant these days.

> numbers. Just one oddity - and now I'm uncertain of its validity - as
> can be seen on the screencaps below. In A3 wine had a PRI of 25,
> wineserver had 15. In A3-O12.2 the numbers were reversed. I can redo the
> test if necessary.

No, this is exactly what I need to understand what went wrong. These hacks
basically work by changing the value of PRI. The wine/wineserver issue as
you've described it is an unusual one, and the test I put in at O11 to
determine by credits whether a task is interactive seems to select the wrong
one and keep it elevated.

Anyway, most of this discussion is now moot, as I've discussed with Ingo at
length a lot of the little things I've found that help, and I suspect he will
develop his own, better solutions for them.

Con

2003-08-04 03:01:09

by Con Kolivas

Subject: Re: [PATCH] O12.2int for interactivity

On Sun, 3 Aug 2003 21:25, Ingo Molnar wrote:
> On Sun, 3 Aug 2003, Con Kolivas wrote:
> > Reverted Ingo's EXPIRED_STARVING definition; it was making interactive
> > tasks expire faster as cpu load increased.
>
> hm, that bit was needed for fairness, otherwise 'interactive' tasks can
> monopolize the CPU. [ Have you tried lots of copies of thud.c (and
> Davide's irman2 with the most evil settings), do they still produce no
> starvation with this bit restored to the original logic? ]

I tried them aggressively; irman2 and thud don't hurt here. The idle detection
keeps both of them from gaining too much sleep_avg while waiting around, and
they don't get a better dynamic priority than 17.

Con

2003-08-04 19:48:54

by Charlie Baylis

Subject: Re: [PATCH] O12.2int for interactivity


> I tried them aggressively; irman2 and thud don't hurt here. The idle
> detection limits both of them from gaining too much sleep_avg while waiting
> around and they dont get better dynamic priority than 17.

Sounds like you've taken the teeth out of the thud program :) The original aim
was to demonstrate what happens when a maximally interactive task suddenly
becomes a CPU hog - similar to a web browser starting to render and causing
intense X activity in the process. Stopping thud from getting maximum priority
addresses the symptom, not the cause. (That's not to say the idle detection
is a bad idea - but it's not the complete answer.)

What happens if you change the line
struct timespec st={10,50000000};
to
struct timespec st={0,250000000};

and the line
nanosleep(&st, 0);
to
for (n=0; n<40; n++) nanosleep(&st, 0);

The idea is to do a little bit of work so that the idle detection doesn't kick
in and thud can reach the max interactive bonus. (I haven't tried your patch
yet to see whether this change achieves that.)

Charlie

2003-08-05 02:05:36

by Con Kolivas

Subject: Re: [PATCH] O12.2int for interactivity

On Tue, 5 Aug 2003 05:50, Charlie Baylis wrote:
> > I tried them aggressively; irman2 and thud don't hurt here. The idle
> > detection limits both of them from gaining too much sleep_avg while
> > waiting around and they dont get better dynamic priority than 17.
>
> Sounds like you've taken the teeth out of the thud program :) The original
> aim was to demonstrate what happens when a maximally interactive task
> suddenly becomes a CPU hog - similar to a web browser starting to render
> and causing intense X activity in the process. Stopping thud getting
> maximum priority is addressing the symptom, not the cause. (That's not to
> say the idle detection is a bad idea - but it's not the complete answer)

The idle detection was designed around helping real world scenarios; that it
helped this particular issue was a side effect.

> the idea is to do a little bit of work so that the idle detection doesn't
> kick in and thud can reach the max interactive bonus. (I haven't tried your
> patch yet to see if this change achieves this)

Good call; I was quite aware this is the most effective way to create a fork
bomb with my patch, but its effect, while noticeably worse than the original
thud, is still not disastrous. Yes, I do appreciate that variations on the
theme can be made worse again; I'm doing some testing and experimenting there
to see how best to tackle it.

Con

2003-08-05 22:37:47

by Timothy Miller

Subject: Re: [PATCH] O12.2int for interactivity



Charlie Baylis wrote:
>>I tried them aggressively; irman2 and thud don't hurt here. The idle
>>detection limits both of them from gaining too much sleep_avg while waiting
>>around and they dont get better dynamic priority than 17.
>
>
> Sounds like you've taken the teeth out of the thud program :) The original aim
> was to demonstrate what happens when a maximally interactive task suddenly
> becomes a CPU hog - similar to a web browser starting to render and causing
> intense X activity in the process. Stopping thud getting maximum priority is
> addressing the symptom, not the cause. (That's not to say the idle detection
> is a bad idea - but it's not the complete answer)
>
> What happens if you change the line
> struct timespec st={10,50000000};
> to
> struct timespec st={0,250000000};
>
> and the line
> nanosleep(&st, 0);
> to
> for (n=0; n<40; n++) nanosleep(&st, 0);
>
> the idea is to do a little bit of work so that the idle detection doesn't kick
> in and thud can reach the max interactive bonus. (I haven't tried your patch
> yet to see if this change achieves this)


MR.BARNARD: (shouting) What do you want?
MAN: Well I was told outside...
MRB: Don't give me that you snotty-faced heap of parrot droppings!
MAN: What!
MRB: Shut your festering gob you tit! Your type makes me puke! You
vacuous toffee-nosed malodorous pervert!!
MAN: Look! I came here for an argument.
MRB: (calmly) Oh! I'm sorry, this is abuse!


Or closer to the point:

"For each record player, there is a record which it cannot play."
For more detail, please read this dialog:
http://www.geocities.com/g0del_escher_bach/dialogue4.html



The interactivity detection algorithm will always be inherently
imperfect. Given that it is not psychic, it cannot tell in advance
whether or not a given process is supposed to be interactive, so it must
GUESS based on its behavior.

Furthermore, for any given scheduler algorithm, it is ALWAYS possible to
write a program which causes it to misbehave.

This "thud" program is Goedel's theorem for the interactivity scheduler
(well, that's not exactly right, but you get the idea). It breaks the
system. If you redesign the scheduler to make "thud" work, then someone
will write "thud2" (which is what you have just done!) which breaks the
scheduler. Ad infinitum. It will never end. And in this case,
optimizing for "thud" is likely also to make some other much more common
situations WORSE.


So, while it MAY be of value to write a few "thud" programs, don't let
it go too far. The scheduler should optimize for REAL loads -- things
that people actually DO. You will always be able to break it by
reverse-engineering it and writing a program which violates its
expectations. Don't worry about it. You will always be able to break
it if you try hard enough.


As Linus says, "Perfect" is the enemy of "Good".


2003-08-06 00:10:49

by charlie.baylis

Subject: Re: [PATCH] O12.2int for interactivity

On Tue, Aug 05, 2003 at 06:49:56PM -0400, Timothy Miller wrote:
> The interactivity detection algorithm will always be inherently
> imperfect. Given that it is not psychic, it cannot tell in advance
> whether or not a given process is supposed to be interactive, so it must
> GUESS based on its behavior.
>
> Furthermore, for any given scheduler algorithm, it is ALWAYS possible to
> write a program which causes it to misbehave.
>
> This "thud" program is Goedel's theorem for the interactivity scheduler
> (well, that's not exactly right, but you get the idea). It breaks the
> system. If you redesign the scheduler to make "thud" work, then someone
> will write "thud2" (which is what you have just done!) which breaks the
> scheduler. Ad infinitum. It will never end. And in this case,
> optimizing for "thud" is likely also to make some other much more common
> situations WORSE.

No, you've got this backwards. The thud program is a short piece of code
written to allow other kernel developers to reliably reproduce a specific
problem with the scheduler - that is, when even a small number of maximally
interactive tasks suddenly become CPU hogs, they are able to starve most other
processes for several seconds. This is an entirely real world case (see my
original posting to explain this), and it causes problems because, as you say,
the scheduler is not psychic.

I say you've got it backwards because thud is a much simpler piece of code than
Konqueror + XFree86, and it is simpler because it uses 'reverse-engineered'
knowledge about the scheduler to manipulate its dynamic priority to the point
where the scheduler problems I reported could be reproduced. As Con's changes
have broken these assumptions, thud needs updating so that it can continue to
behave equivalently on the newer schedulers. (The changes I gave will have no
effect on the old versions of the scheduler)

> So, while it MAY be of value to write a few "thud" programs, don't let
> it go too far. The scheduler should optimize for REAL loads -- things
> that people actually DO. You will always be able to break it by
> reverse-engineering it and writing a program which violates its
> expectations.

I think you're confusing the thud test with the thud implementation. The
thud implementation is a hacked up bit of C. The thud test is seeing a
maximally interactive task suddenly become a CPU hog. Once the implementation
no longer performs the test it ceases to be interesting[1], and it must be
updated, otherwise you aren't getting the test coverage you think you are
getting.

Certainly, thud isn't the only program by which a scheduler should be measured.
If you have a better, more realistic and reproducible test case, or even just a
different one, which can help evaluate the performance of the interactivity
estimator then I'd love to see it :) (Actually, if you/someone can write a
simple program which can replace xmms skips as the standard "scheduler is
good/bad" benchmark that would be great. It wants to do n milliseconds of work
every m milliseconds and report the minimum/maximum/average time it took to do
the sleep and the work. Or something like that)

> Don't worry about it. You will always be able to break it if you try hard
> enough.

The scheduler DOES have to cope with tasks which behave oddly, because it can
make bad decisions, and it has to be able to recover quickly or these corner
cases may be used to form a denial of service attack.

> As Linus says, "Perfect" is the enemy of "Good".

That doesn't preclude making things better. Otherwise, why ditch the "good" 2.4
scheduler :)

Charlie

[1] unless the new test it performs is interesting for some other reason.

2003-08-06 01:23:58

by Con Kolivas

Subject: Re: [PATCH] O12.2int for interactivity

Quoting Timothy Miller <[email protected]>:
>
> The interactivity detection algorithm will always be inherently
> imperfect. Given that it is not psychic, it cannot tell in advance
> whether or not a given process is supposed to be interactive, so it must
> GUESS based on its behavior.
>
> Furthermore, for any given scheduler algorithm, it is ALWAYS possible to
> write a program which causes it to misbehave.
>
> This "thud" program is Goedel's theorem for the interactivity scheduler
> (well, that's not exactly right, but you get the idea). It breaks the
> system. If you redesign the scheduler to make "thud" work, then someone
> will write "thud2" (which is what you have just done!) which breaks the
> scheduler. Ad infinitum. It will never end. And in this case,
> optimizing for "thud" is likely also to make some other much more common
> situations WORSE.
>
>
> So, while it MAY be of value to write a few "thud" programs, don't let
> it go too far. The scheduler should optimize for REAL loads -- things
> that people actually DO. You will always be able to break it by
> reverse-engineering it and writing a program which violates its
> expectations. Don't worry about it. You will always be able to break
> it if you try hard enough.

Thank you for your commentary, which I agree with. With respect to these
potential issues, I have always worked on fixes for the cases where I thought
real world applications might cause them, rather than trying to fix things for
just that one program. It was actually for the opposite reason that my patch
prevented thud from working: in the real world it is idle tasks that suddenly
become cpu hogs which are the potential starvers, and I made a useful fix for
that issue. Thud just happened to simulate those conditions, and I only tested
for it after I heard of thud. So, just a (hopefully reassuring) reminder: I'm
not making an xmms interactivity estimator, nor an X estimator, nor a "fix
this exploit" one, and so on.

Con

2003-08-06 22:11:45

by Timothy Miller

Subject: Re: [PATCH] O12.2int for interactivity



Con Kolivas wrote:
> Quoting Timothy Miller <[email protected]>:
>
> Thank you for your commentary which I agree with. With respect to these
> potential issues I have always worked on a fix for where I thought real world
> applications might cause these rather than try and fix it for just that program.
> It was actually the opposite reason that my patch prevented thud from working;
> it is idle tasks that become suddenly cpu hogs that in the real world are
> potential starvers, and I made a useful fix for that issue. Thud just happened
> to simulate those conditions and I only tested for it after I heard of thud. So
> just a (hopefully reassuring) reminder; I'm not making an xmms interactivity
> estimator, nor an X estimator, nor a "fix this exploit" one and so on.
>


I have always assumed that things like X and xmms were just examples of
the various sorts of things people would run when testing your scheduler.

But it was a mistaken assumption on my part that thud was an artificial
workload. The author of thud, I believe, explained to me how
thud is a simulation of a real workload, reverse-engineered from
real-world experience.

My apologies.


2003-08-11 13:44:27

by Rob Landley

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity

On Tuesday 05 August 2003 18:49, Timothy Miller wrote:

> Or closer to the point:
>
> "For each record player, there is a record which it cannot play."
> For more detail, please read this dialog:
> http://www.geocities.com/g0del_escher_bach/dialogue4.html
...
> The interactivity detection algorithm will always be inherently
> imperfect. Given that it is not psychic, it cannot tell in advance
> whether or not a given process is supposed to be interactive, so it must
> GUESS based on its behavior.

Another way of looking at it is that every time you remove a bottleneck, the
next most serious problem becomes the new bottleneck.

Does this mean it's a bad idea to stop trying to identify the next bottleneck?
(Whether or not you then choose to deal with it is another question...)

Rob

2003-08-11 23:35:32

by Timothy Miller

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity



Rob Landley wrote:
> On Tuesday 05 August 2003 18:49, Timothy Miller wrote:
>
>
>>Or closer to the point:
>>
>>"For each record player, there is a record which it cannot play."
>>For more detail, please read this dialog:
>>http://www.geocities.com/g0del_escher_bach/dialogue4.html
>
> ...
>
>>The interactivity detection algorithm will always be inherently
>>imperfect. Given that it is not psychic, it cannot tell in advance
>>whether or not a given process is supposed to be interactive, so it must
>>GUESS based on its behavior.
>
>
> Another way of looking at it is that every time you remove a bottleneck, the
> next most serious problem becomes the new bottleneck.
>
> Does this mean it's a bad idea to stop trying to identify the next bottleneck?
> (Whether or not you then choose to deal with it is another question...)


No. It just means that it's possible to produce artificial loads that
break things, and since those artificial loads won't be encountered in
typical usage, they should not be optimized for.


Mind you, we prefer that the worst case "record you cannot play" doesn't
have TOO much impact, because we don't want people writing DoS programs
which exploit those artificial cases.

2003-08-12 00:16:52

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity

Rob Landley wrote:
>> Another way of looking at it is that every time you remove a bottleneck,
>> the next most serious problem becomes the new bottleneck.
>> Does this mean it's a bad idea to stop trying to identify the next
>> bottleneck? (Whether or not you then choose to deal with it is another
>> question...)

On Mon, Aug 11, 2003 at 07:49:31PM -0400, Timothy Miller wrote:
> No. It just means that it's possible to produce artificial loads that
> break things, and since those artificial loads won't be encountered in
> typical usage, they should not be optimized for.
> Mind you, we prefer that the worst case "record you cannot play" doesn't
> have TOO much impact, because we don't want people writing DoS programs
> which exploit those artificial cases.

Guys, it's _way_ premature to say any of this. AFAICT _no_ alternatives
to the duelling queues with twiddled priorities have been explored yet,
nor has the maximum been squeezed out of twiddling the methods for
priority adjustment in that yet (which is Con Kolivas' area).


-- wli

2003-08-12 14:50:39

by Timothy Miller

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity



William Lee Irwin III wrote:

>
> Guys, it's _way_ premature to say any of this. AFAICT _no_ alternatives
> to the duelling queues with twiddled priorities have been explored yet,
> nor has the maximum been squeezed out of twiddling the methods for
> priority adjustment in that yet (which is Con Kolivas' area).
>


Ok... this reminds me that there is an aspect of all of this that I
don't understand. Please pardon my ignorance. And furthermore, if
there is some document which answers all of my questions, please direct
me to it so I don't waste your time.



I understand that the O(1) scheduler uses two queues. One is the active
queue, and the other is the expired queue. When a process has exhausted
its timeslice, it gets put into the expired queue (at the end, I
presume). If not, it gets put into the active queue.

Is this the vanilla scheduler?

One thing I don't understand is, for a given queue, how do priorities
affect running of processes? Two possibilities come to mind:

1) All pri 10 processes will be run before any pri 11 processes.
2) A pri 10 process will be run SLIGHTLY more often than a pri 11 process.

For the former, is the active queue scanned for runnable processes of
the highest priority? If that's the case, why not have one queue for
each priority level? Wouldn't that reduce the amount of scanning
through the queue?

What it comes down to that I want to know is if priorities affect
running of processes linearly or exponentially.

How do nice levels affect priorities? (Vanilla and interactive)

How do priorities affect processes in the expired queue?

In the vanilla scheduler, can a low enough nice value keep an expired
process off the expired queue? How is that determined?

Does the vanilla scheduler have priorities? If so, how are they determined?


Thanks.

2003-08-12 23:31:11

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity

On Tue, Aug 12, 2003 at 11:04:44AM -0400, Timothy Miller wrote:
> Ok... this reminds me that there is an aspect of all of this that I
> don't understand. Please pardon my ignorance. And furthermore, if
> there is some document which answers all of my questions, please direct
> me to it so I don't waste your time.

No trouble at all.


On Tue, Aug 12, 2003 at 11:04:44AM -0400, Timothy Miller wrote:
> I understand that the O(1) scheduler uses two queues. One is the active
> queue, and the other is the expired queue. When a process has exhausted
> its timeslice, it gets put into the expired queue (at the end, I
> presume). If not, it gets put into the active queue.
> Is this the vanilla scheduler?

The equivalent for the 2.4.x scheduler is the "epoch"; essentially what
2.6.x has implemented is incremental timeslice assignment for epoch
expiry. The way that goes is that when a timeslice runs out, it's
recomputed, and the expired queue is used as a placeholder for the
information about what to run and in what order after the epoch expires
(i.e. the active queue empties). When the epoch expires/active queue
empties, the two queues exchange places. The net effect is that the
duelling queues end up representing a circular list maintained in some
special order that can have insertions done into the middle of it for
tasks designated as "interactive", and that priority preemption's
effectiveness is unclear.

Obviously this threatens to degenerate to something RR-like that fails
to give the desired bonuses to short-running tasks. The way priorities
are enforced is dynamic timeslice assignment, where longer tasks
receive shorter timeslices and shorter tasks receive longer timeslices,
and they're readjusted at various times, which is a somewhat unusual
design decision (as is the epoch bit). It also deviates from RR where
interactive tasks can re-enter the active queue at timeslice expiry.
So in this way, the favoring of short-running tasks is recovered from
the otherwise atypical design, as the interactive tasks will often be
re-inserted near the head of the queue as their priorities are high
and their timeslices are retained while sleeping.


On Tue, Aug 12, 2003 at 11:04:44AM -0400, Timothy Miller wrote:
> One thing I don't understand is, for a given queue, how do priorities
> affect running of processes? Two possibilities come to mind:
> 1) All pri 10 processes will be run before any pri 11 processes.
> 2) A pri 10 process will be run SLIGHTLY more often than a pri 11 process.
> For the former, is the active queue scanned for runnable processes of
> the highest priority? If that's the case, why not have one queue for
> each priority level? Wouldn't that reduce the amount of scanning
> through the queue?

(1) is the case. Things differ a bit from what you might read about
because of the epoch business.

The active queue is scanned, but the queue data structure is organized
so that it's composed of a separate list for each priority, and a
bitmap is maintained alongside the array of lists for quicker searches
for the highest nonempty priority (numerically low), and so scanning is
an O(1) algorithm and touches very few cachelines (if more than one).


On Tue, Aug 12, 2003 at 11:04:44AM -0400, Timothy Miller wrote:
> What it comes down to that I want to know is if priorities affect
> running of processes linearly or exponentially.
> How do nice levels affect priorities? (Vanilla and interactive)
> How do priorities affect processes in the expired queue?
> In the vanilla scheduler, can a low enough nice value keep an expired
> process off the expired queue? How is that determined?
> Does the vanilla scheduler have priorities? If so, how are they determined?

Neither linearly nor exponentially; nice levels assign static
priorities; dynamic priorities (the ones used for scheduling decisions)
are restricted to the range where |dynamic_prio - static_prio| <= 5.
Nice values do not keep tasks off the expired queue. Tasks on the
expired queue are ordered by priority and not examined until the epoch
expiry. The 2.4.x scheduler had static priorities (nice numbers) and
recomputed what amounted to dynamic priorities (the "goodness" rating)
on the fly each time a task was examined. 2.4.x also recomputed
priorities via the infamous for_each_task() loop
for_each_task(p)
p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice)
in the if (unlikely(!c)) case of the repeat_schedule: label where 2.6.x
merely examines the priority when the task exhausts its timeslice to
recompute it (and so the expired queue is a useful part of the
mechanics of the algorithm: it's a placeholder for tasks whose
timeslices have been recomputed but don't actually deserve to be run).


-- wli

2003-08-13 15:32:26

by Timothy Miller

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity

Thank you for answering. Your explanations are very enlightening. You
should be quoted on KernelTrap.org. :)

I have more questions....


William Lee Irwin III wrote:
> On Tue, Aug 12, 2003 at 11:04:44AM -0400, Timothy Miller wrote:
>
>>Ok... this reminds me that there is an aspect of all of this that I
>>don't understand. Please pardon my ignorance. And furthermore, if
>>there is some document which answers all of my questions, please direct
>>me to it so I don't waste your time.
>
>
> No trouble at all.
>
>
> On Tue, Aug 12, 2003 at 11:04:44AM -0400, Timothy Miller wrote:
>
>>I understand that the O(1) scheduler uses two queues. One is the active
>>queue, and the other is the expired queue. When a process has exhausted
>>its timeslice, it gets put into the expired queue (at the end, I
>>presume). If not, it gets put into the active queue.
>>Is this the vanilla scheduler?
>
>
> The equivalent for the 2.4.x scheduler is the "epoch"; essentially what
> 2.6.x has implemented is incremental timeslice assignment for epoch
> expiry.

What do you mean by "incremental"? And by 2.6.x, are you talking about
vanilla, or interactive?

> The way that goes is that when a timeslice runs out, it's
> recomputed, and the expired queue is used as a placeholder for the
> information about what to run and in what order after the epoch expires
> (i.e. the active queue empties).

Where are blocked tasks placed? Are they put somewhere else and readded
to the active queue when their I/O (or whatever) completes? How is
their ordering in the queue determined? That is, does a task which
wakes up preempt the running task? Does it get run "next"?

In Operating Systems design class in college, we were told that when an
I/O interrupt arrives, the currently active task isn't immediately
returned to, because the overhead of the context switch and the fact
that we don't know how near to the end the task is to its timeslice
makes it not worth it, so we just move on to the next task.

> When the epoch expires/active queue
> empties, the two queues exchange places. The net effect is that the
> duelling queues end up representing a circular list maintained in some
> special order that can have insertions done into the middle of it for
> tasks designated as "interactive", and that priority preemption's
> effectiveness is unclear.

And by "middle", do you mean at the end of the active queue which is the
"middle" if you think of the expired queue as being a continuation of
the active queue?

> Obviously this threatens to degenerate to something RR-like that fails
> to give the desired bonuses to short-running tasks. The way priorities
> are enforced is dynamic timeslice assignment, where longer tasks
> receive shorter timeslices and shorter tasks receive longer timeslices,

How do you vary timeslice? Can you tell the interrupt hardware to
"interrupt me after X clock ticks"? Or do you service the interrupt and
just do very little in the ISR if you're not going to switch contexts?
Isn't that a bit of overhead?

> and they're readjusted at various times, which is a somewhat unusual
> design decision (as is the epoch bit). It also deviates from RR where
> interactive tasks can re-enter the active queue at timeslice expiry.
> So in this way, the favoring of short-running tasks is recovered from
> the otherwise atypical design, as the interactive tasks will often be
> re-inserted near the head of the queue as their priorities are high
> and their timeslices are retained while sleeping.
>
>
> On Tue, Aug 12, 2003 at 11:04:44AM -0400, Timothy Miller wrote:
>
>>One thing I don't understand is, for a given queue, how do priorities
>>affect running of processes? Two possibilities come to mind:
>>1) All pri 10 processes will be run before any pri 11 processes.
>>2) A pri 10 process will be run SLIGHTLY more often than a pri 11 process.
>>For the former, is the active queue scanned for runnable processes of
>>the highest priority? If that's the case, why not have one queue for
>>each priority level? Wouldn't that reduce the amount of scanning
>>through the queue?
>
>
> (1) is the case. Things differ a bit from what you might read about
> because of the epoch business.
>
> The active queue is scanned, but the queue data structure is organized
> so that it's composed of a separate list for each priority, and a
> bitmap is maintained alongside the array of lists for quicker searches
> for the highest nonempty priority (numerically low), and so scanning is
> an O(1) algorithm and touches very few cachelines (if more than one).

Cool. So, it's not so much a "queue" as it is a sort of an array of
lists, eh?

>
> On Tue, Aug 12, 2003 at 11:04:44AM -0400, Timothy Miller wrote:
>
>>What it comes down to that I want to know is if priorities affect
>>running of processes linearly or exponentially.
>>How do nice levels affect priorities? (Vanilla and interactive)
>>How do priorities affect processes in the expired queue?
>>In the vanilla scheduler, can a low enough nice value keep an expired
>>process off the expired queue? How is that determined?
>>Does the vanilla scheduler have priorities? If so, how are they determined?
>
>
> Neither linearly nor exponentially; nice levels assign static
> priorities; dynamic priorities (the ones used for scheduling decisions)
> are restricted to the range where |dynamic_prio - static_prio| <= 5.
> Nice values do not keep tasks off the expired queue. Tasks on the
> expired queue are ordered by priority and not examined until the epoch
> expiry.

Doesn't it cause starvation of expired tasks if interactive tasks keep
hogging the active queue? Does it matter?

> The 2.4.x scheduler had static priorities (nice numbers) and
> recomputed what amounted to dynamic priorities (the "goodness" rating)
> on the fly each time a task was examined. 2.4.x also recomputed
> priorities via the infamous for_each_task() loop
> for_each_task(p)
> p->count = (p->counter >> 1) + NICE_TO_TICKS(p->nice)

What does this do, exactly?

> in the if (unlikely(!c)) case of the repeat_schedule: label where 2.6.x
> merely examines the priority when the task exhausts its timeslice to
> recompute it (and so the expired queue is a useful part of the
> mechanics of the algorithm: it's a placeholder for tasks whose
> timeslices have been recomputed but don't actually deserve to be run).

Is this recomputation of timeslice based on the heuristics in the
interactive scheduler, or does vanilla also do something like this?
What does it do differently?

2003-08-14 06:09:12

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity

William Lee Irwin III wrote:
>> The equivalent for the 2.4.x scheduler is the "epoch"; essentially what
>> 2.6.x has implemented is incremental timeslice assignment for epoch
>> expiry.

On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote:
> What do you mean by "incremental"? And by 2.6.x, are you talking about
> vanilla, or interactive?

I had no idea you had in mind Con Kolivas' changes vs. 2.6.x as
"vanilla" vs. otherwise; I thought you meant 2.4.x vs. 2.6.x. In
general, Con's patches just change how dynamic priorities are computed.

The "incremental" business meant that tasks have their timeslices
for the next epoch as they are moved onto the expired queue. In the
2.4.x scheduler this was not incremental, so when the epoch expired,
every task on the system had its timeslice recomputed in one large
(and time-consuming) loop. The 2.4.x case was also moderately wasteful
as it recomputed priorities for tasks regardless of whether they were
runnable, which IMHO is a flaw for which corrections are 2.4.x-mergeable.


William Lee Irwin III wrote:
>> The way that goes is that when a timeslice runs out, it's
>> recomputed, and the expired queue is used as a placeholder for the
>> information about what to run and in what order after the epoch expires
>> (i.e. the active queue empties).

On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote:
> Where are blocked tasks placed? Are they put somewhere else and readded
> to the active queue when their I/O (or whatever) completes? How is
> their ordering in the queue determined? That is, does a task which
> wakes up preempt the running task? Does it get run "next"?
> In Operating Systems design class in college, we were told that when an
> I/O interrupt arrives, the currently active task isn't immediately
> returned to, because the overhead of the context switch and the fact
> that we don't know how near to the end the task is to its timeslice
> makes it not worth it, so we just move on to the next task.

Blocked tasks are taken off the runqueues entirely, and are found by
device drivers etc. by putting them on a waitqueue so whenever whatever
they're blocked on happens, they can be found and put on the runqueue.


William Lee Irwin III wrote:
>> When the epoch expires/active queue
>> empties, the two queues exchange places. The net effect is that the
>> duelling queues end up representing a circular list maintained in some
>> special order that can have insertions done into the middle of it for
>> tasks designated as "interactive", and that priority preemption's
>> effectiveness is unclear.

On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote:
> And by "middle", do you mean at the end of the active queue which is the
> "middle" if you think of the expired queue as being a continuation of
> the active queue?

The "middle" is some arbitrary position in this list that makes it
neither the last nor the first of the currently runnable tasks to be
run after the currently running task. The above analogy removes the
distinctions between active and expired queues, so it doesn't make
sense to ask about the boundary between the two.

To rehash, consider changing the representation of the scheduler's
internals from two arrays to a single doubly-linked list (obviously
it's not useful to implement this, but it is conceptually useful).
Movement from the active queue to the expired queue would then be
modeled by simply list_move_tail(&task->run_list, &run_list).
Recycling interactive tasks to the active list or inserting newly
woken tasks at particular priorities would involve hunting for a
special position in the list and doing list_add() to there. Timeslices
are entirely orthogonal; they control when list movement is done, but
not how it's done.

In this way, we can see that the priorities merely control how tasks
are jumbled together into the list because the above alternative
representation faithfully models the current arrangement. We're
essentially cycling through this circular list and have to struggle
to recover preferences for interactive tasks by fiddling with
timeslices and the like, as otherwise such a circular queue would be
completely fair, to the detriment of interactive tasks.

More common arrangements avoid this "tying of the knot" with the queue
swapping, and because of this, are capable of expressing long-term
preferences directly in terms of priorities, as opposed to having to
use timeslices to do so. Instead, timeslices are used to express the
"scale" on which scheduling events should happen, and as tasks become
more cpu-bound, they have longer timeslices, so that two cpu-bound
tasks of identical priority will RR very slowly and have reduced
context switch overhead, but are near infinitely preemptible by more
interactive or short-running tasks.


William Lee Irwin III wrote:
>> Obviously this threatens to degenerate to something RR-like that fails
>> to give the desired bonuses to short-running tasks. The way priorities
>> are enforced is dynamic timeslice assignment, where longer tasks
>> receive shorter timeslices and shorter tasks receive longer timeslices,

On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote:
> How do you vary timeslice? Can you tell the interrupt hardware to
> "interrupt me after X clock ticks"? Or do you service the interrupt and
> just do very little in the ISR if you're not going to switch contexts?
> Isn't that a bit of overhead?

task_timeslice() does a computation based on the dynamic priority to
assign timeslices. This computation is effectively linear.


William Lee Irwin III wrote:
>> The active queue is scanned, but the queue data structure is organized
>> so that it's composed of a separate list for each priority, and a
>> bitmap is maintained alongside the array of lists for quicker searches
>> for the highest nonempty priority (numerically low), and so scanning is
>> an O(1) algorithm and touches very few cachelines (if more than one).

On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote:
> Cool. So, it's not so much a "queue" as it is a sort of an array of
> lists, eh?

The data structure itself is a priority queue; the implementation (or
representation) of it used in the scheduler is an array of lists with
a bitmap on the side. Other representations such as relaxed heaps or
complete binary tree (CBT) heaps are possible, though probably not
particularly useful for such small priority spaces, which are faster
to search exhaustively with ffz() and the like than via such
structures.


William Lee Irwin III wrote:
>> Neither linearly nor exponentially; nice levels assign static
>> priorities; dynamic priorities (the ones used for scheduling decisions)
>> are restricted to the range where |dynamic_prio - static_prio| <= 5.
>> Nice values do not keep tasks off the expired queue. Tasks on the
>> expired queue are ordered by priority and not examined until the epoch
>> expiry.

On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote:
> Doesn't it cause starvation of expired tasks if interactive tasks keep
> hogging the active queue? Does it matter?

It may very well cause starvation of expired tasks. It's unclear whether
it matters. Instrumentation for these kinds of things is lacking, along
with any results indicating what on earth is going on.


William Lee Irwin III wrote:
>> The 2.4.x scheduler had static priorities (nice numbers) and
>> recomputed what amounted to dynamic priorities (the "goodness" rating)
>> on the fly each time a task was examined. 2.4.x also recomputed
>> priorities via the infamous for_each_task() loop
>> for_each_task(p)
>> p->count = (p->counter >> 1) + NICE_TO_TICKS(p->nice)

On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote:
> What does this do, exactly?

It loops through every task on the system assigning timeslices to every
task according to its nice number.


William Lee Irwin III wrote:
>> in the if (unlikely(!c)) case of the repeat_schedule: label where 2.6.x
>> merely examines the priority when the task exhausts its timeslice to
>> recompute it (and so the expired queue is a useful part of the
>> mechanics of the algorithm: it's a placeholder for tasks whose
>> timeslices have been recomputed but don't actually deserve to be run).

On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote:
> Is this recomputation of timeslice based on the heuristics in the
> interactive scheduler, or does vanilla also do something like this?
> What does it do differently?

The timeslices in 2.6.x are controlled indirectly by the dynamic
priority. But the dynamic priorities in 2.6.x are assigned according to
interactivity heuristics, in both vanilla and Kolivas' patches.


-- wli

2003-08-14 07:00:18

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity

On Thu, 14 Aug 2003 16:09, William Lee Irwin III wrote:
>> "scale" on which scheduling events should happen, and as tasks become
>> more cpu-bound, they have longer timeslices, so that two cpu-bound
>> tasks of identical priority will RR very slowly and have reduced
>> context switch overhead, but are near infinitely preemptible by more
>> interactive or short-running tasks.

On Thu, Aug 14, 2003 at 04:59:33PM +1000, Con Kolivas wrote:
> Actually the timeslice handed out is purely dependent on the static priority,
> not the priority it is elevated or demoted to by the interactivity estimator.
> However lower priority tasks (cpu bound ones if the estimator has worked
> correctly) will always be preempted by higher priority tasks (interactive
> ones) whenever they wake up.

So it is; the above commentary was rather meant to suggest that the
lengthening of timeslices in conventional arrangements did not penalize
interactive tasks, not to imply that priority preemption was not done
at all in the current scheduler.


-- wli

2003-08-14 06:53:41

by Con Kolivas

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity

On Thu, 14 Aug 2003 16:09, William Lee Irwin III wrote:
> William Lee Irwin III wrote:

> "scale" on which scheduling events should happen, and as tasks become
> more cpu-bound, they have longer timeslices, so that two cpu-bound
> tasks of identical priority will RR very slowly and have reduced
> context switch overhead, but are near infinitely preemptible by more
> interactive or short-running tasks.

Actually the timeslice handed out is purely dependent on the static priority,
not the priority it is elevated or demoted to by the interactivity estimator.
However lower priority tasks (cpu bound ones if the estimator has worked
correctly) will always be preempted by higher priority tasks (interactive
ones) whenever they wake up.

Con

2003-08-14 07:40:55

by Con Kolivas

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity

On Thu, 14 Aug 2003 17:01, William Lee Irwin III wrote:
> On Thu, 14 Aug 2003 16:09, William Lee Irwin III wrote:
> >> "scale" on which scheduling events should happen, and as tasks become
> >> more cpu-bound, they have longer timeslices, so that two cpu-bound
> >> tasks of identical priority will RR very slowly and have reduced
> >> context switch overhead, but are near infinitely preemptible by more
> >> interactive or short-running tasks.
>
> On Thu, Aug 14, 2003 at 04:59:33PM +1000, Con Kolivas wrote:
> > Actually the timeslice handed out is purely dependent on the static
> > priority, not the priority it is elevated or demoted to by the
> > interactivity estimator. However lower priority tasks (cpu bound ones if
> > the estimator has worked correctly) will always be preempted by higher
> > priority tasks (interactive ones) whenever they wake up.
>
> So it is; the above commentary was rather meant to suggest that the
> lengthening of timeslices in conventional arrangements did not penalize
> interactive tasks, not to imply that priority preemption was not done
> at all in the current scheduler.

While you're on the subject, can I just clarify some details about the arrays.
If a task doesn't use up its full timeslice and goes back to sleep, it
starts gaining bonus points which elevate its interactivity in the eyes of
the scheduler. When it wakes up again, it will always go onto the active
array. While running, its bonus points are being burnt off on a time basis.
If it then uses up its timeslice, depending on whether it is above or below
a cutoff (the delta) based on the level of interactivity of the task, the
scheduler will decide to put it on the end of the active array again, or
expire it. Thus tasks that never sleep are always below the interactive delta,
so each time they use up their timeslice they go onto the expired array.
Tasks with enough bonus points can go back onto the active array if they
haven't used up those bonus points.

As wli said, most of my tweaks just change what I do with the data given to me
by the scheduler, looking for patterns in the way processes run to choose
what bonus to give each process. The rest is the same as the vanilla tree.

Lots more to say but that should make it clear enough for understanding.

Con

2003-08-14 19:45:56

by Timothy Miller

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity



William Lee Irwin III wrote:
> On Thu, 14 Aug 2003 16:09, William Lee Irwin III wrote:
>
>>>"scale" on which scheduling events should happen, and as tasks become
>>>more cpu-bound, they have longer timeslices, so that two cpu-bound
>>>tasks of identical priority will RR very slowly and have reduced
>>>context switch overhead, but are near infinitely preemptible by more
>>>interactive or short-running tasks.
>>
>
> On Thu, Aug 14, 2003 at 04:59:33PM +1000, Con Kolivas wrote:
>
>>Actually the timeslice handed out is purely dependent on the static priority,
>>not the priority it is elevated or demoted to by the interactivity estimator.
>>However lower priority tasks (cpu bound ones if the estimator has worked
>>correctly) will always be preempted by higher priority tasks (interactive
>>ones) whenever they wake up.
>
>
> So it is; the above commentary was rather meant to suggest that the
> lengthening of timeslices in conventional arrangements did not penalize
> interactive tasks, not to imply that priority preemption was not done
> at all in the current scheduler.


If my guess from my previous email was correct (that is, pri 5 gets a
shorter timeslice than pri 6), then that means that tasks of higher
static priority are penalized more than lower-pri tasks for expiring.

Say a task has to run for 15ms. If it's at a priority that gives it a
10ms timeslice, then it'll expire and get demoted. If it's at a
priority that gives it a 20ms timeslice, then it'll not expire and
therefore get promoted.

Is that fair?


2003-08-14 19:48:35

by Timothy Miller

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity



Con Kolivas wrote:
> Thus tasks that never sleep are always below the interactive delta
> so each time they use up their timeslice they go onto the expired array.
> Tasks with enough bonus points can go back onto the active array if they
> haven't used up those bonus points.

How does a bonus point translate to a priority level? How many bonus
points can you collect?

2003-08-14 19:40:17

by Timothy Miller

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity

Thanks again for your explanations. I hope my questions don't
eventually get on your nerves. :)

William Lee Irwin III wrote:

> The "incremental" business meant that tasks have their timeslices
> for the next epoch as they are moved onto the expired queue. In the
> 2.4.x scheduler this was not incremental, so when the epoch expired,
> every task on the system had its timeslice recomputed in one large
> (and time-consuming) loop. The 2.4.x case was also moderately wasteful
> as it recomputed priorities for tasks regardless of whether they were
> runnable, which IMHO is a flaw for which corrections are 2.4.x-mergeable.

Ok, so aside from the waste, the difference is that 2.4 and 2.6 do
similar amounts of work to compute timeslices, but 2.4 did it all at
once, while 2.6 does it one task at a time?


>
> William Lee Irwin III wrote:

>
> Blocked tasks are taken off the runqueues entirely, and are found by
> device drivers etc. by putting them on a waitqueue so whenever whatever
> they're blocked on happens, they can be found and put on the runqueue.

Ah... so, say a driver has a bug where it forgets to wake up a task. Is
this what people are talking about when they complain about tasks being
stuck in the "D" state? What does the "D" stand for?


>
> William Lee Irwin III wrote:
>
>>> When the epoch expires/active queue
>>>empties, the two queues exchange places. The net effect is that the
>>>duelling queues end up representing a circular list maintained in some
>>>special order that can have insertions done into the middle of it for
>>>tasks designated as "interactive", and that priority preemption's
>>>effectiveness is unclear.
>>
>
> On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote:
>
>>And by "middle", do you mean at the end of the active queue which is the
>>"middle" if you think of the expired queue as being a continuation of
>>the active queue?
>
>
> The "middle" is some arbitrary position in this list that makes it
> neither the last nor the first of the currently runnable tasks to be
> run after the currently running task. The above analogy removes the
> distinctions between active and expired queues, so it doesn't make
> sense to ask about the boundary between the two.


Are you saying that the boundary CAN be conceptually removed, or are you
saying that by attempting to do that, I am demonstrating a
misunderstanding of the point in having the two separate queues?

Ok, so beyond the prioritization based on dynamic priority, it seems
that you have even finer-grained priorities which affect where in the
queue for a given priority a task is placed once it expires.


You talk about priorities being recomputed when a task expires. Does
vanilla do that just on timeslice expiration, while intsched does it
also when a task blocks on I/O?


How many dynamic priority levels are there?


>
> To rehash, consider changing the representation of the scheduler's
> internals from two arrays to a single doubly-linked list (obviously
> it's not useful to implement this, but it is conceptually useful).
> Movement from the active queue to the expired queue would then be
> modeled by simply list_move_tail(&task->run_list, &run_list).
> Recycling interactive tasks to the active list or inserting newly
> woken tasks at particular priorities would involve hunting for a
> special position in the list and doing list_add() to there. Timeslices
> are entirely orthogonal; they control when list movement is done, but
> not how it's done.
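wli's conceptual single-list model can be sketched as a toy. The list primitives below imitate the kernel's include/linux/list.h just closely enough to show the two moves he names; this is an illustration of the model, not scheduler code:

```c
#include <assert.h>

/* Minimal circular doubly-linked list, imitating the kernel's list.h. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static void __list_add(struct list_head *n, struct list_head *prev,
		       struct list_head *next)
{
	next->prev = n; n->next = next; n->prev = prev; prev->next = n;
}

static void list_del_entry(struct list_head *e)
{
	e->prev->next = e->next; e->next->prev = e->prev;
}

/* "expiry": move the task to the tail of the single conceptual list,
 * so it runs again only after everything currently ahead of it */
static void list_move_tail(struct list_head *e, struct list_head *head)
{
	list_del_entry(e);
	__list_add(e, head->prev, head);
}

/* "interactive reinsertion": put the task back in at some chosen
 * position in the middle of the list */
static void list_add(struct list_head *e, struct list_head *pos)
{
	__list_add(e, pos, pos->next);
}
```

With three tasks a, b, c queued in that order, list_move_tail(&a, &run_list) leaves the running order b, c, a -- exactly the active-to-expired move -- while list_add() at a computed position is the interactive recycle. Timeslices decide only *when* one of these moves happens.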

Ok, so timeslice length for a task is yet another level of
prioritization then? It seems we have a list of different things that
affect when and how tasks are run that may be related to "priority" in a
multidimensional way.

For active tasks:

- timeslice length (how long it's run)
- dynamic priority (when it's run relative to other priorities)
- position in dynamic priority list (when it's run relative to others at
same priority)

For expired tasks:

- being on the expired queue
- position in the expired queue
- something else I don't know about


This hasn't been entirely clear to me, but is it correct that intsched
affects more of these things than vanilla does (ie. maybe vanilla
doesn't put active tasks in the middle of a queue?) and does so by
paying attention to more about what the task does (ie. vanilla only
recomputes dynamic priority based on expiration and doesn't put expired
tasks into the active array?)?


>
> In this way, we can see that the priorities merely control how tasks
> are jumbled together into the list because the above alternative
> representation faithfully models the current arrangement. We're
> essentially cycling through this circular list and have to struggle
> to recover preferences for interactive tasks by fiddling with
> timeslices and the like, as otherwise such a circular queue would be
> completely fair, to the detriment of interactive tasks.

Hmmm... I'm confused. I was under the impression that the interactive
scheduler affected both timeslice length AND ordering. It looks like
you're saying O(1)int doesn't affect ordering.

> More common arrangements avoid this "tying of the knot" with the queue
> swapping, and because of this, are capable of expressing long-term
> preferences directly in terms of priorities, as opposed to having to
> use timeslices to do so.

What do you mean by "tying the knot"?

"More common" as in, things other than the way Linux does it?

Are you saying that as a result of the architecture of the O(1)
scheduler, O(1)int has to fiddle with certain things (perhaps less
optimal things?) than other architectures do, perhaps because they are
not O(1)?

My understanding is that O(1)int adjusts dynamic priority based on task
behavior. It looks like you're saying that it doesn't.

> Instead, timeslices are used to express the
> "scale" on which scheduling events should happen, and as tasks become
> more cpu-bound, they have longer timeslices, so that two cpu-bound
> tasks of identical priority will RR very slowly and have reduced
> context switch overhead, but are near infinitely preemptible by more
> interactive or short-running tasks.

It makes sense to me to affect both ordering and timeslice length. What
heuristics affect each of those parameters?

Can you preempt a task in the middle, or do you have to wait until it's
expired?

>
> William Lee Irwin III wrote:
>
>>>Obviously this threatens to degenerate to something RR-like that fails
>>>to give the desired bonuses to short-running tasks. The way priorities
>>>are enforced is dynamic timeslice assignment, where longer tasks
>>>receive shorter timeslices and shorter tasks receive longer timeslices,
>>
>
> On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote:
>
>>How do you vary timeslice? Can you tell the interrupt hardware to
>>"interrupt me after X clock ticks"? Or do you service the interrupt and
>>just do very little in the ISR if you're not going to switch contexts?
>>Isn't that a bit of overhead?
>
>
> task_timeslice() does a computation based on the dynamic priority to
> assign timeslices. This computation is effectively linear.
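For concreteness, here is a sketch of that linear computation using the constants I believe the 2.6-test scheduler used at the time (MIN_TIMESLICE 10ms, MAX_TIMESLICE 200ms, 40 user priority levels); treat the exact numbers as assumptions, and note the real code works in jiffies:

```c
#include <assert.h>

/* Sketch of task_timeslice(): a linear map from static priority to
 * timeslice length. Constants are my reading of the 2.6-test sources,
 * not gospel. */
#define MIN_TIMESLICE_MS 10
#define MAX_TIMESLICE_MS 200
#define MAX_RT_PRIO      100
#define MAX_PRIO         140
#define MAX_USER_PRIO    (MAX_PRIO - MAX_RT_PRIO)	/* 40 nice levels */

/* nice -20..19 maps onto static_prio 100..139 */
static int nice_to_static_prio(int nice)
{
	return MAX_RT_PRIO + 20 + nice;
}

/* Higher static priority (numerically lower) gets a longer slice;
 * dynamic priority never enters into it. */
static int task_timeslice_ms(int static_prio)
{
	return MIN_TIMESLICE_MS +
	       (MAX_TIMESLICE_MS - MIN_TIMESLICE_MS) *
	       (MAX_PRIO - 1 - static_prio) / (MAX_USER_PRIO - 1);
}
```

With these numbers, nice 0 gets roughly 100ms, nice -20 gets 200ms, nice 19 gets 10ms, and adjacent nice levels differ by about 5ms -- matching the figure Con gives later in the thread.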

I was asking more of a hardware question. If your timeslice granularity
is 100Hz, does that mean you get an interrupt from a timer at 100Hz?
What about tasks which have their timeslice adjusted to be 30ms instead
of the minimum 10ms? One of these must be the case:

1) The interrupt occurs at 100Hz, but for two of them, the ISR
immediately returns to the task. ISR runtime is minimized to minimize
overhead of the ignored interrupts.

2) The hardware can be told to only interrupt after N number of timer ticks.


>
> William Lee Irwin III wrote:
>
>>>The active queue is scanned, but the queue data structure is organized
>>>so that it's composed of a separate list for each priority, and a
>>>bitmap is maintained alongside the array of lists for quicker searches
>>>for the highest nonempty priority (numerically low), and so scanning is
>>>an O(1) algorithm and touches very few cachelines (if more than one).
>>
>
> On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote:
>
>>Cool. So, it's not so much a "queue" as it is a sort of an array of
>>lists, eh?
>
>
> The data structure itself is a priority queue; the implementation (or
> representation) of it used in the scheduler is an array of lists with
> a bitmap on the side. Other representations such as relaxed heaps or
> complete binary tree (CBT) heaps are possible, though probably not
> particularly useful for such small priority spaces, which are faster
> to exhaustively search with ffz() and the like than to use such
> structures to search.
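A toy version of that representation, shrunk to 64 priority levels so one machine word can hold the bitmap (the real scheduler has 140 levels and searches a multi-word map with sched_find_first_bit()):

```c
#include <assert.h>

#define TOY_PRIOS 64	/* the real array has 140 levels; 64 fits one word */

/* An array of per-priority queues with a bitmap on the side: bit p is
 * set exactly when queue p is non-empty, so finding the best (lowest
 * numbered) non-empty priority is a single find-first-bit operation. */
struct toy_rq {
	unsigned long long bitmap;
	int queued[TOY_PRIOS];	/* stand-in for the per-priority task lists */
};

static void toy_enqueue(struct toy_rq *rq, int prio)
{
	rq->queued[prio]++;
	rq->bitmap |= 1ULL << prio;
}

static void toy_dequeue(struct toy_rq *rq, int prio)
{
	if (--rq->queued[prio] == 0)
		rq->bitmap &= ~(1ULL << prio);
}

/* O(1) pick: lowest set bit = numerically lowest = best priority */
static int toy_best_prio(const struct toy_rq *rq)
{
	if (!rq->bitmap)
		return -1;	/* nothing runnable */
	return __builtin_ctzll(rq->bitmap);
}
```

Whatever the queue lengths, picking the next priority to run costs the same, which is where the O(1) in the name comes from.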

Well, as I interpret what I read, it's more of a priority queue of
priority queues, that is, if you are able to affect the ordering
_within_ a given priority level.



>>>The 2.4.x scheduler had static priorities (nice numbers) and
>>>recomputed what amounted to dynamic priorities (the "goodness" rating)
>>>on the fly each time a task was examined. 2.4.x also recomputed
>>>priorities via the infamous for_each_task() loop
>>> for_each_task(p)
>>> p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
>>
>
> On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote:
>
>>What does this do, exactly?
>
>
> It loops through every task on the system assigning timeslices to every
> task according to its nice number.
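Concretely, at HZ=100 NICE_TO_TICKS(nice) works out to ((20 - nice) >> 2) + 1 ticks (this expansion is my reading of the 2.4 headers and scales differently at other HZ values), so the per-task recalculation behaves like:

```c
#include <assert.h>

/* My expansion of 2.4's NICE_TO_TICKS at HZ=100, where
 * TICK_SCALE(x) == (x) >> 2; an assumption for illustration. */
#define NICE_TO_TICKS(nice)	(((20 - (nice)) >> 2) + 1)

/* The recalculation each task gets in the for_each_task() loop:
 * leftover ticks are halved (a bounded carry-over rewarding tasks that
 * slept through the epoch) and the nice-based allotment is added. */
static int recalc_counter(int old_counter, int nice)
{
	return (old_counter >> 1) + NICE_TO_TICKS(nice);
}
```

So a nice-0 cpu hog restarts each epoch at 6 ticks (60ms at HZ=100), while a nice-0 task that sleeps through epochs converges toward roughly double that -- the bounded "accumulation of unused ticks" Timothy guesses at below.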

So, a priority is measured in ticks, and a task has an accumulation of
unused ticks, and that's divided by two and added to the nice boost and
that's how many it gets next?

I'm just guessing from that code. :)


>
> William Lee Irwin III wrote:
>
>>>in the if (unlikely(!c)) case of the repeat_schedule: label where 2.6.x
>>>merely examines the priority when the task exhausts its timeslice to
>>>recompute it (and so the expired queue is a useful part of the
>>>mechanics of the algorithm: it's a placeholder for tasks whose
>>>timeslices have been recomputed but don't actually deserve to be run).
>>
>
> On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote:
>
>>Is this recomputation of timeslice based on the heuristics in the
>>interactive scheduler, or does vanilla also do something like this?
>>What does it do differently?
>
>
> The timeslices in 2.6.x are controlled indirectly by the dynamic
> priority. But the dynamic priorities in 2.6.x are assigned according to
> interactivity heuristics, in both vanilla and Kolivas' patches.

Indirectly... but what controls them directly?



Thanks.

2003-08-14 19:43:00

by Timothy Miller

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity



Con Kolivas wrote:
> On Thu, 14 Aug 2003 16:09, William Lee Irwin III wrote:
>
>>William Lee Irwin III wrote:
>
>
>>"scale" on which scheduling events should happen, and as tasks become
>>more cpu-bound, they have longer timeslices, so that two cpu-bound
>>tasks of identical priority will RR very slowly and have reduced
>>context switch overhead, but are near infinitely preemptible by more
>>interactive or short-running tasks.
>
>
> Actually the timeslice handed out is purely dependent on the static priority,
> not the priority it is elevated or demoted to by the interactivity estimator.
> However lower priority tasks (cpu bound ones if the estimator has worked
> correctly) will always be preempted by higher priority tasks (interactive
> ones) whenever they wake up.


Ok, so tasks at priority, say, 5 are all run before any tasks at
priority 6, but when a priority 6 task runs, it gets a longer timeslice?

How much longer?


2003-08-15 16:38:56

by Con Kolivas

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity

On Fri, 15 Aug 2003 06:00, Timothy Miller wrote:
> If my guess from my previous email was correct (that is, pri 5 gets
> a shorter timeslice than pri 6), then that means that tasks of higher
> static priority are penalized more than lower pri tasks for expiring.
>
> Say a task has to run for 15ms. If it's at a priority that gives it a
> 10ms timeslice, then it'll expire and get demoted. If it's at a
> priority that gives it a 20ms timeslice, then it'll not expire and
> therefore get promoted.
>
> Is that fair?

Yes, it's a simple cutoff at the end of the timeslice. If you use up the
timeslice allocated to you, then you have to pass a test to see if you can go
onto the active array or get expired. Since higher static priority (lower
nice) tasks get longer timeslices, they are less likely to expire unless they
are purely cpu bound and never sleep.
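The cutoff Con describes can be sketched as a toy. In the real kernel this is the TASK_INTERACTIVE() test in scheduler_tick(), with an extra escape hatch that expires even interactive tasks when the expired array is starving; the fields below are simplifications of mine, not kernel names:

```c
#include <assert.h>

/* Simplified stand-in for a task at the end of its timeslice. */
struct toy_task {
	int bonus;	/* priority bonus earned by sleeping */
	int delta;	/* interactivity cutoff this task must beat */
};

/* Returns 1 if the task goes onto the expired array, 0 if it may be
 * requeued on the active array. Even an interactive task is expired
 * when tasks waiting on the expired array are starving. */
static int goes_to_expired(const struct toy_task *p, int expired_starving)
{
	return !(p->bonus > p->delta) || expired_starving;
}
```

A pure cpu hog never sleeps, never earns bonus, and so always fails the test and expires; a task with enough banked bonus goes straight back onto the active array.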

Con

2003-08-15 16:41:51

by Con Kolivas

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity

On Fri, 15 Aug 2003 06:03, Timothy Miller wrote:
> Con Kolivas wrote:
> > Thus tasks that never sleep are always below the interactive delta
> > so each time they use up their timeslice they go onto the expired array.
> > Tasks with enough bonus points can go back onto the active array if they
> > haven't used up those bonus points.
>
> How does a bonus point translate to a priority level? How many bonus
> points can you collect?

That depends entirely on the algorithm used, and that's where my patches
differ from the main kernel tree. In the main kernel tree, you need to
accumulate about one second's worth of sleep before being elevated one priority
(woefully long), and use up one second before dropping. Mine is non-linear so
it's not a simple relationship.
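As a rough sketch of the vanilla relationship Con mentions -- the constants here are my recollection of the era (a MAX_BONUS of 10 and roughly ten seconds of trackable sleep, hence about one second per priority level) and should be treated as assumptions:

```c
#include <assert.h>

/* Sketch of the vanilla sleep_avg -> priority mapping. Constants are
 * assumptions chosen to match "about one second per level", not exact
 * kernel values; Con's patches replace this with a non-linear rule. */
#define MAX_BONUS	  10
#define MAX_SLEEP_AVG_MS  10000

static int current_bonus(int sleep_avg_ms)
{
	return sleep_avg_ms * MAX_BONUS / MAX_SLEEP_AVG_MS;
}

/* Dynamic priority swings up to MAX_BONUS/2 levels either side of the
 * static (nice-derived) priority; lower numbers are better. */
static int effective_prio(int static_prio, int sleep_avg_ms)
{
	return static_prio - (current_bonus(sleep_avg_ms) - MAX_BONUS / 2);
}
```

So a task with no banked sleep runs five levels below its static priority, and a fully rested one runs five levels above it.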

Con

2003-08-15 16:31:41

by Con Kolivas

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity

On Fri, 15 Aug 2003 05:57, Timothy Miller wrote:
> > Actually the timeslice handed out is purely dependent on the static
> > priority, not the priority it is elevated or demoted to by the
> > interactivity estimator. However lower priority tasks (cpu bound ones if
> > the estimator has worked correctly) will always be preempted by higher
> > priority tasks (interactive ones) whenever they wake up.
>
> Ok, so tasks at priority, say, 5 are all run before any tasks at
> priority 6, but when a priority 6 task runs, it gets a longer timeslice?

All "nice" 0 tasks get the same size timeslice. If their dynamic priority is
different (the PRI column in top) they still get the same timeslice.

> How much longer?

Different nice levels are about 5ms apart in size.

Con

2003-08-15 17:57:27

by Timothy Miller

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity



Con Kolivas wrote:
> On Fri, 15 Aug 2003 06:00, Timothy Miller wrote:
>
>>If my guess from my previous email was correct (that is, pri 5 gets
>>a shorter timeslice than pri 6), then that means that tasks of higher
>>static priority are penalized more than lower pri tasks for expiring.
>>
>>Say a task has to run for 15ms. If it's at a priority that gives it a
>>10ms timeslice, then it'll expire and get demoted. If it's at a
>>priority that gives it a 20ms timeslice, then it'll not expire and
>>therefore get promoted.
>>
>>Is that fair?
>
>
> Yes, it's a simple cutoff at the end of the timeslice. If you use up the
> timeslice allocated to you, then you have to pass a test to see if you can go
> onto the active array or get expired. Since higher static priority (lower
> nice) tasks get longer timeslices, they are less likely to expire unless they
> are purely cpu bound and never sleep.


Ok, I'm just a little confused, because of this inversion of "high
priority" with "low numbers".

First, am I correct in understanding that a lower number means a higher
priority?

And for a higher priority, in addition to being run before all tasks of
lower priority, they also get a longer timeslice?


2003-08-15 18:02:25

by Timothy Miller

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity



Con Kolivas wrote:

>On Fri, 15 Aug 2003 05:57, Timothy Miller wrote:
>
>
>>>Actually the timeslice handed out is purely dependent on the static
>>>priority, not the priority it is elevated or demoted to by the
>>>interactivity estimator. However lower priority tasks (cpu bound ones if
>>>the estimator has worked correctly) will always be preempted by higher
>>>priority tasks (interactive ones) whenever they wake up.
>>>
>>>
>>Ok, so tasks at priority, say, 5 are all run before any tasks at
>>priority 6, but when a priority 6 task runs, it gets a longer timeslice?
>>
>>
>
>All "nice" 0 tasks get the same size timeslice. If their dynamic priority is
>different (the PRI column in top) they still get the same timeslice.
>
>

Why isn't dynamic priority just an extension of static priority? Why do
you modify only the ordering while leaving the timeslice alone?


So, tell me if I infer this correctly: If you have a nice 5 and a nice
7, but the nice 5 is a cpu hog, while the nice 7 is interactive, then
the interactivity scheduler can modify their dynamic priorities so that
the nice 7 is being run before the nice 5. However, despite that, the
nice 7 still gets a shorter timeslice than the nice 5.

Have you tried altering this?


2003-08-16 02:22:48

by Con Kolivas

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity

On Sat, 16 Aug 2003 04:17, Timothy Miller wrote:
> >All "nice" 0 tasks get the same size timeslice. If their dynamic priority
> > is different (the PRI column in top) they still get the same timeslice.
>
> Why isn't dynamic priority just an extension of static priority? Why do
> you modify only the ordering while leaving the timeslice alone?

Because master engineer Molnar has determined that's the correct way.

> So, tell me if I infer this correctly: If you have a nice 5 and a nice
> 7, but the nice 5 is a cpu hog, while the nice 7 is interactive, then
> the interactivity scheduler can modify their dynamic priorities so that
> the nice 7 is being run before the nice 5. However, despite that, the
> nice 7 still gets a shorter timeslice than the nice 5.
>
> Have you tried altering this?

Yes; it wasn't good. Fluctuating timeslices all over the place make for more
bounce in the algorithm, and the big problem is that the cpu intensive
applications get demoted to smaller timeslices, and they are the tasks that
benefit the most from larger timeslices (for effective cpu cache usage).

Con

2003-08-17 02:18:15

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity

On Fri, Aug 15, 2003 at 02:12:32PM -0400, Timothy Miller wrote:
> Ok, I'm just a little confused, because of this inversion of "high
> priority" with "low numbers".
> First, am I correct in understanding that a lower number means a higher
> priority?
> And for a higher priority, in addition to being run before all tasks of
> lower priority, they also get a longer timeslice?

Yes on both counts.


-- wli

2003-08-17 18:00:48

by Mike Fedyk

[permalink] [raw]
Subject: Re: [PATCH] O12.2int for interactivity

On Fri, Aug 15, 2003 at 02:12:32PM -0400, Timothy Miller wrote:
> And for a higher priority, in addition to being run before all tasks of
> lower priority, they also get a longer timeslice?

With priority there are two values. One that can be set from userspace (the
nice value), and one that is purely kept in the kernel (the priority value).

The interactivity estimating changes the priority value based on heuristics.

The nice value (by default zero) determines the range of priorities the
kernel will give to a task (based on its "interactivity rating"), and it is
this relatively static nice value that also determines the timeslice size.
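The two values Mike describes live side by side on the 2.6 task_struct; roughly (the field names and the nice-to-priority macros follow the kernel as I recall it, the comments are mine):

```c
#include <assert.h>

#define MAX_RT_PRIO 100
/* nice -20..19 lands in the 100..139 band reserved for user tasks */
#define NICE_TO_PRIO(nice)	(MAX_RT_PRIO + (nice) + 20)
#define PRIO_TO_NICE(prio)	((prio) - MAX_RT_PRIO - 20)

/* Excerpt of the two priority fields (simplified). */
struct task_prio_excerpt {
	int static_prio;	/* from nice: set from userspace, and the
				   only input to the timeslice length */
	int prio;		/* dynamic: static_prio plus/minus the
				   interactivity bonus; decides run order
				   and shows as the PRI column in top */
};
```

That split is the whole thread in miniature: static_prio picks how long you run, prio picks when.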