2003-08-19 02:54:42

by Eric St-Laurent

[permalink] [raw]
Subject: scheduler interactivity: timeslice calculation seem wrong

currently, nicer tasks (nice value toward -20) get larger timeslices,
and less nice tasks (nice value toward 19) get small timeslices.

this is contrary to all process scheduling theory i've read, and also
contrary to my intuition.

maybe it was done this way for fairness reasons, but that's another
story...

high priority (interactive) tasks should get small timeslices for best
interactive feeling, and low priority (cpu hog) tasks should get large
timeslices for best efficiency, anyway they can be preempted by higher
priority tasks if needed.

also, i think dynamic priority should be used for timeslice calculation
instead of static priority. the reason is, if a low priority task get a
priority boost (to prevent starvation, for example) it should use the
small timeslice corresponding to it's new priority level, instead of
using it's original large timeslice that can ruin the interactive feel.

something like this: (Sorry, not tested...)

PS: i've attached a small program to calculate and print the timeslices
length from code ripped from linux-2.6.0-test3



--- sched-orig.c Sat Aug 09 00:39:34 2003
+++ sched.c Mon Aug 18 10:52:22 2003
@@ -126,8 +126,8 @@
* task_timeslice() is the interface that is used by the scheduler.
*/

-#define BASE_TIMESLICE(p) (MIN_TIMESLICE + \
- ((MAX_TIMESLICE - MIN_TIMESLICE)
*(MAX_PRIO-1-(p)->static_prio)/(MAX_USER_PRIO - 1)))
+#define BASE_TIMESLICE(p) ((p)->prio < MAX_RT_PRIO ? MIN_TIMESLICE :
(MAX_TIMESLICE - \
+ ((MAX_TIMESLICE - MIN_TIMESLICE) *
(MAX_PRIO-1-(p)->prio)/(MAX_USER_PRIO - 1))))

static inline unsigned int task_timeslice(task_t *p)
{



Best regards,

Eric St-Laurent


Attachments:
diff.txt (541.00 B)
ts26.cpp (1.34 kB)
Download all attachments

2003-08-19 03:06:59

by Nick Piggin

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong



Eric St-Laurent wrote:

>currently, nicer tasks (nice value toward -20) get larger timeslices,
>and less nice tasks (nice value toward 19) get small timeslices.
>

Funny, isn't it!

>
>this is contrary to all process scheduling theory i've read, and also
>contrary to my intuition.
>

Yep.

>
>maybe it was done this way for fairness reasons, but that's another
>story...
>
>high priority (interactive) tasks should get small timeslices for best
>interactive feeling, and low priority (cpu hog) tasks should get large
>timeslices for best efficiency, anyway they can be preempted by higher
>priority tasks if needed.
>

Its done this way because this is really how the priorities are
enforced. With some complicated exceptions, every task will be
allowed to complete 1 timeslice before any task completes 2
(assuming they don't block).

So higher priority tasks need bigger timeslices.

>
>also, i think dynamic priority should be used for timeslice calculation
>instead of static priority. the reason is, if a low priority task get a
>priority boost (to prevent starvation, for example) it should use the
>small timeslice corresponding to it's new priority level, instead of
>using it's original large timeslice that can ruin the interactive feel.
>

Among other things, yes, I think this is a good idea too. I'll be
addressing both these issues in my scheduler fork.

I do have dynamic timeslices, but currently high priority tasks
still get big timeslices.


2003-08-19 04:07:22

by Eric St-Laurent

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong

> >this is contrary to all process scheduling theory i've read, and also
> >contrary to my intuition.
> Yep.

Yep? What this ack does mean? Books, papers and old unix ways are wrong?
or beyond this theory, practice shows otherwise?

> Its done this way because this is really how the priorities are
> enforced. With some complicated exceptions, every task will be
> allowed to complete 1 timeslice before any task completes 2
> (assuming they don't block).
>
> So higher priority tasks need bigger timeslices.

Frankly i don't get this part. Maybe i should study the code more, or
perhaps you have an illuminating explanation?

Anyway i always tought linux default timeslice of 100 ms is way too long
for desktop uses. Starting with this in mind, i think that a 10 ms
timeslice should bring back good interactive feel, and by using longer
timeslices for (lower prio) cpu-bound processes, we can save some costly
context switches.

Unfortunatly i'm unable to test those ideas right now but i share them,
maybe it can help other's work.

- (previously mentionned) higher prio tasks should use small timeslices
and lower prio tasks, longer ones. i think, maybe falsely, that this can
lower context switch rate for cpu-bound tasks. by using up to 200 ms
slices instead of 10 ms...

- (previously mentionned) use dynamic priority to calculate timeslice
length.

- maybe adjust the max timeslice length depending on how many tasks are
running/ready.

- timeslices in cycles, instead of ticks, to scale with cpu_khz. maybe
account for the cpu caches size to decide the larger timeslice length.

- a /proc tunable might be needed to let the user choose the
interactivity vs efficiency compromise. for example, with 100 background
tasks, does the user want the most efficiency or prefer wasting some
cycles to get a smoother progress between jobs?

- nanoseconds, or better, cycle accurate (rdtsc) timeslice accouting, it
may help heuristics.

- maybe add some instrumentations, like keeping static and dynamic
priorities, task_struct internal variables and other relevant data for
all processes with each scheduling decisions, that a userspace program
can capture and analyse, to better fine-tune the heuristics.

- lastly, it may be usefull to better encapsulate the scheduler to ease
adding alternative scheduler, much like the i/o schedulers work...


Eric St-Laurent

2003-08-19 04:06:22

by Con Kolivas

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong

On Tue, 19 Aug 2003 12:54, Eric St-Laurent wrote:
> currently, nicer tasks (nice value toward -20) get larger timeslices,
> and less nice tasks (nice value toward 19) get small timeslices.

You mean this the other way round, no? +nice means more nice.

For the most part, most tasks start at nice 0 so they pretty much all get the
same size timslices unless they get preempted. The rest of the discussion
you can debate to the end of the earth, but application counts once you've
implemented theory. Changing it up and down by dynamic priority one way and
then the other wasn't helpful when I've tried it previously.

Con

2003-08-19 04:23:53

by Eric St-Laurent

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong

> You mean this the other way round, no? +nice means more nice.

sure you're right. and i know that timeslices get asssigned based on
static priority (which is nice value rescaled).

> For the most part, most tasks start at nice 0 so they pretty much all get the
> same size timslices unless they get preempted. The rest of the discussion

i've read that tasks should start at higher dynamic priority with a
small timeslice (a priority boost for a starting task) then immediatly
drop to a lower priority if it use all it's timeslice.

> implemented theory. Changing it up and down by dynamic priority one way and
> then the other wasn't helpful when I've tried it previously.

maybe it's because the timeslice calculation is reversed?


2003-08-19 04:29:58

by Con Kolivas

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong

Quoting Eric St-Laurent <[email protected]>:

> i've read that tasks should start at higher dynamic priority with a
> small timeslice (a priority boost for a starting task) then immediatly
> drop to a lower priority if it use all it's timeslice.

There's a scheduler implementation dating pre 1970 that does this and I am led
to believe someone is working on an implementation for perhaps 2.7

>
> > implemented theory. Changing it up and down by dynamic priority one way and
>
> > then the other wasn't helpful when I've tried it previously.
>
> maybe it's because the timeslice calculation is reversed?

In my email I said I tried it both ways.

Con

2003-08-19 05:06:06

by Eric St-Laurent

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong


> There's a scheduler implementation dating pre 1970 that does this and I am led
> to believe someone is working on an implementation for perhaps 2.7

The first implementation is in 1962 with CTSS if i remember correctly.
Multics initially had something like that too.

http://www.multicians.org/mult-sched.html

Anyway that's pretty standard CS stuff. Multi-level Queues with
feedback, exponentially longer timeslices with lower priority.

I was reading this recently, that's why i wondered why linux calculate
timeslice "inversed" versus what is proposed in theory.



2003-08-19 05:30:18

by Nick Piggin

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong

Eric St-Laurent wrote:

>>>this is contrary to all process scheduling theory i've read, and also
>>>contrary to my intuition.
>>>
>>Yep.
>>
>
>Yep? What this ack does mean? Books, papers and old unix ways are wrong?
>or beyond this theory, practice shows otherwise?
>

No, it means the nice / timeslice thing _is_ contrary to scheduling theory.
The aspect usually given by books is most definitely correct - a bigger
timeslice means better efficiency, and grows in importance as caches get
bigger and memory gets slower, etc.

>
>>Its done this way because this is really how the priorities are
>>enforced. With some complicated exceptions, every task will be
>>allowed to complete 1 timeslice before any task completes 2
>>(assuming they don't block).
>>
>>So higher priority tasks need bigger timeslices.
>>
>
>Frankly i don't get this part. Maybe i should study the code more, or
>perhaps you have an illuminating explanation?
>

OK, this is an implementation issue, and we _are_ talking about the 2.5
scheduler, right?

Basically what happens is: all runnable processes are assigned a priority
and a timeslice. Processes are run in order of prioritty, and are allowed
to run their full timeslice. When a process finishes its timeslice, it is
put onto an "expired" queue, and the next process with the highest prio
is run. When no processes are left with a timeslice, all those on the
expired queue are given new timeslices.

So, if you give low priority processes bigger timeslices than high prio
ones, the low priority processes will be allowed to consume more CPU than
high. Obviously not the intended result. I agree with your intended result,
but it is a bit difficult to implement in the scheduler at present.

I'm working on it ;)

>
>Anyway i always tought linux default timeslice of 100 ms is way too long
>for desktop uses. Starting with this in mind, i think that a 10 ms
>timeslice should bring back good interactive feel, and by using longer
>timeslices for (lower prio) cpu-bound processes, we can save some costly
>context switches.
>

I agree completely.

>
>Unfortunatly i'm unable to test those ideas right now but i share them,
>maybe it can help other's work.
>
>- (previously mentionned) higher prio tasks should use small timeslices
>and lower prio tasks, longer ones. i think, maybe falsely, that this can
>lower context switch rate for cpu-bound tasks. by using up to 200 ms
>slices instead of 10 ms...
>
>- (previously mentionned) use dynamic priority to calculate timeslice
>length.
>
>- maybe adjust the max timeslice length depending on how many tasks are
>running/ready.
>

I agree with your previous two points. Not this one. I think it is very
easy to get bad feedback loops and difficult to ensure it doesn't break
down under load when doing stuff like this. I might be wrong though.

>
>- timeslices in cycles, instead of ticks, to scale with cpu_khz. maybe
>account for the cpu caches size to decide the larger timeslice length.
>

I don't think you need that much grainularity. Might be a benefit though.

I don't think its a wise idea to get too smart with the hardware properties.
Just keep the defaults good for current PC sort of hardware, and add
tunables for embedded / server hardware. Unless you can show a really big
problem / improvement of course.

>
>- a /proc tunable might be needed to let the user choose the
>interactivity vs efficiency compromise. for example, with 100 background
>tasks, does the user want the most efficiency or prefer wasting some
>cycles to get a smoother progress between jobs?
>
>- nanoseconds, or better, cycle accurate (rdtsc) timeslice accouting, it
>may help heuristics.
>
>- maybe add some instrumentations, like keeping static and dynamic
>priorities, task_struct internal variables and other relevant data for
>all processes with each scheduling decisions, that a userspace program
>can capture and analyse, to better fine-tune the heuristics.
>

yep yep yep

>
>- lastly, it may be usefull to better encapsulate the scheduler to ease
>adding alternative scheduler, much like the i/o schedulers work...
>

I hope not


2003-08-19 06:17:21

by William Lee Irwin III

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong

At some point in the past, someone wrote:
>> There's a scheduler implementation dating pre 1970 that does this
>> and I am led to believe someone is working on an implementation for
>> perhaps 2.7

On Tue, Aug 19, 2003 at 01:06:00AM -0400, Eric St-Laurent wrote:
> The first implementation is in 1962 with CTSS if i remember correctly.
> Multics initially had something like that too.
> http://www.multicians.org/mult-sched.html
> Anyway that's pretty standard CS stuff. Multi-level Queues with
> feedback, exponentially longer timeslices with lower priority.
> I was reading this recently, that's why i wondered why linux calculate
> timeslice "inversed" versus what is proposed in theory.

None of the scheduling policies described there are multi-level queue
-based policies. The bottom-level scheduler was FB until ca. 1976,
when a deadline-based scheduling policy was adopted.

Multilevel queue -based policies segregate tasks into service levels
by the amount of runtime accumulated without blocking (i.e. continual
spinning interrupted only by the scheduler yanking the thing off the
cpu to run something else because it's run too long) and do some other
kind of scheduling within the levels, typically RR with quanta
determined by the level. Various hacks to elevate or lower the service
level with different kinds of I/O or device access were typically used
as interactivity heuristics and were used in the 1967-1975 Multics FB
scheduling policies as described in that webpage.

The deadline scheduler is very different in its mechanics and very
flexible; it's almost impossible to say anything about it without
knowing how the deadlines were assigned. FB is actually very well-
understood, but trivially DoS'able in real-life environments, as
looping coprocesses will monopolize the cpu. The fact device access
and other similar heuristics were required to make it behave well
in interactive environments should be a big hint that the FB design
is fundamentally inadequate.

One obviously nice thing about deadline scheduling is that it's trivial
to prevent starvation: merely assign nonzero offsets from the present
to all tasks, and the ``finger'' pointing at the current task to run
is guaranteed to advance and so cover every queued task. In essence,
deadline scheduling controls how often tasks get to run directly as
opposed to remaining at the mercy of wakeups.

The description of the Multics algorithm implies that at each wakeup
one of two quanta was assigned to the task when it got scheduled, that
one of two workclass-specific virtual deadlines was assigned, and that
the only attempt to account for service time was a distinction between
the first time the task was run and all other times, where the first
time it was run got quantum Q1 and deadline R1 and the rest quantum Q2
and deadline R2, where presumably Q1 < Q2 and R1 < R2. This obviously
does not fully exploit the expressiveness of the deadline design as
one can easily conceive of service time and priority (nice number)
dependent deadline/quanta assignments that make various kinds of sense.


-- wli

2003-08-19 06:54:10

by Eric St-Laurent

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong

> So, if you give low priority processes bigger timeslices than high prio
> ones, the low priority processes will be allowed to consume more CPU than
> high. Obviously not the intended result. I agree with your intended result,

Thanks for the crystal clear explanation.

that 'switched-arrays timeslice distribution' is good for fairness but
maybe it add unwanted scheduling latency to a high priority task that
sit with it's timeslice expired...

i know it's more a real-time os thing, but i always liked the concept of
pure priority scheduling with priority boost (calculated from aging) to
prevent starvation. in a multi-level feedback queue scheduler, a
processor share percentile could be assigned to each priority level.
anyway i'm sure there is some proven fair-share scheduling algos out
there that's better than this old stuff.

> I don't think you need that much grainularity. Might be a benefit though.

personally, i not a fan of the jiffies/tick concept; conversions, lost
ticks problems, drifts, sub-tick high-res-posix-timers etc. everything
should use the highest resolution timer/counter in the system (TSC, ACPI
PM counter, ...) directly. it's a major cleanup and many old PCs don't
have the newer timers.

> >- lastly, it may be usefull to better encapsulate the scheduler to ease
> >adding alternative scheduler, much like the i/o schedulers work...

Well, i was looking at TimeSys scheduler, trying something like that in
2.6 requires modifications to many files and it's a PITA to maintain a
diff with frequents kernel releases. having a structure in place to
plug-in other schedulers sure helps.

Eric St-Laurent


2003-08-19 18:04:29

by Mike Fedyk

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong

On Tue, Aug 19, 2003 at 01:06:49PM +1000, Nick Piggin wrote:
> Its done this way because this is really how the priorities are
> enforced. With some complicated exceptions, every task will be
> allowed to complete 1 timeslice before any task completes 2
> (assuming they don't block).
>
> So higher priority tasks need bigger timeslices.
>
> >
> >also, i think dynamic priority should be used for timeslice calculation
> >instead of static priority. the reason is, if a low priority task get a
> >priority boost (to prevent starvation, for example) it should use the
> >small timeslice corresponding to it's new priority level, instead of
> >using it's original large timeslice that can ruin the interactive feel.
> >
>
> Among other things, yes, I think this is a good idea too. I'll be
> addressing both these issues in my scheduler fork.
>
> I do have dynamic timeslices, but currently high priority tasks
> still get big timeslices.

TS = Time Slice

What needs to be changed is the 1TS per pass through the active array
concept.

Devide the time slice into smaller Time Units, so that you can add one unit
per priority level.

TU = Time Units

Then you account these TUs instead of slices.

so, if nice -19 has 1 TU, and nice -19 has 40 TUs (maybe ranging from 1ms -
200ms with a TU of 5ms).

So nice -19 can have a long time slice and run until it expires if it
doesn't get preempted.

The more I think this through, the harder it gets to take this concept to
completion, but the basic idea is to have multiple TSes per slice, and to
account on TSes as well as slices. That way, you have longer slices for
nicer tasks, but I'm not sure how it would fit into the two array scheduler
we have now. You'd have to have another list for the processes that are
have used up their slice, but haven't waited long enough for them to get
another slice (because you want to give more total CPU percentage to the
higher priorities, while still giving them smaller slices).

Anyway, if anyone can take this idea and make it into a working scheduler,
I'd be most interested in the results...

2003-08-19 19:06:42

by Mike Fedyk

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong

On Tue, Aug 19, 2003 at 12:07:13AM -0400, Eric St-Laurent wrote:
> - a /proc tunable might be needed to let the user choose the
> interactivity vs efficiency compromise. for example, with 100 background
> tasks, does the user want the most efficiency or prefer wasting some
> cycles to get a smoother progress between jobs?
>

No, this should be avoided. Then you get a bunch of web pages talking about
how this tunable should be set this way, and the problem cases will not be
reported to the people who can fix them. This is good.

> - nanoseconds, or better, cycle accurate (rdtsc) timeslice accouting, it
> may help heuristics.
>

Already done. See the scheduler patch Ingo submitted a few weeks ago.

2003-08-19 19:28:18

by Bill Davidsen

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong

In article <1061276043.6974.33.camel@orbiter>,
Eric St-Laurent <[email protected]> wrote:

| Well, i was looking at TimeSys scheduler, trying something like that in
| 2.6 requires modifications to many files and it's a PITA to maintain a
| diff with frequents kernel releases. having a structure in place to
| plug-in other schedulers sure helps.

I agree. In fact I'm pretty sure I said something similar a while ago.
Unlike you I didn't do any major changes, certainly none I felt were of
general interest.

This could go in 2.7, though, or possibly in 2.6.x depending on how the
powers that be feel. I think having the scheduler as a plugin is a win
in terms of having whole special-use algorithms. It would have to be
done *very* carefully to be sure it didn't add measurable overhead.
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2003-08-19 20:41:55

by Bill Davidsen

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong

In article <[email protected]>,
Nick Piggin <[email protected]> wrote:
| Eric St-Laurent wrote:

| >
| >Anyway i always tought linux default timeslice of 100 ms is way too long
| >for desktop uses. Starting with this in mind, i think that a 10 ms
| >timeslice should bring back good interactive feel, and by using longer
| >timeslices for (lower prio) cpu-bound processes, we can save some costly
| >context switches.
| >
|
| I agree completely.
|
| >
| >Unfortunatly i'm unable to test those ideas right now but i share them,
| >maybe it can help other's work.
| >
| >- (previously mentionned) higher prio tasks should use small timeslices
| >and lower prio tasks, longer ones. i think, maybe falsely, that this can
| >lower context switch rate for cpu-bound tasks. by using up to 200 ms
| >slices instead of 10 ms...
| >
| >- (previously mentionned) use dynamic priority to calculate timeslice
| >length.
| >
| >- maybe adjust the max timeslice length depending on how many tasks are
| >running/ready.
| >
|
| I agree with your previous two points. Not this one. I think it is very
| easy to get bad feedback loops and difficult to ensure it doesn't break
| down under load when doing stuff like this. I might be wrong though.

I have to agree with Eric that sizing the max timeslices to fit the
hardware seems like a reasonable thing to do. I have little salvaged
systems running (well strolling would be more accurate) Linux on old
Pentium Classic 90-166MHz machines. I also have 3+ GHz SMP machines. I
have a gut feeling that the timeslices need to be longer on the slow
machines to allow them to get something done, that the SMP machines
will perform best with a different timeslice than UP of the same speed,
etc.

I think you also want a user tunable for throughput vs. interactive, so
you know what you're trying to do best.

Perhaps some counting of wait time between dispatches would be useful,
and drop the timeslices to keep that limited. Sort of a "deadline CPU
scheduler" with limiting bounds. That might at least give some hope of
compensating for CPU speed changes after boot.
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2003-08-19 23:54:46

by Eric St-Laurent

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong

> This could go in 2.7, though, or possibly in 2.6.x depending on how the
> powers that be feel. I think having the scheduler as a plugin is a win
> in terms of having whole special-use algorithms. It would have to be
> done *very* carefully to be sure it didn't add measurable overhead.

Well i was thinking of a compile time or at best boot time selection. It
should not add any mesurable overhead.


2003-08-19 23:48:46

by Eric St-Laurent

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong

> | diff with frequents kernel releases. having a structure in place to
> | plug-in other schedulers sure helps.
>
> I agree. In fact I'm pretty sure I said something similar a while ago.
> Unlike you I didn't do any major changes, certainly none I felt were of

Of course, i was thinking

2003-08-20 00:15:53

by Eric St-Laurent

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong

> I have to agree with Eric that sizing the max timeslices to fit the
> hardware seems like a reasonable thing to do. I have little salvaged
> systems running (well strolling would be more accurate) Linux on old
> Pentium Classic 90-166MHz machines. I also have 3+ GHz SMP machines. I
> have a gut feeling that the timeslices need to be longer on the slow
> machines to allow them to get something done, that the SMP machines
> will perform best with a different timeslice than UP of the same speed,

scaling the timeslice with cpu_khz is a start. already there the
smp_tune_scheduling() function that tune the load balancing based on cpu
speed and cache size.

the problem is that the tick timer (1000 hz) is pretty limited
resolution in relation to cpu clock speed and most HZ-related
calculations are correct within a limited range. that's why i was
talking about cycles or nanoseconds resolution all the way.

with accurate accouting we could bench the context switch time, on boot,
and adjust timeslices based on this.. like something a 10000:1 ratio.

> I think you also want a user tunable for throughput vs. interactive, so
> you know what you're trying to do best.

the kernel should have sane defauts but the user should be able to fine
tune them. because it depends on the "intention" efficient vs
interactivity. it's a compromise and it's not the same for server that
desktop. middleground works but it's not the best for either.

I've read a paper someday that measured the scheduler overhead about
0.07% for HZ=100 and 0.7% for HZ=1000. personnally i would sacrifice a
few percent of my cpu time to have a silky-smooth interactive feel.

It's bizarre that my 2 GHz P4 feel slower than my old Amiga with 33Mhz.
Throughput is greater but latency far worst.

Eric St-Laurent


2003-08-20 00:34:30

by David Lang

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong

while thinking about scaling based on CPU speed remember systems with
variable CPU clocks (or even just variable performance like the transmeta
CPU's)

David Lang

On Tue, 19 Aug 2003, Eric St-Laurent wrote:

> Date: Tue, 19 Aug 2003 20:15:48 -0400
> From: Eric St-Laurent <[email protected]>
> To: bill davidsen <[email protected]>
> Cc: [email protected]
> Subject: Re: scheduler interactivity: timeslice calculation seem wrong
>
> > I have to agree with Eric that sizing the max timeslices to fit the
> > hardware seems like a reasonable thing to do. I have little salvaged
> > systems running (well strolling would be more accurate) Linux on old
> > Pentium Classic 90-166MHz machines. I also have 3+ GHz SMP machines. I
> > have a gut feeling that the timeslices need to be longer on the slow
> > machines to allow them to get something done, that the SMP machines
> > will perform best with a different timeslice than UP of the same speed,
>
> scaling the timeslice with cpu_khz is a start. already there the
> smp_tune_scheduling() function that tune the load balancing based on cpu
> speed and cache size.
>
> the problem is that the tick timer (1000 hz) is pretty limited
> resolution in relation to cpu clock speed and most HZ-related
> calculations are correct within a limited range. that's why i was
> talking about cycles or nanoseconds resolution all the way.
>
> with accurate accouting we could bench the context switch time, on boot,
> and adjust timeslices based on this.. like something a 10000:1 ratio.
>
> > I think you also want a user tunable for throughput vs. interactive, so
> > you know what you're trying to do best.
>
> the kernel should have sane defauts but the user should be able to fine
> tune them. because it depends on the "intention" efficient vs
> interactivity. it's a compromise and it's not the same for server that
> desktop. middleground works but it's not the best for either.
>
> I've read a paper someday that measured the scheduler overhead about
> 0.07% for HZ=100 and 0.7% for HZ=1000. personnally i would sacrifice a
> few percent of my cpu time to have a silky-smooth interactive feel.
>
> It's bizarre that my 2 GHz P4 feel slower than my old Amiga with 33Mhz.
> Throughput is greater but latency far worst.
>
> Eric St-Laurent
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2003-08-20 00:47:42

by William Lee Irwin III

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong

On Tue, Aug 19, 2003 at 05:32:04PM -0700, David Lang wrote:
> while thinking about scaling based on CPU speed remember systems with
> variable CPU clocks (or even just variable performance like the transmeta
> CPU's)

This and/or mixed cpu speeds could make load balancing interesting on
SMP. I wonder who's tried. jejb?


-- wli

2003-08-20 02:42:10

by Nick Piggin

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong



Mike Fedyk wrote:

>On Tue, Aug 19, 2003 at 01:06:49PM +1000, Nick Piggin wrote:
>
>>Its done this way because this is really how the priorities are
>>enforced. With some complicated exceptions, every task will be
>>allowed to complete 1 timeslice before any task completes 2
>>(assuming they don't block).
>>
>>So higher priority tasks need bigger timeslices.
>>
>>
>>>also, i think dynamic priority should be used for timeslice calculation
>>>instead of static priority. the reason is, if a low priority task get a
>>>priority boost (to prevent starvation, for example) it should use the
>>>small timeslice corresponding to it's new priority level, instead of
>>>using it's original large timeslice that can ruin the interactive feel.
>>>
>>>
>>Among other things, yes, I think this is a good idea too. I'll be
>>addressing both these issues in my scheduler fork.
>>
>>I do have dynamic timeslices, but currently high priority tasks
>>still get big timeslices.
>>
>
>TS = Time Slice
>
>What needs to be changed is the 1TS per pass through the active array
>concept.
>
>Devide the time slice into smaller Time Units, so that you can add one unit
>per priority level.
>
>TU = Time Units
>
>Then you account these TUs instead of slices.
>
>so, if nice -19 has 1 TU, and nice -19 has 40 TUs (maybe ranging from 1ms -
>200ms with a TU of 5ms).
>
>So nice -19 can have a long time slice and run until it expires if it
>doesn't get preempted.
>
>The more I think this through, the harder it gets to take this concept to
>completion, but the basic idea is to have multiple TSes per slice, and to
>account on TSes as well as slices. That way, you have longer slices for
>nicer tasks, but I'm not sure how it would fit into the two array scheduler
>we have now. You'd have to have another list for the processes that are
>have used up their slice, but haven't waited long enough for them to get
>another slice (because you want to give more total CPU percentage to the
>higher priorities, while still giving them smaller slices).
>
>Anyway, if anyone can take this idea and make it into a working scheduler,
>I'd be most interested in the results...
>

My idea is just to modify timeslices. It should achieve a similar
effect to what you describe I think.


2003-08-20 02:52:58

by Nick Piggin

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong



bill davidsen wrote:

>In article <[email protected]>,
>Nick Piggin <[email protected]> wrote:
>| Eric St-Laurent wrote:
>
>| >
>| >Anyway i always tought linux default timeslice of 100 ms is way too long
>| >for desktop uses. Starting with this in mind, i think that a 10 ms
>| >timeslice should bring back good interactive feel, and by using longer
>| >timeslices for (lower prio) cpu-bound processes, we can save some costly
>| >context switches.
>| >
>|
>| I agree completely.
>|
>| >
>| >Unfortunatly i'm unable to test those ideas right now but i share them,
>| >maybe it can help other's work.
>| >
>| >- (previously mentionned) higher prio tasks should use small timeslices
>| >and lower prio tasks, longer ones. i think, maybe falsely, that this can
>| >lower context switch rate for cpu-bound tasks. by using up to 200 ms
>| >slices instead of 10 ms...
>| >
>| >- (previously mentionned) use dynamic priority to calculate timeslice
>| >length.
>| >
>| >- maybe adjust the max timeslice length depending on how many tasks are
>| >running/ready.
>| >
>|
>| I agree with your previous two points. Not this one. I think it is very
>| easy to get bad feedback loops and difficult to ensure it doesn't break
>| down under load when doing stuff like this. I might be wrong though.
>
>I have to agree with Eric that sizing the max timeslices to fit the
>hardware seems like a reasonable thing to do. I have little salvaged
>systems running (well strolling would be more accurate) Linux on old
>Pentium Classic 90-166MHz machines. I also have 3+ GHz SMP machines. I
>have a gut feeling that the timeslices need to be longer on the slow
>machines to allow them to get something done, that the SMP machines
>will perform best with a different timeslice than UP of the same speed,
>etc.
>

Well its just so hard to get right. If someone could demonstrate a
big performance improvement, then the scheduler might need fixing
anyway.

For example, on your machines, its very likely that your new machines
would benefit more from big timeslices than the old ones due to more
cache, quicker cpu to memory, multiple cpus...


2003-08-20 04:20:25

by Bill Davidsen

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong

On Tue, 19 Aug 2003, William Lee Irwin III wrote:

> On Tue, Aug 19, 2003 at 05:32:04PM -0700, David Lang wrote:
> > while thinking about scaling based on CPU speed remember systems with
> > variable CPU clocks (or even just variable performance like the transmeta
> > CPU's)
>
> This and/or mixed cpu speeds could make load balancing interesting on
> SMP. I wonder who's tried. jejb?

Hum, I *guess* that if you are using some "mean time between dispatches"
to tune time slice you could apply a CPU speed correction, but mixed speed
SMP is too corner a case for me. I think if you were tuning time slice by
mean time between dispatches (or similar) you could either apply a
correction, set affinity low to keep jobs changing CPUs, or just ignore
it.

The thing I like about the idea is that if the CPU speed changes the MTBD
will change and the timeslice will compensate. You could use median MTBD,
or pick some percentile to tune for response or throughput.

I thought I was just thinking out loud, but it does sound interesting to
try, since it would not prevent using some priorities as well.

>
>
> -- wli
>

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2003-08-20 04:35:12

by William Lee Irwin III

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong

On Tue, 19 Aug 2003, William Lee Irwin III wrote:
>> This and/or mixed cpu speeds could make load balancing interesting on
>> SMP. I wonder who's tried. jejb?

On Wed, Aug 20, 2003 at 12:11:26AM -0400, Bill Davidsen wrote:
> Hum, I *guess* that if you are using some "mean time between dispatches"
> to tune time slice you could apply a CPU speed correction, but mixed speed
> SMP is too corner a case for me. I think if you were tuning time slice by
> mean time between dispatches (or similar) you could either apply a
> correction, set affinity low to keep jobs changing CPUs, or just ignore
> it.

Not corner case at all. It's very typical with incrementally upgradeable
hardware (it would be very nice if commodity hardware were so, as it's
very wasteful to have to throw out preexisting hardware just to upgrade).
Conceptually what has to be done is very simple: pressure on cpus needs
to be weighted by cpu speed. The question is about specifics, not concepts.

I've even seen a system "in the field" (basically operating in a server
capacity as opposed to being a kernel hacking vehicle) with cpu speeds
ranging from 180MHz to 900MHz, with about 3 or 4 points in between.


On Wed, Aug 20, 2003 at 12:11:26AM -0400, Bill Davidsen wrote:
> The thing I like about the idea is that if the CPU speed changes the MTBD
> will change and the timeslice will compensate. You could use median MTBD,
> or pick some percentile to tune for response or throughput.
> I thought I was just thinking out loud, but it does sound interesting to
> try, since it would not prevent using some priorities as well.

Conceptually this is simple, too: take some tuning method based on cpu
speed and periodically (or possibly in an event-driven fashion) re-tune.
Again, this question's about the specifics, not the concept.


-- wli

2003-08-20 14:00:25

by Andrew Theurer

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong

On Tuesday 19 August 2003 23:11, Bill Davidsen wrote:
> On Tue, 19 Aug 2003, William Lee Irwin III wrote:
> > On Tue, Aug 19, 2003 at 05:32:04PM -0700, David Lang wrote:
> > > while thinking about scaling based on CPU speed remember systems with
> > > variable CPU clocks (or even just variable performance like the
> > > transmeta CPU's)
> >
> > This and/or mixed cpu speeds could make load balancing interesting on
> > SMP. I wonder who's tried. jejb?
>
> Hum, I *guess* that if you are using some "mean time between dispatches"
> to tune time slice you could apply a CPU speed correction, but mixed speed
> SMP is too corner a case for me. I think if you were tuning time slice by
> mean time between dispatches (or similar) you could either apply a
> correction, set affinity low to keep jobs changing CPUs, or just ignore
> it.

One could continue this thinking (more load_balance corrections than
timeslice, IMO) on to SMT processors, where the throughput of a sibling is
highly dependent on what the other siblings are doing in the same core. For
example, in a dual proc system, the first physical cpu with one task will run
much faster than the second cpu with 2 tasks. Actually, using a shared
runqueue would probably fix this (something we still don't have in 2.6-test).

But one other thing, and maybe this has been brought up before (sorry, I have
not been following all the discussions), but why are we not altering
timeslice based on the runqueue length for that processor? Would it not make
sense, for the sake of good interactivity, to lower all the timeslices when
we have a longer runqueue?

-Andrew Theurer

2003-08-20 16:28:52

by Bill Davidsen

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong

On Wed, 20 Aug 2003, Andrew Theurer wrote:

> On Tuesday 19 August 2003 23:11, Bill Davidsen wrote:

> > Hum, I *guess* that if you are using some "mean time between dispatches"
> > to tune time slice you could apply a CPU speed correction, but mixed speed
> > SMP is too corner a case for me. I think if you were tuning time slice by
> > mean time between dispatches (or similar) you could either apply a
> > correction, set affinity low to keep jobs changing CPUs, or just ignore
> > it.
>
> One could continue this thinking (more load_balance corrections than
> timeslice, IMO) on to SMT processors, where the throughput of a sibling is
> highly dependent on what the other siblings are doing in the same core. For
> example, in a dual proc system, the first physical cpu with one task will run
> much faster than the second cpu with 2 tasks. Actually, using a shared
> runqueue would probably fix this (something we still don't have in 2.6-test).
>
> But one other thing, and maybe this has been brought up before (sorry, I have
> not been following all the discussions), but why are we not altering
> timeslice based on the runqueue length for that processor? Would it not make
> sense, for the sake of good interactivity, to lower all the timeslices when
> we have a longer runqueue?

The length of the runqueue isn't really the issue, if you have a lot of
interractive processes they may only run a ms or less each. My suggestion
is that the time between dispatches (time spent waiting on the runqueue)
was a better indicator, and that timeslices could be sized based on how
long a process waited for a CPU. And ordering would also be subject to
priority effects, of course.

My thought is that this would tune the system based on current overall
behaviour, and also adjust for CPU clock speed changes, not as rare as
they once were.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2003-08-20 18:45:51

by Mike Fedyk

[permalink] [raw]
Subject: Re: scheduler interactivity: timeslice calculation seem wrong

On Wed, Aug 20, 2003 at 12:41:47PM +1000, Nick Piggin wrote:
> My idea is just to modify timeslices. It should achieve a similar
> effect to what you describe I think.

And how do you have one time slice per array switch (what's the term for
that?) and larger slices for lower nice levels, how does that work?