2004-03-15 22:44:35

by Kurt Garloff

Subject: dynamic sched timeslices

Hi,

attached patch allows userspace to tune the scheduling timeslices.
It can be used for a couple of things:
* Tune a workload for batch processing:
You'd probably want to use long timeslices in order not to reschedule
as often, to make good use of your CPU caches.
* Tune a workload for interactive use:
Under load, you may want to reduce the scheduling latencies by using
shorter timeslices (and there are situations where the interactivity
tweaks -- even if they were perfect -- can't save you).
* Tune the ratio between maximum and minimum timeslices to make
nice much nicer, for example.

The patch exports /proc/sys/kernel/max_timeslice and min_timeslice;
units are µs (microseconds). It also exports HZ (read-only).
The patch implements the desktop boot parameter, which selects
shorter timeslices.

The patch is from Andrea and is in our 2.4 tree; the 2.6 port was done
by me and was straightforward.

Regards,
--
Kurt Garloff <[email protected]> Cologne, DE
SUSE LINUX AG, Nuernberg, DE SUSE Labs (Head)



2004-03-15 23:01:24

by Christoph Hellwig

Subject: Re: dynamic sched timeslices

On Mon, Mar 15, 2004 at 11:42:01PM +0100, Kurt Garloff wrote:
> Hi,
>
> attached patch allows userspace to tune the scheduling timeslices.
> It can be used for a couple of things:
> * Tune a workload for batch processing:
> You'd probably want to use long timeslices in order not to reschedule
> as often, to make good use of your CPU caches.
> * Tune a workload for interactive use:
> Under load, you may want to reduce the scheduling latencies by using
> shorter timeslices (and there are situations where the interactivity
> tweaks -- even if they were perfect -- can't save you).
> * Tune the ratio between maximum and minimum timeslices to make
> nice much nicer, for example.
>
> The patch exports /proc/sys/kernel/max_timeslice and min_timeslice;
> units are µs (microseconds). It also exports HZ (read-only).
> The patch implements the desktop boot parameter, which selects
> shorter timeslices.
>
> The patch is from Andrea and is in our 2.4 tree; the 2.6 port was done
> by me and was straightforward.

Remove the silly desktop boot parameter and the patch looks basically
okay to me.

I remember we had a more complete patch in -mm once that allowed tuning
the scheduler through sysctls, though. The question is why that one wasn't
merged, and whether the same reasons apply to a 'light' version.

2004-03-15 23:10:29

by Kurt Garloff

Subject: Re: dynamic sched timeslices

Hi Christoph,

quick response!
On Mon, Mar 15, 2004 at 10:59:39PM +0000, Christoph Hellwig wrote:
> Remove the silly desktop boot parameter and the patch looks basically
> okay to me.

See attachment.

> I remember we had a more complete patch in -mm once that allowed tuning
> the scheduler through sysctls, though. The question is why that one wasn't
> merged, and whether the same reasons apply to a 'light' version.

Hmm, I fail to remember unfortunately. Probably it had too many knobs.
Andrew?

Regards,
--
Kurt Garloff <[email protected]> Cologne, DE
SUSE LINUX AG, Nuernberg, DE SUSE Labs (Head)



2004-03-15 23:38:48

by Andrew Morton

Subject: Re: dynamic sched timeslices

Kurt Garloff <[email protected]> wrote:
>
> > I remember we had a more complete patch in -mm once that allowed tuning
> > the scheduler through sysctls, though. The question is why that one wasn't
> > merged, and whether the same reasons apply to a 'light' version.
>
> Hmm, I fail to remember unfortunately. Probably it had too many knobs.
> Andrew?

It had a zillion knobs, and was mainly for developers.

Your patch didn't come with any subjective or measured testing results. In
theory, the scheduler should magically tune itself to the current workload.
If your patch is indeed necessary then this may point at a bug in the
current CPU scheduler.

Please tell us more...

2004-03-16 11:36:23

by Kurt Garloff

Subject: Re: dynamic sched timeslices

Hi Andrew,

On Mon, Mar 15, 2004 at 03:40:42PM -0800, Andrew Morton wrote:
> Kurt Garloff <[email protected]> wrote:
> Your patch didn't come with any subjective or measured testing results.

We've done some measurements with 2.4 and O(1):
* HZ=1000 costs about 1.5% performance on a kernel compile (plus problems
with lost timer ticks)
* Setting the scheduling timeslices to 1ms--30ms rather than 10ms through
300ms costs another ~3% kernel compile performance
* Depending on the workload, the effect can be larger. Number crunching
comes to mind.
* Some people are unhappy that nice is not nice enough.

> In
> theory, the scheduler should magically tune itself to the current workload.

No, the computer cannot decide whether a machine is sitting in a machine
room mainly doing batch processing, or whether it's being used as a
workstation by somebody sitting in front of it.

You can add heuristics that look at the load and the sleep_avg of
processes and scale timeslices dynamically, but such heuristics are IMVHO
a very bad idea. They tend to break in subtle ways and make it impossible
to reproduce benchmark numbers, etc.

I do know that the priorities ensure that interactive processes get some
bonus, so even with long timeslices a system should be usable.
But heuristics fail, and some situations with high load just can't be
solved by such bonuses.

> If your patch is indeed necessary then this may point at a bug in the
> current CPU scheduler.

No, why should it? How should the computer know what the user wants if
the user has no way to tell it?

It's a classical throughput vs. latency tradeoff and the patch allows
the user to set it. I'm sure some people are willing to have long
timeslices in order to gain 5% and don't care about the sched latencies.

Regards,
--
Kurt Garloff <[email protected]> Cologne, DE
SUSE LINUX AG, Nuernberg, DE SUSE Labs (Head)



2004-03-16 13:14:06

by Con Kolivas

Subject: Re: dynamic sched timeslices

On Tue, 16 Mar 2004 10:36 pm, Kurt Garloff wrote:
> On Mon, Mar 15, 2004 at 03:40:42PM -0800, Andrew Morton wrote:
> > Your patch didn't come with any subjective or measured testing results.

> We've done some measurements with 2.4 and O(1):
> * HZ=1000 costs about 1.5% performance on a kernel compile (plus problems
> with lost timer ticks)
> * Setting the scheduling timeslices to 1ms--30ms rather than 10ms through
> 300ms costs another ~3% kernel compile performance
> * Depending on the workload, the effect can be larger. Number crunching
> comes to mind.

2.4 O(1) effects do not directly apply to 2.6.

Dropping HZ will save you performance for sure on 2.6.

Changing the timeslices in 2.6 will be disappointing, though. Although the
apparent timeslice of nice 0 tasks is 102ms, interactive tasks round robin at
10ms. If you drop the timeslice to 10ms you will not improve the interactive
feel but you will speed up expiration instead which will almost certainly
worsen interactive feel. If you drop timeslices below 10ms you will get
significant cache thrashing and a drop in performance (which your 2.4
results confirm).
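For reference, the 102ms figure falls out of the 2.6 O(1) scheduler's static timeslice formula. A sketch (constants paraphrased from kernel/sched.c and expressed in milliseconds for readability; Python used only for illustration):

```python
# Paraphrase of 2.6's task_timeslice(): the slice scales linearly with
# static priority, from MAX_TIMESLICE at nice -20 down to MIN_TIMESLICE
# at nice +19.
MIN_TIMESLICE_MS = 10
MAX_TIMESLICE_MS = 200
MAX_PRIO = 140        # 100 realtime priorities + 40 nice levels
MAX_USER_PRIO = 40    # nice -20 .. +19

def task_timeslice_ms(nice):
    static_prio = 120 + nice          # nice 0 -> static priority 120
    return MIN_TIMESLICE_MS + (MAX_TIMESLICE_MS - MIN_TIMESLICE_MS) \
        * (MAX_PRIO - 1 - static_prio) // (MAX_USER_PRIO - 1)

print(task_timeslice_ms(0))     # 102 -- the "apparent timeslice" above
print(task_timeslice_ms(-20))   # 200
print(task_timeslice_ms(19))    # 10
```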

Increasing timeslices does benefit pure number crunching workloads. The
benchmarking I've done using cache intensive workloads (which are the most
likely to benefit) shows you are chasing diminishing returns, though. You can
mathematically model them based on the fact that keeping a task bound to a
cpu instead of shifting it to another cpu on SMP saves about 2ms processing
time on a P4. Suffice to say the benefit is only worth it if you do nothing but
cpu intensive things, and becomes virtually insignificant beyond 200ms. On
other architectures with longer cache decays you will benefit more;
arch/i386/mach-voyager seems the longest at 20ms.
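The diminishing-returns point can be made concrete with a back-of-the-envelope model (my own sketch, using the ~2ms P4 figure above): if every timeslice ended in a cross-CPU migration, the worst-case throughput loss would be refill/timeslice:

```python
# Worst-case fraction of CPU time lost to cache refill if every slice
# ends with a cross-CPU migration. 2 ms is the P4 figure quoted above;
# the voyager sub-arch's cache decay is quoted at 20 ms.
def worst_case_overhead(timeslice_ms, refill_ms=2.0):
    return refill_ms / timeslice_ms

for ts in (10, 50, 100, 200, 400):
    print("%3d ms slice -> %.1f%% lost" % (ts, 100 * worst_case_overhead(ts)))
# Going from a 200 ms slice to 400 ms only recovers another half a
# percent, i.e. "virtually insignificant beyond 200ms".
```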

>It's a classical throughput vs. latency tradeoff and the patch allows
>the user to set it. I'm sure some people are willing to have long
>timeslices in order to gain 5% and don't care about the sched latencies.

As you can see from the above it is not that clear cut.

Con

2004-03-16 14:53:59

by Timothy Miller

Subject: Re: dynamic sched timeslices



Kurt Garloff wrote:
> Hi,
>
> attached patch allows userspace to tune the scheduling timeslices.
> It can be used for a couple of things:
> * Tune a workload for batch processing:
> You'd probably want to use long timeslices in order not to reschedule
> as often, to make good use of your CPU caches.
> * Tune a workload for interactive use:
> Under load, you may want to reduce the scheduling latencies by using
> shorter timeslices (and there are situations where the interactivity
> tweaks -- even if they were perfect -- can't save you).
> * Tune the ratio between maximum and minimum timeslices to make
> nice much nicer, for example.
>
> The patch exports /proc/sys/kernel/max_timeslice and min_timeslice;
> units are µs (microseconds). It also exports HZ (read-only).
> The patch implements the desktop boot parameter, which selects
> shorter timeslices.
>
> The patch is from Andrea and is in our 2.4 tree; the 2.6 port was done
> by me and was straightforward.
>
> Regards,

If this doesn't change the total amount of CPU a process can get but
lets a process tweak how its CPU time is divided up, then it sounds
wonderful.

2004-03-16 15:09:58

by Kurt Garloff

Subject: Re: dynamic sched timeslices

Hi Timothy,

On Tue, Mar 16, 2004 at 10:03:39AM -0500, Timothy Miller wrote:
> If this doesn't change the total amount of CPU a process can get but
> lets a process tweak how its CPU time is divided up, then it sounds
> wonderful.

It lets the sysadmin tune how often the kernel switches between
processes competing for the CPU.

Regards,
--
Kurt Garloff <[email protected]> Cologne, DE
SUSE LINUX AG, Nuernberg, DE SUSE Labs (Head)



2004-03-16 14:47:08

by Kurt Garloff

Subject: Re: dynamic sched timeslices

Hi Con,

On Wed, Mar 17, 2004 at 12:13:37AM +1100, Con Kolivas wrote:
> 2.4 O(1) effects do not directly apply to 2.6.
>
> Dropping HZ will save you performance for sure on 2.6.
>
> Changing the timeslices in 2.6 will be disappointing, though. Although the
> apparent timeslice of nice 0 tasks is 102ms, interactive tasks round robin at
> 10ms. If you drop the timeslice to 10ms you will not improve the interactive
> feel but you will speed up expiration instead which will almost certainly
> worsen interactive feel.

If you have a system with an easy workload (say, one clear CPU hog and
one interactive job), things are easy. The fact that you preempt the
not-yet-expired CPU hog is enough.
That's easy, and that worked with 2.4 O(1) (if tweaked a bit to estimate
interactivity better, see the other patch) and it works with 2.6.

Things start to get difficult if you have something like a calculation
program with a non-multithreaded GUI. It will look like a CPU hog, yet
you'd still like to see it responsive. Now add a second CPU hog.

The kernel cannot fix this problem, but it can limit the damage by
not having too long timeslices.

There are other scenarios where preemption will not solve all
problems.
Think of two interactive processes, one playing audio, the other being
your shell. The audio player may occasionally take the CPU for extended
periods of time to decode the next N ogg frames. You still want the
shell to react promptly, but it can't ... Thus you wish the timeslices
were not too long.

Thus, for desktop kinds of machines, you'll set them not too long, so as
not to rely completely on the interactivity estimator.

> If you drop timeslices below 10ms you will get
> significant cache thrashing and a drop in performance (which your 2.4
> results confirm).

No doubt. Don't overdo it. It's a tradeoff. If you impact throughput too
much, you won't enjoy the short latency ;-)

> Increasing timeslices does benefit pure number crunching workloads. The
> benchmarking I've done using cache intensive workloads (which are the most
> likely to benefit) shows you are chasing diminishing returns, though. You can
> mathematically model them based on the fact that keeping a task bound to a
> cpu instead of shifting it to another cpu on SMP saves about 2ms processing
> time on a P4. Suffice to say the benefit is only worth it if you do nothing but
> cpu intensive things, and becomes virtually insignificant beyond 200ms. On
> other architectures with longer cache decays you will benefit more;
> arch/i386/mach-voyager seems the longest at 20ms.

That's why I think we should offer the tunables.

Regards,
--
Kurt Garloff <[email protected]> [Koeln, DE]
Physics:Plasma modeling <[email protected]> [TU Eindhoven, NL]
Linux: SUSE Labs (Head) <[email protected]> [SUSE Nuernberg, DE]



2004-03-16 20:45:38

by Con Kolivas

Subject: Re: dynamic sched timeslices

On Wed, 17 Mar 2004 01:29 am, Kurt Garloff wrote:
> Hi Con,
>
> On Wed, Mar 17, 2004 at 12:13:37AM +1100, Con Kolivas wrote:
> > 2.4 O(1) effects do not directly apply to 2.6.
> >
> > Dropping HZ will save you performance for sure on 2.6.
> >
> > Changing the timeslices in 2.6 will be disappointing, though. Although
> > the apparent timeslice of nice 0 tasks is 102ms, interactive tasks round
> > robin at 10ms. If you drop the timeslice to 10ms you will not improve the
> > interactive feel but you will speed up expiration instead which will
> > almost certainly worsen interactive feel.
>
> If you have a system with an easy workload (say, one clear CPU hog and
> one interactive job), things are easy. The fact that you preempt the
> not-yet-expired CPU hog is enough.
> That's easy, and that worked with 2.4 O(1) (if tweaked a bit to estimate
> interactivity better, see the other patch) and it works with 2.6.
>
> Things start to get difficult if you have something like a calculation
> program with a non-multithreaded GUI. It will look like a CPU hog, yet
> you'd still like to see it responsive. Now add a second CPU hog.
>
> The kernel cannot fix this problem, but it can limit the damage by
> not having too long timeslices.
>
> There are other scenarios where preemption will not solve all
> problems.
> Think of two interactive processes, one playing audio, the other being
> your shell. The audio player may occasionally take the CPU for extended
> periods of time to decode the next N ogg frames. You still want the
> shell to react promptly, but it can't ... Thus you wish the timeslices
> were not too long.
>
> Thus, for desktop kinds of machines, you'll set them not too long, so as
> not to rely completely on the interactivity estimator.

I'm not arguing with your logic; I'm saying you will ruin the estimator in
the process, because it isn't just about timeslices and preemption.
Ultimately you're still only giving me theoretical cases, which are exactly
what my test cases were during development. Find a test case where you can
prove it does something, rather than a theoretical one. Oh, and a test case
should be one where it is the cpu scheduler that is responsible, not one
where you're fixing problems due to poor hardware config with alsa, oss,
IDE, dma settings etc.

> > If you drop timeslices below 10ms you will get
> > significant cache thrashing and a drop in performance (which your 2.4
> > results confirm).
>
> No doubt. Don't overdo it. It's a tradeoff. If you impact throughput too
> much, you'll not enjoy the short latency ;-)
>
> > Increasing timeslices does benefit pure number crunching workloads. The
> > benchmarking I've done using cache intensive workloads (which are the
> > most likely to benefit) shows you are chasing diminishing returns, though.
> > You can mathematically model them based on the fact that keeping a task
> > bound to a cpu instead of shifting it to another cpu on SMP saves about
> > 2ms processing time on a P4. Suffice to say the benefit is only worth it if
> > you do nothing but cpu intensive things, and becomes virtually
> > insignificant beyond 200ms. On other architectures with longer cache
> > decays you will benefit more; arch/i386/mach-voyager seems the longest at
> > 20ms.
>
> That's why I think we should offer the tunables.

If your workload is so dedicated to just number crunching it isn't hard to add
a zero to maximum timeslice in kernel/sched.c. Then again, this is just
semantics about how to tune it, so I don't really care, but I'm sure the
maintainer wants proof that changing it shows some real-world improvement.

Con

2004-03-18 00:22:58

by Kurt Garloff

Subject: Re: dynamic sched timeslices

Hi Con,

On Wed, Mar 17, 2004 at 07:45:02AM +1100, Con Kolivas wrote:
> > That's why I think we should offer the tunables.
>
> If your workload is so dedicated to just number crunching it isn't hard to add
> a zero to maximum timeslice in kernel/sched.c.

Of course I can compile a custom kernel for myself and tune all sorts of
things. But that is not the way most Linux users want to use Linux any
more; those days are long gone.

Best regards,
--
Kurt Garloff <[email protected]> [Koeln, DE]
Physics:Plasma modeling <[email protected]> [TU Eindhoven, NL]
Linux: SUSE Labs (Head) <[email protected]> [SUSE Nuernberg, DE]



2004-03-18 00:32:22

by Andrew Morton

Subject: Re: dynamic sched timeslices

Kurt Garloff <[email protected]> wrote:
>
> Hi Con,
>
> On Wed, Mar 17, 2004 at 07:45:02AM +1100, Con Kolivas wrote:
> > > That's why I think we should offer the tunables.
> >
> > If your workload is so dedicated to just number crunching it isn't hard to add
> > a zero to maximum timeslice in kernel/sched.c.
>
> Of course I can compile a custom kernel for myself and tune all sorts of
> things. But that is not the way most Linux users want to use Linux any
> more; those days are long gone.
>

I don't think we should be averse to offering a couple of nice high-level
scheduler tunables. But I do think we should have testing results which
clearly show that they provide some benefit, and we should agree that the
scheduler cannot provide the same benefit automagically.

Apologies in advance if we've seen those testing results and I missed them.

2004-03-18 03:19:56

by Con Kolivas

Subject: Re: dynamic sched timeslices

Quoting Andrew Morton <[email protected]>:

> Kurt Garloff <[email protected]> wrote:
> >
> > Hi Con,
> >
> > On Wed, Mar 17, 2004 at 07:45:02AM +1100, Con Kolivas wrote:
> > > > That's why I think we should offer the tunables.
> > >
> > > If your workload is so dedicated to just number crunching it isn't
> > > hard to add a zero to maximum timeslice in kernel/sched.c.
> >
> > Of course I can compile a custom kernel for myself and tune all sorts of
> > things. But that is not the way most Linux users want to use Linux any
> > more; those days are long gone.
> >
>
> I don't think we should be averse to offering a couple of nice high-level
> scheduler tunables. But I do think we should have testing results which
> clearly show that they provide some benefit, and we should agree that the
> scheduler cannot provide the same benefit automagically.
>
> Apologies in advance if we've seen those testing results and I missed them.

Well, that reply takes my message out of context. I'm not averse to
tunables -- if they do something.

The only evidence Kurt has shown so far is that he can decrease throughput.
The rest is theoretical, based on a scheduler that isn't the 2.6 kernel's.

Con

2004-03-25 14:25:37

by Pavel Machek

Subject: Re: dynamic sched timeslices

Hi!

> attached patch allows userspace to tune the scheduling timeslices.
> It can be used for a couple of things:
> * Tune a workload for batch processing:
> You'd probably want to use long timeslices in order not to reschedule
> as often, to make good use of your CPU caches.
> * Tune a workload for interactive use:
> Under load, you may want to reduce the scheduling latencies by using
> shorter timeslices (and there are situations where the interactivity
> tweaks -- even if they were perfect -- can't save you).
> * Tune the ratio between maximum and minimum timeslices to make
> nice much nicer, for example.

If you make the ratio much bigger, you are going to see
priority inversion issues. Some kind of "boost priority when in
kernel" would be needed...

--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms