2009-09-06 21:00:05

by Ingo Molnar

Subject: BFS vs. mainline scheduler benchmarks and measurements

hi Con,

I've read your BFS announcement/FAQ with great interest:

http://ck.kolivas.org/patches/bfs/bfs-faq.txt

First and foremost, let me say that i'm happy that you are hacking
the Linux scheduler again. It's perhaps proof that hacking the
scheduler is one of the most addictive things on the planet ;-)

I understand that BFS is still early code and that you are not
targeting BFS for mainline inclusion - but BFS is an interesting
and bold new approach, cutting a _lot_ of code out of
kernel/sched*.c, so it raised my curiosity and interest :-)

In the announcement and on your webpage you have compared BFS to
the mainline scheduler in various workloads - showing various
improvements over it. I have tried and tested BFS and ran a set of
benchmarks - this mail contains the results and my (quick)
findings.

So ... to get to the numbers - i've tested both BFS and the tip of
the latest upstream scheduler tree on a testbox of mine. I
intentionally didnt test BFS on any really large box - because you
described its upper limit like this in the announcement:

-----------------------
|
| How scalable is it?
|
| I don't own the sort of hardware that is likely to suffer from
| using it, so I can't find the upper limit. Based on first
| principles about the overhead of locking, and the way lookups
| occur, I'd guess that a machine with more than 16 CPUS would
| start to have less performance. BIG NUMA machines will probably
| suck a lot with this because it pays no deference to locality of
| the NUMA nodes when deciding what cpu to use. It just keeps them
| all busy. The so-called "light NUMA" that constitutes commodity
| hardware these days seems to really like BFS.
|
-----------------------

I generally agree with you that "light NUMA" is what a Linux
scheduler needs to concentrate on (at most) in terms of
scalability. Big NUMA, 4096 CPUs is not very common and we tune the
Linux scheduler for desktop and small-server workloads mostly.

So the testbox i picked fits into the upper portion of what i
consider a sane range of systems to tune for - and should still fit
into BFS's design bracket as well according to your description:
it's a dual quad core system with hyperthreading. It has twice as
many cores as the quad you tested on but it's not excessive and
certainly does not have 4096 CPUs ;-)

Here are the benchmark results:

kernel build performance:
http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild.jpg

pipe performance:
http://redhat.com/~mingo/misc/bfs-vs-tip-pipe.jpg

messaging performance (hackbench):
http://redhat.com/~mingo/misc/bfs-vs-tip-messaging.jpg

OLTP performance (postgresql + sysbench)
http://redhat.com/~mingo/misc/bfs-vs-tip-oltp.jpg

Alas, as can be seen in the graphs, i can not see any BFS
performance improvements on this box.

Here's a more detailed description of the results:

| Kernel build performance
---------------------------

http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild.jpg

In the kbuild test BFS is showing significant weaknesses up to 16
CPUs. On 8 CPUs utilized (half load) it's 27.6% slower. All results
(-j1, -j2 ... -j15) are slower. The peak at 100% utilization at -j16
is slightly stronger under BFS, by 1.5%. The 'absolute best' result
is sched-devel at -j64 with 46.65 seconds - the best BFS result is
47.38 seconds (also at -j64), so sched-devel's best is 1.5% faster.

| Pipe performance
-------------------

http://redhat.com/~mingo/misc/bfs-vs-tip-pipe.jpg

Pipe performance is a very simple test: two tasks message each
other via pipes. I measured 1 million such messages:

http://redhat.com/~mingo/cfs-scheduler/tools/pipe-test-1m.c

The pipe test ran a number of them in parallel:

for ((i=0;i<$NR;i++)); do ~/sched-tests/pipe-test-1m & done; wait

and measured elapsed time. This tests two things: basic scheduler
performance and also scheduler fairness. (if one of these parallel
jobs is delayed unfairly then the test will finish later.)

[ see further below for a simpler pipe latency benchmark as well. ]
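
For reference, the core of such a ping-pong test boils down to roughly
the following - a minimal sketch, not necessarily identical to the
pipe-test-1m.c linked above: parent and child bounce a single byte back
and forth one million times over a pair of pipes, so every message
forces a wakeup and a context switch in each direction:

/* pipe ping-pong sketch: 1 million message round-trips */
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define LOOPS 1000000

int main(void)
{
        int ping[2], pong[2], i;
        char c = 'x';

        if (pipe(ping) || pipe(pong)) {
                perror("pipe");
                exit(1);
        }
        if (fork() == 0) {
                /* child: wait for a byte, echo one back */
                for (i = 0; i < LOOPS; i++)
                        if (read(ping[0], &c, 1) != 1 ||
                            write(pong[1], &c, 1) != 1)
                                exit(1);
                exit(0);
        }
        /* parent: send a byte, wait for the reply */
        for (i = 0; i < LOOPS; i++)
                if (write(ping[1], &c, 1) != 1 ||
                    read(pong[0], &c, 1) != 1)
                        exit(1);
        wait(NULL);
        return 0;
}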

As can be seen in the graph BFS performed very poorly in this test:
at 8 pairs of tasks it had a runtime of 45.42 seconds - while
sched-devel finished them in 3.8 seconds.

I saw really bad interactivity in the BFS test here - the system
was starved for as long as the test ran. I stopped the tests at 8
loops - the system was unusable and i was getting IO timeouts due
to the scheduling lag:

sd 0:0:0:0: [sda] Unhandled error code
sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
end_request: I/O error, dev sda, sector 81949243
Aborting journal on device sda2.
ext3_abort called.
EXT3-fs error (device sda2): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only

I measured interactivity during this test:

$ time ssh aldebaran /bin/true
real 2m17.968s
user 0m0.009s
sys 0m0.003s

A single command took more than 2 minutes.

| Messaging performance
------------------------

http://redhat.com/~mingo/misc/bfs-vs-tip-messaging.jpg

Hackbench ran better for BFS - but mainline sched-devel is still
significantly faster, at both smaller and larger loads. With 20 groups
mainline ran 61.5% faster.

| OLTP performance
--------------------

http://redhat.com/~mingo/misc/bfs-vs-tip-oltp.jpg

As can be seen in the graph for sysbench OLTP performance
sched-devel outperforms BFS on each of the main stages:

single client load ( 1 client - 6.3% faster )
half load ( 8 clients - 57.6% faster )
peak performance ( 16 clients - 117.6% faster )
overload ( 512 clients - 288.3% faster )

| Other tests
--------------

I also tested a couple of other things, such as lat_tcp:

BFS: TCP latency using localhost: 16.5608 microseconds
sched-devel: TCP latency using localhost: 13.5528 microseconds [22.1% faster]

lat_pipe:

BFS: Pipe latency: 4.9703 microseconds
sched-devel: Pipe latency: 2.6137 microseconds [90.1% faster]

General interactivity of BFS seemed good to me - except for the
pipe test when there was significant lag over a minute. I think
it's some starvation bug, not an inherent design property of BFS,
so i'm looking forward to re-test it with the fix.

Test environment: i used latest BFS (205 and then i re-ran under
208 and the numbers are all from 208), and the latest mainline
scheduler development tree from:

http://people.redhat.com/mingo/tip.git/README

Commit 840a065 in particular. It's on a .31-rc8 base while BFS is
on a .30 base - will be able to test BFS on a .31 base as well once
you release it. (but it doesnt matter much to the results - there
werent any heavy core kernel changes impacting these workloads.)

The system had enough RAM to have the workloads cached, and i
repeated all tests to make sure it's all representative.
Nevertheless i'd like to encourage others to repeat these (or
other) tests - the more testing the better.

I also tried to configure the kernel in a BFS friendly way, i used
HZ=1000 as recommended, turned off all debug options, etc. The
kernel config i used can be found here:

http://redhat.com/~mingo/misc/config

( Let me know if you need any more info about any of the tests i
conducted. )

Also, i'd like to outline that i agree with the general goals
described by you in the BFS announcement - small desktop systems
matter more than large systems. We find it critically important
that the mainline Linux scheduler performs well on those systems
too - and if you (or anyone else) can reproduce suboptimal behavior
please let the scheduler folks know so that we can fix/improve it.

I hope to be able to work with you on this, please dont hesitate
sending patches if you wish - and we'll also be following BFS for
good ideas and code to adopt to mainline.

Thanks,

Ingo


2009-09-07 02:05:24

by Frans Pop

Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

Ingo Molnar wrote:
> So the testbox i picked fits into the upper portion of what i
> consider a sane range of systems to tune for - and should still fit
> into BFS's design bracket as well according to your description:
> it's a dual quad core system with hyperthreading.

Ingo,

Nice that you've looked into this.

Would it be possible for you to run the same tests on e.g. a dual core
and/or a UP system (or maybe just offline some CPUs?)? It would be very
interesting to see whether BFS does better in the lower portion of the
range, or if the differences you show between the two schedulers are
consistent across the range.

Cheers,
FJP

2009-09-07 03:39:00

by Nikos Chantziaras

Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/06/2009 11:59 PM, Ingo Molnar wrote:
>[...]
> Also, i'd like to outline that i agree with the general goals
> described by you in the BFS announcement - small desktop systems
> matter more than large systems. We find it critically important
> that the mainline Linux scheduler performs well on those systems
> too - and if you (or anyone else) can reproduce suboptimal behavior
> please let the scheduler folks know so that we can fix/improve it.

BFS improved behavior of many applications on my Intel Core 2 box in a
way that can't be benchmarked. Examples:

mplayer using OpenGL renderer doesn't drop frames anymore when dragging
and dropping the video window around in an OpenGL composited desktop
(KDE 4.3.1). (Start moving the mplayer window around; then drop it. At
the moment the move starts and at the moment you drop the window back to
the desktop, there's a big frame skip as if mplayer was frozen for a
bit; around 200 or 300ms.)

Composite desktop effects like zoom and fade out don't stall for
sub-second periods of time while there's CPU load in the background. In
other words, the desktop is more fluid and less skippy even during heavy
CPU load. Moving windows around with CPU load in the background doesn't
result in short skips.

LMMS (a tool utilizing real-time sound synthesis) does not produce
"pops", "crackles" and drops in the sound during real-time playback due
to buffer under-runs. Those problems amplify when there's heavy CPU
load in the background, while with BFS heavy load doesn't produce those
artifacts (though LMMS makes itself run SCHED_ISO with BFS). Also,
hitting a key on the keyboard needs less time for the note to become
audible when using BFS. The same should hold true for other tools that
traditionally benefit from the "-rt" kernel sources.

Games like Doom 3 and such don't "freeze" periodically for small amounts
of time (again for sub-second amounts) when something in the background
grabs CPU time (be it my mailer checking for new mail or a cron job, or
whatever.)

And, the most drastic improvement here, with BFS I can do a "make -j2"
in the kernel tree and the GUI stays fluid. Without BFS, things start
to lag, even with in-RAM builds (like having the whole kernel tree
inside a tmpfs) and gcc running with nice 19 and ionice -c 3.

Unfortunately, I can't come up with any way to somehow benchmark all of
this. There's no benchmark for "fluidity" and "responsiveness".
Running the Doom 3 benchmark, or any other benchmark, doesn't say
anything about responsiveness, it only measures how many frames were
calculated in a specific period of time. How "stable" (with no stalls)
those frames were making it to the screen is not measurable.

If BFS implied small drops in pure performance, counted in instructions
per second, that would be a totally acceptable regression for
desktop/multimedia/gaming PCs. Not for server machines, of course.
However, on my machine, BFS is faster in classic workloads. When I
run "make -j2" with BFS and the standard scheduler, BFS always finishes
a bit faster. Not by much, but still. One thing I'm noticing here is
that BFS produces 100% CPU load on each core with "make -j2" while the
normal scheduler stays at about 90-95% with -j2 or higher in at least
one of the cores. There seems to be under-utilization of CPU time.

Also, from searching around the net and from discussions on various
mailing lists, there seems to be a trend: the problems for some reason
seem to occur more often with Intel CPUs (Core 2 chips and lower; I
can't say anything about Core i7), while people on AMD CPUs are mostly
not affected by most or even all of the above. (And because of this,
flame wars often break out, with one party accusing the other of
imagining things.) Can the integrated memory controller on AMD chips
have something to do with this? Do AMD chips generally offer better
"multithreading" behavior? Unfortunately, you didn't mention what CPU
you ran your tests on. If it was AMD, it might be a good idea to run
tests on Pentium and Core 2 CPUs.

For reference, my system is:

CPU: Intel Core 2 Duo E6600 (2.4GHz)
Mainboard: Asus P5E (Intel X38 chipset)
RAM: 6GB (2+2+1+1) dual channel DDR2 800
GPU: RV770 (Radeon HD4870).

2009-09-07 03:56:14

by Con Kolivas

Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

2009/9/7 Ingo Molnar <[email protected]>:
> hi Con,

Sigh..

Well hello there.

>
> I've read your BFS announcement/FAQ with great interest:
>
>    http://ck.kolivas.org/patches/bfs/bfs-faq.txt

> I understand that BFS is still early code and that you are not
> targeting BFS for mainline inclusion - but BFS is an interesting
> and bold new approach, cutting a _lot_ of code out of
> kernel/sched*.c, so it raised my curiosity and interest :-)

Hard to keep a project under wraps and get an audience at the same
time, it is. I do realise it was inevitable LKML would invade my
personal space no matter how much I didn't want it to, but it would be
rude of me to not respond.

> In the announcement and on your webpage you have compared BFS to
> the mainline scheduler in various workloads - showing various
> improvements over it. I have tried and tested BFS and ran a set of
> benchmarks - this mail contains the results and my (quick)
> findings.

/me sees Ingo run off to find the right combination of hardware and
benchmark to prove his point.

[snip lots of bullshit meaningless benchmarks showing how great cfs is
and/or how bad bfs is, along with telling people they should use these
artificial benchmarks to determine how good it is, demonstrating yet
again why benchmarks fail the desktop]

I'm not interested in a long protracted discussion about this since
I'm too busy to live linux the way full time developers do, so I'll
keep it short, and perhaps you'll understand my intent better if the
FAQ wasn't clear enough.


Do you know what a normal desktop PC looks like? No, a more realistic
question based on what you chose to benchmark to prove your point
would be: Do you know what normal people actually do on them?


Feel free to treat the question as rhetorical.

Regards,
-ck

/me checks on his distributed computing client's progress, fires up
his next H264 encode, changes music tracks and prepares to have his
arse whooped on quakelive.

2009-09-07 09:49:53

by Jens Axboe

Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Sun, Sep 06 2009, Ingo Molnar wrote:
> So ... to get to the numbers - i've tested both BFS and the tip of
> the latest upstream scheduler tree on a testbox of mine. I
> intentionally didnt test BFS on any really large box - because you
> described its upper limit like this in the announcement:

I ran a simple test as well, since I was curious to see how it performed
wrt interactiveness. One of my pet peeves with the current scheduler is
that I have to nice compile jobs, or my X experience is just awful while
the compile is running.

Now, this test case is something that attempts to see what
interactiveness would be like. It'll run a given command line while at
the same time logging delays. The delays are measured as follows:

- The app creates a pipe, and forks a child that blocks on reading from
that pipe.
- The app sleeps for a random period of time, anywhere between 100ms
and 2s. When it wakes up, it gets the current time and writes that to
the pipe.
- The child then gets woken, checks the time on its own, and logs the
difference between the two.

The idea here being that the delay between writing to the pipe and the
child reading the data and comparing should (in some way) be indicative
of how responsive the system would seem to a user.
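
A minimal sketch of that measuring idea could look like the following
(this is not Jens' actual test app, just an approximation of the
description above - the compile job would be run alongside it):

/* wakeup-delay sketch: parent writes a timestamp into a pipe after a
 * random 100ms-2s sleep, the blocked child logs how late it sees it */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

static double now_ms(void)
{
        struct timeval tv;

        gettimeofday(&tv, NULL);
        return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
}

int main(void)
{
        int fd[2];
        double ts;

        if (pipe(fd)) {
                perror("pipe");
                return 1;
        }
        if (fork() == 0) {
                /* child: block on the pipe, log each wakeup delay */
                while (read(fd[0], &ts, sizeof(ts)) == sizeof(ts))
                        printf("delay: %.1f ms\n", now_ms() - ts);
                return 0;
        }
        srand(getpid());
        for (;;) {
                struct timespec req;
                long us = 100000 + rand() % 1900000;

                /* sleep a random period between 100ms and 2s */
                req.tv_sec = us / 1000000;
                req.tv_nsec = (us % 1000000) * 1000L;
                nanosleep(&req, NULL);

                ts = now_ms();
                if (write(fd[1], &ts, sizeof(ts)) != sizeof(ts))
                        break;
        }
        return 0;
}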

The test app was quickly hacked up, so don't put too much into it. The
test run is a simple kernel compile, using -jX where X is the number of
threads in the system. The files are cache hot, so little IO is done.
The -x2 run uses double the number of processes as there are threads,
e.g. -j128 on a 64 thread box.

And I have to apologize for using a large system to test this on, I
realize it's out of the scope of BFS, but it's just easier to fire one
of these beasts up than it is to sacrifice my notebook or desktop
machine... So it's a 64 thread box. CFS -jX runtime is the baseline at
100, lower number means faster and vice versa. The latency numbers are
in msecs.


Scheduler Runtime Max lat Avg lat Std dev
----------------------------------------------------------------
CFS 100 951 462 267
CFS-x2 100 983 484 308
BFS
BFS-x2

And unfortunately this is where it ends for now, since BFS doesn't boot
on the two boxes I tried. It hard hangs right after disk detection. But
the latency numbers look pretty appalling for CFS, so it's a bit of a
shame that I did not get to compare. I'll try again later with a newer
revision, when available.

--
Jens Axboe

2009-09-07 10:12:39

by Nikos Chantziaras

Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/07/2009 12:49 PM, Jens Axboe wrote:
> [...]
> And I have to apologize for using a large system to test this on, I
> realize it's out of the scope of BFS, but it's just easier to fire one
> of these beasts up than it is to sacrifice my notebook or desktop
> machine...

How does a kernel rebuild constitute "sacrifice"?


> So it's a 64 thread box. CFS -jX runtime is the baseline at
> 100, lower number means faster and vice versa. The latency numbers are
> in msecs.
>
>
> Scheduler Runtime Max lat Avg lat Std dev
> ----------------------------------------------------------------
> CFS 100 951 462 267
> CFS-x2 100 983 484 308
> BFS
> BFS-x2
>
> And unfortunately this is where it ends for now, since BFS doesn't boot
> on the two boxes I tried.

Then why post this in the first place?

2009-09-07 10:41:17

by Jens Axboe

Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Mon, Sep 07 2009, Nikos Chantziaras wrote:
> On 09/07/2009 12:49 PM, Jens Axboe wrote:
>> [...]
>> And I have to apologize for using a large system to test this on, I
>> realize it's out of the scope of BFS, but it's just easier to fire one
>> of these beasts up than it is to sacrifice my notebook or desktop
>> machine...
>
> How does a kernel rebuild constitute "sacrifice"?

It's more of a bother since I have to physically be at the notebook,
whereas the server type boxes usually have remote management. The
workstation is what I'm using right now, so it'd be very disruptive to
do it there.
And as things are apparently very alpha on the bfs side currently, it's
easier to 'sacrifice' an idle test box. That's the keyword, 'test'
boxes. You know, machines used for testing. Not production machines.

Plus the notebook is using btrfs, which isn't on-disk format compatible
with 2.6.30.

Is there a point to this question?

>> So it's a 64 thread box. CFS -jX runtime is the baseline at
>> 100, lower number means faster and vice versa. The latency numbers are
>> in msecs.
>>
>>
>> Scheduler Runtime Max lat Avg lat Std dev
>> ----------------------------------------------------------------
>> CFS 100 951 462 267
>> CFS-x2 100 983 484 308
>> BFS
>> BFS-x2
>>
>> And unfortunately this is where it ends for now, since BFS doesn't boot
>> on the two boxes I tried.
>
> Then why post this in the first place?

You snipped the relevant part of the conclusion, the part where I make a
comment on the cfs latencies.

Don't bother replying to any of my emails if YOU continue writing emails
in this fashion. I have MUCH better things to do than entertain kiddies.
If you do get your act together and want to reply, follow lkml etiquette
and group reply.

--
Jens Axboe

2009-09-07 11:01:53

by Frederic Weisbecker

Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Mon, Sep 07, 2009 at 06:38:36AM +0300, Nikos Chantziaras wrote:
> Unfortunately, I can't come up with any way to somehow benchmark all of
> this. There's no benchmark for "fluidity" and "responsiveness". Running
> the Doom 3 benchmark, or any other benchmark, doesn't say anything about
> responsiveness, it only measures how many frames were calculated in a
> specific period of time. How "stable" (with no stalls) those frames were
> making it to the screen is not measurable.



That should eventually be benchmarkable. This is about latency.
For example, you could run high load tasks in the background and
then launch a task that wakes up at middling/large intervals to do
something. You could measure the time it takes for it to wake up
and perform what it wants.

We have some events tracing infrastructure in the kernel that can
snapshot the wake up and sched switch events.

Having CONFIG_EVENT_TRACING=y should be sufficient for that.

You just need to mount a debugfs point, say in /debug.

Then you can activate these sched events by doing:

echo 0 > /debug/tracing/tracing_on
echo 1 > /debug/tracing/events/sched/sched_switch/enable
echo 1 > /debug/tracing/events/sched/sched_wakeup/enable

#Launch your tasks

echo 1 > /debug/tracing/tracing_on

#Wait for some time

echo 0 > /debug/tracing/tracing_on

That will require some parsing of the result in /debug/tracing/trace
to get the delays between wakeup events and switch-in events for the
task that periodically wakes up, and then producing some statistics
such as the average or the maximum latency.

That's a bit of a rough approach to measure such latencies but that
should work.
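
As a starting point, such a parser could look like the rough sketch
below. It assumes the usual human-readable trace format where a
seconds.microseconds timestamp immediately precedes the event name on
each line, and it matches the traced task simply by name (passed as the
first argument); the exact field layout of the sched events differs
between kernel versions, so treat it only as an illustration of the
idea:

/* trace-lat.c: rough parser for /debug/tracing/trace, computing
 * sched_wakeup -> sched_switch delays for one task, matched by name */
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static double event_timestamp(const char *line, const char *ev)
{
        const char *p = ev;

        while (p > line && (p[-1] == ' ' || p[-1] == ':'))
                p--;                    /* step back over ": " */
        while (p > line && (isdigit((unsigned char)p[-1]) || p[-1] == '.'))
                p--;                    /* step back over 1234.567890 */
        return atof(p);
}

int main(int argc, char **argv)
{
        const char *task = argc > 1 ? argv[1] : "mytask";
        char line[1024];
        double wake_ts = 0.0, sum = 0.0, max = 0.0;
        int pending = 0;
        long n = 0;
        FILE *f = fopen("/debug/tracing/trace", "r");

        if (!f) {
                perror("trace");
                return 1;
        }
        while (fgets(line, sizeof(line), f)) {
                char *ev;

                if ((ev = strstr(line, "sched_wakeup:"))) {
                        /* a wakeup of our task: remember when it happened */
                        if (strstr(ev, task)) {
                                wake_ts = event_timestamp(line, ev);
                                pending = 1;
                        }
                } else if (pending && (ev = strstr(line, "sched_switch:"))) {
                        /* the next switch *to* our task closes the interval */
                        char *to = strstr(ev, "==>");

                        if (to && strstr(to, task)) {
                                double d = event_timestamp(line, ev) - wake_ts;

                                sum += d;
                                if (d > max)
                                        max = d;
                                n++;
                                pending = 0;
                        }
                }
        }
        fclose(f);
        if (n)
                printf("%ld wakeups, avg %.3f ms, max %.3f ms\n",
                       n, 1e3 * sum / n, 1e3 * max);
        return 0;
}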


> If BFS implied small drops in pure performance, counted in
> instructions per second, that would be a totally acceptable regression
> for desktop/multimedia/gaming PCs. Not for server machines, of course.
> However, on my machine, BFS is faster in classic workloads. When I run
> "make -j2" with BFS and the standard scheduler, BFS always finishes a bit
> faster. Not by much, but still. One thing I'm noticing here is that BFS
> produces 100% CPU load on each core with "make -j2" while the normal
> scheduler stays at about 90-95% with -j2 or higher in at least one of the
> cores. There seems to be under-utilization of CPU time.



That could also be benchmarked by using the above sched events and
looking at the average time each cpu spends running the idle task.

2009-09-07 11:57:50

by Jens Axboe

Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Mon, Sep 07 2009, Jens Axboe wrote:
> Scheduler Runtime Max lat Avg lat Std dev
> ----------------------------------------------------------------
> CFS 100 951 462 267
> CFS-x2 100 983 484 308
> BFS
> BFS-x2

Those numbers are buggy, btw, it's not nearly as bad. But responsiveness
under compile load IS bad though, the test app just didn't quantify it
correctly. I'll see if I can get it working properly.

--
Jens Axboe

2009-09-07 12:16:19

by Ingo Molnar

Subject: [quad core results] BFS vs. mainline scheduler benchmarks and measurements


* Frans Pop <[email protected]> wrote:

> Ingo Molnar wrote:
> > So the testbox i picked fits into the upper portion of what i
> > consider a sane range of systems to tune for - and should still fit
> > into BFS's design bracket as well according to your description:
> > it's a dual quad core system with hyperthreading.
>
> Ingo,
>
> Nice that you've looked into this.
>
> Would it be possible for you to run the same tests on e.g. a dual
> core and/or a UP system (or maybe just offline some CPUs?)? It
> would be very interesting to see whether BFS does better in the
> lower portion of the range, or if the differences you show between
> the two schedulers are consistent across the range.

Sure!

Note that usually we can extrapolate ballpark-figure quad and dual
socket results from 8 core results. Trends as drastic as the ones
i reported do not get reversed as one shrinks the number of cores.

[ This technique is not universal - for example borderline graphs
cannot be extrapolated down reliably - but the graphs i
posted were far from borderline. ]

Con posted single-socket quad comparisons/graphs so to make it 100%
apples to apples i re-tested with a single-socket (non-NUMA) quad as
well, and have uploaded the new graphs/results to:

kernel build performance on quad:
http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild-quad.jpg

pipe performance on quad:
http://redhat.com/~mingo/misc/bfs-vs-tip-pipe-quad.jpg

messaging performance (hackbench) on quad:
http://redhat.com/~mingo/misc/bfs-vs-tip-messaging-quad.jpg

OLTP performance (postgresql + sysbench) on quad:
http://redhat.com/~mingo/misc/bfs-vs-tip-oltp-quad.jpg

It shows similar curves and behavior to the 8-core results i posted
- BFS is slower than mainline in virtually every measurement. The
ratios are different for different parts of the graphs - but the
trend is similar.

I also re-ran a few standalone kernel latency tests with a single
quad:

lat_tcp:

BFS: TCP latency using localhost: 16.9926 microseconds
sched-devel: TCP latency using localhost: 12.4141 microseconds [36.8% faster]

as a comparison, the 8 core lat_tcp result was:

BFS: TCP latency using localhost: 16.5608 microseconds
sched-devel: TCP latency using localhost: 13.5528 microseconds [22.1% faster]

lat_pipe quad result:

BFS: Pipe latency: 4.6978 microseconds
sched-devel: Pipe latency: 2.6860 microseconds [74.8% faster]

as a comparison, the 8 core lat_pipe result was:

BFS: Pipe latency: 4.9703 microseconds
sched-devel: Pipe latency: 2.6137 microseconds [90.1% faster]

On the desktop interactivity front, i also still saw that bad
starvation artifact with BFS with multiple copies of CPU-bound
pipe-test-1m.c running in parallel:

http://redhat.com/~mingo/cfs-scheduler/tools/pipe-test-1m.c

Start up a few copies of them like this:

for ((i=0;i<32;i++)); do ./pipe-test-1m & done

and the quad eventually came to a halt here - until the tasks
finished running.

I also tested a few key data points on dual core and it shows
similar trends as well (as expected from the 8 and 4 core results).

But ... i'd really encourage everyone to test these things yourself
as well and not take anyone's word on this as granted. The more
people provide numbers, the better. The latest BFS patch can be
found at:

http://ck.kolivas.org/patches/bfs/

The mainline sched-devel tree can be found at:

http://people.redhat.com/mingo/tip.git/README

Thanks,

Ingo

2009-09-07 12:37:06

by Stefan Richter

Subject: Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements

Ingo Molnar wrote:
> i'd really encourage everyone to test these things yourself
> as well and not take anyone's word on this as granted. The more
> people provide numbers, the better.

Besides mean values from bandwidth and latency focused tests, standard
deviations or variance, or e.g. 90th percentiles and perhaps maxima of
latency focused tests might be of interest. Or graphs with error bars.
--
Stefan Richter
-=====-==--= =--= --===
http://arcgraph.de/sr/

2009-09-07 13:41:51

by Markus Tornqvist

Subject: Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements

Please Cc me as I'm not a subscriber.

(LKML bounced this message once already for 8-bit headers, I'm retrying
now - sorry if someone gets it twice)

On Mon, Sep 07, 2009 at 02:16:13PM +0200, Ingo Molnar wrote:
>
>Con posted single-socket quad comparisons/graphs so to make it 100%
>apples to apples i re-tested with a single-socket (non-NUMA) quad as
>well, and have uploaded the new graphs/results to:
>
> kernel build performance on quad:
> http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild-quad.jpg
[...]
>
>It shows similar curves and behavior to the 8-core results i posted
>- BFS is slower than mainline in virtually every measurement. The
>ratios are different for different parts of the graphs - but the
>trend is similar.

Dude, not cool.

1. Quad HT is not the same as a 4-core desktop, you're doing it with 8 cores
2. You just proved BFS is better on the job_count == core_count case, as BFS
says it is, if you look at the graph
3. You're comparing an old version of BFS against an unreleased dev kernel

Also, you said on http://article.gmane.org/gmane.linux.kernel/886319
"I also tried to configure the kernel in a BFS friendly way, i used
HZ=1000 as recommended, turned off all debug options, etc. The
kernel config i used can be found here:
http://redhat.com/~mingo/misc/config
"

Quickly looking at the conf you have
CONFIG_HZ_250=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

CONFIG_ARCH_WANT_FRAME_POINTERS=y
CONFIG_FRAME_POINTER=y

And other DEBUG.

--
mjt

2009-09-07 13:59:36

by Ingo Molnar

Subject: Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements


* Markus Törnqvist <[email protected]> wrote:

> Please Cc me as I'm not a subscriber.
>
> On Mon, Sep 07, 2009 at 02:16:13PM +0200, Ingo Molnar wrote:
> >
> >Con posted single-socket quad comparisons/graphs so to make it 100%
> >apples to apples i re-tested with a single-socket (non-NUMA) quad as
> >well, and have uploaded the new graphs/results to:
> >
> > kernel build performance on quad:
> > http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild-quad.jpg
> [...]
> >
> >It shows similar curves and behavior to the 8-core results i posted
> >- BFS is slower than mainline in virtually every measurement. The
> >ratios are different for different parts of the graphs - but the
> >trend is similar.
>
> Dude, not cool.
>
> 1. Quad HT is not the same as a 4-core desktop, you're doing it with 8 cores

No, it's 4 cores. HyperThreading adds two 'siblings' per core, which
are not 'cores'.

> 2. You just proved BFS is better on the job_count == core_count case, as BFS
> says it is, if you look at the graph

I pointed that out too. I think the graphs speak for themselves:

http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild-quad.jpg
http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild.jpg

> 3. You're comparing an old version of BFS against an unreleased dev kernel

bfs-208 was 1 day old (and it is a 500K+ kernel patch) when i tested
it against the 2 days old sched-devel tree. Btw., i initially
measured 205 as well and spent one more day on acquiring and
analyzing the 208 results.

There's bfs-209 out there today. These tests take 8+ hours to
complete and validate. I'll re-test BFS in the future too, and as i
said it in the first mail i'll test it on a .31 base as well once
BFS has been ported to it:

> > It's on a .31-rc8 base while BFS is on a .30 base - will be able
> > to test BFS on a .31 base as well once you release it. (but it
> > doesnt matter much to the results - there werent any heavy core
> > kernel changes impacting these workloads.)

> Also, you said on http://article.gmane.org/gmane.linux.kernel/886319
> "I also tried to configure the kernel in a BFS friendly way, i used
> HZ=1000 as recommended, turned off all debug options, etc. The
> kernel config i used can be found here:
> http://redhat.com/~mingo/misc/config
> "
>
> Quickly looking at the conf you have
> CONFIG_HZ_250=y
> CONFIG_PREEMPT_NONE=y
> # CONFIG_PREEMPT_VOLUNTARY is not set
> # CONFIG_PREEMPT is not set

Indeed. HZ does not seem to matter according to what i see in my
measurements. Can you measure such sensitivity?

> CONFIG_ARCH_WANT_FRAME_POINTERS=y
> CONFIG_FRAME_POINTER=y
>
> And other DEBUG.

These are the defaults and they dont make a measurable difference to
these results. What other debug options do you mean and do they make
a difference?

Ingo

2009-09-07 14:15:13

by Ingo Molnar

Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Jens Axboe <[email protected]> wrote:

> On Mon, Sep 07 2009, Jens Axboe wrote:
> > Scheduler Runtime Max lat Avg lat Std dev
> > ----------------------------------------------------------------
> > CFS 100 951 462 267
> > CFS-x2 100 983 484 308
> > BFS
> > BFS-x2
>
> Those numbers are buggy, btw, it's not nearly as bad. But
> responsiveness under compile load IS bad though, the test app just
> didn't quantify it correctly. I'll see if I can get it working
> properly.

What's the default latency target on your box:

cat /proc/sys/kernel/sched_latency_ns

?

And yes, it would be wonderful to get a test-app from you that would
express the kind of pain you are seeing during compile jobs.

Ingo

2009-09-07 14:37:17

by Arjan van de Ven

Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Mon, 07 Sep 2009 06:38:36 +0300
Nikos Chantziaras <[email protected]> wrote:

> On 09/06/2009 11:59 PM, Ingo Molnar wrote:
> >[...]
> > Also, i'd like to outline that i agree with the general goals
> > described by you in the BFS announcement - small desktop systems
> > matter more than large systems. We find it critically important
> > that the mainline Linux scheduler performs well on those systems
> > too - and if you (or anyone else) can reproduce suboptimal behavior
> > please let the scheduler folks know so that we can fix/improve it.
>
> BFS improved behavior of many applications on my Intel Core 2 box in
> a way that can't be benchmarked. Examples:

Have you tried to see if latencytop catches such latencies ?

2009-09-07 14:41:50

by Arjan van de Ven

Subject: Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements

On Mon, 7 Sep 2009 16:41:51 +0300
> >It shows similar curves and behavior to the 8-core results i posted
> >- BFS is slower than mainline in virtually every measurement. The
> >ratios are different for different parts of the graphs - but the
> >trend is similar.
>
> Dude, not cool.
>
> 1. Quad HT is not the same as a 4-core desktop, you're doing it with
> 8 cores

4 cores, 8 threads. Which is basically the standard desktop cpu going
forward... (4 cores already is today, 8 threads is that any day now)



--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-09-07 15:16:59

by Michael Büsch

Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

Here's a very simple test setup on an embedded singlecore bcm47xx machine (WL500GPv2)
It uses iperf for performance testing. The iperf server is run on the
embedded device. The device is so slow that the iperf test is completely
CPU bound. The network connection is a 100MBit on the device connected
via patch cable to a 1000MBit machine.

The kernel is openwrt-2.6.30.5.

Here are the results:



Mainline CFS scheduler:

mb@homer:~$ iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.99 port 35793 connected with 192.168.1.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 27.4 MBytes 23.0 Mbits/sec
mb@homer:~$ iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.99 port 35794 connected with 192.168.1.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 27.3 MBytes 22.9 Mbits/sec
mb@homer:~$ iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.99 port 56147 connected with 192.168.1.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 27.3 MBytes 22.9 Mbits/sec


BFS scheduler:

mb@homer:~$ iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.99 port 52489 connected with 192.168.1.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 38.2 MBytes 32.0 Mbits/sec
mb@homer:~$ iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.99 port 52490 connected with 192.168.1.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 38.1 MBytes 31.9 Mbits/sec
mb@homer:~$ iperf -c 192.168.1.1
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.99 port 52491 connected with 192.168.1.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 38.1 MBytes 31.9 Mbits/sec


--
Greetings, Michael.

2009-09-07 15:20:36

by Frans Pop

Subject: Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements

On Monday 07 September 2009, Arjan van de Ven wrote:
> 4 cores, 8 threads. Which is basically the standard desktop cpu going
> forward... (4 cores already is today, 8 threads is that any day now)

Despite that I'm personally more interested in what I have available here
*now*. And that's various UP Pentium systems, one dual core Pentium D and
Core Duo.

I've been running BFS on my laptop today while doing CPU intensive jobs
(not disk intensive), and I must say that BFS does seem very responsive.
OTOH, I've also noticed some surprising things, such as processors staying
on lower frequencies while doing CPU-intensive work.

It feels like I have fewer of the mouse cursor and typing freezes I'm used
to with CFS, even when I'm *not* doing anything special. I've been
blaming those on still running with ordered mode ext3, but now I'm
starting to wonder.

I'll try to do more structured testing, comparisons and measurements
later. At the very least it's nice to have something to compare _with_.

Cheers,
FJP

2009-09-07 15:24:42

by Xavier Bestel

Subject: Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements


On Mon, 2009-09-07 at 07:45 -0700, Arjan van de Ven wrote:
> On Mon, 7 Sep 2009 16:41:51 +0300
> > >It shows similar curves and behavior to the 8-core results i posted
> > >- BFS is slower than mainline in virtually every measurement. The
> > >ratios are different for different parts of the graphs - but the
> > >trend is similar.
> >
> > Dude, not cool.
> >
> > 1. Quad HT is not the same as a 4-core desktop, you're doing it with
> > 8 cores
>
> 4 cores, 8 threads. Which is basically the standard desktop cpu going
> forward... (4 cores already is today, 8 threads is that any day now)

Except on your typical smartphone, which will run linux and probably
vastly outnumber the number of "traditional" linux desktops.

Xav


2009-09-07 15:34:22

by Nikos Chantziaras

Subject: Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements

On 09/07/2009 03:16 PM, Ingo Molnar wrote:
> [...]
> Note that usually we can extrapolate ballpark-figure quad and dual
> socket results from 8 core results. Trends as drastic as the ones
> i reported do not get reversed as one shrinks the number of cores.
>
> Con posted single-socket quad comparisons/graphs so to make it 100%
> apples to apples i re-tested with a single-socket (non-NUMA) quad as
> well, and have uploaded the new graphs/results to:
>
> kernel build performance on quad:
> http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild-quad.jpg
>
> pipe performance on quad:
> http://redhat.com/~mingo/misc/bfs-vs-tip-pipe-quad.jpg
>
> messaging performance (hackbench) on quad:
> http://redhat.com/~mingo/misc/bfs-vs-tip-messaging-quad.jpg
>
> OLTP performance (postgresql + sysbench) on quad:
> http://redhat.com/~mingo/misc/bfs-vs-tip-oltp-quad.jpg
>
> It shows similar curves and behavior to the 8-core results i posted
> - BFS is slower than mainline in virtually every measurement.

Except for numbers, what's your *experience* with BFS when it comes to
composited desktops + games + multimedia apps? (Watching high
definition videos, playing some latest high-tech 3D game, etc.) I
described the exact problems experienced with mainline in a previous reply.

Are you actually using that stuff? Because it would be hard to
tell if your desktop consists mainly of Emacs and an xterm; you even
seem to be using Mutt so I suspect your desktop probably doesn't look
very Windows Vista/OS X/Compiz-like. Usually, with "multimedia desktop
PC" one doesn't mean:

http://foss.math.aegean.gr/~realnc/pics/desktop2.png

but rather:

http://foss.math.aegean.gr/~realnc/pics/desktop1.png

BFS probably wouldn't offer the former anything, while on the latter it
does make a difference. If your usage of the "desktop" bears a
resemblance to the first example, I'd say you might not be the most
qualified person to judge the "Linux desktop experience." That is not
meant to be offensive or patronizing, just an observation, and I might
even be totally wrong about it.

2009-09-07 15:33:27

by Arjan van de Ven

Subject: Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements

On Mon, 7 Sep 2009 17:20:33 +0200
Frans Pop <[email protected]> wrote:

> On Monday 07 September 2009, Arjan van de Ven wrote:
> > 4 cores, 8 threads. Which is basically the standard desktop cpu
> > going forward... (4 cores already is today, 8 threads is that any
> > day now)
>
> Despite that I'm personally more interested in what I have available
> here *now*. And that's various UP Pentium systems, one dual core
> Pentium D and Core Duo.
>
> I've been running BFS on my laptop today while doing CPU intensive
> jobs (not disk intensive), and I must say that BFS does seem very
> responsive. OTOH, I've also noticed some surprising things, such as
> processors staying on lower frequencies while doing CPU-intensive
> work.
>
> It feels like I have fewer of the mouse cursor and typing freezes I'm
> used to with CFS, even when I'm *not* doing anything special. I've
> been blaming those on still running with ordered mode ext3, but now
> I'm starting to wonder.
>
> I'll try to do more structured testing, comparisons and measurements
> later. At the very least it's nice to have something to compare
> _with_.
>

it's a shameless plug since I wrote it, but latencytop will be able to
tell you what your bottleneck is...
and that is very interesting to know, regardless of the "what scheduler
code" discussion;

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-09-07 15:33:52

by Arjan van de Ven

Subject: Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements

On Mon, 07 Sep 2009 17:24:29 +0200
Xavier Bestel <[email protected]> wrote:

>
> On Mon, 2009-09-07 at 07:45 -0700, Arjan van de Ven wrote:
> > On Mon, 7 Sep 2009 16:41:51 +0300
> > > >It shows similar curves and behavior to the 8-core results i
> > > >posted
> > > >- BFS is slower than mainline in virtually every measurement.
> > > >The ratios are different for different parts of the graphs - but
> > > >the trend is similar.
> > >
> > > Dude, not cool.
> > >
> > > 1. Quad HT is not the same as a 4-core desktop, you're doing it
> > > with 8 cores
> >
> > 4 cores, 8 threads. Which is basically the standard desktop cpu
> > going forward... (4 cores already is today, 8 threads is that any
> > day now)
>
> Except on your typical smartphone, which will run linux and probably
> vastly outnumber the number of "traditional" linux desktops.

yeah the trend in cellphones is only quad core without HT, not quad
core WITH ht ;-)



--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-09-07 15:47:50

by Frans Pop

Subject: Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements

On Monday 07 September 2009, Arjan van de Ven wrote:
> it's a shameless plug since I wrote it, but latencytop will be able to
> tell you what your bottleneck is...
> and that is very interesting to know, regardless of the "what scheduler
> code" discussion;

I'm very much aware of that and I've tried pinning it down a few times,
but failed to come up with anything conclusive. I plan to make a new
effort in this context as the freezes have increasingly been annoying me.

Unfortunately latencytop only shows a blank screen when used with BFS, but
I guess that's not totally unexpected.

Cheers,
FJP

2009-09-07 15:59:01

by Diego Calleja

Subject: Re: [quad core results] BFS vs. mainline scheduler benchmarks and measurements

On Monday 07 September 2009 17:24:29, Xavier Bestel wrote:
> Except on your typical smartphone, which will run linux and probably
> vastly outnumber the number of "traditional" linux desktops.

Smartphones will probably start using ARM dual-core CPUs next year;
the embedded world is not SMP-free.

2009-09-07 17:38:45

by Jens Axboe

Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Mon, Sep 07 2009, Ingo Molnar wrote:
>
> * Jens Axboe <[email protected]> wrote:
>
> > On Mon, Sep 07 2009, Jens Axboe wrote:
> > > Scheduler Runtime Max lat Avg lat Std dev
> > > ----------------------------------------------------------------
> > > CFS 100 951 462 267
> > > CFS-x2 100 983 484 308
> > > BFS
> > > BFS-x2
> >
> > Those numbers are buggy, btw, it's not nearly as bad. But
> > responsiveness under compile load IS bad though, the test app just
> > didn't quantify it correctly. I'll see if I can get it working
> > properly.
>
> What's the default latency target on your box:
>
> cat /proc/sys/kernel/sched_latency_ns
>
> ?

It's off right now, but it is set to whatever is the default. I don't
touch it.

> And yes, it would be wonderful to get a test-app from you that would
> express the kind of pain you are seeing during compile jobs.

I was hoping this one would, but it's not showing anything. I even added
support for doing the ping and wakeup over a socket, to see if the pipe
test was doing well because of the sync wakeup we do there. The net
latency is a little worse, but still good. So no luck in making that app
so far.

--
Jens Axboe

2009-09-07 17:56:26

by Avi Kivity

Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/07/2009 12:49 PM, Jens Axboe wrote:
>
> I ran a simple test as well, since I was curious to see how it performed
> wrt interactiveness. One of my pet peeves with the current scheduler is
> that I have to nice compile jobs, or my X experience is just awful while
> the compile is running.
>

I think the problem is that CFS is optimizing for the wrong thing. It's
trying to be fair to tasks, but these are meaningless building blocks of
jobs, which is what the user sees and measures. Your make -j128
dominates your interactive task by two orders of magnitude. If the
scheduler attempts to bridge this gap using heuristics, it will fail
badly when it misdetects since it will starve the really important
100-thread job for a task that was misdetected as interactive.

I think that bash (and the GUI shell) should put any new job (for bash,
a pipeline; for the GUI, an application launch from the menu) in a
scheduling group of its own. This way it will have equal weight in the
scheduler's eyes with interactive tasks; one will not dominate the
other. Of course if the cpu is free the compile job is welcome to use
all 128 threads.

(similarly, different login sessions should be placed in different jobs
to prevent a heavily multithreaded screensaver from overwhelming ed).
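
For what it's worth, a rough sketch of that idea is possible today with
the cpu cgroup controller - the /cgroup/cpu mount point and the
job-<pid> naming below are just illustrative assumptions, not the
actual bash/GUI integration being proposed. A tiny launcher creates a
group per job, moves itself into it and execs the command, so
everything the job spawns inherits the group:

/* per-job scheduling group launcher (sketch); assumes the cpu cgroup
 * controller is already mounted at CGROUP_ROOT, e.g.:
 *   mount -t cgroup -o cpu none /cgroup/cpu
 */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define CGROUP_ROOT "/cgroup/cpu"

int main(int argc, char **argv)
{
        char path[256];
        FILE *f;

        if (argc < 2) {
                fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
                return 1;
        }
        /* create a fresh group for this job, named after our pid */
        snprintf(path, sizeof(path), CGROUP_ROOT "/job-%d", (int)getpid());
        if (mkdir(path, 0755) && errno != EEXIST) {
                perror("mkdir");
                return 1;
        }
        /* move ourselves into the group; the exec'ed job and all of
         * its children inherit it */
        strncat(path, "/tasks", sizeof(path) - strlen(path) - 1);
        f = fopen(path, "w");
        if (!f) {
                perror("tasks");
                return 1;
        }
        fprintf(f, "%d\n", (int)getpid());
        fclose(f);

        execvp(argv[1], &argv[1]);
        perror("execvp");
        return 1;
}

Launching 'make -j128' through such a wrapper would then compete with
an interactive task as one scheduling entity against another, rather
than as 128 runnable tasks against one.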

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2009-09-07 18:34:28

by Jerome Glisse

Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Mon, 2009-09-07 at 13:50 +1000, Con Kolivas wrote:

> /me checks on his distributed computing client's progress, fires up
> his next H264 encode, changes music tracks and prepares to have his
> arse whooped on quakelive.
> --

For such computer usage i would strongly suggest that you look into
GPU driver development - there is a lot of performance to be won in
this area, and my feeling is that you can improve what you are doing:
games -> OpenGL (so GPU), H264 (encoding is harder to accelerate
with a GPU, but for decoding and displaying it you definitely want
to involve the GPU), and tons of other things you are doing on your
linux desktop would go faster if the GPU was put to more use. A wild
guess is that you could get a 2 or even 3 figure percentage improvement
with better GPU drivers. My point is that i don't think a linux
scheduler improvement (compared to what we have now) will give a
significant boost to the linux desktop; on the contrary, even a
slight improvement to the GPU driver stack can give you a boost.
Another way of saying that: there is no point in prioritizing X or
desktop apps if the CPU has to do all the drawing by itself (the CPU
is several orders of magnitude slower than the GPU at that kind of
task).

Regards,
Jerome Glisse

2009-09-07 18:26:38

by Ingo Molnar

Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Michael Buesch <[email protected]> wrote:

> Here's a very simple test setup on an embedded singlecore bcm47xx
> machine (WL500GPv2) It uses iperf for performance testing. The
> iperf server is run on the embedded device. The device is so slow
> that the iperf test is completely CPU bound. The network
> connection is a 100MBit on the device connected via patch cable to
> a 1000MBit machine.
>
> The kernel is openwrt-2.6.30.5.
>
> Here are the results:
>
>
>
> Mainline CFS scheduler:
>
> mb@homer:~$ iperf -c 192.168.1.1
> ------------------------------------------------------------
> Client connecting to 192.168.1.1, TCP port 5001
> TCP window size: 16.0 KByte (default)
> ------------------------------------------------------------
> [ 3] local 192.168.1.99 port 35793 connected with 192.168.1.1 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0-10.0 sec 27.4 MBytes 23.0 Mbits/sec
> mb@homer:~$ iperf -c 192.168.1.1
> ------------------------------------------------------------
> Client connecting to 192.168.1.1, TCP port 5001
> TCP window size: 16.0 KByte (default)
> ------------------------------------------------------------
> [ 3] local 192.168.1.99 port 35794 connected with 192.168.1.1 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0-10.0 sec 27.3 MBytes 22.9 Mbits/sec
> mb@homer:~$ iperf -c 192.168.1.1
> ------------------------------------------------------------
> Client connecting to 192.168.1.1, TCP port 5001
> TCP window size: 16.0 KByte (default)
> ------------------------------------------------------------
> [ 3] local 192.168.1.99 port 56147 connected with 192.168.1.1 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0-10.0 sec 27.3 MBytes 22.9 Mbits/sec
>
>
> BFS scheduler:
>
> mb@homer:~$ iperf -c 192.168.1.1
> ------------------------------------------------------------
> Client connecting to 192.168.1.1, TCP port 5001
> TCP window size: 16.0 KByte (default)
> ------------------------------------------------------------
> [ 3] local 192.168.1.99 port 52489 connected with 192.168.1.1 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0-10.0 sec 38.2 MBytes 32.0 Mbits/sec
> mb@homer:~$ iperf -c 192.168.1.1
> ------------------------------------------------------------
> Client connecting to 192.168.1.1, TCP port 5001
> TCP window size: 16.0 KByte (default)
> ------------------------------------------------------------
> [ 3] local 192.168.1.99 port 52490 connected with 192.168.1.1 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0-10.0 sec 38.1 MBytes 31.9 Mbits/sec
> mb@homer:~$ iperf -c 192.168.1.1
> ------------------------------------------------------------
> Client connecting to 192.168.1.1, TCP port 5001
> TCP window size: 16.0 KByte (default)
> ------------------------------------------------------------
> [ 3] local 192.168.1.99 port 52491 connected with 192.168.1.1 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0-10.0 sec 38.1 MBytes 31.9 Mbits/sec

That's interesting. I tried to reproduce it on x86, but the profile
does not show any scheduler overhead at all on the server:

$ perf report

#
# Samples: 8369
#
# Overhead Symbol
# ........ ......
#
9.20% [k] copy_user_generic_string
3.80% [k] e1000_clean
3.58% [k] ipt_do_table
2.72% [k] mwait_idle
2.68% [k] nf_iterate
2.28% [k] e1000_intr
2.15% [k] tcp_packet
2.10% [k] __hash_conntrack
1.59% [k] read_tsc
1.52% [k] _local_bh_enable_ip
1.34% [k] eth_type_trans
1.29% [k] __alloc_skb
1.19% [k] tcp_recvmsg
1.19% [k] ip_rcv
1.17% [k] e1000_clean_rx_irq
1.12% [k] apic_timer_interrupt
0.99% [k] vsnprintf
0.96% [k] nf_conntrack_in
0.96% [k] kmem_cache_free
0.93% [k] __kmalloc_track_caller


Could you profile it please? Also, what's the context-switch rate?

Below is the call-graph profile as well - all the overhead is in
networking and SLAB.

Ingo

$ perf report --call-graph fractal,5

#
# Samples: 8947
#
# Overhead Command Shared Object Symbol
# ........ .............. ............................. ......
#
9.06% iperf [kernel] [k] copy_user_generic_string
|
|--98.89%-- skb_copy_datagram_iovec
| |
| |--77.18%-- tcp_recvmsg
| | sock_common_recvmsg
| | __sock_recvmsg
| | sock_recvmsg
| | sys_recvfrom
| | system_call_fastpath
| | __recv
| |
| --22.82%-- tcp_rcv_established
| tcp_v4_do_rcv
| tcp_prequeue_process
| tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| __recv
|
--1.11%-- system_call_fastpath
__GI___libc_nanosleep

3.62% [init] [kernel] [k] e1000_clean
2.96% [init] [kernel] [k] ipt_do_table
2.79% [init] [kernel] [k] mwait_idle
2.22% [init] [kernel] [k] e1000_intr
1.93% [init] [kernel] [k] nf_iterate
1.65% [init] [kernel] [k] __hash_conntrack
1.52% [init] [kernel] [k] tcp_packet
1.29% [init] [kernel] [k] ip_rcv
1.18% [init] [kernel] [k] __alloc_skb
1.15% iperf [kernel] [k] tcp_recvmsg

1.04% [init] [kernel] [k] _local_bh_enable_ip
1.02% [init] [kernel] [k] apic_timer_interrupt
1.02% [init] [kernel] [k] eth_type_trans
1.01% [init] [kernel] [k] tcp_v4_rcv
0.96% iperf [kernel] [k] kfree
|
|--95.35%-- skb_release_data
| __kfree_skb
| |
| |--79.27%-- tcp_recvmsg
| | sock_common_recvmsg
| | __sock_recvmsg
| | sock_recvmsg
| | sys_recvfrom
| | system_call_fastpath
| | __recv
| |
| --20.73%-- tcp_rcv_established
| tcp_v4_do_rcv
| tcp_prequeue_process
| tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| __recv
|
--4.65%-- __kfree_skb
|
|--75.00%-- tcp_rcv_established
| tcp_v4_do_rcv
| tcp_prequeue_process
| tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| __recv
|
--25.00%-- tcp_recvmsg
sock_common_recvmsg
__sock_recvmsg
sock_recvmsg
sys_recvfrom
system_call_fastpath
__recv

0.96% [init] [kernel] [k] read_tsc
0.92% iperf [kernel] [k] tcp_v4_do_rcv
|
|--95.12%-- tcp_prequeue_process
| tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| __recv
|
--4.88%-- tcp_recvmsg
sock_common_recvmsg
__sock_recvmsg
sock_recvmsg
sys_recvfrom
system_call_fastpath
__recv

0.92% [init] [kernel] [k] e1000_clean_rx_irq
0.86% iperf [kernel] [k] tcp_rcv_established
|
|--96.10%-- tcp_v4_do_rcv
| tcp_prequeue_process
| tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| __recv
|
--3.90%-- tcp_prequeue_process
tcp_recvmsg
sock_common_recvmsg
__sock_recvmsg
sock_recvmsg
sys_recvfrom
system_call_fastpath
__recv

0.84% iperf [kernel] [k] kmem_cache_free
|
|--93.33%-- __kfree_skb
| |
| |--71.43%-- tcp_recvmsg
| | sock_common_recvmsg
| | __sock_recvmsg
| | sock_recvmsg
| | sys_recvfrom
| | system_call_fastpath
| | __recv
| |
| --28.57%-- tcp_rcv_established
| tcp_v4_do_rcv
| tcp_prequeue_process
| tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| __recv
|
|--4.00%-- tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| __recv
|
--2.67%-- tcp_rcv_established
tcp_v4_do_rcv
tcp_prequeue_process
tcp_recvmsg
sock_common_recvmsg
__sock_recvmsg
sock_recvmsg
sys_recvfrom
system_call_fastpath
__recv

0.80% [init] [kernel] [k] netif_receive_skb
0.79% iperf [kernel] [k] tcp_event_data_recv
|
|--83.10%-- tcp_rcv_established
| tcp_v4_do_rcv
| tcp_prequeue_process
| tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| __recv
|
|--12.68%-- tcp_v4_do_rcv
| tcp_prequeue_process
| tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| __recv
|
--4.23%-- tcp_data_queue
tcp_rcv_established
tcp_v4_do_rcv
tcp_prequeue_process
tcp_recvmsg
sock_common_recvmsg
__sock_recvmsg
sock_recvmsg
sys_recvfrom
system_call_fastpath
__recv

0.67% perf [kernel] [k] format_decode
|
|--91.67%-- vsnprintf
| seq_printf
| |
| |--67.27%-- show_map_vma
| | show_map
| | seq_read
| | vfs_read
| | sys_read
| | system_call_fastpath
| | __GI_read
| |
| |--23.64%-- render_sigset_t
| | proc_pid_status
| | proc_single_show
| | seq_read
| | vfs_read
| | sys_read
| | system_call_fastpath
| | __GI_read
| |
| |--7.27%-- proc_pid_status
| | proc_single_show
| | seq_read
| | vfs_read
| | sys_read
| | system_call_fastpath
| | __GI_read
| |
| --1.82%-- cpuset_task_status_allowed
| proc_pid_status
| proc_single_show
| seq_read
| vfs_read
| sys_read
| system_call_fastpath
| __GI_read
|
--8.33%-- seq_printf
|
|--60.00%-- proc_pid_status
| proc_single_show
| seq_read
| vfs_read
| sys_read
| system_call_fastpath
| __GI_read
|
--40.00%-- show_map_vma
show_map
seq_read
vfs_read
sys_read
system_call_fastpath
__GI_read

0.65% [init] [kernel] [k] __kmalloc_track_caller
0.63% [init] [kernel] [k] nf_conntrack_in
0.63% [init] [kernel] [k] ip_route_input
0.58% perf [kernel] [k] vsnprintf
|
|--98.08%-- seq_printf
| |
| |--60.78%-- show_map_vma
| | show_map
| | seq_read
| | vfs_read
| | sys_read
| | system_call_fastpath
| | __GI_read
| |
| |--19.61%-- render_sigset_t
| | proc_pid_status
| | proc_single_show
| | seq_read
| | vfs_read
| | sys_read
| | system_call_fastpath
| | __GI_read
| |
| |--9.80%-- proc_pid_status
| | proc_single_show
| | seq_read
| | vfs_read
| | sys_read
| | system_call_fastpath
| | __GI_read
| |
| |--3.92%-- task_mem
| | proc_pid_status
| | proc_single_show
| | seq_read
| | vfs_read
| | sys_read
| | system_call_fastpath
| | __GI_read
| |
| |--3.92%-- cpuset_task_status_allowed
| | proc_pid_status
| | proc_single_show
| | seq_read
| | vfs_read
| | sys_read
| | system_call_fastpath
| | __GI_read
| |
| --1.96%-- render_cap_t
| proc_pid_status
| proc_single_show
| seq_read
| vfs_read
| sys_read
| system_call_fastpath
| __GI_read
|
--1.92%-- snprintf
proc_task_readdir
vfs_readdir
sys_getdents
system_call_fastpath
__getdents64
0x69706565000a3430

0.57% [init] [kernel] [k] ktime_get
0.57% [init] [kernel] [k] nf_nat_fn
0.56% iperf [kernel] [k] tcp_packet
|
|--68.00%-- __tcp_ack_snd_check
| tcp_rcv_established
| tcp_v4_do_rcv
| tcp_prequeue_process
| tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| __recv
|
--32.00%-- tcp_cleanup_rbuf
tcp_recvmsg
sock_common_recvmsg
__sock_recvmsg
sock_recvmsg
sys_recvfrom
system_call_fastpath
__recv

0.56% iperf /usr/bin/iperf [.] 0x000000000059f8
|
|--8.00%-- 0x4059f8
|
|--8.00%-- 0x405a16
|
|--8.00%-- 0x4059fd
|
|--4.00%-- 0x409d22
|
|--4.00%-- 0x405871
|
|--4.00%-- 0x406ee1
|
|--4.00%-- 0x405726
|
|--4.00%-- 0x4058db
|
|--4.00%-- 0x406ee8
|
|--2.00%-- 0x405b60
|
|--2.00%-- 0x4058fd
|
|--2.00%-- 0x4058d5
|
|--2.00%-- 0x405490
|
|--2.00%-- 0x4058bb
|
|--2.00%-- 0x405b93
|
|--2.00%-- 0x405b8e
|
|--2.00%-- 0x405903
|
|--2.00%-- 0x405ba8
|
|--2.00%-- 0x406eae
|
|--2.00%-- 0x405545
|
|--2.00%-- 0x405870
|
|--2.00%-- 0x405b67
|
|--2.00%-- 0x4058ce
|
|--2.00%-- 0x40570e
|
|--2.00%-- 0x406ee4
|
|--2.00%-- 0x405a02
|
|--2.00%-- 0x406eec
|
|--2.00%-- 0x405b82
|
|--2.00%-- 0x40556a
|
|--2.00%-- 0x405755
|
|--2.00%-- 0x405a0a
|
|--2.00%-- 0x405498
|
|--2.00%-- 0x409d20
|
|--2.00%-- 0x405b21
|
--2.00%-- 0x405a2c

0.56% [init] [kernel] [k] kmem_cache_alloc
0.56% [init] [kernel] [k] __inet_lookup_established
0.55% perf [kernel] [k] number
|
|--95.92%-- vsnprintf
| |
| |--97.87%-- seq_printf
| | |
| | |--56.52%-- show_map_vma
| | | show_map
| | | seq_read
| | | vfs_read
| | | sys_read
| | | system_call_fastpath
| | | __GI_read
| | |
| | |--28.26%-- render_sigset_t
| | | proc_pid_status
| | | proc_single_show
| | | seq_read
| | | vfs_read
| | | sys_read
| | | system_call_fastpath
| | | __GI_read
| | |
| | |--6.52%-- proc_pid_status
| | | proc_single_show
| | | seq_read
| | | vfs_read
| | | sys_read
| | | system_call_fastpath
| | | __GI_read
| | |
| | |--4.35%-- render_cap_t
| | | proc_pid_status
| | | proc_single_show
| | | seq_read
| | | vfs_read
| | | sys_read
| | | system_call_fastpath
| | | __GI_read
| | |
| | --4.35%-- task_mem
| | proc_pid_status
| | proc_single_show
| | seq_read
| | vfs_read
| | sys_read
| | system_call_fastpath
| | __GI_read
| |
| --2.13%-- scnprintf
| bitmap_scnlistprintf
| seq_bitmap_list
| cpuset_task_status_allowed
| proc_pid_status
| proc_single_show
| seq_read
| vfs_read
| sys_read
| system_call_fastpath
| __GI_read
|
--4.08%-- seq_printf
|
|--50.00%-- show_map_vma
| show_map
| seq_read
| vfs_read
| sys_read
| system_call_fastpath
| __GI_read
|
--50.00%-- render_sigset_t
proc_pid_status
proc_single_show
seq_read
vfs_read
sys_read
system_call_fastpath
__GI_read

0.55% [init] [kernel] [k] native_sched_clock
0.50% iperf [kernel] [k] e1000_xmit_frame
|
|--71.11%-- __tcp_ack_snd_check
| tcp_rcv_established
| tcp_v4_do_rcv
| tcp_prequeue_process
| tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| __recv
|
--28.89%-- tcp_cleanup_rbuf
tcp_recvmsg
sock_common_recvmsg
__sock_recvmsg
sock_recvmsg
sys_recvfrom
system_call_fastpath
__recv

0.50% iperf [kernel] [k] ipt_do_table
|
|--37.78%-- ipt_local_hook
| nf_iterate
| nf_hook_slow
| __ip_local_out
| ip_local_out
| ip_queue_xmit
| tcp_transmit_skb
| tcp_send_ack
| |
| |--58.82%-- __tcp_ack_snd_check
| | tcp_rcv_established
| | tcp_v4_do_rcv
| | tcp_prequeue_process
| | tcp_recvmsg
| | sock_common_recvmsg
| | __sock_recvmsg
| | sock_recvmsg
| | sys_recvfrom
| | system_call_fastpath
| | __recv
| |
| --41.18%-- tcp_cleanup_rbuf
| tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| __recv
|
|--31.11%-- ipt_post_routing_hook
| nf_iterate
| nf_hook_slow
| ip_output
| ip_local_out
| ip_queue_xmit
| tcp_transmit_skb
| tcp_send_ack
| |
| |--64.29%-- __tcp_ack_snd_check
| | tcp_rcv_established
| | tcp_v4_do_rcv
| | tcp_prequeue_process
| | tcp_recvmsg
| | sock_common_recvmsg
| | __sock_recvmsg
| | sock_recvmsg
| | sys_recvfrom
| | system_call_fastpath
| | __recv
| |
| --35.71%-- tcp_cleanup_rbuf
| tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| __recv
|
|--20.00%-- ipt_local_out_hook
| nf_iterate
| nf_hook_slow
| __ip_local_out
| ip_local_out
| ip_queue_xmit
| tcp_transmit_skb
| tcp_send_ack
| |
| |--88.89%-- __tcp_ack_snd_check
| | tcp_rcv_established
| | tcp_v4_do_rcv
| | tcp_prequeue_process
| | tcp_recvmsg
| | sock_common_recvmsg
| | __sock_recvmsg
| | sock_recvmsg
| | sys_recvfrom
| | system_call_fastpath
| | __recv
| |
| --11.11%-- tcp_cleanup_rbuf
| tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| __recv
|
|--6.67%-- nf_iterate
| nf_hook_slow
| |
| |--66.67%-- ip_output
| | ip_local_out
| | ip_queue_xmit
| | tcp_transmit_skb
| | tcp_send_ack
| | tcp_cleanup_rbuf
| | tcp_recvmsg
| | sock_common_recvmsg
| | __sock_recvmsg
| | sock_recvmsg
| | sys_recvfrom
| | system_call_fastpath
| | __recv
| |
| --33.33%-- __ip_local_out
| ip_local_out
| ip_queue_xmit
| tcp_transmit_skb
| tcp_send_ack
| __tcp_ack_snd_check
| tcp_rcv_established
| tcp_v4_do_rcv
| tcp_prequeue_process
| tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| __recv
|
|--2.22%-- ipt_local_in_hook
| nf_iterate
| nf_hook_slow
| ip_local_deliver
| ip_rcv_finish
| ip_rcv
| netif_receive_skb
| napi_skb_finish
| napi_gro_receive
| e1000_receive_skb
| e1000_clean_rx_irq
| e1000_clean
| net_rx_action
| __do_softirq
| call_softirq
| do_softirq
| irq_exit
| do_IRQ
| ret_from_intr
| vgettimeofday
|
--2.22%-- ipt_pre_routing_hook
nf_iterate
nf_hook_slow
ip_rcv
netif_receive_skb
napi_skb_finish
napi_gro_receive
e1000_receive_skb
e1000_clean_rx_irq
e1000_clean
net_rx_action
__do_softirq
call_softirq
do_softirq
irq_exit
do_IRQ
ret_from_intr
__GI___libc_nanosleep

0.50% iperf [kernel] [k] schedule
|
|--57.78%-- do_nanosleep
| hrtimer_nanosleep
| sys_nanosleep
| system_call_fastpath
| __GI___libc_nanosleep
|
|--33.33%-- schedule_timeout
| sk_wait_data
| tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| __recv
|
|--6.67%-- hrtimer_nanosleep
| sys_nanosleep
| system_call_fastpath
| __GI___libc_nanosleep
|
--2.22%-- sk_wait_data
tcp_recvmsg
sock_common_recvmsg
__sock_recvmsg
sock_recvmsg
sys_recvfrom
system_call_fastpath
__recv

0.49% iperf [kernel] [k] tcp_transmit_skb
|
|--97.73%-- tcp_send_ack
| |
| |--83.72%-- __tcp_ack_snd_check
| | tcp_rcv_established
| | tcp_v4_do_rcv
| | |
| | |--97.22%-- tcp_prequeue_process
| | | tcp_recvmsg
| | | sock_common_recvmsg
| | | __sock_recvmsg
| | | sock_recvmsg
| | | sys_recvfrom
| | | system_call_fastpath
| | | __recv
| | |
| | --2.78%-- release_sock
| | tcp_recvmsg
| | sock_common_recvmsg
| | __sock_recvmsg
| | sock_recvmsg
| | sys_recvfrom
| | system_call_fastpath
| | __recv
| |
| --16.28%-- tcp_cleanup_rbuf
| tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| __recv
|
--2.27%-- __tcp_ack_snd_check
tcp_rcv_established
tcp_v4_do_rcv
tcp_prequeue_process
tcp_recvmsg
sock_common_recvmsg
__sock_recvmsg
sock_recvmsg
sys_recvfrom
system_call_fastpath
__recv

0.49% [init] [kernel] [k] nf_hook_slow
0.48% iperf [kernel] [k] virt_to_head_page
|
|--53.49%-- kfree
| skb_release_data
| __kfree_skb
| |
| |--65.22%-- tcp_recvmsg
| | sock_common_recvmsg
| | __sock_recvmsg
| | sock_recvmsg
| | sys_recvfrom
| | system_call_fastpath
| | __recv
| |
| --34.78%-- tcp_rcv_established
| tcp_v4_do_rcv
| tcp_prequeue_process
| tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| __recv
|
|--18.60%-- skb_release_data
| __kfree_skb
| |
| |--62.50%-- tcp_rcv_established
| | tcp_v4_do_rcv
| | tcp_prequeue_process
| | tcp_recvmsg
| | sock_common_recvmsg
| | __sock_recvmsg
| | sock_recvmsg
| | sys_recvfrom
| | system_call_fastpath
| | __recv
| |
| --37.50%-- tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| __recv
|
|--18.60%-- kmem_cache_free
| __kfree_skb
| |
| |--62.50%-- tcp_rcv_established
| | tcp_v4_do_rcv
| | tcp_prequeue_process
| | tcp_recvmsg
| | sock_common_recvmsg
| | __sock_recvmsg
| | sock_recvmsg
| | sys_recvfrom
| | system_call_fastpath
| | __recv
| |
| --37.50%-- tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| __recv
|
--9.30%-- __kfree_skb
|
|--75.00%-- tcp_rcv_established
| tcp_v4_do_rcv
| tcp_prequeue_process
| tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| __recv
|
--25.00%-- tcp_recvmsg
sock_common_recvmsg
__sock_recvmsg
sock_recvmsg
sys_recvfrom
system_call_fastpath
__recv
...

2009-09-07 18:46:39

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Mon, Sep 07 2009, Avi Kivity wrote:
> On 09/07/2009 12:49 PM, Jens Axboe wrote:
>>
>> I ran a simple test as well, since I was curious to see how it performed
>> wrt interactiveness. One of my pet peeves with the current scheduler is
>> that I have to nice compile jobs, or my X experience is just awful while
>> the compile is running.
>>
>
> I think the problem is that CFS is optimizing for the wrong thing. It's
> trying to be fair to tasks, but these are meaningless building blocks of
> jobs, which is what the user sees and measures. Your make -j128
> dominates your interactive task by two orders of magnitude. If the
> scheduler attempts to bridge this gap using heuristics, it will fail
> badly when it misdetects since it will starve the really important
> 100-thread job for a task that was misdetected as interactive.

Agree, I was actually looking into doing joint latency for X number of
tasks for the test app. I'll try and do that and see if we can detect
something from that.

--
Jens Axboe

2009-09-07 18:47:24

by Daniel Walker

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Mon, 2009-09-07 at 20:26 +0200, Ingo Molnar wrote:
> That's interesting. I tried to reproduce it on x86, but the profile
> does not show any scheduler overhead at all on the server:

If the scheduler isn't running the task which causes the lower
throughput, would that even show up in profiling output?

Daniel

2009-09-07 18:51:15

by Michael Büsch

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Monday 07 September 2009 20:26:29 Ingo Molnar wrote:
> Could you profile it please? Also, what's the context-switch rate?

As far as I can tell, the broadcom mips architecture does not have profiling support.
It does only have some proprietary profiling registers that nobody wrote kernel
support for, yet.

--
Greetings, Michael.

2009-09-07 20:36:37

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Jens Axboe <[email protected]> wrote:

> Agree, I was actually looking into doing joint latency for X
> number of tasks for the test app. I'll try and do that and see if
> we can detect something from that.

Could you please try latest -tip:

http://people.redhat.com/mingo/tip.git/README

(c26f010 or later)

Does it get any better with make -j128 build jobs? Peter just fixed
a bug in the SMP load-balancer that can cause interactivity problems
on large CPU count systems.

Ingo

2009-09-07 20:44:58

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Mon, Sep 07 2009, Jens Axboe wrote:
> > And yes, it would be wonderful to get a test-app from you that would
> > express the kind of pain you are seeing during compile jobs.
>
> I was hoping this one would, but it's not showing anything. I even added
> support for doing the ping and wakeup over a socket, to see if the pipe
> test was doing well because of the sync wakeup we do there. The net
> latency is a little worse, but still good. So no luck in making that app
> so far.

Here's a version that bounces timestamps between a producer and a number
of consumers (clients). Not really tested much, but perhaps someone can
compare this on a box that boots BFS and see what happens.

To run it, use -cX where X is the number of children that you wait for a
response from. The max delay among these children is logged for each
wakeup. You can invoke it a la:

$ ./latt -c4 'make -j4'

and it'll dump the max/avg/stddev bounce time after make has completed,
or if you just want to play around, start the compile in one xterm and
do:

$ ./latt -c4 'sleep 5'

to just log for a small period of time. Vary the number of clients to
see how that changes the aggregated latency. 1 should be fast, adding
more clients quickly adds up.

Additionally, it has -f and -t options that control the window of sleep
time for the parent between each message. The numbers are in msecs, and
it defaults to a minimum of 100 msecs and a maximum of 500 msecs.
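
For a concrete picture of what this timestamp bounce does, here is a
minimal standalone sketch in C. It is illustrative only - not the
attached latt.c - and the client count, round count and 100-500 msec
sleep window below are assumptions taken from the description above:

/*
 * Sketch of a latt-style wakeup-latency bounce: the parent writes a
 * timestamp to each of N children over a pipe, every child replies with
 * the delay it observed, and the parent logs the worst delay per round.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <sys/wait.h>

#define NCLIENTS 4
#define ROUNDS   50

static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
	int to_child[NCLIENTS][2], to_parent[NCLIENTS][2];
	int i, r;

	srand(getpid());

	for (i = 0; i < NCLIENTS; i++) {
		pipe(to_child[i]);
		pipe(to_parent[i]);
		if (fork() == 0) {
			/* child: report how late each timestamp arrived */
			long long sent, delay;

			for (r = 0; r < ROUNDS; r++) {
				read(to_child[i][0], &sent, sizeof(sent));
				delay = now_ns() - sent;
				write(to_parent[i][1], &delay, sizeof(delay));
			}
			_exit(0);
		}
	}

	for (r = 0; r < ROUNDS; r++) {
		long long max_delay = 0, stamp, delay;

		for (i = 0; i < NCLIENTS; i++) {
			stamp = now_ns();
			write(to_child[i][1], &stamp, sizeof(stamp));
		}
		for (i = 0; i < NCLIENTS; i++) {
			read(to_parent[i][0], &delay, sizeof(delay));
			if (delay > max_delay)
				max_delay = delay;
		}
		printf("round %2d: max wakeup delay %lld usec\n",
		       r, max_delay / 1000);

		/* sleep 100-500 msec between messages, like the -f/-t window */
		usleep((100 + rand() % 400) * 1000);
	}

	while (wait(NULL) > 0)
		;
	return 0;
}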

--
Jens Axboe


Attachments:
(No filename) (1.42 kB)
latt.c (5.43 kB)

2009-09-07 20:46:44

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Mon, Sep 07 2009, Ingo Molnar wrote:
>
> * Jens Axboe <[email protected]> wrote:
>
> > Agree, I was actually looking into doing joint latency for X
> > number of tasks for the test app. I'll try and do that and see if
> > we can detect something from that.
>
> Could you please try latest -tip:
>
> http://people.redhat.com/mingo/tip.git/README
>
> (c26f010 or later)
>
> Does it get any better with make -j128 build jobs? Peter just fixed

The compile 'problem' is on my workstation, which is a dual core Intel
core 2. I use -j4 on that typically. On the bigger boxes, I don't notice
any interactivity problems, largely because I don't run anything latency
sensitive on those :-)

> a bug in the SMP load-balancer that can cause interactivity problems
> on large CPU count systems.

Worth trying on the dual core box?

--
Jens Axboe

2009-09-07 20:57:11

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Michael Buesch <[email protected]> wrote:

> On Monday 07 September 2009 20:26:29 Ingo Molnar wrote:
> > Could you profile it please? Also, what's the context-switch rate?
>
> As far as I can tell, the broadcom mips architecture does not have
> profiling support. It does only have some proprietary profiling
> registers that nobody wrote kernel support for, yet.

Well, what does 'vmstat 1' show - how many context switches are
there per second on the iperf server? In theory if it's a truly
saturated box, there shouldnt be many - just a single iperf task
running at 100% CPU utilization or so.

(Also, if there's hrtimer support for that board then perfcounters
could be used to profile it.)

Ingo

2009-09-07 21:03:47

by Peter Zijlstra

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Mon, 2009-09-07 at 22:46 +0200, Jens Axboe wrote:
> > a bug in the SMP load-balancer that can cause interactivity problems
> > on large CPU count systems.
>
> Worth trying on the dual core box?

I debugged the issue on a dual core :-)

It should be more pronounced on larger machines, but its present on
dual-core too.

2009-09-07 21:05:50

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Mon, Sep 07 2009, Peter Zijlstra wrote:
> On Mon, 2009-09-07 at 22:46 +0200, Jens Axboe wrote:
> > > a bug in the SMP load-balancer that can cause interactivity problems
> > > on large CPU count systems.
> >
> > Worth trying on the dual core box?
>
> I debugged the issue on a dual core :-)
>
> It should be more pronounced on larger machines, but its present on
> dual-core too.

Alright, I'll upgrade that box to -tip tomorrow and see if it makes
a noticable difference. At -j4 or higher, I can literally see windows
slowly popping up when switching to a different virtual desktop.

--
Jens Axboe

2009-09-07 22:18:26

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Jens Axboe <[email protected]> wrote:

> On Mon, Sep 07 2009, Peter Zijlstra wrote:
> > On Mon, 2009-09-07 at 22:46 +0200, Jens Axboe wrote:
> > > > a bug in the SMP load-balancer that can cause interactivity problems
> > > > on large CPU count systems.
> > >
> > > Worth trying on the dual core box?
> >
> > I debugged the issue on a dual core :-)
> >
> > It should be more pronounced on larger machines, but its present on
> > dual-core too.
>
> Alright, I'll upgrade that box to -tip tomorrow and see if it
> makes a noticable difference. At -j4 or higher, I can literally
> see windows slowly popping up when switching to a different
> virtual desktop.

btw., if you run -tip and have these enabled:

CONFIG_PERF_COUNTER=y
CONFIG_EVENT_TRACING=y

cd tools/perf/
make -j install

... then you can use a couple of new perfcounters features to
measure scheduler latencies. For example:

perf stat -e sched:sched_stat_wait -e task-clock ./hackbench 20

Will tell you how many times this workload got delayed by waiting
for CPU time.

You can repeat the workload as well and see the statistical
properties of those metrics:

aldebaran:/home/mingo> perf stat --repeat 10 -e \
sched:sched_stat_wait:r -e task-clock ./hackbench 20
Time: 0.251
Time: 0.214
Time: 0.254
Time: 0.278
Time: 0.245
Time: 0.308
Time: 0.242
Time: 0.222
Time: 0.268
Time: 0.244

Performance counter stats for './hackbench 20' (10 runs):

59826 sched:sched_stat_wait # 0.026 M/sec ( +- 5.540% )
2280.099643 task-clock-msecs # 7.525 CPUs ( +- 1.620% )

0.303013390 seconds time elapsed ( +- 3.189% )

To get scheduling events, do:

# perf list 2>&1 | grep sched:
sched:sched_kthread_stop [Tracepoint event]
sched:sched_kthread_stop_ret [Tracepoint event]
sched:sched_wait_task [Tracepoint event]
sched:sched_wakeup [Tracepoint event]
sched:sched_wakeup_new [Tracepoint event]
sched:sched_switch [Tracepoint event]
sched:sched_migrate_task [Tracepoint event]
sched:sched_process_free [Tracepoint event]
sched:sched_process_exit [Tracepoint event]
sched:sched_process_wait [Tracepoint event]
sched:sched_process_fork [Tracepoint event]
sched:sched_signal_send [Tracepoint event]
sched:sched_stat_wait [Tracepoint event]
sched:sched_stat_sleep [Tracepoint event]
sched:sched_stat_iowait [Tracepoint event]

stat_wait/sleep/iowait would be the interesting ones, for latency
analysis.

Or, if you want to see all the specific delays and want to see
min/max/avg, you can do:

perf record -e sched:sched_stat_wait:r -f -R -c 1 ./hackbench 20
perf trace

Ingo

2009-09-07 23:57:16

by Pekka Pietikäinen

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Mon, Sep 07, 2009 at 10:57:01PM +0200, Ingo Molnar wrote:
> > > Could you profile it please? Also, what's the context-switch rate?
> >
> > As far as I can tell, the broadcom mips architecture does not have
> > profiling support. It does only have some proprietary profiling
> > registers that nobody wrote kernel support for, yet.
> Well, what does 'vmstat 1' show - how many context switches are
> there per second on the iperf server? In theory if it's a truly
> saturated box, there shouldnt be many - just a single iperf task
Yay, finally something that's measurable in this thread \o/

Gigabit Ethernet iperf on an Atom or so might be something that
shows similar effects yet is debuggable. Anyone feel like taking a shot?

That beast doing iperf probably ends up making it go quite close to its
limits (IO, mem bw, cpu). IIRC the routing/bridging performance is
something like 40Mbps (depends a lot on the model, corresponds pretty
well with the MHz of the beast).

Maybe not totally unlike what make -j16 does to a 1-4 core box?

2009-09-07 23:54:31

by Thomas Fjellstrom

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Sun September 6 2009, Nikos Chantziaras wrote:
> On 09/06/2009 11:59 PM, Ingo Molnar wrote:
> >[...]
> > Also, i'd like to outline that i agree with the general goals
> > described by you in the BFS announcement - small desktop systems
> > matter more than large systems. We find it critically important
> > that the mainline Linux scheduler performs well on those systems
> > too - and if you (or anyone else) can reproduce suboptimal behavior
> > please let the scheduler folks know so that we can fix/improve it.
>
> BFS improved behavior of many applications on my Intel Core 2 box in a
> way that can't be benchmarked. Examples:
>
> mplayer using OpenGL renderer doesn't drop frames anymore when dragging
> and dropping the video window around in an OpenGL composited desktop
> (KDE 4.3.1). (Start moving the mplayer window around; then drop it. At
> the moment the move starts and at the moment you drop the window back to
> the desktop, there's a big frame skip as if mplayer was frozen for a
> bit; around 200 or 300ms.)
>
> Composite desktop effects like zoom and fade out don't stall for
> sub-second periods of time while there's CPU load in the background. In
> other words, the desktop is more fluid and less skippy even during heavy
> CPU load. Moving windows around with CPU load in the background doesn't
> result in short skips.
>
> LMMS (a tool utilizing real-time sound synthesis) does not produce
> "pops", "crackles" and drops in the sound during real-time playback due
> to buffer under-runs. Those problems amplify when there's heavy CPU
> load in the background, while with BFS heavy load doesn't produce those
> artifacts (though LMMS makes itself run SCHED_ISO with BFS) Also,
> hitting a key on the keyboard needs less time for the note to become
> audible when using BFS. Same should hold true for other tools who
> traditionally benefit from the "-rt" kernel sources.
>
> Games like Doom 3 and such don't "freeze" periodically for small amounts
> of time (again for sub-second amounts) when something in the background
> grabs CPU time (be it my mailer checking for new mail or a cron job, or
> whatever.)
>
> And, the most drastic improvement here, with BFS I can do a "make -j2"
> in the kernel tree and the GUI stays fluid. Without BFS, things start
> to lag, even with in-RAM builds (like having the whole kernel tree
> inside a tmpfs) and gcc running with nice 19 and ionice -c 3.
>
> Unfortunately, I can't come up with any way to somehow benchmark all of
> this. There's no benchmark for "fluidity" and "responsiveness".
> Running the Doom 3 benchmark, or any other benchmark, doesn't say
> anything about responsiveness, it only measures how many frames were
> calculated in a specific period of time. How "stable" (with no stalls)
> those frames were making it to the screen is not measurable.
>
> If BFS would imply small drops in pure performance counted in
> instructions per seconds, that would be a totally acceptable regression
> for desktop/multimedia/gaming PCs. Not for server machines, of course.
> However, on my machine, BFS is faster in classic workloads. When I
> run "make -j2" with BFS and the standard scheduler, BFS always finishes
> a bit faster. Not by much, but still. One thing I'm noticing here is
> that BFS produces 100% CPU load on each core with "make -j2" while the
> normal scheduler stays at about 90-95% with -j2 or higher in at least
> one of the cores. There seems to be under-utilization of CPU time.
>
> Also, by searching around the net but also through discussions on
> various mailing lists, there seems to be a trend: the problems for some
> reason seem to occur more often with Intel CPUs (Core 2 chips and lower;
> I can't say anything about Core I7) while people on AMD CPUs mostly not
> being affected by most or even all of the above. (And due to this flame
> wars often break out, with one party accusing the other of imagining
> things). Can the integrated memory controller on AMD chips have
> something to do with this? Do AMD chips generally offer better
> "multithreading" behavior? Unfortunately, you didn't mention on what CPU
> you ran your tests. If it was AMD, it might be a good idea to run tests
> on Pentium and Core 2 CPUs.
>
> For reference, my system is:
>
> CPU: Intel Core 2 Duo E6600 (2.4GHz)
> Mainboard: Asus P5E (Intel X38 chipset)
> RAM: 6GB (2+2+1+1) dual channel DDR2 800
> GPU: RV770 (Radeon HD4870).
>

My Phenom 9550 (2.2GHz) whips the pants off my Intel Q6600 (2.6GHz). A
friend of mine and I both get large amounts of stalling when doing a lot
of IO. I haven't seen such horrible desktop interactivity since before
the new schedulers and the -ck patchset came out for 2.4.x. It's a heck
of a lot better on my AMD Phenoms, but some lag is noticeable these
days, even though it wasn't a few kernel releases ago.

Intel Specs:
CPU: Intel Core 2 Quad Q6600 (2.6GHz)
Mainboard: ASUS P5K-SE (Intel P35 iirc)
RAM: 4G 800MHz DDR2 dual channel (4x1G)
GPU: NVidia 8800GTS 320M

AMD Specs:
CPU: AMD Phenom I 9550 (2.2GHz)
Mainboard: Gigabyte MA78GM-S2H
RAM: 4G 800MHz DDR2 dual channel (2x2G)
GPU: Onboard Radeon 3200HD

AMD Specs x2:
CPU: AMD Phenom II 810 (2.6GHz)
Mainboard: Gigabyte MA790FXT-UD5P
RAM: 4G 1066MHz DDR3 dual channel (2x2G)
GPU: NVidia 8800GTS 320M (or currently a 8400GS)

Of course I get better performance out of the Phenom II than either
other box, but it surprises me that I'd get more out of the budget AMD
box than out of the not-so-budget Intel box.

--
Thomas Fjellstrom
[email protected]

2009-09-08 07:19:08

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/07/2009 05:40 PM, Arjan van de Ven wrote:
> On Mon, 07 Sep 2009 06:38:36 +0300
> Nikos Chantziaras<[email protected]> wrote:
>
>> On 09/06/2009 11:59 PM, Ingo Molnar wrote:
>>> [...]
>>> Also, i'd like to outline that i agree with the general goals
>>> described by you in the BFS announcement - small desktop systems
>>> matter more than large systems. We find it critically important
>>> that the mainline Linux scheduler performs well on those systems
>>> too - and if you (or anyone else) can reproduce suboptimal behavior
>>> please let the scheduler folks know so that we can fix/improve it.
>>
>> BFS improved behavior of many applications on my Intel Core 2 box in
>> a way that can't be benchmarked. Examples:
>
> Have you tried to see if latencytop catches such latencies ?

I've just tried it.

I start latencytop and then mplayer on a video that doesn't max out the
CPU (it needs about 20-30% of a single core, out of the 2 available).
Then, while the video is playing, I press Alt+Tab repeatedly, which
makes the desktop compositor kick in and stay active (it lays out all
windows as a "flip-switch", similar to the Microsoft Vista Aero alt+tab
effect). Repeatedly pressing alt+tab keeps the compositor (in this case
KDE 4.3.1) busy processing. With the mainline scheduler, mplayer starts
dropping frames and skipping sound like crazy for the whole duration of
this exercise.

latencytop has this to say:

http://foss.math.aegean.gr/~realnc/pics/latop1.png

Though I don't really understand what this tool is trying to tell me, I
hope someone does.

2009-09-08 07:48:30

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Ingo Molnar <[email protected]> wrote:

> That's interesting. I tried to reproduce it on x86, but the
> profile does not show any scheduler overhead at all on the server:

I've now simulated a saturated iperf server by adding a udelay(3000)
to e1000_intr(), via the patch below.

There's no idle time left that way:

Cpu(s): 0.0%us, 2.6%sy, 0.0%ni, 0.0%id, 0.0%wa, 93.2%hi, 4.2%si, 0.0%st
Mem: 1021044k total, 93400k used, 927644k free, 5068k buffers
Swap: 8193140k total, 0k used, 8193140k free, 25404k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1604 mingo 20 0 38300 956 724 S 99.4 0.1 3:15.07 iperf
727 root 15 -5 0 0 0 S 0.2 0.0 0:00.41 kondemand/0
1226 root 20 0 6452 336 240 S 0.2 0.0 0:00.06 irqbalance
1387 mingo 20 0 78872 1988 1300 S 0.2 0.2 0:00.23 sshd
1657 mingo 20 0 12752 1128 800 R 0.2 0.1 0:01.34 top
1 root 20 0 10320 684 572 S 0.0 0.1 0:01.79 init
2 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kthreadd

And the server is only able to saturate half of the 1 gigabit
bandwidth:

Client connecting to t, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.1.19 port 50836 connected with 10.0.1.14 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 504 MBytes 423 Mbits/sec
------------------------------------------------------------
Client connecting to t, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.1.19 port 50837 connected with 10.0.1.14 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 502 MBytes 420 Mbits/sec


perf top is showing:

------------------------------------------------------------------------------
PerfTop: 28517 irqs/sec kernel:99.4% [100000 cycles], (all, 1 CPUs)
------------------------------------------------------------------------------

samples pcnt kernel function
_______ _____ _______________

139553.00 - 93.2% : delay_tsc
2098.00 - 1.4% : hmac_digest
561.00 - 0.4% : ip_call_ra_chain
335.00 - 0.2% : neigh_alloc
279.00 - 0.2% : __hash_conntrack
257.00 - 0.2% : dev_activate
186.00 - 0.1% : proc_tcp_available_congestion_control
178.00 - 0.1% : e1000_get_regs
167.00 - 0.1% : tcp_event_data_recv

delay_tsc() dominates, as expected. Still zero scheduler overhead
and the context-switch rate is well below 1000 per sec.

Then i booted v2.6.30 vanilla, added the udelay(3000) and got:

[ 5] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 47026
[ 5] 0.0-10.0 sec 493 MBytes 412 Mbits/sec
[ 4] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 47027
[ 4] 0.0-10.0 sec 520 MBytes 436 Mbits/sec
[ 5] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 47028
[ 5] 0.0-10.0 sec 506 MBytes 424 Mbits/sec
[ 4] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 47029
[ 4] 0.0-10.0 sec 496 MBytes 415 Mbits/sec

i.e. essentially the same throughput. (and this shows that using .30
versus .31 did not materially impact iperf performance in this test,
under these conditions and with this hardware)

Then i applied the BFS patch to v2.6.30 and used the same
udelay(3000) hack and got:

No measurable change in throughput.

Obviously, this test is not equivalent to your test - but it does
show that even saturated iperf is getting scheduled just fine. (or,
rather, does not get scheduled all that much.)

[ 5] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 38505
[ 5] 0.0-10.1 sec 481 MBytes 401 Mbits/sec
[ 4] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 38506
[ 4] 0.0-10.0 sec 505 MBytes 423 Mbits/sec
[ 5] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 38507
[ 5] 0.0-10.0 sec 508 MBytes 426 Mbits/sec
[ 4] local 10.0.1.14 port 5001 connected with 10.0.1.19 port 38508
[ 4] 0.0-10.0 sec 486 MBytes 406 Mbits/sec

So either your MIPS system has some unexpected dependency on the
scheduler, or there's something weird going on.

Mind poking on this one to figure out whether it's all repeatable
and why that slowdown happens? Multiple attempts to reproduce it
failed here for me.

Ingo

2009-09-08 08:04:39

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Pekka Pietikainen <[email protected]> wrote:

> On Mon, Sep 07, 2009 at 10:57:01PM +0200, Ingo Molnar wrote:
> > > > Could you profile it please? Also, what's the context-switch rate?
> > >
> > > As far as I can tell, the broadcom mips architecture does not have
> > > profiling support. It does only have some proprietary profiling
> > > registers that nobody wrote kernel support for, yet.
> > Well, what does 'vmstat 1' show - how many context switches are
> > there per second on the iperf server? In theory if it's a truly
> > saturated box, there shouldnt be many - just a single iperf task
>
> Yay, finally something that's measurable in this thread \o/

My initial posting in this thread contains 6 separate types of
measurements, rather extensive ones. Out of those, 4 measurements
were latency oriented, two were throughput oriented. Plenty of data,
plenty of results, and very good reproducibility.

> Gigabit Ethernet iperf on an Atom or so might be something that
> shows similar effects yet is debuggable. Anyone feel like taking a
> shot?

I tried iperf on x86 and simulated saturation and no, there's no BFS
versus mainline performance difference that i can measure - simply
because a saturated iperf server does not schedule much - it's busy
handling all that networking workload.

I did notice that iperf is somewhat noisy: it can easily have weird
outliers regardless of which scheduler is used. That could be an
effect of queueing/timing: depending on precisely the order in which
packets arrive and get queued by the networking stack, a cache-effective
pathway for packets may or may not open up - with slightly different
timings, that pathway closes and we get much worse queueing performance.
I saw noise on the order of 10%, so iperf has to be measured carefully
before drawing conclusions.

> That beast doing iperf probably ends up making it go quite close
> to it's limits (IO, mem bw, cpu). IIRC the routing/bridging
> performance is something like 40Mbps (depends a lot on the model,
> corresponds pretty well with the Mhz of the beast).
>
> Maybe not totally unlike what make -j16 does to a 1-4 core box?

No, a single iperf session is very different from kbuild make -j16.

Firstly, the iperf server is just a single long-lived task - so we
context-switch between that and the idle thread [and perhaps a
kernel thread such as ksoftirqd]. The scheduler essentially has no
leeway what task to schedule and for how long: if there's work going
on the iperf server task will run - if there's none, the idle task
runs. [modulo ksoftirqd - depending on the driver model and
dependent on precise timings.]

kbuild -j16 on the other hand is a complex hierarchy and mixture of
thousands of short-lived and long-lived tasks. The scheduler has a
lot of leeway to decide what to schedule and for how long.

From a scheduler perspective the two workloads could not be any more
different. Kbuild does test scheduler decisions in non-trivial ways
- iperf server does not really.

Ingo

2009-09-08 08:13:53

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/08/2009 11:04 AM, Ingo Molnar wrote:
>
> * Pekka Pietikainen<[email protected]> wrote:
>
>> On Mon, Sep 07, 2009 at 10:57:01PM +0200, Ingo Molnar wrote:
>>>>> Could you profile it please? Also, what's the context-switch rate?
>>>>
>>>> As far as I can tell, the broadcom mips architecture does not have
>>>> profiling support. It does only have some proprietary profiling
>>>> registers that nobody wrote kernel support for, yet.
>>> Well, what does 'vmstat 1' show - how many context switches are
>>> there per second on the iperf server? In theory if it's a truly
>>> saturated box, there shouldnt be many - just a single iperf task
>>
>> Yay, finally something that's measurable in this thread \o/
>
> My initial posting in this thread contains 6 separate types of
> measurements, rather extensive ones. Out of those, 4 measurements
> were latency oriented, two were throughput oriented. Plenty of data,
> plenty of results, and very good reproducability.

None of which involve latency-prone GUI applications running on cheap
commodity hardware though. I listed examples where mainline seems to
behave sub-optimally and ways to reproduce them, but this doesn't seem to
be an area of interest.

2009-09-08 08:28:11

by Arjan van de Ven

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Tue, 08 Sep 2009 10:19:06 +0300
Nikos Chantziaras <[email protected]> wrote:

> latencytop has this to say:
>
> http://foss.math.aegean.gr/~realnc/pics/latop1.png
>
> Though I don't really understand what this tool is trying to tell me,
> I hope someone does.

unfortunately this is both an older version of latencytop, and it's
incorrectly installed ;-(
Latencytop is supposed to translate those cryptic strings to english,
but due to not being correctly installed, it does not do this ;(

the latest version of latencytop also has a GUI (thanks to Ben)

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-09-08 08:34:42

by Arjan van de Ven

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Tue, 08 Sep 2009 10:19:06 +0300
Nikos Chantziaras <[email protected]> wrote:

> latencytop has this to say:
>
> http://foss.math.aegean.gr/~realnc/pics/latop1.png
>
> Though I don't really understand what this tool is trying to tell me,
> I hope someone does.

despite the untranslated content, it is clear that you have scheduler
delays (either due to scheduler bugs or cpu contention) of up to 68
msecs... Second in line is your binary AMD graphics driver that is
chewing up 14% of your total latency...


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-09-08 09:13:05

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Mon, Sep 07 2009, Jens Axboe wrote:
> On Mon, Sep 07 2009, Jens Axboe wrote:
> > > And yes, it would be wonderful to get a test-app from you that would
> > > express the kind of pain you are seeing during compile jobs.
> >
> > I was hoping this one would, but it's not showing anything. I even added
> > support for doing the ping and wakeup over a socket, to see if the pipe
> > test was doing well because of the sync wakeup we do there. The net
> > latency is a little worse, but still good. So no luck in making that app
> > so far.
>
> Here's a version that bounces timestamps between a producer and a number
> of consumers (clients). Not really tested much, but perhaps someone can
> compare this on a box that boots BFS and see what happens.

And here's a newer version. It ensures that clients are running before
sending a timestamp, and it drops the first and last log entry to
eliminate any weird effects there. Accuracy should also be improved.

On an idle box, it'll usually log all zeroes. Sometimes I see 3-4msec
latencies, weird.

--
Jens Axboe


Attachments:
(No filename) (1.04 kB)
latt.c (9.56 kB)

2009-09-08 09:50:22

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Tue, 2009-09-08 at 09:48 +0200, Ingo Molnar wrote:
> So either your MIPS system has some unexpected dependency on the
> scheduler, or there's something weird going on.
>
> Mind poking on this one to figure out whether it's all repeatable
> and why that slowdown happens? Multiple attempts to reproduce it
> failed here for me.

Could it be the scheduler using constructs that don't do well on MIPS ?

I remember at some stage we spotted an expensive multiply in there,
maybe there's something similar, or some unaligned or non-cache friendly
vs. the MIPS cache line size data structure, that sort of thing ...

Is this a SW loaded TLB ? Does it miss on kernel space ? That could
also be some differences in how many pages are touched by each scheduler
causing more TLB pressure. This will be mostly invisible on x86.

At this stage, it will be hard to tell without some profile data I
suppose. Maybe next week I can try on a small SW loaded TLB embedded PPC
see if I can reproduce some of that, but no promises here.

Cheers,
Ben.

2009-09-08 10:12:28

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Nikos Chantziaras <[email protected]> wrote:

> On 09/08/2009 11:04 AM, Ingo Molnar wrote:
>>
>> * Pekka Pietikainen<[email protected]> wrote:
>>
>>> On Mon, Sep 07, 2009 at 10:57:01PM +0200, Ingo Molnar wrote:
>>>>>> Could you profile it please? Also, what's the context-switch rate?
>>>>>
>>>>> As far as I can tell, the broadcom mips architecture does not have
>>>>> profiling support. It does only have some proprietary profiling
>>>>> registers that nobody wrote kernel support for, yet.
>>>> Well, what does 'vmstat 1' show - how many context switches are
>>>> there per second on the iperf server? In theory if it's a truly
>>>> saturated box, there shouldnt be many - just a single iperf task
>>>
>>> Yay, finally something that's measurable in this thread \o/
>>
>> My initial posting in this thread contains 6 separate types of
>> measurements, rather extensive ones. Out of those, 4 measurements
>> were latency oriented, two were throughput oriented. Plenty of
>> data, plenty of results, and very good reproducability.
>
> None of which involve latency-prone GUI applications running on
> cheap commodity hardware though. [...]

The lat_tcp, lat_pipe and pipe-test numbers are all benchmarks that
characterise such workloads - they show the latency of context
switches.
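
For reference, the core of such a ping-pong test is tiny. The sketch
below is illustrative C only - not the actual pipe-test/lat_pipe
sources, which differ in details such as CPU affinity and iteration
counts: two tasks bounce one byte over a pair of pipes, so each round
trip costs roughly two context switches when both run on one CPU.

/*
 * Rough pipe ping-pong sketch: parent and child bounce a single byte,
 * so every round trip is a pair of wakeups/context switches (when the
 * two tasks share a CPU).  The average round-trip time is reported.
 */
#include <stdio.h>
#include <unistd.h>
#include <time.h>
#include <sys/wait.h>

#define LOOPS 100000

static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
	int ping[2], pong[2];
	long long start, total;
	char c = 0;
	int i;

	pipe(ping);
	pipe(pong);

	if (fork() == 0) {
		/* child: echo every byte straight back */
		for (i = 0; i < LOOPS; i++) {
			read(ping[0], &c, 1);
			write(pong[1], &c, 1);
		}
		_exit(0);
	}

	start = now_ns();
	for (i = 0; i < LOOPS; i++) {
		write(ping[1], &c, 1);
		read(pong[0], &c, 1);
	}
	total = now_ns() - start;

	printf("%d round trips, %.2f usec each\n",
	       LOOPS, (double)total / 1000.0 / LOOPS);
	wait(NULL);
	return 0;
}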

I also tested where Con posted numbers that BFS has an edge over
mainline: kbuild performance. Should i not have done that?

Also note the interbench latency measurements that Con posted:

http://ck.kolivas.org/patches/bfs/interbench-bfs-cfs.txt

--- Benchmarking simulated cpu of Audio in the presence of simulated ---
Load      Latency +/- SD (ms)   Max Latency   % Desired CPU   % Deadlines Met
None      0.004 +/- 0.00436     0.006         100             100
Video     0.008 +/- 0.00879     0.015         100             100
X         0.006 +/- 0.0067      0.014         100             100
Burn      0.005 +/- 0.00563     0.009         100             100
Write     0.005 +/- 0.00887     0.16          100             100
Read      0.006 +/- 0.00696     0.018         100             100
Compile   0.007 +/- 0.00751     0.019         100             100

Versus the mainline scheduler:

--- Benchmarking simulated cpu of Audio in the presence of simulated ---
Load      Latency +/- SD (ms)   Max Latency   % Desired CPU   % Deadlines Met
None      0.005 +/- 0.00562     0.007         100             100
Video     0.003 +/- 0.00333     0.009         100             100
X         0.003 +/- 0.00409     0.01          100             100
Burn      0.004 +/- 0.00415     0.006         100             100
Write     0.005 +/- 0.00592     0.021         100             100
Read      0.004 +/- 0.00463     0.009         100             100
Compile   0.003 +/- 0.00426     0.014         100             100

look at those standard deviation numbers, their spread is way too
high, often 50% or more - very hard to compare such noisy data.

Furthermore, they happen to show the 2.6.30 mainline scheduler
outperforming BFS in almost every interactivity metric.

Check it for yourself and compare the entries. I havent made those
measurements, Con did.

For example 'Compile' latencies:

--- Benchmarking simulated cpu of Audio in the presence of simulated Load
                   Latency +/- SD (ms)   Max Latency   % Desired CPU   % Deadlines Met
v2.6.30: Compile   0.003 +/- 0.00426     0.014         100             100
BFS:     Compile   0.007 +/- 0.00751     0.019         100             100

but ... with a near 100% standard deviation that's pretty hard to
judge. The Max Latency went from 14 usecs under v2.6.30 to 19 usecs
on BFS.

> [...] I listed examples where mainline seems to behave
> sub-optimal and ways to reproduce them but this doesn't seem to be
> an area of interest.

It is an area of interest of course. That's how the interactivity
results above became possible.

Ingo

2009-09-08 10:13:36

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/08/2009 11:38 AM, Arjan van de Ven wrote:
> On Tue, 08 Sep 2009 10:19:06 +0300
> Nikos Chantziaras<[email protected]> wrote:
>
>> latencytop has this to say:
>>
>> http://foss.math.aegean.gr/~realnc/pics/latop1.png
>>
>> Though I don't really understand what this tool is trying to tell me,
>> I hope someone does.
>
> despite the untranslated content, it is clear that you have scheduler
> delays (either due to scheduler bugs or cpu contention) of upto 68
> msecs... Second in line is your binary AMD graphics driver that is
> chewing up 14% of your total latency...

I've now used a correctly installed and up-to-date version of latencytop
and repeated the test. Also, I got rid of AMD's binary blob and used
kernel DRM drivers for my graphics card to throw fglrx out of the
equation (which btw didn't help; the exact same problems occur).

Here the result:

http://foss.math.aegean.gr/~realnc/pics/latop2.png

Again: this is on an Intel Core 2 Duo CPU.

2009-09-08 10:40:51

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/08/2009 01:12 PM, Ingo Molnar wrote:
>
> * Nikos Chantziaras<[email protected]> wrote:
>
>> On 09/08/2009 11:04 AM, Ingo Molnar wrote:
>>>
>>> * Pekka Pietikainen<[email protected]> wrote:
>>>
>>>> On Mon, Sep 07, 2009 at 10:57:01PM +0200, Ingo Molnar wrote:
>>>>>>> Could you profile it please? Also, what's the context-switch rate?
>>>>>>
>>>>>> As far as I can tell, the broadcom mips architecture does not have
>>>>>> profiling support. It does only have some proprietary profiling
>>>>>> registers that nobody wrote kernel support for, yet.
>>>>> Well, what does 'vmstat 1' show - how many context switches are
>>>>> there per second on the iperf server? In theory if it's a truly
>>>>> saturated box, there shouldnt be many - just a single iperf task
>>>>
>>>> Yay, finally something that's measurable in this thread \o/
>>>
>>> My initial posting in this thread contains 6 separate types of
>>> measurements, rather extensive ones. Out of those, 4 measurements
>>> were latency oriented, two were throughput oriented. Plenty of
>>> data, plenty of results, and very good reproducability.
>>
>> None of which involve latency-prone GUI applications running on
>> cheap commodity hardware though. [...]
>
> The lat_tcp, lat_pipe and pipe-test numbers are all benchmarks that
> characterise such workloads - they show the latency of context
> switches.
>
> I also tested where Con posted numbers that BFS has an edge over
> mainline: kbuild performance. Should i not have done that?

It's good that you did, of course. However, when someone reports a
problem/issue, the developer usually tries to reproduce the problem; he
needs to see what the user sees. This is how it's usually done, not
only in most other development environments, but also here from I could
gather by reading this list. When getting reports about interactivity
issues and with very specific examples of how to reproduce, I would have
expected that most developers interested in identifying the issue would
try to reproduce the same problem and work from there. That would mean
that you (or anyone else with an interest of tracking this down) would
follow the examples given (by me and others, like enabling desktop
compositing, firing up mplayer with a video and generally reproducing
this using the quite detailed steps I posted as a recipe).

However, in this case, instead of the above, raw numbers are posted with
batch jobs and benchmarks that aren't actually reproducing the issue as
described by the reporter(s). That way, the developer doesn't get to
experience the issue first-hand (and due to this possibly missing the
real cause). In most other bug reports or issues, the right thing seems
to happen and the devs try to reproduce it exactly as described. But
not in this case. I suspect this is due to most devs not using the
software components on their machines that are necessary for this and
therefore it would take too much time to reproduce the issue exactly as
described?

2009-09-08 11:30:27

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/08/2009 02:54 AM, Thomas Fjellstrom wrote:
> On Sun September 6 2009, Nikos Chantziaras wrote:
>> [...]
>> For reference, my system is:
>>
>> CPU: Intel Core 2 Duo E6600 (2.4GHz)
>> Mainboard: Asus P5E (Intel X38 chipset)
>> RAM: 6GB (2+2+1+1) dual channel DDR2 800
>> GPU: RV770 (Radeon HD4870).
>>
>
> My Phenom 9550 (2.2Ghz) whips the pants off my Intel Q6600 (2.6Ghz). I and a
> friend of mine both get large amounts of stalling when doing a lot of IO. I
> haven't seen such horrible desktop interactivity since before the new
> schedulers and the -ck patchset came out for 2.4.x. Its a heck of a lot better
> on my AMD Phenom's, but some lag is noticeable these days, even when it wasn't
> a few kernel releases ago.

It seems someone tried BFS on much slower hardware: Android. According
to the feedback, the device is much more responsive with BFS:
http://twitter.com/cyanogen

2009-09-08 11:32:53

by Juergen Borleis

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Tuesday, 8 September 2009, Nikos Chantziaras wrote:
> On 09/08/2009 11:38 AM, Arjan van de Ven wrote:
> > On Tue, 08 Sep 2009 10:19:06 +0300
> >
> > Nikos Chantziaras<[email protected]> wrote:
> >> latencytop has this to say:
> >>
> >> http://foss.math.aegean.gr/~realnc/pics/latop1.png
> >>
> >> Though I don't really understand what this tool is trying to tell me,
> >> I hope someone does.
> >
> > despite the untranslated content, it is clear that you have scheduler
> > delays (either due to scheduler bugs or cpu contention) of upto 68
> > msecs... Second in line is your binary AMD graphics driver that is
> > chewing up 14% of your total latency...
>
> I've now used a correctly installed and up-to-date version of latencytop
> and repeated the test. Also, I got rid of AMD's binary blob and used
> kernel DRM drivers for my graphics card to throw fglrx out of the
> equation (which btw didn't help; the exact same problems occur).
>
> Here the result:
>
> http://foss.math.aegean.gr/~realnc/pics/latop2.png
>
> Again: this is on an Intel Core 2 Duo CPU.

Just an idea: Maybe some system management code hits you?

jbe

--
Pengutronix e.K. | Juergen Beisert |
Linux Solutions for Science and Industry | Phone: +49-8766-939 228 |
Vertretung Sued/Muenchen, Germany | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de/ |

2009-09-08 11:36:05

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Nikos Chantziaras <[email protected]> wrote:

> [...] That would mean that you (or anyone else with an interest of
> tracking this down) would follow the examples given (by me and
> others, like enabling desktop compositing, firing up mplayer with
> a video and generally reproducing this using the quite detailed
> steps I posted as a recipe).

Could you follow up on Frederic's detailed tracing suggestions that
would give us the source of the latency?

( Also, as per lkml etiquette, please try to keep the Cc: list
intact when replying to emails. I missed your first reply
that you un-Cc:-ed. )

A quick look at the latencytop output suggests a scheduling latency.
Could you send me the kernel .config that you are using?

Ingo

2009-09-08 12:05:04

by el_es

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

Ingo Molnar <mingo <at> elte.hu> writes:


> For example 'Compile' latencies:
>
> --- Benchmarking simulated cpu of Audio in the presence of simulated Load
> Latency +/- SD (ms) Max Latency % Desired CPU % Deadlines Met
> v2.6.30: Compile 0.003 +/- 0.00426 0.014 100 100
> BFS: Compile 0.007 +/- 0.00751 0.019 100 100
>
> but ... with a near 100% standard deviation that's pretty hard to
> judge. The Max Latency went from 14 usecs under v2.6.30 to 19 usecs
> on BFS.
>
[...]
> Ingo
>

This just struck me: maybe what desktop users *feel* is exactly that: the
current approach is too fine-grained, trying to achieve the minimum latency
with the *most* reproducible result (lower stddev) at all costs? And BFS
just doesn't care?
I know this sounds like heresy.

Lukasz


2009-09-08 12:03:16

by Theodore Ts'o

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Tue, Sep 08, 2009 at 01:13:34PM +0300, Nikos Chantziaras wrote:
>> despite the untranslated content, it is clear that you have scheduler
>> delays (either due to scheduler bugs or cpu contention) of upto 68
>> msecs... Second in line is your binary AMD graphics driver that is
>> chewing up 14% of your total latency...
>
> I've now used a correctly installed and up-to-date version of latencytop
> and repeated the test. Also, I got rid of AMD's binary blob and used
> kernel DRM drivers for my graphics card to throw fglrx out of the
> equation (which btw didn't help; the exact same problems occur).
>
> Here the result:
>
> http://foss.math.aegean.gr/~realnc/pics/latop2.png

This was with an unmodified 2.6.31-rcX kernel? Does Latencytop do
anything useful on a BFS-patched kernel?

- Ted

2009-09-08 13:18:46

by Serge Belyshev

[permalink] [raw]
Subject: Epic regression in throughput since v2.6.23


Hi. I've done measurements of the time taken by a make -j4 kernel build
on a quadcore box. The results are interesting: the mainline kernel
has regressed since the v2.6.23 release by more than 10%.

The following graph is time taken by "make -j4" (median over 9 runs)
versus kernel version. The huge (10%) regression since v2.6.23 is
apparent. Note that tip/master c26f010 is better than current mainline.
Also note that BFS is significantly better than both and shows the same
throughput as vanilla v2.6.23:

http://img403.imageshack.us/img403/7029/epicmakej4.png


The following plot is a detailed comparison of time taken versus number
of parallel jobs. Note that at "make -j4" (which equals number of hardware
threads), BFS has the minimum (best performance),
and tip/master the maximum (worst). I've also tested mainline v2.6.31
(not shown on the graph), which produces results similar to, albeit a
bit slower than, tip/master.

http://img179.imageshack.us/img179/5335/epicbfstip.png


Conclusions are:
1) mainline has severely regressed since v2.6.23
2) BFS shows optimal performance at make -jN where N equals the number of
h/w threads, while current mainline scheduler performance is far from
optimal in this case.

2009-09-09 00:47:36

by Ralf Baechle

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Tue, Sep 08, 2009 at 07:50:00PM +1000, Benjamin Herrenschmidt wrote:

> On Tue, 2009-09-08 at 09:48 +0200, Ingo Molnar wrote:
> > So either your MIPS system has some unexpected dependency on the
> > scheduler, or there's something weird going on.
> >
> > Mind poking on this one to figure out whether it's all repeatable
> > and why that slowdown happens? Multiple attempts to reproduce it
> > failed here for me.
>
> Could it be the scheduler using constructs that don't do well on MIPS ?

It would surprise me.

I'm wondering if BFS has properties that make it perform better on a very
low memory system; I guess the BCM74xx system will have like 32MB or 64MB
only.

> I remember at some stage we spotted an expensive multiply in there,
> maybe there's something similar, or some unaligned or non-cache friendly
> vs. the MIPS cache line size data structure, that sort of thing ...
>
> Is this a SW loaded TLB ? Does it miss on kernel space ? That could
> also be some differences in how many pages are touched by each scheduler
> causing more TLB pressure. This will be mostly invisible on x86.

Software refilled. No misses ever for kernel space or low-mem; think of
it as low-mem and kernel executable living in a 512MB page that is mapped
by a mechanism outside the TLB. Vmalloc ranges are TLB mapped. Ioremap
address ranges only if above physical address 512MB.

An emulated unaligned load/store is very expensive; one that is encoded
properly by GCC for __attribute__((packed)) is only 1 cycle and 1
instruction ( = 4 bytes) extra.

> At this stage, it will be hard to tell without some profile data I
> suppose. Maybe next week I can try on a small SW loaded TLB embedded PPC
> see if I can reproduce some of that, but no promises here.

Ralf

2009-09-08 13:41:59

by Felix Fietkau

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

Benjamin Herrenschmidt wrote:
> On Tue, 2009-09-08 at 09:48 +0200, Ingo Molnar wrote:
>> So either your MIPS system has some unexpected dependency on the
>> scheduler, or there's something weird going on.
>>
>> Mind poking on this one to figure out whether it's all repeatable
>> and why that slowdown happens? Multiple attempts to reproduce it
>> failed here for me.
>
> Could it be the scheduler using constructs that don't do well on MIPS ?
>
> I remember at some stage we spotted an expensive multiply in there,
> maybe there's something similar, or some unaligned or non-cache friendly
> vs. the MIPS cache line size data structure, that sort of thing ...
>
> Is this a SW loaded TLB ? Does it miss on kernel space ? That could
> also be some differences in how many pages are touched by each scheduler
> causing more TLB pressure. This will be mostly invisible on x86.
The TLB is SW loaded, yes. However it should not do any misses on kernel
space, since the whole segment is in a wired TLB entry.

- Felix

2009-09-08 14:17:24

by Arjan van de Ven

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Tue, 08 Sep 2009 13:13:34 +0300
Nikos Chantziaras <[email protected]> wrote:

> On 09/08/2009 11:38 AM, Arjan van de Ven wrote:
> > On Tue, 08 Sep 2009 10:19:06 +0300
> > Nikos Chantziaras<[email protected]> wrote:
> >
> >> latencytop has this to say:
> >>
> >> http://foss.math.aegean.gr/~realnc/pics/latop1.png
> >>
> >> Though I don't really understand what this tool is trying to tell
> >> me, I hope someone does.
> >
> > despite the untranslated content, it is clear that you have
> > scheduler delays (either due to scheduler bugs or cpu contention)
> > of upto 68 msecs... Second in line is your binary AMD graphics
> > driver that is chewing up 14% of your total latency...
>
> I've now used a correctly installed and up-to-date version of
> latencytop and repeated the test. Also, I got rid of AMD's binary
> blob and used kernel DRM drivers for my graphics card to throw fglrx
> out of the equation (which btw didn't help; the exact same problems
> occur).
>
> Here the result:
>
> http://foss.math.aegean.gr/~realnc/pics/latop2.png
>
> Again: this is on an Intel Core 2 Duo CPU.


so we finally have objective numbers!

now the interesting part is also WHERE the latency hits. Because
fundamentally, if you oversubscribe the CPU, you WILL get scheduling
latency.. simply you have more to run than there is CPU.

Now the scheduler impacts this latency in two ways
* Deciding how long apps run before someone else gets to take over
("time slicing")
* Deciding who gets to run first/more; eg priority between apps

the first one more or less controls the maximum, while the second one
controls which apps get to enjoy this maximum.

latencytop shows you both, but it is interesting to see how much latency
the apps you actually care about are getting...



--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-09-08 14:45:19

by Michael Büsch

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Tuesday 08 September 2009 09:48:25 Ingo Molnar wrote:
> Mind poking on this one to figure out whether it's all repeatable
> and why that slowdown happens?

I repeated the test several times, because I couldn't really believe that
there's such a big difference for me, but the results were the same.
I don't really know what's going on nor how to find out what's going on.

--
Greetings, Michael.

2009-09-08 15:23:26

by Peter Zijlstra

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> And here's a newer version.

I tinkered a bit with your proglet and finally found the problem.

You used a single pipe per child; this means the loop in run_child()
would consume what it just wrote out until it got force-preempted by the
parent, which would also get woken.

This results in the child spinning for a while (its full quota) and only
reporting the last timestamp to the parent.

Since the consumer (parent) is a single thread, the program basically
measures the worst delay in a thundering-herd wakeup of N children.

The below version yields:

idle

[root@opteron sched]# ./latt -c8 sleep 30
Entries: 664 (clients=8)

Averages:
------------------------------
Max 128 usec
Avg 26 usec
Stdev 16 usec


make -j4

[root@opteron sched]# ./latt -c8 sleep 30
Entries: 648 (clients=8)

Averages:
------------------------------
Max 20861 usec
Avg 3763 usec
Stdev 4637 usec


Mike's patch, make -j4

[root@opteron sched]# ./latt -c8 sleep 30
Entries: 648 (clients=8)

Averages:
------------------------------
Max 17854 usec
Avg 6298 usec
Stdev 4735 usec


Attachments:
latt.c (9.00 kB)
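
For readers without the latt.c attachment, here is a tiny, hypothetical
illustration (it is not the actual latt code) of the failure mode described
above: with one pipe shared for both directions, a task that writes to the
pipe and then reads it while waiting for the other side simply consumes its
own message instead of blocking.

/* single_pipe.c - hypothetical illustration, not latt.c: with one shared
 * pipe, a task that both writes to it and reads from it consumes its own
 * data instead of blocking until the other side writes. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fds[2];
	char buf[32];
	ssize_t n;

	if (pipe(fds) < 0) {
		perror("pipe");
		return 1;
	}

	/* the "child" reports a timestamp on the shared pipe ... */
	if (write(fds[1], "child timestamp", 15) != 15) {
		perror("write");
		return 1;
	}

	/* ... and, reading the same pipe while waiting for the parent,
	 * immediately gets its own message back instead of sleeping. */
	n = read(fds[0], buf, sizeof(buf) - 1);
	buf[n > 0 ? n : 0] = '\0';
	printf("read back my own data: \"%s\"\n", buf);

	/* The fix Peter describes amounts to one pipe per direction, so
	 * each end only ever reads what the other end wrote. */
	return 0;
}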

2009-09-08 15:45:26

by Michael Büsch

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Monday 07 September 2009 22:57:01 Ingo Molnar wrote:
>
> * Michael Buesch <[email protected]> wrote:
>
> > On Monday 07 September 2009 20:26:29 Ingo Molnar wrote:
> > > Could you profile it please? Also, what's the context-switch rate?
> >
> > As far as I can tell, the broadcom mips architecture does not have
> > profiling support. It does only have some proprietary profiling
> > registers that nobody wrote kernel support for, yet.
>
> Well, what does 'vmstat 1' show - how many context switches are
> there per second on the iperf server? In theory if it's a truly
> saturated box, there shouldnt be many - just a single iperf task
> running at 100% CPU utilization or so.
>
> (Also, if there's hrtimer support for that board then perfcounters
> could be used to profile it.)

CFS:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 0 15892 1684 5868 0 0 0 0 268 6 31 69 0 0
1 0 0 15892 1684 5868 0 0 0 0 266 2 34 66 0 0
1 0 0 15892 1684 5868 0 0 0 0 266 6 33 67 0 0
1 0 0 15892 1684 5868 0 0 0 0 267 4 37 63 0 0
1 0 0 15892 1684 5868 0 0 0 0 267 6 34 66 0 0
[ 4] local 192.168.1.1 port 5001 connected with 192.168.1.99 port 47278
2 0 0 15756 1684 5868 0 0 0 0 1655 68 26 74 0 0
2 0 0 15756 1684 5868 0 0 0 0 1945 88 20 80 0 0
2 0 0 15756 1684 5868 0 0 0 0 1882 85 20 80 0 0
2 0 0 15756 1684 5868 0 0 0 0 1923 86 18 82 0 0
2 0 0 15756 1684 5868 0 0 0 0 1986 87 23 77 0 0
2 0 0 15756 1684 5868 0 0 0 0 1923 87 17 83 0 0
2 0 0 15756 1684 5868 0 0 0 0 1951 84 19 81 0 0
2 0 0 15756 1684 5868 0 0 0 0 1970 87 18 82 0 0
2 0 0 15756 1684 5868 0 0 0 0 1972 85 23 77 0 0
2 0 0 15756 1684 5868 0 0 0 0 1961 87 18 82 0 0
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 28.6 MBytes 23.9 Mbits/sec
1 0 0 15752 1684 5868 0 0 0 0 599 22 22 78 0 0
1 0 0 15752 1684 5868 0 0 0 0 269 4 32 68 0 0
1 0 0 15752 1684 5868 0 0 0 0 266 4 29 71 0 0
1 0 0 15764 1684 5868 0 0 0 0 267 6 37 63 0 0
1 0 0 15764 1684 5868 0 0 0 0 267 4 31 69 0 0
1 0 0 15768 1684 5868 0 0 0 0 266 4 51 49 0 0


I'm currently unable to test BFS, because the device throws strange flash errors.
Maybe the flash is broken :(

--
Greetings, Michael.

2009-09-08 17:55:32

by Jesse Brandeburg

[permalink] [raw]
Subject: Re: Epic regression in throughput since v2.6.23

On Tue, Sep 8, 2009 at 5:57 AM, Serge
Belyshev<[email protected]> wrote:
>
> Hi. I've done measurements of the time taken by a "make -j4" kernel build
> on a quad-core box. The results are interesting: the mainline kernel
> has regressed since the v2.6.23 release by more than 10%.

Is this related to why I now have to double the number of threads X I
pass to make -jX in order to use all my idle time for a kernel
compile? I had noticed (without measuring exactly) that with each kernel
released in this series I had to increase my number of worker threads;
my common working model now is (cpus * 2) in order to get zero idle
time.

Sorry I haven't tested BFS yet, but am interested to see if it helps
interactivity when playing flash videos on my dual core laptop.

2009-09-08 18:15:25

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/07/2009 02:01 PM, Frederic Weisbecker wrote:
> On Mon, Sep 07, 2009 at 06:38:36AM +0300, Nikos Chantziaras wrote:
>> Unfortunately, I can't come up with any way to somehow benchmark all of
>> this. There's no benchmark for "fluidity" and "responsiveness". Running
>> the Doom 3 benchmark, or any other benchmark, doesn't say anything about
>> responsiveness, it only measures how many frames were calculated in a
>> specific period of time. How "stable" (with no stalls) those frames were
>> making it to the screen is not measurable.
>
>
> That actually looks benchmarkable. This is about latency.
> For example, you could try to run high-load tasks in the
> background and then launch a task that wakes up at medium/large
> intervals to do something. You could measure the time it takes for it
> to be woken up and to perform what it wants.
>
> We have some events tracing infrastructure in the kernel that can
> snapshot the wake up and sched switch events.
>
> Having CONFIG_EVENT_TRACING=y should be sufficient for that.
>
> You just need to mount a debugfs point, say in /debug.
>
> Then you can activate these sched events by doing:
>
> echo 0 > /debug/tracing/tracing_on
> echo 1 > /debug/tracing/events/sched/sched_switch/enable
> echo 1 > /debug/tracing/events/sched/sched_wakeup/enable
>
> #Launch your tasks
>
> echo 1 > /debug/tracing/tracing_on
>
> #Wait for some time
>
> echo 0 > /debug/tracing/tracing_on
>
> That will require some parsing of the result in /debug/tracing/trace
> to get the delays between wakeup events and switch-in events
> for the task that periodically wakes up, and then producing some
> statistics such as the average or the maximum latency.
>
> That's a bit of a rough approach to measure such latencies but that
> should work.

I've tried this with 2.6.31-rc9 while running mplayer and alt+tabbing
repeatedly to the point where mplayer starts to stall and drop frames.
This produced a 4.1MB trace file (132k bzip2'ed):

http://foss.math.aegean.gr/~realnc/kernel/trace1.bz2

Uncompressed for online viewing:

http://foss.math.aegean.gr/~realnc/kernel/trace1

I must admit that I don't know what it is I'm looking at :P
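
For what it's worth, below is a rough, hypothetical sketch of the kind of
post-processing Frederic describes: compute the delay between a sched_wakeup
event and the following switch-in of the same pid, from a saved copy of
/debug/tracing/trace. The exact event line layout differs between kernel
versions, so the pid and timestamp matching here is an assumption that may
need adjusting for a given trace.

/* wakelat.c - hypothetical sketch, not an official tool: report avg/max
 * wakeup-to-switch-in latency for one pid from an ftrace dump on stdin.
 * usage: ./wakelat <pid> < trace */
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Pull the "seconds.usecs" timestamp that immediately precedes
 * "<event>:" on an ftrace line. Returns 1 on success. */
static int event_ts(const char *line, const char *event, double *ts)
{
	const char *ev = strstr(line, event);
	const char *p;

	if (!ev || ev - line < 4)
		return 0;
	p = ev - 3;				/* last digit of "1234.567890: " */
	while (p > line && (isdigit((unsigned char)*p) || *p == '.'))
		p--;
	return sscanf(p, "%lf", ts) == 1;
}

int main(int argc, char **argv)
{
	char line[1024], tag_old[32], tag_new[32];
	double wake = -1.0, delta, max = 0.0, sum = 0.0;
	long count = 0;
	int pid;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <pid> < trace\n", argv[0]);
		return 1;
	}
	pid = atoi(argv[1]);

	/* The payload format varies: accept either the older "comm:pid [prio]"
	 * style (":<pid> ") or the newer "pid=<pid>" style. */
	snprintf(tag_old, sizeof(tag_old), ":%d ", pid);
	snprintf(tag_new, sizeof(tag_new), "pid=%d", pid);

	while (fgets(line, sizeof(line), stdin)) {
		double ts;

		if (strstr(line, "sched_wakeup") &&
		    (strstr(line, tag_old) || strstr(line, tag_new))) {
			/* remember the earliest still-pending wakeup */
			if (event_ts(line, "sched_wakeup", &ts) && wake < 0.0)
				wake = ts;
		} else if (strstr(line, "sched_switch") && wake >= 0.0) {
			const char *in = strstr(line, "==>");

			/* only count switches *to* our pid */
			if (!in || !(strstr(in, tag_old) || strstr(in, tag_new)))
				continue;
			if (!event_ts(line, "sched_switch", &ts))
				continue;
			delta = ts - wake;
			wake = -1.0;
			sum += delta;
			count++;
			if (delta > max)
				max = delta;
		}
	}

	if (!count) {
		fprintf(stderr, "no wakeup/switch pairs found for pid %d\n", pid);
		return 1;
	}
	printf("samples: %ld  avg: %.0f usec  max: %.0f usec\n",
	       count, sum / count * 1e6, max * 1e6);
	return 0;
}

Against a trace like the one posted above, one would run something like
"./wakelat <pid-of-mplayer> < trace1", assuming the trace was taken while
that pid was the task of interest.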

2009-09-08 18:20:27

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: Epic regression in throughput since v2.6.23

On 09/08/2009 08:47 PM, Jesse Brandeburg wrote:
>[...]
> Sorry I haven't tested BFS yet, but am interested to see if it helps
> interactivity when playing flash videos on my dual core laptop.

Interactivity: yes (Flash will not result in the rest of the system
lagging).

Flash videos: they will still play as badly as before. BFS has no way to
fix broken code inside Flash :P

2009-09-08 18:37:36

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: Epic regression in throughput since v2.6.23

On 09/08/2009 03:57 PM, Serge Belyshev wrote:
>
> Hi. I've done measurements of the time taken by a "make -j4" kernel build
> on a quad-core box. The results are interesting: the mainline kernel
> has regressed since the v2.6.23 release by more than 10%.

It seems more people are starting to confirm this issue:

http://foldingforum.org/viewtopic.php?f=44&t=11336

IMHO it's not quite as dramatic as some people there describe it ("Is it
the holy grail?"), but if something makes your desktop "smooth as silk"
just like that, it might seem like a holy grail ;) In any case, there
clearly seems to be a performance problem with the mainline scheduler on
many people's desktops that is being solved by BFS.

2009-09-08 19:01:06

by Jeff Garzik

[permalink] [raw]
Subject: Re: Epic regression in throughput since v2.6.23

On 09/08/2009 01:47 PM, Jesse Brandeburg wrote:
> On Tue, Sep 8, 2009 at 5:57 AM, Serge
> Belyshev<[email protected]> wrote:
>>
>> Hi. I've done measurements of the time taken by a "make -j4" kernel build
>> on a quad-core box. The results are interesting: the mainline kernel
>> has regressed since the v2.6.23 release by more than 10%.
>
> Is this related to why I now have to double the amount of threads X I
> pass to make -jX, in order to use all my idle time for a kernel
> compile? I had noticed (without measuring exactly) that it seems with
> each kernel released in this series mentioned, I had to increase my
> number of worker threads, my common working model now is (cpus * 2) in
> order to get zero idle time.

You will almost certainly see idle CPUs/threads with "make -jN_CPUS" due
to processes waiting for I/O.

If you're curious, there is also room for experimenting with make's "-l"
argument, which caps the number of jobs based on load average rather
than a static number of job slots.

Jeff

2009-09-08 19:07:02

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/08/2009 02:35 PM, Ingo Molnar wrote:
>
> * Nikos Chantziaras<[email protected]> wrote:
>
>> [...] That would mean that you (or anyone else with an interest of
>> tracking this down) would follow the examples given (by me and
>> others, like enabling desktop compositing, firing up mplayer with
>> a video and generally reproducing this using the quite detailed
>> steps I posted as a recipe).
>
> Could you follow up on Frederic's detailed tracing suggestions that
> would give us the source of the latency?

I've set it up and ran the tests now.


> ( Also, as per lkml etiquette, please try to keep the Cc: list
> intact when replying to emails. I missed your first reply
> that you un-Cc:-ed. )

Sorry for that.


> A quick look at the latencytop output suggests a scheduling latency.
> Could you send me the kernel .config that you are using?

That would be this one:

http://foss.math.aegean.gr/~realnc/kernel/config-2.6.31-rc9

2009-09-08 19:20:21

by Serge Belyshev

[permalink] [raw]
Subject: Re: Epic regression in throughput since v2.6.23

Jeff Garzik <[email protected]> writes:

> You will almost certainly see idle CPUs/threads with "make -jN_CPUS"
> due to processes waiting for I/O.

Just to clarify: I have excluded all I/O effects from my plots
by building entirely from tmpfs. Also, before each actual measurement
there was a thrown-away "pre-caching" run. And my box has 8GB of RAM.

2009-09-08 19:26:13

by Jeff Garzik

[permalink] [raw]
Subject: Re: Epic regression in throughput since v2.6.23

On 09/08/2009 03:20 PM, Serge Belyshev wrote:
> Jeff Garzik<[email protected]> writes:
>
>> You will almost certainly see idle CPUs/threads with "make -jN_CPUS"
>> due to processes waiting for I/O.
>
> Just to clarify: I have excluded all I/O effects from my plots
> by building entirely from tmpfs. Also, before each actual measurement
> there was a thrown-away "pre-caching" run. And my box has 8GB of RAM.

You could always one-up that by using ramfs ;)

Jeff


2009-09-08 20:22:44

by Frans Pop

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

Arjan van de Ven wrote:
> the latest version of latencytop also has a GUI (thanks to Ben)

That looks nice, but...

I kind of miss the split screen feature where latencytop would show both
the overall figures + the ones for the currently most affected task.
Downside of that last was that I never managed to keep the display on a
specific task.

The graphical display also makes it impossible to simply copy and paste
the results.

Having the freeze button is nice though.

Would it be possible to have a command line switch that allows to start
the old textual mode?

Looks like the man page needs updating too :-)

Cheers,
FJP

2009-09-08 20:34:08

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Tue, Sep 08 2009, Peter Zijlstra wrote:
> On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> > And here's a newer version.
>
> I tinkered a bit with your proglet and finally found the problem.
>
> You used a single pipe per child, this means the loop in run_child()
> would consume what it just wrote out until it got force preempted by the
> parent which would also get woken.
>
> This results in the child spinning a while (its full quota) and only
> reporting the last timestamp to the parent.

Oh doh, that's not well thought out. Well it was a quick hack :-)
Thanks for the fixup, now it's at least usable to some degree.

> Since consumer (parent) is a single thread the program basically
> measures the worst delay in a thundering herd wakeup of N children.

Yes, it's really meant to measure how long it takes to wake a group of
processes, assuming that this is where things fall down on the 'box
loaded, switch desktop' case. Now whether that's useful or not or
whether this test app is worth the bits it takes up on the hard drive,
is another question.

--
Jens Axboe

2009-09-08 21:10:26

by Michal Schmidt

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Tue, 8 Sep 2009 22:22:43 +0200,
Frans Pop <[email protected]> wrote:
> Would it be possible to have a command line switch that allows to
> start the old textual mode?

I use:
DISPLAY= latencytop

:-)
Michal

2009-09-08 21:11:52

by Frans Pop

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Tuesday 08 September 2009, Frans Pop wrote:
> Arjan van de Ven wrote:
> > the latest version of latencytop also has a GUI (thanks to Ben)
>
> That looks nice, but...
>
> I kind of miss the split screen feature where latencytop would show
> both the overall figures + the ones for the currently most affected
> task. Downside of that last was that I never managed to keep the
> display on a specific task.
[...]
> Would it be possible to have a command line switch that allows to start
> the old textual mode?

I got a private reply suggesting that --nogui might work, and it does.
Thanks a lot Nikos!

> Looks like the man page needs updating too :-)

So this definitely needs attention :-P
Support of the standard -h and --help options would be great too.

Cheers,
FJP

2009-09-08 21:28:58

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/08/2009 03:03 PM, Theodore Tso wrote:
> On Tue, Sep 08, 2009 at 01:13:34PM +0300, Nikos Chantziaras wrote:
>>> despite the untranslated content, it is clear that you have scheduler
>>> delays (either due to scheduler bugs or cpu contention) of upto 68
>>> msecs... Second in line is your binary AMD graphics driver that is
>>> chewing up 14% of your total latency...
>>
>> I've now used a correctly installed and up-to-date version of latencytop
>> and repeated the test. Also, I got rid of AMD's binary blob and used
>> kernel DRM drivers for my graphics card to throw fglrx out of the
>> equation (which btw didn't help; the exact same problems occur).
>>
>> Here the result:
>>
>> http://foss.math.aegean.gr/~realnc/pics/latop2.png
>
> This was with an unmodified 2.6.31-rcX kernel?

Yes (-rc9). I also tested with 2.6.30.5 and got the same results.


> Does Latencytop do anything useful on a BFS-patched kernel?

Nope. BFS does not support any form of tracing yet. latencytop runs
but only shows a blank list. All I can say is that a BFS patched kernel
with the same .config fixes all visible latency issues.

2009-09-08 21:46:52

by Geunsik Lim

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Wed, Sep 9, 2009 at 6:11 AM, Frans Pop<[email protected]> wrote:
>> Would it be possible to have a command line switch that allows to start
>> the old textual mode?
> I got a private reply suggesting that --nogui might work, and it does.
Um, you mean that you tested with runlevel 3 (multi-user mode), is that right?
Frans, can you share the Linux distribution you used for this test?
I want to check under the same conditions (e.g. Linux distribution such as
Fedora 11 or Ubuntu 9.04, runlevel, and so on).
> Thanks a lot Nikos!
>> Looks like the man page needs updating too :-)
> So this definitely needs attention :-P
> Support of the standard -h and --help options would be great too.
> Cheers,
> FJP
> --

Thanks,
GeunSik Lim.



--
Regards,
GeunSik Lim ( Samsung Electronics )
Blog : http://blog.naver.com/invain/
e-Mail: [email protected]
[email protected] , [email protected]

2009-09-08 22:00:49

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/08/2009 02:32 PM, Juergen Beisert wrote:
> On Dienstag, 8. September 2009, Nikos Chantziaras wrote:
>> On 09/08/2009 11:38 AM, Arjan van de Ven wrote:
>>> On Tue, 08 Sep 2009 10:19:06 +0300
>>>
>>> Nikos Chantziaras<[email protected]> wrote:
>>>> latencytop has this to say:
>>>>
>>>> http://foss.math.aegean.gr/~realnc/pics/latop1.png
>>>>
>>>> Though I don't really understand what this tool is trying to tell me,
>>>> I hope someone does.
>>>
>>> despite the untranslated content, it is clear that you have scheduler
>>> delays (either due to scheduler bugs or cpu contention) of upto 68
>>> msecs... Second in line is your binary AMD graphics driver that is
>>> chewing up 14% of your total latency...
>>
>> I've now used a correctly installed and up-to-date version of latencytop
>> and repeated the test. Also, I got rid of AMD's binary blob and used
>> kernel DRM drivers for my graphics card to throw fglrx out of the
>> equation (which btw didn't help; the exact same problems occur).
>>
>> Here the result:
>>
>> http://foss.math.aegean.gr/~realnc/pics/latop2.png
>>
>> Again: this is on an Intel Core 2 Duo CPU.
>
> Just an idea: Maybe some system management code hits you?

I'm not sure what is meant by "system management code."

2009-09-08 22:15:38

by Serge Belyshev

[permalink] [raw]
Subject: Re: Epic regression in throughput since v2.6.23

Serge Belyshev <[email protected]> writes:
>[snip]

I've updated the graphs, added kernels 2.6.24..2.6.29:
http://img186.imageshack.us/img186/7029/epicmakej4.png

And added comparison with best-performing 2.6.23 kernel:
http://img34.imageshack.us/img34/7563/epicbfstips.png

>
> Conclusions are
> 1) mainline has severely regressed since v2.6.23
> 2) BFS shows optimal performance at make -jN where N equals number of
> h/w threads, while current mainline scheduler performance is far from
> optimal in this case.

2009-09-08 22:36:34

by Frans Pop

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Tuesday 08 September 2009, you wrote:
> On Wed, Sep 9, 2009 at 6:11 AM, Frans Pop<[email protected]> wrote:
> >> Would it be possible to have a command line switch that allows to
> >> start the old textual mode?
> >
> > I got a private reply suggesting that --nogui might work, and it
> > does.
>
> Um, you mean that you tested with runlevel 3 (multi-user mode), is that
> right? Frans, can you share the Linux distribution you used for this test? I
> want to check under the same conditions (e.g. Linux distribution such as
> Fedora 11 or Ubuntu 9.04, runlevel, and so on).

I ran it from KDE's konsole by just entering 'sudo latencytop --nogui' at
the command prompt.

Distro is Debian stable ("Lenny"), which does not have differences between
runlevels: by default they all start a desktop environment (if a display
manager like xdm/kdm/gdm is installed). But if you really want to know,
the runlevel was 2 ;-)

Cheers,
FJP

2009-09-08 22:53:48

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/08/2009 05:20 PM, Arjan van de Ven wrote:
> On Tue, 08 Sep 2009 13:13:34 +0300
> Nikos Chantziaras<[email protected]> wrote:
>
>> On 09/08/2009 11:38 AM, Arjan van de Ven wrote:
>>> On Tue, 08 Sep 2009 10:19:06 +0300
>>> Nikos Chantziaras<[email protected]> wrote:
>>>
>>>> latencytop has this to say:
>>>>
>>>> http://foss.math.aegean.gr/~realnc/pics/latop1.png
>>>>
>>>> Though I don't really understand what this tool is trying to tell
>>>> me, I hope someone does.
>>>
>>> despite the untranslated content, it is clear that you have
>>> scheduler delays (either due to scheduler bugs or cpu contention)
>>> of upto 68 msecs... Second in line is your binary AMD graphics
>>> driver that is chewing up 14% of your total latency...
>>
>> I've now used a correctly installed and up-to-date version of
>> latencytop and repeated the test. Also, I got rid of AMD's binary
>> blob and used kernel DRM drivers for my graphics card to throw fglrx
>> out of the equation (which btw didn't help; the exact same problems
>> occur).
>>
>> Here the result:
>>
>> http://foss.math.aegean.gr/~realnc/pics/latop2.png
>>
>> Again: this is on an Intel Core 2 Duo CPU.
>
>
> so we finally have objective numbers!
>
> now the interesting part is also WHERE the latency hits. Because
> fundamentally, if you oversubscribe the CPU, you WILL get scheduling
> latency.. simply you have more to run than there is CPU.

Sounds plausible. However, with mainline this latency is very, very
noticeable. With BFS I need to look really hard to detect it or do
outright silly things, like a "make -j50". (At first I wrote "-j20"
here but then went ahead and tested it just for kicks, and BFS would
still let me use the GUI smoothly, LOL. So then I corrected it to
"-j50"...)

2009-09-08 23:21:01

by Jiri Kosina

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Wed, 9 Sep 2009, Nikos Chantziaras wrote:

> > > Here the result:
> > >
> > > http://foss.math.aegean.gr/~realnc/pics/latop2.png
> > >
> > > Again: this is on an Intel Core 2 Duo CPU.
> >
> > Just an idea: Maybe some system management code hits you?
>
> I'm not sure what is meant with "system management code."

A system management interrupt happens when firmware/BIOS/HW-debugger code is
executed at a privilege level so high that even the OS can't do anything
about it.

It is used in many situations, such as

- memory errors
- ACPI (mostly fan control)
- TPM

The OS has little to no ability to influence SMI/SMM. But if this were
the cause, you would probably obtain completely different results on a
different hardware configuration (as it is likely to have completely
different SMM behavior).

--
Jiri Kosina
SUSE Labs, Novell Inc.

2009-09-08 23:38:38

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/09/2009 02:20 AM, Jiri Kosina wrote:
> On Wed, 9 Sep 2009, Nikos Chantziaras wrote:
>
>>>> Here the result:
>>>>
>>>> http://foss.math.aegean.gr/~realnc/pics/latop2.png
>>>>
>>>> Again: this is on an Intel Core 2 Duo CPU.
>>>
>>> Just an idea: Maybe some system management code hits you?
>>
>> I'm not sure what is meant with "system management code."
>
> System management interrupt happens when firmware/BIOS/HW-debugger is
> executed in privilege mode so high, that even OS can't do anything about
> that.
>
> It is used in many situations, such as
>
> - memory errors
> - ACPI (mostly fan control)
> - TPM
>
> OS has small to none possibility to influence SMI/SMM. But if this would
> be the cause, you should probably obtain completely different results on
> different hardware configuration (as it is likely to have completely
> different SMM behavior).

Wouldn't that mean that a BFS-patched kernel would suffer from this too?

In any case, of the above, only fan control is active, and I've run with
it disabled on occasion (hot summer days, I wanted to just keep it max
with no fan control) with no change. As far as I can tell, the Asus P5E
doesn't have a TPM (the "Deluxe" and "VM" models seem to have one.) As
for memory errors, I use unbuffered non-ECC RAM which passes a
memtest86+ cycle cleanly (well, at least the last time I ran it through
one, a few months ago.)

2009-09-09 00:28:47

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

> The TLB is SW loaded, yes. However it should not do any misses on kernel
> space, since the whole segment is in a wired TLB entry.

Including vmalloc space ?

Ben.

2009-09-09 00:37:36

by David Miller

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

From: Benjamin Herrenschmidt <[email protected]>
Date: Wed, 09 Sep 2009 10:28:22 +1000

>> The TLB is SW loaded, yes. However it should not do any misses on kernel
>> space, since the whole segment is in a wired TLB entry.
>
> Including vmalloc space ?

No, MIPS does take SW tlb misses on vmalloc space. :-)

2009-09-09 01:36:22

by Felix Fietkau

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

Ralf Baechle wrote:
>> I remember at some stage we spotted an expensive multiply in there,
>> maybe there's something similar, or some unaligned or non-cache friendly
>> vs. the MIPS cache line size data structure, that sort of thing ...
>>
>> Is this a SW loaded TLB ? Does it miss on kernel space ? That could
>> also be some differences in how many pages are touched by each scheduler
>> causing more TLB pressure. This will be mostly invisible on x86.
>
> Software refilled. No misses ever for kernel space or low-mem; think of
> it as low-mem and kernel executable living in a 512MB page that is mapped
> by a mechanism outside the TLB. Vmalloc ranges are TLB mapped. Ioremap
> address ranges only if above physical address 512MB.
>
> An emulated unaligned load/store is very expensive; one that is encoded
> properly by GCC for __attribute__((packed)) is only 1 cycle and 1
> instruction ( = 4 bytes) extra.
CFS definitely isn't causing any emulated unaligned load/stores on these
devices, we've tested that.

- Felix

2009-09-09 06:13:18

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Jens Axboe <[email protected]> wrote:

> On Tue, Sep 08 2009, Peter Zijlstra wrote:
> > On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> > > And here's a newer version.
> >
> > I tinkered a bit with your proglet and finally found the
> > problem.
> >
> > You used a single pipe per child, this means the loop in
> > run_child() would consume what it just wrote out until it got
> > force preempted by the parent which would also get woken.
> >
> > This results in the child spinning a while (its full quota) and
> > only reporting the last timestamp to the parent.
>
> Oh doh, that's not well thought out. Well it was a quick hack :-)
> Thanks for the fixup, now it's at least usable to some degree.

What kind of latencies does it report on your box?

Our vanilla scheduler default latency targets are:

single-core: 20 msecs
dual-core: 40 msecs
quad-core: 60 msecs
opto-core: 80 msecs

You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via
/proc/sys/kernel/sched_latency_ns:

echo 10000000 > /proc/sys/kernel/sched_latency_ns

Ingo

2009-09-09 14:00:20

by Pavel Machek

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

Hi!

> > So ... to get to the numbers - i've tested both BFS and the tip of
> > the latest upstream scheduler tree on a testbox of mine. I
> > intentionally didnt test BFS on any really large box - because you
> > described its upper limit like this in the announcement:
>
> I ran a simple test as well, since I was curious to see how it performed
> wrt interactiveness. One of my pet peeves with the current scheduler is
> that I have to nice compile jobs, or my X experience is just awful while
> the compile is running.
>
> Now, this test case is something that attempts to see what
> interactiveness would be like. It'll run a given command line while at
> the same time logging delays. The delays are measured as follows:
>
> - The app creates a pipe, and forks a child that blocks on reading from
> that pipe.
> - The app sleeps for a random period of time, anywhere between 100ms
> and 2s. When it wakes up, it gets the current time and writes that to
> the pipe.
> - The child then gets woken, checks the time on its own, and logs the
> difference between the two.
>
> The idea here being that the delay between writing to the pipe and the
> child reading the data and comparing should (in some way) be indicative
> of how responsive the system would seem to a user.
>
> The test app was quickly hacked up, so don't put too much into it. The
> test run is a simple kernel compile, using -jX where X is the number of
> threads in the system. The files are cache hot, so little IO is done.
> The -x2 run is using the double number of processes as we have threads,
> eg -j128 on a 64 thread box.

Could you post the source? Someone else might get us
numbers... preferably on dualcore box or something...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
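
Until the actual source shows up, here is a minimal, hypothetical sketch of
the idea described in the quoted text above (it is not Jens's latt tool): the
parent sleeps for a random 100ms..2s, writes the current time into a pipe,
and the blocked child logs how much later it actually got to run.

/* pipe_wake.c - hypothetical sketch of the described idea, not latt.c:
 * measure how long it takes a blocked child to run after the parent
 * writes a timestamp into the pipe the child sleeps on. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static double now_usec(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return tv.tv_sec * 1e6 + tv.tv_usec;
}

int main(void)
{
	int fds[2], i, rounds = 20;
	pid_t child;

	if (pipe(fds) < 0) {
		perror("pipe");
		return 1;
	}

	child = fork();
	if (child == 0) {
		/* child: block on the pipe, log write-to-wakeup delay */
		double sent;

		close(fds[1]);
		while (read(fds[0], &sent, sizeof(sent)) == sizeof(sent))
			printf("wakeup delay: %.0f usec\n", now_usec() - sent);
		_exit(0);
	}

	/* parent: wake the child at random intervals of 100ms .. 2s */
	close(fds[0]);
	srand(getpid());
	for (i = 0; i < rounds; i++) {
		struct timespec delay;
		long us = 100000 + rand() % 1900000;
		double stamp;

		delay.tv_sec = us / 1000000;
		delay.tv_nsec = (us % 1000000) * 1000L;
		nanosleep(&delay, NULL);

		stamp = now_usec();
		if (write(fds[1], &stamp, sizeof(stamp)) != sizeof(stamp))
			break;
	}
	close(fds[1]);		/* child's read() returns 0 and it exits */
	waitpid(child, NULL, 0);
	return 0;
}

Run under load (say, a parallel kernel build) the logged delays would, in
principle, grow with scheduling latency, which is roughly what the latt tool
measures in a more careful way.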

2009-09-09 08:34:59

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/09/2009 09:13 AM, Ingo Molnar wrote:
>
> * Jens Axboe<[email protected]> wrote:
>
>> On Tue, Sep 08 2009, Peter Zijlstra wrote:
>>> On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
>>>> And here's a newer version.
>>>
>>> I tinkered a bit with your proglet and finally found the
>>> problem.
>>>
>>> You used a single pipe per child, this means the loop in
>>> run_child() would consume what it just wrote out until it got
>>> force preempted by the parent which would also get woken.
>>>
>>> This results in the child spinning a while (its full quota) and
>>> only reporting the last timestamp to the parent.
>>
>> Oh doh, that's not well thought out. Well it was a quick hack :-)
>> Thanks for the fixup, now it's at least usable to some degree.
>
> What kind of latencies does it report on your box?
>
> Our vanilla scheduler default latency targets are:
>
> single-core: 20 msecs
> dual-core: 40 msecs
> quad-core: 60 msecs
> opto-core: 80 msecs
>
> You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via
> /proc/sys/kernel/sched_latency_ns:
>
> echo 10000000> /proc/sys/kernel/sched_latency_ns

I've tried values ranging from 10000000 down to 100000. This results in
the stalls/freezes being a bit shorter, but clearly still there. It
does not eliminate them.

If there's anything else I can try/test, I would be happy to do so.

2009-09-09 08:52:27

by Mike Galbraith

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Wed, 2009-09-09 at 08:13 +0200, Ingo Molnar wrote:
> * Jens Axboe <[email protected]> wrote:
>
> > On Tue, Sep 08 2009, Peter Zijlstra wrote:
> > > On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> > > > And here's a newer version.
> > >
> > > I tinkered a bit with your proglet and finally found the
> > > problem.
> > >
> > > You used a single pipe per child, this means the loop in
> > > run_child() would consume what it just wrote out until it got
> > > force preempted by the parent which would also get woken.
> > >
> > > This results in the child spinning a while (its full quota) and
> > > only reporting the last timestamp to the parent.
> >
> > Oh doh, that's not well thought out. Well it was a quick hack :-)
> > Thanks for the fixup, now it's at least usable to some degree.
>
> What kind of latencies does it report on your box?
>
> Our vanilla scheduler default latency targets are:
>
> single-core: 20 msecs
> dual-core: 40 msecs
> quad-core: 60 msecs
> opto-core: 80 msecs
>
> You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via
> /proc/sys/kernel/sched_latency_ns:
>
> echo 10000000 > /proc/sys/kernel/sched_latency_ns

He would also need to lower min_granularity, otherwise, it'd be larger
than the whole latency target.

I'm testing right now, and one thing that is definitely a problem is the
amount of sleeper fairness we're giving. A full latency is just too
much short term fairness in my testing. While sleepers are catching up,
hogs languish. That's the biggest issue going on.

I've also been doing some timings of make -j4 (looking at idle time),
and find that child_runs_first is mildly detrimental to fork/exec load,
as are buddies.

I'm running with the below at the moment. (the kthread/workqueue thing
is just because I don't see any reason for it to exist, so consider it
to be a waste of perfectly good math;)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 6ec4643..a44210e 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -16,8 +16,6 @@
#include <linux/mutex.h>
#include <trace/events/sched.h>

-#define KTHREAD_NICE_LEVEL (-5)
-
static DEFINE_SPINLOCK(kthread_create_lock);
static LIST_HEAD(kthread_create_list);

@@ -150,7 +148,6 @@ struct task_struct *kthread_create(int (*threadfn)(void *data),
* The kernel thread should not inherit these properties.
*/
sched_setscheduler_nocheck(create.result, SCHED_NORMAL, &param);
- set_user_nice(create.result, KTHREAD_NICE_LEVEL);
set_cpus_allowed_ptr(create.result, cpu_all_mask);
}
return create.result;
@@ -226,7 +223,6 @@ int kthreadd(void *unused)
/* Setup a clean context for our children to inherit. */
set_task_comm(tsk, "kthreadd");
ignore_signals(tsk);
- set_user_nice(tsk, KTHREAD_NICE_LEVEL);
set_cpus_allowed_ptr(tsk, cpu_all_mask);
set_mems_allowed(node_possible_map);

diff --git a/kernel/sched.c b/kernel/sched.c
index c512a02..e68c341 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -7124,33 +7124,6 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu)
*/
cpumask_var_t nohz_cpu_mask;

-/*
- * Increase the granularity value when there are more CPUs,
- * because with more CPUs the 'effective latency' as visible
- * to users decreases. But the relationship is not linear,
- * so pick a second-best guess by going with the log2 of the
- * number of CPUs.
- *
- * This idea comes from the SD scheduler of Con Kolivas:
- */
-static inline void sched_init_granularity(void)
-{
- unsigned int factor = 1 + ilog2(num_online_cpus());
- const unsigned long limit = 200000000;
-
- sysctl_sched_min_granularity *= factor;
- if (sysctl_sched_min_granularity > limit)
- sysctl_sched_min_granularity = limit;
-
- sysctl_sched_latency *= factor;
- if (sysctl_sched_latency > limit)
- sysctl_sched_latency = limit;
-
- sysctl_sched_wakeup_granularity *= factor;
-
- sysctl_sched_shares_ratelimit *= factor;
-}
-
#ifdef CONFIG_SMP
/*
* This is how migration works:
@@ -9356,7 +9329,6 @@ void __init sched_init_smp(void)
/* Move init over to a non-isolated CPU */
if (set_cpus_allowed_ptr(current, non_isolated_cpus) < 0)
BUG();
- sched_init_granularity();
free_cpumask_var(non_isolated_cpus);

alloc_cpumask_var(&fallback_doms, GFP_KERNEL);
@@ -9365,7 +9337,6 @@ void __init sched_init_smp(void)
#else
void __init sched_init_smp(void)
{
- sched_init_granularity();
}
#endif /* CONFIG_SMP */

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index e386e5d..ff7fec9 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -51,7 +51,7 @@ static unsigned int sched_nr_latency = 5;
* After fork, child runs first. (default) If set to 0 then
* parent will (try to) run first.
*/
-const_debug unsigned int sysctl_sched_child_runs_first = 1;
+const_debug unsigned int sysctl_sched_child_runs_first = 0;

/*
* sys_sched_yield() compat mode
@@ -713,7 +713,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
if (!initial) {
/* sleeps upto a single latency don't count. */
if (sched_feat(NEW_FAIR_SLEEPERS)) {
- unsigned long thresh = sysctl_sched_latency;
+ unsigned long thresh = sysctl_sched_min_granularity;

/*
* Convert the sleeper threshold into virtual time.
@@ -1502,7 +1502,8 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int sync)
*/
if (sched_feat(LAST_BUDDY) && likely(se->on_rq && curr != rq->idle))
set_last_buddy(se);
- set_next_buddy(pse);
+ if (sched_feat(NEXT_BUDDY))
+ set_next_buddy(pse);

/*
* We can come here with TIF_NEED_RESCHED already set from new task
diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index 4569bfa..85d30d1 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -13,5 +13,6 @@ SCHED_FEAT(LB_BIAS, 1)
SCHED_FEAT(LB_WAKEUP_UPDATE, 1)
SCHED_FEAT(ASYM_EFF_LOAD, 1)
SCHED_FEAT(WAKEUP_OVERLAP, 0)
-SCHED_FEAT(LAST_BUDDY, 1)
+SCHED_FEAT(LAST_BUDDY, 0)
+SCHED_FEAT(NEXT_BUDDY, 0)
SCHED_FEAT(OWNER_SPIN, 1)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 3c44b56..addfe2d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -317,8 +317,6 @@ static int worker_thread(void *__cwq)
if (cwq->wq->freezeable)
set_freezable();

- set_user_nice(current, -5);
-
for (;;) {
prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE);
if (!freezing(current) &&

2009-09-09 09:02:44

by Peter Zijlstra

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Wed, 2009-09-09 at 10:52 +0200, Mike Galbraith wrote:
> @@ -1502,7 +1502,8 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int sync)
> */
> if (sched_feat(LAST_BUDDY) && likely(se->on_rq && curr != rq->idle))
> set_last_buddy(se);
> - set_next_buddy(pse);
> + if (sched_feat(NEXT_BUDDY))
> + set_next_buddy(pse);
>
> /*
> * We can come here with TIF_NEED_RESCHED already set from new task

You might want to test stuff like sysbench again, iirc we went on a
cache-trashing rampage without buddies.

Our goal is not to excel at any one load but to not suck at any one
load.

2009-09-09 09:05:22

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/09/2009 11:52 AM, Mike Galbraith wrote:
> On Wed, 2009-09-09 at 08:13 +0200, Ingo Molnar wrote:
>> * Jens Axboe<[email protected]> wrote:
>>
>>> On Tue, Sep 08 2009, Peter Zijlstra wrote:
>>>> On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
>>>>> And here's a newer version.
>>>>
>>>> I tinkered a bit with your proglet and finally found the
>>>> problem.
>>>>
>>>> You used a single pipe per child, this means the loop in
>>>> run_child() would consume what it just wrote out until it got
>>>> force preempted by the parent which would also get woken.
>>>>
>>>> This results in the child spinning a while (its full quota) and
>>>> only reporting the last timestamp to the parent.
>>>
>>> Oh doh, that's not well thought out. Well it was a quick hack :-)
>>> Thanks for the fixup, now it's at least usable to some degree.
>>
>> What kind of latencies does it report on your box?
>>
>> Our vanilla scheduler default latency targets are:
>>
>> single-core: 20 msecs
>> dual-core: 40 msecs
>> quad-core: 60 msecs
>> opto-core: 80 msecs
>>
>> You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via
>> /proc/sys/kernel/sched_latency_ns:
>>
>> echo 10000000> /proc/sys/kernel/sched_latency_ns
>
> He would also need to lower min_granularity, otherwise, it'd be larger
> than the whole latency target.

Thank you for mentioning min_granularity. After:

echo 10000000 > /proc/sys/kernel/sched_latency_ns
echo 2000000 > /proc/sys/kernel/sched_min_granularity_ns

I can clearly see an improvement: animations that are supposed to be
fluid "skip" much less now, and in one case (simply moving the video
window around) the skips have been eliminated completely. However, there
seems to be a side effect of having CONFIG_SCHED_DEBUG enabled; things
seem to be generally a tad more "jerky" with that option enabled, even
when not touching the latency and granularity defaults.

I'll try the patch you posted and see if this further improves things.

2009-09-09 09:10:08

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Wed, Sep 09 2009, Mike Galbraith wrote:
> On Wed, 2009-09-09 at 08:13 +0200, Ingo Molnar wrote:
> > * Jens Axboe <[email protected]> wrote:
> >
> > > On Tue, Sep 08 2009, Peter Zijlstra wrote:
> > > > On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> > > > > And here's a newer version.
> > > >
> > > > I tinkered a bit with your proglet and finally found the
> > > > problem.
> > > >
> > > > You used a single pipe per child, this means the loop in
> > > > run_child() would consume what it just wrote out until it got
> > > > force preempted by the parent which would also get woken.
> > > >
> > > > This results in the child spinning a while (its full quota) and
> > > > only reporting the last timestamp to the parent.
> > >
> > > Oh doh, that's not well thought out. Well it was a quick hack :-)
> > > Thanks for the fixup, now it's at least usable to some degree.
> >
> > What kind of latencies does it report on your box?
> >
> > Our vanilla scheduler default latency targets are:
> >
> > single-core: 20 msecs
> > dual-core: 40 msecs
> > quad-core: 60 msecs
> > opto-core: 80 msecs
> >
> > You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via
> > /proc/sys/kernel/sched_latency_ns:
> >
> > echo 10000000 > /proc/sys/kernel/sched_latency_ns
>
> He would also need to lower min_granularity, otherwise, it'd be larger
> than the whole latency target.
>
> I'm testing right now, and one thing that is definitely a problem is the
> amount of sleeper fairness we're giving. A full latency is just too
> much short term fairness in my testing. While sleepers are catching up,
> hogs languish. That's the biggest issue going on.
>
> I've also been doing some timings of make -j4 (looking at idle time),
> and find that child_runs_first is mildly detrimental to fork/exec load,
> as are buddies.
>
> I'm running with the below at the moment. (the kthread/workqueue thing
> is just because I don't see any reason for it to exist, so consider it
> to be a waste of perfectly good math;)

Using latt, it seems better than -rc9. The below are entries logged
while running make -j128 on a 64 thread box. I did two runs on each, and
latt is using 8 clients.

-rc9
Max 23772 usec
Avg 1129 usec
Stdev 4328 usec
Stdev mean 117 usec

Max 32709 usec
Avg 1467 usec
Stdev 5095 usec
Stdev mean 136 usec

-rc9 + patch

Max 11561 usec
Avg 1532 usec
Stdev 1994 usec
Stdev mean 48 usec

Max 9590 usec
Avg 1550 usec
Stdev 2051 usec
Stdev mean 50 usec

max latency is way down, and much smaller variation as well.


--
Jens Axboe

2009-09-09 09:17:57

by Peter Zijlstra

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Wed, 2009-09-09 at 12:05 +0300, Nikos Chantziaras wrote:

> Thank you for mentioning min_granularity. After:
>
> echo 10000000 > /proc/sys/kernel/sched_latency_ns
> echo 2000000 > /proc/sys/kernel/sched_min_granularity_ns

You might also want to do:

echo 2000000 > /proc/sys/kernel/sched_wakeup_granularity_ns

That affects when a newly woken task will preempt an already running
task.

> I can clearly see an improvement: animations that are supposed to be
> fluid "skip" much less now, and in one occasion (simply moving the video
> window around) have been eliminated completely. However, there seems to
> be a side effect from having CONFIG_SCHED_DEBUG enabled; things seem to
> be generally a tad more "jerky" with that option enabled, even when not
> even touching the latency and granularity defaults.

There's more code in the scheduler with that enabled, but unless you've
got a terribly high context-switch rate it really shouldn't affect things.

Anyway, you can always poke at these numbers in the code, and like Mike
did, kill sched_init_granularity().


2009-09-09 09:18:46

by Mike Galbraith

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Wed, 2009-09-09 at 11:02 +0200, Peter Zijlstra wrote:
> On Wed, 2009-09-09 at 10:52 +0200, Mike Galbraith wrote:
> > @@ -1502,7 +1502,8 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int sync)
> > */
> > if (sched_feat(LAST_BUDDY) && likely(se->on_rq && curr != rq->idle))
> > set_last_buddy(se);
> > - set_next_buddy(pse);
> > + if (sched_feat(NEXT_BUDDY))
> > + set_next_buddy(pse);
> >
> > /*
> > * We can come here with TIF_NEED_RESCHED already set from new task
>
> You might want to test stuff like sysbench again, iirc we went on a
> cache-trashing rampage without buddies.
>
> Our goal is not to excel at any one load but to not suck at any one
> load.

Oh absolutely. I wouldn't want buddies disabled by default, I only
added the buddy knob to test effects on fork/exec.

I only posted to patch to give Jens something canned to try out.

-Mike

2009-09-09 09:40:27

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/09/2009 12:17 PM, Peter Zijlstra wrote:
> On Wed, 2009-09-09 at 12:05 +0300, Nikos Chantziaras wrote:
>
>> Thank you for mentioning min_granularity. After:
>>
>> echo 10000000> /proc/sys/kernel/sched_latency_ns
>> echo 2000000> /proc/sys/kernel/sched_min_granularity_ns
>
> You might also want to do:
>
> echo 2000000> /proc/sys/kernel/sched_wakeup_granularity_ns
>
> That affects when a newly woken task will preempt an already running
> task.

Lowering wakeup_granularity seems to make things worse in an interesting
way:

With low wakeup_granularity, the video itself will start skipping if I
move the window around. However, the window manager's effect of moving
a window around is smooth.

With high wakeup_granularity, the video itself will not skip while
moving the window around. But this time, the window manager's effect of
the window move is skippy.

(I should point out that only with the BFS-patched kernel can I have a
smooth video *and* a smooth window-moving effect at the same time.)
Mainline seems to prioritize one of the two according to whether
wakeup_granularity is raised or lowered. However, I have not tested
Mike's patch yet (but will do so ASAP.)

2009-09-09 09:53:21

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Tue, 2009-09-08 at 22:22 +0200, Frans Pop wrote:
> Arjan van de Ven wrote:
> > the latest version of latencytop also has a GUI (thanks to Ben)
>
> That looks nice, but...
>
> I kind of miss the split screen feature where latencytop would show both
> the overall figures + the ones for the currently most affected task.
> Downside of that last was that I never managed to keep the display on a
> specific task.

Any idea of how to present it ? I'm happy to spend 5mn improving the
GUI :-)

> The graphical display also makes it impossible to simply copy and paste
> the results.

Ah, that's right. I'm not 100% sure how to do that (these are my first
experiments with gtk). I suppose I could try to do some kind of "snapshot"
feature which saves the results in textual form.

> Having the freeze button is nice though.
>
> Would it be possible to have a command line switch that allows to start
> the old textual mode?

It's there iirc. --nogui :-)

Cheers,
Ben.

> Looks like the man page needs updating too :-)
>
> Cheers,
> FJP
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2009-09-09 10:18:00

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/09/2009 12:40 PM, Nikos Chantziaras wrote:
> On 09/09/2009 12:17 PM, Peter Zijlstra wrote:
>> On Wed, 2009-09-09 at 12:05 +0300, Nikos Chantziaras wrote:
>>
>>> Thank you for mentioning min_granularity. After:
>>>
>>> echo 10000000> /proc/sys/kernel/sched_latency_ns
>>> echo 2000000> /proc/sys/kernel/sched_min_granularity_ns
>>
>> You might also want to do:
>>
>> echo 2000000> /proc/sys/kernel/sched_wakeup_granularity_ns
>>
>> That affects when a newly woken task will preempt an already running
>> task.
>
> Lowering wakeup_granularity seems to make things worse in an interesting
> way:
>
> With low wakeup_granularity, the video itself will start skipping if I
> move the window around. However, the window manager's effect of moving a
> window around is smooth.
>
> With high wakeup_granularity, the video itself will not skip while
> moving the window around. But this time, the window manager's effect of
> the window move is skippy.
>
> (I should point out that only with the BFS-patched kernel can I have a
> smooth video *and* a smooth window-moving effect at the same time.)
> Mainline seems to prioritize one of the two according to whether
> wakeup_granularity is raised or lowered. However, I have not tested
> Mike's patch yet (but will do so ASAP.)

I've tested Mike's patch and it achieves the same effect as raising
sched_min_granularity.

To sum it up:

By testing various values for sched_latency_ns, sched_min_granularity_ns
and sched_wakeup_granularity_ns, I can achieve three results:

1. Fluid animations for the foreground app, skippy ones for
the rest (video plays nicely, rest of the desktop lags.)

2. Fluid animations for the background apps, a skippy one for
the one in the foreground (desktop behaves nicely, video lags.)

3. Equally skippy/jerky behavior for all of them.

Unfortunately, a "4. Equally fluid behavior for all of them" cannot be
achieved with mainline, unless I missed some other tweak.

2009-09-09 11:14:32

by David Newall

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

Benjamin Herrenschmidt wrote:
> On Tue, 2009-09-08 at 22:22 +0200, Frans Pop wrote:
>
>> Arjan van de Ven wrote:
>>
>>> the latest version of latencytop also has a GUI (thanks to Ben)
>>>
>> That looks nice, but...
>>
>> I kind of miss the split screen feature where latencytop would show both
>> the overall figures + the ones for the currently most affected task.
>> Downside of that last was that I never managed to keep the display on a
>> specific task.
>>
>
> Any idea of how to present it ? I'm happy to spend 5mn improving the
> GUI :-)

Use a second window.

2009-09-09 11:33:18

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Wed, 2009-09-09 at 20:44 +0930, David Newall wrote:
> Benjamin Herrenschmidt wrote:
> > On Tue, 2009-09-08 at 22:22 +0200, Frans Pop wrote:
> >
> >> Arjan van de Ven wrote:
> >>
> >>> the latest version of latencytop also has a GUI (thanks to Ben)
> >>>
> >> That looks nice, but...
> >>
> >> I kind of miss the split screen feature where latencytop would show both
> >> the overall figures + the ones for the currently most affected task.
> >> Downside of that last was that I never managed to keep the display on a
> >> specific task.
> >>
> >
> > Any idea of how to present it ? I'm happy to spend 5mn improving the
> > GUI :-)
>
> Use a second window.

I'm not too much of a fan of cluttering the screen with windows... I
suppose I could have a separate pane for the "global" view, but I
haven't found a way to lay it out in a way that doesn't suck :-) I
could have added a 3rd column on the right with the overall view, but
it felt like using too much screen real estate.

I'll experiment a bit, maybe 2 windows is indeed the solution. But you
get into the problem of what to do if only one of them is closed? Do I
add a menu bar on each of them to re-open the "other" one if closed?
etc...

Don't get me wrong, I have a shitload of experience doing GUIs (back in
the old days when I was hacking on MacOS), though I'm relatively new to
GTK. But GUI design is rather hard in general :-)

Ben.

2009-09-09 11:52:05

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/08/2009 06:23 PM, Peter Zijlstra wrote:
> On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
>> And here's a newer version.
>
> I tinkered a bit with your proglet and finally found the problem.
>
> You used a single pipe per child, this means the loop in run_child()
> would consume what it just wrote out until it got force preempted by the
> parent which would also get woken.
>
> This results in the child spinning a while (its full quota) and only
> reporting the last timestamp to the parent.
>
> Since consumer (parent) is a single thread the program basically
> measures the worst delay in a thundering herd wakeup of N children.
>
> The below version yields:
>
> idle
>
> [root@opteron sched]# ./latt -c8 sleep 30
> Entries: 664 (clients=8)
>
> Averages:
> ------------------------------
> Max 128 usec
> Avg 26 usec
> Stdev 16 usec
>
>
> make -j4
>
> [root@opteron sched]# ./latt -c8 sleep 30
> Entries: 648 (clients=8)
>
> Averages:
> ------------------------------
> Max 20861 usec
> Avg 3763 usec
> Stdev 4637 usec
>
>
> Mike's patch, make -j4
>
> [root@opteron sched]# ./latt -c8 sleep 30
> Entries: 648 (clients=8)
>
> Averages:
> ------------------------------
> Max 17854 usec
> Avg 6298 usec
> Stdev 4735 usec

I've run two tests with this tool. One with mainline (2.6.31-rc9) and
one patched with 2.6.31-rc9-sched-bfs-210.patch.

Before running this test, I disabled the cron daemon so that nothing
would pop up in the background all of a sudden.

The test consisted of starting a "make -j2" in the kernel tree inside a
3GB tmpfs mountpoint and then running 'latt "mplayer -vo gl2 -framedrop
videofile.mkv"' (mplayer in this case is a single-threaded
application.) Caches were warmed up first; the results below are from
the second run of each test.

The kernel .config file used by the running kernels and also for "make
-j2" is:

http://foss.math.aegean.gr/~realnc/kernel/config-2.6.31-rc9-latt-test

The video file used for mplayer is:

http://foss.math.aegean.gr/~realnc/vids/3DMark2000.mkv (100MB)
(The reason this was used is that it's a 60FPS video,
therefore very smooth and makes all skips stand out
clearly.)
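
Roughly, the test sequence was the equivalent of the following (the
mount point and source-tree paths here are just placeholders):

  mount -t tmpfs -o size=3G tmpfs /mnt/build
  cp -a linux-2.6.31-rc9 /mnt/build/ && cd /mnt/build/linux-2.6.31-rc9
  make -j2 &
  # then, while the build runs, latt wraps the video playback:
  latt "mplayer -vo gl2 -framedrop videofile.mkv"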


Results for mainline:

Averages:
------------------------------
Max 29930 usec
Avg 11043 usec
Stdev 5752 usec


Results for BFS:

Averages:
------------------------------
Max 14017 usec
Avg 49 usec
Stdev 697 usec


One thing that's worth noting is that with mainline, mplayer would
occasionally spit this out:

YOUR SYSTEM IS TOO SLOW TO PLAY THIS

which doesn't happen with BFS.

2009-09-09 11:54:28

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Wed, Sep 09 2009, Jens Axboe wrote:
> On Wed, Sep 09 2009, Mike Galbraith wrote:
> > On Wed, 2009-09-09 at 08:13 +0200, Ingo Molnar wrote:
> > > * Jens Axboe <[email protected]> wrote:
> > >
> > > > On Tue, Sep 08 2009, Peter Zijlstra wrote:
> > > > > On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> > > > > > And here's a newer version.
> > > > >
> > > > > I tinkered a bit with your proglet and finally found the
> > > > > problem.
> > > > >
> > > > > You used a single pipe per child, this means the loop in
> > > > > run_child() would consume what it just wrote out until it got
> > > > > force preempted by the parent which would also get woken.
> > > > >
> > > > > This results in the child spinning a while (its full quota) and
> > > > > only reporting the last timestamp to the parent.
> > > >
> > > > Oh doh, that's not well thought out. Well it was a quick hack :-)
> > > > Thanks for the fixup, now it's at least usable to some degree.
> > >
> > > What kind of latencies does it report on your box?
> > >
> > > Our vanilla scheduler default latency targets are:
> > >
> > > single-core: 20 msecs
> > > dual-core: 40 msecs
> > > quad-core: 60 msecs
> > > opto-core: 80 msecs
> > >
> > > You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via
> > > /proc/sys/kernel/sched_latency_ns:
> > >
> > > echo 10000000 > /proc/sys/kernel/sched_latency_ns
> >
> > He would also need to lower min_granularity, otherwise, it'd be larger
> > than the whole latency target.
> >
> > I'm testing right now, and one thing that is definitely a problem is the
> > amount of sleeper fairness we're giving. A full latency is just too
> > much short term fairness in my testing. While sleepers are catching up,
> > hogs languish. That's the biggest issue going on.
> >
> > I've also been doing some timings of make -j4 (looking at idle time),
> > and find that child_runs_first is mildly detrimental to fork/exec load,
> > as are buddies.
> >
> > I'm running with the below at the moment. (the kthread/workqueue thing
> > is just because I don't see any reason for it to exist, so consider it
> > to be a waste of perfectly good math;)
>
> Using latt, it seems better than -rc9. The below are entries logged
> while running make -j128 on a 64 thread box. I did two runs on each, and
> latt is using 8 clients.
>
> -rc9
> Max 23772 usec
> Avg 1129 usec
> Stdev 4328 usec
> Stdev mean 117 usec
>
> Max 32709 usec
> Avg 1467 usec
> Stdev 5095 usec
> Stdev mean 136 usec
>
> -rc9 + patch
>
> Max 11561 usec
> Avg 1532 usec
> Stdev 1994 usec
> Stdev mean 48 usec
>
> Max 9590 usec
> Avg 1550 usec
> Stdev 2051 usec
> Stdev mean 50 usec
>
> max latency is way down, and much smaller variation as well.

Things are much better with this patch on the notebook! I cannot compare
with BFS as that still doesn't run anywhere I want it to run, but it's
way better than -rc9-git stock. latt numbers on the notebook have 1/3
the max latency, average is lower, and stddev is much smaller too.

--
Jens Axboe

2009-09-09 11:55:56

by Frans Pop

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Wednesday 09 September 2009, Benjamin Herrenschmidt wrote:
> On Tue, 2009-09-08 at 22:22 +0200, Frans Pop wrote:
> > Arjan van de Ven wrote:
> > > the latest version of latencytop also has a GUI (thanks to Ben)
> >
> > That looks nice, but...
> >
> > I kind of miss the split screen feature where latencytop would show
> > both the overall figures + the ones for the currently most affected
> > task. Downside of that last was that I never managed to keep the
> > display on a specific task.
>
> Any idea of how to present it ? I'm happy to spend 5mn improving the
> GUI :-)

I'd say add an extra horizontal split in the second column, so you'd get
three areas in the right column:
- top for the global target (permanently)
- middle for current, either:
- "current most lagging" if "Global" is selected in left column
- selected process if a specific target is selected in left column
- bottom for backtrace

Maybe with that setup "Global" in the left column should be renamed to
something like "Dynamic".

The backtrace area would show selection from either top or middle areas
(so selecting a cause in top or middle area should unselect causes in the
other).

Cheers,
FJP

2009-09-09 12:20:06

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Wed, Sep 09 2009, Jens Axboe wrote:
> On Wed, Sep 09 2009, Jens Axboe wrote:
> > On Wed, Sep 09 2009, Mike Galbraith wrote:
> > > On Wed, 2009-09-09 at 08:13 +0200, Ingo Molnar wrote:
> > > > * Jens Axboe <[email protected]> wrote:
> > > >
> > > > > On Tue, Sep 08 2009, Peter Zijlstra wrote:
> > > > > > On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> > > > > > > And here's a newer version.
> > > > > >
> > > > > > I tinkered a bit with your proglet and finally found the
> > > > > > problem.
> > > > > >
> > > > > > You used a single pipe per child, this means the loop in
> > > > > > run_child() would consume what it just wrote out until it got
> > > > > > force preempted by the parent which would also get woken.
> > > > > >
> > > > > > This results in the child spinning a while (its full quota) and
> > > > > > only reporting the last timestamp to the parent.
> > > > >
> > > > > Oh doh, that's not well thought out. Well it was a quick hack :-)
> > > > > Thanks for the fixup, now it's at least usable to some degree.
> > > >
> > > > What kind of latencies does it report on your box?
> > > >
> > > > Our vanilla scheduler default latency targets are:
> > > >
> > > > single-core: 20 msecs
> > > > dual-core: 40 msecs
> > > > quad-core: 60 msecs
> > > > opto-core: 80 msecs
> > > >
> > > > You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via
> > > > /proc/sys/kernel/sched_latency_ns:
> > > >
> > > > echo 10000000 > /proc/sys/kernel/sched_latency_ns
> > >
> > > He would also need to lower min_granularity, otherwise, it'd be larger
> > > than the whole latency target.
> > >
> > > I'm testing right now, and one thing that is definitely a problem is the
> > > amount of sleeper fairness we're giving. A full latency is just too
> > > much short term fairness in my testing. While sleepers are catching up,
> > > hogs languish. That's the biggest issue going on.
> > >
> > > I've also been doing some timings of make -j4 (looking at idle time),
> > > and find that child_runs_first is mildly detrimental to fork/exec load,
> > > as are buddies.
> > >
> > > I'm running with the below at the moment. (the kthread/workqueue thing
> > > is just because I don't see any reason for it to exist, so consider it
> > > to be a waste of perfectly good math;)
> >
> > Using latt, it seems better than -rc9. The below are entries logged
> > while running make -j128 on a 64 thread box. I did two runs on each, and
> > latt is using 8 clients.
> >
> > -rc9
> > Max 23772 usec
> > Avg 1129 usec
> > Stdev 4328 usec
> > Stdev mean 117 usec
> >
> > Max 32709 usec
> > Avg 1467 usec
> > Stdev 5095 usec
> > Stdev mean 136 usec
> >
> > -rc9 + patch
> >
> > Max 11561 usec
> > Avg 1532 usec
> > Stdev 1994 usec
> > Stdev mean 48 usec
> >
> > Max 9590 usec
> > Avg 1550 usec
> > Stdev 2051 usec
> > Stdev mean 50 usec
> >
> > max latency is way down, and much smaller variation as well.
>
> Things are much better with this patch on the notebook! I cannot compare
> with BFS as that still doesn't run anywhere I want it to run, but it's
> way better than -rc9-git stock. latt numbers on the notebook have 1/3
> the max latency, average is lower, and stddev is much smaller too.

BFS210 runs on the laptop (dual core intel core duo). With make -j4
running, I clock the following latt -c8 'sleep 10' latencies:

-rc9

Max 17895 usec
Avg 8028 usec
Stdev 5948 usec
Stdev mean 405 usec

Max 17896 usec
Avg 4951 usec
Stdev 6278 usec
Stdev mean 427 usec

Max 17885 usec
Avg 5526 usec
Stdev 6819 usec
Stdev mean 464 usec

-rc9 + mike

Max 6061 usec
Avg 3797 usec
Stdev 1726 usec
Stdev mean 117 usec

Max 5122 usec
Avg 3958 usec
Stdev 1697 usec
Stdev mean 115 usec

Max 6691 usec
Avg 2130 usec
Stdev 2165 usec
Stdev mean 147 usec

-rc9 + bfs210

Max 92 usec
Avg 27 usec
Stdev 19 usec
Stdev mean 1 usec

Max 80 usec
Avg 23 usec
Stdev 15 usec
Stdev mean 1 usec

Max 97 usec
Avg 27 usec
Stdev 21 usec
Stdev mean 1 usec

One thing I also noticed is that when I have logged in, I run xmodmap
manually to load some keymappings (I always tell myself to add this to
the login scripts, but I suspend/resume this laptop for weeks at a
time and forget before the next boot). With the stock kernel, xmodmap
will halt X updates and take forever to run. With BFS, it returned
instantly, as I would expect.

So the BFS design may be lacking on the scalability end (which is
obviously true, if you look at the code), but I can understand the
appeal of the scheduler for "normal" desktop people.

--
Jens Axboe

2009-09-09 12:48:39

by Mike Galbraith

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Wed, 2009-09-09 at 13:54 +0200, Jens Axboe wrote:

> Things are much better with this patch on the notebook! I cannot compare
> with BFS as that still doesn't run anywhere I want it to run, but it's
> way better than -rc9-git stock. latt numbers on the notebook have 1/3
> the max latency, average is lower, and stddev is much smaller too.

That patch has a bit of bustage in it.

We definitely want to turn down sched_latency though, and LAST_BUDDY
also wants some examination it seems.

taskset -c 3 ./xx 1 (a 100% CPU hog that measures perturbation at 1 sec intervals; "overhead" is the CPU time it is not getting)
xx says
2392.52 MHZ CPU
perturbation threshold 0.057 usecs.
...
'nuther terminal
taskset -c 3 make -j2 vmlinux

xx output

current (fixed breakage) patched tip tree
pert/s: 153 >18842.18us: 11 min: 0.50 max:36010.37 avg:4354.06 sum/s:666171us overhead:66.62%
pert/s: 160 >18767.18us: 12 min: 0.13 max:32011.66 avg:4172.69 sum/s:667631us overhead:66.66%
pert/s: 156 >18499.43us: 9 min: 0.13 max:27883.24 avg:4296.08 sum/s:670189us overhead:66.49%
pert/s: 146 >18480.71us: 10 min: 0.50 max:32009.38 avg:4615.19 sum/s:673818us overhead:67.26%
pert/s: 154 >18433.20us: 17 min: 0.14 max:31537.12 avg:4474.14 sum/s:689018us overhead:67.68%
pert/s: 158 >18520.11us: 9 min: 0.50 max:34328.86 avg:4275.66 sum/s:675554us overhead:66.76%
pert/s: 154 >18683.74us: 12 min: 0.51 max:35949.23 avg:4363.67 sum/s:672005us overhead:67.04%
pert/s: 154 >18745.53us: 8 min: 0.51 max:34203.43 avg:4399.72 sum/s:677556us overhead:67.03%

bfs209
pert/s: 124 >18681.88us: 17 min: 0.15 max:27274.74 avg:4627.36 sum/s:573793us overhead:56.70%
pert/s: 106 >18702.52us: 20 min: 0.55 max:32022.07 avg:5754.48 sum/s:609975us overhead:59.80%
pert/s: 116 >19082.42us: 17 min: 0.15 max:39835.34 avg:5167.69 sum/s:599452us overhead:59.95%
pert/s: 109 >19289.41us: 22 min: 0.14 max:36818.95 avg:5485.79 sum/s:597951us overhead:59.64%
pert/s: 108 >19238.97us: 19 min: 0.14 max:32026.74 avg:5543.17 sum/s:598662us overhead:59.87%
pert/s: 106 >19415.76us: 20 min: 0.54 max:36011.78 avg:6001.89 sum/s:636201us overhead:62.95%
pert/s: 115 >19341.89us: 16 min: 0.08 max:32040.83 avg:5313.45 sum/s:611047us overhead:59.98%
pert/s: 101 >19527.53us: 24 min: 0.14 max:36018.37 avg:6378.06 sum/s:644184us overhead:64.42%

stock tip (ouch ouch ouch)
pert/s: 153 >48453.23us: 5 min: 0.12 max:144009.85 avg:4688.90 sum/s:717401us overhead:70.89%
pert/s: 172 >47209.49us: 3 min: 0.48 max:68009.05 avg:4022.55 sum/s:691879us overhead:67.05%
pert/s: 148 >51139.18us: 5 min: 0.53 max:168094.76 avg:4918.14 sum/s:727885us overhead:71.65%
pert/s: 171 >51350.64us: 6 min: 0.12 max:102202.79 avg:4304.77 sum/s:736115us overhead:69.24%
pert/s: 153 >57686.54us: 5 min: 0.12 max:224019.85 avg:5399.31 sum/s:826094us overhead:74.50%
pert/s: 172 >55886.47us: 2 min: 0.11 max:75378.18 avg:3993.52 sum/s:686885us overhead:67.67%
pert/s: 157 >58819.31us: 3 min: 0.12 max:165976.63 avg:4453.16 sum/s:699146us overhead:69.91%
pert/s: 149 >58410.21us: 5 min: 0.12 max:104663.89 avg:4792.73 sum/s:714116us overhead:71.41%

sched_latency=20ms min_granularity=4ms
pert/s: 162 >30152.07us: 2 min: 0.49 max:60011.85 avg:4272.97 sum/s:692221us overhead:68.13%
pert/s: 147 >29705.33us: 8 min: 0.14 max:46577.27 avg:4792.03 sum/s:704428us overhead:70.44%
pert/s: 162 >29344.16us: 2 min: 0.49 max:48010.50 avg:4176.75 sum/s:676633us overhead:67.40%
pert/s: 155 >29109.69us: 2 min: 0.49 max:49575.08 avg:4423.87 sum/s:685700us overhead:68.30%
pert/s: 153 >30627.66us: 3 min: 0.13 max:84005.71 avg:4573.07 sum/s:699680us overhead:69.42%
pert/s: 142 >30652.47us: 5 min: 0.49 max:56760.06 avg:4991.61 sum/s:708808us overhead:70.88%
pert/s: 152 >30101.12us: 2 min: 0.49 max:45757.88 avg:4519.92 sum/s:687028us overhead:67.89%
pert/s: 161 >29303.50us: 3 min: 0.12 max:40011.73 avg:4238.15 sum/s:682342us overhead:67.43%

NO_LAST_BUDDY
pert/s: 154 >15257.87us: 28 min: 0.13 max:42004.05 avg:4590.99 sum/s:707013us overhead:70.41%
pert/s: 162 >15392.05us: 34 min: 0.12 max:29021.79 avg:4177.47 sum/s:676750us overhead:66.81%
pert/s: 162 >15665.11us: 33 min: 0.13 max:32008.34 avg:4237.10 sum/s:686410us overhead:67.90%
pert/s: 159 >15914.89us: 31 min: 0.56 max:32056.86 avg:4268.87 sum/s:678751us overhead:67.47%
pert/s: 166 >15858.94us: 26 min: 0.13 max:26655.84 avg:4055.02 sum/s:673134us overhead:66.65%
pert/s: 165 >15878.96us: 32 min: 0.13 max:28010.44 avg:4107.86 sum/s:677798us overhead:66.68%
pert/s: 164 >16213.55us: 29 min: 0.14 max:34263.04 avg:4186.64 sum/s:686610us overhead:68.04%
pert/s: 149 >16764.54us: 20 min: 0.13 max:38688.64 avg:4758.26 sum/s:708981us overhead:70.23%

2009-09-09 15:39:42

by Mike Galbraith

[permalink] [raw]
Subject: [tip:sched/core] sched: Turn off child_runs_first

Commit-ID: 2bba22c50b06abe9fd0d23933b1e64d35b419262
Gitweb: http://git.kernel.org/tip/2bba22c50b06abe9fd0d23933b1e64d35b419262
Author: Mike Galbraith <[email protected]>
AuthorDate: Wed, 9 Sep 2009 15:41:37 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 9 Sep 2009 17:30:05 +0200

sched: Turn off child_runs_first

Set child_runs_first default to off.

It hurts 'optimal' make -j<NR_CPUS> workloads as make jobs
get preempted by child tasks, reducing parallelism.

Note, this patch might make existing races in user
applications more prominent than before - so breakages
might be bisected to this commit.

Child-runs-first is broken on SMP to begin with, and we
already had it off briefly in v2.6.23 so most of the
offenders ought to be fixed. Would be nice not to revert
this commit but fix those apps finally ...

Signed-off-by: Mike Galbraith <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
[ made the sysctl independent of CONFIG_SCHED_DEBUG, in case
people want to work around broken apps. ]
Signed-off-by: Ingo Molnar <[email protected]>


---
include/linux/sched.h | 2 +-
kernel/sched_fair.c | 4 ++--
kernel/sysctl.c | 16 ++++++++--------
3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3b7f43e..3a50e82 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1820,8 +1820,8 @@ extern unsigned int sysctl_sched_min_granularity;
extern unsigned int sysctl_sched_wakeup_granularity;
extern unsigned int sysctl_sched_shares_ratelimit;
extern unsigned int sysctl_sched_shares_thresh;
-#ifdef CONFIG_SCHED_DEBUG
extern unsigned int sysctl_sched_child_runs_first;
+#ifdef CONFIG_SCHED_DEBUG
extern unsigned int sysctl_sched_features;
extern unsigned int sysctl_sched_migration_cost;
extern unsigned int sysctl_sched_nr_migrate;
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index e386e5d..af325a3 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -48,10 +48,10 @@ unsigned int sysctl_sched_min_granularity = 4000000ULL;
static unsigned int sched_nr_latency = 5;

/*
- * After fork, child runs first. (default) If set to 0 then
+ * After fork, child runs first. If set to 0 (default) then
* parent will (try to) run first.
*/
-const_debug unsigned int sysctl_sched_child_runs_first = 1;
+unsigned int sysctl_sched_child_runs_first __read_mostly;

/*
* sys_sched_yield() compat mode
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 6c9836e..25d6bf3 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -246,6 +246,14 @@ static int max_wakeup_granularity_ns = NSEC_PER_SEC; /* 1 second */
#endif

static struct ctl_table kern_table[] = {
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "sched_child_runs_first",
+ .data = &sysctl_sched_child_runs_first,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
#ifdef CONFIG_SCHED_DEBUG
{
.ctl_name = CTL_UNNUMBERED,
@@ -300,14 +308,6 @@ static struct ctl_table kern_table[] = {
},
{
.ctl_name = CTL_UNNUMBERED,
- .procname = "sched_child_runs_first",
- .data = &sysctl_sched_child_runs_first,
- .maxlen = sizeof(unsigned int),
- .mode = 0644,
- .proc_handler = &proc_dointvec,
- },
- {
- .ctl_name = CTL_UNNUMBERED,
.procname = "sched_features",
.data = &sysctl_sched_features,
.maxlen = sizeof(unsigned int),

2009-09-09 15:39:47

by Mike Galbraith

[permalink] [raw]
Subject: [tip:sched/core] sched: Re-tune the scheduler latency defaults to decrease worst-case latencies

Commit-ID: 172e082a9111ea504ee34cbba26284a5ebdc53a7
Gitweb: http://git.kernel.org/tip/172e082a9111ea504ee34cbba26284a5ebdc53a7
Author: Mike Galbraith <[email protected]>
AuthorDate: Wed, 9 Sep 2009 15:41:37 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 9 Sep 2009 17:30:06 +0200

sched: Re-tune the scheduler latency defaults to decrease worst-case latencies

Reduce the latency target from 20 msecs to 5 msecs.

Why? Larger latencies increase spread, which is good for scaling,
but bad for worst case latency.

We still have the ilog(nr_cpus) rule to scale up on bigger
server boxes.

Signed-off-by: Mike Galbraith <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/sched_fair.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index af325a3..26fadb4 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -24,7 +24,7 @@

/*
* Targeted preemption latency for CPU-bound tasks:
- * (default: 20ms * (1 + ilog(ncpus)), units: nanoseconds)
+ * (default: 5ms * (1 + ilog(ncpus)), units: nanoseconds)
*
* NOTE: this latency value is not the same as the concept of
* 'timeslice length' - timeslices in CFS are of variable length
@@ -34,13 +34,13 @@
* (to see the precise effective timeslice length of your workload,
* run vmstat and monitor the context-switches (cs) field)
*/
-unsigned int sysctl_sched_latency = 20000000ULL;
+unsigned int sysctl_sched_latency = 5000000ULL;

/*
* Minimal preemption granularity for CPU-bound tasks:
- * (default: 4 msec * (1 + ilog(ncpus)), units: nanoseconds)
+ * (default: 1 msec * (1 + ilog(ncpus)), units: nanoseconds)
*/
-unsigned int sysctl_sched_min_granularity = 4000000ULL;
+unsigned int sysctl_sched_min_granularity = 1000000ULL;

/*
* is kept at sysctl_sched_latency / sysctl_sched_min_granularity
@@ -63,13 +63,13 @@ unsigned int __read_mostly sysctl_sched_compat_yield;

/*
* SCHED_OTHER wake-up granularity.
- * (default: 5 msec * (1 + ilog(ncpus)), units: nanoseconds)
+ * (default: 1 msec * (1 + ilog(ncpus)), units: nanoseconds)
*
* This option delays the preemption effects of decoupled workloads
* and reduces their over-scheduling. Synchronous workloads will still
* have immediate wakeup/sleep latencies.
*/
-unsigned int sysctl_sched_wakeup_granularity = 5000000UL;
+unsigned int sysctl_sched_wakeup_granularity = 1000000UL;

const_debug unsigned int sysctl_sched_migration_cost = 500000UL;

2009-09-09 15:39:37

by Mike Galbraith

[permalink] [raw]
Subject: [tip:sched/core] sched: Keep kthreads at default priority

Commit-ID: 61cbe54d9479ad98283b2dda686deae4c34b2d59
Gitweb: http://git.kernel.org/tip/61cbe54d9479ad98283b2dda686deae4c34b2d59
Author: Mike Galbraith <[email protected]>
AuthorDate: Wed, 9 Sep 2009 15:41:37 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 9 Sep 2009 17:30:06 +0200

sched: Keep kthreads at default priority

Removes kthread/workqueue priority boost, they increase worst-case
desktop latencies.

Signed-off-by: Mike Galbraith <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/kthread.c | 4 ----
kernel/workqueue.c | 2 --
2 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index eb8751a..5fe7099 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -16,8 +16,6 @@
#include <linux/mutex.h>
#include <trace/events/sched.h>

-#define KTHREAD_NICE_LEVEL (-5)
-
static DEFINE_SPINLOCK(kthread_create_lock);
static LIST_HEAD(kthread_create_list);
struct task_struct *kthreadd_task;
@@ -145,7 +143,6 @@ struct task_struct *kthread_create(int (*threadfn)(void *data),
* The kernel thread should not inherit these properties.
*/
sched_setscheduler_nocheck(create.result, SCHED_NORMAL, &param);
- set_user_nice(create.result, KTHREAD_NICE_LEVEL);
set_cpus_allowed_ptr(create.result, cpu_all_mask);
}
return create.result;
@@ -221,7 +218,6 @@ int kthreadd(void *unused)
/* Setup a clean context for our children to inherit. */
set_task_comm(tsk, "kthreadd");
ignore_signals(tsk);
- set_user_nice(tsk, KTHREAD_NICE_LEVEL);
set_cpus_allowed_ptr(tsk, cpu_all_mask);
set_mems_allowed(node_possible_map);

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0668795..ea1b4e7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -317,8 +317,6 @@ static int worker_thread(void *__cwq)
if (cwq->wq->freezeable)
set_freezable();

- set_user_nice(current, -5);
-
for (;;) {
prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE);
if (!freezing(current) &&

2009-09-09 15:52:28

by Ingo Molnar

[permalink] [raw]
Subject: Re: Epic regression in throughput since v2.6.23


* Serge Belyshev <[email protected]> wrote:

> Serge Belyshev <[email protected]> writes:
> >[snip]
>
> I've updated the graphs, added kernels 2.6.24..2.6.29:
> http://img186.imageshack.us/img186/7029/epicmakej4.png
>
> And added comparison with best-performing 2.6.23 kernel:
> http://img34.imageshack.us/img34/7563/epicbfstips.png

Thanks!

I think we found the reason for that regression - would you mind
to re-test with latest -tip, e157986 or later?

If that works for you i'll describe our theory.

Ingo

2009-09-09 17:02:45

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: [tip:sched/core] sched: Keep kthreads at default priority

On Wed, Sep 09, 2009 at 03:37:34PM +0000, tip-bot for Mike Galbraith wrote:
>
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index eb8751a..5fe7099 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -16,8 +16,6 @@
> #include <linux/mutex.h>
> #include <trace/events/sched.h>
>
> -#define KTHREAD_NICE_LEVEL (-5)
> -

Why don't we just redefine it to 0? We may find out later that we'd
still prefer to have kernel threads have boost.

--
Dmitry

2009-09-09 17:06:52

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [tip:sched/core] sched: Keep kthreads at default priority

On Wed, 2009-09-09 at 09:55 -0700, Dmitry Torokhov wrote:
> On Wed, Sep 09, 2009 at 03:37:34PM +0000, tip-bot for Mike Galbraith wrote:
> >
> > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > index eb8751a..5fe7099 100644
> > --- a/kernel/kthread.c
> > +++ b/kernel/kthread.c
> > @@ -16,8 +16,6 @@
> > #include <linux/mutex.h>
> > #include <trace/events/sched.h>
> >
> > -#define KTHREAD_NICE_LEVEL (-5)
> > -
>
> Why don't we just redefine it to 0? We may find out later that we'd
> still prefer to have kernel threads have boost.

Seems sensible; also, the traditional reasoning behind this nice level is
that kernel threads do work on behalf of multiple tasks. It's a kind of
prio ceiling thing.

2009-09-09 17:34:58

by Mike Galbraith

[permalink] [raw]
Subject: Re: [tip:sched/core] sched: Keep kthreads at default priority

On Wed, 2009-09-09 at 19:06 +0200, Peter Zijlstra wrote:
> On Wed, 2009-09-09 at 09:55 -0700, Dmitry Torokhov wrote:
> > On Wed, Sep 09, 2009 at 03:37:34PM +0000, tip-bot for Mike Galbraith wrote:
> > >
> > > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > > index eb8751a..5fe7099 100644
> > > --- a/kernel/kthread.c
> > > +++ b/kernel/kthread.c
> > > @@ -16,8 +16,6 @@
> > > #include <linux/mutex.h>
> > > #include <trace/events/sched.h>
> > >
> > > -#define KTHREAD_NICE_LEVEL (-5)
> > > -
> >
> > Why don't we just redefine it to 0? We may find out later that we'd
> > still prefer to have kernel threads have boost.
>
> Seems sensible, also the traditional reasoning behind this nice level is
> that kernel threads do work on behalf of multiple tasks. Its a kind of
> prio ceiling thing.

True. None of our current threads are heavy enough to matter much.

-Mike

2009-09-09 17:57:58

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [tip:sched/core] sched: Turn off child_runs_first

On Wed, Sep 09, 2009 at 03:37:07PM +0000, tip-bot for Mike Galbraith wrote:
> Commit-ID: 2bba22c50b06abe9fd0d23933b1e64d35b419262
> Gitweb: http://git.kernel.org/tip/2bba22c50b06abe9fd0d23933b1e64d35b419262
> Author: Mike Galbraith <[email protected]>
> AuthorDate: Wed, 9 Sep 2009 15:41:37 +0200
> Committer: Ingo Molnar <[email protected]>
> CommitDate: Wed, 9 Sep 2009 17:30:05 +0200
>
> sched: Turn off child_runs_first
>
> Set child_runs_first default to off.
>
> It hurts 'optimal' make -j<NR_CPUS> workloads as make jobs
> get preempted by child tasks, reducing parallelism.

Wasn't one of the reasons we historically did child_runs_first that,
for fork/exec workloads, the child gets a chance to exec the new
process? If the parent runs first, then more pages will probably
need to be COW'ed.

- Ted

2009-09-09 18:04:11

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Jens Axboe <[email protected]> wrote:

> On Wed, Sep 09 2009, Jens Axboe wrote:
> > On Wed, Sep 09 2009, Jens Axboe wrote:
> > > On Wed, Sep 09 2009, Mike Galbraith wrote:
> > > > On Wed, 2009-09-09 at 08:13 +0200, Ingo Molnar wrote:
> > > > > * Jens Axboe <[email protected]> wrote:
> > > > >
> > > > > > On Tue, Sep 08 2009, Peter Zijlstra wrote:
> > > > > > > On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> > > > > > > > And here's a newer version.
> > > > > > >
> > > > > > > I tinkered a bit with your proglet and finally found the
> > > > > > > problem.
> > > > > > >
> > > > > > > You used a single pipe per child, this means the loop in
> > > > > > > run_child() would consume what it just wrote out until it got
> > > > > > > force preempted by the parent which would also get woken.
> > > > > > >
> > > > > > > This results in the child spinning a while (its full quota) and
> > > > > > > only reporting the last timestamp to the parent.
> > > > > >
> > > > > > Oh doh, that's not well thought out. Well it was a quick hack :-)
> > > > > > Thanks for the fixup, now it's at least usable to some degree.
> > > > >
> > > > > What kind of latencies does it report on your box?
> > > > >
> > > > > Our vanilla scheduler default latency targets are:
> > > > >
> > > > > single-core: 20 msecs
> > > > > dual-core: 40 msecs
> > > > > quad-core: 60 msecs
> > > > > opto-core: 80 msecs
> > > > >
> > > > > You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via
> > > > > /proc/sys/kernel/sched_latency_ns:
> > > > >
> > > > > echo 10000000 > /proc/sys/kernel/sched_latency_ns
> > > >
> > > > He would also need to lower min_granularity, otherwise, it'd be larger
> > > > than the whole latency target.
> > > >
> > > > I'm testing right now, and one thing that is definitely a problem is the
> > > > amount of sleeper fairness we're giving. A full latency is just too
> > > > much short term fairness in my testing. While sleepers are catching up,
> > > > hogs languish. That's the biggest issue going on.
> > > >
> > > > I've also been doing some timings of make -j4 (looking at idle time),
> > > > and find that child_runs_first is mildly detrimental to fork/exec load,
> > > > as are buddies.
> > > >
> > > > I'm running with the below at the moment. (the kthread/workqueue thing
> > > > is just because I don't see any reason for it to exist, so consider it
> > > > to be a waste of perfectly good math;)
> > >
> > > Using latt, it seems better than -rc9. The below are entries logged
> > > while running make -j128 on a 64 thread box. I did two runs on each, and
> > > latt is using 8 clients.
> > >
> > > -rc9
> > > Max 23772 usec
> > > Avg 1129 usec
> > > Stdev 4328 usec
> > > Stdev mean 117 usec
> > >
> > > Max 32709 usec
> > > Avg 1467 usec
> > > Stdev 5095 usec
> > > Stdev mean 136 usec
> > >
> > > -rc9 + patch
> > >
> > > Max 11561 usec
> > > Avg 1532 usec
> > > Stdev 1994 usec
> > > Stdev mean 48 usec
> > >
> > > Max 9590 usec
> > > Avg 1550 usec
> > > Stdev 2051 usec
> > > Stdev mean 50 usec
> > >
> > > max latency is way down, and much smaller variation as well.
> >
> > Things are much better with this patch on the notebook! I cannot compare
> > with BFS as that still doesn't run anywhere I want it to run, but it's
> > way better than -rc9-git stock. latt numbers on the notebook have 1/3
> > the max latency, average is lower, and stddev is much smaller too.
>
> BFS210 runs on the laptop (dual core intel core duo). With make -j4
> running, I clock the following latt -c8 'sleep 10' latencies:
>
> -rc9
>
> Max 17895 usec
> Avg 8028 usec
> Stdev 5948 usec
> Stdev mean 405 usec
>
> Max 17896 usec
> Avg 4951 usec
> Stdev 6278 usec
> Stdev mean 427 usec
>
> Max 17885 usec
> Avg 5526 usec
> Stdev 6819 usec
> Stdev mean 464 usec
>
> -rc9 + mike
>
> Max 6061 usec
> Avg 3797 usec
> Stdev 1726 usec
> Stdev mean 117 usec
>
> Max 5122 usec
> Avg 3958 usec
> Stdev 1697 usec
> Stdev mean 115 usec
>
> Max 6691 usec
> Avg 2130 usec
> Stdev 2165 usec
> Stdev mean 147 usec

At least in my tests these latencies were mainly due to a bug in
latt.c - i've attached the fixed version.

The other reason was wakeup batching. If you do this:

echo 0 > /proc/sys/kernel/sched_wakeup_granularity_ns

... then you can switch on insta-wakeups on -tip too.

With a dual-core box and a make -j4 background job running, on
latest -tip i get the following latencies:

$ ./latt -c8 sleep 30
Entries: 656 (clients=8)

Averages:
------------------------------
Max 158 usec
Avg 12 usec
Stdev 10 usec

Thanks,

Ingo


Attachments:
latt.c (8.85 kB)

2009-09-09 18:09:10

by Ingo Molnar

[permalink] [raw]
Subject: Re: [tip:sched/core] sched: Turn off child_runs_first


* Theodore Tso <[email protected]> wrote:

> On Wed, Sep 09, 2009 at 03:37:07PM +0000, tip-bot for Mike Galbraith wrote:
> > Commit-ID: 2bba22c50b06abe9fd0d23933b1e64d35b419262
> > Gitweb: http://git.kernel.org/tip/2bba22c50b06abe9fd0d23933b1e64d35b419262
> > Author: Mike Galbraith <[email protected]>
> > AuthorDate: Wed, 9 Sep 2009 15:41:37 +0200
> > Committer: Ingo Molnar <[email protected]>
> > CommitDate: Wed, 9 Sep 2009 17:30:05 +0200
> >
> > sched: Turn off child_runs_first
> >
> > Set child_runs_first default to off.
> >
> > It hurts 'optimal' make -j<NR_CPUS> workloads as make jobs
> > get preempted by child tasks, reducing parallelism.
>
> Wasn't one of the reasons why we historically did child_runs_first
> was so that for fork/exit workloads, the child has a chance to
> exec the new process? If the parent runs first, then more pages
> will probably need to be COW'ed.

That kind of workload should be using vfork() anyway, and be even
faster because it can avoid the fork overhead, right?

Also, on SMP we do that anyway - there's a good likelihood on an idle
system that we wake the child on the other core straight away.

Ingo

2009-09-09 19:01:26

by Chris Friesen

[permalink] [raw]
Subject: Re: [tip:sched/core] sched: Turn off child_runs_first

On 09/09/2009 12:08 PM, Ingo Molnar wrote:
>
> * Theodore Tso <[email protected]> wrote:

>> Wasn't one of the reasons why we historically did child_runs_first
>> was so that for fork/exit workloads, the child has a chance to
>> exec the new process? If the parent runs first, then more pages
>> will probably need to be COW'ed.
>
> That kind of workload should be using vfork() anyway, and be even
> faster because it can avoid the fork overhead, right?

According to my man page, POSIX.1-2008 removes the specification of
vfork().

Chris

2009-09-09 19:48:50

by Pavel Machek

[permalink] [raw]
Subject: Re: [tip:sched/core] sched: Turn off child_runs_first

Hi!

> > > It hurts 'optimal' make -j<NR_CPUS> workloads as make jobs
> > > get preempted by child tasks, reducing parallelism.
> >
> > Wasn't one of the reasons why we historically did child_runs_first
> > was so that for fork/exit workloads, the child has a chance to
> > exec the new process? If the parent runs first, then more pages
> > will probably need to be COW'ed.
>
> That kind of workload should be using vfork() anyway, and be even
> faster because it can avoid the fork overhead, right?

Well... one should not have to update userspace to keep
performance... and vfork is an extremely ugly interface.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-09-09 20:12:14

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/09/2009 09:04 PM, Ingo Molnar wrote:
> [...]
> * Jens Axboe<[email protected]> wrote:
>
>> On Wed, Sep 09 2009, Jens Axboe wrote:
>> [...]
>> BFS210 runs on the laptop (dual core intel core duo). With make -j4
>> running, I clock the following latt -c8 'sleep 10' latencies:
>>
>> -rc9
>>
>> Max 17895 usec
>> Avg 8028 usec
>> Stdev 5948 usec
>> Stdev mean 405 usec
>>
>> Max 17896 usec
>> Avg 4951 usec
>> Stdev 6278 usec
>> Stdev mean 427 usec
>>
>> Max 17885 usec
>> Avg 5526 usec
>> Stdev 6819 usec
>> Stdev mean 464 usec
>>
>> -rc9 + mike
>>
>> Max 6061 usec
>> Avg 3797 usec
>> Stdev 1726 usec
>> Stdev mean 117 usec
>>
>> Max 5122 usec
>> Avg 3958 usec
>> Stdev 1697 usec
>> Stdev mean 115 usec
>>
>> Max 6691 usec
>> Avg 2130 usec
>> Stdev 2165 usec
>> Stdev mean 147 usec
>
> At least in my tests these latencies were mainly due to a bug in
> latt.c - i've attached the fixed version.
>
> The other reason was wakeup batching. If you do this:
>
> echo 0> /proc/sys/kernel/sched_wakeup_granularity_ns
>
> ... then you can switch on insta-wakeups on -tip too.
>
> With a dual-core box and a make -j4 background job running, on
> latest -tip i get the following latencies:
>
> $ ./latt -c8 sleep 30
> Entries: 656 (clients=8)
>
> Averages:
> ------------------------------
> Max 158 usec
> Avg 12 usec
> Stdev 10 usec

With your version of latt.c, I get these results with 2.6-tip vs
2.6.31-rc9-bfs:


(mainline)
Averages:
------------------------------
Max 50 usec
Avg 12 usec
Stdev 3 usec


(BFS)
Averages:
------------------------------
Max 474 usec
Avg 11 usec
Stdev 16 usec


However, the interactivity problems still remain. Does that mean it's
not a latency issue?

2009-09-09 20:49:34

by Serge Belyshev

[permalink] [raw]
Subject: Re: Epic regression in throughput since v2.6.23

Ingo Molnar <[email protected]> writes:

> Thanks!
>
> I think we found the reason for that regression - would you mind
> to re-test with latest -tip, e157986 or later?
>
> If that works for you i'll describe our theory.
>

Good job -- seems to work, thanks. Regression is still about 3% though:
http://img3.imageshack.us/img3/5335/epicbfstip.png

2009-09-09 20:50:42

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Wed, Sep 09 2009, Nikos Chantziaras wrote:
> On 09/09/2009 09:04 PM, Ingo Molnar wrote:
>> [...]
>> * Jens Axboe<[email protected]> wrote:
>>
>>> On Wed, Sep 09 2009, Jens Axboe wrote:
>>> [...]
>>> BFS210 runs on the laptop (dual core intel core duo). With make -j4
>>> running, I clock the following latt -c8 'sleep 10' latencies:
>>>
>>> -rc9
>>>
>>> Max 17895 usec
>>> Avg 8028 usec
>>> Stdev 5948 usec
>>> Stdev mean 405 usec
>>>
>>> Max 17896 usec
>>> Avg 4951 usec
>>> Stdev 6278 usec
>>> Stdev mean 427 usec
>>>
>>> Max 17885 usec
>>> Avg 5526 usec
>>> Stdev 6819 usec
>>> Stdev mean 464 usec
>>>
>>> -rc9 + mike
>>>
>>> Max 6061 usec
>>> Avg 3797 usec
>>> Stdev 1726 usec
>>> Stdev mean 117 usec
>>>
>>> Max 5122 usec
>>> Avg 3958 usec
>>> Stdev 1697 usec
>>> Stdev mean 115 usec
>>>
>>> Max 6691 usec
>>> Avg 2130 usec
>>> Stdev 2165 usec
>>> Stdev mean 147 usec
>>
>> At least in my tests these latencies were mainly due to a bug in
>> latt.c - i've attached the fixed version.
>>
>> The other reason was wakeup batching. If you do this:
>>
>> echo 0> /proc/sys/kernel/sched_wakeup_granularity_ns
>>
>> ... then you can switch on insta-wakeups on -tip too.
>>
>> With a dual-core box and a make -j4 background job running, on
>> latest -tip i get the following latencies:
>>
>> $ ./latt -c8 sleep 30
>> Entries: 656 (clients=8)
>>
>> Averages:
>> ------------------------------
>> Max 158 usec
>> Avg 12 usec
>> Stdev 10 usec
>
> With your version of latt.c, I get these results with 2.6-tip vs
> 2.6.31-rc9-bfs:
>
>
> (mainline)
> Averages:
> ------------------------------
> Max 50 usec
> Avg 12 usec
> Stdev 3 usec
>
>
> (BFS)
> Averages:
> ------------------------------
> Max 474 usec
> Avg 11 usec
> Stdev 16 usec
>
>
> However, the interactivity problems still remain. Does that mean it's
> not a latency issue?

It probably just means that latt isn't a good measure of the problem.
Which isn't really too much of a surprise.

--
Jens Axboe

2009-09-09 21:23:22

by Cory Fields

[permalink] [raw]
Subject: Re: Epic regression in throughput since v2.6.23

I've noticed the same regression since around 2.6.23, mainly in
multi-core video decoding. A git bisect reveals the guilty commit to
be: 33b0c4217dcd67b788318c3192a2912b530e4eef

It is easily visible because, with the guilty commit included, one core
of the CPU remains pegged while the other(s) are severely
underutilized.

Hope this helps

Cory Fields

2009-09-10 01:34:26

by Con Kolivas

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, 10 Sep 2009 06:50:43 Jens Axboe wrote:
> On Wed, Sep 09 2009, Nikos Chantziaras wrote:
> > On 09/09/2009 09:04 PM, Ingo Molnar wrote:
> >> [...]
> >>
> >> * Jens Axboe<[email protected]> wrote:
> >>> On Wed, Sep 09 2009, Jens Axboe wrote:
> >>> [...]
> >>> BFS210 runs on the laptop (dual core intel core duo). With make -j4
> >>> running, I clock the following latt -c8 'sleep 10' latencies:
> >>>
> >>> -rc9
> >>>
> >>> Max 17895 usec
> >>> Avg 8028 usec
> >>> Stdev 5948 usec
> >>> Stdev mean 405 usec
> >>>
> >>> Max 17896 usec
> >>> Avg 4951 usec
> >>> Stdev 6278 usec
> >>> Stdev mean 427 usec
> >>>
> >>> Max 17885 usec
> >>> Avg 5526 usec
> >>> Stdev 6819 usec
> >>> Stdev mean 464 usec
> >>>
> >>> -rc9 + mike
> >>>
> >>> Max 6061 usec
> >>> Avg 3797 usec
> >>> Stdev 1726 usec
> >>> Stdev mean 117 usec
> >>>
> >>> Max 5122 usec
> >>> Avg 3958 usec
> >>> Stdev 1697 usec
> >>> Stdev mean 115 usec
> >>>
> >>> Max 6691 usec
> >>> Avg 2130 usec
> >>> Stdev 2165 usec
> >>> Stdev mean 147 usec
> >>
> >> At least in my tests these latencies were mainly due to a bug in
> >> latt.c - i've attached the fixed version.
> >>
> >> The other reason was wakeup batching. If you do this:
> >>
> >> echo 0> /proc/sys/kernel/sched_wakeup_granularity_ns
> >>
> >> ... then you can switch on insta-wakeups on -tip too.
> >>
> >> With a dual-core box and a make -j4 background job running, on
> >> latest -tip i get the following latencies:
> >>
> >> $ ./latt -c8 sleep 30
> >> Entries: 656 (clients=8)
> >>
> >> Averages:
> >> ------------------------------
> >> Max 158 usec
> >> Avg 12 usec
> >> Stdev 10 usec
> >
> > With your version of latt.c, I get these results with 2.6-tip vs
> > 2.6.31-rc9-bfs:
> >
> >
> > (mainline)
> > Averages:
> > ------------------------------
> > Max 50 usec
> > Avg 12 usec
> > Stdev 3 usec
> >
> >
> > (BFS)
> > Averages:
> > ------------------------------
> > Max 474 usec
> > Avg 11 usec
> > Stdev 16 usec
> >
> >
> > However, the interactivity problems still remain. Does that mean it's
> > not a latency issue?
>
> It probably just means that latt isn't a good measure of the problem.
> Which isn't really too much of a surprise.

And that's a real shame because this was one of the first real good attempts
I've seen to actually measure the difference, and I thank you for your
efforts, Jens. I believe the reason it's limited is that all you're
measuring is the time from wakeup, and the test app isn't actually doing any work.
The issue is more than just waking up as fast as possible, it's then doing
some meaningful amount of work within a reasonable time frame as well. What
the "meaningful amount of work" and "reasonable time frame" are, remains a
mystery, but I guess could be added on to this testing app.

What does please me now, though, is that this message thread is finally
concentrating on what BFS was all about. The fact that it doesn't scale is no
mystery whatsoever. The fact that throughput and lack of scaling got
all the attention was missing the point entirely. To point that out I
used the bluntest response possible, because I know that works on lkml (does
it not?). Unfortunately I was so blunt that I ended up writing it in another
language: Troll. So for that, I apologise.

The unfortunate part is that BFS is still far from a working, complete state,
yet word got out that I had "released" something, which I had not, but
obviously there's no great distinction between putting something on a server
for testing, and a real release with an announce.

BFS is a scheduling experiment to demonstrate what effect the cpu scheduler
really has on the desktop and how it might be able to perform if we design
the scheduler for that one purpose.

It pleases me immensely to see that it has already spurred on a flood of
changes to the interactivity side of mainline development in its few days of
existence, including some ideas that BFS uses itself. That in itself, to me,
means it has already started to accomplish its goal, which ultimately, one
way or another, is to improve what the CPU scheduler can do for the linux
desktop. I can't track all the sensitive areas of the mainline kernel
scheduler changes without getting involved more deeply than I care to so it
would be counterproductive of me to try and hack on mainline. I much prefer
the quieter inbox.

If people want to use BFS for their own purposes or projects, or even better
help hack on it, that would make me happy for different reasons. I will
continue to work on my little project -in my own time- and hope that it
continues to drive further development of the mainline kernel in its own way.
We need more experiments like this to question what we currently have and
accept. Other major kernel subsystems are no exception.

Regards,
--
-ck

<code before rhetoric>

2009-09-10 03:15:29

by Mike Galbraith

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Wed, 2009-09-09 at 23:12 +0300, Nikos Chantziaras wrote:

> With your version of latt.c, I get these results with 2.6-tip vs
> 2.6.31-rc9-bfs:
>
>
> (mainline)
> Averages:
> ------------------------------
> Max 50 usec
> Avg 12 usec
> Stdev 3 usec
>
>
> (BFS)
> Averages:
> ------------------------------
> Max 474 usec
> Avg 11 usec
> Stdev 16 usec
>
>
> However, the interactivity problems still remain. Does that mean it's
> not a latency issue?

Could be a fairness issue. If X+client needs more than its fair share
of CPU, there's nothing to do but use nice levels. I'm stuck with
unaccelerated X (nvidia card), so if I want a good DVD watching or
whatever eye-candy experience while my box does a lot of other work, I
either have to use SCHED_IDLE/nice for the background stuff, or renice
X. That's the down side of a fair scheduler.

There is another variant of latency-related interactivity issue for the
desktop though: too LOW latency. If X and clients are switching too
fast, redraw can look nasty - sliced/diced.

-Mike

2009-09-10 06:08:32

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Nikos Chantziaras <[email protected]> wrote:

> On 09/09/2009 09:04 PM, Ingo Molnar wrote:
>> [...]
>> * Jens Axboe<[email protected]> wrote:
>>
>>> On Wed, Sep 09 2009, Jens Axboe wrote:
>>> [...]
>>> BFS210 runs on the laptop (dual core intel core duo). With make -j4
>>> running, I clock the following latt -c8 'sleep 10' latencies:
>>>
>>> -rc9
>>>
>>> Max 17895 usec
>>> Avg 8028 usec
>>> Stdev 5948 usec
>>> Stdev mean 405 usec
>>>
>>> Max 17896 usec
>>> Avg 4951 usec
>>> Stdev 6278 usec
>>> Stdev mean 427 usec
>>>
>>> Max 17885 usec
>>> Avg 5526 usec
>>> Stdev 6819 usec
>>> Stdev mean 464 usec
>>>
>>> -rc9 + mike
>>>
>>> Max 6061 usec
>>> Avg 3797 usec
>>> Stdev 1726 usec
>>> Stdev mean 117 usec
>>>
>>> Max 5122 usec
>>> Avg 3958 usec
>>> Stdev 1697 usec
>>> Stdev mean 115 usec
>>>
>>> Max 6691 usec
>>> Avg 2130 usec
>>> Stdev 2165 usec
>>> Stdev mean 147 usec
>>
>> At least in my tests these latencies were mainly due to a bug in
>> latt.c - i've attached the fixed version.
>>
>> The other reason was wakeup batching. If you do this:
>>
>> echo 0> /proc/sys/kernel/sched_wakeup_granularity_ns
>>
>> ... then you can switch on insta-wakeups on -tip too.
>>
>> With a dual-core box and a make -j4 background job running, on
>> latest -tip i get the following latencies:
>>
>> $ ./latt -c8 sleep 30
>> Entries: 656 (clients=8)
>>
>> Averages:
>> ------------------------------
>> Max 158 usec
>> Avg 12 usec
>> Stdev 10 usec
>
> With your version of latt.c, I get these results with 2.6-tip vs
> 2.6.31-rc9-bfs:
>
>
> (mainline)
> Averages:
> ------------------------------
> Max 50 usec
> Avg 12 usec
> Stdev 3 usec
>
>
> (BFS)
> Averages:
> ------------------------------
> Max 474 usec
> Avg 11 usec
> Stdev 16 usec
>
> However, the interactivity problems still remain. Does that mean
> it's not a latency issue?

It means that Jens's test-app, which demonstrated and helped us fix
the issue for him, does not help us fix it for you just yet.

The "fluidity problem" you described might not be a classic latency
issue per se (which latt.c measures), but a timeslicing / CPU time
distribution problem.

A slight shift in CPU time allocation can change the flow of tasks
to result in a 'choppier' system.

Have you tried, in addition of the granularity tweaks you've done,
to renice mplayer either up or down? (or compiz and Xorg for that
matter)

I'm not necessarily suggesting this as a 'real' solution (we really
prefer kernels that just get it right) - but it's an additional
parameter dimension along which you can tweak CPU time distribution
on your box.

Here's the general rule of thumb: one nice level gives roughly plus 5%
CPU time to a task and takes away 5% CPU time from another task -
i.e. it shifts the CPU allocation by about 10%.

( this is modified by all sorts of dynamic conditions: by the number
of tasks running and their wakeup patterns, so it's not a rule cast in
stone - but still a good ballpark figure for CPU-intensive tasks. )
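
For example (a purely illustrative invocation - pick the direction and
amount that helps; negative nice values need root):

  # give mplayer two nice levels of priority:
  renice -n -2 -p $(pidof mplayer)
  # or, alternatively, push the background compile down instead:
  renice -n 5 -p $(pgrep make)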

Btw., i've read your descriptions about what you've tuned so far -
have you seen/checked the wakeup_granularity tunable as well?
Setting that to 0 will change the general balance of how CPU time is
allocated between tasks too.

There's also a whole bunch of scheduler features you can turn on/off
individually via /debug/sched_features. For example, to turn off
NEW_FAIR_SLEEPERS, you can do:

# cat /debug/sched_features
NEW_FAIR_SLEEPERS NO_NORMALIZED_SLEEPER ADAPTIVE_GRAN WAKEUP_PREEMPT
START_DEBIT AFFINE_WAKEUPS CACHE_HOT_BUDDY SYNC_WAKEUPS NO_HRTICK
NO_DOUBLE_TICK ASYM_GRAN LB_BIAS LB_WAKEUP_UPDATE ASYM_EFF_LOAD
NO_WAKEUP_OVERLAP LAST_BUDDY OWNER_SPIN

# echo NO_NEW_FAIR_SLEEPERS > /debug/sched_features

Btw., NO_NEW_FAIR_SLEEPERS is something that will turn the scheduler
into a more classic fair scheduler (like BFS is too).

NO_START_DEBIT might be another thing that improves (or worsens :-/)
make -j type of kernel build workloads.

Note, these flags are all runtime, the new settings take effect
almost immediately (and at the latest it takes effect when a task
has started up) and safe to do runtime.
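
A minimal example, using START_DEBIT as the guinea pig (the NO_ prefix
turns a feature off, writing the bare name turns it back on):

  echo NO_START_DEBIT > /debug/sched_features   # switch it off
  # ... re-run the workload, compare ...
  echo START_DEBIT > /debug/sched_features      # switch it back on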

It basically gives us 32768 pluggable schedulers, each with a
slightly different algorithm - each setting in essence creates a new
scheduler. (This mechanism is how we introduce new scheduler
features and allow their debugging / regression-testing.)

(okay, almost, so beware: turning on HRTICK might lock up your
system.)

Plus, yet another dimension of tuning on SMP systems (such as
dual-core) are the sched-domains tunable. There's a whole world of
tuning in that area and BFS essentially implements a very aggressive
'always balance to other CPUs' policy.

I've attached my sched-tune-domains script which helps tune these
parameters.

For example on a testbox of mine it outputs:

usage: tune-sched-domains <val>
{cpu0/domain0:SIBLING} SD flag: 239
+ 1: SD_LOAD_BALANCE: Do load balancing on this domain
+ 2: SD_BALANCE_NEWIDLE: Balance when about to become idle
+ 4: SD_BALANCE_EXEC: Balance on exec
+ 8: SD_BALANCE_FORK: Balance on fork, clone
- 16: SD_WAKE_IDLE: Wake to idle CPU on task wakeup
+ 32: SD_WAKE_AFFINE: Wake task to waking CPU
+ 64: SD_WAKE_BALANCE: Perform balancing at task wakeup
+ 128: SD_SHARE_CPUPOWER: Domain members share cpu power
- 256: SD_POWERSAVINGS_BALANCE: Balance for power savings
- 512: SD_SHARE_PKG_RESOURCES: Domain members share cpu pkg resources
-1024: SD_SERIALIZE: Only a single load balancing instance
-2048: SD_WAKE_IDLE_FAR: Gain latency sacrificing cache hit
-4096: SD_PREFER_SIBLING: Prefer to place tasks in a sibling domain
{cpu0/domain1:MC} SD flag: 4735
+ 1: SD_LOAD_BALANCE: Do load balancing on this domain
+ 2: SD_BALANCE_NEWIDLE: Balance when about to become idle
+ 4: SD_BALANCE_EXEC: Balance on exec
+ 8: SD_BALANCE_FORK: Balance on fork, clone
+ 16: SD_WAKE_IDLE: Wake to idle CPU on task wakeup
+ 32: SD_WAKE_AFFINE: Wake task to waking CPU
+ 64: SD_WAKE_BALANCE: Perform balancing at task wakeup
- 128: SD_SHARE_CPUPOWER: Domain members share cpu power
- 256: SD_POWERSAVINGS_BALANCE: Balance for power savings
+ 512: SD_SHARE_PKG_RESOURCES: Domain members share cpu pkg resources
-1024: SD_SERIALIZE: Only a single load balancing instance
-2048: SD_WAKE_IDLE_FAR: Gain latency sacrificing cache hit
+4096: SD_PREFER_SIBLING: Prefer to place tasks in a sibling domain
{cpu0/domain2:NODE} SD flag: 3183
+ 1: SD_LOAD_BALANCE: Do load balancing on this domain
+ 2: SD_BALANCE_NEWIDLE: Balance when about to become idle
+ 4: SD_BALANCE_EXEC: Balance on exec
+ 8: SD_BALANCE_FORK: Balance on fork, clone
- 16: SD_WAKE_IDLE: Wake to idle CPU on task wakeup
+ 32: SD_WAKE_AFFINE: Wake task to waking CPU
+ 64: SD_WAKE_BALANCE: Perform balancing at task wakeup
- 128: SD_SHARE_CPUPOWER: Domain members share cpu power
- 256: SD_POWERSAVINGS_BALANCE: Balance for power savings
- 512: SD_SHARE_PKG_RESOURCES: Domain members share cpu pkg resources
+1024: SD_SERIALIZE: Only a single load balancing instance
+2048: SD_WAKE_IDLE_FAR: Gain latency sacrificing cache hit
-4096: SD_PREFER_SIBLING: Prefer to place tasks in a sibling domain

The way i can turn on say SD_WAKE_IDLE for the NODE domain is to:

tune-sched-domains 239 4735 $((3183+16))

( This is a pretty stone-age script i admit ;-)

Thanks for all your testing so far,

Ingo


Attachments:
tune-sched-domains (2.10 kB)

2009-09-10 06:41:03

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Ingo Molnar <[email protected]> wrote:

> > However, the interactivity problems still remain. Does that
> > mean it's not a latency issue?
>
> It means that Jens's test-app, which demonstrated and helped us
> fix the issue for him does not help us fix it for you just yet.

Lemme qualify that by saying that Jens's issues are improved, not
fixed [he has not re-run with the latest latt.c yet], but not all things
are fully fixed yet. For example, the xmodmap thing sounds
interesting - could that be a child-runs-first effect?

Ingo

2009-09-10 06:53:16

by Ingo Molnar

[permalink] [raw]
Subject: Re: Epic regression in throughput since v2.6.23


* Serge Belyshev <[email protected]> wrote:

> Ingo Molnar <[email protected]> writes:
>
> > Thanks!
> >
> > I think we found the reason for that regression - would you mind
> > to re-test with latest -tip, e157986 or later?
> >
> > If that works for you i'll describe our theory.
> >
>
> Good job -- seems to work, thanks. Regression is still about 3%
> though: http://img3.imageshack.us/img3/5335/epicbfstip.png

Ok, thanks for the update. The problem is that i've run out of
test systems that can reproduce this. So we need your help to debug
this directly ...

A good start would be to post the -tip versus BFS "perf stat"
measurement results:

perf stat --repeat 3 make -j4 bzImage

And also the -j8 perf stat result, so that we can see what the
difference is between -j4 and -j8.

Note: please check out latest tip and do:

cd tools/perf/
make -j install

To pick up the latest 'perf' tool. In particular the precision of
--repeat has been improved recently so you want that binary from
-tip even if you measure vanilla .31 or .31 based BFS.

Also, it would be nice if you could send me your kernel config -
maybe it's some config detail that keeps me from being able to
reproduce these results. I havent seen a link to a config in your
mails (maybe i missed it - these threads are voluminous).

Ingo

2009-09-10 06:55:43

by Peter Zijlstra

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Wed, 2009-09-09 at 14:20 +0200, Jens Axboe wrote:
>
> One thing I also noticed is that when I have logged in, I run xmodmap
> manually to load some keymappings (I always tell myself to add this to
> the log in scripts, but I suspend/resume this laptop for weeks at the
> time and forget before the next boot). With the stock kernel, xmodmap
> will halt X updates and take forever to run. With BFS, it returned
> instantly. As I would expect.

Can you provide a little more detail (I'm a xmodmap n00b), how does one
run xmodmap and maybe provide your xmodmap config?

2009-09-10 06:58:55

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, Sep 10 2009, Peter Zijlstra wrote:
> On Wed, 2009-09-09 at 14:20 +0200, Jens Axboe wrote:
> >
> > One thing I also noticed is that when I have logged in, I run xmodmap
> > manually to load some keymappings (I always tell myself to add this to
> > the log in scripts, but I suspend/resume this laptop for weeks at the
> > time and forget before the next boot). With the stock kernel, xmodmap
> > will halt X updates and take forever to run. With BFS, it returned
> > instantly. As I would expect.
>
> Can you provide a little more detail (I'm a xmodmap n00b), how does one
> run xmodmap and maybe provide your xmodmap config?

Will do, let me get the notebook and strace time it on both bfs and
mainline.

--
Jens Axboe

2009-09-10 06:59:40

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Peter Zijlstra <[email protected]> wrote:

> On Wed, 2009-09-09 at 14:20 +0200, Jens Axboe wrote:
> >
> > One thing I also noticed is that when I have logged in, I run xmodmap
> > manually to load some keymappings (I always tell myself to add this to
> > the log in scripts, but I suspend/resume this laptop for weeks at the
> > time and forget before the next boot). With the stock kernel, xmodmap
> > will halt X updates and take forever to run. With BFS, it returned
> > instantly. As I would expect.
>
> Can you provide a little more detail (I'm a xmodmap n00b), how
> does one run xmodmap and maybe provide your xmodmap config?

(and which version did you use, just in case it matters.)

Ingo

2009-09-10 07:04:47

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Jens Axboe <[email protected]> wrote:

> On Thu, Sep 10 2009, Peter Zijlstra wrote:
> > On Wed, 2009-09-09 at 14:20 +0200, Jens Axboe wrote:
> > >
> > > One thing I also noticed is that when I have logged in, I run xmodmap
> > > manually to load some keymappings (I always tell myself to add this to
> > > the log in scripts, but I suspend/resume this laptop for weeks at the
> > > time and forget before the next boot). With the stock kernel, xmodmap
> > > will halt X updates and take forever to run. With BFS, it returned
> > > instantly. As I would expect.
> >
> > Can you provide a little more detail (I'm a xmodmap n00b), how
> > does one run xmodmap and maybe provide your xmodmap config?
>
> Will do, let me get the notebook and strace time it on both bfs
> and mainline.

A 'perf stat' comparison would be nice as well - that will show us
events strace doesnt show, and shows us the basic scheduler behavior
as well.

A 'full' trace could be done as well via trace-cmd.c (attached), if
you enable:

CONFIG_CONTEXT_SWITCH_TRACER=y

and did something like:

trace-cmd -s xmodmap ... > trace.txt

Ingo


Attachments:
trace-cmd.c (6.39 kB)

2009-09-10 07:33:17

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, Sep 10 2009, Jens Axboe wrote:
> On Thu, Sep 10 2009, Peter Zijlstra wrote:
> > On Wed, 2009-09-09 at 14:20 +0200, Jens Axboe wrote:
> > >
> > > One thing I also noticed is that when I have logged in, I run xmodmap
> > > manually to load some keymappings (I always tell myself to add this to
> > > the log in scripts, but I suspend/resume this laptop for weeks at the
> > > time and forget before the next boot). With the stock kernel, xmodmap
> > > will halt X updates and take forever to run. With BFS, it returned
> > > instantly. As I would expect.
> >
> > Can you provide a little more detail (I'm a xmodmap n00b), how does one
> > run xmodmap and maybe provide your xmodmap config?
>
> Will do, let me get the notebook and strace time it on both bfs and
> mainline.

Here's the result of running perf stat xmodmap .xmodmap-carl on the
notebook. I have attached the .xmodmap-carl file, it's pretty simple. I
have also attached the output of strace -o foo -f -tt xmodmap
.xmodmap-carl when run on 2.6.31-rc9.

2.6.31-rc9-bfs210

Performance counter stats for 'xmodmap .xmodmap-carl':

153.994976 task-clock-msecs # 0.990 CPUs (scaled from 99.86%)
0 context-switches # 0.000 M/sec (scaled from 99.86%)
0 CPU-migrations # 0.000 M/sec (scaled from 99.86%)
315 page-faults # 0.002 M/sec (scaled from 99.86%)
<not counted> cycles
<not counted> instructions
<not counted> cache-references
<not counted> cache-misses

0.155573406 seconds time elapsed

2.6.31-rc9

Performance counter stats for 'xmodmap .xmodmap-carl':

8.529265 task-clock-msecs # 0.001 CPUs
23 context-switches # 0.003 M/sec
1 CPU-migrations # 0.000 M/sec
315 page-faults # 0.037 M/sec
<not counted> cycles
<not counted> instructions
<not counted> cache-references
<not counted> cache-misses

11.804293482 seconds time elapsed


--
Jens Axboe


Attachments:
.xmodmap-carl (1.33 kB)
strace-xmodmap.txt (20.00 kB)

2009-09-10 07:43:28

by Ingo Molnar

[permalink] [raw]
Subject: [updated] BFS vs. mainline scheduler benchmarks and measurements


* Ingo Molnar <[email protected]> wrote:

> OLTP performance (postgresql + sysbench)
> http://redhat.com/~mingo/misc/bfs-vs-tip-oltp.jpg

To everyone who might care about this, i've updated the sysbench
results to latest -tip:

http://redhat.com/~mingo/misc/bfs-vs-tip-oltp-v2.jpg

This double-checks, in the throughput space too, the effects of the
various interactivity fixlets in the scheduler tree (whose
interactivity effects were mentioned/documented in the various
threads on lkml) - and they improved sysbench performance as well.

Con, i'd also like to thank you for raising general interest in
scheduler latencies once more by posting the BFS patch. It gave us
more bugreports upstream and gave us desktop users willing to test
patches which in turn helps us improve the code. When users choose
to suffer in silence that is never helpful.

BFS isnt particularly strong in this graph - from having looked at
the workload under BFS my impression was that this is primarily due
to you having cut out much of the sched-domains SMP load-balancer
code. BFS 'insta-balances' very aggressively, which hurts cache-affine
workloads rather visibly.

You might want to have a look at that design detail if you care -
load-balancing is in significant parts orthogonal to the basic
design of a fair scheduler.

For example we kept much of the existing load-balancer when we went
to CFS in v2.6.23 - the fairness engine and the load-balancer are in
large parts independent units of code and can be improved/tweaked
separately.

There's interactions, but the concepts are largely separate.

Thanks,

Ingo

2009-09-10 07:49:12

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Jens Axboe <[email protected]> wrote:

> On Thu, Sep 10 2009, Jens Axboe wrote:
> > On Thu, Sep 10 2009, Peter Zijlstra wrote:
> > > On Wed, 2009-09-09 at 14:20 +0200, Jens Axboe wrote:
> > > >
> > > > One thing I also noticed is that when I have logged in, I
> > > > run xmodmap manually to load some keymappings (I always tell
> > > > myself to add this to the log in scripts, but I
> > > > suspend/resume this laptop for weeks at the time and forget
> > > > before the next boot). With the stock kernel, xmodmap will
> > > > halt X updates and take forever to run. With BFS, it
> > > > returned instantly. As I would expect.
> > >
> > > Can you provide a little more detail (I'm a xmodmap n00b), how does one
> > > run xmodmap and maybe provide your xmodmap config?
> >
> > Will do, let me get the notebook and strace time it on both bfs and
> > mainline.
>
> Here's the result of running perf stat xmodmap .xmodmap-carl on
> the notebook. I have attached the .xmodmap-carl file, it's pretty
> simple. I have also attached the output of strace -o foo -f -tt
> xmodmap .xmodmap-carl when run on 2.6.31-rc9.
>
> 2.6.31-rc9-bfs210
>
> Performance counter stats for 'xmodmap .xmodmap-carl':
>
> 153.994976 task-clock-msecs # 0.990 CPUs (scaled from 99.86%)
> 0 context-switches # 0.000 M/sec (scaled from 99.86%)
> 0 CPU-migrations # 0.000 M/sec (scaled from 99.86%)
> 315 page-faults # 0.002 M/sec (scaled from 99.86%)
> <not counted> cycles
> <not counted> instructions
> <not counted> cache-references
> <not counted> cache-misses
>
> 0.155573406 seconds time elapsed

(Side question: what hardware is this - why are there no hw
counters? Could you post the /proc/cpuinfo?)

> 2.6.31-rc9
>
> Performance counter stats for 'xmodmap .xmodmap-carl':
>
> 8.529265 task-clock-msecs # 0.001 CPUs
> 23 context-switches # 0.003 M/sec
> 1 CPU-migrations # 0.000 M/sec
> 315 page-faults # 0.037 M/sec
> <not counted> cycles
> <not counted> instructions
> <not counted> cache-references
> <not counted> cache-misses
>
> 11.804293482 seconds time elapsed

Thanks - so we context-switch 23 times - possibly to Xorg. But 11
seconds is extremely long. Will try to reproduce it.

Ingo

2009-09-10 07:53:56

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, Sep 10 2009, Ingo Molnar wrote:
>
> * Jens Axboe <[email protected]> wrote:
>
> > On Thu, Sep 10 2009, Jens Axboe wrote:
> > > On Thu, Sep 10 2009, Peter Zijlstra wrote:
> > > > On Wed, 2009-09-09 at 14:20 +0200, Jens Axboe wrote:
> > > > >
> > > > > One thing I also noticed is that when I have logged in, I
> > > > > run xmodmap manually to load some keymappings (I always tell
> > > > > myself to add this to the log in scripts, but I
> > > > > suspend/resume this laptop for weeks at the time and forget
> > > > > before the next boot). With the stock kernel, xmodmap will
> > > > > halt X updates and take forever to run. With BFS, it
> > > > > returned instantly. As I would expect.
> > > >
> > > > Can you provide a little more detail (I'm a xmodmap n00b), how does one
> > > > run xmodmap and maybe provide your xmodmap config?
> > >
> > > Will do, let me get the notebook and strace time it on both bfs and
> > > mainline.
> >
> > Here's the result of running perf stat xmodmap .xmodmap-carl on
> > the notebook. I have attached the .xmodmap-carl file, it's pretty
> > simple. I have also attached the output of strace -o foo -f -tt
> > xmodmap .xmodmap-carl when run on 2.6.31-rc9.
> >
> > 2.6.31-rc9-bfs210
> >
> > Performance counter stats for 'xmodmap .xmodmap-carl':
> >
> > 153.994976 task-clock-msecs # 0.990 CPUs (scaled from 99.86%)
> > 0 context-switches # 0.000 M/sec (scaled from 99.86%)
> > 0 CPU-migrations # 0.000 M/sec (scaled from 99.86%)
> > 315 page-faults # 0.002 M/sec (scaled from 99.86%)
> > <not counted> cycles
> > <not counted> instructions
> > <not counted> cache-references
> > <not counted> cache-misses
> >
> > 0.155573406 seconds time elapsed
>
> (Side question: what hardware is this - why are there no hw
> counters? Could you post the /proc/cpuinfo?)

Sure, attached. It's a Thinkpad x60, core duo. Nothing fancy. The perf
may be a bit dated.

I went to try -tip btw, but it crashes on boot. Here's the backtrace,
typed manually, it's crashing in queue_work_on+0x28/0x60.

Call Trace:
queue_work
schedule_work
clocksource_mark_unstable
mark_tsc_unstable
check_tsc_sync_source
native_cpu_up
relay_hotcpu_callback
do_fork_idle
_cpu_up
cpu_up
kernel_init
kernel_thread_helper

> > Performance counter stats for 'xmodmap .xmodmap-carl':
> >
> > 8.529265 task-clock-msecs # 0.001 CPUs
> > 23 context-switches # 0.003 M/sec
> > 1 CPU-migrations # 0.000 M/sec
> > 315 page-faults # 0.037 M/sec
> > <not counted> cycles
> > <not counted> instructions
> > <not counted> cache-references
> > <not counted> cache-misses
> >
> > 11.804293482 seconds time elapsed
>
> Thanks - so we context-switch 23 times - possibly to Xorg. But 11
> seconds is extremely long. Will try to reproduce it.

There's also the strace info with timings. Xorg is definitely involved,
during those 11s things stop updating completely.

--
Jens Axboe

2009-09-10 09:44:32

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, Sep 10 2009, Ingo Molnar wrote:
>
> * Jens Axboe <[email protected]> wrote:
>
> > On Thu, Sep 10 2009, Peter Zijlstra wrote:
> > > On Wed, 2009-09-09 at 14:20 +0200, Jens Axboe wrote:
> > > >
> > > > One thing I also noticed is that when I have logged in, I run xmodmap
> > > > manually to load some keymappings (I always tell myself to add this to
> > > > the log in scripts, but I suspend/resume this laptop for weeks at the
> > > > time and forget before the next boot). With the stock kernel, xmodmap
> > > > will halt X updates and take forever to run. With BFS, it returned
> > > > instantly. As I would expect.
> > >
> > > Can you provide a little more detail (I'm a xmodmap n00b), how
> > > does one run xmodmap and maybe provide your xmodmap config?
> >
> > Will do, let me get the notebook and strace time it on both bfs
> > and mainline.
>
> A 'perf stat' comparison would be nice as well - that will show us
> events strace doesnt show, and shows us the basic scheduler behavior
> as well.
>
> A 'full' trace could be done as well via trace-cmd.c (attached), if
> you enable:
>
> CONFIG_CONTEXT_SWITCH_TRACER=y
>
> and did something like:
>
> trace-cmd -s xmodmap ... > trace.txt

trace.txt attached. Steven, you seem to go through a lot of trouble to
find the debugfs path, yet at the very end do:

> system("cat /debug/tracing/trace");

which doesn't seem quite right :-)
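
For what it's worth, here is a hypothetical sketch (not code from the
attached trace-cmd.c) of how the discovered mount point could be reused
instead of the fixed /debug path - it just looks debugfs up in /proc/mounts
and prints where the trace file lives:

#include <stdio.h>
#include <string.h>

int main(void)
{
	char dev[64], dir[256], type[64];
	FILE *fp = fopen("/proc/mounts", "r");

	if (!fp)
		return 1;

	/* Each /proc/mounts line: device mountpoint fstype options ... */
	while (fscanf(fp, "%63s %255s %63s %*[^\n]", dev, dir, type) == 3) {
		if (!strcmp(type, "debugfs")) {
			printf("%s/tracing/trace\n", dir);
			break;
		}
	}
	fclose(fp);
	return 0;
}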

--
Jens Axboe

2009-09-10 09:46:01

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, Sep 10 2009, Jens Axboe wrote:
> trace.txt attached.

Now it really is, I very much need a more clever MUA to help me with
these things :-)

--
Jens Axboe


Attachments:
trace.txt.bz2 (235.68 kB)

2009-09-10 09:48:20

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Wed, Sep 09 2009, Ingo Molnar wrote:
> At least in my tests these latencies were mainly due to a bug in
> latt.c - i've attached the fixed version.

What bug? I don't see any functional change between the version you
attach and the current one.

--
Jens Axboe

2009-09-10 09:54:56

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, Sep 10 2009, Ingo Molnar wrote:
>
> * Ingo Molnar <[email protected]> wrote:
>
> > > However, the interactivity problems still remain. Does that
> > > mean it's not a latency issue?
> >
> > It means that Jens's test-app, which demonstrated and helped us
> > fix the issue for him does not help us fix it for you just yet.
>
> Lemme qualify that by saying that Jens's issues are improved not
> fixed [he has not re-run with latest latt.c yet] but not all things
> are fully fixed yet. For example the xmodmap thing sounds
> interesting - could that be a child-runs-first effect?

I thought so too, so when -tip failed to boot I pulled the patches from
Mike into 2.6.31. It doesn't change anything for xmodmap, though.

--
Jens Axboe

2009-09-10 09:59:37

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Jens Axboe <[email protected]> wrote:

> On Wed, Sep 09 2009, Ingo Molnar wrote:
> > At least in my tests these latencies were mainly due to a bug in
> > latt.c - i've attached the fixed version.
>
> What bug? I don't see any functional change between the version
> you attach and the current one.

Here's the diff of what i fixed yesterday over the last latt.c
version i found in this thread. The poll() thing is the significant
one.

Ingo

--- latt.c.orig
+++ latt.c
@@ -39,6 +39,7 @@ static unsigned int verbose;
struct stats
{
double n, mean, M2, max;
+ int max_pid;
};

static void update_stats(struct stats *stats, unsigned long long val)
@@ -85,22 +86,6 @@ static double stddev_stats(struct stats
return sqrt(variance);
}

-/*
- * The std dev of the mean is related to the std dev by:
- *
- * s
- * s_mean = -------
- * sqrt(n)
- *
- */
-static double stddev_mean_stats(struct stats *stats)
-{
- double variance = stats->M2 / (stats->n - 1);
- double variance_mean = variance / stats->n;
-
- return sqrt(variance_mean);
-}
-
struct stats delay_stats;

static int pipes[MAX_CLIENTS*2][2];
@@ -212,7 +197,7 @@ static unsigned long usec_since(struct t
static void log_delay(unsigned long delay)
{
if (verbose) {
- fprintf(stderr, "log delay %8lu usec\n", delay);
+ fprintf(stderr, "log delay %8lu usec (pid %d)\n", delay, getpid());
fflush(stderr);
}

@@ -300,7 +285,7 @@ static int __write_ts(int i, struct time
return write(fd, ts, sizeof(*ts)) != sizeof(*ts);
}

-static long __read_ts(int i, struct timespec *ts)
+static long __read_ts(int i, struct timespec *ts, pid_t *cpids)
{
int fd = pipes[2*i+1][0];
struct timespec t;
@@ -309,11 +294,14 @@ static long __read_ts(int i, struct time
return -1;

log_delay(usec_since(ts, &t));
+ if (verbose)
+ fprintf(stderr, "got delay %ld from child %d [pid %d]\n", usec_since(ts, &t), i, cpids[i]);

return 0;
}

-static int read_ts(struct pollfd *pfd, unsigned int nr, struct timespec *ts)
+static int read_ts(struct pollfd *pfd, unsigned int nr, struct timespec *ts,
+ pid_t *cpids)
{
unsigned int i;

@@ -322,7 +310,7 @@ static int read_ts(struct pollfd *pfd, u
return -1L;
if (pfd[i].revents & POLLIN) {
pfd[i].events = 0;
- if (__read_ts(i, &ts[i]))
+ if (__read_ts(i, &ts[i], cpids))
return -1L;
nr--;
}
@@ -368,7 +356,6 @@ static void run_parent(pid_t *cpids)
srand(1234);

do {
- unsigned long delay;
unsigned pending_events;

do_rand_sleep();
@@ -404,17 +391,17 @@ static void run_parent(pid_t *cpids)
*/
pending_events = clients;
while (pending_events) {
- int evts = poll(ipfd, clients, 0);
+ int evts = poll(ipfd, clients, -1);

if (evts < 0) {
do_exit = 1;
break;
} else if (!evts) {
- /* printf("bugger2\n"); */
+ printf("bugger2\n");
continue;
}

- if (read_ts(ipfd, evts, t1)) {
+ if (read_ts(ipfd, evts, t1, cpids)) {
do_exit = 1;
break;
}
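
For readers following the diff: poll()'s third argument is a timeout in
milliseconds. The old code passed 0, which makes poll() return immediately
even when no child has replied yet, so the parent spun through the loop -
presumably what inflated the measured latencies. Passing -1 makes it sleep
until an event actually arrives. A minimal standalone illustration of the
two modes (not part of latt.c):

#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fds[2];
	struct pollfd pfd;

	if (pipe(fds))
		return 1;
	pfd.fd = fds[0];
	pfd.events = POLLIN;

	/* Nothing written yet: timeout 0 returns 0 immediately. */
	printf("poll(timeout=0)  -> %d\n", poll(&pfd, 1, 0));

	if (write(fds[1], "x", 1) != 1)
		return 1;
	/* Timeout -1 blocks until readable - here data is already queued. */
	printf("poll(timeout=-1) -> %d\n", poll(&pfd, 1, -1));
	return 0;
}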

2009-09-10 10:01:47

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, Sep 10 2009, Ingo Molnar wrote:
>
> * Jens Axboe <[email protected]> wrote:
>
> > On Wed, Sep 09 2009, Ingo Molnar wrote:
> > > At least in my tests these latencies were mainly due to a bug in
> > > latt.c - i've attached the fixed version.
> >
> > What bug? I don't see any functional change between the version
> > you attach and the current one.
>
> Here's the diff of what i fixed yesterday over the last latt.c
> version i found in this thread. The poll() thing is the significant
> one.

Ah indeed, thanks Ingo! I'm tempted to add some actual work processing
into latt as well, to see if that helps improve it.

--
Jens Axboe

2009-09-10 10:02:27

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Jens Axboe <[email protected]> wrote:

> I went to try -tip btw, but it crashes on boot. Here's the
> backtrace, typed manually, it's crashing in
> queue_work_on+0x28/0x60.
>
> Call Trace:
> queue_work
> schedule_work
> clocksource_mark_unstable
> mark_tsc_unstable
> check_tsc_sync_source
> native_cpu_up
> relay_hotcpu_callback
> do_fork_idle
> _cpu_up
> cpu_up
> kernel_init
> kernel_thread_helper

hm, that looks like an old bug i fixed days ago via:

00a3273: Revert "x86: Make tsc=reliable override boot time stability checks"

Have you tested tip:master - do you still know which sha1?

Ingo

2009-09-10 10:03:30

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Jens Axboe <[email protected]> wrote:

> On Thu, Sep 10 2009, Ingo Molnar wrote:
> >
> > * Ingo Molnar <[email protected]> wrote:
> >
> > > > However, the interactivity problems still remain. Does that
> > > > mean it's not a latency issue?
> > >
> > > It means that Jens's test-app, which demonstrated and helped us
> > > fix the issue for him does not help us fix it for you just yet.
> >
> > Lemme qualify that by saying that Jens's issues are improved not
> > fixed [he has not re-run with latest latt.c yet] but not all things
> > are fully fixed yet. For example the xmodmap thing sounds
> > interesting - could that be a child-runs-first effect?
>
> I thought so too, so when -tip failed to boot I pulled the patches
> from Mike into 2.6.31. It doesn't change anything for xmodmap,
> though.

Note, you can access just the pristine scheduler patches by checking
out and testing tip:sched/core - no need to pull them out and apply.

Your crash looks like clocksource related - that's in a separate
topic which you can thus isolate if you use sched/core.

Thanks,

Ingo

2009-09-10 10:09:44

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, Sep 10 2009, Ingo Molnar wrote:
>
> * Jens Axboe <[email protected]> wrote:
>
> > I went to try -tip btw, but it crashes on boot. Here's the
> > backtrace, typed manually, it's crashing in
> > queue_work_on+0x28/0x60.
> >
> > Call Trace:
> > queue_work
> > schedule_work
> > clocksource_mark_unstable
> > mark_tsc_unstable
> > check_tsc_sync_source
> > native_cpu_up
> > relay_hotcpu_callback
> > do_fork_idle
> > _cpu_up
> > cpu_up
> > kernel_init
> > kernel_thread_helper
>
> hm, that looks like an old bug i fixed days ago via:
>
> 00a3273: Revert "x86: Make tsc=reliable override boot time stability checks"
>
> Have you tested tip:master - do you still know which sha1?

It was -tip pulled this morning, 2-3 hours ago. I don't have the sha
anymore, but it was a fresh pull today.

--
Jens Axboe

2009-09-10 10:11:35

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, Sep 10 2009, Ingo Molnar wrote:
>
> * Jens Axboe <[email protected]> wrote:
>
> > On Thu, Sep 10 2009, Ingo Molnar wrote:
> > >
> > > * Ingo Molnar <[email protected]> wrote:
> > >
> > > > > However, the interactivity problems still remain. Does that
> > > > > mean it's not a latency issue?
> > > >
> > > > It means that Jens's test-app, which demonstrated and helped us
> > > > fix the issue for him does not help us fix it for you just yet.
> > >
> > > Lemme qualify that by saying that Jens's issues are improved not
> > > fixed [he has not re-run with latest latt.c yet] but not all things
> > > are fully fixed yet. For example the xmodmap thing sounds
> > > interesting - could that be a child-runs-first effect?
> >
> > I thought so too, so when -tip failed to boot I pulled the patches
> > from Mike into 2.6.31. It doesn't change anything for xmodmap,
> > though.
>
> Note, you can access just the pristine scheduler patches by checking
> out and testing tip:sched/core - no need to pull them out and apply.
>
> Your crash looks like clocksource related - that's in a separate
> topic which you can thus isolate if you use sched/core.

I'm building sched/core now and will run the xmodmap test there.

--
Jens Axboe

2009-09-10 10:28:36

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, Sep 10 2009, Jens Axboe wrote:
> On Thu, Sep 10 2009, Ingo Molnar wrote:
> >
> > * Jens Axboe <[email protected]> wrote:
> >
> > > On Thu, Sep 10 2009, Ingo Molnar wrote:
> > > >
> > > > * Ingo Molnar <[email protected]> wrote:
> > > >
> > > > > > However, the interactivity problems still remain. Does that
> > > > > > mean it's not a latency issue?
> > > > >
> > > > > It means that Jens's test-app, which demonstrated and helped us
> > > > > fix the issue for him does not help us fix it for you just yet.
> > > >
> > > > Lemme qualify that by saying that Jens's issues are improved not
> > > > fixed [he has not re-run with latest latt.c yet] but not all things
> > > > are fully fixed yet. For example the xmodmap thing sounds
> > > > interesting - could that be a child-runs-first effect?
> > >
> > > I thought so too, so when -tip failed to boot I pulled the patches
> > > from Mike into 2.6.31. It doesn't change anything for xmodmap,
> > > though.
> >
> > Note, you can access just the pristine scheduler patches by checking
> > out and testing tip:sched/core - no need to pull them out and apply.
> >
> > Your crash looks like clocksource related - that's in a separate
> > topic which you can thus isolate if you use sched/core.
>
> I'm building sched/core now and will run the xmodmap test there.

No difference. Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then
I get:

Performance counter stats for 'xmodmap .xmodmap-carl':

9.009137 task-clock-msecs # 0.447 CPUs
18 context-switches # 0.002 M/sec
1 CPU-migrations # 0.000 M/sec
315 page-faults # 0.035 M/sec
<not counted> cycles
<not counted> instructions
<not counted> cache-references
<not counted> cache-misses

0.020167093 seconds time elapsed

Woot!

--
Jens Axboe

2009-09-10 10:57:32

by Mike Galbraith

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, 2009-09-10 at 12:28 +0200, Jens Axboe wrote:

> No difference. Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then
> I get:
>
> Performance counter stats for 'xmodmap .xmodmap-carl':
>
> 9.009137 task-clock-msecs # 0.447 CPUs
> 18 context-switches # 0.002 M/sec
> 1 CPU-migrations # 0.000 M/sec
> 315 page-faults # 0.035 M/sec
> <not counted> cycles
> <not counted> instructions
> <not counted> cache-references
> <not counted> cache-misses
>
> 0.020167093 seconds time elapsed
>
> Woot!

Something is very seriously hosed on that box... clock?

Can you turn it back on, and do..
while sleep .1; do cat /proc/sched_debug >> foo; done
..on one core, and (quickly;) xmodmap .xmodmap-carl, then send me a few
seconds worth (gzipped up) to eyeball?

-Mike

2009-09-10 11:03:48

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, Sep 10 2009, Con Kolivas wrote:
> > It probably just means that latt isn't a good measure of the problem.
> > Which isn't really too much of a surprise.
>
> And that's a real shame because this was one of the first real good attempts
> I've seen to actually measure the difference, and I thank you for your
> efforts Jens. I believe the reason it's limited is because all you're
> measuring is time from wakeup and the test app isn't actually doing any work.
> The issue is more than just waking up as fast as possible, it's then doing
> some meaningful amount of work within a reasonable time frame as well. What
> the "meaningful amount of work" and "reasonable time frame" are, remains a
> mystery, but I guess could be added on to this testing app.

Here's a quickie addition that adds some work to the threads. The
latency measure is now 'when did I wake up and complete my work'. The
default work is filling a buffer with pseudo random data and then
compressing it with zlib. Default is 64kb of data, can be adjusted with
-x. -x0 turns off work processing.
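
The work step described above boils down to filling a buffer with
pseudo-random bytes and running it through zlib. A minimal sketch of that
idea (function name and sizes are illustrative assumptions - the attached
latt.c is the authoritative version; link with -lz):

#include <stdlib.h>
#include <zlib.h>

static void do_work(size_t bytes)
{
	unsigned char *in = malloc(bytes);
	uLongf out_len = compressBound(bytes);
	unsigned char *out = malloc(out_len);
	size_t i;

	if (!in || !out)
		goto out;

	/* Fill the input with pseudo-random data... */
	for (i = 0; i < bytes; i++)
		in[i] = rand() & 0xff;

	/* ...and compress it in one shot with zlib. */
	compress(out, &out_len, in, bytes);
out:
	free(out);
	free(in);
}

int main(void)
{
	do_work(64 * 1024);	/* the 64kb default mentioned above */
	return 0;
}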

--
Jens Axboe


Attachments:
latt.c (10.98 kB)

2009-09-10 11:09:14

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, Sep 10 2009, Mike Galbraith wrote:
> On Thu, 2009-09-10 at 12:28 +0200, Jens Axboe wrote:
>
> > No difference. Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then
> > I get:
> >
> > Performance counter stats for 'xmodmap .xmodmap-carl':
> >
> > 9.009137 task-clock-msecs # 0.447 CPUs
> > 18 context-switches # 0.002 M/sec
> > 1 CPU-migrations # 0.000 M/sec
> > 315 page-faults # 0.035 M/sec
> > <not counted> cycles
> > <not counted> instructions
> > <not counted> cache-references
> > <not counted> cache-misses
> >
> > 0.020167093 seconds time elapsed
> >
> > Woot!
>
> Something is very seriously hosed on that box... clock?

model name : Genuine Intel(R) CPU T2400 @ 1.83GHz

Throttles down to 1.00GHz when idle.

> Can you turn it back on, and do..

I guess you mean turn NEW_FAIR_SLEEPERS back on, correct?

> while sleep .1; do cat /proc/sched_debug >> foo; done
> ..on one core, and (quickly;) xmodmap .xmodmap-carl, then send me a few
> seconds worth (gzipped up) to eyeball?

Attached.

--
Jens Axboe


Attachments:
sched-debug-cat.bz2 (11.92 kB)

2009-09-10 11:21:12

by Mike Galbraith

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, 2009-09-10 at 13:09 +0200, Jens Axboe wrote:
> On Thu, Sep 10 2009, Mike Galbraith wrote:
> > On Thu, 2009-09-10 at 12:28 +0200, Jens Axboe wrote:
> >
> > > No difference. Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then
> > > I get:
> > >
> > > Performance counter stats for 'xmodmap .xmodmap-carl':
> > >
> > > 9.009137 task-clock-msecs # 0.447 CPUs
> > > 18 context-switches # 0.002 M/sec
> > > 1 CPU-migrations # 0.000 M/sec
> > > 315 page-faults # 0.035 M/sec
> > > <not counted> cycles
> > > <not counted> instructions
> > > <not counted> cache-references
> > > <not counted> cache-misses
> > >
> > > 0.020167093 seconds time elapsed
> > >
> > > Woot!
> >
> > Something is very seriously hosed on that box... clock?
>
> model name : Genuine Intel(R) CPU T2400 @ 1.83GHz
>
> Throttles down to 1.00GHz when idle.
>
> > Can you turn it back on, and do..
>
> I guess you mean turn NEW_FAIR_SLEEPERS back on, correct?
>
> > while sleep .1; do cat /proc/sched_debug >> foo; done
> > ..on one core, and (quickly;) xmodmap .xmodmap-carl, then send me a few
> > seconds worth (gzipped up) to eyeball?
>
> Attached.

xmodmap doesn't seem to be running in this sample.

-Mike

2009-09-10 11:24:42

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, Sep 10 2009, Mike Galbraith wrote:
> On Thu, 2009-09-10 at 13:09 +0200, Jens Axboe wrote:
> > On Thu, Sep 10 2009, Mike Galbraith wrote:
> > > On Thu, 2009-09-10 at 12:28 +0200, Jens Axboe wrote:
> > >
> > > > No difference. Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then
> > > > I get:
> > > >
> > > > Performance counter stats for 'xmodmap .xmodmap-carl':
> > > >
> > > > 9.009137 task-clock-msecs # 0.447 CPUs
> > > > 18 context-switches # 0.002 M/sec
> > > > 1 CPU-migrations # 0.000 M/sec
> > > > 315 page-faults # 0.035 M/sec
> > > > <not counted> cycles
> > > > <not counted> instructions
> > > > <not counted> cache-references
> > > > <not counted> cache-misses
> > > >
> > > > 0.020167093 seconds time elapsed
> > > >
> > > > Woot!
> > >
> > > Something is very seriously hosed on that box... clock?
> >
> > model name : Genuine Intel(R) CPU T2400 @ 1.83GHz
> >
> > Throttles down to 1.00GHz when idle.
> >
> > > Can you turn it back on, and do..
> >
> > I guess you mean turn NEW_FAIR_SLEEPERS back on, correct?
> >
> > > while sleep .1; do cat /proc/sched_debug >> foo; done
> > > ..on one core, and (quickly;) xmodmap .xmodmap-carl, then send me a few
> > > seconds worth (gzipped up) to eyeball?
> >
> > Attached.
>
> xmodmap doesn't seem to be running in this sample.

That's weird, it was definitely running. I did:

sleep 1; xmodmap .xmodmap-carl

in one xterm, and then switched to the other and ran the sched_debug
dump. I have to do it this way, as X will not move focus once xmodmap
starts running. It could be that xmodmap is mostly idle, and the real
work is done by Xorg and/or xfwm4 (my window manager).

--
Jens Axboe

2009-09-10 11:28:31

by Mike Galbraith

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, 2009-09-10 at 13:24 +0200, Jens Axboe wrote:
> On Thu, Sep 10 2009, Mike Galbraith wrote:

> > xmodmap doesn't seem to be running in this sample.
>
> That's weird, it was definitely running. I did:
>
> sleep 1; xmodmap .xmodmap-carl
>
> in one xterm, and then switched to the other and ran the sched_debug
> dump. I have to do it this way, as X will not move focus once xmodmap
> starts running. It could be that xmodmap is mostly idle, and the real
> work is done by Xorg and/or xfwm4 (my window manager).

Hm. Ok, I'll crawl over it, see if anything falls out.

-Mike

2009-09-10 11:35:15

by Jens Axboe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, Sep 10 2009, Mike Galbraith wrote:
> On Thu, 2009-09-10 at 13:24 +0200, Jens Axboe wrote:
> > On Thu, Sep 10 2009, Mike Galbraith wrote:
>
> > > xmodmap doesn't seem to be running in this sample.
> >
> > That's weird, it was definitely running. I did:
> >
> > sleep 1; xmodmap .xmodmap-carl
> >
> > in one xterm, and then switched to the other and ran the sched_debug
> > dump. I have to do it this way, as X will not move focus once xmodmap
> > starts running. It could be that xmodmap is mostly idle, and the real
> > work is done by Xorg and/or xfwm4 (my window manager).
>
> Hm. Ok, I'll crawl over it, see if anything falls out.

That seems to be confirmed with the low context switch rate of the perf
stat of xmodmap. If I run perf stat -a to get a system wide collection
for xmodmap, I get:

Performance counter stats for 'xmodmap .xmodmap-carl':

20112.060925 task-clock-msecs # 1.998 CPUs
629360 context-switches # 0.031 M/sec
8 CPU-migrations # 0.000 M/sec
13489 page-faults # 0.001 M/sec
<not counted> cycles
<not counted> instructions
<not counted> cache-references
<not counted> cache-misses

10.067532449 seconds time elapsed

And again, system is idle while this is happening. Can't rule out that
this is some kind of user space bug of course.

--
Jens Axboe

2009-09-10 11:42:34

by Mike Galbraith

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, 2009-09-10 at 13:35 +0200, Jens Axboe wrote:
> On Thu, Sep 10 2009, Mike Galbraith wrote:
> > On Thu, 2009-09-10 at 13:24 +0200, Jens Axboe wrote:
> > > On Thu, Sep 10 2009, Mike Galbraith wrote:
> >
> > > > xmodmap doesn't seem to be running in this sample.
> > >
> > > That's weird, it was definitely running. I did:
> > >
> > > sleep 1; xmodmap .xmodmap-carl
> > >
> > > in one xterm, and then switched to the other and ran the sched_debug
> > > dump. I have to do it this way, as X will not move focus once xmodmap
> > > starts running. It could be that xmodmap is mostly idle, and the real
> > > work is done by Xorg and/or xfwm4 (my window manager).
> >
> > Hm. Ok, I'll crawl over it, see if anything falls out.
>
> That seems to be confirmed with the low context switch rate of the perf
> stat of xmodmap. If I run perf stat -a to get a system wide collection
> for xmodmap, I get:
>
> Performance counter stats for 'xmodmap .xmodmap-carl':
>
> 20112.060925 task-clock-msecs # 1.998 CPUs
> 629360 context-switches # 0.031 M/sec
> 8 CPU-migrations # 0.000 M/sec
> 13489 page-faults # 0.001 M/sec
> <not counted> cycles
> <not counted> instructions
> <not counted> cache-references
> <not counted> cache-misses
>
> 10.067532449 seconds time elapsed
>
> And again, system is idle while this is happening. Can't rule out that
> this is some kind of user space bug of course.

All I'm seeing so far is massive CPU usage for a dinky job.

-Mike

2009-09-10 12:19:57

by Jens Axboe

[permalink] [raw]
Subject: latt location (Was Re: BFS vs. mainline scheduler benchmarks and measurements)

On Wed, Sep 09 2009, Pavel Machek wrote:
> Could you post the source? Someone else might get us
> numbers... preferably on dualcore box or something...

Since it's posted in various places and by various people, I've put it
on the web now as well. Should always be the latest version.

http://kernel.dk/latt.c

Note that it requires zlib-devel packages to build now.

--
Jens Axboe

2009-09-10 13:53:58

by Steven Rostedt

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, 2009-09-10 at 11:44 +0200, Jens Axboe wrote:
> On Thu, Sep 10 2009, Ingo Molnar wrote:

> trace.txt attached. Steven, you seem to go through a lot of trouble to
> find the debugfs path, yet at the very end do:
>
> > system("cat /debug/tracing/trace");
>
> which doesn't seem quite right :-)
>

That's an older version of the tool. The newer version (still in alpha)
doesn't do that.

-- Steve

2009-09-10 16:02:23

by Bret Towe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Wed, Sep 9, 2009 at 11:08 PM, Ingo Molnar <[email protected]> wrote:
>
> * Nikos Chantziaras <[email protected]> wrote:
>
>> On 09/09/2009 09:04 PM, Ingo Molnar wrote:
>>> [...]
>>> * Jens Axboe <[email protected]> wrote:
>>>
>>>> On Wed, Sep 09 2009, Jens Axboe wrote:
>>>> [...]
>>>> BFS210 runs on the laptop (dual core intel core duo). With make -j4
>>>> running, I clock the following latt -c8 'sleep 10' latencies:
>>>>
>>>> -rc9
>>>>
>>>>          Max               17895 usec
>>>>          Avg                8028 usec
>>>>          Stdev              5948 usec
>>>>          Stdev mean          405 usec
>>>>
>>>>          Max               17896 usec
>>>>          Avg                4951 usec
>>>>          Stdev              6278 usec
>>>>          Stdev mean          427 usec
>>>>
>>>>          Max               17885 usec
>>>>          Avg                5526 usec
>>>>          Stdev              6819 usec
>>>>          Stdev mean          464 usec
>>>>
>>>> -rc9 + mike
>>>>
>>>>          Max                6061 usec
>>>>          Avg                3797 usec
>>>>          Stdev              1726 usec
>>>>          Stdev mean          117 usec
>>>>
>>>>          Max                5122 usec
>>>>          Avg                3958 usec
>>>>          Stdev              1697 usec
>>>>          Stdev mean          115 usec
>>>>
>>>>          Max                6691 usec
>>>>          Avg                2130 usec
>>>>          Stdev              2165 usec
>>>>          Stdev mean          147 usec
>>>
>>> At least in my tests these latencies were mainly due to a bug in
>>> latt.c - i've attached the fixed version.
>>>
>>> The other reason was wakeup batching. If you do this:
>>>
>>>     echo 0 > /proc/sys/kernel/sched_wakeup_granularity_ns
>>>
>>> ... then you can switch on insta-wakeups on -tip too.
>>>
>>> With a dual-core box and a make -j4 background job running, on
>>> latest -tip i get the following latencies:
>>>
>>>   $ ./latt -c8 sleep 30
>>>   Entries: 656 (clients=8)
>>>
>>>   Averages:
>>>   ------------------------------
>>>      Max           158 usec
>>>      Avg            12 usec
>>>      Stdev          10 usec
>>
>> With your version of latt.c, I get these results with 2.6-tip vs
>> 2.6.31-rc9-bfs:
>>
>>
>> (mainline)
>> Averages:
>> ------------------------------
>>         Max            50 usec
>>         Avg            12 usec
>>         Stdev           3 usec
>>
>>
>> (BFS)
>> Averages:
>> ------------------------------
>>         Max           474 usec
>>         Avg            11 usec
>>         Stdev          16 usec
>>
>> However, the interactivity problems still remain. Does that mean
>> it's not a latency issue?
>
> It means that Jens's test-app, which demonstrated and helped us fix
> the issue for him does not help us fix it for you just yet.
>
> The "fluidity problem" you described might not be a classic latency
> issue per se (which latt.c measures), but a timeslicing / CPU time
> distribution problem.
>
> A slight shift in CPU time allocation can change the flow of tasks
> to result in a 'choppier' system.
>
> Have you tried, in addition to the granularity tweaks you've done,
> to renice mplayer either up or down? (or compiz and Xorg for that
> matter)
>
> I'm not necessarily suggesting this as a 'real' solution (we really
> prefer kernels that just get it right) - but it's an additional
> parameter dimension along which you can tweak CPU time distribution
> on your box.
>
> Here's the general rule of thumb: one nice level gives plus 5%
> CPU time to a task and takes away 5% CPU time from another task -
> i.e. shifts the CPU allocation by 10%.
>
> ( this is modified by all sorts of dynamic conditions: by the number
>  of tasks running and their wakeup patterns so not a rule cast into
>  stone - but still a good ballpark figure for CPU intense tasks. )
>
> Btw., i've read your descriptions about what you've tuned so far -
> have you seen/checked the wakeup_granularity tunable as well?
> Setting that to 0 will change the general balance of how CPU time is
> allocated between tasks too.
>
> There's also a whole bunch of scheduler features you can turn on/off
> individually via /debug/sched_features. For example, to turn off
> NEW_FAIR_SLEEPERS, you can do:
>
>  # cat /debug/sched_features
>  NEW_FAIR_SLEEPERS NO_NORMALIZED_SLEEPER ADAPTIVE_GRAN WAKEUP_PREEMPT
>  START_DEBIT AFFINE_WAKEUPS CACHE_HOT_BUDDY SYNC_WAKEUPS NO_HRTICK
>  NO_DOUBLE_TICK ASYM_GRAN LB_BIAS LB_WAKEUP_UPDATE ASYM_EFF_LOAD
>  NO_WAKEUP_OVERLAP LAST_BUDDY OWNER_SPIN
>
>  # echo NO_NEW_FAIR_SLEEPERS > /debug/sched_features
>
> Btw., NO_NEW_FAIR_SLEEPERS is something that will turn the scheduler
> into a more classic fair scheduler (like BFS is too).
>
> NO_START_DEBIT might be another thing that improves (or worsens :-/)
> make -j type of kernel build workloads.

thanks to this thread and others I've seen several kernel tunables
that can effect how the scheduler performs/acts
but what I don't see after a bit of looking is where all these are documented
perhaps thats also part of the reason there are unhappy people with
the current code in the kernel just because they don't know how
to tune it for their workload

> Note, these flags are all runtime, the new settings take effect
> almost immediately (and at the latest it takes effect when a task
> has started up) and safe to do runtime.
>
> It basically gives us 32768 pluggable schedulers each with a
> slightly separate algorithm - each setting in essence creates a new
> scheduler. (this mechanism is how we introduce new scheduler
> features and allow their debugging / regression-testing.)
>
> (okay, almost, so beware: turning on HRTICK might lock up your
> system.)
>
> Plus, yet another dimension of tuning on SMP systems (such as
> dual-core) are the sched-domains tunable. There's a whole world of
> tuning in that area and BFS essentially implements a very aggressive
> 'always balance to other CPUs' policy.
>
> I've attached my sched-tune-domains script which helps tune these
> parameters.
>
> For example on a testbox of mine it outputs:
>
> usage: tune-sched-domains <val>
> {cpu0/domain0:SIBLING} SD flag: 239
> + 1: SD_LOAD_BALANCE: Do load balancing on this domain
> + 2: SD_BALANCE_NEWIDLE: Balance when about to become idle
> + 4: SD_BALANCE_EXEC: Balance on exec
> + 8: SD_BALANCE_FORK: Balance on fork, clone
> - 16: SD_WAKE_IDLE: Wake to idle CPU on task wakeup
> + 32: SD_WAKE_AFFINE: Wake task to waking CPU
> + 64: SD_WAKE_BALANCE: Perform balancing at task wakeup
> + 128: SD_SHARE_CPUPOWER: Domain members share cpu power
> - 256: SD_POWERSAVINGS_BALANCE: Balance for power savings
> - 512: SD_SHARE_PKG_RESOURCES: Domain members share cpu pkg resources
> -1024: SD_SERIALIZE: Only a single load balancing instance
> -2048: SD_WAKE_IDLE_FAR: Gain latency sacrificing cache hit
> -4096: SD_PREFER_SIBLING: Prefer to place tasks in a sibling domain
> {cpu0/domain1:MC} SD flag: 4735
> + 1: SD_LOAD_BALANCE: Do load balancing on this domain
> + 2: SD_BALANCE_NEWIDLE: Balance when about to become idle
> + 4: SD_BALANCE_EXEC: Balance on exec
> + 8: SD_BALANCE_FORK: Balance on fork, clone
> + 16: SD_WAKE_IDLE: Wake to idle CPU on task wakeup
> + 32: SD_WAKE_AFFINE: Wake task to waking CPU
> + 64: SD_WAKE_BALANCE: Perform balancing at task wakeup
> - 128: SD_SHARE_CPUPOWER: Domain members share cpu power
> - 256: SD_POWERSAVINGS_BALANCE: Balance for power savings
> + 512: SD_SHARE_PKG_RESOURCES: Domain members share cpu pkg resources
> -1024: SD_SERIALIZE: Only a single load balancing instance
> -2048: SD_WAKE_IDLE_FAR: Gain latency sacrificing cache hit
> +4096: SD_PREFER_SIBLING: Prefer to place tasks in a sibling domain
> {cpu0/domain2:NODE} SD flag: 3183
> + 1: SD_LOAD_BALANCE: Do load balancing on this domain
> + 2: SD_BALANCE_NEWIDLE: Balance when about to become idle
> + 4: SD_BALANCE_EXEC: Balance on exec
> + 8: SD_BALANCE_FORK: Balance on fork, clone
> - 16: SD_WAKE_IDLE: Wake to idle CPU on task wakeup
> + 32: SD_WAKE_AFFINE: Wake task to waking CPU
> + 64: SD_WAKE_BALANCE: Perform balancing at task wakeup
> - 128: SD_SHARE_CPUPOWER: Domain members share cpu power
> - 256: SD_POWERSAVINGS_BALANCE: Balance for power savings
> - 512: SD_SHARE_PKG_RESOURCES: Domain members share cpu pkg resources
> +1024: SD_SERIALIZE: Only a single load balancing instance
> +2048: SD_WAKE_IDLE_FAR: Gain latency sacrificing cache hit
> -4096: SD_PREFER_SIBLING: Prefer to place tasks in a sibling domain
>
> The way i can turn on say SD_WAKE_IDLE for the NODE domain is to:
>
>   tune-sched-domains 239 4735 $((3183+16))
>
> ( This is a pretty stone-age script i admit ;-)
>
> Thanks for all your testing so far,
>
>        Ingo
>

2009-09-10 16:05:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, 2009-09-10 at 09:02 -0700, Bret Towe wrote:
>
> thanks to this thread and others I've seen several kernel tunables
> that can effect how the scheduler performs/acts
> but what I don't see after a bit of looking is where all these are
> documented
> perhaps thats also part of the reason there are unhappy people with
> the current code in the kernel just because they don't know how
> to tune it for their workload

The thing is, ideally they should not need to poke at these. These knobs
are under CONFIG_SCHED_DEBUG, and that is exactly what they are for.

2009-09-10 16:13:07

by Bret Towe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, Sep 10, 2009 at 9:05 AM, Peter Zijlstra <[email protected]> wrote:
> On Thu, 2009-09-10 at 09:02 -0700, Bret Towe wrote:
>>
>> thanks to this thread and others I've seen several kernel tunables
>> that can effect how the scheduler performs/acts
>> but what I don't see after a bit of looking is where all these are
>> documented
>> perhaps thats also part of the reason there are unhappy people with
>> the current code in the kernel just because they don't know how
>> to tune it for their workload
>
> The thing is, ideally they should not need to poke at these. These knobs
> are under CONFIG_SCHED_DEBUG, and that is exactly what they are for.

even then I would think they should be documented so people can find out
what item is hurting their workload so they can better report the bug no?


2009-09-10 16:26:16

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Bret Towe <[email protected]> wrote:

> On Thu, Sep 10, 2009 at 9:05 AM, Peter Zijlstra <[email protected]> wrote:
> > On Thu, 2009-09-10 at 09:02 -0700, Bret Towe wrote:
> >>
> >> thanks to this thread and others I've seen several kernel tunables
> >> that can effect how the scheduler performs/acts
> >> but what I don't see after a bit of looking is where all these are
> >> documented
> >> perhaps thats also part of the reason there are unhappy people with
> >> the current code in the kernel just because they don't know how
> >> to tune it for their workload
> >
> > The thing is, ideally they should not need to poke at these.
> > These knobs are under CONFIG_SCHED_DEBUG, and that is exactly
> > what they are for.
>
> even then I would think they should be documented so people can
> find out what item is hurting their workload so they can better
> report the bug no?

Would be happy to apply such documentation patches. You could also
help start adding a 'scheduler performance' wiki portion to
perf.wiki.kernel.org, if you have time for that.

Ingo

2009-09-10 16:33:47

by Bret Towe

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Thu, Sep 10, 2009 at 9:26 AM, Ingo Molnar <[email protected]> wrote:
>
> * Bret Towe <[email protected]> wrote:
>
>> On Thu, Sep 10, 2009 at 9:05 AM, Peter Zijlstra <[email protected]> wrote:
>> > On Thu, 2009-09-10 at 09:02 -0700, Bret Towe wrote:
>> >>
>> >> thanks to this thread and others I've seen several kernel tunables
>> >> that can effect how the scheduler performs/acts
>> >> but what I don't see after a bit of looking is where all these are
>> >> documented
>> >> perhaps thats also part of the reason there are unhappy people with
>> >> the current code in the kernel just because they don't know how
>> >> to tune it for their workload
>> >
>> > The thing is, ideally they should not need to poke at these.
>> > These knobs are under CONFIG_SCHED_DEBUG, and that is exactly
>> > what they are for.
>>
>> even then I would think they should be documented so people can
>> find out what item is hurting their workload so they can better
>> report the bug no?
>
> Would be happy to apply such documentation patches. You could also
> help start adding a 'scheduler performance' wiki portion to
> perf.wiki.kernel.org, if you have time for that.

time isn't so much the issue but not having any clue as to what any
of the options do

2009-09-10 17:03:52

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Bret Towe <[email protected]> wrote:

> On Thu, Sep 10, 2009 at 9:26 AM, Ingo Molnar <[email protected]> wrote:
> >
> > * Bret Towe <[email protected]> wrote:
> >
> >> On Thu, Sep 10, 2009 at 9:05 AM, Peter Zijlstra <[email protected]> wrote:
> >> > On Thu, 2009-09-10 at 09:02 -0700, Bret Towe wrote:
> >> >>
> >> >> thanks to this thread and others I've seen several kernel tunables
> >> >> that can effect how the scheduler performs/acts
> >> >> but what I don't see after a bit of looking is where all these are
> >> >> documented
> >> >> perhaps thats also part of the reason there are unhappy people with
> >> >> the current code in the kernel just because they don't know how
> >> >> to tune it for their workload
> >> >
> >> > The thing is, ideally they should not need to poke at these.
> >> > These knobs are under CONFIG_SCHED_DEBUG, and that is exactly
> >> > what they are for.
> >>
> >> even then I would think they should be documented so people can
> >> find out what item is hurting their workload so they can better
> >> report the bug no?
> >
> > Would be happy to apply such documentation patches. You could also
> > help start adding a 'scheduler performance' wiki portion to
> > perf.wiki.kernel.org, if you have time for that.
>
> time isn't so much the issue but not having any clue as to what
> any of the options do

One approach would be to list them in an email in this thread with
question marks and let people here fill them in - then help by
organizing and prettifying the result on the wiki.

Asking for clarifications when an explanation is unclear is also
helpful - those who write this code are not the best people to judge
whether technical descriptions are understandable enough.

Ingo

2009-09-10 17:53:57

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/10/2009 09:08 AM, Ingo Molnar wrote:
>
> * Nikos Chantziaras<[email protected]> wrote:
>>
>> With your version of latt.c, I get these results with 2.6-tip vs
>> 2.6.31-rc9-bfs:
>>
>>
>> (mainline)
>> Averages:
>> ------------------------------
>> Max 50 usec
>> Avg 12 usec
>> Stdev 3 usec
>>
>>
>> (BFS)
>> Averages:
>> ------------------------------
>> Max 474 usec
>> Avg 11 usec
>> Stdev 16 usec
>>
>> However, the interactivity problems still remain. Does that mean
>> it's not a latency issue?
>
> It means that Jens's test-app, which demonstrated and helped us fix
> the issue for him does not help us fix it for you just yet.
>
> The "fluidity problem" you described might not be a classic latency
> issue per se (which latt.c measures), but a timeslicing / CPU time
> distribution problem.
>
> A slight shift in CPU time allocation can change the flow of tasks
> to result in a 'choppier' system.
>
> Have you tried, in addition to the granularity tweaks you've done,
> to renice mplayer either up or down? (or compiz and Xorg for that
> matter)

Yes. It seems to do what one would expect, but only if two separate
programs are competing for CPU time continuously. For example, when
running two glxgears instances, one with nice 0 the other with 19, the
first will report ~5000 FPS, the other ~1000. Renicing the second one
from 19 to 0 will result in both reporting ~3000. So nice values
obviously work in distributing CPU time. But the problem isn't the
available CPU time, it seems, since even if running glxgears nice -20, it
will still freeze during various other interactive tasks (moving windows
etc.)


> [...]
> # echo NO_NEW_FAIR_SLEEPERS> /debug/sched_features
>
> Btw., NO_NEW_FAIR_SLEEPERS is something that will turn the scheduler
> into a more classic fair scheduler (like BFS is too).

Setting NO_NEW_FAIR_SLEEPERS (with everything else at default values)
pretty much solves all issues I raised in all my other posts! With this
setting, I can do "nice -n 19 make -j20" and still have a very smooth
desktop and watch a movie at the same time. Various other annoyances
(like the "logout/shutdown/restart" dialog of KDE not appearing at all
until the background fade-out effect has finished) are also gone. So
this seems to be the single most important setting that vastly improves
desktop behavior, at least here.

In fact, I liked this setting so much that I went to
kernel/sched_features.h of kernel 2.6.30.5 (the kernel I use normally
right now) and set SCHED_FEAT(NEW_FAIR_SLEEPERS, 0) (default is 1) with
absolutely no other tweaks (like sched_latency_ns,
sched_wakeup_granularity_ns, etc.). It pretty much behaves like BFS now
from an interactivity point of view. But I've used it only for about an
hour or so, so I don't know if any ill effects will appear later on.
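
In concrete terms, that amounts to flipping a single line in
kernel/sched_features.h - shown here as a sketch based on the description
above, not as a verified patch:

-SCHED_FEAT(NEW_FAIR_SLEEPERS, 1)
+SCHED_FEAT(NEW_FAIR_SLEEPERS, 0)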


> NO_START_DEBIT might be another thing that improves (or worsens :-/)
> make -j type of kernel build workloads.

No effect with this one, at least not one I could observe.

I didn't have the opportunity yet to test and tweak all the other
various settings you listed, but I will try to do so as soon as possible.

2009-09-10 18:01:00

by Ingo Molnar

[permalink] [raw]
Subject: [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable


* Ingo Molnar <[email protected]> wrote:

>
> * Jens Axboe <[email protected]> wrote:
>
> > I went to try -tip btw, but it crashes on boot. Here's the
> > backtrace, typed manually, it's crashing in
> > queue_work_on+0x28/0x60.
> >
> > Call Trace:
> > queue_work
> > schedule_work
> > clocksource_mark_unstable
> > mark_tsc_unstable
> > check_tsc_sync_source
> > native_cpu_up
> > relay_hotcpu_callback
> > do_forK_idle
> > _cpu_up
> > cpu_up
> > kernel_init
> > kernel_thread_helper
>
> hm, that looks like an old bug i fixed days ago via:
>
> 00a3273: Revert "x86: Make tsc=reliable override boot time stability checks"
>
> Have you tested tip:master - do you still know which sha1?

Ok, i reproduced it on a testbox and bisected it, the crash is
caused by:

7285dd7fd375763bfb8ab1ac9cf3f1206f503c16 is first bad commit
commit 7285dd7fd375763bfb8ab1ac9cf3f1206f503c16
Author: Thomas Gleixner <[email protected]>
Date: Fri Aug 28 20:25:24 2009 +0200

clocksource: Resolve cpu hotplug dead lock with TSC unstable

Martin Schwidefsky analyzed it:

I've reverted it in tip/master for now.

Ingo

2009-09-10 18:46:07

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Nikos Chantziaras <[email protected]> wrote:

> On 09/10/2009 09:08 AM, Ingo Molnar wrote:
>>
>> * Nikos Chantziaras<[email protected]> wrote:
>>>
>>> With your version of latt.c, I get these results with 2.6-tip vs
>>> 2.6.31-rc9-bfs:
>>>
>>>
>>> (mainline)
>>> Averages:
>>> ------------------------------
>>> Max 50 usec
>>> Avg 12 usec
>>> Stdev 3 usec
>>>
>>>
>>> (BFS)
>>> Averages:
>>> ------------------------------
>>> Max 474 usec
>>> Avg 11 usec
>>> Stdev 16 usec
>>>
>>> However, the interactivity problems still remain. Does that mean
>>> it's not a latency issue?
>>
>> It means that Jens's test-app, which demonstrated and helped us fix
>> the issue for him does not help us fix it for you just yet.
>>
>> The "fluidity problem" you described might not be a classic latency
>> issue per se (which latt.c measures), but a timeslicing / CPU time
>> distribution problem.
>>
>> A slight shift in CPU time allocation can change the flow of tasks
>> to result in a 'choppier' system.
>>
>> Have you tried, in addition of the granularity tweaks you've done,
>> to renice mplayer either up or down? (or compiz and Xorg for that
>> matter)
>
> Yes. It seems to do what one would expect, but only if two separate
> programs are competing for CPU time continuously. For example, when
> running two glxgears instances, one with nice 0 the other with 19, the
> first will report ~5000 FPS, the other ~1000. Renicing the second one
> from 19 to 0, will result in both reporting ~3000. So nice values
> obviously work in distributing CPU time. But the problem isn't the
> available CPU time it seems since even if running glxgears nice -20, it
> will still freeze during various other interactive taks (moving windows
> etc.)
>
>
>> [...]
>> # echo NO_NEW_FAIR_SLEEPERS> /debug/sched_features
>>
>> Btw., NO_NEW_FAIR_SLEEPERS is something that will turn the scheduler
>> into a more classic fair scheduler (like BFS is too).
>
> Setting NO_NEW_FAIR_SLEEPERS (with everything else at default
> values) pretty much solves all issues I raised in all my other
> posts! With this setting, I can do "nice -n 19 make -j20" and
> still have a very smooth desktop and watch a movie at the same
> time. Various other annoyances (like the
> "logout/shutdown/restart" dialog of KDE not appearing at all until
> the background fade-out effect has finished) are also gone. So
> this seems to be the single most important setting that vastly
> improves desktop behavior, at least here.
>
> In fact, I liked this setting so much that I went to
> kernel/sched_features.h of kernel 2.6.30.5 (the kernel I use
> normally right now) and set SCHED_FEAT(NEW_FAIR_SLEEPERS, 0)
> (default is 1) with absolutely no other tweaks (like
> sched_latency_ns, sched_wakeup_granularity_ns, etc.). It pretty
> much behaves like BFS now from an interactivity point of view.
> But I've used it only for about an hour or so, so I don't know if
> any ill effects will appear later on.

ok, this is quite an important observation!

Either NEW_FAIR_SLEEPERS is broken, or if it works it's not what we
want to do. Other measures in the scheduler protect us from fatal
badness here, but all the finer wakeup behavior is out the window
really.

Will check this. We'll probably start with a quick commit disabling
it first - then re-enabling it if it's fixed (will Cc: you so that
you can re-test with fixed-NEW_FAIR_SLEEPERS, if it's re-enabled).

Thanks a lot for the persistent testing!

Ingo

2009-09-10 18:52:13

by Ingo Molnar

[permalink] [raw]
Subject: [tip:sched/core] sched: Disable NEW_FAIR_SLEEPERS for now

Commit-ID: 3f2aa307c4d26b4ed6509d0a79e8254c9e07e921
Gitweb: http://git.kernel.org/tip/3f2aa307c4d26b4ed6509d0a79e8254c9e07e921
Author: Ingo Molnar <[email protected]>
AuthorDate: Thu, 10 Sep 2009 20:34:48 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 10 Sep 2009 20:34:48 +0200

sched: Disable NEW_FAIR_SLEEPERS for now

Nikos Chantziaras and Jens Axboe reported that turning off
NEW_FAIR_SLEEPERS improves desktop interactivity visibly.

Nikos described his experiences the following way:

" With this setting, I can do "nice -n 19 make -j20" and
still have a very smooth desktop and watch a movie at
the same time. Various other annoyances (like the
"logout/shutdown/restart" dialog of KDE not appearing
at all until the background fade-out effect has finished)
are also gone. So this seems to be the single most
important setting that vastly improves desktop behavior,
at least here. "

Jens described it the following way, referring to a 10-seconds
xmodmap scheduling delay he was trying to debug:

" Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then
I get:

Performance counter stats for 'xmodmap .xmodmap-carl':

9.009137 task-clock-msecs # 0.447 CPUs
18 context-switches # 0.002 M/sec
1 CPU-migrations # 0.000 M/sec
315 page-faults # 0.035 M/sec

0.020167093 seconds time elapsed

Woot! "

So disable it for now. In perf trace output i can see weird
delta timestamps:

cc1-9943 [001] 2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns]

That nsec field is not supposed to be that large. More digging
is needed - but lets turn it off while the real bug is found.

Reported-by: Nikos Chantziaras <[email protected]>
Tested-by: Nikos Chantziaras <[email protected]>
Reported-by: Jens Axboe <[email protected]>
Tested-by: Jens Axboe <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/sched_features.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index 4569bfa..e2dc63a 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -1,4 +1,4 @@
-SCHED_FEAT(NEW_FAIR_SLEEPERS, 1)
+SCHED_FEAT(NEW_FAIR_SLEEPERS, 0)
SCHED_FEAT(NORMALIZED_SLEEPER, 0)
SCHED_FEAT(ADAPTIVE_GRAN, 1)
SCHED_FEAT(WAKEUP_PREEMPT, 1)

2009-09-10 18:58:17

by Ingo Molnar

[permalink] [raw]
Subject: [tip:sched/core] sched: Fix sched::sched_stat_wait tracepoint field

Commit-ID: e1f8450854d69f0291882804406ea1bab3ca44b4
Gitweb: http://git.kernel.org/tip/e1f8450854d69f0291882804406ea1bab3ca44b4
Author: Ingo Molnar <[email protected]>
AuthorDate: Thu, 10 Sep 2009 20:52:09 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 10 Sep 2009 20:52:54 +0200

sched: Fix sched::sched_stat_wait tracepoint field

This weird perf trace output:

cc1-9943 [001] 2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns]

Is caused by setting one component field of the delta to zero
a bit too early. Move it to later.

( Note, this does not affect the NEW_FAIR_SLEEPERS interactivity bug,
it's just a reporting bug in essence. )

Acked-by: Peter Zijlstra <[email protected]>
Cc: Nikos Chantziaras <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Mike Galbraith <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/sched_fair.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 26fadb4..aa7f841 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -545,14 +545,13 @@ update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
schedstat_set(se->wait_count, se->wait_count + 1);
schedstat_set(se->wait_sum, se->wait_sum +
rq_of(cfs_rq)->clock - se->wait_start);
- schedstat_set(se->wait_start, 0);
-
#ifdef CONFIG_SCHEDSTATS
if (entity_is_task(se)) {
trace_sched_stat_wait(task_of(se),
rq_of(cfs_rq)->clock - se->wait_start);
}
#endif
+ schedstat_set(se->wait_start, 0);
}

static inline void

2009-09-10 19:55:10

by Martin Steigerwald

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

Am Mittwoch 09 September 2009 schrieb Peter Zijlstra:
> On Wed, 2009-09-09 at 12:05 +0300, Nikos Chantziaras wrote:
> > Thank you for mentioning min_granularity. After:
> >
> > echo 10000000 > /proc/sys/kernel/sched_latency_ns
> > echo 2000000 > /proc/sys/kernel/sched_min_granularity_ns
>
> You might also want to do:
>
> echo 2000000 > /proc/sys/kernel/sched_wakeup_granularity_ns
>
> That affects when a newly woken task will preempt an already running
> task.

Heh, that scheduler thing again... and unfortunately Con appearing to feel
hurt, while I think that Ingo is honest in his offer of collaboration...

While it is fun playing with those numbers and indeed subjectively
experiencing a more fluid desktop, how about just a

echo "This is a f* desktop!" > /proc/sys/kernel/sched_workload

Or to put it in other words: the Linux kernel should not require me to
fine-tune three or more values to have the scheduler act in a way that
matches my workload.

I am willing to test stuff on my work thinkpad and my Amarok thinkpad in
order to help improve that.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7


Attachments:
signature.asc (197.00 B)
This is a digitally signed message part.

2009-09-10 20:06:46

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Martin Steigerwald <[email protected]> wrote:

> Am Mittwoch 09 September 2009 schrieb Peter Zijlstra:
> > On Wed, 2009-09-09 at 12:05 +0300, Nikos Chantziaras wrote:
> > > Thank you for mentioning min_granularity. After:
> > >
> > > echo 10000000 > /proc/sys/kernel/sched_latency_ns
> > > echo 2000000 > /proc/sys/kernel/sched_min_granularity_ns
> >
> > You might also want to do:
> >
> > echo 2000000 > /proc/sys/kernel/sched_wakeup_granularity_ns
> >
> > That affects when a newly woken task will preempt an already running
> > task.
>
> Heh that scheduler thing again... and unfortunately Col appearing
> to feel hurt while I am think that Ingo is honest on his offer on
> collaboration...
>
> While it makes fun playing with that numbers and indeed
> experiencing subjectively a more fluid deskopt how about just a
>
> echo "This is a f* desktop!" > /proc/sys/kernel/sched_workload

No need to do that, that's supposed to be the default :-) The knobs
are really just there to help us make it even more so - i.e. you
dont need to tune them. But it really relies on people helping us
out and telling us which combinations work best ...

> Or to say it in other words: The Linux kernel should not require
> me to fine-tune three or more values to have the scheduler act in
> a way that matches my workload.
>
> I am willing to test stuff on my work thinkpad and my Amarok
> thinkpad in order to help improving with that.

It would be great if you could check latest -tip:

http://people.redhat.com/mingo/tip.git/README

and compare it to vanilla .31?

Also, could you outline the interactivity problems/complaints you
have?

Ingo

2009-09-10 20:25:47

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Tue, Sep 08, 2009 at 09:15:22PM +0300, Nikos Chantziaras wrote:
> On 09/07/2009 02:01 PM, Frederic Weisbecker wrote:
>> That looks eventually benchmarkable. This is about latency.
>> For example, you could try to run high load tasks in the
>> background and then launch a task that wakes up in middle/large
>> periods to do something. You could measure the time it takes to wake
>> it up to perform what it wants.
>>
>> We have some events tracing infrastructure in the kernel that can
>> snapshot the wake up and sched switch events.
>>
>> Having CONFIG_EVENT_TRACING=y should be sufficient for that.
>>
>> You just need to mount a debugfs point, say in /debug.
>>
>> Then you can activate these sched events by doing:
>>
>> echo 0> /debug/tracing/tracing_on
>> echo 1> /debug/tracing/events/sched/sched_switch/enable
>> echo 1> /debug/tracing/events/sched/sched_wake_up/enable
>>
>> #Launch your tasks
>>
>> echo 1> /debug/tracing/tracing_on
>>
>> #Wait for some time
>>
>> echo 0> /debug/tracing/tracing_off
>>
>> That will require some parsing of the result in /debug/tracing/trace
>> to get the delays between wake_up events and switch in events
>> for the task that periodically wakes up and then produce some
>> statistics such as the average or the maximum latency.
>>
>> That's a bit of a rough approach to measure such latencies but that
>> should work.
>
> I've tried this with 2.6.31-rc9 while running mplayer and alt+tabbing
> repeatedly to the point where mplayer starts to stall and drop frames.
> This produced a 4.1MB trace file (132k bzip2'ed):
>
> http://foss.math.aegean.gr/~realnc/kernel/trace1.bz2
>
> Uncompressed for online viewing:
>
> http://foss.math.aegean.gr/~realnc/kernel/trace1
>
> I must admit that I don't know what it is I'm looking at :P


Hehe :-)

Basically you have samples of two kinds of events:

- wake up (when thread A wakes up B)

The format is as follows:


task-pid
(the waker A)
|
| cpu timestamp event-name wakee(B) prio status
| | | | | | |
X-11482 [001] 1023.219246: sched_wakeup: task kwin:11571 [120] success=1

Here X is awakening kwin.


- sched switch (when the scheduler stops A and launches B)

A, task B, task
that gets that gets
sched sched
out in
A cpu timestamp event-name | A prio | B prio
| | | | | | | |
X-11482 [001] 1023.219247: sched_switch: task X:11482 [120] (R) ==> kwin:11571 [120]
|
|
State of A
For A's state we can have either:

R: TASK_RUNNING, the task is not sleeping but it is rescheduled for later
to let another task run

S: TASK_INTERRUPTIBLE, the task is sleeping, waiting for an event that may
wake it up. The task can be woken by a signal

D: TASK_UNINTERRUPTIBLE, same as above but it can't be woken by a signal.


Now what could be interesting is to measure the time between
such pairs of events:

- t0: A wakes up B
- t1: B is sched in

t1 - t0 would then be the scheduler latency, or at least part of it:

The scheduler latency may be an addition of several factors:

- the time it takes for the actual wake up to be performed (re-inserting
the task into a runqueue, which can be subject to the runqueue(s)
design, the rebalancing if needed, etc.)

- the time between when a task is woken up and when the scheduler
eventually decides to schedule it in.

- the time it takes to perform the task switch, which is not only
in the scheduler's scope. But the time it takes may depend on a
rebalancing decision (cache cold, etc.)

Unfortunately we can only measure the second part with the above ftrace
events. But that's still an interesting abstraction that covers a
large part of the scheduler latency.

We could write a tiny parser that could walk through such ftrace traces
and produce some average, maximum, standard deviation numbers.
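
For illustration, here is a minimal sketch of such a parser in C. It
assumes exactly the sample line layout quoted above (the real field
layout of /debug/tracing/trace varies between kernel versions), so
treat it as an illustration of the t1 - t0 pairing rather than a
finished tool:

/*
 * latparse.c - pair each sched_wakeup (t0) with the next sched_switch
 * that switches the wakee in (t1) and report wakeup-latency statistics.
 * The sscanf() patterns below are assumptions based on the sample
 * lines in this mail.
 *
 *   gcc -O2 -o latparse latparse.c -lm
 *   ./latparse < /debug/tracing/trace
 */
#include <stdio.h>
#include <string.h>
#include <math.h>

#define MAX_PID 65536

static double wake_ts[MAX_PID];	/* t0 per wakee pid, 0.0 = none pending */

int main(void)
{
	char line[512], comm[64], *p;
	double ts, sum = 0.0, sqsum = 0.0, max = 0.0;
	long n = 0;
	int pid;

	while (fgets(line, sizeof(line), stdin)) {
		/* "waker-pid [cpu] timestamp: event: ..." */
		if (sscanf(line, "%*s [%*d] %lf:", &ts) != 1)
			continue;

		if ((p = strstr(line, "sched_wakeup: task "))) {
			/* record t0 for the wakee, e.g. "task kwin:11571 [120]" */
			if (sscanf(p, "sched_wakeup: task %63[^:]:%d", comm, &pid) == 2 &&
			    pid > 0 && pid < MAX_PID)
				wake_ts[pid] = ts;
		} else if ((p = strstr(line, "==> "))) {
			/* sched_switch: "... ==> kwin:11571 [120]" gives t1 */
			if (sscanf(p, "==> %63[^:]:%d", comm, &pid) == 2 &&
			    pid > 0 && pid < MAX_PID && wake_ts[pid] != 0.0) {
				double d = ts - wake_ts[pid];	/* t1 - t0, in seconds */

				wake_ts[pid] = 0.0;
				sum += d;
				sqsum += d * d;
				if (d > max)
					max = d;
				n++;
			}
		}
	}

	if (n) {
		double avg = sum / n;

		printf("samples %ld  avg %.0f usec  max %.0f usec  stdev %.0f usec\n",
		       n, avg * 1e6, max * 1e6,
		       sqrt(fmax(sqsum / n - avg * avg, 0.0)) * 1e6);
	}
	return 0;
}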

But we have userspace tools that can parse ftrace events (through perf
counter), so I'm trying to write something there, hopefully I could get
a relevant end result.

Thanks.

2009-09-10 20:39:49

by Martin Steigerwald

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

Am Donnerstag 10 September 2009 schrieb Ingo Molnar:
> * Martin Steigerwald <[email protected]> wrote:
> > Am Mittwoch 09 September 2009 schrieb Peter Zijlstra:
> > > On Wed, 2009-09-09 at 12:05 +0300, Nikos Chantziaras wrote:
> > > > Thank you for mentioning min_granularity. After:
> > > >
> > > > echo 10000000 > /proc/sys/kernel/sched_latency_ns
> > > > echo 2000000 > /proc/sys/kernel/sched_min_granularity_ns
> > >
> > > You might also want to do:
> > >
> > > echo 2000000 > /proc/sys/kernel/sched_wakeup_granularity_ns
> > >
> > > That affects when a newly woken task will preempt an already
> > > running task.
> >
> > Heh that scheduler thing again... and unfortunately Col appearing
> > to feel hurt while I am think that Ingo is honest on his offer on
> > collaboration...
> >
> > While it makes fun playing with that numbers and indeed
> > experiencing subjectively a more fluid deskopt how about just a
> >
> > echo "This is a f* desktop!" > /proc/sys/kernel/sched_workload
>
> No need to do that, that's supposed to be the default :-) The knobs
> are really just there to help us make it even more so - i.e. you
> dont need to tune them. But it really relies on people helping us
> out and tell us which combinations work best ...

Well currently I have:

shambhala:/proc/sys/kernel> grep "" sched_latency_ns
sched_min_granularity_ns sched_wakeup_granularity_ns
sched_latency_ns:100000
sched_min_granularity_ns:200000
sched_wakeup_granularity_ns:0

And this gives me *a completely different* desktop experience.

I am using KDE 4.3.1 on a mixture of Debian Squeeze/Sid/Experimental, with
compositing. And now when I flip desktops or open a window I can *actually
see* the animation. Before I jusooooooooooooooooooooot saw two to five
steps of the animation,
now its really a lot more fluid.

perceived _latency--! Well its like
oooooooooooooooooooooooooooooooooooooooooooooooooooooooopening the eyes
again cause I tended
to take the jerky behavior as normal and possibly related to having KDE
4.3.1 with compositing enabled on a ThinkPad T42 with
ooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV350
[Mobility Radeon 9600 M10] [1002:4e50]

which I consider to be low end for that workload. But then why actually?
Next to me is a Sam440ep with PPC 440 667 MHz and and even older Radeon M9
with AmigaOS 4.1 and some simple transparency effects with compositing. And
well this combo does feel like it wheel spins cause the hardware is
actually to fast
foooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo

>
> > Or to say it in other words: The Linux kernel should not require
> > me to fine-tune three or more values to have the scheduler act in
> > a way that matches my workload.
> >
> > I am willing to test stuff on my work thinkpad and my Amarok
> > thinkpad in order to help improving with that.
>
> It would be great if you could check latest -tip:
>
> http://people.redhat.com/mingo/tip.git/README
>
> and compare it to vanilla .31?
>
> Also, could you outline the interactivity problems/complaints you
> have?
>
> Ingo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel"
> in the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>


--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

2009-09-10 20:43:02

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Martin Steigerwald <[email protected]> wrote:

> Am Donnerstag 10 September 2009 schrieb Ingo Molnar:
> > * Martin Steigerwald <[email protected]> wrote:
> > > Am Mittwoch 09 September 2009 schrieb Peter Zijlstra:
> > > > On Wed, 2009-09-09 at 12:05 +0300, Nikos Chantziaras wrote:
> > > > > Thank you for mentioning min_granularity. After:
> > > > >
> > > > > echo 10000000 > /proc/sys/kernel/sched_latency_ns
> > > > > echo 2000000 > /proc/sys/kernel/sched_min_granularity_ns
> > > >
> > > > You might also want to do:
> > > >
> > > > echo 2000000 > /proc/sys/kernel/sched_wakeup_granularity_ns
> > > >
> > > > That affects when a newly woken task will preempt an already
> > > > running task.
> > >
> > > Heh that scheduler thing again... and unfortunately Col appearing
> > > to feel hurt while I am think that Ingo is honest on his offer on
> > > collaboration...
> > >
> > > While it makes fun playing with that numbers and indeed
> > > experiencing subjectively a more fluid deskopt how about just a
> > >
> > > echo "This is a f* desktop!" > /proc/sys/kernel/sched_workload
> >
> > No need to do that, that's supposed to be the default :-) The knobs
> > are really just there to help us make it even more so - i.e. you
> > dont need to tune them. But it really relies on people helping us
> > out and tell us which combinations work best ...
>
> Well currently I have:
>
> shambhala:/proc/sys/kernel> grep "" sched_latency_ns
> sched_min_granularity_ns sched_wakeup_granularity_ns
> sched_latency_ns:100000
> sched_min_granularity_ns:200000
> sched_wakeup_granularity_ns:0
>
> And this give me *a completely different* desktop experience.

what is /debug/sched_features - is NO_NEW_FAIR_SLEEPERS set? If not
set yet then try it:

echo NO_NEW_FAIR_SLEEPERS > /debug/sched_features

that too might make things more fluid.

Ingo

2009-09-10 21:17:41

by Martin Steigerwald

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


Uhoh, it seems I configured my kernel's scheduler to produce keyboard
failure. The many o's in my last mails were not intended.

Let's see whether it's better with:

shambhala:/proc/sys/kernel> echo 2000000 >
/proc/sys/kernel/sched_wakeup_granularity_ns

The first time, I just lost the keyboard in X for a while in such a way that
even a Ctrl-Alt-F1 did not yield any effect. None of what I typed appeared
anywhere, not in the mail composer window, nor in a Konsole terminal, nor in
the Kickoff menu search field. The mouse still worked okay; I was able to log
out via mouse. Then even in KDM the keyboard did not work. Suddenly it
produced repeating key input events without me typing anything anymore -
like the second time with those o's, where I accidentally pressed send
instead of save as draft.

I have not had any keyboard issues like this recently, nor in the last year
or longer. Let's see whether that raised wakeup_granularity helps. The desktop
experience still seems quite fluid, maybe not as fluid as with the settings
below. But I prefer a working keyboard in order to finish up this mail.

Am Donnerstag 10 September 2009 schrieb Ingo Molnar:
> * Martin Steigerwald <[email protected]> wrote:
> > Am Mittwoch 09 September 2009 schrieb Peter Zijlstra:
> > > On Wed, 2009-09-09 at 12:05 +0300, Nikos Chantziaras wrote:
> > > > Thank you for mentioning min_granularity. After:
> > > >
> > > > echo 10000000 > /proc/sys/kernel/sched_latency_ns
> > > > echo 2000000 > /proc/sys/kernel/sched_min_granularity_ns
> > >
> > > You might also want to do:
> > >
> > > echo 2000000 > /proc/sys/kernel/sched_wakeup_granularity_ns
> > >
> > > That affects when a newly woken task will preempt an already
> > > running task.
> >
> > Heh that scheduler thing again... and unfortunately Col appearing
> > to feel hurt while I am think that Ingo is honest on his offer on
> > collaboration...
> >
> > While it makes fun playing with that numbers and indeed
> > experiencing subjectively a more fluid deskopt how about just a
> >
> > echo "This is a f* desktop!" > /proc/sys/kernel/sched_workload
>
> No need to do that, that's supposed to be the default :-) The knobs
> are really just there to help us make it even more so - i.e. you
> dont need to tune them. But it really relies on people helping us
> out and tell us which combinations work best ...

Well currently I have:

shambhala:/proc/sys/kernel> grep "" sched_latency_ns
sched_min_granularity_ns sched_wakeup_granularity_ns
sched_latency_ns:100000
sched_min_granularity_ns:200000
sched_wakeup_granularity_ns:0

And this gives me *a completely different* desktop experience.

I am using KDE 4.3.1 on a mixture of Debian Squeeze/Sid/Experimental, with
compositing. And now when I flip desktops or open a window I can *actually
see* the animation. Before, I just saw two to five steps of the animation;
now it's really a lot more fluid.

perceived _latency--! Well it's like opening my eyes again, because I tended
to take the jerky behavior as normal and possibly related to having KDE
4.3.1 with compositing enabled on a ThinkPad T42 with


01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV350
[Mobility Radeon 9600 M10] [1002:4e50]
(with OSS Radeon driver)

which I consider to be low end for that workload. But then why actually?
Next to me is a Sam440ep with a PPC 440 at 667 MHz and an even older Radeon M9
with AmigaOS 4.1 and some simple transparency effects with compositing. And
well, this combo does feel like it wheel-spins because the hardware is
actually too fast [2nd keyboard borkage, somewhere before where the first
where I saved as draft] for that operating system. So actually I knew
there could be less waiting and less latency. (Granted, AmigaOS 4.1 is much
more minimalistic, also in terms of features, with no complete memory
protection, and message passing by exchanging pointers and whatnot.)

At least to summarize this: with those settings I just keep switching
desktops and opening windows to enjoy the effects. Desktops flip over
fluidly and windows zoom in fluidly on opening as well.

All those experiences are with:

shambhala:~> cat /proc/version
Linux version 2.6.31-rc7-tp42-toi-3.0.1-04741-g57e61c0 (martin@shambhala)
(gcc version 4.3.3 (Debian 4.3.3-10) ) #6 PREEMPT Sun Aug 23 10:51:32 CEST
2009

(Nigel Cunningham's tuxonice-head git from about 10 days ago)

Keyboard still working. Possibly it really did break with zero as the
wakeup granularity.

> > Or to say it in other words: The Linux kernel should not require
> > me to fine-tune three or more values to have the scheduler act in
> > a way that matches my workload.
> >
> > I am willing to test stuff on my work thinkpad and my Amarok
> > thinkpad in order to help improving with that.
>
> It would be great if you could check latest -tip:
>
> http://people.redhat.com/mingo/tip.git/README
>
> and compare it to vanilla .31?
>
> Also, could you outline the interactivity problems/complaints you
> have?

Hmmm, would there be a simple possibility to somehow merge the tuxonice git
and your tip.git into a nice TuxOnIce + scheduler-enhanced kernel? I tend
not to stick with non-TuxOnIce-enabled kernels for too long. At least not
on my work thinkpad and my Amarok thinkpad, because I believe that reboots
are just for kernel upgrades (with API changes, that is) ;-).

Apart from that, I lack the time to compile a kernel a day at the moment,
like in the good old RSDL, SD and CFS testing times ;-). But the next kernel,
2.6.31-not-a-rc, is due and I take suggestions for that one. Preferably I
would like to have it with TuxOnIce though.

Problems I faced:

1) Well, those effect issues. Jerky at best. Animations which should have
had at least 25 frames per second showed 2-5 frames a second. The above
tuning helped a lot with that. On the other hand, DVD playback with Dragon
Player (and Xine) seems just fine - I thought, at least. Maybe I should
compare watching Star Trek TNG with and without the scheduler latency fixes.
And maybe I'll find some additional frames per duration there too.

2) Some jerks here and there. Difficult to categorize. It sometimes just
happens that the machine does not follow where I put my attention. Like a
distracted human who has other things to do than listening to me. This
could be trying to enter some text in a Qt text input widget. But I need
to look more carefully as to when, where and why. This is just too fuzzy.

3) Sometimes even typing has a visible latency. It's difficult to spot the
cause of that. When I type I expect to see each letter as I type it.

4) I/O latencies causing the machine to actually stall for seconds. But
this one got much better when I switched to 2.6.31 - I had to skip 2.6.30
cause it didn't tuxonice nicely, even 2.6.31 did not until it reached rc5.
It seems even way better after switching from XFS to Ext4. But well that
is a different issue. And at the moment I am quite happy with that.

5) Some window manager operations like resizing windows take very long
with compositing. But I think this issue may lie elsewhere, because these
did not improve with the above settings while many compositing effects did.
I don't know where that slow window resizing comes from - whether it's a
compositing / KWin / Qt refresh issue, or something scheduler related, or
something gfx driver related.

Keyboard is still working, yay! So the X.org keyboard driver might get
irritated with zero as wakeup granularity.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7


Attachments:
signature.asc (197.00 B)
This is a digitally signed message part.

2009-09-10 21:19:35

by Martin Steigerwald

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

Am Donnerstag 10 September 2009 schrieb Ingo Molnar:
> * Martin Steigerwald <[email protected]> wrote:
> > Am Donnerstag 10 September 2009 schrieb Ingo Molnar:
> > > * Martin Steigerwald <[email protected]> wrote:
> > > > Am Mittwoch 09 September 2009 schrieb Peter Zijlstra:
> > > > > On Wed, 2009-09-09 at 12:05 +0300, Nikos Chantziaras wrote:
> > > > > > Thank you for mentioning min_granularity. After:
> > > > > >
> > > > > > echo 10000000 > /proc/sys/kernel/sched_latency_ns
> > > > > > echo 2000000 > /proc/sys/kernel/sched_min_granularity_ns
> > > > >
> > > > > You might also want to do:
> > > > >
> > > > > echo 2000000 >
> > > > > /proc/sys/kernel/sched_wakeup_granularity_ns
> > > > >
> > > > > That affects when a newly woken task will preempt an already
> > > > > running task.
> > > >
> > > > Heh that scheduler thing again... and unfortunately Col appearing
> > > > to feel hurt while I am think that Ingo is honest on his offer on
> > > > collaboration...
> > > >
> > > > While it makes fun playing with that numbers and indeed
> > > > experiencing subjectively a more fluid deskopt how about just a
> > > >
> > > > echo "This is a f* desktop!" > /proc/sys/kernel/sched_workload
> > >
> > > No need to do that, that's supposed to be the default :-) The knobs
> > > are really just there to help us make it even more so - i.e. you
> > > dont need to tune them. But it really relies on people helping us
> > > out and tell us which combinations work best ...
> >
> > Well currently I have:
> >
> > shambhala:/proc/sys/kernel> grep "" sched_latency_ns
> > sched_min_granularity_ns sched_wakeup_granularity_ns
> > sched_latency_ns:100000
> > sched_min_granularity_ns:200000
> > sched_wakeup_granularity_ns:0
> >
> > And this give me *a completely different* desktop experience.
>
> what is /debug/sched_features - is NO_NEW_FAIR_SLEEPERS set? If not
> set yet then try it:
>
> echo NO_NEW_FAIR_SLEEPERS > /debug/sched_features
>
> that too might make things more fluid.

Hmmm, I need to mount that first. But not today, because I have to dig out
how to do it. Have to pack some things for tomorrow. And then sleep time.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7


Attachments:
signature.asc (197.00 B)
This is a digitally signed message part.

2009-09-11 01:36:36

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


> I'd say add an extra horizontal split in the second column, so you'd get
> three areas in the right column:
> - top for the global target (permanently)
> - middle for current, either:
> - "current most lagging" if "Global" is selected in left column
> - selected process if a specific target is selected in left column
> - bottom for backtrace
>
> Maybe with that setup "Global" in the left column should be renamed to
> something like "Dynamic".
>
> The backtrace area would show selection from either top or middle areas
> (so selecting a cause in top or middle area should unselect causes in the
> other).

I'll have a look after the merge window madness. Multiple windows is
also still an option I suppose even if i don't like it that much: we
could support double-click on an app or "global" in the left list,
making that pop a new window with the same content as the right pane for
that app (or global) that updates at the same time as the rest.

Somebody ping me if I seem to have forgotten about it in 2 weeks :-)

Ben.


2009-09-11 06:10:33

by Ingo Molnar

[permalink] [raw]
Subject: Re: Epic regression in throughput since v2.6.23


* Serge Belyshev <[email protected]> wrote:

> Ingo Molnar <[email protected]> writes:
>
> > perf stat --repeat 3 make -j4 bzImage
>
> BFS hangs here:
>
> [ 128.859000] BUG: soft lockup - CPU#2 stuck for 61s! [sh:7946]
> [ 128.859016] Modules linked in:
> [ 128.859016] CPU 2:
> [ 128.859016] Modules linked in:
> [ 128.859016] Pid: 7946, comm: sh Not tainted 2.6.31-bfs211-dirty #4 GA-MA790FX-DQ6
> [ 128.859016] RIP: 0010:[<ffffffff81055a52>] [<ffffffff81055a52>] task_oncpu_function_call+0x22/0x40
> [ 128.859016] RSP: 0018:ffff880205967e18 EFLAGS: 00000246
> [ 128.859016] RAX: 0000000000000002 RBX: ffff880205964cc0 RCX: 000000000000dd00
> [ 128.859016] RDX: ffff880211138c00 RSI: ffffffff8108d3f0 RDI: ffff88022e42a100
> [ 128.859016] RBP: ffffffff8102d76e R08: ffff880028066000 R09: 0000000000000000
> [ 128.859016] R10: 0000000000000000 R11: 0000000000000058 R12: ffffffff8108d3f0
> [ 128.859016] R13: ffff880211138c00 R14: 0000000000000001 R15: 000000000000e260
> [ 128.859016] FS: 00002b9ba0924e00(0000) GS:ffff880028066000(0000) knlGS:0000000000000000
> [ 128.859016] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 128.859016] CR2: 00002b9ba091e4a8 CR3: 0000000001001000 CR4: 00000000000006e0
> [ 128.859016] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 128.859016] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 128.859016] Call Trace:
> [ 128.859016] [<ffffffff8108ee3b>] ? perf_counter_remove_from_context+0x3b/0x90
> [ 128.859016] [<ffffffff810904b4>] ? perf_counter_exit_task+0x114/0x340
> [ 128.859016] [<ffffffff810c3f66>] ? filp_close+0x56/0x90
> [ 128.859016] [<ffffffff8105d3ac>] ? do_exit+0x14c/0x6f0
> [ 128.859016] [<ffffffff8105d991>] ? do_group_exit+0x41/0xb0
> [ 128.859016] [<ffffffff8105da12>] ? sys_exit_group+0x12/0x20
> [ 128.859016] [<ffffffff8102cceb>] ? system_call_fastpath+0x16/0x1b
>
> So, got nothing to compare with.

Could still compare -j5 to -j4 on -tip, to see why -j4 is 3% short
of -j5's throughput.

(Plus maybe the NEW_FAIR_SLEEPERS change in -tip fixes the 3% drop.)

Ingo

2009-09-11 07:37:57

by Ingo Molnar

[permalink] [raw]
Subject: Re: [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable


* Ingo Molnar <[email protected]> wrote:

>
> * Ingo Molnar <[email protected]> wrote:
>
> >
> > * Jens Axboe <[email protected]> wrote:
> >
> > > I went to try -tip btw, but it crashes on boot. Here's the
> > > backtrace, typed manually, it's crashing in
> > > queue_work_on+0x28/0x60.
> > >
> > > Call Trace:
> > > queue_work
> > > schedule_work
> > > clocksource_mark_unstable
> > > mark_tsc_unstable
> > > check_tsc_sync_source
> > > native_cpu_up
> > > relay_hotcpu_callback
> > > do_forK_idle
> > > _cpu_up
> > > cpu_up
> > > kernel_init
> > > kernel_thread_helper
> >
> > hm, that looks like an old bug i fixed days ago via:
> >
> > 00a3273: Revert "x86: Make tsc=reliable override boot time stability checks"
> >
> > Have you tested tip:master - do you still know which sha1?
>
> Ok, i reproduced it on a testbox and bisected it, the crash is
> caused by:
>
> 7285dd7fd375763bfb8ab1ac9cf3f1206f503c16 is first bad commit
> commit 7285dd7fd375763bfb8ab1ac9cf3f1206f503c16
> Author: Thomas Gleixner <[email protected]>
> Date: Fri Aug 28 20:25:24 2009 +0200
>
> clocksource: Resolve cpu hotplug dead lock with TSC unstable
>
> Martin Schwidefsky analyzed it:
>
> I've reverted it in tip/master for now.

and that uncovers the circular locking bug that this commit was
supposed to fix ...

Martin?

Ingo

2009-09-11 07:48:30

by Martin Schwidefsky

[permalink] [raw]
Subject: Re: [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable

On Fri, 11 Sep 2009 09:37:47 +0200
Ingo Molnar <[email protected]> wrote:

>
> * Ingo Molnar <[email protected]> wrote:
>
> >
> > * Ingo Molnar <[email protected]> wrote:
> >
> > >
> > > * Jens Axboe <[email protected]> wrote:
> > >
> > > > I went to try -tip btw, but it crashes on boot. Here's the
> > > > backtrace, typed manually, it's crashing in
> > > > queue_work_on+0x28/0x60.
> > > >
> > > > Call Trace:
> > > > queue_work
> > > > schedule_work
> > > > clocksource_mark_unstable
> > > > mark_tsc_unstable
> > > > check_tsc_sync_source
> > > > native_cpu_up
> > > > relay_hotcpu_callback
> > > > do_forK_idle
> > > > _cpu_up
> > > > cpu_up
> > > > kernel_init
> > > > kernel_thread_helper
> > >
> > > hm, that looks like an old bug i fixed days ago via:
> > >
> > > 00a3273: Revert "x86: Make tsc=reliable override boot time stability checks"
> > >
> > > Have you tested tip:master - do you still know which sha1?
> >
> > Ok, i reproduced it on a testbox and bisected it, the crash is
> > caused by:
> >
> > 7285dd7fd375763bfb8ab1ac9cf3f1206f503c16 is first bad commit
> > commit 7285dd7fd375763bfb8ab1ac9cf3f1206f503c16
> > Author: Thomas Gleixner <[email protected]>
> > Date: Fri Aug 28 20:25:24 2009 +0200
> >
> > clocksource: Resolve cpu hotplug dead lock with TSC unstable
> >
> > Martin Schwidefsky analyzed it:
> >
> > I've reverted it in tip/master for now.
>
> and that uncovers the circular locking bug that this commit was
> supposed to fix ...
>
> Martin?

Damn, back to running around in circles ..

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

2009-09-11 08:55:04

by Serge Belyshev

[permalink] [raw]
Subject: Re: Epic regression in throughput since v2.6.23

Ingo Molnar <[email protected]> writes:

> Could still compare -j5 to -j4 on -tip, to see why -j4 is 3% short
> of -j5's throughput.
>
> (Plus maybe the NEW_FAIR_SLEEPERS change in -tip fixes the 3% drop.)

Will do in about 12 hours or so.

2009-09-11 10:10:16

by Matt

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

Martin Steigerwald <Martin <at> lichtvoll.de> writes:

>
> Am Donnerstag 10 September 2009 schrieb Ingo Molnar:

[snip]

> > what is /debug/sched_features - is NO_NEW_FAIR_SLEEPERS set? If not
> > set yet then try it:
> >
> > echo NO_NEW_FAIR_SLEEPERS > /debug/sched_features
> >
> > that too might make things more fluid.

Hi Martin,

it made a tremendous difference, which still has to be tested out :)

Hi Ingo,

what adverse effect could

cat /proc/sys/kernel/sched_wakeup_granularity_ns
0

have on throughput?

Concerning that "NO_NEW_FAIR_SLEEPERS" switch - isn't it as easy as to

do the following ? (I'm not sure if there's supposed to be another debug)

echo NO_NEW_FAIR_SLEEPERS > /sys/kernel/debug/sched_features

which after the change says:

cat /sys/kernel/debug/sched_features
NO_NEW_FAIR_SLEEPERS NO_NORMALIZED_SLEEPER ADAPTIVE_GRAN WAKEUP_PREEMPT
START_DEBIT AFFINE_WAKEUPS CACHE_HOT_BUDDY SYNC_WAKEUPS NO_HRTICK NO_DOUBLE_TICK
ASYM_GRAN LB_BIAS LB_WAKEUP_UPDATE ASYM_EFF_LOAD NO_WAKEUP_OVERLAP LAST_BUDDY
OWNER_SPIN

I hope that's the correct switch ^^

Greetings, and please keep on improving the scheduler (especially with regard
to the desktop crowd).

Regards

Mat


(Sorry for the "double-post" - this one is including all of the CC
which GMane left out :) )

2009-09-11 13:33:12

by Martin Schwidefsky

[permalink] [raw]
Subject: Re: [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable

On Fri, 11 Sep 2009 09:37:47 +0200
Ingo Molnar <[email protected]> wrote:

>
> * Ingo Molnar <[email protected]> wrote:
>
> >
> > * Ingo Molnar <[email protected]> wrote:
> >
> > >
> > > * Jens Axboe <[email protected]> wrote:
> > >
> > > > I went to try -tip btw, but it crashes on boot. Here's the
> > > > backtrace, typed manually, it's crashing in
> > > > queue_work_on+0x28/0x60.
> > > >
> > > > Call Trace:
> > > > queue_work
> > > > schedule_work
> > > > clocksource_mark_unstable
> > > > mark_tsc_unstable
> > > > check_tsc_sync_source
> > > > native_cpu_up
> > > > relay_hotcpu_callback
> > > > do_forK_idle
> > > > _cpu_up
> > > > cpu_up
> > > > kernel_init
> > > > kernel_thread_helper
> > >
> > > hm, that looks like an old bug i fixed days ago via:
> > >
> > > 00a3273: Revert "x86: Make tsc=reliable override boot time stability checks"
> > >
> > > Have you tested tip:master - do you still know which sha1?
> >
> > Ok, i reproduced it on a testbox and bisected it, the crash is
> > caused by:
> >
> > 7285dd7fd375763bfb8ab1ac9cf3f1206f503c16 is first bad commit
> > commit 7285dd7fd375763bfb8ab1ac9cf3f1206f503c16
> > Author: Thomas Gleixner <[email protected]>
> > Date: Fri Aug 28 20:25:24 2009 +0200
> >
> > clocksource: Resolve cpu hotplug dead lock with TSC unstable
> >
> > Martin Schwidefsky analyzed it:
> >
> > I've reverted it in tip/master for now.
>
> and that uncovers the circular locking bug that this commit was
> supposed to fix ...
>
> Martin?

This patch should fix the obvious problem that the watchdog_work
structure is not yet initialized if the clocksource watchdog is not
running yet.
--
Subject: [PATCH] clocksource: statically initialize watchdog workqueue

From: Martin Schwidefsky <[email protected]>

The watchdog timer is started after the watchdog clocksource and at least
one watched clocksource have been registered. The clocksource work element
watchdog_work is initialized just before the clocksource timer is started.
This is too late for the clocksource_mark_unstable call from native_cpu_up.
To fix this use a static initializer for watchdog_work.

Signed-off-by: Martin Schwidefsky <[email protected]>
---
kernel/time/clocksource.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/time/clocksource.c
===================================================================
--- linux-2.6.orig/kernel/time/clocksource.c
+++ linux-2.6/kernel/time/clocksource.c
@@ -123,10 +123,12 @@ static DEFINE_MUTEX(clocksource_mutex);
static char override_name[32];

#ifdef CONFIG_CLOCKSOURCE_WATCHDOG
+static void clocksource_watchdog_work(struct work_struct *work);
+
static LIST_HEAD(watchdog_list);
static struct clocksource *watchdog;
static struct timer_list watchdog_timer;
-static struct work_struct watchdog_work;
+static DECLARE_WORK(watchdog_work, clocksource_watchdog_work);
static DEFINE_SPINLOCK(watchdog_lock);
static cycle_t watchdog_last;
static int watchdog_running;
@@ -230,7 +232,6 @@ static inline void clocksource_start_wat
{
if (watchdog_running || !watchdog || list_empty(&watchdog_list))
return;
- INIT_WORK(&watchdog_work, clocksource_watchdog_work);
init_timer(&watchdog_timer);
watchdog_timer.function = clocksource_watchdog;
watchdog_last = watchdog->read(watchdog);

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

2009-09-11 18:23:03

by Martin Schwidefsky

[permalink] [raw]
Subject: [tip:timers/core] clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash

Commit-ID: f79e0258ea1f04d63db499479b5fb855dff6dbc5
Gitweb: http://git.kernel.org/tip/f79e0258ea1f04d63db499479b5fb855dff6dbc5
Author: Martin Schwidefsky <[email protected]>
AuthorDate: Fri, 11 Sep 2009 15:33:05 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 11 Sep 2009 20:17:18 +0200

clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash

The watchdog timer is started after the watchdog clocksource
and at least one watched clocksource have been registered. The
clocksource work element watchdog_work is initialized just
before the clocksource timer is started. This is too late for
the clocksource_mark_unstable call from native_cpu_up. To fix
this use a static initializer for watchdog_work.

This resolves a boot crash reported by multiple people.

Signed-off-by: Martin Schwidefsky <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: John Stultz <[email protected]>
LKML-Reference: <20090911153305.3fe9a361@skybase>
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/time/clocksource.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index a0af4ff..5697155 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -123,10 +123,12 @@ static DEFINE_MUTEX(clocksource_mutex);
static char override_name[32];

#ifdef CONFIG_CLOCKSOURCE_WATCHDOG
+static void clocksource_watchdog_work(struct work_struct *work);
+
static LIST_HEAD(watchdog_list);
static struct clocksource *watchdog;
static struct timer_list watchdog_timer;
-static struct work_struct watchdog_work;
+static DECLARE_WORK(watchdog_work, clocksource_watchdog_work);
static DEFINE_SPINLOCK(watchdog_lock);
static cycle_t watchdog_last;
static int watchdog_running;
@@ -257,7 +259,6 @@ static inline void clocksource_start_watchdog(void)
{
if (watchdog_running || !watchdog || list_empty(&watchdog_list))
return;
- INIT_WORK(&watchdog_work, clocksource_watchdog_work);
init_timer(&watchdog_timer);
watchdog_timer.function = clocksource_watchdog;
watchdog_last = watchdog->read(watchdog);

2009-09-11 18:33:42

by Volker Armin Hemmann

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

Hi,

this is with 2.6.31+reiser4+fglrx
Phenom II X4 955

KDE 4.3.1, composite temporary disabled.
tvtime running.

load:
fat emerge with make -j5 running in one konsole tab (xulrunner being
compiled).

without NO_NEW_FAIR_SLEEPERS:

tvtime is smooth most of the time

with NO_NEW_FAIR_SLEEPERS:

tvtime is more jerky. Very visible in scenes with movement.

without background load:

both settings act the same. tvtime is smooth, video is smooth, games are nice.
No real difference.

config is attached.

Glück Auf,
Volker

dmesg:

[ 0.000000] Linux version 2.6.31r4 (root@energy) (gcc version 4.4.1 (Gentoo
4.4.1 p1.0) ) #1 SMP Thu Sep 10 10:48:07 CEST 2009
[ 0.000000] Command line: root=/dev/md1 md=3,/dev/sda3,/dev/sdb3,/dev/sdc3
nmi_watchdog=0 mtrr_spare_reg_nr=1
[ 0.000000] KERNEL supported cpus:
[ 0.000000] AMD AuthenticAMD
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
[ 0.000000] BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 00000000c7eb0000 (usable)
[ 0.000000] BIOS-e820: 00000000c7eb0000 - 00000000c7ec0000 (ACPI data)
[ 0.000000] BIOS-e820: 00000000c7ec0000 - 00000000c7ef0000 (ACPI NVS)
[ 0.000000] BIOS-e820: 00000000c7ef0000 - 00000000c7f00000 (reserved)
[ 0.000000] BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
[ 0.000000] BIOS-e820: 0000000100000000 - 0000000238000000 (usable)
[ 0.000000] DMI present.
[ 0.000000] AMI BIOS detected: BIOS may corrupt low RAM, working around it.
[ 0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable)
==> (reserved)
[ 0.000000] last_pfn = 0x238000 max_arch_pfn = 0x400000000
[ 0.000000] MTRR default type: uncachable
[ 0.000000] MTRR fixed ranges enabled:
[ 0.000000] 00000-9FFFF write-back
[ 0.000000] A0000-EFFFF uncachable
[ 0.000000] F0000-FFFFF write-protect
[ 0.000000] MTRR variable ranges enabled:
[ 0.000000] 0 base 000000000000 mask FFFF80000000 write-back
[ 0.000000] 1 base 000080000000 mask FFFFC0000000 write-back
[ 0.000000] 2 base 0000C0000000 mask FFFFF8000000 write-back
[ 0.000000] 3 disabled
[ 0.000000] 4 disabled
[ 0.000000] 5 disabled
[ 0.000000] 6 disabled
[ 0.000000] 7 disabled
[ 0.000000] TOM2: 0000000238000000 aka 9088M
[ 0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new
0x7010600070106
[ 0.000000] e820 update range: 00000000c8000000 - 0000000100000000 (usable)
==> (reserved)
[ 0.000000] last_pfn = 0xc7eb0 max_arch_pfn = 0x400000000
[ 0.000000] Scanning 0 areas for low memory corruption
[ 0.000000] modified physical RAM map:
[ 0.000000] modified: 0000000000000000 - 0000000000010000 (reserved)
[ 0.000000] modified: 0000000000010000 - 000000000009fc00 (usable)
[ 0.000000] modified: 000000000009fc00 - 00000000000a0000 (reserved)
[ 0.000000] modified: 00000000000e6000 - 0000000000100000 (reserved)
[ 0.000000] modified: 0000000000100000 - 00000000c7eb0000 (usable)
[ 0.000000] modified: 00000000c7eb0000 - 00000000c7ec0000 (ACPI data)
[ 0.000000] modified: 00000000c7ec0000 - 00000000c7ef0000 (ACPI NVS)
[ 0.000000] modified: 00000000c7ef0000 - 00000000c7f00000 (reserved)
[ 0.000000] modified: 00000000fff00000 - 0000000100000000 (reserved)
[ 0.000000] modified: 0000000100000000 - 0000000238000000 (usable)
[ 0.000000] initial memory mapped : 0 - 20000000
[ 0.000000] Using GB pages for direct mapping
[ 0.000000] init_memory_mapping: 0000000000000000-00000000c7eb0000
[ 0.000000] 0000000000 - 00c0000000 page 1G
[ 0.000000] 00c0000000 - 00c7e00000 page 2M
[ 0.000000] 00c7e00000 - 00c7eb0000 page 4k
[ 0.000000] kernel direct mapping tables up to c7eb0000 @ 10000-13000
[ 0.000000] init_memory_mapping: 0000000100000000-0000000238000000
[ 0.000000] 0100000000 - 0200000000 page 1G
[ 0.000000] 0200000000 - 0238000000 page 2M
[ 0.000000] kernel direct mapping tables up to 238000000 @ 12000-14000
[ 0.000000] ACPI: RSDP 00000000000fa7c0 00014 (v00 ACPIAM)
[ 0.000000] ACPI: RSDT 00000000c7eb0000 00040 (v01 050609 RSDT2000 20090506
MSFT 00000097)
[ 0.000000] ACPI: FACP 00000000c7eb0200 00084 (v02 A M I OEMFACP 12000601
MSFT 00000097)
[ 0.000000] ACPI: DSDT 00000000c7eb0440 08512 (v01 AS140 AS140121 00000121
INTL 20051117)
[ 0.000000] ACPI: FACS 00000000c7ec0000 00040
[ 0.000000] ACPI: APIC 00000000c7eb0390 0006C (v01 050609 APIC2000 20090506
MSFT 00000097)
[ 0.000000] ACPI: MCFG 00000000c7eb0400 0003C (v01 050609 OEMMCFG 20090506
MSFT 00000097)
[ 0.000000] ACPI: OEMB 00000000c7ec0040 00071 (v01 050609 OEMB2000 20090506
MSFT 00000097)
[ 0.000000] ACPI: AAFT 00000000c7eb8960 00027 (v01 050609 OEMAAFT 20090506
MSFT 00000097)
[ 0.000000] ACPI: HPET 00000000c7eb8990 00038 (v01 050609 OEMHPET 20090506
MSFT 00000097)
[ 0.000000] ACPI: SSDT 00000000c7eb89d0 0088C (v01 A M I POWERNOW 00000001
AMD 00000001)
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] (7 early reservations) ==> bootmem [0000000000 - 0238000000]
[ 0.000000] #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000
- 0000001000]
[ 0.000000] #1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000
- 0000008000]
[ 0.000000] #2 [0001000000 - 00015fb8c0] TEXT DATA BSS ==> [0001000000
- 00015fb8c0]
[ 0.000000] #3 [000009fc00 - 0000100000] BIOS reserved ==> [000009fc00
- 0000100000]
[ 0.000000] #4 [00015fc000 - 00015fc133] BRK ==> [00015fc000
- 00015fc133]
[ 0.000000] #5 [0000010000 - 0000012000] PGTABLE ==> [0000010000
- 0000012000]
[ 0.000000] #6 [0000012000 - 0000013000] PGTABLE ==> [0000012000
- 0000013000]
[ 0.000000] [ffffea0000000000-ffffea0007dfffff] PMD -> [ffff880028600000-
ffff88002f7fffff] on node 0
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0x00000010 -> 0x00001000
[ 0.000000] DMA32 0x00001000 -> 0x00100000
[ 0.000000] Normal 0x00100000 -> 0x00238000
[ 0.000000] Movable zone start PFN for each node
[ 0.000000] early_node_map[3] active PFN ranges
[ 0.000000] 0: 0x00000010 -> 0x0000009f
[ 0.000000] 0: 0x00000100 -> 0x000c7eb0
[ 0.000000] 0: 0x00100000 -> 0x00238000
[ 0.000000] On node 0 totalpages: 2096703
[ 0.000000] DMA zone: 56 pages used for memmap
[ 0.000000] DMA zone: 102 pages reserved
[ 0.000000] DMA zone: 3825 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 14280 pages used for memmap
[ 0.000000] DMA32 zone: 800488 pages, LIFO batch:31
[ 0.000000] Normal zone: 17472 pages used for memmap
[ 0.000000] Normal zone: 1260480 pages, LIFO batch:31
[ 0.000000] ACPI: PM-Timer IO Port: 0x808
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
[ 0.000000] ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 4, version 33, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[ 0.000000] ACPI: IRQ0 used by override.
[ 0.000000] ACPI: IRQ2 used by override.
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] ACPI: HPET id: 0x8300 base: 0xfed00000
[ 0.000000] SMP: Allowing 4 CPUs, 0 hotplug CPUs
[ 0.000000] nr_irqs_gsi: 24
[ 0.000000] PM: Registered nosave memory: 000000000009f000 -
00000000000a0000
[ 0.000000] PM: Registered nosave memory: 00000000000a0000 -
00000000000e6000
[ 0.000000] PM: Registered nosave memory: 00000000000e6000 -
0000000000100000
[ 0.000000] PM: Registered nosave memory: 00000000c7eb0000 -
00000000c7ec0000
[ 0.000000] PM: Registered nosave memory: 00000000c7ec0000 -
00000000c7ef0000
[ 0.000000] PM: Registered nosave memory: 00000000c7ef0000 -
00000000c7f00000
[ 0.000000] PM: Registered nosave memory: 00000000c7f00000 -
00000000fff00000
[ 0.000000] PM: Registered nosave memory: 00000000fff00000 -
0000000100000000
[ 0.000000] Allocating PCI resources starting at c7f00000 (gap:
c7f00000:38000000)
[ 0.000000] NR_CPUS:4 nr_cpumask_bits:4 nr_cpu_ids:4 nr_node_ids:1
[ 0.000000] PERCPU: Embedded 25 pages at ffff880028034000, static data 72160
bytes
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total
pages: 2064793
[ 0.000000] Kernel command line: root=/dev/md1
md=3,/dev/sda3,/dev/sdb3,/dev/sdc3 nmi_watchdog=0 mtrr_spare_reg_nr=1
[ 0.000000] md: Will configure md3 (super-block) from
/dev/sda3,/dev/sdb3,/dev/sdc3, below.
[ 0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
[ 0.000000] Dentry cache hash table entries: 1048576 (order: 11, 8388608
bytes)
[ 0.000000] Inode-cache hash table entries: 524288 (order: 10, 4194304
bytes)
[ 0.000000] Initializing CPU#0
[ 0.000000] Checking aperture...
[ 0.000000] No AGP bridge found
[ 0.000000] Node 0: aperture @ 2a42000000 size 32 MB
[ 0.000000] Aperture beyond 4GB. Ignoring.
[ 0.000000] Your BIOS doesn't leave a aperture memory hole
[ 0.000000] Please enable the IOMMU option in the BIOS setup
[ 0.000000] This costs you 64 MB of RAM
[ 0.000000] Mapping aperture over 65536 KB of RAM @ 20000000
[ 0.000000] PM: Registered nosave memory: 0000000020000000 -
0000000024000000
[ 0.000000] Memory: 8184476k/9306112k available (3500k kernel code, 919300k
absent, 201380k reserved, 1751k data, 376k init)
[ 0.000000] SLUB: Genslabs=13, HWalign=64, Order=0-3, MinObjects=0, CPUs=4,
Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] NR_IRQS:4352 nr_irqs:440
[ 0.000000] Fast TSC calibration using PIT
[ 0.000000] Detected 3200.214 MHz processor.
[ 0.000609] Console: colour VGA+ 80x25
[ 0.000611] console [tty0] enabled
[ 0.003333] hpet clockevent registered
[ 0.003333] alloc irq_desc for 24 on node 0
[ 0.003333] alloc kstat_irqs on node 0
[ 0.003333] HPET: 4 timers in total, 1 timers will be used for per-cpu
timer
[ 0.003339] Calibrating delay loop (skipped), value calculated using timer
frequency.. 6402.10 BogoMIPS (lpj=10667366)
[ 0.003421] Mount-cache hash table entries: 256
[ 0.003543] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64
bytes/line)
[ 0.003578] CPU: L2 Cache: 512K (64 bytes/line)
[ 0.003613] tseg: 0000000000
[ 0.003618] CPU: Physical Processor ID: 0
[ 0.003652] CPU: Processor Core ID: 0
[ 0.003686] mce: CPU supports 6 MCE banks
[ 0.003725] using C1E aware idle routine
[ 0.003768] ACPI: Core revision 20090521
[ 0.016704] Setting APIC routing to flat
[ 0.017026] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.050860] CPU0: AMD Phenom(tm) II X4 955 Processor stepping 02
[ 0.053333] Booting processor 1 APIC 0x1 ip 0x6000
[ 0.003333] Initializing CPU#1
[ 0.003333] Calibrating delay using timer specific routine.. 6402.85
BogoMIPS (lpj=10666966)
[ 0.003333] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64
bytes/line)
[ 0.003333] CPU: L2 Cache: 512K (64 bytes/line)
[ 0.003333] CPU: Physical Processor ID: 0
[ 0.003333] CPU: Processor Core ID: 1
[ 0.003333] mce: CPU supports 6 MCE banks
[ 0.003333] x86 PAT enabled: cpu 1, old 0x7040600070406, new
0x7010600070106
[ 0.144161] CPU1: AMD Phenom(tm) II X4 955 Processor stepping 02
[ 0.144507] checking TSC synchronization [CPU#0 -> CPU#1]: passed.
[ 0.146699] Booting processor 2 APIC 0x2 ip 0x6000
[ 0.003333] Initializing CPU#2
[ 0.003333] Calibrating delay using timer specific routine.. 6402.85
BogoMIPS (lpj=10666970)
[ 0.003333] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64
bytes/line)
[ 0.003333] CPU: L2 Cache: 512K (64 bytes/line)
[ 0.003333] CPU: Physical Processor ID: 0
[ 0.003333] CPU: Processor Core ID: 2
[ 0.003333] mce: CPU supports 6 MCE banks
[ 0.003333] x86 PAT enabled: cpu 2, old 0x7040600070406, new
0x7010600070106
[ 0.240822] CPU2: AMD Phenom(tm) II X4 955 Processor stepping 02
[ 0.241168] checking TSC synchronization [CPU#0 -> CPU#2]: passed.
[ 0.243373] Booting processor 3 APIC 0x3 ip 0x6000
[ 0.003333] Initializing CPU#3
[ 0.003333] Calibrating delay using timer specific routine.. 6402.85
BogoMIPS (lpj=10666972)
[ 0.003333] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64
bytes/line)
[ 0.003333] CPU: L2 Cache: 512K (64 bytes/line)
[ 0.003333] CPU: Physical Processor ID: 0
[ 0.003333] CPU: Processor Core ID: 3
[ 0.003333] mce: CPU supports 6 MCE banks
[ 0.003333] x86 PAT enabled: cpu 3, old 0x7040600070406, new
0x7010600070106
[ 0.337491] CPU3: AMD Phenom(tm) II X4 955 Processor stepping 02
[ 0.337836] checking TSC synchronization [CPU#0 -> CPU#3]: passed.
[ 0.340006] Brought up 4 CPUs
[ 0.340040] Total of 4 processors activated (25611.67 BogoMIPS).
[ 0.340109] CPU0 attaching sched-domain:
[ 0.340111] domain 0: span 0-3 level MC
[ 0.340112] groups: 0 1 2 3
[ 0.340116] CPU1 attaching sched-domain:
[ 0.340117] domain 0: span 0-3 level MC
[ 0.340118] groups: 1 2 3 0
[ 0.340120] CPU2 attaching sched-domain:
[ 0.340121] domain 0: span 0-3 level MC
[ 0.340122] groups: 2 3 0 1
[ 0.340125] CPU3 attaching sched-domain:
[ 0.340126] domain 0: span 0-3 level MC
[ 0.340127] groups: 3 0 1 2
[ 0.340162] xor: automatically using best checksumming function:
generic_sse
[ 0.356667] generic_sse: 12835.200 MB/sec
[ 0.356700] xor: using function: generic_sse (12835.200 MB/sec)
[ 0.356754] Time: 7:07:42 Date: 09/11/09
[ 0.356802] NET: Registered protocol family 16
[ 0.356851] node 0 link 0: io port [1000, ffffff]
[ 0.356851] TOM: 00000000c8000000 aka 3200M
[ 0.356851] Fam 10h mmconf [e0000000, efffffff]
[ 0.356851] node 0 link 0: mmio [e0000000, efffffff] ==> none
[ 0.356851] node 0 link 0: mmio [f0000000, ffffffff]
[ 0.356851] node 0 link 0: mmio [a0000, bffff]
[ 0.356851] node 0 link 0: mmio [c8000000, dfffffff]
[ 0.356851] TOM2: 0000000238000000 aka 9088M
[ 0.356851] bus: [00,07] on node 0 link 0
[ 0.356851] bus: 00 index 0 io port: [0, ffff]
[ 0.356851] bus: 00 index 1 mmio: [f0000000, ffffffff]
[ 0.356851] bus: 00 index 2 mmio: [a0000, bffff]
[ 0.356851] bus: 00 index 3 mmio: [c8000000, dfffffff]
[ 0.356851] bus: 00 index 4 mmio: [238000000, fcffffffff]
[ 0.356851] ACPI: bus type pci registered
[ 0.356851] PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 255
[ 0.356851] PCI: Not using MMCONFIG.
[ 0.356851] PCI: Using configuration type 1 for base access
[ 0.356851] PCI: Using configuration type 1 for extended access
[ 0.356851] bio: create slab <bio-0> at 0
[ 0.356978] ACPI: EC: Look up EC in DSDT
[ 0.367031] ACPI: Interpreter enabled
[ 0.367538] ACPI: (supports S0 S1 S3 S4 S5)
[ 0.367652] ACPI: Using IOAPIC for interrupt routing
[ 0.367725] PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 255
[ 0.370531] PCI: MCFG area at e0000000 reserved in ACPI motherboard
resources
[ 0.375947] PCI: Using MMCONFIG at e0000000 - efffffff
[ 0.380367] ACPI: No dock devices found.
[ 0.380451] ACPI: PCI Root Bridge [PCI0] (0000:00)
[ 0.380520] pci 0000:00:00.0: reg 1c 64bit mmio: [0xe0000000-0xffffffff]
[ 0.380520] pci 0000:00:02.0: PME# supported from D0 D3hot D3cold
[ 0.380520] pci 0000:00:02.0: PME# disabled
[ 0.380520] pci 0000:00:09.0: PME# supported from D0 D3hot D3cold
[ 0.380520] pci 0000:00:09.0: PME# disabled
[ 0.380520] pci 0000:00:0a.0: PME# supported from D0 D3hot D3cold
[ 0.380520] pci 0000:00:0a.0: PME# disabled
[ 0.380520] pci 0000:00:11.0: reg 10 io port: [0xa000-0xa007]
[ 0.380520] pci 0000:00:11.0: reg 14 io port: [0x9000-0x9003]
[ 0.380520] pci 0000:00:11.0: reg 18 io port: [0x8000-0x8007]
[ 0.380520] pci 0000:00:11.0: reg 1c io port: [0x7000-0x7003]
[ 0.380520] pci 0000:00:11.0: reg 20 io port: [0x6000-0x600f]
[ 0.380520] pci 0000:00:11.0: reg 24 32bit mmio: [0xfddff800-0xfddffbff]
[ 0.380520] pci 0000:00:12.0: reg 10 32bit mmio: [0xfddfe000-0xfddfefff]
[ 0.380520] pci 0000:00:12.1: reg 10 32bit mmio: [0xfddfd000-0xfddfdfff]
[ 0.380576] pci 0000:00:12.2: reg 10 32bit mmio: [0xfddff000-0xfddff0ff]
[ 0.380625] pci 0000:00:12.2: supports D1 D2
[ 0.380626] pci 0000:00:12.2: PME# supported from D0 D1 D2 D3hot
[ 0.380663] pci 0000:00:12.2: PME# disabled
[ 0.380724] pci 0000:00:13.0: reg 10 32bit mmio: [0xfddfc000-0xfddfcfff]
[ 0.380775] pci 0000:00:13.1: reg 10 32bit mmio: [0xfddf7000-0xfddf7fff]
[ 0.380843] pci 0000:00:13.2: reg 10 32bit mmio: [0xfddf6800-0xfddf68ff]
[ 0.380893] pci 0000:00:13.2: supports D1 D2
[ 0.380894] pci 0000:00:13.2: PME# supported from D0 D1 D2 D3hot
[ 0.380930] pci 0000:00:13.2: PME# disabled
[ 0.381072] pci 0000:00:14.1: reg 10 io port: [0x00-0x07]
[ 0.381078] pci 0000:00:14.1: reg 14 io port: [0x00-0x03]
[ 0.381084] pci 0000:00:14.1: reg 18 io port: [0x00-0x07]
[ 0.381089] pci 0000:00:14.1: reg 1c io port: [0x00-0x03]
[ 0.381095] pci 0000:00:14.1: reg 20 io port: [0xff00-0xff0f]
[ 0.381219] pci 0000:00:14.5: reg 10 32bit mmio: [0xfddf5000-0xfddf5fff]
[ 0.381353] pci 0000:02:00.0: reg 10 64bit mmio: [0xd0000000-0xdfffffff]
[ 0.381360] pci 0000:02:00.0: reg 18 64bit mmio: [0xfdff0000-0xfdffffff]
[ 0.381365] pci 0000:02:00.0: reg 20 io port: [0xc000-0xc0ff]
[ 0.381372] pci 0000:02:00.0: reg 30 32bit mmio: [0xfdfc0000-0xfdfdffff]
[ 0.381387] pci 0000:02:00.0: supports D1 D2
[ 0.381415] pci 0000:02:00.1: reg 10 64bit mmio: [0xfdfec000-0xfdfeffff]
[ 0.381445] pci 0000:02:00.1: supports D1 D2
[ 0.381495] pci 0000:00:02.0: bridge io port: [0xc000-0xcfff]
[ 0.381497] pci 0000:00:02.0: bridge 32bit mmio: [0xfdf00000-0xfdffffff]
[ 0.381500] pci 0000:00:02.0: bridge 64bit mmio pref: [0xd0000000-0xdfffffff]
[ 0.381543] pci 0000:00:09.0: bridge io port: [0xd000-0xdfff]
[ 0.381545] pci 0000:00:09.0: bridge 32bit mmio: [0xfe000000-0xfebfffff]
[ 0.381548] pci 0000:00:09.0: bridge 64bit mmio pref: [0xfa000000-0xfcefffff]
[ 0.383347] pci 0000:01:00.0: reg 10 io port: [0xb800-0xb8ff]
[ 0.383360] pci 0000:01:00.0: reg 18 64bit mmio: [0xcffff000-0xcfffffff]
[ 0.383370] pci 0000:01:00.0: reg 20 64bit mmio: [0xcffe0000-0xcffeffff]
[ 0.383375] pci 0000:01:00.0: reg 30 32bit mmio: [0xfdef0000-0xfdefffff]
[ 0.383402] pci 0000:01:00.0: supports D1 D2
[ 0.383403] pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[ 0.383440] pci 0000:01:00.0: PME# disabled
[ 0.383525] pci 0000:00:0a.0: bridge io port: [0xb000-0xbfff]
[ 0.383527] pci 0000:00:0a.0: bridge 32bit mmio: [0xfde00000-0xfdefffff]
[ 0.383530] pci 0000:00:0a.0: bridge 64bit mmio pref: [0xcff00000-0xcfffffff]
[ 0.383565] pci 0000:05:06.0: reg 10 32bit mmio: [0xfcfff000-0xfcffffff]
[ 0.383644] pci 0000:05:08.0: reg 10 io port: [0xe800-0xe83f]
[ 0.383701] pci 0000:05:08.0: supports D1 D2
[ 0.383743] pci 0000:00:14.4: transparent bridge
[ 0.383779] pci 0000:00:14.4: bridge io port: [0xe000-0xefff]
[ 0.383785] pci 0000:00:14.4: bridge 32bit mmio pref: [0xfcf00000-0xfcffffff]
[ 0.383797] pci_bus 0000:00: on NUMA node 0
[ 0.383800] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[ 0.383936] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCE2._PRT]
[ 0.383981] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCEA._PRT]
[ 0.384025] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0PC._PRT]
[ 0.384091] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCE9._PRT]
[ 0.386726] ACPI: PCI Interrupt Link [LNKA] (IRQs 4 *7 10 11 12 14 15)
[ 0.386953] ACPI: PCI Interrupt Link [LNKB] (IRQs 4 7 10 *11 12 14 15)
[ 0.387183] ACPI: PCI Interrupt Link [LNKC] (IRQs 4 7 *10 11 12 14 15)
[ 0.387408] ACPI: PCI Interrupt Link [LNKD] (IRQs 4 7 *10 11 12 14 15)
[ 0.387643] ACPI: PCI Interrupt Link [LNKE] (IRQs 4 7 10 11 12 14 15) *0,
disabled.
[ 0.387914] ACPI: PCI Interrupt Link [LNKF] (IRQs 4 7 10 *11 12 14 15)
[ 0.388140] ACPI: PCI Interrupt Link [LNKG] (IRQs *4 10 11 12 14 15)
[ 0.388352] ACPI: PCI Interrupt Link [LNKH] (IRQs 4 7 *10 11 12 14 15)
[ 0.388538] SCSI subsystem initialized
[ 0.388538] libata version 3.00 loaded.
[ 0.388538] usbcore: registered new interface driver usbfs
[ 0.388538] usbcore: registered new interface driver hub
[ 0.388538] usbcore: registered new device driver usb
[ 0.443347] raid6: int64x1 2755 MB/s
[ 0.500010] raid6: int64x2 3858 MB/s
[ 0.556669] raid6: int64x4 2850 MB/s
[ 0.613353] raid6: int64x8 2537 MB/s
[ 0.670007] raid6: sse2x1 3999 MB/s
[ 0.726666] raid6: sse2x2 7012 MB/s
[ 0.783343] raid6: sse2x4 7975 MB/s
[ 0.783377] raid6: using algorithm sse2x4 (7975 MB/s)
[ 0.783423] PCI: Using ACPI for IRQ routing
[ 0.783423] pci 0000:00:00.0: BAR 3: address space collision on of device
[0xe0000000-0xffffffff]
[ 0.783425] pci 0000:00:00.0: BAR 3: can't allocate resource
[ 0.793433] PCI-DMA: Disabling AGP.
[ 0.793518] PCI-DMA: aperture base @ 20000000 size 65536 KB
[ 0.793518] PCI-DMA: using GART IOMMU.
[ 0.793518] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
[ 0.795205] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 24, 0
[ 0.795312] hpet0: 4 comparators, 32-bit 14.318180 MHz counter
[ 0.800030] hpet: hpet2 irq 24 for MSI
[ 0.810015] Switched to high resolution mode on CPU 0
[ 0.811193] Switched to high resolution mode on CPU 2
[ 0.811196] Switched to high resolution mode on CPU 1
[ 0.811200] Switched to high resolution mode on CPU 3
[ 0.820035] pnp: PnP ACPI init
[ 0.820085] ACPI: bus type pnp registered
[ 0.822351] pnp 00:0b: mem resource (0x0-0x9ffff) overlaps 0000:00:00.0 BAR 3
(0x0-0x1fffffff), disabling
[ 0.822406] pnp 00:0b: mem resource (0xc0000-0xcffff) overlaps 0000:00:00.0
BAR 3 (0x0-0x1fffffff), disabling
[ 0.822461] pnp 00:0b: mem resource (0xe0000-0xfffff) overlaps 0000:00:00.0
BAR 3 (0x0-0x1fffffff), disabling
[ 0.822515] pnp 00:0b: mem resource (0x100000-0xc7efffff) overlaps
0000:00:00.0 BAR 3 (0x0-0x1fffffff), disabling
[ 0.822837] pnp: PnP ACPI: found 12 devices
[ 0.822871] ACPI: ACPI bus type pnp unregistered
[ 0.822911] system 00:06: iomem range 0xfec00000-0xfec00fff could not be
reserved
[ 0.822965] system 00:06: iomem range 0xfee00000-0xfee00fff has been
reserved
[ 0.823002] system 00:07: ioport range 0x4d0-0x4d1 has been reserved
[ 0.823037] system 00:07: ioport range 0x40b-0x40b has been reserved
[ 0.823072] system 00:07: ioport range 0x4d6-0x4d6 has been reserved
[ 0.823106] system 00:07: ioport range 0xc00-0xc01 has been reserved
[ 0.823141] system 00:07: ioport range 0xc14-0xc14 has been reserved
[ 0.823176] system 00:07: ioport range 0xc50-0xc51 has been reserved
[ 0.823211] system 00:07: ioport range 0xc52-0xc52 has been reserved
[ 0.823245] system 00:07: ioport range 0xc6c-0xc6c has been reserved
[ 0.823280] system 00:07: ioport range 0xc6f-0xc6f has been reserved
[ 0.823315] system 00:07: ioport range 0xcd0-0xcd1 has been reserved
[ 0.823359] system 00:07: ioport range 0xcd2-0xcd3 has been reserved
[ 0.823394] system 00:07: ioport range 0xcd4-0xcd5 has been reserved
[ 0.823429] system 00:07: ioport range 0xcd6-0xcd7 has been reserved
[ 0.823464] system 00:07: ioport range 0xcd8-0xcdf has been reserved
[ 0.823499] system 00:07: ioport range 0x800-0x89f has been reserved
[ 0.823533] system 00:07: ioport range 0xb00-0xb0f has been reserved
[ 0.823568] system 00:07: ioport range 0xb20-0xb3f has been reserved
[ 0.823603] system 00:07: ioport range 0x900-0x90f has been reserved
[ 0.823638] system 00:07: ioport range 0x910-0x91f has been reserved
[ 0.823673] system 00:07: ioport range 0xfe00-0xfefe has been reserved
[ 0.823708] system 00:07: iomem range 0xffb80000-0xffbfffff has been reserved
[ 0.823743] system 00:07: iomem range 0xfec10000-0xfec1001f has been
reserved
[ 0.823780] system 00:09: ioport range 0x290-0x29f has been reserved
[ 0.823816] system 00:0a: iomem range 0xe0000000-0xefffffff has been reserved
[ 0.823852] system 00:0b: iomem range 0xfec00000-0xffffffff could not be
reserved
[ 0.828757] pci 0000:00:02.0: PCI bridge, secondary bus 0000:02
[ 0.828792] pci 0000:00:02.0: IO window: 0xc000-0xcfff
[ 0.828828] pci 0000:00:02.0: MEM window: 0xfdf00000-0xfdffffff
[ 0.828863] pci 0000:00:02.0: PREFETCH window:
0x000000d0000000-0x000000dfffffff
[ 0.828918] pci 0000:00:09.0: PCI bridge, secondary bus 0000:03
[ 0.828953] pci 0000:00:09.0: IO window: 0xd000-0xdfff
[ 0.828988] pci 0000:00:09.0: MEM window: 0xfe000000-0xfebfffff
[ 0.829023] pci 0000:00:09.0: PREFETCH window:
0x000000fa000000-0x000000fcefffff
[ 0.829078] pci 0000:00:0a.0: PCI bridge, secondary bus 0000:01
[ 0.829112] pci 0000:00:0a.0: IO window: 0xb000-0xbfff
[ 0.829148] pci 0000:00:0a.0: MEM window: 0xfde00000-0xfdefffff
[ 0.829183] pci 0000:00:0a.0: PREFETCH window:
0x000000cff00000-0x000000cfffffff
[ 0.829237] pci 0000:00:14.4: PCI bridge, secondary bus 0000:05
[ 0.829273] pci 0000:00:14.4: IO window: 0xe000-0xefff
[ 0.829310] pci 0000:00:14.4: MEM window: disabled
[ 0.829346] pci 0000:00:14.4: PREFETCH window: 0xfcf00000-0xfcffffff
[ 0.829387] alloc irq_desc for 18 on node -1
[ 0.829388] alloc kstat_irqs on node -1
[ 0.829392] pci 0000:00:02.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[ 0.829428] pci 0000:00:02.0: setting latency timer to 64
[ 0.829432] alloc irq_desc for 17 on node -1
[ 0.829433] alloc kstat_irqs on node -1
[ 0.829435] pci 0000:00:09.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[ 0.829471] pci 0000:00:09.0: setting latency timer to 64
[ 0.829474] pci 0000:00:0a.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[ 0.829509] pci 0000:00:0a.0: setting latency timer to 64
[ 0.829516] pci_bus 0000:00: resource 0 io: [0x00-0xffff]
[ 0.829517] pci_bus 0000:00: resource 1 mem: [0x000000-0xffffffffffffffff]
[ 0.829519] pci_bus 0000:02: resource 0 io: [0xc000-0xcfff]
[ 0.829521] pci_bus 0000:02: resource 1 mem: [0xfdf00000-0xfdffffff]
[ 0.829522] pci_bus 0000:02: resource 2 pref mem [0xd0000000-0xdfffffff]
[ 0.829523] pci_bus 0000:03: resource 0 io: [0xd000-0xdfff]
[ 0.829525] pci_bus 0000:03: resource 1 mem: [0xfe000000-0xfebfffff]
[ 0.829526] pci_bus 0000:03: resource 2 pref mem [0xfa000000-0xfcefffff]
[ 0.829527] pci_bus 0000:01: resource 0 io: [0xb000-0xbfff]
[ 0.829529] pci_bus 0000:01: resource 1 mem: [0xfde00000-0xfdefffff]
[ 0.829530] pci_bus 0000:01: resource 2 pref mem [0xcff00000-0xcfffffff]
[ 0.829531] pci_bus 0000:05: resource 0 io: [0xe000-0xefff]
[ 0.829533] pci_bus 0000:05: resource 2 pref mem [0xfcf00000-0xfcffffff]
[ 0.829534] pci_bus 0000:05: resource 3 io: [0x00-0xffff]
[ 0.829535] pci_bus 0000:05: resource 4 mem: [0x000000-0xffffffffffffffff]
[ 0.829547] NET: Registered protocol family 2
[ 0.829602] IP route cache hash table entries: 262144 (order: 9, 2097152
bytes)
[ 0.830084] TCP established hash table entries: 262144 (order: 10, 4194304
bytes)
[ 0.831067] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[ 0.831468] TCP: Hash tables configured (established 262144 bind 65536)
[ 0.831504] TCP reno registered
[ 0.831582] NET: Registered protocol family 1
[ 0.832803] Scanning for low memory corruption every 60 seconds
[ 0.833572] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[ 0.833696] Loading Reiser4. See http://www.namesys.com for a description of
Reiser4.
[ 0.833781] msgmni has been set to 15987
[ 0.834082] alg: No test for stdrng (krng)
[ 0.834123] async_tx: api initialized (sync-only)
[ 0.834227] Block layer SCSI generic (bsg) driver version 0.4 loaded (major
253)
[ 0.834280] io scheduler noop registered
[ 0.834315] io scheduler cfq registered (default)
[ 0.834448] pci 0000:02:00.0: Boot video device
[ 0.834538] alloc irq_desc for 25 on node -1
[ 0.834540] alloc kstat_irqs on node -1
[ 0.834545] pcieport-driver 0000:00:02.0: irq 25 for MSI/MSI-X
[ 0.834550] pcieport-driver 0000:00:02.0: setting latency timer to 64
[ 0.834642] alloc irq_desc for 26 on node -1
[ 0.834643] alloc kstat_irqs on node -1
[ 0.834646] pcieport-driver 0000:00:09.0: irq 26 for MSI/MSI-X
[ 0.834650] pcieport-driver 0000:00:09.0: setting latency timer to 64
[ 0.834741] alloc irq_desc for 27 on node -1
[ 0.834742] alloc kstat_irqs on node -1
[ 0.834744] pcieport-driver 0000:00:0a.0: irq 27 for MSI/MSI-X
[ 0.834748] pcieport-driver 0000:00:0a.0: setting latency timer to 64
[ 0.834953] input: Power Button as
/devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[ 0.835007] ACPI: Power Button [PWRF]
[ 0.835098] input: Power Button as
/devices/LNXSYSTM:00/device:00/PNP0C0C:00/input/input1
[ 0.835152] ACPI: Power Button [PWRB]
[ 0.835302] processor LNXCPU:00: registered as cooling_device0
[ 0.835337] ACPI: Processor [CPU0] (supports 8 throttling states)
[ 0.835434] processor LNXCPU:01: registered as cooling_device1
[ 0.835504] processor LNXCPU:02: registered as cooling_device2
[ 0.835577] processor LNXCPU:03: registered as cooling_device3
[ 0.839315] Linux agpgart interface v0.103
[ 0.839511] ahci 0000:00:11.0: version 3.0
[ 0.839521] alloc irq_desc for 22 on node -1
[ 0.839522] alloc kstat_irqs on node -1
[ 0.839525] ahci 0000:00:11.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
[ 0.839673] ahci 0000:00:11.0: AHCI 0001.0100 32 slots 6 ports 3 Gbps 0x3f
impl SATA mode
[ 0.839727] ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio
slum part
[ 0.840272] scsi0 : ahci
[ 0.840403] scsi1 : ahci
[ 0.840501] scsi2 : ahci
[ 0.840598] scsi3 : ahci
[ 0.840697] scsi4 : ahci
[ 0.840795] scsi5 : ahci
[ 0.840926] ata1: SATA max UDMA/133 irq_stat 0x00400000, PHY RDY changed
[ 0.840962] ata2: SATA max UDMA/133 abar m1024@0xfddff800 port 0xfddff980 irq
22
[ 0.841016] ata3: SATA max UDMA/133 abar m1024@0xfddff800 port 0xfddffa00 irq
22
[ 0.841070] ata4: SATA max UDMA/133 abar m1024@0xfddff800 port 0xfddffa80 irq
22
[ 0.841124] ata5: SATA max UDMA/133 abar m1024@0xfddff800 port 0xfddffb00 irq
22
[ 0.841178] ata6: SATA max UDMA/133 abar m1024@0xfddff800 port 0xfddffb80 irq
22
[ 0.841459] PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 1
[ 0.841493] PNP: PS/2 appears to have AUX port disabled, if this is
incorrect please boot with i8042.nopnp
[ 0.841943] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 0.842074] mice: PS/2 mouse device common for all mice
[ 0.842243] rtc_cmos 00:02: RTC can wake from S4
[ 0.842313] rtc_cmos 00:02: rtc core: registered rtc_cmos as rtc0
[ 0.842369] rtc0: alarms up to one month, y3k, 114 bytes nvram, hpet irqs
[ 0.842428] md: raid1 personality registered for level 1
[ 0.842462] md: raid6 personality registered for level 6
[ 0.842496] md: raid5 personality registered for level 5
[ 0.842530] md: raid4 personality registered for level 4
[ 0.843117] cpuidle: using governor ladder
[ 0.843151] cpuidle: using governor menu
[ 0.843689] usbcore: registered new interface driver hiddev
[ 0.843746] usbcore: registered new interface driver usbhid
[ 0.843780] usbhid: v2.6:USB HID core driver
[ 0.843843] Advanced Linux Sound Architecture Driver Version 1.0.20.
[ 0.843878] ALSA device list:
[ 0.843911] No soundcards found.
[ 0.843979] TCP cubic registered
[ 0.844019] NET: Registered protocol family 10
[ 0.844148] IPv6 over IPv4 tunneling driver
[ 0.844265] NET: Registered protocol family 17
[ 0.844315] powernow-k8: Found 1 AMD Phenom(tm) II X4 955 Processor
processors (4 cpu cores) (version 2.20.00)
[ 0.844391] powernow-k8: 0 : pstate 0 (3200 MHz)
[ 0.844425] powernow-k8: 1 : pstate 1 (2500 MHz)
[ 0.844459] powernow-k8: 2 : pstate 2 (2100 MHz)
[ 0.844492] powernow-k8: 3 : pstate 3 (800 MHz)
[ 0.844886] PM: Resume from disk failed.
[ 0.844966] Magic number: 9:648:116
[ 0.866018] input: AT Translated Set 2 keyboard as
/devices/platform/i8042/serio0/input/input2
[ 1.160036] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1.160097] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1.160150] ata6: SATA link down (SStatus 0 SControl 300)
[ 1.160205] ata5: SATA link down (SStatus 0 SControl 300)
[ 1.160259] ata3: SATA link down (SStatus 0 SControl 300)
[ 1.166391] ata4.00: ATA-7: SAMSUNG HD753LJ, 1AA01113, max UDMA7
[ 1.166432] ata4.00: 1465149168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 1.166480] ata2.00: ATA-7: SAMSUNG HD502IJ, 1AA01110, max UDMA7
[ 1.166514] ata2.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 1.172888] ata4.00: configured for UDMA/133
[ 1.172943] ata2.00: configured for UDMA/133
[ 1.560035] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1.566394] ata1.00: ATA-7: SAMSUNG HD502IJ, 1AA01109, max UDMA7
[ 1.566430] ata1.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 1.572855] ata1.00: configured for UDMA/133
[ 1.583424] scsi 0:0:0:0: Direct-Access ATA SAMSUNG HD502IJ 1AA0
PQ: 0 ANSI: 5
[ 1.583684] sd 0:0:0:0: [sda] 976773168 512-byte logical blocks: (500
GB/465 GiB)
[ 1.583756] sd 0:0:0:0: [sda] Write Protect is off
[ 1.583791] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 1.583800] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled,
doesn't support DPO or FUA
[ 1.583911] sda:
[ 1.583952] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 1.584094] scsi 1:0:0:0: Direct-Access ATA SAMSUNG HD502IJ 1AA0
PQ: 0 ANSI: 5
[ 1.584283] sd 1:0:0:0: [sdb] 976773168 512-byte logical blocks: (500
GB/465 GiB)
[ 1.584354] sd 1:0:0:0: [sdb] Write Protect is off
[ 1.584389] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 1.584398] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled,
doesn't support DPO or FUA
[ 1.584507] sdb:
[ 1.584577] sd 1:0:0:0: Attached scsi generic sg1 type 0
[ 1.584709] scsi 3:0:0:0: Direct-Access ATA SAMSUNG HD753LJ 1AA0
PQ: 0 ANSI: 5
[ 1.584865] sd 3:0:0:0: [sdc] 1465149168 512-byte logical blocks: (750
GB/698 GiB)
[ 1.584934] sd 3:0:0:0: [sdc] Write Protect is off
[ 1.584969] sd 3:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[ 1.584977] sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled,
doesn't support DPO or FUA
[ 1.585071] sdc:
[ 1.585123] sd 3:0:0:0: Attached scsi generic sg2 type 0
[ 1.588354] sdc1 sdc2 sdc3 sdc4 < sdb1 sdb2 sdb3 sdb4 < sda1 sda2 sda3
sda4 < sdb5 sdc5 sda5 sdc6 sdb6 >
[ 1.606850] sd 1:0:0:0: [sdb] Attached SCSI disk
[ 1.610449] sda6 >
[ 1.610814] sd 0:0:0:0: [sda] Attached SCSI disk
[ 1.617957] sdc7 >
[ 1.618341] sd 3:0:0:0: [sdc] Attached SCSI disk
[ 1.618382] md: Waiting for all devices to be available before autodetect
[ 1.618417] md: If you don't use raid, use raid=noautodetect
[ 1.618533] md: Autodetecting RAID arrays.
[ 1.724723] md: Scanned 12 and added 12 devices.
[ 1.724758] md: autorun ...
[ 1.724792] md: considering sdc6 ...
[ 1.724827] md: adding sdc6 ...
[ 1.724862] md: sdc5 has different UUID to sdc6
[ 1.724897] md: sdc3 has different UUID to sdc6
[ 1.724931] md: sdc1 has different UUID to sdc6
[ 1.724967] md: adding sda6 ...
[ 1.725001] md: sda5 has different UUID to sdc6
[ 1.725036] md: sda3 has different UUID to sdc6
[ 1.725070] md: sda1 has different UUID to sdc6
[ 1.725106] md: adding sdb6 ...
[ 1.725140] md: sdb5 has different UUID to sdc6
[ 1.725175] md: sdb3 has different UUID to sdc6
[ 1.725209] md: sdb1 has different UUID to sdc6
[ 1.725349] md: created md3
[ 1.725382] md: bind<sdb6>
[ 1.725419] md: bind<sda6>
[ 1.725456] md: bind<sdc6>
[ 1.725492] md: running: <sdc6><sda6><sdb6>
[ 1.725626] raid5: device sdc6 operational as raid disk 2
[ 1.725661] raid5: device sda6 operational as raid disk 0
[ 1.725695] raid5: device sdb6 operational as raid disk 1
[ 1.725846] raid5: allocated 3220kB for md3
[ 1.725910] raid5: raid level 5 set md3 active with 3 out of 3 devices,
algorithm 2
[ 1.725963] RAID5 conf printout:
[ 1.725996] --- rd:3 wd:3
[ 1.726029] disk 0, o:1, dev:sda6
[ 1.726062] disk 1, o:1, dev:sdb6
[ 1.726095] disk 2, o:1, dev:sdc6
[ 1.726142] md3: detected capacity change from 0 to 864065421312
[ 1.726213] md: considering sdc5 ...
[ 1.726249] md: adding sdc5 ...
[ 1.726283] md: sdc3 has different UUID to sdc5
[ 1.726318] md: sdc1 has different UUID to sdc5
[ 1.726353] md: adding sda5 ...
[ 1.726388] md: sda3 has different UUID to sdc5
[ 1.726422] md: sda1 has different UUID to sdc5
[ 1.726458] md: adding sdb5 ...
[ 1.726492] md: sdb3 has different UUID to sdc5
[ 1.726526] md: sdb1 has different UUID to sdc5
[ 1.726630] md: created md2
[ 1.726663] md: bind<sdb5>
[ 1.726700] md: bind<sda5>
[ 1.726738] md: bind<sdc5>
[ 1.726774] md: running: <sdc5><sda5><sdb5>
[ 1.726901] raid5: device sdc5 operational as raid disk 2
[ 1.726935] raid5: device sda5 operational as raid disk 0
[ 1.726969] raid5: device sdb5 operational as raid disk 1
[ 1.727126] raid5: allocated 3220kB for md2
[ 1.727190] raid5: raid level 5 set md2 active with 3 out of 3 devices,
algorithm 2
[ 1.727243] RAID5 conf printout:
[ 1.727276] --- rd:3 wd:3
[ 1.727309] disk 0, o:1, dev:sda5
[ 1.727342] disk 1, o:1, dev:sdb5
[ 1.727376] disk 2, o:1, dev:sdc5
[ 1.727420] md2: detected capacity change from 0 to 40007499776
[ 1.727490] md: considering sdc3 ...
[ 1.727526] md: adding sdc3 ...
[ 1.727560] md: sdc1 has different UUID to sdc3
[ 1.727595] md: adding sda3 ...
[ 1.727629] md: sda1 has different UUID to sdc3
[ 1.727664] md: adding sdb3 ...
[ 1.727698] md: sdb1 has different UUID to sdc3
[ 1.727799] md: created md1
[ 1.727832] md: bind<sdb3>
[ 1.727869] md: bind<sda3>
[ 1.727905] md: bind<sdc3>
[ 1.727945] md: running: <sdc3><sda3><sdb3>
[ 1.728090] raid5: device sdc3 operational as raid disk 2
[ 1.728125] raid5: device sda3 operational as raid disk 0
[ 1.728159] raid5: device sdb3 operational as raid disk 1
[ 1.728320] raid5: allocated 3220kB for md1
[ 1.728370] raid5: raid level 5 set md1 active with 3 out of 3 devices,
algorithm 2
[ 1.728423] RAID5 conf printout:
[ 1.728455] --- rd:3 wd:3
[ 1.728488] disk 0, o:1, dev:sda3
[ 1.728522] disk 1, o:1, dev:sdb3
[ 1.728555] disk 2, o:1, dev:sdc3
[ 1.728604] md1: detected capacity change from 0 to 79998877696
[ 1.728674] md: considering sdc1 ...
[ 1.728710] md: adding sdc1 ...
[ 1.728745] md: adding sda1 ...
[ 1.728779] md: adding sdb1 ...
[ 1.728813] md: created md0
[ 1.728846] md: bind<sdb1>
[ 1.728882] md: bind<sda1>
[ 1.728919] md: bind<sdc1>
[ 1.728955] md: running: <sdc1><sda1><sdb1>
[ 1.729133] raid1: raid set md0 active with 3 out of 3 mirrors
[ 1.729176] md0: detected capacity change from 0 to 65667072
[ 1.729232] md: ... autorun DONE.
[ 1.729284] md: Loading md3: /dev/sda3
[ 1.729322] md3: unknown partition table
[ 1.729481] md: couldn't update array info. -22
[ 1.729518] md: could not bd_claim sda3.
[ 1.729552] md: md_import_device returned -16
[ 1.729588] md: could not bd_claim sdb3.
[ 1.729621] md: md_import_device returned -16
[ 1.729657] md: could not bd_claim sdc3.
[ 1.729690] md: md_import_device returned -16
[ 1.729725] md: starting md3 failed
[ 1.729800] md1: unknown partition table
[ 1.767199] reiser4: md1: found disk format 4.0.0.
[ 5.790318] VFS: Mounted root (reiser4 filesystem) readonly on device 9:1.
[ 5.790370] Freeing unused kernel memory: 376k freed
[ 9.037775] udev: starting version 145
[ 9.217043] md2:
[ 9.217072] md0: unknown partition table
[ 9.282015] unknown partition table
[ 10.420576] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[ 10.420591] r8169 0000:01:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[ 10.420840] r8169 0000:01:00.0: setting latency timer to 64
[ 10.420871] alloc irq_desc for 28 on node -1
[ 10.420872] alloc kstat_irqs on node -1
[ 10.420882] r8169 0000:01:00.0: irq 28 for MSI/MSI-X
[ 10.420988] eth0: RTL8168c/8111c at 0xffffc90012f18000, 00:19:66:86:ce:12,
XID 3c4000c0 IRQ 28
[ 10.440462] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[ 10.440465] ehci_hcd: block sizes: qh 192 qtd 96 itd 192 sitd 96
[ 10.440491] ehci_hcd 0000:00:12.2: PCI INT B -> GSI 17 (level, low) -> IRQ
17
[ 10.440518] ehci_hcd 0000:00:12.2: EHCI Host Controller
[ 10.440532] drivers/usb/core/inode.c: creating file 'devices'
[ 10.440534] drivers/usb/core/inode.c: creating file '001'
[ 10.440566] ehci_hcd 0000:00:12.2: new USB bus registered, assigned bus
number 1
[ 10.440572] ehci_hcd 0000:00:12.2: reset hcs_params 0x102306 dbg=1 cc=2
pcc=3 ordered !ppc ports=6
[ 10.440575] ehci_hcd 0000:00:12.2: reset hcc_params a072 thresh 7 uframes
256/512/1024
[ 10.440596] ehci_hcd 0000:00:12.2: applying AMD SB600/SB700 USB freeze
workaround
[ 10.440602] ehci_hcd 0000:00:12.2: reset command 080002 (park)=0 ithresh=8
period=1024 Reset HALT
[ 10.440615] ehci_hcd 0000:00:12.2: debug port 1
[ 10.440619] ehci_hcd 0000:00:12.2: MWI active
[ 10.440620] ehci_hcd 0000:00:12.2: supports USB remote wakeup
[ 10.440633] ehci_hcd 0000:00:12.2: irq 17, io mem 0xfddff000
[ 10.440637] ehci_hcd 0000:00:12.2: reset command 080002 (park)=0 ithresh=8
period=1024 Reset HALT
[ 10.440642] ehci_hcd 0000:00:12.2: init command 010009 (park)=0 ithresh=1
period=256 RUN
[ 10.447931] ehci_hcd 0000:00:12.2: USB 2.0 started, EHCI 1.00
[ 10.447961] usb usb1: default language 0x0409
[ 10.447965] usb usb1: udev 1, busnum 1, minor = 0
[ 10.447967] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[ 10.447968] usb usb1: New USB device strings: Mfr=3, Product=2,
SerialNumber=1
[ 10.447969] usb usb1: Product: EHCI Host Controller
[ 10.447970] usb usb1: Manufacturer: Linux 2.6.31r4 ehci_hcd
[ 10.447971] usb usb1: SerialNumber: 0000:00:12.2
[ 10.447998] usb usb1: uevent
[ 10.448007] usb usb1: usb_probe_device
[ 10.448009] usb usb1: configuration #1 chosen from 1 choice
[ 10.448014] usb usb1: adding 1-0:1.0 (config #1, interface 0)
[ 10.448021] usb 1-0:1.0: uevent
[ 10.448028] hub 1-0:1.0: usb_probe_interface
[ 10.448029] hub 1-0:1.0: usb_probe_interface - got id
[ 10.448031] hub 1-0:1.0: USB hub found
[ 10.448035] hub 1-0:1.0: 6 ports detected
[ 10.448036] hub 1-0:1.0: standalone hub
[ 10.448037] hub 1-0:1.0: no power switching (usb 1.0)
[ 10.448038] hub 1-0:1.0: individual port over-current protection
[ 10.448039] hub 1-0:1.0: power on to power good time: 20ms
[ 10.448042] hub 1-0:1.0: local power source is good
[ 10.448043] hub 1-0:1.0: trying to enable port power on non-switchable hub
[ 10.448067] drivers/usb/core/inode.c: creating file '001'
[ 10.448085] alloc irq_desc for 19 on node -1
[ 10.448087] alloc kstat_irqs on node -1
[ 10.448091] ehci_hcd 0000:00:13.2: PCI INT B -> GSI 19 (level, low) -> IRQ
19
[ 10.448101] ehci_hcd 0000:00:13.2: EHCI Host Controller
[ 10.448105] drivers/usb/core/inode.c: creating file '002'
[ 10.448122] ehci_hcd 0000:00:13.2: new USB bus registered, assigned bus
number 2
[ 10.448127] ehci_hcd 0000:00:13.2: reset hcs_params 0x102306 dbg=1 cc=2
pcc=3 ordered !ppc ports=6
[ 10.448130] ehci_hcd 0000:00:13.2: reset hcc_params a072 thresh 7 uframes
256/512/1024
[ 10.448143] ehci_hcd 0000:00:13.2: applying AMD SB600/SB700 USB freeze
workaround
[ 10.448148] ehci_hcd 0000:00:13.2: reset command 080002 (park)=0 ithresh=8
period=1024 Reset HALT
[ 10.448161] ehci_hcd 0000:00:13.2: debug port 1
[ 10.448164] ehci_hcd 0000:00:13.2: MWI active
[ 10.448165] ehci_hcd 0000:00:13.2: supports USB remote wakeup
[ 10.448173] ehci_hcd 0000:00:13.2: irq 19, io mem 0xfddf6800
[ 10.448176] ehci_hcd 0000:00:13.2: reset command 080002 (park)=0 ithresh=8
period=1024 Reset HALT
[ 10.448181] ehci_hcd 0000:00:13.2: init command 010009 (park)=0 ithresh=1
period=256 RUN
[ 10.457930] ehci_hcd 0000:00:13.2: USB 2.0 started, EHCI 1.00
[ 10.457945] usb usb2: default language 0x0409
[ 10.457949] usb usb2: udev 1, busnum 2, minor = 128
[ 10.457950] usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
[ 10.457952] usb usb2: New USB device strings: Mfr=3, Product=2,
SerialNumber=1
[ 10.457953] usb usb2: Product: EHCI Host Controller
[ 10.457954] usb usb2: Manufacturer: Linux 2.6.31r4 ehci_hcd
[ 10.457955] usb usb2: SerialNumber: 0000:00:13.2
[ 10.457977] usb usb2: uevent
[ 10.457984] usb usb2: usb_probe_device
[ 10.457986] usb usb2: configuration #1 chosen from 1 choice
[ 10.457989] usb usb2: adding 2-0:1.0 (config #1, interface 0)
[ 10.457997] usb 2-0:1.0: uevent
[ 10.458003] hub 2-0:1.0: usb_probe_interface
[ 10.458004] hub 2-0:1.0: usb_probe_interface - got id
[ 10.458005] hub 2-0:1.0: USB hub found
[ 10.458009] hub 2-0:1.0: 6 ports detected
[ 10.458010] hub 2-0:1.0: standalone hub
[ 10.458010] hub 2-0:1.0: no power switching (usb 1.0)
[ 10.458011] hub 2-0:1.0: individual port over-current protection
[ 10.458013] hub 2-0:1.0: power on to power good time: 20ms
[ 10.458015] hub 2-0:1.0: local power source is good
[ 10.458016] hub 2-0:1.0: trying to enable port power on non-switchable hub
[ 10.458038] drivers/usb/core/inode.c: creating file '001'
[ 10.474718] Linux video capture interface: v2.00
[ 10.489875] bttv: driver version 0.9.18 loaded
[ 10.489877] bttv: using 8 buffers with 2080k (520 pages) each for capture
[ 10.489914] bttv: Bt8xx card found (0).
[ 10.489923] alloc irq_desc for 21 on node -1
[ 10.489924] alloc kstat_irqs on node -1
[ 10.489928] bttv 0000:05:06.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21
[ 10.489938] bttv0: Bt848 (rev 18) at 0000:05:06.0, irq: 21, latency: 128,
mmio: 0xfcfff000
[ 10.489971] bttv0: using: Terratec TerraTV+ Version 1.0 (Bt848)/ Terra
TValue Version 1.0/ Vobis TV-Boostar [card=25,insmod option]
[ 10.489974] IRQ 21/bttv0: IRQF_DISABLED is not guaranteed on shared IRQs
[ 10.490007] bttv0: gpio: en=00000000, out=00000000 in=00ffffff [init]
[ 10.547936] ehci_hcd 0000:00:12.2: GetStatus port 2 status 001803 POWER
sig=j CSC CONNECT
[ 10.547939] hub 1-0:1.0: port 2: status 0501 change 0001
[ 10.557954] hub 2-0:1.0: state 7 ports 6 chg 0000 evt 0000
[ 10.647931] hub 1-0:1.0: state 7 ports 6 chg 0004 evt 0000
[ 10.647940] hub 1-0:1.0: port 2, status 0501, change 0000, 480 Mb/s
[ 10.700045] ehci_hcd 0000:00:12.2: port 2 high speed
[ 10.700049] ehci_hcd 0000:00:12.2: GetStatus port 2 status 001005 POWER
sig=se0 PE CONNECT
[ 10.757104] usb 1-2: new high speed USB device using ehci_hcd and address 2
[ 10.810036] ehci_hcd 0000:00:12.2: port 2 high speed
[ 10.810039] ehci_hcd 0000:00:12.2: GetStatus port 2 status 001005 POWER
sig=se0 PE CONNECT
[ 10.881474] usb 1-2: default language 0x0409
[ 10.881724] usb 1-2: udev 2, busnum 1, minor = 1
[ 10.881725] usb 1-2: New USB device found, idVendor=05e3, idProduct=0608
[ 10.881726] usb 1-2: New USB device strings: Mfr=0, Product=1,
SerialNumber=0
[ 10.881728] usb 1-2: Product: USB2.0 Hub
[ 10.881764] usb 1-2: uevent
[ 10.881773] usb 1-2: usb_probe_device
[ 10.881775] usb 1-2: configuration #1 chosen from 1 choice
[ 10.882188] usb 1-2: adding 1-2:1.0 (config #1, interface 0)
[ 10.882199] usb 1-2:1.0: uevent
[ 10.882206] hub 1-2:1.0: usb_probe_interface
[ 10.882207] hub 1-2:1.0: usb_probe_interface - got id
[ 10.882209] hub 1-2:1.0: USB hub found
[ 10.882473] hub 1-2:1.0: 4 ports detected
[ 10.882474] hub 1-2:1.0: standalone hub
[ 10.882476] hub 1-2:1.0: individual port power switching
[ 10.882477] hub 1-2:1.0: individual port over-current protection
[ 10.882478] hub 1-2:1.0: Single TT
[ 10.882479] hub 1-2:1.0: TT requires at most 32 FS bit times (2664 ns)
[ 10.882480] hub 1-2:1.0: Port indicators are supported
[ 10.882481] hub 1-2:1.0: power on to power good time: 100ms
[ 10.882848] hub 1-2:1.0: local power source is good
[ 10.882849] hub 1-2:1.0: enabling power on all ports
[ 10.883860] drivers/usb/core/inode.c: creating file '002'
[ 10.983961] hub 1-2:1.0: port 2: status 0301 change 0001
[ 11.083356] usb 1-2: link qh256-0001/ffff8800c7800180 start 1 [1/0 us]
[ 11.083364] hub 1-2:1.0: state 7 ports 4 chg 0004 evt 0000
[ 11.083700] hub 1-2:1.0: port 2, status 0301, change 0000, 1.5 Mb/s
[ 11.152198] usb 1-2.2: new low speed USB device using ehci_hcd and address
3
[ 11.241187] usb 1-2.2: skipped 1 descriptor after interface
[ 11.241189] usb 1-2.2: skipped 1 descriptor after interface
[ 11.241685] usb 1-2.2: default language 0x0409
[ 11.243941] usb 1-2.2: udev 3, busnum 1, minor = 2
[ 11.243943] usb 1-2.2: New USB device found, idVendor=046d, idProduct=c518
[ 11.243944] usb 1-2.2: New USB device strings: Mfr=1, Product=2,
SerialNumber=0
[ 11.243945] usb 1-2.2: Product: USB Receiver
[ 11.243947] usb 1-2.2: Manufacturer: Logitech
[ 11.243972] usb 1-2.2: uevent
[ 11.243980] usb 1-2.2: usb_probe_device
[ 11.243981] usb 1-2.2: configuration #1 chosen from 1 choice
[ 11.251309] usb 1-2.2: adding 1-2.2:1.0 (config #1, interface 0)
[ 11.251324] usb 1-2.2:1.0: uevent
[ 11.251334] usbhid 1-2.2:1.0: usb_probe_interface
[ 11.251335] usbhid 1-2.2:1.0: usb_probe_interface - got id
[ 11.251796] usb 1-2: clear tt buffer port 2, a3 ep0 t80008d42
[ 11.254692] input: Logitech USB Receiver as
/devices/pci0000:00/0000:00:12.2/usb1/1-2/1-2.2/1-2.2:1.0/input/input3
[ 11.254732] generic-usb 0003:046D:C518.0001: input,hidraw0: USB HID v1.11
Mouse [Logitech USB Receiver] on usb-0000:00:12.2-2.2/input0
[ 11.254740] usb 1-2.2: adding 1-2.2:1.1 (config #1, interface 1)
[ 11.254749] usb 1-2.2:1.1: uevent
[ 11.254755] usbhid 1-2.2:1.1: usb_probe_interface
[ 11.254757] usbhid 1-2.2:1.1: usb_probe_interface - got id
[ 11.255046] usb 1-2: clear tt buffer port 2, a3 ep0 t80008d42
[ 11.260359] input: Logitech USB Receiver as
/devices/pci0000:00/0000:00:12.2/usb1/1-2/1-2.2/1-2.2:1.1/input/input4
[ 11.260368] usb 1-2.2: link qh8-0601/ffff8800c7800300 start 2 [1/2 us]
[ 11.260384] drivers/usb/core/file.c: looking for a minor, starting at 96
[ 11.260414] generic-usb 0003:046D:C518.0002: input,hiddev96,hidraw1: USB
HID v1.11 Device [Logitech USB Receiver] on usb-0000:00:12.2-2.2/input1
[ 11.260427] drivers/usb/core/inode.c: creating file '003'
[ 11.260438] hub 1-2:1.0: state 7 ports 4 chg 0000 evt 0004
[ 11.280770] usb 1-2.2:1.0: uevent
[ 11.280827] usb 1-2.2: uevent
[ 11.280844] usb 1-2.2:1.0: uevent
[ 11.280903] usb 1-2.2: uevent
[ 11.281405] usb 1-2.2:1.1: uevent
[ 11.281467] usb 1-2.2: uevent
[ 11.281517] usb 1-2.2:1.0: uevent
[ 11.281529] usb 1-2.2:1.0: uevent
[ 11.282164] usb 1-2.2:1.1: uevent
[ 11.493831] bttv0: tea5757: read timeout
[ 11.493832] bttv0: tuner type=5
[ 11.505445] bttv0: audio absent, no audio device found!
[ 11.525030] TUNER: Unable to find symbol tea5767_autodetection()
[ 11.525033] tuner 0-0060: chip found @ 0xc0 (bt848 #0 [sw])
[ 11.527843] tuner-simple 0-0060: creating new instance
[ 11.527846] tuner-simple 0-0060: type set to 5 (Philips PAL_BG (FI1216 and
compatibles))
[ 11.528664] bttv0: registered device video0
[ 11.528695] bttv0: registered device vbi0
[ 11.814508] reiser4: md2: found disk format 4.0.0.
[ 13.337101] hub 2-0:1.0: hub_suspend
[ 13.337107] usb usb2: bus auto-suspend
[ 13.337109] ehci_hcd 0000:00:13.2: suspend root hub
[ 13.591267] reiser4: md3: found disk format 4.0.0.
[ 55.020583] reiser4: sdc7: found disk format 4.0.0.
[ 67.386552] Adding 7815612k swap on /dev/sda2. Priority:1 extents:1
across:7815612k
[ 67.408915] Adding 7815612k swap on /dev/sdb2. Priority:1 extents:1
across:7815612k
[ 67.501498] Adding 7815612k swap on /dev/sdc2. Priority:1 extents:1
across:7815612k
[ 68.205087] w83627ehf: Found W83627EHG chip at 0x290
[ 68.421646] r8169: eth0: link up
[ 68.421650] r8169: eth0: link up
[ 70.281241] usb usb1: uevent
[ 70.281269] usb 1-0:1.0: uevent
[ 70.281293] usb 1-2: uevent
[ 70.281320] usb 1-2.2: uevent
[ 70.281346] usb 1-2.2:1.0: uevent
[ 70.281533] usb 1-2.2:1.1: uevent
[ 70.281706] usb 1-2:1.0: uevent
[ 70.281804] usb usb2: uevent
[ 70.281830] usb 2-0:1.0: uevent
[ 71.272886] alloc irq_desc for 23 on node -1
[ 71.272889] alloc kstat_irqs on node -1
[ 71.272895] EMU10K1_Audigy 0000:05:08.0: PCI INT A -> GSI 23 (level, low) -
> IRQ 23
[ 71.278891] Audigy2 value: Special config.
[ 72.473408] fglrx: module license 'Proprietary. (C) 2002 - ATI
Technologies, Starnberg, GERMANY' taints kernel.
[ 72.473417] Disabling lock debugging due to kernel taint
[ 72.490481] [fglrx] Maximum main memory to use for locked dma buffers: 7760
MBytes.
[ 72.490559] [fglrx] vendor: 1002 device: 9501 count: 1
[ 72.490739] [fglrx] ioport: bar 4, base 0xc000, size: 0x100
[ 72.490750] pci 0000:02:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[ 72.490754] pci 0000:02:00.0: setting latency timer to 64
[ 72.490884] [fglrx] Kernel PAT support is enabled
[ 72.490901] [fglrx] module loaded - fglrx 8.66.2 [Sep 1 2009] with 1
minors
[ 72.668413] alloc irq_desc for 29 on node -1
[ 72.668416] alloc kstat_irqs on node -1
[ 72.668424] fglrx_pci 0000:02:00.0: irq 29 for MSI/MSI-X
[ 72.668759] [fglrx] Firegl kernel thread PID: 3800
[ 74.787429] [fglrx] Gart USWC size:1279 M.
[ 74.787431] [fglrx] Gart cacheable size:508 M.
[ 74.787435] [fglrx] Reserved FB block: Shared offset:0, size:1000000
[ 74.787437] [fglrx] Reserved FB block: Unshared offset:fbff000, size:401000
[ 74.787438] [fglrx] Reserved FB block: Unshared offset:1fffc000, size:4000
[ 75.947153] usb 1-2.2: link qh8-0601/ffff8800c78003c0 start 3 [1/2 us]
[ 616.849440] reiser4[ktxnmgrd:md1:ru(581)]: disable_write_barrier
(fs/reiser4/wander.c:235)[zam-1055]:
[ 616.849445] NOTICE: md1 does not support write barriers, using synchronous
write instead.
[ 671.813536] reiser4[ktxnmgrd:md2:ru(2774)]: disable_write_barrier
(fs/reiser4/wander.c:235)[zam-1055]:
[ 671.813541] NOTICE: md2 does not support write barriers, using synchronous
write instead.
[ 703.842289] reiser4[ktxnmgrd:md3:ru(2776)]: disable_write_barrier
(fs/reiser4/wander.c:235)[zam-1055]:
[ 703.842293] NOTICE: md3 does not support write barriers, using synchronous
write instead.

PS: I have to disable C1E in the BIOS, or video is very jerky - unless
something CPU-heavy runs on at least one core...




Attachments:
.config (53.96 kB)

2009-09-12 07:37:47

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

(Volker stripped all CCs from his posts; I restored them manually.)

On 09/11/2009 09:33 PM, Volker Armin Hemmann wrote:
> Hi,
>
> this is with 2.6.31+reiser4+fglrx
> Phenom II X4 955
>
> KDE 4.3.1, compositing temporarily disabled.
> tvtime running.
>
> load:
> fat emerge with make -j5 running in one konsole tab (xulrunner being
> compiled).
>
> without NO_NEW_FAIR_SLEEPERS:
>
> tvtime is smooth most of the time
>
> with NO_NEW_FAIR_SLEEPERS:
>
> tvtime is more jerky. Very visible in scenes with movement.

Is the make -j5 running at nice 0? If yes, that would actually be the
correct behavior. Unfortunately, I can't test tvtime specifically (I
don't have a TV card), but other applications displaying video continue
to work smoothly on my dual core machine (Core 2 Duo E6600) even if I
do "nice -n 19 make -j20". If I don't nice it, the video is skippy
here too, though.
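
For reference, here is a rough sketch of the kind of test I mean - run
only one of the two build variants at a time; the kernel tree path, the
-j values and the clip name are just placeholders, adjust for your box:

  # variant A: un-niced background build (nice 0) - video gets skippy here
  cd ~/src/linux-2.6.31
  make -j20 > build.log 2>&1 &

  # variant B: the same build niced to 19 - video stays smooth for me
  nice -n 19 make -j20 > build.log 2>&1 &

  # in another terminal, run something latency-sensitive and watch for
  # dropped or late frames, e.g.:
  mplayer some-clip.mkv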

Question to Ingo:
Would posting perf results help in any way with finding differences
between mainline NEW_FAIR_SLEEPERS/NO_NEW_FAIR_SLEEPERS and BFS?
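
In case raw numbers would help, this is roughly what I could collect
(assuming CONFIG_SCHED_DEBUG so the sched_features knob is available -
that switch only exists on mainline, BFS has no equivalent - and a perf
binary built from the same tree; the 10 second window is arbitrary):

  # check and flip the fair-sleepers feature at runtime (mainline CFS)
  mount -t debugfs none /sys/kernel/debug 2>/dev/null
  cat /sys/kernel/debug/sched_features
  echo NO_NEW_FAIR_SLEEPERS > /sys/kernel/debug/sched_features  # disable
  echo NEW_FAIR_SLEEPERS > /sys/kernel/debug/sched_features     # restore

  # system-wide counters and a profile while the load + video are running
  perf stat -a sleep 10
  perf record -a sleep 10
  perf report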


> without background load:
>
> both settings act the same. tvtime is smooth, video is smooth, games are nice.
> No real difference.
>
> config is attached.
>
> Glück Auf,
> Volker
>
> dmesg:
>
> [ 0.000000] Linux version 2.6.31r4 (root@energy) (gcc version 4.4.1 (Gentoo
> 4.4.1 p1.0) ) #1 SMP Thu Sep 10 10:48:07 CEST 2009
> [ 0.000000] Command line: root=/dev/md1 md=3,/dev/sda3,/dev/sdb3,/dev/sdc3
> nmi_watchdog=0 mtrr_spare_reg_nr=1
> [ 0.000000] KERNEL supported cpus:
> [ 0.000000] AMD AuthenticAMD
> [ 0.000000] BIOS-provided physical RAM map:
> [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
> [ 0.000000] BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
> [ 0.000000] BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
> [ 0.000000] BIOS-e820: 0000000000100000 - 00000000c7eb0000 (usable)
> [ 0.000000] BIOS-e820: 00000000c7eb0000 - 00000000c7ec0000 (ACPI data)
> [ 0.000000] BIOS-e820: 00000000c7ec0000 - 00000000c7ef0000 (ACPI NVS)
> [ 0.000000] BIOS-e820: 00000000c7ef0000 - 00000000c7f00000 (reserved)
> [ 0.000000] BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
> [ 0.000000] BIOS-e820: 0000000100000000 - 0000000238000000 (usable)
> [ 0.000000] DMI present.
> [ 0.000000] AMI BIOS detected: BIOS may corrupt low RAM, working around it.
> [ 0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable)
> ==> (reserved)
> [ 0.000000] last_pfn = 0x238000 max_arch_pfn = 0x400000000
> [ 0.000000] MTRR default type: uncachable
> [ 0.000000] MTRR fixed ranges enabled:
> [ 0.000000] 00000-9FFFF write-back
> [ 0.000000] A0000-EFFFF uncachable
> [ 0.000000] F0000-FFFFF write-protect
> [ 0.000000] MTRR variable ranges enabled:
> [ 0.000000] 0 base 000000000000 mask FFFF80000000 write-back
> [ 0.000000] 1 base 000080000000 mask FFFFC0000000 write-back
> [ 0.000000] 2 base 0000C0000000 mask FFFFF8000000 write-back
> [ 0.000000] 3 disabled
> [ 0.000000] 4 disabled
> [ 0.000000] 5 disabled
> [ 0.000000] 6 disabled
> [ 0.000000] 7 disabled
> [ 0.000000] TOM2: 0000000238000000 aka 9088M
> [ 0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new
> 0x7010600070106
> [ 0.000000] e820 update range: 00000000c8000000 - 0000000100000000 (usable)
> ==> (reserved)
> [ 0.000000] last_pfn = 0xc7eb0 max_arch_pfn = 0x400000000
> [ 0.000000] Scanning 0 areas for low memory corruption
> [ 0.000000] modified physical RAM map:
> [ 0.000000] modified: 0000000000000000 - 0000000000010000 (reserved)
> [ 0.000000] modified: 0000000000010000 - 000000000009fc00 (usable)
> [ 0.000000] modified: 000000000009fc00 - 00000000000a0000 (reserved)
> [ 0.000000] modified: 00000000000e6000 - 0000000000100000 (reserved)
> [ 0.000000] modified: 0000000000100000 - 00000000c7eb0000 (usable)
> [ 0.000000] modified: 00000000c7eb0000 - 00000000c7ec0000 (ACPI data)
> [ 0.000000] modified: 00000000c7ec0000 - 00000000c7ef0000 (ACPI NVS)
> [ 0.000000] modified: 00000000c7ef0000 - 00000000c7f00000 (reserved)
> [ 0.000000] modified: 00000000fff00000 - 0000000100000000 (reserved)
> [ 0.000000] modified: 0000000100000000 - 0000000238000000 (usable)
> [ 0.000000] initial memory mapped : 0 - 20000000
> [ 0.000000] Using GB pages for direct mapping
> [ 0.000000] init_memory_mapping: 0000000000000000-00000000c7eb0000
> [ 0.000000] 0000000000 - 00c0000000 page 1G
> [ 0.000000] 00c0000000 - 00c7e00000 page 2M
> [ 0.000000] 00c7e00000 - 00c7eb0000 page 4k
> [ 0.000000] kernel direct mapping tables up to c7eb0000 @ 10000-13000
> [ 0.000000] init_memory_mapping: 0000000100000000-0000000238000000
> [ 0.000000] 0100000000 - 0200000000 page 1G
> [ 0.000000] 0200000000 - 0238000000 page 2M
> [ 0.000000] kernel direct mapping tables up to 238000000 @ 12000-14000
> [ 0.000000] ACPI: RSDP 00000000000fa7c0 00014 (v00 ACPIAM)
> [ 0.000000] ACPI: RSDT 00000000c7eb0000 00040 (v01 050609 RSDT2000 20090506
> MSFT 00000097)
> [ 0.000000] ACPI: FACP 00000000c7eb0200 00084 (v02 A M I OEMFACP 12000601
> MSFT 00000097)
> [ 0.000000] ACPI: DSDT 00000000c7eb0440 08512 (v01 AS140 AS140121 00000121
> INTL 20051117)
> [ 0.000000] ACPI: FACS 00000000c7ec0000 00040
> [ 0.000000] ACPI: APIC 00000000c7eb0390 0006C (v01 050609 APIC2000 20090506
> MSFT 00000097)
> [ 0.000000] ACPI: MCFG 00000000c7eb0400 0003C (v01 050609 OEMMCFG 20090506
> MSFT 00000097)
> [ 0.000000] ACPI: OEMB 00000000c7ec0040 00071 (v01 050609 OEMB2000 20090506
> MSFT 00000097)
> [ 0.000000] ACPI: AAFT 00000000c7eb8960 00027 (v01 050609 OEMAAFT 20090506
> MSFT 00000097)
> [ 0.000000] ACPI: HPET 00000000c7eb8990 00038 (v01 050609 OEMHPET 20090506
> MSFT 00000097)
> [ 0.000000] ACPI: SSDT 00000000c7eb89d0 0088C (v01 A M I POWERNOW 00000001
> AMD 00000001)
> [ 0.000000] ACPI: Local APIC address 0xfee00000
> [ 0.000000] (7 early reservations) ==> bootmem [0000000000 - 0238000000]
> [ 0.000000] #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000
> - 0000001000]
> [ 0.000000] #1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000
> - 0000008000]
> [ 0.000000] #2 [0001000000 - 00015fb8c0] TEXT DATA BSS ==> [0001000000
> - 00015fb8c0]
> [ 0.000000] #3 [000009fc00 - 0000100000] BIOS reserved ==> [000009fc00
> - 0000100000]
> [ 0.000000] #4 [00015fc000 - 00015fc133] BRK ==> [00015fc000
> - 00015fc133]
> [ 0.000000] #5 [0000010000 - 0000012000] PGTABLE ==> [0000010000
> - 0000012000]
> [ 0.000000] #6 [0000012000 - 0000013000] PGTABLE ==> [0000012000
> - 0000013000]
> [ 0.000000] [ffffea0000000000-ffffea0007dfffff] PMD -> [ffff880028600000-
> ffff88002f7fffff] on node 0
> [ 0.000000] Zone PFN ranges:
> [ 0.000000] DMA 0x00000010 -> 0x00001000
> [ 0.000000] DMA32 0x00001000 -> 0x00100000
> [ 0.000000] Normal 0x00100000 -> 0x00238000
> [ 0.000000] Movable zone start PFN for each node
> [ 0.000000] early_node_map[3] active PFN ranges
> [ 0.000000] 0: 0x00000010 -> 0x0000009f
> [ 0.000000] 0: 0x00000100 -> 0x000c7eb0
> [ 0.000000] 0: 0x00100000 -> 0x00238000
> [ 0.000000] On node 0 totalpages: 2096703
> [ 0.000000] DMA zone: 56 pages used for memmap
> [ 0.000000] DMA zone: 102 pages reserved
> [ 0.000000] DMA zone: 3825 pages, LIFO batch:0
> [ 0.000000] DMA32 zone: 14280 pages used for memmap
> [ 0.000000] DMA32 zone: 800488 pages, LIFO batch:31
> [ 0.000000] Normal zone: 17472 pages used for memmap
> [ 0.000000] Normal zone: 1260480 pages, LIFO batch:31
> [ 0.000000] ACPI: PM-Timer IO Port: 0x808
> [ 0.000000] ACPI: Local APIC address 0xfee00000
> [ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
> [ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
> [ 0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
> [ 0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
> [ 0.000000] ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
> [ 0.000000] IOAPIC[0]: apic_id 4, version 33, address 0xfec00000, GSI 0-23
> [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
> [ 0.000000] ACPI: IRQ0 used by override.
> [ 0.000000] ACPI: IRQ2 used by override.
> [ 0.000000] ACPI: IRQ9 used by override.
> [ 0.000000] Using ACPI (MADT) for SMP configuration information
> [ 0.000000] ACPI: HPET id: 0x8300 base: 0xfed00000
> [ 0.000000] SMP: Allowing 4 CPUs, 0 hotplug CPUs
> [ 0.000000] nr_irqs_gsi: 24
> [ 0.000000] PM: Registered nosave memory: 000000000009f000 -
> 00000000000a0000
> [ 0.000000] PM: Registered nosave memory: 00000000000a0000 -
> 00000000000e6000
> [ 0.000000] PM: Registered nosave memory: 00000000000e6000 -
> 0000000000100000
> [ 0.000000] PM: Registered nosave memory: 00000000c7eb0000 -
> 00000000c7ec0000
> [ 0.000000] PM: Registered nosave memory: 00000000c7ec0000 -
> 00000000c7ef0000
> [ 0.000000] PM: Registered nosave memory: 00000000c7ef0000 -
> 00000000c7f00000
> [ 0.000000] PM: Registered nosave memory: 00000000c7f00000 -
> 00000000fff00000
> [ 0.000000] PM: Registered nosave memory: 00000000fff00000 -
> 0000000100000000
> [ 0.000000] Allocating PCI resources starting at c7f00000 (gap:
> c7f00000:38000000)
> [ 0.000000] NR_CPUS:4 nr_cpumask_bits:4 nr_cpu_ids:4 nr_node_ids:1
> [ 0.000000] PERCPU: Embedded 25 pages at ffff880028034000, static data 72160
> bytes
> [ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total
> pages: 2064793
> [ 0.000000] Kernel command line: root=/dev/md1
> md=3,/dev/sda3,/dev/sdb3,/dev/sdc3 nmi_watchdog=0 mtrr_spare_reg_nr=1
> [ 0.000000] md: Will configure md3 (super-block) from
> /dev/sda3,/dev/sdb3,/dev/sdc3, below.
> [ 0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
> [ 0.000000] Dentry cache hash table entries: 1048576 (order: 11, 8388608
> bytes)
> [ 0.000000] Inode-cache hash table entries: 524288 (order: 10, 4194304
> bytes)
> [ 0.000000] Initializing CPU#0
> [ 0.000000] Checking aperture...
> [ 0.000000] No AGP bridge found
> [ 0.000000] Node 0: aperture @ 2a42000000 size 32 MB
> [ 0.000000] Aperture beyond 4GB. Ignoring.
> [ 0.000000] Your BIOS doesn't leave a aperture memory hole
> [ 0.000000] Please enable the IOMMU option in the BIOS setup
> [ 0.000000] This costs you 64 MB of RAM
> [ 0.000000] Mapping aperture over 65536 KB of RAM @ 20000000
> [ 0.000000] PM: Registered nosave memory: 0000000020000000 -
> 0000000024000000
> [ 0.000000] Memory: 8184476k/9306112k available (3500k kernel code, 919300k
> absent, 201380k reserved, 1751k data, 376k init)
> [ 0.000000] SLUB: Genslabs=13, HWalign=64, Order=0-3, MinObjects=0, CPUs=4,
> Nodes=1
> [ 0.000000] Hierarchical RCU implementation.
> [ 0.000000] NR_IRQS:4352 nr_irqs:440
> [ 0.000000] Fast TSC calibration using PIT
> [ 0.000000] Detected 3200.214 MHz processor.
> [ 0.000609] Console: colour VGA+ 80x25
> [ 0.000611] console [tty0] enabled
> [ 0.003333] hpet clockevent registered
> [ 0.003333] alloc irq_desc for 24 on node 0
> [ 0.003333] alloc kstat_irqs on node 0
> [ 0.003333] HPET: 4 timers in total, 1 timers will be used for per-cpu
> timer
> [ 0.003339] Calibrating delay loop (skipped), value calculated using timer
> frequency.. 6402.10 BogoMIPS (lpj=10667366)
> [ 0.003421] Mount-cache hash table entries: 256
> [ 0.003543] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64
> bytes/line)
> [ 0.003578] CPU: L2 Cache: 512K (64 bytes/line)
> [ 0.003613] tseg: 0000000000
> [ 0.003618] CPU: Physical Processor ID: 0
> [ 0.003652] CPU: Processor Core ID: 0
> [ 0.003686] mce: CPU supports 6 MCE banks
> [ 0.003725] using C1E aware idle routine
> [ 0.003768] ACPI: Core revision 20090521
> [ 0.016704] Setting APIC routing to flat
> [ 0.017026] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [ 0.050860] CPU0: AMD Phenom(tm) II X4 955 Processor stepping 02
> [ 0.053333] Booting processor 1 APIC 0x1 ip 0x6000
> [ 0.003333] Initializing CPU#1
> [ 0.003333] Calibrating delay using timer specific routine.. 6402.85
> BogoMIPS (lpj=10666966)
> [ 0.003333] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64
> bytes/line)
> [ 0.003333] CPU: L2 Cache: 512K (64 bytes/line)
> [ 0.003333] CPU: Physical Processor ID: 0
> [ 0.003333] CPU: Processor Core ID: 1
> [ 0.003333] mce: CPU supports 6 MCE banks
> [ 0.003333] x86 PAT enabled: cpu 1, old 0x7040600070406, new
> 0x7010600070106
> [ 0.144161] CPU1: AMD Phenom(tm) II X4 955 Processor stepping 02
> [ 0.144507] checking TSC synchronization [CPU#0 -> CPU#1]: passed.
> [ 0.146699] Booting processor 2 APIC 0x2 ip 0x6000
> [ 0.003333] Initializing CPU#2
> [ 0.003333] Calibrating delay using timer specific routine.. 6402.85
> BogoMIPS (lpj=10666970)
> [ 0.003333] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64
> bytes/line)
> [ 0.003333] CPU: L2 Cache: 512K (64 bytes/line)
> [ 0.003333] CPU: Physical Processor ID: 0
> [ 0.003333] CPU: Processor Core ID: 2
> [ 0.003333] mce: CPU supports 6 MCE banks
> [ 0.003333] x86 PAT enabled: cpu 2, old 0x7040600070406, new
> 0x7010600070106
> [ 0.240822] CPU2: AMD Phenom(tm) II X4 955 Processor stepping 02
> [ 0.241168] checking TSC synchronization [CPU#0 -> CPU#2]: passed.
> [ 0.243373] Booting processor 3 APIC 0x3 ip 0x6000
> [ 0.003333] Initializing CPU#3
> [ 0.003333] Calibrating delay using timer specific routine.. 6402.85
> BogoMIPS (lpj=10666972)
> [ 0.003333] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64
> bytes/line)
> [ 0.003333] CPU: L2 Cache: 512K (64 bytes/line)
> [ 0.003333] CPU: Physical Processor ID: 0
> [ 0.003333] CPU: Processor Core ID: 3
> [ 0.003333] mce: CPU supports 6 MCE banks
> [ 0.003333] x86 PAT enabled: cpu 3, old 0x7040600070406, new
> 0x7010600070106
> [ 0.337491] CPU3: AMD Phenom(tm) II X4 955 Processor stepping 02
> [ 0.337836] checking TSC synchronization [CPU#0 -> CPU#3]: passed.
> [ 0.340006] Brought up 4 CPUs
> [ 0.340040] Total of 4 processors activated (25611.67 BogoMIPS).
> [ 0.340109] CPU0 attaching sched-domain:
> [ 0.340111] domain 0: span 0-3 level MC
> [ 0.340112] groups: 0 1 2 3
> [ 0.340116] CPU1 attaching sched-domain:
> [ 0.340117] domain 0: span 0-3 level MC
> [ 0.340118] groups: 1 2 3 0
> [ 0.340120] CPU2 attaching sched-domain:
> [ 0.340121] domain 0: span 0-3 level MC
> [ 0.340122] groups: 2 3 0 1
> [ 0.340125] CPU3 attaching sched-domain:
> [ 0.340126] domain 0: span 0-3 level MC
> [ 0.340127] groups: 3 0 1 2
> [ 0.340162] xor: automatically using best checksumming function:
> generic_sse
> [ 0.356667] generic_sse: 12835.200 MB/sec
> [ 0.356700] xor: using function: generic_sse (12835.200 MB/sec)
> [ 0.356754] Time: 7:07:42 Date: 09/11/09
> [ 0.356802] NET: Registered protocol family 16
> [ 0.356851] node 0 link 0: io port [1000, ffffff]
> [ 0.356851] TOM: 00000000c8000000 aka 3200M
> [ 0.356851] Fam 10h mmconf [e0000000, efffffff]
> [ 0.356851] node 0 link 0: mmio [e0000000, efffffff] ==> none
> [ 0.356851] node 0 link 0: mmio [f0000000, ffffffff]
> [ 0.356851] node 0 link 0: mmio [a0000, bffff]
> [ 0.356851] node 0 link 0: mmio [c8000000, dfffffff]
> [ 0.356851] TOM2: 0000000238000000 aka 9088M
> [ 0.356851] bus: [00,07] on node 0 link 0
> [ 0.356851] bus: 00 index 0 io port: [0, ffff]
> [ 0.356851] bus: 00 index 1 mmio: [f0000000, ffffffff]
> [ 0.356851] bus: 00 index 2 mmio: [a0000, bffff]
> [ 0.356851] bus: 00 index 3 mmio: [c8000000, dfffffff]
> [ 0.356851] bus: 00 index 4 mmio: [238000000, fcffffffff]
> [ 0.356851] ACPI: bus type pci registered
> [ 0.356851] PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 255
> [ 0.356851] PCI: Not using MMCONFIG.
> [ 0.356851] PCI: Using configuration type 1 for base access
> [ 0.356851] PCI: Using configuration type 1 for extended access
> [ 0.356851] bio: create slab<bio-0> at 0
> [ 0.356978] ACPI: EC: Look up EC in DSDT
> [ 0.367031] ACPI: Interpreter enabled
> [ 0.367538] ACPI: (supports S0 S1 S3 S4 S5)
> [ 0.367652] ACPI: Using IOAPIC for interrupt routing
> [ 0.367725] PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 255
> [ 0.370531] PCI: MCFG area at e0000000 reserved in ACPI motherboard
> resources
> [ 0.375947] PCI: Using MMCONFIG at e0000000 - efffffff
> [ 0.380367] ACPI: No dock devices found.
> [ 0.380451] ACPI: PCI Root Bridge [PCI0] (0000:00)
> [ 0.380520] pci 0000:00:00.0: reg 1c 64bit mmio: [0xe0000000-0xffffffff]
> [ 0.380520] pci 0000:00:02.0: PME# supported from D0 D3hot D3cold
> [ 0.380520] pci 0000:00:02.0: PME# disabled
> [ 0.380520] pci 0000:00:09.0: PME# supported from D0 D3hot D3cold
> [ 0.380520] pci 0000:00:09.0: PME# disabled
> [ 0.380520] pci 0000:00:0a.0: PME# supported from D0 D3hot D3cold
> [ 0.380520] pci 0000:00:0a.0: PME# disabled
> [ 0.380520] pci 0000:00:11.0: reg 10 io port: [0xa000-0xa007]
> [ 0.380520] pci 0000:00:11.0: reg 14 io port: [0x9000-0x9003]
> [ 0.380520] pci 0000:00:11.0: reg 18 io port: [0x8000-0x8007]
> [ 0.380520] pci 0000:00:11.0: reg 1c io port: [0x7000-0x7003]
> [ 0.380520] pci 0000:00:11.0: reg 20 io port: [0x6000-0x600f]
> [ 0.380520] pci 0000:00:11.0: reg 24 32bit mmio: [0xfddff800-0xfddffbff]
> [ 0.380520] pci 0000:00:12.0: reg 10 32bit mmio: [0xfddfe000-0xfddfefff]
> [ 0.380520] pci 0000:00:12.1: reg 10 32bit mmio: [0xfddfd000-0xfddfdfff]
> [ 0.380576] pci 0000:00:12.2: reg 10 32bit mmio: [0xfddff000-0xfddff0ff]
> [ 0.380625] pci 0000:00:12.2: supports D1 D2
> [ 0.380626] pci 0000:00:12.2: PME# supported from D0 D1 D2 D3hot
> [ 0.380663] pci 0000:00:12.2: PME# disabled
> [ 0.380724] pci 0000:00:13.0: reg 10 32bit mmio: [0xfddfc000-0xfddfcfff]
> [ 0.380775] pci 0000:00:13.1: reg 10 32bit mmio: [0xfddf7000-0xfddf7fff]
> [ 0.380843] pci 0000:00:13.2: reg 10 32bit mmio: [0xfddf6800-0xfddf68ff]
> [ 0.380893] pci 0000:00:13.2: supports D1 D2
> [ 0.380894] pci 0000:00:13.2: PME# supported from D0 D1 D2 D3hot
> [ 0.380930] pci 0000:00:13.2: PME# disabled
> [ 0.381072] pci 0000:00:14.1: reg 10 io port: [0x00-0x07]
> [ 0.381078] pci 0000:00:14.1: reg 14 io port: [0x00-0x03]
> [ 0.381084] pci 0000:00:14.1: reg 18 io port: [0x00-0x07]
> [ 0.381089] pci 0000:00:14.1: reg 1c io port: [0x00-0x03]
> [ 0.381095] pci 0000:00:14.1: reg 20 io port: [0xff00-0xff0f]
> [ 0.381219] pci 0000:00:14.5: reg 10 32bit mmio: [0xfddf5000-0xfddf5fff]
> [ 0.381353] pci 0000:02:00.0: reg 10 64bit mmio: [0xd0000000-0xdfffffff]
> [ 0.381360] pci 0000:02:00.0: reg 18 64bit mmio: [0xfdff0000-0xfdffffff]
> [ 0.381365] pci 0000:02:00.0: reg 20 io port: [0xc000-0xc0ff]
> [ 0.381372] pci 0000:02:00.0: reg 30 32bit mmio: [0xfdfc0000-0xfdfdffff]
> [ 0.381387] pci 0000:02:00.0: supports D1 D2
> [ 0.381415] pci 0000:02:00.1: reg 10 64bit mmio: [0xfdfec000-0xfdfeffff]
> [ 0.381445] pci 0000:02:00.1: supports D1 D2
> [ 0.381495] pci 0000:00:02.0: bridge io port: [0xc000-0xcfff]
> [ 0.381497] pci 0000:00:02.0: bridge 32bit mmio: [0xfdf00000-0xfdffffff]
> [ 0.381500] pci 0000:00:02.0: bridge 64bit mmio pref: [0xd0000000-0xdfffffff]
> [ 0.381543] pci 0000:00:09.0: bridge io port: [0xd000-0xdfff]
> [ 0.381545] pci 0000:00:09.0: bridge 32bit mmio: [0xfe000000-0xfebfffff]
> [ 0.381548] pci 0000:00:09.0: bridge 64bit mmio pref: [0xfa000000-0xfcefffff]
> [ 0.383347] pci 0000:01:00.0: reg 10 io port: [0xb800-0xb8ff]
> [ 0.383360] pci 0000:01:00.0: reg 18 64bit mmio: [0xcffff000-0xcfffffff]
> [ 0.383370] pci 0000:01:00.0: reg 20 64bit mmio: [0xcffe0000-0xcffeffff]
> [ 0.383375] pci 0000:01:00.0: reg 30 32bit mmio: [0xfdef0000-0xfdefffff]
> [ 0.383402] pci 0000:01:00.0: supports D1 D2
> [ 0.383403] pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot D3cold
> [ 0.383440] pci 0000:01:00.0: PME# disabled
> [ 0.383525] pci 0000:00:0a.0: bridge io port: [0xb000-0xbfff]
> [ 0.383527] pci 0000:00:0a.0: bridge 32bit mmio: [0xfde00000-0xfdefffff]
> [ 0.383530] pci 0000:00:0a.0: bridge 64bit mmio pref: [0xcff00000-0xcfffffff]
> [ 0.383565] pci 0000:05:06.0: reg 10 32bit mmio: [0xfcfff000-0xfcffffff]
> [ 0.383644] pci 0000:05:08.0: reg 10 io port: [0xe800-0xe83f]
> [ 0.383701] pci 0000:05:08.0: supports D1 D2
> [ 0.383743] pci 0000:00:14.4: transparent bridge
> [ 0.383779] pci 0000:00:14.4: bridge io port: [0xe000-0xefff]
> [ 0.383785] pci 0000:00:14.4: bridge 32bit mmio pref: [0xfcf00000-0xfcffffff]
> [ 0.383797] pci_bus 0000:00: on NUMA node 0
> [ 0.383800] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
> [ 0.383936] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCE2._PRT]
> [ 0.383981] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCEA._PRT]
> [ 0.384025] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0PC._PRT]
> [ 0.384091] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCE9._PRT]
> [ 0.386726] ACPI: PCI Interrupt Link [LNKA] (IRQs 4 *7 10 11 12 14 15)
> [ 0.386953] ACPI: PCI Interrupt Link [LNKB] (IRQs 4 7 10 *11 12 14 15)
> [ 0.387183] ACPI: PCI Interrupt Link [LNKC] (IRQs 4 7 *10 11 12 14 15)
> [ 0.387408] ACPI: PCI Interrupt Link [LNKD] (IRQs 4 7 *10 11 12 14 15)
> [ 0.387643] ACPI: PCI Interrupt Link [LNKE] (IRQs 4 7 10 11 12 14 15) *0,
> disabled.
> [ 0.387914] ACPI: PCI Interrupt Link [LNKF] (IRQs 4 7 10 *11 12 14 15)
> [ 0.388140] ACPI: PCI Interrupt Link [LNKG] (IRQs *4 10 11 12 14 15)
> [ 0.388352] ACPI: PCI Interrupt Link [LNKH] (IRQs 4 7 *10 11 12 14 15)
> [ 0.388538] SCSI subsystem initialized
> [ 0.388538] libata version 3.00 loaded.
> [ 0.388538] usbcore: registered new interface driver usbfs
> [ 0.388538] usbcore: registered new interface driver hub
> [ 0.388538] usbcore: registered new device driver usb
> [ 0.443347] raid6: int64x1 2755 MB/s
> [ 0.500010] raid6: int64x2 3858 MB/s
> [ 0.556669] raid6: int64x4 2850 MB/s
> [ 0.613353] raid6: int64x8 2537 MB/s
> [ 0.670007] raid6: sse2x1 3999 MB/s
> [ 0.726666] raid6: sse2x2 7012 MB/s
> [ 0.783343] raid6: sse2x4 7975 MB/s
> [ 0.783377] raid6: using algorithm sse2x4 (7975 MB/s)
> [ 0.783423] PCI: Using ACPI for IRQ routing
> [ 0.783423] pci 0000:00:00.0: BAR 3: address space collision on of device
> [0xe0000000-0xffffffff]
> [ 0.783425] pci 0000:00:00.0: BAR 3: can't allocate resource
> [ 0.793433] PCI-DMA: Disabling AGP.
> [ 0.793518] PCI-DMA: aperture base @ 20000000 size 65536 KB
> [ 0.793518] PCI-DMA: using GART IOMMU.
> [ 0.793518] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
> [ 0.795205] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 24, 0
> [ 0.795312] hpet0: 4 comparators, 32-bit 14.318180 MHz counter
> [ 0.800030] hpet: hpet2 irq 24 for MSI
> [ 0.810015] Switched to high resolution mode on CPU 0
> [ 0.811193] Switched to high resolution mode on CPU 2
> [ 0.811196] Switched to high resolution mode on CPU 1
> [ 0.811200] Switched to high resolution mode on CPU 3
> [ 0.820035] pnp: PnP ACPI init
> [ 0.820085] ACPI: bus type pnp registered
> [ 0.822351] pnp 00:0b: mem resource (0x0-0x9ffff) overlaps 0000:00:00.0 BAR 3
> (0x0-0x1fffffff), disabling
> [ 0.822406] pnp 00:0b: mem resource (0xc0000-0xcffff) overlaps 0000:00:00.0
> BAR 3 (0x0-0x1fffffff), disabling
> [ 0.822461] pnp 00:0b: mem resource (0xe0000-0xfffff) overlaps 0000:00:00.0
> BAR 3 (0x0-0x1fffffff), disabling
> [ 0.822515] pnp 00:0b: mem resource (0x100000-0xc7efffff) overlaps
> 0000:00:00.0 BAR 3 (0x0-0x1fffffff), disabling
> [ 0.822837] pnp: PnP ACPI: found 12 devices
> [ 0.822871] ACPI: ACPI bus type pnp unregistered
> [ 0.822911] system 00:06: iomem range 0xfec00000-0xfec00fff could not be
> reserved
> [ 0.822965] system 00:06: iomem range 0xfee00000-0xfee00fff has been
> reserved
> [ 0.823002] system 00:07: ioport range 0x4d0-0x4d1 has been reserved
> [ 0.823037] system 00:07: ioport range 0x40b-0x40b has been reserved
> [ 0.823072] system 00:07: ioport range 0x4d6-0x4d6 has been reserved
> [ 0.823106] system 00:07: ioport range 0xc00-0xc01 has been reserved
> [ 0.823141] system 00:07: ioport range 0xc14-0xc14 has been reserved
> [ 0.823176] system 00:07: ioport range 0xc50-0xc51 has been reserved
> [ 0.823211] system 00:07: ioport range 0xc52-0xc52 has been reserved
> [ 0.823245] system 00:07: ioport range 0xc6c-0xc6c has been reserved
> [ 0.823280] system 00:07: ioport range 0xc6f-0xc6f has been reserved
> [ 0.823315] system 00:07: ioport range 0xcd0-0xcd1 has been reserved
> [ 0.823359] system 00:07: ioport range 0xcd2-0xcd3 has been reserved
> [ 0.823394] system 00:07: ioport range 0xcd4-0xcd5 has been reserved
> [ 0.823429] system 00:07: ioport range 0xcd6-0xcd7 has been reserved
> [ 0.823464] system 00:07: ioport range 0xcd8-0xcdf has been reserved
> [ 0.823499] system 00:07: ioport range 0x800-0x89f has been reserved
> [ 0.823533] system 00:07: ioport range 0xb00-0xb0f has been reserved
> [ 0.823568] system 00:07: ioport range 0xb20-0xb3f has been reserved
> [ 0.823603] system 00:07: ioport range 0x900-0x90f has been reserved
> [ 0.823638] system 00:07: ioport range 0x910-0x91f has been reserved
> [ 0.823673] system 00:07: ioport range 0xfe00-0xfefe has been reserved
> [ 0.823708] system 00:07: iomem range 0xffb80000-0xffbfffff has been reserved
> [ 0.823743] system 00:07: iomem range 0xfec10000-0xfec1001f has been
> reserved
> [ 0.823780] system 00:09: ioport range 0x290-0x29f has been reserved
> [ 0.823816] system 00:0a: iomem range 0xe0000000-0xefffffff has been reserved
> [ 0.823852] system 00:0b: iomem range 0xfec00000-0xffffffff could not be
> reserved
> [ 0.828757] pci 0000:00:02.0: PCI bridge, secondary bus 0000:02
> [ 0.828792] pci 0000:00:02.0: IO window: 0xc000-0xcfff
> [ 0.828828] pci 0000:00:02.0: MEM window: 0xfdf00000-0xfdffffff
> [ 0.828863] pci 0000:00:02.0: PREFETCH window:
> 0x000000d0000000-0x000000dfffffff
> [ 0.828918] pci 0000:00:09.0: PCI bridge, secondary bus 0000:03
> [ 0.828953] pci 0000:00:09.0: IO window: 0xd000-0xdfff
> [ 0.828988] pci 0000:00:09.0: MEM window: 0xfe000000-0xfebfffff
> [ 0.829023] pci 0000:00:09.0: PREFETCH window:
> 0x000000fa000000-0x000000fcefffff
> [ 0.829078] pci 0000:00:0a.0: PCI bridge, secondary bus 0000:01
> [ 0.829112] pci 0000:00:0a.0: IO window: 0xb000-0xbfff
> [ 0.829148] pci 0000:00:0a.0: MEM window: 0xfde00000-0xfdefffff
> [ 0.829183] pci 0000:00:0a.0: PREFETCH window:
> 0x000000cff00000-0x000000cfffffff
> [ 0.829237] pci 0000:00:14.4: PCI bridge, secondary bus 0000:05
> [ 0.829273] pci 0000:00:14.4: IO window: 0xe000-0xefff
> [ 0.829310] pci 0000:00:14.4: MEM window: disabled
> [ 0.829346] pci 0000:00:14.4: PREFETCH window: 0xfcf00000-0xfcffffff
> [ 0.829387] alloc irq_desc for 18 on node -1
> [ 0.829388] alloc kstat_irqs on node -1
> [ 0.829392] pci 0000:00:02.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
> [ 0.829428] pci 0000:00:02.0: setting latency timer to 64
> [ 0.829432] alloc irq_desc for 17 on node -1
> [ 0.829433] alloc kstat_irqs on node -1
> [ 0.829435] pci 0000:00:09.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> [ 0.829471] pci 0000:00:09.0: setting latency timer to 64
> [ 0.829474] pci 0000:00:0a.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
> [ 0.829509] pci 0000:00:0a.0: setting latency timer to 64
> [ 0.829516] pci_bus 0000:00: resource 0 io: [0x00-0xffff]
> [ 0.829517] pci_bus 0000:00: resource 1 mem: [0x000000-0xffffffffffffffff]
> [ 0.829519] pci_bus 0000:02: resource 0 io: [0xc000-0xcfff]
> [ 0.829521] pci_bus 0000:02: resource 1 mem: [0xfdf00000-0xfdffffff]
> [ 0.829522] pci_bus 0000:02: resource 2 pref mem [0xd0000000-0xdfffffff]
> [ 0.829523] pci_bus 0000:03: resource 0 io: [0xd000-0xdfff]
> [ 0.829525] pci_bus 0000:03: resource 1 mem: [0xfe000000-0xfebfffff]
> [ 0.829526] pci_bus 0000:03: resource 2 pref mem [0xfa000000-0xfcefffff]
> [ 0.829527] pci_bus 0000:01: resource 0 io: [0xb000-0xbfff]
> [ 0.829529] pci_bus 0000:01: resource 1 mem: [0xfde00000-0xfdefffff]
> [ 0.829530] pci_bus 0000:01: resource 2 pref mem [0xcff00000-0xcfffffff]
> [ 0.829531] pci_bus 0000:05: resource 0 io: [0xe000-0xefff]
> [ 0.829533] pci_bus 0000:05: resource 2 pref mem [0xfcf00000-0xfcffffff]
> [ 0.829534] pci_bus 0000:05: resource 3 io: [0x00-0xffff]
> [ 0.829535] pci_bus 0000:05: resource 4 mem: [0x000000-0xffffffffffffffff]
> [ 0.829547] NET: Registered protocol family 2
> [ 0.829602] IP route cache hash table entries: 262144 (order: 9, 2097152
> bytes)
> [ 0.830084] TCP established hash table entries: 262144 (order: 10, 4194304
> bytes)
> [ 0.831067] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
> [ 0.831468] TCP: Hash tables configured (established 262144 bind 65536)
> [ 0.831504] TCP reno registered
> [ 0.831582] NET: Registered protocol family 1
> [ 0.832803] Scanning for low memory corruption every 60 seconds
> [ 0.833572] HugeTLB registered 2 MB page size, pre-allocated 0 pages
> [ 0.833696] Loading Reiser4. See http://www.namesys.com for a description of
> Reiser4.
> [ 0.833781] msgmni has been set to 15987
> [ 0.834082] alg: No test for stdrng (krng)
> [ 0.834123] async_tx: api initialized (sync-only)
> [ 0.834227] Block layer SCSI generic (bsg) driver version 0.4 loaded (major
> 253)
> [ 0.834280] io scheduler noop registered
> [ 0.834315] io scheduler cfq registered (default)
> [ 0.834448] pci 0000:02:00.0: Boot video device
> [ 0.834538] alloc irq_desc for 25 on node -1
> [ 0.834540] alloc kstat_irqs on node -1
> [ 0.834545] pcieport-driver 0000:00:02.0: irq 25 for MSI/MSI-X
> [ 0.834550] pcieport-driver 0000:00:02.0: setting latency timer to 64
> [ 0.834642] alloc irq_desc for 26 on node -1
> [ 0.834643] alloc kstat_irqs on node -1
> [ 0.834646] pcieport-driver 0000:00:09.0: irq 26 for MSI/MSI-X
> [ 0.834650] pcieport-driver 0000:00:09.0: setting latency timer to 64
> [ 0.834741] alloc irq_desc for 27 on node -1
> [ 0.834742] alloc kstat_irqs on node -1
> [ 0.834744] pcieport-driver 0000:00:0a.0: irq 27 for MSI/MSI-X
> [ 0.834748] pcieport-driver 0000:00:0a.0: setting latency timer to 64
> [ 0.834953] input: Power Button as
> /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
> [ 0.835007] ACPI: Power Button [PWRF]
> [ 0.835098] input: Power Button as
> /devices/LNXSYSTM:00/device:00/PNP0C0C:00/input/input1
> [ 0.835152] ACPI: Power Button [PWRB]
> [ 0.835302] processor LNXCPU:00: registered as cooling_device0
> [ 0.835337] ACPI: Processor [CPU0] (supports 8 throttling states)
> [ 0.835434] processor LNXCPU:01: registered as cooling_device1
> [ 0.835504] processor LNXCPU:02: registered as cooling_device2
> [ 0.835577] processor LNXCPU:03: registered as cooling_device3
> [ 0.839315] Linux agpgart interface v0.103
> [ 0.839511] ahci 0000:00:11.0: version 3.0
> [ 0.839521] alloc irq_desc for 22 on node -1
> [ 0.839522] alloc kstat_irqs on node -1
> [ 0.839525] ahci 0000:00:11.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
> [ 0.839673] ahci 0000:00:11.0: AHCI 0001.0100 32 slots 6 ports 3 Gbps 0x3f
> impl SATA mode
> [ 0.839727] ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio
> slum part
> [ 0.840272] scsi0 : ahci
> [ 0.840403] scsi1 : ahci
> [ 0.840501] scsi2 : ahci
> [ 0.840598] scsi3 : ahci
> [ 0.840697] scsi4 : ahci
> [ 0.840795] scsi5 : ahci
> [ 0.840926] ata1: SATA max UDMA/133 irq_stat 0x00400000, PHY RDY changed
> [ 0.840962] ata2: SATA max UDMA/133 abar m1024@0xfddff800 port 0xfddff980 irq
> 22
> [ 0.841016] ata3: SATA max UDMA/133 abar m1024@0xfddff800 port 0xfddffa00 irq
> 22
> [ 0.841070] ata4: SATA max UDMA/133 abar m1024@0xfddff800 port 0xfddffa80 irq
> 22
> [ 0.841124] ata5: SATA max UDMA/133 abar m1024@0xfddff800 port 0xfddffb00 irq
> 22
> [ 0.841178] ata6: SATA max UDMA/133 abar m1024@0xfddff800 port 0xfddffb80 irq
> 22
> [ 0.841459] PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 1
> [ 0.841493] PNP: PS/2 appears to have AUX port disabled, if this is
> incorrect please boot with i8042.nopnp
> [ 0.841943] serio: i8042 KBD port at 0x60,0x64 irq 1
> [ 0.842074] mice: PS/2 mouse device common for all mice
> [ 0.842243] rtc_cmos 00:02: RTC can wake from S4
> [ 0.842313] rtc_cmos 00:02: rtc core: registered rtc_cmos as rtc0
> [ 0.842369] rtc0: alarms up to one month, y3k, 114 bytes nvram, hpet irqs
> [ 0.842428] md: raid1 personality registered for level 1
> [ 0.842462] md: raid6 personality registered for level 6
> [ 0.842496] md: raid5 personality registered for level 5
> [ 0.842530] md: raid4 personality registered for level 4
> [ 0.843117] cpuidle: using governor ladder
> [ 0.843151] cpuidle: using governor menu
> [ 0.843689] usbcore: registered new interface driver hiddev
> [ 0.843746] usbcore: registered new interface driver usbhid
> [ 0.843780] usbhid: v2.6:USB HID core driver
> [ 0.843843] Advanced Linux Sound Architecture Driver Version 1.0.20.
> [ 0.843878] ALSA device list:
> [ 0.843911] No soundcards found.
> [ 0.843979] TCP cubic registered
> [ 0.844019] NET: Registered protocol family 10
> [ 0.844148] IPv6 over IPv4 tunneling driver
> [ 0.844265] NET: Registered protocol family 17
> [ 0.844315] powernow-k8: Found 1 AMD Phenom(tm) II X4 955 Processor
> processors (4 cpu cores) (version 2.20.00)
> [ 0.844391] powernow-k8: 0 : pstate 0 (3200 MHz)
> [ 0.844425] powernow-k8: 1 : pstate 1 (2500 MHz)
> [ 0.844459] powernow-k8: 2 : pstate 2 (2100 MHz)
> [ 0.844492] powernow-k8: 3 : pstate 3 (800 MHz)
> [ 0.844886] PM: Resume from disk failed.
> [ 0.844966] Magic number: 9:648:116
> [ 0.866018] input: AT Translated Set 2 keyboard as
> /devices/platform/i8042/serio0/input/input2
> [ 1.160036] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [ 1.160097] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [ 1.160150] ata6: SATA link down (SStatus 0 SControl 300)
> [ 1.160205] ata5: SATA link down (SStatus 0 SControl 300)
> [ 1.160259] ata3: SATA link down (SStatus 0 SControl 300)
> [ 1.166391] ata4.00: ATA-7: SAMSUNG HD753LJ, 1AA01113, max UDMA7
> [ 1.166432] ata4.00: 1465149168 sectors, multi 0: LBA48 NCQ (depth 31/32)
> [ 1.166480] ata2.00: ATA-7: SAMSUNG HD502IJ, 1AA01110, max UDMA7
> [ 1.166514] ata2.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
> [ 1.172888] ata4.00: configured for UDMA/133
> [ 1.172943] ata2.00: configured for UDMA/133
> [ 1.560035] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [ 1.566394] ata1.00: ATA-7: SAMSUNG HD502IJ, 1AA01109, max UDMA7
> [ 1.566430] ata1.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
> [ 1.572855] ata1.00: configured for UDMA/133
> [ 1.583424] scsi 0:0:0:0: Direct-Access ATA SAMSUNG HD502IJ 1AA0
> PQ: 0 ANSI: 5
> [ 1.583684] sd 0:0:0:0: [sda] 976773168 512-byte logical blocks: (500
> GB/465 GiB)
> [ 1.583756] sd 0:0:0:0: [sda] Write Protect is off
> [ 1.583791] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> [ 1.583800] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled,
> doesn't support DPO or FUA
> [ 1.583911] sda:
> [ 1.583952] sd 0:0:0:0: Attached scsi generic sg0 type 0
> [ 1.584094] scsi 1:0:0:0: Direct-Access ATA SAMSUNG HD502IJ 1AA0
> PQ: 0 ANSI: 5
> [ 1.584283] sd 1:0:0:0: [sdb] 976773168 512-byte logical blocks: (500
> GB/465 GiB)
> [ 1.584354] sd 1:0:0:0: [sdb] Write Protect is off
> [ 1.584389] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
> [ 1.584398] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled,
> doesn't support DPO or FUA
> [ 1.584507] sdb:
> [ 1.584577] sd 1:0:0:0: Attached scsi generic sg1 type 0
> [ 1.584709] scsi 3:0:0:0: Direct-Access ATA SAMSUNG HD753LJ 1AA0
> PQ: 0 ANSI: 5
> [ 1.584865] sd 3:0:0:0: [sdc] 1465149168 512-byte logical blocks: (750
> GB/698 GiB)
> [ 1.584934] sd 3:0:0:0: [sdc] Write Protect is off
> [ 1.584969] sd 3:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> [ 1.584977] sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled,
> doesn't support DPO or FUA
> [ 1.585071] sdc:
> [ 1.585123] sd 3:0:0:0: Attached scsi generic sg2 type 0
> [ 1.588354] sdc1 sdc2 sdc3 sdc4< sdb1 sdb2 sdb3 sdb4< sda1 sda2 sda3
> sda4< sdb5 sdc5 sda5 sdc6 sdb6>
> [ 1.606850] sd 1:0:0:0: [sdb] Attached SCSI disk
> [ 1.610449] sda6>
> [ 1.610814] sd 0:0:0:0: [sda] Attached SCSI disk
> [ 1.617957] sdc7>
> [ 1.618341] sd 3:0:0:0: [sdc] Attached SCSI disk
> [ 1.618382] md: Waiting for all devices to be available before autodetect
> [ 1.618417] md: If you don't use raid, use raid=noautodetect
> [ 1.618533] md: Autodetecting RAID arrays.
> [ 1.724723] md: Scanned 12 and added 12 devices.
> [ 1.724758] md: autorun ...
> [ 1.724792] md: considering sdc6 ...
> [ 1.724827] md: adding sdc6 ...
> [ 1.724862] md: sdc5 has different UUID to sdc6
> [ 1.724897] md: sdc3 has different UUID to sdc6
> [ 1.724931] md: sdc1 has different UUID to sdc6
> [ 1.724967] md: adding sda6 ...
> [ 1.725001] md: sda5 has different UUID to sdc6
> [ 1.725036] md: sda3 has different UUID to sdc6
> [ 1.725070] md: sda1 has different UUID to sdc6
> [ 1.725106] md: adding sdb6 ...
> [ 1.725140] md: sdb5 has different UUID to sdc6
> [ 1.725175] md: sdb3 has different UUID to sdc6
> [ 1.725209] md: sdb1 has different UUID to sdc6
> [ 1.725349] md: created md3
> [ 1.725382] md: bind<sdb6>
> [ 1.725419] md: bind<sda6>
> [ 1.725456] md: bind<sdc6>
> [ 1.725492] md: running:<sdc6><sda6><sdb6>
> [ 1.725626] raid5: device sdc6 operational as raid disk 2
> [ 1.725661] raid5: device sda6 operational as raid disk 0
> [ 1.725695] raid5: device sdb6 operational as raid disk 1
> [ 1.725846] raid5: allocated 3220kB for md3
> [ 1.725910] raid5: raid level 5 set md3 active with 3 out of 3 devices,
> algorithm 2
> [ 1.725963] RAID5 conf printout:
> [ 1.725996] --- rd:3 wd:3
> [ 1.726029] disk 0, o:1, dev:sda6
> [ 1.726062] disk 1, o:1, dev:sdb6
> [ 1.726095] disk 2, o:1, dev:sdc6
> [ 1.726142] md3: detected capacity change from 0 to 864065421312
> [ 1.726213] md: considering sdc5 ...
> [ 1.726249] md: adding sdc5 ...
> [ 1.726283] md: sdc3 has different UUID to sdc5
> [ 1.726318] md: sdc1 has different UUID to sdc5
> [ 1.726353] md: adding sda5 ...
> [ 1.726388] md: sda3 has different UUID to sdc5
> [ 1.726422] md: sda1 has different UUID to sdc5
> [ 1.726458] md: adding sdb5 ...
> [ 1.726492] md: sdb3 has different UUID to sdc5
> [ 1.726526] md: sdb1 has different UUID to sdc5
> [ 1.726630] md: created md2
> [ 1.726663] md: bind<sdb5>
> [ 1.726700] md: bind<sda5>
> [ 1.726738] md: bind<sdc5>
> [ 1.726774] md: running:<sdc5><sda5><sdb5>
> [ 1.726901] raid5: device sdc5 operational as raid disk 2
> [ 1.726935] raid5: device sda5 operational as raid disk 0
> [ 1.726969] raid5: device sdb5 operational as raid disk 1
> [ 1.727126] raid5: allocated 3220kB for md2
> [ 1.727190] raid5: raid level 5 set md2 active with 3 out of 3 devices,
> algorithm 2
> [ 1.727243] RAID5 conf printout:
> [ 1.727276] --- rd:3 wd:3
> [ 1.727309] disk 0, o:1, dev:sda5
> [ 1.727342] disk 1, o:1, dev:sdb5
> [ 1.727376] disk 2, o:1, dev:sdc5
> [ 1.727420] md2: detected capacity change from 0 to 40007499776
> [ 1.727490] md: considering sdc3 ...
> [ 1.727526] md: adding sdc3 ...
> [ 1.727560] md: sdc1 has different UUID to sdc3
> [ 1.727595] md: adding sda3 ...
> [ 1.727629] md: sda1 has different UUID to sdc3
> [ 1.727664] md: adding sdb3 ...
> [ 1.727698] md: sdb1 has different UUID to sdc3
> [ 1.727799] md: created md1
> [ 1.727832] md: bind<sdb3>
> [ 1.727869] md: bind<sda3>
> [ 1.727905] md: bind<sdc3>
> [ 1.727945] md: running:<sdc3><sda3><sdb3>
> [ 1.728090] raid5: device sdc3 operational as raid disk 2
> [ 1.728125] raid5: device sda3 operational as raid disk 0
> [ 1.728159] raid5: device sdb3 operational as raid disk 1
> [ 1.728320] raid5: allocated 3220kB for md1
> [ 1.728370] raid5: raid level 5 set md1 active with 3 out of 3 devices,
> algorithm 2
> [ 1.728423] RAID5 conf printout:
> [ 1.728455] --- rd:3 wd:3
> [ 1.728488] disk 0, o:1, dev:sda3
> [ 1.728522] disk 1, o:1, dev:sdb3
> [ 1.728555] disk 2, o:1, dev:sdc3
> [ 1.728604] md1: detected capacity change from 0 to 79998877696
> [ 1.728674] md: considering sdc1 ...
> [ 1.728710] md: adding sdc1 ...
> [ 1.728745] md: adding sda1 ...
> [ 1.728779] md: adding sdb1 ...
> [ 1.728813] md: created md0
> [ 1.728846] md: bind<sdb1>
> [ 1.728882] md: bind<sda1>
> [ 1.728919] md: bind<sdc1>
> [ 1.728955] md: running:<sdc1><sda1><sdb1>
> [ 1.729133] raid1: raid set md0 active with 3 out of 3 mirrors
> [ 1.729176] md0: detected capacity change from 0 to 65667072
> [ 1.729232] md: ... autorun DONE.
> [ 1.729284] md: Loading md3: /dev/sda3
> [ 1.729322] md3: unknown partition table
> [ 1.729481] md: couldn't update array info. -22
> [ 1.729518] md: could not bd_claim sda3.
> [ 1.729552] md: md_import_device returned -16
> [ 1.729588] md: could not bd_claim sdb3.
> [ 1.729621] md: md_import_device returned -16
> [ 1.729657] md: could not bd_claim sdc3.
> [ 1.729690] md: md_import_device returned -16
> [ 1.729725] md: starting md3 failed
> [ 1.729800] md1: unknown partition table
> [ 1.767199] reiser4: md1: found disk format 4.0.0.
> [ 5.790318] VFS: Mounted root (reiser4 filesystem) readonly on device 9:1.
> [ 5.790370] Freeing unused kernel memory: 376k freed
> [ 9.037775] udev: starting version 145
> [ 9.217043] md2:
> [ 9.217072] md0: unknown partition table
> [ 9.282015] unknown partition table
> [ 10.420576] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> [ 10.420591] r8169 0000:01:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
> [ 10.420840] r8169 0000:01:00.0: setting latency timer to 64
> [ 10.420871] alloc irq_desc for 28 on node -1
> [ 10.420872] alloc kstat_irqs on node -1
> [ 10.420882] r8169 0000:01:00.0: irq 28 for MSI/MSI-X
> [ 10.420988] eth0: RTL8168c/8111c at 0xffffc90012f18000, 00:19:66:86:ce:12,
> XID 3c4000c0 IRQ 28
> [ 10.440462] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
> [ 10.440465] ehci_hcd: block sizes: qh 192 qtd 96 itd 192 sitd 96
> [ 10.440491] ehci_hcd 0000:00:12.2: PCI INT B -> GSI 17 (level, low) -> IRQ
> 17
> [ 10.440518] ehci_hcd 0000:00:12.2: EHCI Host Controller
> [ 10.440532] drivers/usb/core/inode.c: creating file 'devices'
> [ 10.440534] drivers/usb/core/inode.c: creating file '001'
> [ 10.440566] ehci_hcd 0000:00:12.2: new USB bus registered, assigned bus
> number 1
> [ 10.440572] ehci_hcd 0000:00:12.2: reset hcs_params 0x102306 dbg=1 cc=2
> pcc=3 ordered !ppc ports=6
> [ 10.440575] ehci_hcd 0000:00:12.2: reset hcc_params a072 thresh 7 uframes
> 256/512/1024
> [ 10.440596] ehci_hcd 0000:00:12.2: applying AMD SB600/SB700 USB freeze
> workaround
> [ 10.440602] ehci_hcd 0000:00:12.2: reset command 080002 (park)=0 ithresh=8
> period=1024 Reset HALT
> [ 10.440615] ehci_hcd 0000:00:12.2: debug port 1
> [ 10.440619] ehci_hcd 0000:00:12.2: MWI active
> [ 10.440620] ehci_hcd 0000:00:12.2: supports USB remote wakeup
> [ 10.440633] ehci_hcd 0000:00:12.2: irq 17, io mem 0xfddff000
> [ 10.440637] ehci_hcd 0000:00:12.2: reset command 080002 (park)=0 ithresh=8
> period=1024 Reset HALT
> [ 10.440642] ehci_hcd 0000:00:12.2: init command 010009 (park)=0 ithresh=1
> period=256 RUN
> [ 10.447931] ehci_hcd 0000:00:12.2: USB 2.0 started, EHCI 1.00
> [ 10.447961] usb usb1: default language 0x0409
> [ 10.447965] usb usb1: udev 1, busnum 1, minor = 0
> [ 10.447967] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
> [ 10.447968] usb usb1: New USB device strings: Mfr=3, Product=2,
> SerialNumber=1
> [ 10.447969] usb usb1: Product: EHCI Host Controller
> [ 10.447970] usb usb1: Manufacturer: Linux 2.6.31r4 ehci_hcd
> [ 10.447971] usb usb1: SerialNumber: 0000:00:12.2
> [ 10.447998] usb usb1: uevent
> [ 10.448007] usb usb1: usb_probe_device
> [ 10.448009] usb usb1: configuration #1 chosen from 1 choice
> [ 10.448014] usb usb1: adding 1-0:1.0 (config #1, interface 0)
> [ 10.448021] usb 1-0:1.0: uevent
> [ 10.448028] hub 1-0:1.0: usb_probe_interface
> [ 10.448029] hub 1-0:1.0: usb_probe_interface - got id
> [ 10.448031] hub 1-0:1.0: USB hub found
> [ 10.448035] hub 1-0:1.0: 6 ports detected
> [ 10.448036] hub 1-0:1.0: standalone hub
> [ 10.448037] hub 1-0:1.0: no power switching (usb 1.0)
> [ 10.448038] hub 1-0:1.0: individual port over-current protection
> [ 10.448039] hub 1-0:1.0: power on to power good time: 20ms
> [ 10.448042] hub 1-0:1.0: local power source is good
> [ 10.448043] hub 1-0:1.0: trying to enable port power on non-switchable hub
> [ 10.448067] drivers/usb/core/inode.c: creating file '001'
> [ 10.448085] alloc irq_desc for 19 on node -1
> [ 10.448087] alloc kstat_irqs on node -1
> [ 10.448091] ehci_hcd 0000:00:13.2: PCI INT B -> GSI 19 (level, low) -> IRQ
> 19
> [ 10.448101] ehci_hcd 0000:00:13.2: EHCI Host Controller
> [ 10.448105] drivers/usb/core/inode.c: creating file '002'
> [ 10.448122] ehci_hcd 0000:00:13.2: new USB bus registered, assigned bus
> number 2
> [ 10.448127] ehci_hcd 0000:00:13.2: reset hcs_params 0x102306 dbg=1 cc=2
> pcc=3 ordered !ppc ports=6
> [ 10.448130] ehci_hcd 0000:00:13.2: reset hcc_params a072 thresh 7 uframes
> 256/512/1024
> [ 10.448143] ehci_hcd 0000:00:13.2: applying AMD SB600/SB700 USB freeze
> workaround
> [ 10.448148] ehci_hcd 0000:00:13.2: reset command 080002 (park)=0 ithresh=8
> period=1024 Reset HALT
> [ 10.448161] ehci_hcd 0000:00:13.2: debug port 1
> [ 10.448164] ehci_hcd 0000:00:13.2: MWI active
> [ 10.448165] ehci_hcd 0000:00:13.2: supports USB remote wakeup
> [ 10.448173] ehci_hcd 0000:00:13.2: irq 19, io mem 0xfddf6800
> [ 10.448176] ehci_hcd 0000:00:13.2: reset command 080002 (park)=0 ithresh=8
> period=1024 Reset HALT
> [ 10.448181] ehci_hcd 0000:00:13.2: init command 010009 (park)=0 ithresh=1
> period=256 RUN
> [ 10.457930] ehci_hcd 0000:00:13.2: USB 2.0 started, EHCI 1.00
> [ 10.457945] usb usb2: default language 0x0409
> [ 10.457949] usb usb2: udev 1, busnum 2, minor = 128
> [ 10.457950] usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
> [ 10.457952] usb usb2: New USB device strings: Mfr=3, Product=2,
> SerialNumber=1
> [ 10.457953] usb usb2: Product: EHCI Host Controller
> [ 10.457954] usb usb2: Manufacturer: Linux 2.6.31r4 ehci_hcd
> [ 10.457955] usb usb2: SerialNumber: 0000:00:13.2
> [ 10.457977] usb usb2: uevent
> [ 10.457984] usb usb2: usb_probe_device
> [ 10.457986] usb usb2: configuration #1 chosen from 1 choice
> [ 10.457989] usb usb2: adding 2-0:1.0 (config #1, interface 0)
> [ 10.457997] usb 2-0:1.0: uevent
> [ 10.458003] hub 2-0:1.0: usb_probe_interface
> [ 10.458004] hub 2-0:1.0: usb_probe_interface - got id
> [ 10.458005] hub 2-0:1.0: USB hub found
> [ 10.458009] hub 2-0:1.0: 6 ports detected
> [ 10.458010] hub 2-0:1.0: standalone hub
> [ 10.458010] hub 2-0:1.0: no power switching (usb 1.0)
> [ 10.458011] hub 2-0:1.0: individual port over-current protection
> [ 10.458013] hub 2-0:1.0: power on to power good time: 20ms
> [ 10.458015] hub 2-0:1.0: local power source is good
> [ 10.458016] hub 2-0:1.0: trying to enable port power on non-switchable hub
> [ 10.458038] drivers/usb/core/inode.c: creating file '001'
> [ 10.474718] Linux video capture interface: v2.00
> [ 10.489875] bttv: driver version 0.9.18 loaded
> [ 10.489877] bttv: using 8 buffers with 2080k (520 pages) each for capture
> [ 10.489914] bttv: Bt8xx card found (0).
> [ 10.489923] alloc irq_desc for 21 on node -1
> [ 10.489924] alloc kstat_irqs on node -1
> [ 10.489928] bttv 0000:05:06.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21
> [ 10.489938] bttv0: Bt848 (rev 18) at 0000:05:06.0, irq: 21, latency: 128,
> mmio: 0xfcfff000
> [ 10.489971] bttv0: using: Terratec TerraTV+ Version 1.0 (Bt848)/ Terra
> TValue Version 1.0/ Vobis TV-Boostar [card=25,insmod option]
> [ 10.489974] IRQ 21/bttv0: IRQF_DISABLED is not guaranteed on shared IRQs
> [ 10.490007] bttv0: gpio: en=00000000, out=00000000 in=00ffffff [init]
> [ 10.547936] ehci_hcd 0000:00:12.2: GetStatus port 2 status 001803 POWER
> sig=j CSC CONNECT
> [ 10.547939] hub 1-0:1.0: port 2: status 0501 change 0001
> [ 10.557954] hub 2-0:1.0: state 7 ports 6 chg 0000 evt 0000
> [ 10.647931] hub 1-0:1.0: state 7 ports 6 chg 0004 evt 0000
> [ 10.647940] hub 1-0:1.0: port 2, status 0501, change 0000, 480 Mb/s
> [ 10.700045] ehci_hcd 0000:00:12.2: port 2 high speed
> [ 10.700049] ehci_hcd 0000:00:12.2: GetStatus port 2 status 001005 POWER
> sig=se0 PE CONNECT
> [ 10.757104] usb 1-2: new high speed USB device using ehci_hcd and address 2
> [ 10.810036] ehci_hcd 0000:00:12.2: port 2 high speed
> [ 10.810039] ehci_hcd 0000:00:12.2: GetStatus port 2 status 001005 POWER
> sig=se0 PE CONNECT
> [ 10.881474] usb 1-2: default language 0x0409
> [ 10.881724] usb 1-2: udev 2, busnum 1, minor = 1
> [ 10.881725] usb 1-2: New USB device found, idVendor=05e3, idProduct=0608
> [ 10.881726] usb 1-2: New USB device strings: Mfr=0, Product=1,
> SerialNumber=0
> [ 10.881728] usb 1-2: Product: USB2.0 Hub
> [ 10.881764] usb 1-2: uevent
> [ 10.881773] usb 1-2: usb_probe_device
> [ 10.881775] usb 1-2: configuration #1 chosen from 1 choice
> [ 10.882188] usb 1-2: adding 1-2:1.0 (config #1, interface 0)
> [ 10.882199] usb 1-2:1.0: uevent
> [ 10.882206] hub 1-2:1.0: usb_probe_interface
> [ 10.882207] hub 1-2:1.0: usb_probe_interface - got id
> [ 10.882209] hub 1-2:1.0: USB hub found
> [ 10.882473] hub 1-2:1.0: 4 ports detected
> [ 10.882474] hub 1-2:1.0: standalone hub
> [ 10.882476] hub 1-2:1.0: individual port power switching
> [ 10.882477] hub 1-2:1.0: individual port over-current protection
> [ 10.882478] hub 1-2:1.0: Single TT
> [ 10.882479] hub 1-2:1.0: TT requires at most 32 FS bit times (2664 ns)
> [ 10.882480] hub 1-2:1.0: Port indicators are supported
> [ 10.882481] hub 1-2:1.0: power on to power good time: 100ms
> [ 10.882848] hub 1-2:1.0: local power source is good
> [ 10.882849] hub 1-2:1.0: enabling power on all ports
> [ 10.883860] drivers/usb/core/inode.c: creating file '002'
> [ 10.983961] hub 1-2:1.0: port 2: status 0301 change 0001
> [ 11.083356] usb 1-2: link qh256-0001/ffff8800c7800180 start 1 [1/0 us]
> [ 11.083364] hub 1-2:1.0: state 7 ports 4 chg 0004 evt 0000
> [ 11.083700] hub 1-2:1.0: port 2, status 0301, change 0000, 1.5 Mb/s
> [ 11.152198] usb 1-2.2: new low speed USB device using ehci_hcd and address
> 3
> [ 11.241187] usb 1-2.2: skipped 1 descriptor after interface
> [ 11.241189] usb 1-2.2: skipped 1 descriptor after interface
> [ 11.241685] usb 1-2.2: default language 0x0409
> [ 11.243941] usb 1-2.2: udev 3, busnum 1, minor = 2
> [ 11.243943] usb 1-2.2: New USB device found, idVendor=046d, idProduct=c518
> [ 11.243944] usb 1-2.2: New USB device strings: Mfr=1, Product=2,
> SerialNumber=0
> [ 11.243945] usb 1-2.2: Product: USB Receiver
> [ 11.243947] usb 1-2.2: Manufacturer: Logitech
> [ 11.243972] usb 1-2.2: uevent
> [ 11.243980] usb 1-2.2: usb_probe_device
> [ 11.243981] usb 1-2.2: configuration #1 chosen from 1 choice
> [ 11.251309] usb 1-2.2: adding 1-2.2:1.0 (config #1, interface 0)
> [ 11.251324] usb 1-2.2:1.0: uevent
> [ 11.251334] usbhid 1-2.2:1.0: usb_probe_interface
> [ 11.251335] usbhid 1-2.2:1.0: usb_probe_interface - got id
> [ 11.251796] usb 1-2: clear tt buffer port 2, a3 ep0 t80008d42
> [ 11.254692] input: Logitech USB Receiver as
> /devices/pci0000:00/0000:00:12.2/usb1/1-2/1-2.2/1-2.2:1.0/input/input3
> [ 11.254732] generic-usb 0003:046D:C518.0001: input,hidraw0: USB HID v1.11
> Mouse [Logitech USB Receiver] on usb-0000:00:12.2-2.2/input0
> [ 11.254740] usb 1-2.2: adding 1-2.2:1.1 (config #1, interface 1)
> [ 11.254749] usb 1-2.2:1.1: uevent
> [ 11.254755] usbhid 1-2.2:1.1: usb_probe_interface
> [ 11.254757] usbhid 1-2.2:1.1: usb_probe_interface - got id
> [ 11.255046] usb 1-2: clear tt buffer port 2, a3 ep0 t80008d42
> [ 11.260359] input: Logitech USB Receiver as
> /devices/pci0000:00/0000:00:12.2/usb1/1-2/1-2.2/1-2.2:1.1/input/input4
> [ 11.260368] usb 1-2.2: link qh8-0601/ffff8800c7800300 start 2 [1/2 us]
> [ 11.260384] drivers/usb/core/file.c: looking for a minor, starting at 96
> [ 11.260414] generic-usb 0003:046D:C518.0002: input,hiddev96,hidraw1: USB
> HID v1.11 Device [Logitech USB Receiver] on usb-0000:00:12.2-2.2/input1
> [ 11.260427] drivers/usb/core/inode.c: creating file '003'
> [ 11.260438] hub 1-2:1.0: state 7 ports 4 chg 0000 evt 0004
> [ 11.280770] usb 1-2.2:1.0: uevent
> [ 11.280827] usb 1-2.2: uevent
> [ 11.280844] usb 1-2.2:1.0: uevent
> [ 11.280903] usb 1-2.2: uevent
> [ 11.281405] usb 1-2.2:1.1: uevent
> [ 11.281467] usb 1-2.2: uevent
> [ 11.281517] usb 1-2.2:1.0: uevent
> [ 11.281529] usb 1-2.2:1.0: uevent
> [ 11.282164] usb 1-2.2:1.1: uevent
> [ 11.493831] bttv0: tea5757: read timeout
> [ 11.493832] bttv0: tuner type=5
> [ 11.505445] bttv0: audio absent, no audio device found!
> [ 11.525030] TUNER: Unable to find symbol tea5767_autodetection()
> [ 11.525033] tuner 0-0060: chip found @ 0xc0 (bt848 #0 [sw])
> [ 11.527843] tuner-simple 0-0060: creating new instance
> [ 11.527846] tuner-simple 0-0060: type set to 5 (Philips PAL_BG (FI1216 and
> compatibles))
> [ 11.528664] bttv0: registered device video0
> [ 11.528695] bttv0: registered device vbi0
> [ 11.814508] reiser4: md2: found disk format 4.0.0.
> [ 13.337101] hub 2-0:1.0: hub_suspend
> [ 13.337107] usb usb2: bus auto-suspend
> [ 13.337109] ehci_hcd 0000:00:13.2: suspend root hub
> [ 13.591267] reiser4: md3: found disk format 4.0.0.
> [ 55.020583] reiser4: sdc7: found disk format 4.0.0.
> [ 67.386552] Adding 7815612k swap on /dev/sda2. Priority:1 extents:1
> across:7815612k
> [ 67.408915] Adding 7815612k swap on /dev/sdb2. Priority:1 extents:1
> across:7815612k
> [ 67.501498] Adding 7815612k swap on /dev/sdc2. Priority:1 extents:1
> across:7815612k
> [ 68.205087] w83627ehf: Found W83627EHG chip at 0x290
> [ 68.421646] r8169: eth0: link up
> [ 68.421650] r8169: eth0: link up
> [ 70.281241] usb usb1: uevent
> [ 70.281269] usb 1-0:1.0: uevent
> [ 70.281293] usb 1-2: uevent
> [ 70.281320] usb 1-2.2: uevent
> [ 70.281346] usb 1-2.2:1.0: uevent
> [ 70.281533] usb 1-2.2:1.1: uevent
> [ 70.281706] usb 1-2:1.0: uevent
> [ 70.281804] usb usb2: uevent
> [ 70.281830] usb 2-0:1.0: uevent
> [ 71.272886] alloc irq_desc for 23 on node -1
> [ 71.272889] alloc kstat_irqs on node -1
> [ 71.272895] EMU10K1_Audigy 0000:05:08.0: PCI INT A -> GSI 23 (level, low) -> IRQ 23
> [ 71.278891] Audigy2 value: Special config.
> [ 72.473408] fglrx: module license 'Proprietary. (C) 2002 - ATI
> Technologies, Starnberg, GERMANY' taints kernel.
> [ 72.473417] Disabling lock debugging due to kernel taint
> [ 72.490481] [fglrx] Maximum main memory to use for locked dma buffers: 7760
> MBytes.
> [ 72.490559] [fglrx] vendor: 1002 device: 9501 count: 1
> [ 72.490739] [fglrx] ioport: bar 4, base 0xc000, size: 0x100
> [ 72.490750] pci 0000:02:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
> [ 72.490754] pci 0000:02:00.0: setting latency timer to 64
> [ 72.490884] [fglrx] Kernel PAT support is enabled
> [ 72.490901] [fglrx] module loaded - fglrx 8.66.2 [Sep 1 2009] with 1
> minors
> [ 72.668413] alloc irq_desc for 29 on node -1
> [ 72.668416] alloc kstat_irqs on node -1
> [ 72.668424] fglrx_pci 0000:02:00.0: irq 29 for MSI/MSI-X
> [ 72.668759] [fglrx] Firegl kernel thread PID: 3800
> [ 74.787429] [fglrx] Gart USWC size:1279 M.
> [ 74.787431] [fglrx] Gart cacheable size:508 M.
> [ 74.787435] [fglrx] Reserved FB block: Shared offset:0, size:1000000
> [ 74.787437] [fglrx] Reserved FB block: Unshared offset:fbff000, size:401000
> [ 74.787438] [fglrx] Reserved FB block: Unshared offset:1fffc000, size:4000
> [ 75.947153] usb 1-2.2: link qh8-0601/ffff8800c78003c0 start 3 [1/2 us]
> [ 616.849440] reiser4[ktxnmgrd:md1:ru(581)]: disable_write_barrier
> (fs/reiser4/wander.c:235)[zam-1055]:
> [ 616.849445] NOTICE: md1 does not support write barriers, using synchronous
> write instead.
> [ 671.813536] reiser4[ktxnmgrd:md2:ru(2774)]: disable_write_barrier
> (fs/reiser4/wander.c:235)[zam-1055]:
> [ 671.813541] NOTICE: md2 does not support write barriers, using synchronous
> write instead.
> [ 703.842289] reiser4[ktxnmgrd:md3:ru(2776)]: disable_write_barrier
> (fs/reiser4/wander.c:235)[zam-1055]:
> [ 703.842293] NOTICE: md3 does not support write barriers, using synchronous
> write instead.
>
> PS: I have to disable C1E in the BIOS, or video is very jerky - unless something
> CPU-heavy runs on at least one core...
>
>
>

2009-09-12 07:47:47

by Arjan van de Ven

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Sat, 12 Sep 2009 10:37:45 +0300
Nikos Chantziaras <[email protected]> wrote:

> (Volker stripped all CCs from his posts; I restored them manually.)
>
> On 09/11/2009 09:33 PM, Volker Armin Hemmann wrote:
> > Hi,
> >
> > this is with 2.6.31+reiser4+fglrx
> > Phenom II X4 955
> >
> > KDE 4.3.1, composite temporary disabled.
> > tvtime running.
> >
> > load:
> > fat emerge with make -j5 running in one konsole tab (xulrunner being
> > compiled).
> >
> > without NO_NEW_FAIR_SLEEPERS:
> >
> > tvtime is smooth most of the time
> >
> > with NO_NEW_FAIR_SLEEPERS:
> >
> > tvtime is more jerky. Very visible in scenes with movement.
>
> Is the make -j5 running niced 0? If yes, that would be actually the
> correct behavior. Unfortunately, I can't test tvtime specifically (I
> don't have a TV card), but other applications displaying video
> continue to work smooth on my dual core machine (Core 2 Duo E6600)
> even if I do "nice -n 19 make -j20". If I don't nice it, the video
> is skippy here too though.
>
> Question to Ingo:
> Would posting perf results help in any way with finding differences
> between mainline NEW_FAIR_SLEEPERS/NO_NEW_FAIR_SLEEPERS and BFS?

please also post latencytop output for the app you care about.
(The system-wide latencytop numbers aren't as relevant; to a large degree,
what is happening is that if you oversubscribe, you have to pay the price
for that period, and all you can do is move the cost around to the tasks
you don't care about. For that reason, latencytop output for the task you
care about is what's relevant ;-)


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-09-12 08:27:11

by Volker Armin Hemmann

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

Hi,

On Saturday 12 September 2009, Nikos Chantziaras wrote:
> (Volker stripped all CCs from his posts; I restored them manually.)

stripping is not entirely correct. marc does not list all recipients ;)
Thank you.

>
> On 09/11/2009 09:33 PM, Volker Armin Hemmann wrote:
> > Hi,
> >
> > this is with 2.6.31+reiser4+fglrx
> > Phenom II X4 955
> >
> > KDE 4.3.1, composite temporary disabled.
> > tvtime running.
> >
> > load:
> > fat emerge with make -j5 running in one konsole tab (xulrunner being
> > compiled).
> >
> > without NO_NEW_FAIR_SLEEPERS:
> >
> > tvtime is smooth most of the time
> >
> > with NO_NEW_FAIR_SLEEPERS:
> >
> > tvtime is more jerky. Very visible in scenes with movement.
>
> Is the make -j5 running niced 0?

yes. It always is.


> If yes, that would be actually the
> correct behavior.

maybe. But I am not complaining about the jerkiness at all. I have

[ 3618.305918] hpet1: lost 1 rtc interrupts

with tvtime running since I switched CPUs - so something is wrong anyway.

I just wanted to report that for _me_ the behaviour is worse with
NO_NEW_FAIR_SLEEPERS and plain 2.6.31.

I tried it yesterday when the firefox update came in and switched between
NO_NEW_FAIR... and NEW_FAIR... several times; with NO_NEW_FAIR... tvtime
was just _more_ jerky. I am not saying that it wasn't jerky without it,
nor am I complaining about it at all. ;)

Glück Auf,
Volker

2009-09-12 09:03:50

by Nikos Chantziaras

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On 09/12/2009 11:27 AM, Volker Armin Hemmann wrote:
> Hi,
>
> On Saturday 12 September 2009, Nikos Chantziaras wrote:
>>
>> On 09/11/2009 09:33 PM, Volker Armin Hemmann wrote:
>>>
>>> this is with 2.6.31+reiser4+fglrx
>>> Phenom II X4 955
>>>
>>> KDE 4.3.1, composite temporary disabled.
>>> tvtime running.
>>>
>>> load:
>>> fat emerge with make -j5 running in one konsole tab (xulrunner being
>>> compiled).
>>>
>>> without NO_NEW_FAIR_SLEEPERS:
>>>
>>> tvtime is smooth most of the time
>>>
>>> with NO_NEW_FAIR_SLEEPERS:
>>>
>>> tvtime is more jerky. Very visible in scenes with movement.
>>
>> Is the make -j5 running niced 0?
>
> yes. It always is.
>
>> If yes, that would be actually the
>> correct behavior.
>
> maybe. But I do not complain about jerks at all. I have
>
> [ 3618.305918] hpet1: lost 1 rtc interrupts
>
> with tvtime running since I switched cpus - so something is wrong anyway.

Seeing the "lost 1 rtc interrupts" message makes me wonder if this could
possibly relate to problems with the C1E state on AMD systems (missing
timer interrupts):

http://lkml.org/lkml/2008/6/12/127

That thread is one year old though and your Phenom II CPU was released 7
months later.

2009-09-12 09:34:27

by Volker Armin Hemmann

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Saturday 12 September 2009, Nikos Chantziaras wrote:

> Seeing the "lost 1 rtc interrupts" message makes me wonder if this could
> possibly relate to problems with the C1E state on AMD systems (missing
> timer interrupts):
>
> http://lkml.org/lkml/2008/6/12/127
>
> That thread is one year old though and your Phenom II CPU was released 7
> months later.
>

thanks for the link!

2009-09-12 11:26:42

by Martin Steigerwald

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Friday 11 September 2009, Mat wrote:
> Martin Steigerwald <Martin <at> lichtvoll.de> writes:
> > On Thursday 10 September 2009, Ingo Molnar wrote:
>
> [snip]
>
> > > what is /debug/sched_features - is NO_NEW_FAIR_SLEEPERS set? If not
> > > set yet then try it:
> > >
> > > echo NO_NEW_FAIR_SLEEPERS > /debug/sched_features
> > >
> > > that too might make things more fluid.
>
> Hi Martin,

Hi Mat,

> it made an tremendous difference which still has to be tested out :)

[...]

> Concerning that "NO_NEW_FAIR_SLEEPERS" switch - isn't it as easy as to
>
> do the following ? (I'm not sure if there's supposed to be another
> debug)
>
> echo NO_NEW_FAIR_SLEEPERS > /sys/kernel/debug/sched_features
>
> which after the change says:
>
> cat /sys/kernel/debug/sched_features
> NO_NEW_FAIR_SLEEPERS NO_NORMALIZED_SLEEPER ADAPTIVE_GRAN WAKEUP_PREEMPT
> START_DEBIT AFFINE_WAKEUPS CACHE_HOT_BUDDY SYNC_WAKEUPS NO_HRTICK
> NO_DOUBLE_TICK ASYM_GRAN LB_BIAS LB_WAKEUP_UPDATE ASYM_EFF_LOAD
> NO_WAKEUP_OVERLAP LAST_BUDDY OWNER_SPIN
>
> I hope that's the correct switch ^^

Thanks. Appears to work nicely here ;-). I thought this might be a debugfs
that I need to mount separately, but it's already there. I will see how it
works out.

I wondered whether it might be a good idea to have a

echo default > /sys/kernel/kernel-tuning-knob

that would reset it to the compiled-in factory defaults. It would be a nice
way to go back to safe settings again once you have got carried away too
far with trying those tuning knobs.
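
Lacking such a knob, the closest workaround I can think of is to snapshot
the current feature tokens before experimenting and write them back one by
one afterwards. A rough userspace sketch of that idea (purely illustrative:
it restores the snapshot, not the compiled-in defaults, assumes debugfs is
mounted at /sys/kernel/debug, and the backup path is made up):

#include <stdio.h>

#define FEAT_PATH "/sys/kernel/debug/sched_features"

/* Save each feature token (e.g. NO_NEW_FAIR_SLEEPERS) on its own line. */
static int save_features(const char *backup)
{
	char tok[64];
	FILE *in = fopen(FEAT_PATH, "r");
	FILE *out = fopen(backup, "w");

	if (!in || !out)
		return -1;
	while (fscanf(in, "%63s", tok) == 1)
		fprintf(out, "%s\n", tok);
	fclose(in);
	fclose(out);
	return 0;
}

/* Write the saved tokens back, one write per token. */
static int restore_features(const char *backup)
{
	char tok[64];
	FILE *in = fopen(backup, "r");

	if (!in)
		return -1;
	while (fscanf(in, "%63s", tok) == 1) {
		FILE *out = fopen(FEAT_PATH, "w");

		if (!out)
			break;
		fprintf(out, "%s", tok);
		fclose(out);
	}
	fclose(in);
	return 0;
}

int main(void)
{
	save_features("/tmp/sched_features.saved");
	/* ... echo NO_NEW_FAIR_SLEEPERS etc., do the experiments ... */
	return restore_features("/tmp/sched_features.saved") ? 1 : 0;
}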

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7



2009-09-12 11:45:30

by Martin Steigerwald

[permalink] [raw]
Subject: Re: [tip:sched/core] sched: Re-tune the scheduler latency defaults to decrease worst-case latencies

On Wednesday 09 September 2009, tip-bot for Mike Galbraith wrote:
> Commit-ID: 172e082a9111ea504ee34cbba26284a5ebdc53a7
> Gitweb:
> http://git.kernel.org/tip/172e082a9111ea504ee34cbba26284a5ebdc53a7
> Author: Mike Galbraith <[email protected]>
> AuthorDate: Wed, 9 Sep 2009 15:41:37 +0200
> Committer: Ingo Molnar <[email protected]>
> CommitDate: Wed, 9 Sep 2009 17:30:06 +0200
>
> sched: Re-tune the scheduler latency defaults to decrease worst-case
> latencies
>
> Reduce the latency target from 20 msecs to 5 msecs.
>
> Why? Larger latencies increase spread, which is good for scaling,
> but bad for worst case latency.
>
> We still have the ilog(nr_cpus) rule to scale up on bigger
> server boxes.
>
> Signed-off-by: Mike Galbraith <[email protected]>
> Acked-by: Peter Zijlstra <[email protected]>
> LKML-Reference: <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
>
>
> ---
> kernel/sched_fair.c | 12 ++++++------
> 1 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index af325a3..26fadb4 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -24,7 +24,7 @@
>
> /*
> * Targeted preemption latency for CPU-bound tasks:
> - * (default: 20ms * (1 + ilog(ncpus)), units: nanoseconds)
> + * (default: 5ms * (1 + ilog(ncpus)), units: nanoseconds)
> *
> * NOTE: this latency value is not the same as the concept of
> * 'timeslice length' - timeslices in CFS are of variable length
> @@ -34,13 +34,13 @@
> * (to see the precise effective timeslice length of your workload,
> * run vmstat and monitor the context-switches (cs) field)
> */
> -unsigned int sysctl_sched_latency = 20000000ULL;
> +unsigned int sysctl_sched_latency = 5000000ULL;
>
> /*
> * Minimal preemption granularity for CPU-bound tasks:
> - * (default: 4 msec * (1 + ilog(ncpus)), units: nanoseconds)
> + * (default: 1 msec * (1 + ilog(ncpus)), units: nanoseconds)
> */
> -unsigned int sysctl_sched_min_granularity = 4000000ULL;
> +unsigned int sysctl_sched_min_granularity = 1000000ULL;

Needs to be lower for a fluid desktop experience here:

shambhala:/proc/sys/kernel> cat sched_min_granularity_ns
100000

>
> /*
> * is kept at sysctl_sched_latency / sysctl_sched_min_granularity
> @@ -63,13 +63,13 @@ unsigned int __read_mostly
> sysctl_sched_compat_yield;
>
> /*
> * SCHED_OTHER wake-up granularity.
> - * (default: 5 msec * (1 + ilog(ncpus)), units: nanoseconds)
> + * (default: 1 msec * (1 + ilog(ncpus)), units: nanoseconds)
> *
> * This option delays the preemption effects of decoupled workloads
> * and reduces their over-scheduling. Synchronous workloads will still
> * have immediate wakeup/sleep latencies.
> */
> -unsigned int sysctl_sched_wakeup_granularity = 5000000UL;
> +unsigned int sysctl_sched_wakeup_granularity = 1000000UL;

Ditto:

shambhala:/proc/sys/kernel> cat sched_wakeup_granularity_ns
100000

With

shambhala:~> cat /proc/version
Linux version 2.6.31-rc7-tp42-toi-3.0.1-04741-g57e61c0 (martin@shambhala)
(gcc version 4.3.3 (Debian 4.3.3-10) ) #6 PREEMPT Sun Aug 23 10:51:32 CEST
2009

on my ThinkPad T42.

Otherwise compositing animations like switching desktops and zooming in
newly opening windows still appear jerky. Even with:

shambhala:/sys/kernel/debug> cat sched_features
NO_NEW_FAIR_SLEEPERS NO_NORMALIZED_SLEEPER ADAPTIVE_GRAN WAKEUP_PREEMPT
START_DEBIT AFFINE_WAKEUPS CACHE_HOT_BUDDY SYNC_WAKEUPS NO_HRTICK
NO_DOUBLE_TICK ASYM_GRAN LB_BIAS LB_WAKEUP_UPDATE ASYM_EFF_LOAD
NO_WAKEUP_OVERLAP LAST_BUDDY OWNER_SPIN

But NO_NEW_FAIR_SLEEPERS also gives a benefit. It makes those animations
even more fluid.

All in all I am quite happy with

shambhala:/proc/sys/kernel> grep "" *sched*
sched_child_runs_first:0
sched_compat_yield:0
sched_features:113916
sched_latency_ns:5000000
sched_migration_cost:500000
sched_min_granularity_ns:100000
sched_nr_migrate:32
sched_rt_period_us:1000000
sched_rt_runtime_us:950000
sched_shares_ratelimit:250000
sched_shares_thresh:4
sched_wakeup_granularity_ns:100000

for now.

It really makes a *lot* of difference. But it appears that both
sched_min_granularity_ns and sched_wakeup_granularity_ns have to be lower
on my ThinkPad for best effect.

I would still prefer some autotuning, where I say "desktop!" or nothing at
all. And that's it.
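
In the meantime the closest thing is a trivial "desktop" setter that just
writes the values above via /proc/sys/kernel (a sketch only, run as root;
the numbers are simply the ones I quoted, nothing is actually autotuned):

#include <stdio.h>

static int write_sysctl(const char *name, const char *val)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/sys/kernel/%s", name);
	f = fopen(path, "w");
	if (!f) {
		perror(path);
		return -1;
	}
	fprintf(f, "%s\n", val);
	return fclose(f);
}

int main(void)
{
	int ret = 0;

	/* the values I currently use on the ThinkPad T42 */
	ret |= write_sysctl("sched_latency_ns", "5000000");
	ret |= write_sysctl("sched_min_granularity_ns", "100000");
	ret |= write_sysctl("sched_wakeup_granularity_ns", "100000");
	return ret ? 1 : 0;
}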

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7



2009-09-12 11:48:38

by Martin Steigerwald

[permalink] [raw]
Subject: Re: [tip:sched/core] sched: Keep kthreads at default priority

On Wednesday 09 September 2009, Mike Galbraith wrote:
> On Wed, 2009-09-09 at 19:06 +0200, Peter Zijlstra wrote:
> > On Wed, 2009-09-09 at 09:55 -0700, Dmitry Torokhov wrote:
> > > On Wed, Sep 09, 2009 at 03:37:34PM +0000, tip-bot for Mike Galbraith
wrote:
> > > > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > > > index eb8751a..5fe7099 100644
> > > > --- a/kernel/kthread.c
> > > > +++ b/kernel/kthread.c
> > > > @@ -16,8 +16,6 @@
> > > > #include <linux/mutex.h>
> > > > #include <trace/events/sched.h>
> > > >
> > > > -#define KTHREAD_NICE_LEVEL (-5)
> > > > -
> > >
> > > Why don't we just redefine it to 0? We may find out later that we'd
> > > still prefer to have kernel threads have boost.
> >
> > Seems sensible, also the traditional reasoning behind this nice level
> > is that kernel threads do work on behalf of multiple tasks. Its a
> > kind of prio ceiling thing.
>
> True. None of our current threads are heavy enough to matter much.

Does it make sense to have this as a tunable? Where does it matter? Server
workloads?

(Oh no, not another tunable, I can hear you yell ;-).

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7



2009-09-12 12:19:14

by Mike Galbraith

[permalink] [raw]
Subject: Re: [tip:sched/core] sched: Keep kthreads at default priority

On Sat, 2009-09-12 at 13:48 +0200, Martin Steigerwald wrote:
> On Wednesday 09 September 2009, Mike Galbraith wrote:
> > On Wed, 2009-09-09 at 19:06 +0200, Peter Zijlstra wrote:
> > > On Wed, 2009-09-09 at 09:55 -0700, Dmitry Torokhov wrote:
> > > > On Wed, Sep 09, 2009 at 03:37:34PM +0000, tip-bot for Mike Galbraith
> wrote:
> > > > > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > > > > index eb8751a..5fe7099 100644
> > > > > --- a/kernel/kthread.c
> > > > > +++ b/kernel/kthread.c
> > > > > @@ -16,8 +16,6 @@
> > > > > #include <linux/mutex.h>
> > > > > #include <trace/events/sched.h>
> > > > >
> > > > > -#define KTHREAD_NICE_LEVEL (-5)
> > > > > -
> > > >
> > > > Why don't we just redefine it to 0? We may find out later that we'd
> > > > still prefer to have kernel threads have boost.
> > >
> > > Seems sensible, also the traditional reasoning behind this nice level
> > > is that kernel threads do work on behalf of multiple tasks. Its a
> > > kind of prio ceiling thing.
> >
> > True. None of our current threads are heavy enough to matter much.
>
> Does it make sense to have this as a tunable? Where does it matter? Server
> workloads?

I don't think it should be a knob. It only makes a difference to
kthreads that are heavy CPU users. If one pops up as a performance
problem, IMHO, it should be tweaked separately. Running at default
weight saves a bit of unnecessary math for the common case.

-Mike

2009-09-13 15:47:56

by Ingo Molnar

[permalink] [raw]
Subject: Re: Epic regression in throughput since v2.6.23


* Serge Belyshev <[email protected]> wrote:

> Note that the disabling NEW_FAIR_SLEEPERS doesn't fix 3%
> regression from v2.6.23, but instead makes "make -j4" runtime
> another 2% worse (27.05 -> 27.72).

ok - thanks for the numbers, will have a look.

> ---
> tools/perf/builtin-stat.c | 18 +++++++++++++++++-
> 1 file changed, 17 insertions(+), 1 deletion(-)

> + // quick ugly hack: if a "--" appears in the command, treat is as
> + // a delimiter and use remaining part as a "cleanup command",
> + // not affecting performance counters.
> + cleanup = cleanup_argc = 0;
> + for (j = 1; j < (argc-1); j ++) {
> + if (!strcmp (argv[j], "--")) {
> + cleanup = j + 1;
> + cleanup_argc = argc - j - 1;
> + argv[j] = NULL;
> + argc = j;
> + }
> + }

Nice feature!

How about doing it a bit cleaner, as '--repeat-prepare' and
'--repeat-cleanup' options, to allow both pre-repeat and post-repeat
cleanup ops to be done outside of the measured period?
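
A minimal standalone sketch of the delimiter idea, for illustration only (the
names and handling below are assumptions, not perf's actual option code):

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	int split = 0;
	int i;

	/* find the first literal "--"; everything after it is treated as
	 * the (hypothetical) cleanup command, excluded from measurement */
	for (i = 1; i < argc - 1; i++) {
		if (!strcmp(argv[i], "--")) {
			split = i;
			break;
		}
	}

	if (split) {
		printf("measured command: %d arg(s), cleanup command: %d arg(s)\n",
		       split - 1, argc - split - 1);
		printf("first cleanup arg: %s\n", argv[split + 1]);
	} else {
		printf("no cleanup command given\n");
	}
	return 0;
}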

Ingo

2009-09-13 19:17:57

by Mike Galbraith

[permalink] [raw]
Subject: Re: Epic regression in throughput since v2.6.23

On Sun, 2009-09-13 at 17:47 +0200, Ingo Molnar wrote:
> * Serge Belyshev <[email protected]> wrote:
>
> > Note that the disabling NEW_FAIR_SLEEPERS doesn't fix 3%
> > regression from v2.6.23, but instead makes "make -j4" runtime
> > another 2% worse (27.05 -> 27.72).
>
> ok - thanks for the numbers, will have a look.

Seems NEXT_BUDDY is hurting the -j4 build.

LAST_BUDDY helps, which makes some sense.. if a task has heated up
cache, and is wakeup preempted by a fast mover (kthread, make..), it can
get the CPU back with still toasty data. Hm. If NEXT_BUDDY is on, that
benefit would likely be frequently destroyed too, because NEXT_BUDDY is
preferred over LAST_BUDDY.

Anyway, I'm thinking of tracking forks/sec as a means of detecting the
fork/exec load. Or, maybe just enable it when there's > 1 buddy pair
running.. or something. After all, NEXT_BUDDY is about scalability, and
make -j4 on a quad surely doesn't need any scalability help :)

Performance counter stats for 'make -j4 vmlinux':

stock
111.625198810 seconds time elapsed avg 112.120 1.00
112.209501685 seconds time elapsed
112.528258240 seconds time elapsed

NO_NEXT_BUDDY NO_LAST_BUDDY
109.405064078 seconds time elapsed avg 109.351 .975
108.708076118 seconds time elapsed
109.942346026 seconds time elapsed

NO_NEXT_BUDDY
108.005756718 seconds time elapsed avg 108.064 .963
107.689862679 seconds time elapsed
108.497117555 seconds time elapsed

NO_LAST_BUDDY
110.208717063 seconds time elapsed avg 110.120 .982
110.362412902 seconds time elapsed
109.791359601 seconds time elapsed


diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index aa7f841..7cfea64 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1501,7 +1501,8 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int sync)
*/
if (sched_feat(LAST_BUDDY) && likely(se->on_rq && curr != rq->idle))
set_last_buddy(se);
- set_next_buddy(pse);
+ if (sched_feat(NEXT_BUDDY))
+ set_next_buddy(pse);

/*
* We can come here with TIF_NEED_RESCHED already set from new task
diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index e2dc63a..6e7070b 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -13,5 +13,6 @@ SCHED_FEAT(LB_BIAS, 1)
SCHED_FEAT(LB_WAKEUP_UPDATE, 1)
SCHED_FEAT(ASYM_EFF_LOAD, 1)
SCHED_FEAT(WAKEUP_OVERLAP, 0)
+SCHED_FEAT(NEXT_BUDDY, 1)
SCHED_FEAT(LAST_BUDDY, 1)
SCHED_FEAT(OWNER_SPIN, 1)
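
For readers unfamiliar with the SCHED_FEAT()/sched_feat() plumbing the patch
hooks into, here is a simplified, self-contained sketch of the feature-bit
pattern behind it (an illustration, not the kernel's exact implementation; in
the real tree the bits are generated from sched_features.h and can be toggled
at runtime, e.g. via /sys/kernel/debug/sched_features):

#include <stdio.h>

enum {
	FEAT_NEXT_BUDDY = 1 << 0,
	FEAT_LAST_BUDDY = 1 << 1,
};

/* default mask: both buddy features enabled */
static unsigned int sched_features = FEAT_NEXT_BUDDY | FEAT_LAST_BUDDY;

/* test one feature bit, mirroring the kernel's sched_feat(x) idea */
#define sched_feat(x) (sched_features & FEAT_##x)

int main(void)
{
	/* simulate turning a feature off, as NO_NEXT_BUDDY would */
	sched_features &= ~FEAT_NEXT_BUDDY;

	printf("NEXT_BUDDY: %s\n", sched_feat(NEXT_BUDDY) ? "on" : "off");
	printf("LAST_BUDDY: %s\n", sched_feat(LAST_BUDDY) ? "on" : "off");
	return 0;
}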

2009-09-14 06:15:45

by Mike Galbraith

[permalink] [raw]
Subject: Re: Epic regression in throughput since v2.6.23

On Sun, 2009-09-13 at 21:17 +0200, Mike Galbraith wrote:

> Anyway, I'm thinking of tracking forks/sec as a means of detecting the
> fork/exec load. Or, maybe just enable it when there's > 1 buddy pair
> running.. or something. After all, NEXT_BUDDY is about scalability, and
> make -j4 on a quad surely doesn't need any scalability help :)

But, this buddy vs fork/exec thing is not at all cut and dried. Even
with fork/exec load being the primary CPU consumer, there are genuine
buddies to worry about when you've got a GUI running, next/last buddy
can reduce the chances that an oinker slips in between X and client.

Ponder...

(oil for rusty old ponder machine welcome, gears grinding)

-Mike

2009-09-14 09:46:12

by Nikos Chantziaras

[permalink] [raw]
Subject: Phoronix CFS vs BFS benchmarks

Phoronix has published some benchmarks, including some "non-synthetic"
real-life applications:

http://www.phoronix.com/vr.php?view=14179

The benchmarks are:

* World of Padman
* Timed Apache Compilation
* Timed PHP Compilation
* 7-Zip Compression
* GraphicsMagick
* Apache Benchmark
* Threaded I/O Tester
* PostMark

The test was performed on an Ubuntu 9.10 daily snapshot from 2009-09-10
with the GNOME 2.27.91 desktop, X Server 1.6.3, NVIDIA 190.32 display
driver, GCC 4.4.1, and an EXT4 file-system.

2009-09-14 11:35:35

by Mike Galbraith

[permalink] [raw]
Subject: Re: Phoronix CFS vs BFS bencharks

On Mon, 2009-09-14 at 12:46 +0300, Nikos Chantziaras wrote:
> Phoronix has published some benchmarks, including some "non-synthetic"
> real-life applications:
>
> http://www.phoronix.com/vr.php?view=14179
>
> The benchmarks are:
>
> * World of Padman
> * Timed Apache Compilation
> * Timed PHP Compilation
> * 7-Zip Compression
> * GraphicsMagick
> * Apache Benchmark
> * Threaded I/O Tester
> * PostMark
>
> The test was performed on an Ubuntu 9.10 daily snapshot from 2009-09-10
> with the GNOME 2.27.91 desktop, X Server 1.6.3, NVIDIA 190.32 display
> driver, GCC 4.4.1, and an EXT4 file-system.

Interesting results.

It'd be nice to see what difference the changes since .31 have made to
these comparisons. In particular, child_runs_first was found to have a
substantial negative impact on parallel compiles, and has been turned
off. The reduction of sched_latency has a rather large effect on worst
case latency for CPU hogs, so will likely affect some results markedly.

Hohum, back to the grindstone.

-Mike

2009-09-14 15:20:12

by Ingo Molnar

[permalink] [raw]
Subject: Re: [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable


* Martin Schwidefsky <[email protected]> wrote:

> On Fri, 11 Sep 2009 09:37:47 +0200
> Ingo Molnar <[email protected]> wrote:
>
> >
> > * Ingo Molnar <[email protected]> wrote:
> >
> > >
> > > * Ingo Molnar <[email protected]> wrote:
> > >
> > > >
> > > > * Jens Axboe <[email protected]> wrote:
> > > >
> > > > > I went to try -tip btw, but it crashes on boot. Here's the
> > > > > backtrace, typed manually, it's crashing in
> > > > > queue_work_on+0x28/0x60.
> > > > >
> > > > > Call Trace:
> > > > > queue_work
> > > > > schedule_work
> > > > > clocksource_mark_unstable
> > > > > mark_tsc_unstable
> > > > > check_tsc_sync_source
> > > > > native_cpu_up
> > > > > relay_hotcpu_callback
> > > > > do_fork_idle
> > > > > _cpu_up
> > > > > cpu_up
> > > > > kernel_init
> > > > > kernel_thread_helper
> > > >
> > > > hm, that looks like an old bug i fixed days ago via:
> > > >
> > > > 00a3273: Revert "x86: Make tsc=reliable override boot time stability checks"
> > > >
> > > > Have you tested tip:master - do you still know which sha1?
> > >
> > > Ok, i reproduced it on a testbox and bisected it, the crash is
> > > caused by:
> > >
> > > 7285dd7fd375763bfb8ab1ac9cf3f1206f503c16 is first bad commit
> > > commit 7285dd7fd375763bfb8ab1ac9cf3f1206f503c16
> > > Author: Thomas Gleixner <[email protected]>
> > > Date: Fri Aug 28 20:25:24 2009 +0200
> > >
> > > clocksource: Resolve cpu hotplug dead lock with TSC unstable
> > >
> > > Martin Schwidefsky analyzed it:
> > >
> > > I've reverted it in tip/master for now.
> >
> > and that uncovers the circular locking bug that this commit was
> > supposed to fix ...
> >
> > Martin?
>
> This patch should fix the obvious problem that the watchdog_work
> structure is not yet initialized if the clocksource watchdog is not
> running yet.
> --
> Subject: [PATCH] clocksource: statically initialize watchdog workqueue
>
> From: Martin Schwidefsky <[email protected]>
>
> The watchdog timer is started after the watchdog clocksource and at least
> one watched clocksource have been registered. The clocksource work element
> watchdog_work is initialized just before the clocksource timer is started.
> This is too late for the clocksource_mark_unstable call from native_cpu_up.
> To fix this use a static initializer for watchdog_work.
>
> Signed-off-by: Martin Schwidefsky <[email protected]>
> ---
> kernel/time/clocksource.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/kernel/time/clocksource.c
> ===================================================================
> --- linux-2.6.orig/kernel/time/clocksource.c
> +++ linux-2.6/kernel/time/clocksource.c
> @@ -123,10 +123,12 @@ static DEFINE_MUTEX(clocksource_mutex);
> static char override_name[32];
>
> #ifdef CONFIG_CLOCKSOURCE_WATCHDOG
> +static void clocksource_watchdog_work(struct work_struct *work);
> +
> static LIST_HEAD(watchdog_list);
> static struct clocksource *watchdog;
> static struct timer_list watchdog_timer;
> -static struct work_struct watchdog_work;
> +static DECLARE_WORK(watchdog_work, clocksource_watchdog_work);
> static DEFINE_SPINLOCK(watchdog_lock);
> static cycle_t watchdog_last;
> static int watchdog_running;
> @@ -230,7 +232,6 @@ static inline void clocksource_start_wat
> {
> if (watchdog_running || !watchdog || list_empty(&watchdog_list))
> return;
> - INIT_WORK(&watchdog_work, clocksource_watchdog_work);
> init_timer(&watchdog_timer);
> watchdog_timer.function = clocksource_watchdog;
> watchdog_last = watchdog->read(watchdog);
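
A side note on the fix above (an illustrative sketch, not part of the patch):
the difference between the two initialization styles is that DECLARE_WORK()
produces a work item that is fully initialized at compile time, while
INIT_WORK() only makes the item valid once that line has actually executed.

#include <linux/workqueue.h>

static void demo_fn(struct work_struct *work)
{
	/* no-op, for illustration */
}

/* compile-time initialization: the work item itself is valid from the
 * start (whether the workqueue code is ready to run it this early in
 * boot is a separate question, addressed later in this thread) */
static DECLARE_WORK(demo_static_work, demo_fn);

/* run-time initialization: queueing this before demo_setup() has run
 * is exactly the kind of bug the patch above avoids */
static struct work_struct demo_dynamic_work;

static void demo_setup(void)
{
	INIT_WORK(&demo_dynamic_work, demo_fn);
}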

Now another box crashes during bootup. Reverting these two:

f79e025: clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash
7285dd7: clocksource: Resolve cpu hotplug dead lock with TSC unstable

allows me to boot it.

plain 32-bit defconfig.

Ingo

2009-09-14 15:32:45

by Mike Galbraith

[permalink] [raw]
Subject: Re: Phoronix CFS vs BFS bencharks

On Mon, 2009-09-14 at 16:27 +0200, Marcin Letyns wrote:
> Hello,
>
> Disabling NEW_FAIR_SLEEPERS makes a lot of difference here in the
> Apache benchmark:
>
> 2.6.30.6-bfs: 7311.05
>
> 2.6.30.6-cfs-fair_sl_disabled: 8249.17
>
> 2.6.30.6-cfs-fair_sl_enabled: 4894.99

Wow.

Some loads like wakeup preemption (mysql+oltp), and some hate it. This
load appears to REALLY hate it (as does volanomark, but that thing is
extremely overloaded). How many threads does that benchmark run
concurrently?

In any case, it's currently disabled in tip. Time will tell which
benchmarks gain, and which lose. With it disabled, anything light loses
when competing with hog(s). There _are_ one heck of a lot of hogs out
there though, so maybe it _should_ be disabled by default. Dunno.

-Mike

2009-09-14 15:37:21

by Martin Schwidefsky

[permalink] [raw]
Subject: Re: [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable

On Mon, 14 Sep 2009 17:19:58 +0200
Ingo Molnar <[email protected]> wrote:

> Now another box crashes during bootup. Reverting these two:
>
> f79e025: clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash
> 7285dd7: clocksource: Resolve cpu hotplug dead lock with TSC unstable
>
> allows me to boot it.
>
> plain 32-bit defconfig.

I've seen the bug report. init_workqueues comes after smp_init.
The idea I'm currently playing with is a simple check in the tsc
code for whether the tsc clocksource is already registered.
When smp_init is called the tsc is not yet registered, we could
just set the rating to zero.

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

2009-09-14 18:00:12

by Martin Schwidefsky

[permalink] [raw]
Subject: Re: [crash, bisected] Re: clocksource: Resolve cpu hotplug dead lock with TSC unstable

On Mon, 14 Sep 2009 17:19:58 +0200
Ingo Molnar <[email protected]> wrote:

> Now another box crashes during bootup. Reverting these two:
>
> f79e025: clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash
> 7285dd7: clocksource: Resolve cpu hotplug dead lock with TSC unstable
>
> allows me to boot it.
>
> plain 32-bit defconfig.

Ok, I forced the situation where the bad thing happens. With the patch below
the crash goes away.

[ 0.152056] checking TSC synchronization [CPU#0 -> CPU#1]:
[ 0.156001] Measured 0 cycles TSC warp between CPUs, turning off TSC clock.
[ 0.156001] Marking TSC unstable due to check_tsc_sync_source failed

Is there a reason why we need the TSC as a clocksource early in the boot
process?

--
Subject: clocksource: delay tsc clocksource registration

From: Martin Schwidefsky <[email protected]>

Until the tsc clocksource has been registered it can be
downgraded by setting the CLOCK_SOURCE_UNSTABLE bit and the
rating to zero. Once the tsc clocksource is registered a
work queue is needed to change the rating.

Delay the registration of the tsc clocksource to a point in
the boot process after the work queues have been initialized.

This hopefully finally resolves the boot crash due to the
tsc downgrade.

Signed-off-by: Martin Schwidefsky <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: John Stultz <[email protected]>
---

Index: linux-2.6-tip/arch/x86/kernel/tsc.c
===================================================================
--- linux-2.6-tip.orig/arch/x86/kernel/tsc.c 2009-09-14 19:25:02.000000000 +0200
+++ linux-2.6-tip/arch/x86/kernel/tsc.c 2009-09-14 19:30:13.000000000 +0200
@@ -853,9 +853,16 @@
clocksource_tsc.rating = 0;
clocksource_tsc.flags &= ~CLOCK_SOURCE_IS_CONTINUOUS;
}
+}
+
+static int __init register_tsc_clocksource(void)
+{
clocksource_register(&clocksource_tsc);
+ return 0;
}

+core_initcall(register_tsc_clocksource);
+
#ifdef CONFIG_X86_64
/*
* calibrate_cpu is used on systems with fixed rate TSCs to determine
--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.

2009-09-14 19:21:08

by Marcin Letyns

[permalink] [raw]
Subject: Re: Phoronix CFS vs BFS bencharks

2009/9/14 Mike Galbraith <[email protected]>
>
> On Mon, 2009-09-14 at 16:27 +0200, Marcin Letyns wrote:
> > Hello,
> >
> > Disabling NEW_FAIR_SLEEPERS makes a lot of difference here in the
> > Apache benchmark:
> >
> > 2.6.30.6-bfs: 7311.05
> >
> > 2.6.30.6-cfs-fair_sl_disabled: 8249.17
> >
> > 2.6.30.6-cfs-fair_sl_enabled: 4894.99
>
> Wow.
>
> > Some loads like wakeup preemption (mysql+oltp), and some hate it. This
> load appears to REALLY hate it (as does volanomark, but that thing is
> > extremely overloaded). How many threads does that benchmark run
> concurrently?

From the benchmark description:

This is a test of ab, which is the Apache Benchmark program. This test
profile measures how many requests per second a given system can
sustain when carrying out 500,000 requests with 100 requests being
carried out concurrently.

2009-09-14 20:49:23

by Willy Tarreau

[permalink] [raw]
Subject: Re: Phoronix CFS vs BFS bencharks

On Mon, Sep 14, 2009 at 09:14:35PM +0200, Marcin Letyns wrote:
> 2009/9/14 Mike Galbraith <[email protected]>
> >
> > On Mon, 2009-09-14 at 16:27 +0200, Marcin Letyns wrote:
> > > Hello,
> > >
> > > Disabling NEW_FAIR_SLEEPERS makes a lot of difference here in the
> > > Apache benchmark:
> > >
> > > 2.6.30.6-bfs: 7311.05
> > >
> > > 2.6.30.6-cfs-fair_sl_disabled: 8249.17
> > >
> > > 2.6.30.6-cfs-fair_sl_enabled: 4894.99
> >
> > Wow.
> >
> > > Some loads like wakeup preemption (mysql+oltp), and some hate it. This
> > load appears to REALLY hate it (as does volanomark, but that thing is
> > > extremely overloaded). How many threads does that benchmark run
> > concurrently?
>
> From the benchmark description:
>
> This is a test of ab, which is the Apache Benchmark program. This test
> profile measures how many requests per second a given system can
> sustain when carrying out 500,000 requests with 100 requests being
> carried out concurrently.

Be careful not to run ab on the same machine as you run apache, otherwise
the numerous apache processes can limit ab's throughput. This is the same
reason as why I educate people so that they don't run a single-process
proxy in front of a multi-process/multi-thread web server. Apparently
it's not obvious to everyone.

Regards,
Willy

2009-09-15 08:37:43

by Mike Galbraith

[permalink] [raw]
Subject: Re: Phoronix CFS vs BFS bencharks

On Mon, 2009-09-14 at 22:49 +0200, Willy Tarreau wrote:
> On Mon, Sep 14, 2009 at 09:14:35PM +0200, Marcin Letyns wrote:
> > 2009/9/14 Mike Galbraith <[email protected]>
> > >
> > > On Mon, 2009-09-14 at 16:27 +0200, Marcin Letyns wrote:
> > > > Hello,
> > > >
> > > > Disabling NEW_FAIR_SLEEPERS makes a lot of difference here in the
> > > > Apache benchmark:
> > > >
> > > > 2.6.30.6-bfs: 7311.05
> > > >
> > > > 2.6.30.6-cfs-fair_sl_disabled: 8249.17
> > > >
> > > > 2.6.30.6-cfs-fair_sl_enabled: 4894.99
> > >
> > > Wow.
> > >
> > > Some loads like wakeup preemption (mysql+oltp), and some hate it. This
> > > load appears to REALLY hate it (as does volanomark, but that thing is
> > > extremely overloaded). How many threads does that benchmark run
> > > concurrently?
> >
> > From the benchmark description:
> >
> > This is a test of ab, which is the Apache Benchmark program. This test
> > profile measures how many requests per second a given system can
> > sustain when carrying out 500,000 requests with 100 requests being
> > carried out concurrently.
>
> Be careful not to run ab on the same machine as you run apache, otherwise
> the numerous apache processes can limit ab's throughput. This is the same
> reason as why I educate people so that they don't run a single-process
> proxy in front of a multi-process/multi-thread web server. Apparently
> it's not obvious to everyone.

I turned on apache, and played with ab a bit, and yup, ab is a hog, so
any fairness hurts it badly. Ergo, running ab on the same box as
apache suffers with CFS when NEW_FAIR_SLEEPERS are turned on. Issuing
ab bandwidth to match its 1:N pig nature brings throughput right back.
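
A rough back-of-the-envelope view of why the nice -15 trick works (an
approximation for illustration, not the kernel's exact prio_to_weight table):
CFS scales a task's load weight by roughly a factor of 1.25 per nice level,
with nice 0 at weight 1024, so ab at nice -15 carries on the order of 28x
the weight of a nice-0 apache worker.

#include <stdio.h>
#include <math.h>

int main(void)
{
	int nice;

	/* approximate CFS weights: 1024 at nice 0, ~1.25x per nice step */
	for (nice = -20; nice <= 19; nice += 5)
		printf("nice %3d -> approx weight %8.0f\n",
		       nice, 1024.0 * pow(1.25, -nice));

	return 0;
}

(build with -lm)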

(In all the comparison testing I've done, BFS favors hogs, and with
NEW_FAIR_SLEEPERS turned off, so does CFS, though not as much.)

Running apache on one core and ab on another (with shared cache tho),
something went south with BFS. I would have expected it to be much
closer (shrug).

Some likely not very interesting numbers below. I wasted a lot more
of my time generating them than anyone will downloading them :)

ab -n 500000 -c 100 http://localhost/openSUSE.org.html

2.6.31-bfs221-smp
Concurrency Level: 100
Time taken for tests: 43.556 seconds
Complete requests: 500000
Failed requests: 0
Write errors: 0
Total transferred: 7158558404 bytes
HTML transferred: 7027047358 bytes
Requests per second: 11479.50 [#/sec] (mean)
Time per request: 8.711 [ms] (mean)
Time per request: 0.087 [ms] (mean, across all concurrent requests)
Transfer rate: 160501.38 [Kbytes/sec] received

2.6.32-tip-smp NO_NEW_FAIR_SLEEPERS
Concurrency Level: 100
Time taken for tests: 42.834 seconds
Complete requests: 500000
Failed requests: 0
Write errors: 0
Total transferred: 7158429480 bytes
HTML transferred: 7026921590 bytes
Requests per second: 11672.84 [#/sec] (mean)
Time per request: 8.567 [ms] (mean)
Time per request: 0.086 [ms] (mean, across all concurrent requests)
Transfer rate: 163201.63 [Kbytes/sec] received

2.6.32-tip-smp NEW_FAIR_SLEEPERS
Concurrency Level: 100
Time taken for tests: 68.221 seconds
Complete requests: 500000
Failed requests: 0
Write errors: 0
Total transferred: 7158357900 bytes
HTML transferred: 7026851325 bytes
Requests per second: 7329.12 [#/sec] (mean)
Time per request: 13.644 [ms] (mean)
Time per request: 0.136 [ms] (mean, across all concurrent requests)
Transfer rate: 102469.65 [Kbytes/sec] received

2.6.32-tip-smp NEW_FAIR_SLEEPERS + ab at nice -15
Concurrency Level: 100
Time taken for tests: 42.824 seconds
Complete requests: 500000
Failed requests: 0
Write errors: 0
Total transferred: 7158451988 bytes
HTML transferred: 7026943572 bytes
Requests per second: 11675.68 [#/sec] (mean)
Time per request: 8.565 [ms] (mean)
Time per request: 0.086 [ms] (mean, across all concurrent requests)
Transfer rate: 163241.78 [Kbytes/sec] received

taskset -c 2 /etc/init.d/apache2 restart
taskset -c 3 ab -n 500000 -c 100 http://localhost/openSUSE.org.html

2.6.31-bfs221-smp
Concurrency Level: 100
Time taken for tests: 86.590 seconds
Complete requests: 500000
Failed requests: 0
Write errors: 0
Total transferred: 7158000000 bytes
HTML transferred: 7026500000 bytes
Requests per second: 5774.37 [#/sec] (mean)
Time per request: 17.318 [ms] (mean)
Time per request: 0.173 [ms] (mean, across all concurrent requests)
Transfer rate: 80728.41 [Kbytes/sec] received

2.6.32-tip-smp
Concurrency Level: 100
Time taken for tests: 48.640 seconds
Complete requests: 500000
Failed requests: 0
Write errors: 0
Total transferred: 7158000000 bytes
HTML transferred: 7026500000 bytes
Requests per second: 10279.71 [#/sec] (mean)
Time per request: 9.728 [ms] (mean)
Time per request: 0.097 [ms] (mean, across all concurrent requests)
Transfer rate: 143715.15 [Kbytes/sec] received

2009-09-16 18:27:34

by Frans Pop

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

Benjamin Herrenschmidt wrote:
> I'll have a look after the merge window madness. Multiple windows is
> also still an option I suppose even if i don't like it that much: we
> could support double-click on an app or "global" in the left list,
> making that pop a new window with the same content as the right pane for
> that app (or global) that updates at the same time as the rest.

I have another request. If I select a specific application to watch (say a
mail client) but it is idle for a while and thus has no latencies, it will
get dropped from the list and thus my selection of it will be lost.

It would be nice if in that case a selected application would stay visible
and selected, or maybe get reselected automatically when it appears again.

Thanks,
FJP

2009-09-16 19:45:10

by Ingo Molnar

[permalink] [raw]
Subject: Re: Epic regression in throughput since v2.6.23


* Serge Belyshev <[email protected]> wrote:

> Note that the disabling NEW_FAIR_SLEEPERS doesn't fix 3% regression
> from v2.6.23, but instead makes "make -j4" runtime another 2% worse
> (27.05 -> 27.72).

Ok, i think we've got a handle on that finally - mind checking latest
-tip?

Ingo

2009-09-16 23:18:32

by Serge Belyshev

[permalink] [raw]
Subject: Re: Epic regression in throughput since v2.6.23

Ingo Molnar <[email protected]> writes:

> Ok, i think we've got a handle on that finally - mind checking latest
> -tip?

Kernel build benchmark:
http://img11.imageshack.us/img11/4544/makej20090916.png

I have also repeated video encode benchmarks described here:
http://article.gmane.org/gmane.linux.kernel/889444

"x264 --preset ultrafast":
http://img11.imageshack.us/img11/9020/ultrafast20090916.png

"x264 --preset medium":
http://img11.imageshack.us/img11/7729/medium20090916.png

2009-09-17 01:30:19

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Wed, 2009-09-16 at 20:27 +0200, Frans Pop wrote:
> Benjamin Herrenschmidt wrote:
> > I'll have a look after the merge window madness. Multiple windows is
> > also still an option I suppose even if i don't like it that much: we
> > could support double-click on an app or "global" in the left list,
> > making that pop a new window with the same content as the right pane for
> > that app (or global) that updates at the same time as the rest.
>
> I have another request. If I select a specific application to watch (say a
> mail client) but it is idle for a while and thus has no latencies, it will
> get dropped from the list and thus my selection of it will be lost.
>
> It would be nice if in that case a selected application would stay visible
> and selected, or maybe get reselected automatically when it appears again.

Hrm... I thought I forced the selected app to remain ... or maybe I
wanted to do that and failed :-) Ok. On the list. Please ping me next
week if nothing happens.

Ben.

2009-09-17 04:55:44

by Mike Galbraith

[permalink] [raw]
Subject: [patchlet] Re: Epic regression in throughput since v2.6.23

On Wed, 2009-09-16 at 23:18 +0000, Serge Belyshev wrote:
> Ingo Molnar <[email protected]> writes:
>
> > Ok, i think we've got a handle on that finally - mind checking latest
> > -tip?
>
> Kernel build benchmark:
> http://img11.imageshack.us/img11/4544/makej20090916.png
>
> I have also repeated video encode benchmarks described here:
> http://article.gmane.org/gmane.linux.kernel/889444
>
> "x264 --preset ultrafast":
> http://img11.imageshack.us/img11/9020/ultrafast20090916.png
>
> "x264 --preset medium":
> http://img11.imageshack.us/img11/7729/medium20090916.png

Pre-ramble..
Most of the performance differences I've examined in all these CFS vs
BFS threads boil down to fair scheduler vs unfair scheduler. If you
favor hogs, naturally, hogs getting more bandwidth perform better than
hogs getting their fair share. That's wonderful for hogs, somewhat less
than wonderful for their competition. That fairness is not necessarily
the best thing for throughput is well known. If you've got a single
dissimilar task load running alone, favoring hogs may perform better..
or not. What about mixed loads though? Is the throughput of frequent
switchers less important than hog throughput?

Moving right along..

That x264 thing uncovered an interesting issue within CFS. That load is
a frequent clone() customer, and when it has to compete against a not so
fork/clone happy load, it suffers mightily. Even when running solo, ie
only competing against it's own siblings, IFF sleeper fairness is
enabled, the pain of thread startup latency is quite visible. With
concurrent loads, it is agonizingly painful.

concurrent load test
tbench 8 vs
x264 --preset ultrafast --no-scenecut --sync-lookahead 0 --qp 20 -o /dev/null --threads 8 soccer_4cif.y4m

(i can turn knobs and get whatever numbers i want, including
outperforming bfs, concurrent or solo.. not the point)

START_DEBIT
encoded 600 frames, 44.29 fps, 22096.60 kb/s
encoded 600 frames, 43.59 fps, 22096.60 kb/s
encoded 600 frames, 43.78 fps, 22096.60 kb/s
encoded 600 frames, 43.77 fps, 22096.60 kb/s
encoded 600 frames, 45.67 fps, 22096.60 kb/s

8 1068214 672.35 MB/sec execute 57 sec
8 1083785 672.16 MB/sec execute 58 sec
8 1099188 672.18 MB/sec execute 59 sec
8 1114626 672.00 MB/sec cleanup 60 sec
8 1114626 671.96 MB/sec cleanup 60 sec

NO_START_DEBIT
encoded 600 frames, 123.19 fps, 22096.60 kb/s
encoded 600 frames, 123.85 fps, 22096.60 kb/s
encoded 600 frames, 120.05 fps, 22096.60 kb/s
encoded 600 frames, 123.43 fps, 22096.60 kb/s
encoded 600 frames, 121.27 fps, 22096.60 kb/s

8 848135 533.79 MB/sec execute 57 sec
8 860829 534.08 MB/sec execute 58 sec
8 872840 533.74 MB/sec execute 59 sec
8 885036 533.66 MB/sec cleanup 60 sec
8 885036 533.64 MB/sec cleanup 60 sec

2.6.31-bfs221-smp
encoded 600 frames, 169.00 fps, 22096.60 kb/s
encoded 600 frames, 163.85 fps, 22096.60 kb/s
encoded 600 frames, 161.00 fps, 22096.60 kb/s
encoded 600 frames, 155.57 fps, 22096.60 kb/s
encoded 600 frames, 162.01 fps, 22096.60 kb/s

8 458328 287.67 MB/sec execute 57 sec
8 464442 288.68 MB/sec execute 58 sec
8 471129 288.71 MB/sec execute 59 sec
8 477643 288.61 MB/sec cleanup 60 sec
8 477643 288.60 MB/sec cleanup 60 sec

patchlet:

sched: disable START_DEBIT.

START_DEBIT induces unfairness to loads which fork/clone frequently when they
must compete against loads which do not.


Signed-off-by: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>

kernel/sched_features.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index d5059fd..2fc94a0 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -23,7 +23,7 @@ SCHED_FEAT(NORMALIZED_SLEEPER, 0)
* Place new tasks ahead so that they do not starve already running
* tasks
*/
-SCHED_FEAT(START_DEBIT, 1)
+SCHED_FEAT(START_DEBIT, 0)

/*
* Should wakeups try to preempt running tasks.
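
To make the fairness argument concrete, a toy illustration (not the kernel's
place_entity() code, and the numbers are made up): with START_DEBIT a newly
forked or cloned task is placed roughly one scheduling slice of virtual
runtime behind the queue minimum, so a load that clones constantly keeps
paying that startup debit while a non-forking competitor never does.

#include <stdio.h>

int main(void)
{
	/* illustrative values only */
	unsigned long long min_vruntime = 1000000ULL;	/* ns */
	unsigned long long slice        = 4000000ULL;	/* ~4ms */

	unsigned long long no_debit = min_vruntime;		/* NO_START_DEBIT */
	unsigned long long debit    = min_vruntime + slice;	/* START_DEBIT   */

	printf("new task placed at vruntime %llu without debit\n", no_debit);
	printf("new task placed at vruntime %llu with debit (%llu ns behind)\n",
	       debit, debit - no_debit);
	return 0;
}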

2009-09-17 05:06:41

by Mike Galbraith

[permalink] [raw]
Subject: Re: [patchlet] Re: Epic regression in throughput since v2.6.23

Aw poo, forgot to add Peter to CC list before poking xmit.

On Thu, 2009-09-17 at 06:55 +0200, Mike Galbraith wrote:
> On Wed, 2009-09-16 at 23:18 +0000, Serge Belyshev wrote:
> > Ingo Molnar <[email protected]> writes:
> >
> > > Ok, i think we've got a handle on that finally - mind checking latest
> > > -tip?
> >
> > Kernel build benchmark:
> > http://img11.imageshack.us/img11/4544/makej20090916.png
> >
> > I have also repeated video encode benchmarks described here:
> > http://article.gmane.org/gmane.linux.kernel/889444
> >
> > "x264 --preset ultrafast":
> > http://img11.imageshack.us/img11/9020/ultrafast20090916.png
> >
> > "x264 --preset medium":
> > http://img11.imageshack.us/img11/7729/medium20090916.png
>
> Pre-ramble..
> Most of the performance differences I've examined in all these CFS vs
> BFS threads boil down to fair scheduler vs unfair scheduler. If you
> favor hogs, naturally, hogs getting more bandwidth perform better than
> hogs getting their fair share. That's wonderful for hogs, somewhat less
> than wonderful for their competition. That fairness is not necessarily
> the best thing for throughput is well known. If you've got a single
> dissimilar task load running alone, favoring hogs may perform better..
> or not. What about mixed loads though? Is the throughput of frequent
> switchers less important than hog throughput?
>
> Moving right along..
>
> That x264 thing uncovered an interesting issue within CFS. That load is
> a frequent clone() customer, and when it has to compete against a not so
> fork/clone happy load, it suffers mightily. Even when running solo, ie
> only competing against it's own siblings, IFF sleeper fairness is
> enabled, the pain of thread startup latency is quite visible. With
> concurrent loads, it is agonizingly painful.
>
> concurrent load test
> tbench 8 vs
> x264 --preset ultrafast --no-scenecut --sync-lookahead 0 --qp 20 -o /dev/null --threads 8 soccer_4cif.y4m
>
> (i can turn knobs and get whatever numbers i want, including
> outperforming bfs, concurrent or solo.. not the point)
>
> START_DEBIT
> encoded 600 frames, 44.29 fps, 22096.60 kb/s
> encoded 600 frames, 43.59 fps, 22096.60 kb/s
> encoded 600 frames, 43.78 fps, 22096.60 kb/s
> encoded 600 frames, 43.77 fps, 22096.60 kb/s
> encoded 600 frames, 45.67 fps, 22096.60 kb/s
>
> 8 1068214 672.35 MB/sec execute 57 sec
> 8 1083785 672.16 MB/sec execute 58 sec
> 8 1099188 672.18 MB/sec execute 59 sec
> 8 1114626 672.00 MB/sec cleanup 60 sec
> 8 1114626 671.96 MB/sec cleanup 60 sec
>
> NO_START_DEBIT
> encoded 600 frames, 123.19 fps, 22096.60 kb/s
> encoded 600 frames, 123.85 fps, 22096.60 kb/s
> encoded 600 frames, 120.05 fps, 22096.60 kb/s
> encoded 600 frames, 123.43 fps, 22096.60 kb/s
> encoded 600 frames, 121.27 fps, 22096.60 kb/s
>
> 8 848135 533.79 MB/sec execute 57 sec
> 8 860829 534.08 MB/sec execute 58 sec
> 8 872840 533.74 MB/sec execute 59 sec
> 8 885036 533.66 MB/sec cleanup 60 sec
> 8 885036 533.64 MB/sec cleanup 60 sec
>
> 2.6.31-bfs221-smp
> encoded 600 frames, 169.00 fps, 22096.60 kb/s
> encoded 600 frames, 163.85 fps, 22096.60 kb/s
> encoded 600 frames, 161.00 fps, 22096.60 kb/s
> encoded 600 frames, 155.57 fps, 22096.60 kb/s
> encoded 600 frames, 162.01 fps, 22096.60 kb/s
>
> 8 458328 287.67 MB/sec execute 57 sec
> 8 464442 288.68 MB/sec execute 58 sec
> 8 471129 288.71 MB/sec execute 59 sec
> 8 477643 288.61 MB/sec cleanup 60 sec
> 8 477643 288.60 MB/sec cleanup 60 sec
>
> patchlet:
>
> sched: disable START_DEBIT.
>
> START_DEBIT induces unfairness to loads which fork/clone frequently when they
> must compete against loads which do not.
>
>
> Signed-off-by: Mike Galbraith <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> LKML-Reference: <new-submission>
>
> kernel/sched_features.h | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/sched_features.h b/kernel/sched_features.h
> index d5059fd..2fc94a0 100644
> --- a/kernel/sched_features.h
> +++ b/kernel/sched_features.h
> @@ -23,7 +23,7 @@ SCHED_FEAT(NORMALIZED_SLEEPER, 0)
> * Place new tasks ahead so that they do not starve already running
> * tasks
> */
> -SCHED_FEAT(START_DEBIT, 1)
> +SCHED_FEAT(START_DEBIT, 0)
>
> /*
> * Should wakeups try to preempt running tasks.
>

2009-09-17 07:21:19

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patchlet] Re: Epic regression in throughput since v2.6.23


here's some start-debit versus non-start-debit numbers.

The workload: on a dual-core box start and kill 10 loops, once every
second. PID 23137 is a shell doing interactive stuff. (running a loop of
usleep 100000 and echo)

START_DEBIT:

europe:~> perf sched lat | grep 23137
bash:23137 | 34.380 ms | 187 | avg: 0.005 ms | max: 0.017 ms |
bash:23137 | 36.410 ms | 188 | avg: 0.005 ms | max: 0.011 ms |
bash:23137 | 36.680 ms | 183 | avg: 0.007 ms | max: 0.333 ms |

NO_START_DEBIT:

europe:~> perf sched lat | grep 23137
bash:23137 | 35.531 ms | 183 | avg: 0.005 ms | max: 0.019 ms |
bash:23137 | 35.511 ms | 188 | avg: 0.007 ms | max: 0.334 ms |
bash:23137 | 35.774 ms | 185 | avg: 0.005 ms | max: 0.019 ms |

Seems very similar at first sight.

Ingo

2009-09-18 11:25:03

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Michael Buesch <[email protected]> wrote:

> On Tuesday 08 September 2009 09:48:25 Ingo Molnar wrote:
> > Mind poking on this one to figure out whether it's all repeatable
> > and why that slowdown happens?
>
> I repeated the test several times, because I couldn't really believe
> that there's such a big difference for me, but the results were the
> same. I don't really know what's going on nor how to find out what's
> going on.

Well that's a really memory constrained MIPS device with like 16 MB of
RAM or so? So having effects from small things like changing details in
a kernel image is entirely plausible.

Ingo

2009-09-18 14:46:29

by Felix Fietkau

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

Ingo Molnar wrote:
> * Michael Buesch <[email protected]> wrote:
>
>> On Tuesday 08 September 2009 09:48:25 Ingo Molnar wrote:
>> > Mind poking on this one to figure out whether it's all repeatable
>> > and why that slowdown happens?
>>
>> I repeated the test several times, because I couldn't really believe
>> that there's such a big difference for me, but the results were the
>> same. I don't really know what's going on nor how to find out what's
>> going on.
>
> Well that's a really memory constrained MIPS device with like 16 MB of
> RAM or so? So having effects from small things like changing details in
> a kernel image is entirely plausible.
Normally changing small details doesn't have much of an effect. While 16
MB is indeed not that much, we do usually have around 8 MB free with a
full user space running. Changes to other subsystems normally produce
consistent and repeatable differences that seem entirely unrelated to
memory use, so any measurable difference related to scheduler changes is
unlikely to be related to the low amount of RAM.
By the way, we do frequently also test the same software with devices
that have more RAM, e.g. 32 or 64 MB and it usually behaves in a very
similar way.

- Felix

2009-09-19 18:01:30

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Felix Fietkau <[email protected]> wrote:

> Ingo Molnar wrote:
> > * Michael Buesch <[email protected]> wrote:
> >
> >> On Tuesday 08 September 2009 09:48:25 Ingo Molnar wrote:
> >> > Mind poking on this one to figure out whether it's all repeatable
> >> > and why that slowdown happens?
> >>
> >> I repeated the test several times, because I couldn't really believe
> >> that there's such a big difference for me, but the results were the
> >> same. I don't really know what's going on nor how to find out what's
> >> going on.
> >
> > Well that's a really memory constrained MIPS device with like 16 MB of
> > RAM or so? So having effects from small things like changing details in
> > a kernel image is entirely plausible.
>
> Normally changing small details doesn't have much of an effect. While
> 16 MB is indeed not that much, we do usually have around 8 MB free
> with a full user space running. Changes to other subsystems normally
> produce consistent and repeatable differences that seem entirely
> unrelated to memory use, so any measurable difference related to
> scheduler changes is unlikely to be related to the low amount of RAM.
> By the way, we do frequently also test the same software with devices
> that have more RAM, e.g. 32 or 64 MB and it usually behaves in a very
> similar way.

Well, Michael Buesch posted vmstat results, and they show what i have
found with my x86 simulated reproducer as well (these are Michael's
numbers):

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 0 15892 1684 5868 0 0 0 0 268 6 31 69 0 0
1 0 0 15892 1684 5868 0 0 0 0 266 2 34 66 0 0
1 0 0 15892 1684 5868 0 0 0 0 266 6 33 67 0 0
1 0 0 15892 1684 5868 0 0 0 0 267 4 37 63 0 0
1 0 0 15892 1684 5868 0 0 0 0 267 6 34 66 0 0

on average 4 context switches _per second_. The scheduler is not a
factor on this box.

Furthermore:

| I'm currently unable to test BFS, because the device throws strange
| flash errors. Maybe the flash is broken :(

So maybe those flash errors somehow impacted the measurements as well?

Ingo

2009-09-19 18:44:07

by Felix Fietkau

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

Ingo Molnar wrote:
> * Felix Fietkau <[email protected]> wrote:
>
>> Ingo Molnar wrote:
>> > Well that's a really memory constrained MIPS device with like 16 MB of
>> > RAM or so? So having effects from small things like changing details in
>> > a kernel image is entirely plausible.
>>
>> Normally changing small details doesn't have much of an effect. While
>> 16 MB is indeed not that much, we do usually have around 8 MB free
>> with a full user space running. Changes to other subsystems normally
>> produce consistent and repeatable differences that seem entirely
>> unrelated to memory use, so any measurable difference related to
>> scheduler changes is unlikely to be related to the low amount of RAM.
>> By the way, we do frequently also test the same software with devices
>> that have more RAM, e.g. 32 or 64 MB and it usually behaves in a very
>> similar way.
>
> Well, Michael Buesch posted vmstat results, and they show what i have
> found with my x86 simulated reproducer as well (these are Michael's
> numbers):
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 1 0 0 15892 1684 5868 0 0 0 0 268 6 31 69 0 0
> 1 0 0 15892 1684 5868 0 0 0 0 266 2 34 66 0 0
> 1 0 0 15892 1684 5868 0 0 0 0 266 6 33 67 0 0
> 1 0 0 15892 1684 5868 0 0 0 0 267 4 37 63 0 0
> 1 0 0 15892 1684 5868 0 0 0 0 267 6 34 66 0 0
>
> on average 4 context switches _per second_. The scheduler is not a
> factor on this box.
>
> Furthermore:
>
> | I'm currently unable to test BFS, because the device throws strange
> | flash errors. Maybe the flash is broken :(
>
> So maybe those flash errors somehow impacted the measurements as well?
I did some tests with BFS v230 vs CFS on Linux 2.6.30 on a different
MIPS device (Atheros AR2317) with 180 MHz and 16 MB RAM. When running
iperf tests, I consistently get the following results when running the
transfer from the device to my laptop:

CFS: [ 5] 0.0-60.0 sec 107 MBytes 15.0 Mbits/sec
BFS: [ 5] 0.0-60.0 sec 119 MBytes 16.6 Mbits/sec

The transfer speed from my laptop to the device are the same with BFS
and CFS. I repeated the tests a few times just to be sure, and I will
check vmstat later.
The difference here cannot be flash related, as I ran a kernel image
with the whole userland contained in initramfs. No on-flash filesystem
was mounted or accessed.

- Felix

2009-09-19 19:40:06

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Felix Fietkau <[email protected]> wrote:

> Ingo Molnar wrote:
> > * Felix Fietkau <[email protected]> wrote:
> >
> >> Ingo Molnar wrote:
> >> > Well that's a really memory constrained MIPS device with like 16 MB of
> >> > RAM or so? So having effects from small things like changing details in
> >> > a kernel image is entirely plausible.
> >>
> >> Normally changing small details doesn't have much of an effect. While
> >> 16 MB is indeed not that much, we do usually have around 8 MB free
> >> with a full user space running. Changes to other subsystems normally
> >> produce consistent and repeatable differences that seem entirely
> >> unrelated to memory use, so any measurable difference related to
> >> scheduler changes is unlikely to be related to the low amount of RAM.
> >> By the way, we do frequently also test the same software with devices
> >> that have more RAM, e.g. 32 or 64 MB and it usually behaves in a very
> >> similar way.
> >
> > Well, Michael Buesch posted vmstat results, and they show what i have
> > found with my x86 simulated reproducer as well (these are Michael's
> > numbers):
> >
> > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> > r b swpd free buff cache si so bi bo in cs us sy id wa
> > 1 0 0 15892 1684 5868 0 0 0 0 268 6 31 69 0 0
> > 1 0 0 15892 1684 5868 0 0 0 0 266 2 34 66 0 0
> > 1 0 0 15892 1684 5868 0 0 0 0 266 6 33 67 0 0
> > 1 0 0 15892 1684 5868 0 0 0 0 267 4 37 63 0 0
> > 1 0 0 15892 1684 5868 0 0 0 0 267 6 34 66 0 0
> >
> > on average 4 context switches _per second_. The scheduler is not a
> > factor on this box.
> >
> > Furthermore:
> >
> > | I'm currently unable to test BFS, because the device throws strange
> > | flash errors. Maybe the flash is broken :(
> >
> > So maybe those flash errors somehow impacted the measurements as well?
> I did some tests with BFS v230 vs CFS on Linux 2.6.30 on a different
> MIPS device (Atheros AR2317) with 180 MHz and 16 MB RAM. When running
> iperf tests, I consistently get the following results when running the
> transfer from the device to my laptop:
>
> CFS: [ 5] 0.0-60.0 sec 107 MBytes 15.0 Mbits/sec
> BFS: [ 5] 0.0-60.0 sec 119 MBytes 16.6 Mbits/sec
>
> The transfer speed from my laptop to the device are the same with BFS
> and CFS. I repeated the tests a few times just to be sure, and I will
> check vmstat later.

Which exact mainline kernel have you tried? For anything performance
related running latest upstream -git (currently at 202c467) would be
recommended.

Ingo

2009-09-19 20:15:14

by Felix Fietkau

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

Ingo Molnar wrote:
> * Felix Fietkau <[email protected]> wrote:
>> I did some tests with BFS v230 vs CFS on Linux 2.6.30 on a different
>> MIPS device (Atheros AR2317) with 180 MHz and 16 MB RAM. When running
>> iperf tests, I consistently get the following results when running the
>> transfer from the device to my laptop:
>>
>> CFS: [ 5] 0.0-60.0 sec 107 MBytes 15.0 Mbits/sec
>> BFS: [ 5] 0.0-60.0 sec 119 MBytes 16.6 Mbits/sec
>>
>> The transfer speed from my laptop to the device are the same with BFS
>> and CFS. I repeated the tests a few times just to be sure, and I will
>> check vmstat later.
>
> Which exact mainline kernel have you tried? For anything performance
> related running latest upstream -git (currently at 202c467) would be
> recommended.
I used the OpenWrt-patched 2.6.30. Support for the hardware that I
tested with hasn't been merged upstream yet. Do you think that the
scheduler related changes after 2.6.30 are relevant for non-SMP
performance as well? If so, I'll work on a test with latest upstream
-git with the necessary patches when I have time for it.

- Felix

2009-09-19 20:22:36

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Felix Fietkau <[email protected]> wrote:

> Ingo Molnar wrote:
> > * Felix Fietkau <[email protected]> wrote:
> >> I did some tests with BFS v230 vs CFS on Linux 2.6.30 on a different
> >> MIPS device (Atheros AR2317) with 180 MHz and 16 MB RAM. When running
> >> iperf tests, I consistently get the following results when running the
> >> transfer from the device to my laptop:
> >>
> >> CFS: [ 5] 0.0-60.0 sec 107 MBytes 15.0 Mbits/sec
> >> BFS: [ 5] 0.0-60.0 sec 119 MBytes 16.6 Mbits/sec
> >>
> >> The transfer speed from my laptop to the device are the same with BFS
> >> and CFS. I repeated the tests a few times just to be sure, and I will
> >> check vmstat later.
> >
> > Which exact mainline kernel have you tried? For anything performance
> > related running latest upstream -git (currently at 202c467) would be
> > recommended.
>
> I used the OpenWrt-patched 2.6.30. Support for the hardware that I
> tested with hasn't been merged upstream yet. Do you think that the
> scheduler related changes after 2.6.30 are relevant for non-SMP
> performance as well? If so, I'll work on a test with latest upstream
> -git with the necessary patches when I have time for it.

Dont know - it's hard to tell what happens without basic analysis tools.
Is there _any_ way to profile what happens on that system? (Do hrtimers
work on it that could be used to profile it?)

Ingo

2009-09-19 20:34:03

by Felix Fietkau

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

Ingo Molnar wrote:
> * Felix Fietkau <[email protected]> wrote:
>
>> Ingo Molnar wrote:
>> > * Felix Fietkau <[email protected]> wrote:
>> >> I did some tests with BFS v230 vs CFS on Linux 2.6.30 on a different
>> >> MIPS device (Atheros AR2317) with 180 MHz and 16 MB RAM. When running
>> >> iperf tests, I consistently get the following results when running the
>> >> transfer from the device to my laptop:
>> >>
>> >> CFS: [ 5] 0.0-60.0 sec 107 MBytes 15.0 Mbits/sec
>> >> BFS: [ 5] 0.0-60.0 sec 119 MBytes 16.6 Mbits/sec
>> >>
>> >> The transfer speed from my laptop to the device are the same with BFS
>> >> and CFS. I repeated the tests a few times just to be sure, and I will
>> >> check vmstat later.
>> >
>> > Which exact mainline kernel have you tried? For anything performance
>> > related running latest upstream -git (currently at 202c467) would be
>> > recommended.
>>
>> I used the OpenWrt-patched 2.6.30. Support for the hardware that I
>> tested with hasn't been merged upstream yet. Do you think that the
>> scheduler related changes after 2.6.30 are relevant for non-SMP
>> performance as well? If so, I'll work on a test with latest upstream
>> -git with the necessary patches when I have time for it.
>
> Dont know - it's hard to tell what happens without basic analysis tools.
> Is there _any_ way to profile what happens on that system? (Do hrtimers
> work on it that could be used to profile it?)
oprofile doesn't have any support for it (mips r4k, no generic
perfcounters), the only usable clock source is a simple cpu cycle
counter (which is also used for the timer interrupt).

- Felix

2009-09-20 18:10:46

by Ingo Molnar

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements


* Felix Fietkau <[email protected]> wrote:

> Ingo Molnar wrote:
> > * Felix Fietkau <[email protected]> wrote:
> >
> >> Ingo Molnar wrote:
> >> > * Felix Fietkau <[email protected]> wrote:
> >> >> I did some tests with BFS v230 vs CFS on Linux 2.6.30 on a different
> >> >> MIPS device (Atheros AR2317) with 180 MHz and 16 MB RAM. When running
> >> >> iperf tests, I consistently get the following results when running the
> >> >> transfer from the device to my laptop:
> >> >>
> >> >> CFS: [ 5] 0.0-60.0 sec 107 MBytes 15.0 Mbits/sec
> >> >> BFS: [ 5] 0.0-60.0 sec 119 MBytes 16.6 Mbits/sec
> >> >>
> >> >> The transfer speed from my laptop to the device are the same with BFS
> >> >> and CFS. I repeated the tests a few times just to be sure, and I will
> >> >> check vmstat later.
> >> >
> >> > Which exact mainline kernel have you tried? For anything performance
> >> > related running latest upstream -git (currently at 202c467) would be
> >> > recommended.
> >>
> >> I used the OpenWrt-patched 2.6.30. Support for the hardware that I
> >> tested with hasn't been merged upstream yet. Do you think that the
> >> scheduler related changes after 2.6.30 are relevant for non-SMP
> >> performance as well? If so, I'll work on a test with latest upstream
> >> -git with the necessary patches when I have time for it.
> >
> > Dont know - it's hard to tell what happens without basic analysis tools.
> > Is there _any_ way to profile what happens on that system? (Do hrtimers
> > work on it that could be used to profile it?)
>
> oprofile doesn't have any support for it (mips r4k, no generic
> perfcounters), the only usable clock source is a simple cpu cycle
> counter (which is also used for the timer interrupt).

A simple cpu cycle counter ought to be enough to get pretty good
perfcounters support going on that box.

It takes a surprisingly small amount of code to do that, and a large
portion of the perf tooling should then work out of box. Here's a few
example commits of minimal perfcounters support, on other architectures:

310d6b6: [S390] wire up sys_perf_counter_open
2d4618d: parisc: perf: wire up sys_perf_counter_open
19470e1: sh: Wire up sys_perf_counter_open.

Takes about 15 well placed lines of code, if there are no other
complications on MIPS ;-)
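
Very roughly, "wiring up" a syscall amounts to defining the syscall number
and adding a matching entry to the architecture's syscall table; the sketch
below is generic and the number is purely illustrative (the commits above
are the authoritative examples):

/* in the architecture's unistd.h; 336 is an illustrative number only */
#define __NR_perf_counter_open		336

/* plus one entry appended to the architecture's syscall table, e.g.
 *	.long sys_perf_counter_open			(assembly-style tables)
 * or
 *	[__NR_perf_counter_open] = sys_perf_counter_open,	(C-style tables)
 */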

Ingo

2009-10-01 09:36:03

by Frans Pop

[permalink] [raw]
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

Benjamin Herrenschmidt wrote:
> On Wed, 2009-09-16 at 20:27 +0200, Frans Pop wrote:
>> Benjamin Herrenschmidt wrote:
>> > I'll have a look after the merge window madness. Multiple windows is
>> > also still an option I suppose even if i don't like it that much: we
>> > could support double-click on an app or "global" in the left list,
>> > making that pop a new window with the same content as the right pane
>> > for that app (or global) that updates at the same time as the rest.
>>
>> I have another request. If I select a specific application to watch (say
>> a mail client) but it is idle for a while and thus has no latencies, it
>> will get dropped from the list and thus my selection of it will be lost.
>>
>> It would be nice if in that case a selected application would stay
>> visible and selected, or maybe get reselected automatically when it
>> appears again.
>
> Hrm... I thought I forced the selected app to remain ... or maybe I
> wanted to do that and failed :-) Ok. On the list. Please ping me next
> week if nothing happens.

As requested: ping?

And while I'm writing anyway, one more suggestion.
I find the fact that the buttons jump twice every 30 seconds (because of a
change in the timer between <10 and >=10 seconds) slightly annoying.
Any chance of making the position of the buttons fixed? One option could be
moving the timer to the left side of the bottom bar.

Cheers,
FJP