2008-01-27 20:01:27

by Guillaume Chazarain

Subject: High wake up latencies with FAIR_USER_SCHED

Hi,

I noticed some strangely high wake up latencies with FAIR_USER_SCHED
using this script:

#!/usr/bin/python

import os
import time

SLEEP_TIME = 0.1
SAMPLES = 100
PRINT_DELAY = 0.5

def print_wakeup_latency():
    times = []
    last_print = 0
    while True:
        start = time.time()
        time.sleep(SLEEP_TIME)
        end = time.time()
        times.insert(0, end - start - SLEEP_TIME)
        del times[SAMPLES:]
        if end > last_print + PRINT_DELAY:
            copy = times[:]
            copy.sort()
            print '%f ms' % (copy[len(copy)/2] * 1000)
            last_print = end

if os.fork() == 0:
    os.setuid(1)
    for i in xrange(2):
        if os.fork() == 0:
            while True:
                pass
else:
    os.setuid(2) # <-- here
    print_wakeup_latency()

We have two busy loops with UID=1.
And UID=2 maintains the running median of its wake up latency.
I get these latencies:

# ./sched.py
4.300022 ms
4.801178 ms
4.604006 ms
4.606867 ms
4.604006 ms
4.606867 ms
4.604006 ms
4.606867 ms
4.606867 ms
4.676008 ms
4.604006 ms
4.604006 ms
4.606867 ms

Disabling FAIR_USER_SCHED restores wake up latencies in the noise:

# ./sched.py
-0.156975 ms
-0.067091 ms
-0.022984 ms
-0.022984 ms
-0.022030 ms
-0.022030 ms
-0.022030 ms
-0.021076 ms
-0.015831 ms
-0.015831 ms
-0.016069 ms
-0.015831 ms

Strangely enough, another way to restore normal latencies is to change
setuid(2) to setuid(1), that is, putting the latency measurement in
the same group as the two busy loops.

Thanks in advance for any help.

--
Guillaume


2008-01-28 02:17:58

by Srivatsa Vaddagiri

Subject: Re: High wake up latencies with FAIR_USER_SCHED

On Sun, Jan 27, 2008 at 09:01:15PM +0100, Guillaume Chazarain wrote:
> I noticed some strangely high wake up latencies with FAIR_USER_SCHED
> using this script:

<snip>

> We have two busy loops with UID=1.
> And UID=2 maintains the running median of its wake up latency.
> I get these latencies:
>
> # ./sched.py
> 4.300022 ms
> 4.801178 ms
> 4.604006 ms

Given that sysctl_sched_wakeup_granularity is set to 10ms by default,
this doesn't sound abnormal.

<snip>

> Disabling FAIR_USER_SCHED restores wake up latencies in the noise:
>
> # ./sched.py
> -0.156975 ms
> -0.067091 ms
> -0.022984 ms

We get better wakeup latencies with !FAIR_USER_SCHED because of this
snippet of code in place_entity():

if (!initial) {
    /* sleeps upto a single latency don't count. */
    if (sched_feat(NEW_FAIR_SLEEPERS) && entity_is_task(se))
                                         ^^^^^^^^^^^^^^^^^^
        vruntime -= sysctl_sched_latency;

    /* ensure we never gain time by being placed backwards. */
    vruntime = max_vruntime(se->vruntime, vruntime);
}


NEW_FAIR_SLEEPERS feature gives credit for sleeping only to tasks and
not group-level entities. With the patch attached, I could see that wakeup
latencies with FAIR_USER_SCHED are restored to the same level as
!FAIR_USER_SCHED.
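The wakeup (`!initial`) path quoted above can be modeled in a few lines of Python. This is a simplified sketch, not kernel code: vruntimes are plain milliseconds, `SCHED_LATENCY` stands in for sysctl_sched_latency with an assumed value of 20 ms, the `is_task` flag plays the role of `entity_is_task(se)`, and the hypothetical `credit_groups` switch models the behavior described for the attached patch (sleep credit for group-level entities as well).

```python
# Simplified model of the vruntime placement in place_entity(); not kernel
# code. Times are in ms and SCHED_LATENCY is an assumed stand-in value.
SCHED_LATENCY = 20.0  # assumed value for sysctl_sched_latency, ms


def place_entity(min_vruntime, se_vruntime, is_task, credit_groups=False):
    """Return the vruntime a waking entity is placed at.

    is_task plays the role of entity_is_task(se); credit_groups=True is a
    hypothetical switch that gives group entities the same sleep credit.
    """
    vruntime = min_vruntime
    if is_task or credit_groups:
        vruntime -= SCHED_LATENCY  # sleeps up to a single latency don't count
    # Ensure we never gain time by being placed backwards.
    return max(se_vruntime, vruntime)


# An entity that slept long enough for its old vruntime (500) to fall far
# behind the queue's min_vruntime (1000):
print(place_entity(1000.0, 500.0, is_task=False))                      # 1000.0
print(place_entity(1000.0, 500.0, is_task=True))                       # 980.0
print(place_entity(1000.0, 500.0, is_task=False, credit_groups=True))  # 980.0
# An entity that slept only briefly keeps its own, later vruntime:
print(place_entity(1000.0, 995.0, is_task=True))                       # 995.0
```

A long-sleeping group entity lands at min_vruntime, while a task (or, with the switch on, a group) lands one latency earlier and is therefore picked sooner after waking; the `max()` keeps an entity from gaining time by being placed backwards.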

However, I am not sure whether that is the way to go. We want to let one group of
tasks run as much as possible until the fairness/wakeup-latency threshold is
exceeded. If someone does want better wakeup latencies between groups too, they
can always tune sysctl_sched_wakeup_granularity.
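Why a 10 ms wakeup granularity can surface as several milliseconds of wakeup latency can be illustrated with a simplified preemption test. The sketch assumes the waking entity preempts the running one only when the running entity's vruntime leads by more than the granularity; it ignores load-weight scaling and tick-driven slice expiry, so treat it as an approximation rather than the kernel's exact rule.

```python
# Simplified wakeup-preemption test; an approximation, not kernel code.
# Assumption: preempt only if the current entity's vruntime leads the
# woken entity's vruntime by more than the wakeup granularity.
WAKEUP_GRANULARITY = 10.0  # ms, the default cited in this thread


def should_preempt(curr_vruntime, woken_vruntime, gran=WAKEUP_GRANULARITY):
    return curr_vruntime - woken_vruntime > gran


# A 4.6 ms lead is below the 10 ms granularity: the sleeper keeps waiting.
print(should_preempt(1000.0, 995.4))            # False
# Lowering the granularity lets the same wakeup preempt immediately.
print(should_preempt(1000.0, 995.4, gran=2.0))  # True
# A woken entity placed 20 ms back preempts even with the 10 ms default.
print(should_preempt(1000.0, 980.0))            # True
```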

<snip>

> Strangely enough, another way to restore normal latencies is to change
> setuid(2) to setuid(1), that is, putting the latency measurement in
> the same group as the two busy loops.



--
Regards,
vatsa


Attachments:
(No filename) (1.74 kB)
fix.patch (548.00 B)

2008-01-28 12:28:18

by Ingo Molnar

Subject: Re: High wake up latencies with FAIR_USER_SCHED


* Srivatsa Vaddagiri wrote:

> NEW_FAIR_SLEEPERS feature gives credit for sleeping only to tasks and
> not group-level entities. With the patch attached, I could see that
> wakeup latencies with FAIR_USER_SCHED are restored to the same level
> as !FAIR_USER_SCHED.
>
> However, I am not sure whether that is the way to go. We want to let
> one group of tasks run as much as possible until the
> fairness/wakeup-latency threshold is exceeded. If someone does want
> better wakeup latencies between groups too, they can always tune
> sysctl_sched_wakeup_granularity.

The patch does look like the right thing to do. There's nothing special
about 'groups' versus 'tasks' in terms of scheduling. And most
importantly, this solves the behavioral asymmetry observed by Guillaume
as well, which makes it an obvious-to-add regression fix. I've added
your patch to the scheduler queue.

Ingo

2008-01-28 16:57:59

by Guillaume Chazarain

Subject: Re: High wake up latencies with FAIR_USER_SCHED

Hi Srivatsa,

On Jan 28, 2008 3:31 AM, Srivatsa Vaddagiri wrote:
> Given that sysctl_sched_wakeup_granularity is set to 10ms by default,
> this doesn't sound abnormal.

Indeed, by lowering sched_wakeup_granularity I get much better
latencies, but lowering sched_latency seems to be more effective.

> NEW_FAIR_SLEEPERS feature gives credit for sleeping only to tasks and
> not group-level entities. With the patch attached, I could see that wakeup
> latencies with FAIR_USER_SCHED are restored to the same level as
> !FAIR_USER_SCHED.

Thanks for the patch, it works perfectly.

> However, I am not sure whether that is the way to go. We want to let one group of
> tasks run as much as possible until the fairness/wakeup-latency threshold is
> exceeded. If someone does want better wakeup latencies between groups too, they
> can always tune sysctl_sched_wakeup_granularity.

Having an inconsistency here between FAIR_USER_SCHED and
!FAIR_USER_SCHED sounds strange, but Ingo took the patch, so I'm happy
:-)

Thanks for the replies.

--
Guillaume

2008-01-28 20:14:06

by Guillaume Chazarain

Subject: Re: High wake up latencies with FAIR_USER_SCHED

Unfortunately, it does not seem to be completely fixed, as this script shows:

#!/usr/bin/python

import os
import time

SLEEP_TIME = 0.1
SAMPLES = 5
PRINT_DELAY = 0.5

def print_wakeup_latency():
    times = []
    last_print = 0
    while True:
        start = time.time()
        time.sleep(SLEEP_TIME)
        end = time.time()
        times.insert(0, end - start - SLEEP_TIME)
        del times[SAMPLES:]
        if end > last_print + PRINT_DELAY:
            copy = times[:]
            copy.sort()
            print '%f ms' % (copy[len(copy)/2] * 1000)
            last_print = end

if os.fork() == 0:
    if os.fork() == 0:
        os.setuid(1)
        while True:
            pass
    else:
        os.setuid(2)
        while True:
            pass
else:
    os.setuid(1)
    print_wakeup_latency()

I get seemingly unpredictable latencies (with or without the patch applied):

# ./sched.py
14.810944 ms
19.829893 ms
1.968050 ms
8.021021 ms
-0.017977 ms
4.926109 ms
11.958027 ms
5.995893 ms
1.992130 ms
0.007057 ms
0.217819 ms
-0.004864 ms
5.907202 ms
6.547832 ms
-0.012970 ms
0.209951 ms
-0.002003 ms
4.989052 ms

Without FAIR_USER_SCHED, latencies are consistently in the noise.
Also, I forgot to mention that I'm on a single CPU.

Thanks for the help.

--
Guillaume

2008-01-29 05:34:18

by Srivatsa Vaddagiri

Subject: Re: High wake up latencies with FAIR_USER_SCHED

On Mon, Jan 28, 2008 at 09:13:53PM +0100, Guillaume Chazarain wrote:
> Unfortunately, it does not seem to be completely fixed, as this script shows:

The maximum scheduling latency of a task with the group scheduler is:

Lmax = latency to schedule group entity at level0 +
latency to schedule group entity at level1 +
...
latency to schedule task entity at last level

The more hierarchical levels there are, the higher the latency is likely to be.
This is particularly so because vruntime (and not wall-clock time) is used as the
basis for preempting entities. The latency at each level also depends on the
number of entities at that level and on the sysctl_sched_latency/sched_nr_latency
settings.
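The per-level sum above can be turned into a crude numeric model. It assumes equal weights everywhere, uses `SCHED_LATENCY` and `NR_LATENCY` as stand-ins for sysctl_sched_latency and sched_nr_latency with assumed values (20 ms and 5), and takes the worst case at each level to be waiting while every other runnable entity at that level uses its share of the period. It only aims to show the two trends described: more levels, and more entities per level, both raise Lmax.

```python
# Crude model of Lmax: the sum of worst-case waits at each hierarchy level.
# Assumptions (illustrative, not kernel code): equal weights, and assumed
# values for the stand-ins of sysctl_sched_latency and sched_nr_latency.
SCHED_LATENCY = 20.0  # ms, assumed
NR_LATENCY = 5        # assumed


def sched_period(nr_running):
    # The period stretches linearly once there are too many entities.
    if nr_running > NR_LATENCY:
        return SCHED_LATENCY * nr_running / NR_LATENCY
    return SCHED_LATENCY


def level_latency(nr_running):
    # The other entities' share of the period; zero when alone at the level.
    return sched_period(nr_running) * (nr_running - 1) / nr_running


def lmax(entities_per_level):
    """entities_per_level[i]: runnable entities competing at level i."""
    return sum(level_latency(n) for n in entities_per_level)


# Second script: two uid groups at level 0, sleeper + hog inside uid 1.
print(lmax([2, 2]))     # 20.0
# One more level with the same fan-out adds another period share.
print(lmax([2, 2, 2]))  # 30.0
# More runnable entities at one level also raise the bound.
print(lmax([2, 10]))    # 46.0
```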

In this case, we only have two levels - userid + task. So the max scheduling
latency is:

Lmax = latency to schedule uid1 group entity (L0) +
latency to schedule the sleeper task within uid1 group (L1)

In the first script that you had, uid 2 had only one sleeper task, while uid 1
had two CPU hogs. This means L1 is always zero for the sleeper task. L0 is also
substantially reduced with the patch I sent (giving sleep credit to group-level
entities). Thus we were able to get low scheduling latencies with the first
script.

The second script you sent generates two tasks (sleeper + hog) under uid 1 and
one CPU-hog task under uid 2. Consequently, the group entity corresponding to
uid 1 is always active, so there is no question of giving it credit for
sleeping.

As a result, we should expect worst-case latencies of up to 2 * 10 = 20 ms in
this case. The results you have fall within this range.
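The 20 ms estimate can be checked against the numbers Guillaume posted. The values below are copied from his output for the second script; note that they are running medians over 5 samples, so this only shows they are consistent with the bound, not that the bound is tight. The slightly negative values are timer noise, and the 10 ms per-level figure is the one used in the estimate above.

```python
# Medians reported for the second script (FAIR_USER_SCHED), in ms,
# copied from the output quoted in this thread.
reported_ms = [
    14.810944, 19.829893, 1.968050, 8.021021, -0.017977, 4.926109,
    11.958027, 5.995893, 1.992130, 0.007057, 0.217819, -0.004864,
    5.907202, 6.547832, -0.012970, 0.209951, -0.002003, 4.989052,
]

LEVELS = 2           # uid group entity + task entity
PER_LEVEL_MS = 10.0  # per-level figure used in the estimate above
bound_ms = LEVELS * PER_LEVEL_MS

print(bound_ms)                                 # 20.0
print(max(reported_ms))                         # 19.829893
print(all(x <= bound_ms for x in reported_ms))  # True
```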

In case of !FAIR_USER_SCHED, the sleeper task always gets sleep-credits
and hence its latency is drastically reduced.

IMHO these are expected results, and if someone really needs to cut down
this latency, they can reduce sysctl_sched_latency (which will be bad
from a performance standpoint, as we will cause more cache thrashing with that).


--
Regards,
vatsa

2008-01-29 15:55:13

by Guillaume Chazarain

Subject: Re: High wake up latencies with FAIR_USER_SCHED

On Jan 29, 2008 6:47 AM, Srivatsa Vaddagiri wrote:

> IMHO these are expected results, and if someone really needs to cut down
> this latency, they can reduce sysctl_sched_latency (which will be bad
> from a performance standpoint, as we will cause more cache thrashing with that).

Thank you very much for the detailed explanation, Srivatsa; that made a
lot of sense. Unfortunately, it means I'll disable FAIR_USER_SCHED: I had
initially thought these latencies were caused by my local patches
that give each group a load proportional to the max load of its
elements. Anyway, I don't absolutely need a fair user scheduler on my
laptop, but low latencies in the default configuration are nice to
have.

I just thought of something that might restore low latencies with
FAIR_GROUP_SCHED, but it's possibly utter nonsense, so bear with me
;-) The idea would be to turn the trees upside down. The scheduler
would only see tasks (on the leaves) and so could apply its interactivity
magic, but the hierarchical groups would be used to compute dynamic
loads for each task according to their position in the tree:

- now:
- we schedule each level of the tree starting from the root

- with my proposition:
- we schedule tasks like with !FAIR_GROUP_SCHED, but
calc_delta_fair() would traverse the tree starting from the leaves to
compute the dynamic load.
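One way to read this proposal: tasks are scheduled flat, and each task's effective load is derived by walking from its leaf up to the root, multiplying its share at each level. The sketch below is hypothetical (`Node` and `effective_share()` are illustrative names, not kernel code), counts every child as runnable, and gives every entity 1024, the nice-0 load weight. The tree mirrors the second script: uid 1 holds the sleeper and a hog, uid 2 holds one hog.

```python
# Hypothetical sketch: schedule tasks flat, but compute each task's
# effective share by walking from its leaf up to the root.
from dataclasses import dataclass, field
from typing import List, Optional

NICE_0_WEIGHT = 1024  # load weight of a nice-0 entity


@dataclass
class Node:
    name: str
    weight: int = NICE_0_WEIGHT
    # repr/compare disabled so the parent <-> children cycle never recurses.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list, repr=False, compare=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def effective_share(task: Node) -> float:
    """Fraction of the CPU the leaf would get, assuming all nodes runnable."""
    share = 1.0
    node = task
    while node.parent is not None:
        siblings = sum(c.weight for c in node.parent.children)
        share *= node.weight / siblings
        node = node.parent
    return share


# Layout of the second script: uid 1 = sleeper + hog, uid 2 = one hog.
root = Node("root")
uid1 = root.add(Node("uid1"))
uid2 = root.add(Node("uid2"))
sleeper = uid1.add(Node("sleeper"))
hog1 = uid1.add(Node("hog1"))
hog2 = uid2.add(Node("hog2"))

print(effective_share(sleeper))  # 0.25
print(effective_share(hog2))     # 0.5
```

The leaf-to-root product yields the same long-run shares as the hierarchy, while the scheduler itself would only ever compare tasks.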

Thanks.

--
Guillaume

2008-01-29 16:13:31

by Srivatsa Vaddagiri

Subject: Re: High wake up latencies with FAIR_USER_SCHED

On Tue, Jan 29, 2008 at 04:53:56PM +0100, Guillaume Chazarain wrote:
> I just thought of something that might restore low latencies with
> FAIR_GROUP_SCHED, but it's possibly utter nonsense, so bear with me
> ;-) The idea would be to turn the trees upside down. The scheduler
> would only see tasks (on the leaves) and so could apply its interactivity
> magic, but the hierarchical groups would be used to compute dynamic
> loads for each task according to their position in the tree:

Isn't this equivalent to flattening the hierarchy? We discussed this a bit
some time back [1], but one of its weaknesses is that it does not provide strong
partitioning between groups when it comes to ensuring fairness. For example,
imagine a group that runs a fork bomb. With the flattened tree, it affects other
groups more than it would with a one-level-deep hierarchy.
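The fork-bomb argument is easy to quantify under simple assumptions: equal weights for all tasks and groups, and every task always runnable. The hypothetical helpers below compare the CPU share a one-task group keeps while another group runs 99 busy tasks.

```python
# Compare one group's CPU share under a flattened tree and under a
# one-level group hierarchy. Assumes equal weights and always-runnable tasks.


def flat_share(tasks_per_group, group):
    """Flattened: tasks compete directly, so shares follow task counts."""
    return tasks_per_group[group] / sum(tasks_per_group.values())


def hierarchical_share(tasks_per_group, group):
    """One level deep: every group with runnable tasks gets an equal share."""
    if tasks_per_group[group] == 0:
        return 0.0
    active = sum(1 for n in tasks_per_group.values() if n > 0)
    return 1 / active


groups = {"victim": 1, "forkbomb": 99}
print(flat_share(groups, "victim"))          # 0.01
print(hierarchical_share(groups, "victim"))  # 0.5
```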

Having said that, I would be interested to hear other solutions that maintain
this good partitioning between groups and still provide good interactivity!

1. http://lkml.org/lkml/2007/5/30/300

> - now:
> - we schedule each level of the tree starting from the root
>
> - with my proposition:
> - we schedule tasks like with !FAIR_GROUP_SCHED, but
> calc_delta_fair() would traverse the tree starting from the leaves to
> compute the dynamic load.

--
Regards,
vatsa

2008-01-31 10:48:01

by Peter Zijlstra

Subject: Re: High wake up latencies with FAIR_USER_SCHED


On Mon, 2008-01-28 at 21:13 +0100, Guillaume Chazarain wrote:
> Unfortunately, it does not seem to be completely fixed, as this script shows:
>
> #!/usr/bin/python
>
> import os
> import time
>
> SLEEP_TIME = 0.1
> SAMPLES = 5
> PRINT_DELAY = 0.5
>
> def print_wakeup_latency():
>     times = []
>     last_print = 0
>     while True:
>         start = time.time()
>         time.sleep(SLEEP_TIME)
>         end = time.time()
>         times.insert(0, end - start - SLEEP_TIME)
>         del times[SAMPLES:]
>         if end > last_print + PRINT_DELAY:
>             copy = times[:]
>             copy.sort()
>             print '%f ms' % (copy[len(copy)/2] * 1000)
>             last_print = end
>
> if os.fork() == 0:
>     if os.fork() == 0:
>         os.setuid(1)
>         while True:
>             pass
>     else:
>         os.setuid(2)
>         while True:
>             pass
> else:
>     os.setuid(1)
>     print_wakeup_latency()
>
> I get seemingly unpredictable latencies (with or without the patch applied):
>
> # ./sched.py
> 14.810944 ms
> 19.829893 ms
> 1.968050 ms
> 8.021021 ms
> -0.017977 ms
> 4.926109 ms
> 11.958027 ms
> 5.995893 ms
> 1.992130 ms
> 0.007057 ms
> 0.217819 ms
> -0.004864 ms
> 5.907202 ms
> 6.547832 ms
> -0.012970 ms
> 0.209951 ms
> -0.002003 ms
> 4.989052 ms
>
> Without FAIR_USER_SCHED, latencies are consistently in the noise.
> Also, I forgot to mention that I'm on a single CPU.
>
> Thanks for the help.

Does something like this help?


Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -267,8 +267,12 @@ static u64 sched_slice(struct cfs_rq *cf
 {
 	u64 slice = __sched_period(cfs_rq->nr_running);
 
-	slice *= se->load.weight;
-	do_div(slice, cfs_rq->load.weight);
+	for_each_sched_entity(se) {
+		cfs_rq = cfs_rq_of(se);
+
+		slice *= se->load.weight;
+		do_div(slice, cfs_rq->load.weight);
+	}
 
 	return slice;
 }
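For reference, a Python model of what the patched `sched_slice()` computes: starting from the period, the slice is scaled by the entity's weight over its runqueue's weight at every level, walking upward as `for_each_sched_entity()` does. Floats replace `u64` and `do_div()`, the `Entity`/`RunQueue` classes are hypothetical, and the 20 ms period is an assumed value. The example uses the second script's layout with all weights at 1024.

```python
# Model of the patched sched_slice(): scale the period by the entity's
# share of its runqueue at every level of the hierarchy.
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunQueue:
    load_weight: int  # sum of the weights of the entities queued here


@dataclass
class Entity:
    load_weight: int
    cfs_rq: RunQueue                   # runqueue this entity is queued on
    parent: Optional["Entity"] = None  # group entity owning cfs_rq, if any


def sched_slice(period: float, se: Optional[Entity]) -> float:
    slice_ = period
    while se is not None:  # walks upward like for_each_sched_entity(se)
        slice_ = slice_ * se.load_weight / se.cfs_rq.load_weight
        se = se.parent
    return slice_


# Second script: the root runqueue holds uid 1 and uid 2; uid 1's runqueue
# holds the sleeper and one hog. All weights are 1024, period assumed 20 ms.
root_rq = RunQueue(load_weight=2048)
uid1_rq = RunQueue(load_weight=2048)
uid1 = Entity(load_weight=1024, cfs_rq=root_rq)
sleeper = Entity(load_weight=1024, cfs_rq=uid1_rq, parent=uid1)

print(sched_slice(20.0, sleeper))  # 5.0 with the per-level walk
# Old formula: only the entity's own runqueue is considered.
print(sched_slice(20.0, Entity(load_weight=1024, cfs_rq=uid1_rq)))  # 10.0
```

With the walk, the sleeper's slice halves because uid 1 itself only receives half of the CPU; the single-level formula ignored that.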

2008-01-31 12:49:45

by Guillaume Chazarain

Subject: Re: High wake up latencies with FAIR_USER_SCHED

On 1/31/08, Peter Zijlstra wrote:
> Does something like this help?

I made it compile by open coding undefined macros instead of
refactoring the whole file.
But it didn't affect wake up latencies.

Thanks.

--
Guillaume

2008-01-31 13:00:47

by Peter Zijlstra

Subject: Re: High wake up latencies with FAIR_USER_SCHED


On Thu, 2008-01-31 at 13:49 +0100, Guillaume Chazarain wrote:
> On 1/31/08, Peter Zijlstra wrote:
> > Does something like this help?
>
> I made it compile by open coding undefined macros instead of
> refactoring the whole file.
> But it didn't affect wake up latencies.

Ah, well, what can one expect from mid-night ideas :-)

Thanks for testing!