2009-10-05 13:01:33

by Frans Pop

Subject: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

With .32-rc3 I'm getting occasional skips from amarok while playing music
from an NFS share (3 today so far), something I've not had in a long time.

The reason looks to be that latencytop consistently shows 'iwlagn' as the
top affected process with "Scheduler: waiting for CPU".
Values of 100-140 ms occur frequently, but I've also seen higher values
(180, 207). I don't remember seeing iwlagn high in the list before, but I
have not checked earlier kernels yet.

Added to that 'phy0' also frequently shows high with values of 25-75 ms.

The system is otherwise essentially idle. Other processes typically show a
latency < 5 ms.

System: HP 2510p notebook, x86_64, Core2 Duo, Debian stable

Cheers,
FJP


2009-10-07 17:11:32

by Frans Pop

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Tuesday 06 October 2009, Frans Pop wrote:
> I've checked for 2.6.31.1 now and iwlagn is listed high there too when
> the system is idle, but with normal values of 60-100 ms. And phy0 has
> normal values of below 10 ms.
> I've now rebooted with today's mainline git; phy0 now frequently shows
> with values of around 100 ms too (i.e. higher than last time).
>
> Both still go way down as soon as the system is given work to do.
>
> With a 5 second sleep I was unable to get any significant latencies (I
> started perf on a latencytop refresh and did a manual refresh as it
> finished to see what happened during the perf run). The perf run does
> seem to affect the latencies.
> I've uploaded a chart for a 10s sleep during which I got latencies of
> 101ms for iwlagn and 77ms for phy0:
> http://people.debian.org/~fjp/tmp/kernel/.

Mike privately sent me a script to try to capture the latencies with perf,
but the perf output does not show any high latencies at all. It looks as if
we may have found a bug in latencytop here instead.

Below a relevant quote from our private mails and the data I captured with
Mike's script.

Cheers,
FJP

On Wednesday 07 October 2009, Mike Galbraith wrote:
> On Wed, 2009-10-07 at 17:53 +0200, Frans Pop wrote:
> > I had latencytop running at the same time and that happily showed
> > values of 160 and 102 ms for iwlagn during the same period.
> > The highest values I see from your script for any process are ~12 ms.
> >
> > Does this mean there is a bug in latencytop somewhere? Or is there
> > something else going on?
>
> I suspect a bug in latencytop.. Arjan might suspect perf ;-)
>
> The only way you can get a 100+ ms latency hit on a mostly idle box is
> for there to be a problem with the clock, _and_ have multiple runnable
> heavy cpu using tasks on the runqueue so you can have a wide vruntime
> spread. Peter's been all over the vruntime update logic, and if it or
> the clock were generally screwed, you'd see latency spikes all over
> creation. It _could_ be some rare event though, which is why I hacked a
> script together to chop off bite-sized chunks for analysis. Since nothing
> showed up there, and nothing's showing up here doing the same amarok+nfs
> thing, I'll place my bet on latencytop.. for now.
>
> > At the bottom a sample from your unmodified script. That one is fairly
> > typical, although it does have about the highest value I've seen
> > for iwlagn.
>
> Yeah, looks perfectly normal. Which leaves us at square one :-/



Typical output from perf (with fairly high phy0 and iwlagn):
-----------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms | Maximum delay ms |
-----------------------------------------------------------------------------------------
kondemand/0:282 | 2.085 ms | 123 | avg: 0.152 ms | max: 7.007 ms |
amarokapp:5485 | 123.867 ms | 240 | avg: 0.120 ms | max: 5.260 ms |
kondemand/1:283 | 1.977 ms | 119 | avg: 0.107 ms | max: 4.781 ms |
phy0:2750 | 0.684 ms | 28 | avg: 0.181 ms | max: 3.921 ms |
iwlagn:2749 | 0.668 ms | 28 | avg: 0.196 ms | max: 3.697 ms |
Xorg:4147 | 40.570 ms | 224 | avg: 0.109 ms | max: 3.615 ms |
kblockd/0:78 | 0.013 ms | 1 | avg: 3.121 ms | max: 3.121 ms |
kcryptd:1520 | 64.962 ms | 11 | avg: 0.347 ms | max: 2.812 ms |
perf:9451 | 322.937 ms | 40 | avg: 0.144 ms | max: 2.198 ms |
:5662:5662 | 50.572 ms | 147 | avg: 0.087 ms | max: 2.141 ms |
head:9452 | 2.589 ms | 4 | avg: 0.454 ms | max: 1.648 ms |
:9439:9439 | 6.670 ms | 225 | avg: 0.127 ms | max: 1.574 ms |
kicker:5451 | 24.650 ms | 102 | avg: 0.158 ms | max: 1.465 ms |
ksysguardd:5454 | 6.614 ms | 7 | avg: 0.283 ms | max: 1.218 ms |
:5661:5661 | 0.208 ms | 9 | avg: 0.130 ms | max: 1.024 ms |
latencytop:7776 | 2.010 ms | 4 | avg: 0.256 ms | max: 0.980 ms |

Series of greps from that output for only iwlagn and phy0:
Oct 07 17:23:39 iwlagn:2749 | 0.721 ms | 35 | avg: 0.050 ms | max: 0.083 ms |
Oct 07 17:23:39 phy0:2750 | 0.779 ms | 35 | avg: 0.032 ms | max: 0.036 ms |
Oct 07 17:23:46 iwlagn:2749 | 0.617 ms | 32 | avg: 0.107 ms | max: 1.753 ms |
Oct 07 17:23:46 phy0:2750 | 0.783 ms | 32 | avg: 0.123 ms | max: 1.745 ms |
Oct 07 17:23:53 iwlagn:2749 | 0.629 ms | 31 | avg: 0.055 ms | max: 0.109 ms |
Oct 07 17:23:53 phy0:2750 | 0.774 ms | 31 | avg: 0.029 ms | max: 0.050 ms |
Oct 07 17:24:01 iwlagn:2749 | 0.908 ms | 35 | avg: 0.054 ms | max: 0.097 ms |
Oct 07 17:24:01 phy0:2750 | 0.839 ms | 35 | avg: 0.033 ms | max: 0.056 ms |
Oct 07 17:24:08 phy0:2750 | 0.742 ms | 32 | avg: 0.033 ms | max: 0.124 ms |
Oct 07 17:24:08 iwlagn:2749 | 0.734 ms | 32 | avg: 0.050 ms | max: 0.083 ms |
Oct 07 17:24:15 phy0:2750 | 0.963 ms | 41 | avg: 0.034 ms | max: 0.164 ms |
Oct 07 17:24:15 iwlagn:2749 | 0.721 ms | 41 | avg: 0.049 ms | max: 0.118 ms |
Oct 07 17:24:23 iwlagn:2749 | 6.434 ms | 73 | avg: 0.052 ms | max: 0.153 ms |
Oct 07 17:24:23 phy0:2750 | 0.620 ms | 22 | avg: 0.023 ms | max: 0.094 ms |
Oct 07 17:24:30 iwlagn:2749 | 0.934 ms | 38 | avg: 0.055 ms | max: 0.194 ms |
Oct 07 17:24:30 phy0:2750 | 0.941 ms | 37 | avg: 0.033 ms | max: 0.123 ms |
Oct 07 17:24:37 iwlagn:2749 | 5.324 ms | 74 | avg: 0.054 ms | max: 0.106 ms |
Oct 07 17:24:37 phy0:2750 | 0.993 ms | 36 | avg: 0.030 ms | max: 0.050 ms |
Oct 07 17:24:45 iwlagn:2749 | 0.844 ms | 36 | avg: 0.049 ms | max: 0.080 ms |
Oct 07 17:24:45 phy0:2750 | 0.770 ms | 34 | avg: 0.032 ms | max: 0.043 ms |
Oct 07 17:24:52 phy0:2750 | 0.832 ms | 36 | avg: 0.044 ms | max: 0.185 ms |
Oct 07 17:24:52 iwlagn:2749 | 0.923 ms | 36 | avg: 0.050 ms | max: 0.081 ms |
Oct 07 17:24:59 iwlagn:2749 | 0.778 ms | 35 | avg: 0.054 ms | max: 0.104 ms |
Oct 07 17:24:59 phy0:2750 | 0.778 ms | 34 | avg: 0.025 ms | max: 0.041 ms |
Oct 07 17:25:07 iwlagn:2749 | 0.854 ms | 34 | avg: 0.056 ms | max: 0.103 ms |
Oct 07 17:25:07 phy0:2750 | 0.780 ms | 34 | avg: 0.027 ms | max: 0.040 ms |
Oct 07 17:25:14 phy0:2750 | 0.762 ms | 33 | avg: 0.035 ms | max: 0.168 ms |
Oct 07 17:25:14 iwlagn:2749 | 0.853 ms | 33 | avg: 0.051 ms | max: 0.080 ms |
Oct 07 17:25:21 iwlagn:2749 | 0.892 ms | 34 | avg: 0.053 ms | max: 0.144 ms |
Oct 07 17:25:21 phy0:2750 | 0.780 ms | 34 | avg: 0.029 ms | max: 0.040 ms |
Oct 07 17:25:29 iwlagn:2749 | 0.972 ms | 39 | avg: 0.046 ms | max: 0.079 ms |
Oct 07 17:25:29 phy0:2750 | 0.843 ms | 38 | avg: 0.031 ms | max: 0.040 ms |
Oct 07 17:25:36 iwlagn:2749 | 0.896 ms | 39 | avg: 0.054 ms | max: 0.106 ms |
Oct 07 17:25:36 phy0:2750 | 0.873 ms | 37 | avg: 0.032 ms | max: 0.070 ms |
Oct 07 17:25:43 phy0:2750 | 0.919 ms | 39 | avg: 0.083 ms | max: 2.161 ms |
Oct 07 17:25:43 iwlagn:2749 | 0.950 ms | 40 | avg: 0.050 ms | max: 0.102 ms |

2009-10-08 15:09:38

by Arjan van de Ven

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Thu, 8 Oct 2009 16:55:36 +0200
Frans Pop <[email protected]> wrote:
> > It turns out that on x86, these two 'opportunistic' timers only
> > get checked when another "real" timer happens.
> > These opportunistic timers have the objective to save power by
> > hitchhiking on other wakeups, as to avoid CPU wakeups by themselves
> > as much as possible.
>
> This patch makes quite a difference for me. iwlagn and phy0 now
> consistently show at ~10 ms or lower.

most excellent


> I do still get occasional high latencies, but those are for things
> like "[rpc_wait_bit_killable]" or "Writing a page to disk", where I
> guess you'd expect them. Those high latencies are mostly only listed
> for "Global" and don't translate to individual processes.

and they're very different types of latencies, caused by disk and such.


> The ~10 ms I still get for iwlagn and phy0 (and sometimes higher (~30
> ms) for others like Xorg and artsd) is still "Scheduler: waiting for
> cpu". If it is actually due to (un)interruptible sleep, isn't that a
> misleading label? I directly associated that with scheduler latency.

it's actually the time between wakeup and running, as measured by
scheduler statistics
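
For reference, roughly where that measurement lives in 2.6.31-era
kernel/sched_fair.c -- paraphrased from memory, not the literal source,
and the two-argument account_scheduler_latency() assumes Mike's
latencytop patch from this thread: se->wait_start is stamped when a task
is put on the runqueue, and the delta is charged when it finally gets
the CPU.

/*
 * Paraphrased sketch, 2.6.31-era kernel/sched_fair.c -- not the
 * literal source.  The on-runqueue wait that latencytop labels
 * "Scheduler: waiting for cpu" is rq->clock minus the time the
 * entity was enqueued.
 */
static void
update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	u64 delta = rq_of(cfs_rq)->clock - se->wait_start;

	schedstat_set(se->wait_max, max(se->wait_max, delta));
	schedstat_set(se->wait_count, se->wait_count + 1);
	schedstat_set(se->wait_sum, se->wait_sum + delta);
	schedstat_set(se->wait_start, 0);

	if (entity_is_task(se))
		account_scheduler_latency(task_of(se), delta >> 10);
}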

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-10-08 13:19:23

by Mike Galbraith

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Thu, 2009-10-08 at 15:09 +0200, Frans Pop wrote:
> On Thursday 08 October 2009, Mike Galbraith wrote:
> > Care to try a patch?
> >
> > latencytop: only account on-runqueue wait time as being scheduler
> > latency.
> >
> > Latencytop was accounting uninterruptible and interruptible sleep time
> > up to 5ms, which is not the latency the user is looking for. Account
> > time waiting on-runqueue instead. Also add a clock update to
> > update_curr() for the case where there's nobody home. When coming out
> > of idle with NO_HZ, not updating the clock leads to bogus max latency
> > report.
>
> With this patch the high values for iwlagn and phy0 are gone. The values
> now match roughly with the output of your script.
>
> If (un)interruptible sleep is worth displaying in latencytop then maybe it
> should be identified as a separate cause instead of lumped under "waiting
> for CPU"?

That's what I was just thinking about trying: adding a separate sleep cause.

> > TODO: Figure out where the heck empty braces come from.
>
> I also see these lines with "[]" sometimes; looks to come from 'sleep'.
>
> And I still see lines with only a "." as the cause; this seems specific to
> targets marked with the penguin (work queues?).
>
> > Signed-off-by: Mike Galbraith <[email protected]>
> > Arjan van de Ven <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > LKML-Reference: <new-submission>
>
> If Arjan is happy with the patch, please add my:

Probably just a diagnostic, but...

> Reported-and-tested-by: Frans Pop <[email protected]>

Saved.

-Mike


2009-10-09 03:36:12

by Mike Galbraith

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Thu, 2009-10-08 at 22:34 +0200, Markus Trippelsdorf wrote:
> On Thu, Oct 08, 2009 at 08:23:37PM +0200, Mike Galbraith wrote:
> > On Thu, 2009-10-08 at 16:55 +0200, Frans Pop wrote:
> > > On Thursday 08 October 2009, Arjan van de Ven wrote:
> > > > From: Arjan van de Ven <[email protected]>
> > > > Date: Thu, 24 Sep 2009 13:24:16 +0200
> > > > Subject: [PATCH] x86, timers: check for pending timers after (device)
> > > > interrupts
> > > >
> > > > Now that range timers and deferred timers are common, I found a
> > > > problem with these using the "perf timechart" tool.
> > > >
> > > > It turns out that on x86, these two 'opportunistic' timers only
> > > > get checked when another "real" timer happens.
> > > > These opportunistic timers have the objective to save power by
> > > > hitchhiking on other wakeups, as to avoid CPU wakeups by themselves
> > > > as much as possible.
> > >
> > > This patch makes quite a difference for me. iwlagn and phy0 now
> > > consistently show at ~10 ms or lower.
> > >
> > > I do still get occasional high latencies, but those are for things like
> > > "[rpc_wait_bit_killable]" or "Writing a page to disk", where I guess you'd
> > > expect them. Those high latencies are mostly only listed for "Global" and
> > > don't translate to individual processes.
> >
> > I still see very high latencies coming out of idle (last noted was >
> > 300ms, NO_HZ) with this patch, and wonder if the hunk below makes any
> > difference whatsoever for you. Here, it definitely does. (shouldn't)
>
> I'm also seeing these strange, very high latencies here. Your patch
> didn't help unfortunately.
>
> This is from an otherwise idle NO_HZ system:
>
> # ./perf sched latency
>
> -----------------------------------------------------------------------------------------
> Task | Runtime ms | Switches | Average delay ms | Maximum delay ms |
> -----------------------------------------------------------------------------------------
> ksoftirqd/0:4 | 2.216 ms | 170 | avg: 24.235 ms | max: 808.356 ms |
> ksoftirqd/1:6 | 2.611 ms | 205 | avg: 4.334 ms | max: 165.553 ms |
> migration/2:7 | 0.000 ms | 1 | avg: 3.060 ms | max: 3.060 ms |
>
> With latencytop the ksoftirqd latency is over 1 sec frequently. (Could be
> ondemand CPUfreq governor related?)

That's a separate issue, which Arjan was nice enough to fix for me. He
even wrote the changelog, and used my mouse to do so ;-)

Watch.

Virgin tip pulled this morning, perf sched record sleep 1 on idle box.

-----------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms | Maximum delay ms |
-----------------------------------------------------------------------------------------
kicker:5845 | 0.000 ms | 1 | avg: 999.406 ms | max: 999.406 ms |
rt2870TimerQHan:4952 | 0.008 ms | 3 | avg: 0.087 ms | max: 0.131 ms |
Xorg:4913 | 0.099 ms | 2 | avg: 0.027 ms | max: 0.044 ms |
kondemand/3:3106 | 0.014 ms | 2 | avg: 0.013 ms | max: 0.018 ms |
perf:7229 | 3.986 ms | 1 | avg: 0.017 ms | max: 0.017 ms |
events/3:14 | 0.038 ms | 1 | avg: 0.017 ms | max: 0.017 ms |
kondemand/2:3105 | 0.017 ms | 1 | avg: 0.008 ms | max: 0.008 ms |
kondemand/1:3104 | 0.041 ms | 1 | avg: 0.007 ms | max: 0.007 ms |
konsole:5855 | 0.362 ms | 3 | avg: 0.004 ms | max: 0.006 ms |
kondemand/0:3103 | 0.000 ms | 1 | avg: 0.000 ms | max: 0.000 ms |
-----------------------------------------------------------------------------------------
TOTAL: | 4.565 ms | 16 |
---------------------------------------------------
INFO: 40.741% state machine bugs (11 out of 27)
INFO: 51.852% context switch bugs (14 out of 27)

Repeatable. Apply patchlet, and the numbers below become repeatable.

perf_counter tools: make perf sched pass -F 0 to record

Commit 42e59d7d19dc4b4 introduced a sample frequency framework..
.. however it unfortunately changed how perf events get recorded,
and caused sched to miss events.

This patch causes the sched code to use -F 0 to not use the
new framework when recording sched data.

Signed-off-by: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arjan The Wise <[email protected]>
LKML-Reference: <new-submission>

---
tools/perf/builtin-sched.c | 1 +
1 file changed, 1 insertion(+)

Index: linux-2.6/tools/perf/builtin-sched.c
===================================================================
--- linux-2.6.orig/tools/perf/builtin-sched.c
+++ linux-2.6/tools/perf/builtin-sched.c
@@ -1902,6 +1902,7 @@ static const char *record_args[] = {
 	"-f",
 	"-m", "1024",
 	"-c", "1",
+	"-F", "0",
 	"-e", "sched:sched_switch:r",
 	"-e", "sched:sched_stat_wait:r",
 	"-e", "sched:sched_stat_sleep:r",



-----------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms | Maximum delay ms |
-----------------------------------------------------------------------------------------
Xorg:4913 | 2.296 ms | 43 | avg: 0.075 ms | max: 0.686 ms |
kicker:5845 | 3.106 ms | 28 | avg: 0.017 ms | max: 0.307 ms |
perf:7419 | 147.147 ms | 16 | avg: 0.028 ms | max: 0.261 ms |
klipper:6404 | 0.035 ms | 1 | avg: 0.256 ms | max: 0.256 ms |
kondemand/3:3106 | 0.139 ms | 18 | avg: 0.039 ms | max: 0.192 ms |
rt2870TimerQHan:4952 | 0.726 ms | 107 | avg: 0.010 ms | max: 0.165 ms |
konsole:5856 | 0.180 ms | 2 | avg: 0.050 ms | max: 0.094 ms |
kded:5831 | 0.044 ms | 1 | avg: 0.055 ms | max: 0.055 ms |
events/3:14 | 0.066 ms | 3 | avg: 0.018 ms | max: 0.031 ms |
kondemand/2:3105 | 0.015 ms | 2 | avg: 0.020 ms | max: 0.029 ms |
ksoftirqd/3:10 | 0.326 ms | 3 | avg: 0.018 ms | max: 0.027 ms |
kondemand/1:3104 | 0.142 ms | 24 | avg: 0.008 ms | max: 0.023 ms |
rt2870MlmeThrea:4950 | 0.095 ms | 13 | avg: 0.012 ms | max: 0.022 ms |
kondemand/0:3103 | 0.259 ms | 36 | avg: 0.013 ms | max: 0.020 ms |
kwin:5841 | 0.038 ms | 1 | avg: 0.018 ms | max: 0.018 ms |
sync_supers:92 | 0.008 ms | 1 | avg: 0.018 ms | max: 0.018 ms |
events/0:11 | 0.005 ms | 1 | avg: 0.015 ms | max: 0.015 ms |
konsole:5855 | 0.709 ms | 10 | avg: 0.007 ms | max: 0.013 ms |
events/2:13 | 0.009 ms | 1 | avg: 0.011 ms | max: 0.011 ms |
events/1:12 | 0.006 ms | 1 | avg: 0.010 ms | max: 0.010 ms |
mysqld:5030 | 0.185 ms | 1 | avg: 0.009 ms | max: 0.009 ms |
hald-addon-inpu:3191 | 0.006 ms | 1 | avg: 0.006 ms | max: 0.006 ms |
mono:6472 | 0.144 ms | 11 | avg: 0.005 ms | max: 0.006 ms |
ntpd:6020 | 0.030 ms | 1 | avg: 0.005 ms | max: 0.005 ms |
gpg-agent:5783 | 0.047 ms | 1 | avg: 0.005 ms | max: 0.005 ms |
sleep:7420 | 0.973 ms | 2 | avg: 0.003 ms | max: 0.005 ms |
httpd2-prefork:6416 | 0.013 ms | 1 | avg: 0.005 ms | max: 0.005 ms |
mysqld:5031 | 0.172 ms | 1 | avg: 0.005 ms | max: 0.005 ms |
kwrapper:5838 | 0.010 ms | 1 | avg: 0.005 ms | max: 0.005 ms |
-----------------------------------------------------------------------------------------
TOTAL: | 156.930 ms | 332 |
---------------------------------------------------





2009-10-08 13:40:50

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Thu, 08 Oct 2009 06:05:43 +0200
Mike Galbraith <[email protected]> wrote:

> On Wed, 2009-10-07 at 20:34 +0200, Frans Pop wrote:
> > On Wednesday 07 October 2009, Frans Pop wrote:
> > > On Tuesday 06 October 2009, Frans Pop wrote:
> > > > I've checked for 2.6.31.1 now and iwlagn is listed high there
> > > > too when the system is idle, but with normal values of 60-100
> > > > ms. And phy0 has normal values of below 10 ms.
> > > > I've now rebooted with today's mainline git; phy0 now
> > > > frequently shows with values of around 100 ms too (i.e. higher
> > > > than last time).
> > >
> > > Mike privately sent me a script to try to capture the latencies
> > > with perf, but the perf output does not show any high latencies
> > > at all. It looks as if we may have found a bug in latencytop here
> > > instead.
> >
> > Not sure if it's relevant nor what it means, but I frequently see
> > two lines for iwlagn, e.g:
> >
> > Scheduler: waiting for cpu                   102.4 msec        99.7 %
> > .                                              3.3 msec         0.3 %
> >
> > I get the same results with both latencytop 0.4 and 0.5.
>
> OK, I see latencytop spikes here on an idle box too, to the tune of up
> to a _second_. Booting with nohz=off seems to have cured it.

interesting..


by chance, does this patch fix it ?

From 33725d4939f457b12d7bc1bcbcc0dfb8b2f5bd48 Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <[email protected]>
Date: Thu, 24 Sep 2009 13:24:16 +0200
Subject: [PATCH] x86, timers: check for pending timers after (device) interrupts

Now that range timers and deferred timers are common, I found a
problem with these using the "perf timechart" tool.

It turns out that on x86, these two 'opportunistic' timers only
get checked when another "real" timer happens.
These opportunistic timers have the objective to save power by
hitchhiking on other wakeups, as to avoid CPU wakeups by themselves
as much as possible.

The change in this patch runs this check not only at timer interrupts,
but at all (device) interrupts. The effect is that
1) the deferred timers/range timers get delayed less
2) the range timers cause less wakeups by themselves because
the percentage of hitchhiking on existing wakeup events goes up.

I've verified the working of the patch using "perf timechart",
the original exposed bug is gone with this patch.

Signed-off-by: Arjan van de Ven <[email protected]>
---
arch/x86/kernel/irq.c | 2 ++
arch/x86/kernel/smp.c | 1 +
2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 74656d1..3912061 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -244,6 +244,7 @@ unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
 			__func__, smp_processor_id(), vector, irq);
 	}
 
+	run_local_timers();
 	irq_exit();
 
 	set_irq_regs(old_regs);
@@ -268,6 +269,7 @@ void smp_generic_interrupt(struct pt_regs *regs)
 	if (generic_interrupt_extension)
 		generic_interrupt_extension();
 
+	run_local_timers();
 	irq_exit();
 
 	set_irq_regs(old_regs);
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index ec1de97..d915d95 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -198,6 +198,7 @@ void smp_reschedule_interrupt(struct pt_regs *regs)
 {
 	ack_APIC_irq();
 	inc_irq_stat(irq_resched_count);
+	run_local_timers();
 	/*
 	 * KVM uses this interrupt to force a cpu out of guest mode
 	 */
--
1.6.0.6
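
For context (not part of the patch): run_local_timers() in that era's
kernel/timer.c is essentially the below -- quoted from memory, so check
your tree -- which is why calling it on device interrupt exit lets the
timer softirq notice pending deferrable/range timers without waiting
for the next tick.

/*
 * Roughly the 2.6.31-era kernel/timer.c implementation: run the
 * hrtimer queues and raise the softirq that expires normal timers,
 * including the deferrable/range ones the changelog talks about.
 */
void run_local_timers(void)
{
	hrtimer_run_queues();
	raise_softirq(TIMER_SOFTIRQ);
	softlockup_tick();
}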


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-10-09 03:05:23

by Mike Galbraith

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Thu, 2009-10-08 at 22:59 +0200, Frans Pop wrote:
> On Thursday 08 October 2009, Mike Galbraith wrote:
> > I still see very high latencies coming out of idle (last noted was >
> > 300ms, NO_HZ) with this patch,
> >
> > Like this:
> >
> > Cause                                           Maximum    Percentage
> > Scheduler: waiting for cpu                   604.2 msec        49.0 %
>
> I'm not seeing anything even remotely like that.
>
> I've tried quieting the system down as much as possible, but that made
> no difference. I do have NO_HZ set; will send my full config privately.
>
> If you'd still like me to try your patch I will, but I'm not sure if
> that has much value.

No, if you watch for a bit with box mostly idle, and don't see > 100ms
hits occasionally, testing patchlet is likely a waste of time. To get
one like the above, I have to hold my tongue just right ;-) they're not
common.

-Mike


2009-10-09 08:03:11

by Peter Zijlstra

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Fri, 2009-10-09 at 09:55 +0200, Sedat Dilek wrote:
>
> will you (re)send this patch as a new one with a correct subject line to LKML?
> Otherwise, it might disappear in this long thread.

No need, I've got it.

2009-10-09 03:51:55

by Markus Trippelsdorf

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Fri, Oct 09, 2009 at 05:35:18AM +0200, Mike Galbraith wrote:
> On Thu, 2009-10-08 at 22:34 +0200, Markus Trippelsdorf wrote:
> > On Thu, Oct 08, 2009 at 08:23:37PM +0200, Mike Galbraith wrote:
> > > On Thu, 2009-10-08 at 16:55 +0200, Frans Pop wrote:
> > > > On Thursday 08 October 2009, Arjan van de Ven wrote:
> > > > > From: Arjan van de Ven <[email protected]>
> > > > > Date: Thu, 24 Sep 2009 13:24:16 +0200
> > > > > Subject: [PATCH] x86, timers: check for pending timers after (device)
> > > > > interrupts
> > > > >
> > > > > Now that range timers and deferred timers are common, I found a
> > > > > problem with these using the "perf timechart" tool.
> > > > >
> > > > > It turns out that on x86, these two 'opportunistic' timers only
> > > > > get checked when another "real" timer happens.
> > > > > These opportunistic timers have the objective to save power by
> > > > > hitchhiking on other wakeups, as to avoid CPU wakeups by themselves
> > > > > as much as possible.
> > > >
> > > > This patch makes quite a difference for me. iwlagn and phy0 now
> > > > consistently show at ~10 ms or lower.
> > > >
> > > > I do still get occasional high latencies, but those are for things like
> > > > "[rpc_wait_bit_killable]" or "Writing a page to disk", where I guess you'd
> > > > expect them. Those high latencies are mostly only listed for "Global" and
> > > > don't translate to individual processes.
> > >
> > > I still see very high latencies coming out of idle (last noted was >
> > > 300ms, NO_HZ) with this patch, and wonder if the hunk below makes any
> > > difference whatsoever for you. Here, it definitely does. (shouldn't)
> >
> > I'm also seeing these strange, very high latencies here. Your patch
> > didn't help unfortunately.
> >
> > This is from an otherwise idle NO_HZ system:
> >
> > # ./perf sched latency
> >
> > -----------------------------------------------------------------------------------------
> > Task | Runtime ms | Switches | Average delay ms | Maximum delay ms |
> > -----------------------------------------------------------------------------------------
> > ksoftirqd/0:4 | 2.216 ms | 170 | avg: 24.235 ms | max: 808.356 ms |
> > ksoftirqd/1:6 | 2.611 ms | 205 | avg: 4.334 ms | max: 165.553 ms |
> > migration/2:7 | 0.000 ms | 1 | avg: 3.060 ms | max: 3.060 ms |
> >
> > With latencytop the ksoftirqd latency is over 1 sec frequently. (Could be
> > ondemand CPUfreq governor related?)
>
> That's a separate issue, which Arjan was nice enough to fix for me. He
> even wrote the changelog, and used my mouse to do so ;-)
>
>
> Repeatable. Apply patchlet, and the numbers below become repeatable.
>
> perf_counter tools: make perf sched pass -F 0 to record
>
> Commit 42e59d7d19dc4b4 introduced a sample frequency framework..
> .. however it unfortunately changed how perf events get recorded,
> and caused sched to miss events.
>
> This patch causes the sched code to use -F 0 to not use the
> new framework when recording sched data.

Yes, you're right. With this patch applied, perf sched latency is back
to normal.

--
Markus

2009-10-09 07:10:33

by Peter Zijlstra

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Fri, 2009-10-09 at 08:35 +0200, Mike Galbraith wrote:
>
> sched: update the clock of runqueue select_task_rq() selected.
>
> In try_to_wake_up(), we update the runqueue clock, but select_task_rq()
> may select a different runqueue than the one we updated, leaving the new
> runqueue's clock stale for a bit.
>
> This patch cures occasional huge latencies reported by latencytop when
> coming out of idle on a mostly idle NO_HZ box.
>
> Signed-off-by: Mike Galbraith <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>


Awesome catch, Mike. Seems I goofed up in my recent ttwu changes.

Acked-by: Peter Zijlstra <[email protected]>

>
> ---
> kernel/sched.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/kernel/sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c
> @@ -2311,7 +2311,7 @@ static int try_to_wake_up(struct task_st
>  {
>  	int cpu, orig_cpu, this_cpu, success = 0;
>  	unsigned long flags;
> -	struct rq *rq;
> +	struct rq *rq, *orig_rq;
>  
>  	if (!sched_feat(SYNC_WAKEUPS))
>  		wake_flags &= ~WF_SYNC;
> @@ -2319,7 +2319,7 @@ static int try_to_wake_up(struct task_st
>  	this_cpu = get_cpu();
>  
>  	smp_wmb();
> -	rq = task_rq_lock(p, &flags);
> +	rq = orig_rq = task_rq_lock(p, &flags);
>  	update_rq_clock(rq);
>  	if (!(p->state & state))
>  		goto out;
> @@ -2350,6 +2350,10 @@ static int try_to_wake_up(struct task_st
>  		set_task_cpu(p, cpu);
>  
>  	rq = task_rq_lock(p, &flags);
> +
> +	if (rq != orig_rq)
> +		update_rq_clock(rq);
> +
>  	WARN_ON(p->state != TASK_WAKING);
>  	cpu = task_cpu(p);
>
>
>
>

2009-10-08 13:45:28

by Arjan van de Ven

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Thu, 08 Oct 2009 13:24:22 +0200
Mike Galbraith <[email protected]> wrote:
> Latencytop was accounting uninterruptible and interruptible sleep
> time up to 5ms, which is not the latency the user is looking for.

it is for everything but the scheduler latency!

latencytop wants to show where you're waiting for disk, etc etc.
that's not "time on runqueue".

for the scheduler delays a whole different codepath is used!




--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-10-09 16:27:39

by Frans Pop

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Friday 09 October 2009, Mike Galbraith wrote:
> sched: update the clock of runqueue select_task_rq() selected.
>
> In try_to_wake_up(), we update the runqueue clock, but select_task_rq()
> may select a different runqueue than the one we updated, leaving the new
> runqueue's clock stale for a bit.
>
> This patch cures occasional huge latencies reported by latencytop when
> coming out of idle on a mostly idle NO_HZ box.

This patch (without Arjan's previous one applied) also fixes my latency
issues. Nice work!

Cheers,
FJP

2009-10-06 15:49:40

by Frans Pop

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Monday 05 October 2009, Arjan van de Ven wrote:
> On Mon, 5 Oct 2009 16:13:31 +0200 Frans Pop <[email protected]> wrote:
> > On Monday 05 October 2009, Frans Pop wrote:
> > > With .32-rc3 I'm getting occasional skips from amarok while playing
> > > music from an NFS share (3 today so far), something I've not had in
> > > a long time.
> > >
> > > The reason looks to be that latencytop consistently shows 'iwlagn'
> > > as the top affected process with "Scheduler: waiting for CPU".
> > > Values of 100-140 ms occur frequently, but I've also seen higher
> > > values (180, 207). I don't remember seeing iwlagn high in the list
> > > before, but I have not checked earlier kernels yet.
> > >
> > > Added to that 'phy0' also frequently shows high with values of
> > > 25-75 ms.

I've checked for 2.6.31.1 now and iwlagn is listed high there too when the
system is idle, but with normal values of 60-100 ms. And phy0 has normal
values of below 10 ms.
I've now rebooted with today's mainline git; phy0 now frequently shows with
values of around 100 ms too (i.e. higher than last time).

Both still go way down as soon as the system is given work to do.

> > If I give the system some work to do (like compiling a kernel), the
> > latencies on iwlagn and phy0 disappear (values < 5 ms).
> > Is this related to ondemand frequency scaling?
>
> shouldn't be.... but
>
> can you do a 5 second or so timechart recording of this?
> That will capture all scheduler delays as well as the P states etc....

With a 5 second sleep I was unable to get any significant latencies (I
started perf on a latencytop refresh and did a manual refresh as it
finished to see what happened during the perf run). The perf run does seem
to affect the latencies.
I've uploaded a chart for a 10s sleep during which I got latencies of 101ms
for iwlagn and 77ms for phy0: http://people.debian.org/~fjp/tmp/kernel/.

I've also uploaded the raw data, as the colors in the chart looked off to me
(CPU bars were mostly black in Debian unstable's Inkscape). I can't make
much of the chart TBH, but maybe you can.

Not sure what to do with this. For now I'd suggest ignoring it as it's all
a bit vague. Unless of course you feel the latencies are higher than they
should be.
I'll see if I get more skips and what latencies I get during them.

Cheers,
FJP

2009-10-09 06:35:53

by Mike Galbraith

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Thu, 2009-10-08 at 22:59 +0200, Frans Pop wrote:
> On Thursday 08 October 2009, Mike Galbraith wrote:
> > I still see very high latencies coming out of idle (last noted was >
> > 300ms, NO_HZ) with this patch,
> >
> > Like this:
> >
> > Cause                                           Maximum    Percentage
> > Scheduler: waiting for cpu                   604.2 msec        49.0 %
>
> I'm not seeing anything even remotely like that.

Instrumenting, I saw (stale) clock deltas of >900ms coming out of idle,
delta being the difference between rq->clock when we hit update_curr()
and discover that this queue was idle, and what the clock will be an
instant or two later when somebody winds the clock.
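
The instrumentation itself isn't shown here; a hypothetical hack along
these lines in update_curr() would surface such deltas (illustration
only -- the threshold and message are made up):

	if (unlikely(!curr)) {
		u64 stale = rq_of(cfs_rq)->clock;

		/* wind the clock, then see how stale it was */
		update_rq_clock(rq_of(cfs_rq));
		if (rq_of(cfs_rq)->clock - stale > 100 * NSEC_PER_MSEC)
			printk(KERN_DEBUG "stale rq->clock: %llu ns behind\n",
			       (unsigned long long)
			       (rq_of(cfs_rq)->clock - stale));
		return;
	}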

I've been watching latencytop for a while now to make sure latency is
really dead. I see no twitching, so...

sched: update the clock of runqueue select_task_rq() selected.

In try_to_wake_up(), we update the runqueue clock, but select_task_rq()
may select a different runqueue than the one we updated, leaving the new
runqueue's clock stale for a bit.

This patch cures occasional huge latencies reported by latencytop when
coming out of idle on a mostly idle NO_HZ box.

Signed-off-by: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>

---
kernel/sched.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2311,7 +2311,7 @@ static int try_to_wake_up(struct task_st
 {
 	int cpu, orig_cpu, this_cpu, success = 0;
 	unsigned long flags;
-	struct rq *rq;
+	struct rq *rq, *orig_rq;
 
 	if (!sched_feat(SYNC_WAKEUPS))
 		wake_flags &= ~WF_SYNC;
@@ -2319,7 +2319,7 @@ static int try_to_wake_up(struct task_st
 	this_cpu = get_cpu();
 
 	smp_wmb();
-	rq = task_rq_lock(p, &flags);
+	rq = orig_rq = task_rq_lock(p, &flags);
 	update_rq_clock(rq);
 	if (!(p->state & state))
 		goto out;
@@ -2350,6 +2350,10 @@ static int try_to_wake_up(struct task_st
 		set_task_cpu(p, cpu);
 
 	rq = task_rq_lock(p, &flags);
+
+	if (rq != orig_rq)
+		update_rq_clock(rq);
+
 	WARN_ON(p->state != TASK_WAKING);
 	cpu = task_cpu(p);




2009-10-09 07:56:06

by Sedat Dilek

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

Hi Mike,

will you (re)send this patch as a new one with a correct subject line to LKML?
Otherwise, it might disappear in this long thread.

Kind Regards,
- Sedat -

On Fri, Oct 9, 2009 at 8:35 AM, Mike Galbraith <[email protected]> wrote:
> On Thu, 2009-10-08 at 22:59 +0200, Frans Pop wrote:
>> On Thursday 08 October 2009, Mike Galbraith wrote:
>> > I still see very high latencies coming out of idle (last noted was >
>> > 300ms, NO_HZ) with this patch,
>> >
>> > Like this:
>> >
>> > Cause                                           Maximum    Percentage
>> > Scheduler: waiting for cpu                   604.2 msec        49.0 %
>>
>> I'm not seeing anything even remotely like that.
>
> Instrumenting, I saw (stale) clock deltas of >900ms coming out of idle,
> delta being the difference between rq->clock when we hit update_curr()
> and discover that this queue was idle, and what the clock will be an
> instant or two later when somebody winds the clock.
>
> I've been watching latencytop for a while now to make sure latency is
> really dead.  I see no twitching, so...
>
> sched: update the clock of runqueue select_task_rq() selected.
>
> In try_to_wake_up(), we update the runqueue clock, but select_task_rq()
> may select a different runqueue than the one we updated, leaving the new
> runqueue's clock stale for a bit.
>
> This patch cures occasional huge latencies reported by latencytop when
> coming out of idle on a mostly idle NO_HZ box.
>
> Signed-off-by: Mike Galbraith <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> LKML-Reference: <new-submission>
>
> ---
>  kernel/sched.c |    8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/kernel/sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c
> @@ -2311,7 +2311,7 @@ static int try_to_wake_up(struct task_st
>  {
>        int cpu, orig_cpu, this_cpu, success = 0;
>        unsigned long flags;
> -       struct rq *rq;
> +       struct rq *rq, *orig_rq;
>
>        if (!sched_feat(SYNC_WAKEUPS))
>                wake_flags &= ~WF_SYNC;
> @@ -2319,7 +2319,7 @@ static int try_to_wake_up(struct task_st
>        this_cpu = get_cpu();
>
>        smp_wmb();
> -       rq = task_rq_lock(p, &flags);
> +       rq = orig_rq = task_rq_lock(p, &flags);
>        update_rq_clock(rq);
>        if (!(p->state & state))
>                goto out;
> @@ -2350,6 +2350,10 @@ static int try_to_wake_up(struct task_st
>                set_task_cpu(p, cpu);
>
>        rq = task_rq_lock(p, &flags);
> +
> +       if (rq != orig_rq)
> +               update_rq_clock(rq);
> +
>        WARN_ON(p->state != TASK_WAKING);
>        cpu = task_cpu(p);
>
>
>

2009-10-08 14:16:02

by Mike Galbraith

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Thu, 2009-10-08 at 06:45 -0700, Arjan van de Ven wrote:
> On Thu, 08 Oct 2009 13:24:22 +0200
> Mike Galbraith <[email protected]> wrote:
> > Latencytop was accounting uninterruptible and interruptible sleep
> > time up to 5ms, which is not the latency the user is looking for.
>
> it is for everything but the scheduler latency!
>
> latencytop wants to show where you're waiting for disk, etc etc.
> that's not "time on runqueue".
>
> for the scheduler delays a whole different codepath is used!

That explains.. a few things.

-Mike


2009-10-05 14:24:40

by Arjan van de Ven

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Mon, 5 Oct 2009 16:13:31 +0200
Frans Pop <[email protected]> wrote:

> On Monday 05 October 2009, Frans Pop wrote:
> > With .32-rc3 I'm getting occasional skips from amarok while playing
> > music from an NFS share (3 today so far), something I've not had in
> > a long time.
> >
> > The reason looks to be that latencytop consistently shows 'iwlagn'
> > as the top affected process with "Scheduler: waiting for CPU".
> > Values of 100-140 ms occur frequently, but I've also seen higher
> > values (180, 207). I don't remember seeing iwlagn high in the list
> > before, but I have not checked earlier kernels yet.
> >
> > Added to that 'phy0' also frequently shows high with values of
> > 25-75 ms.
> >
> > The system is otherwise essentially idle. Other processes typically
> > show a latency < 5 ms.
>
> If I give the system some work to do (like compiling a kernel), the
> latencies on iwlagn and phy0 disappear (values < 5 ms).
> Is this related to ondemand frequency scaling?

shouldn't be.... but

can you do a 5 second or so timechart recording of this?
That will capture all scheduler delays as well as the P states etc....

(to do a timechart recording, first make sure debugfs is mounted, and
then do in tools/perf

./perf timechart record sleep 5

then you can turn it into an SVG with

./perf timechart

)
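
(If debugfs isn't mounted yet, something like "mount -t debugfs none
/sys/kernel/debug" should cover that first step.)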


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-10-08 04:06:37

by Mike Galbraith

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Wed, 2009-10-07 at 20:34 +0200, Frans Pop wrote:
> On Wednesday 07 October 2009, Frans Pop wrote:
> > On Tuesday 06 October 2009, Frans Pop wrote:
> > > I've checked for 2.6.31.1 now and iwlagn is listed high there too when
> > > the system is idle, but with normal values of 60-100 ms. And phy0 has
> > > normal values of below 10 ms.
> > > I've now rebooted with today's mainline git; phy0 now frequently shows
> > > with values of around 100 ms too (i.e. higher than last time).
> >
> > Mike privately sent me a script to try to capture the latencies with
> > perf, but the perf output does not show any high latencies at all. It
> > looks as if we may have found a bug in latencytop here instead.
>
> Not sure if it's relevant nor what it means, but I frequently see two lines
> for iwlagn, e.g:
>
> Scheduler: waiting for cpu                   102.4 msec        99.7 %
> .                                              3.3 msec         0.3 %
>
> I get the same results with both latencytop 0.4 and 0.5.

OK, I see latencytop spikes here on an idle box too, to the tune of up
to a _second_. Booting with nohz=off seems to have cured it.

I wanted to see if that's also the perf sched record -C N trouble I
warned you not to try when recording with the script, but unfortunately,
after pulling this morning...

marge:/root/tmp # perf sched lat --sort=max
Segmentation fault

...perf sched got busted. Seems likely to be the same thing for both
though, as magnitude/frequency of bogons is very similar.

-Mike


2009-10-07 18:11:40

by Mike Galbraith

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Wed, 2009-10-07 at 19:10 +0200, Frans Pop wrote:
> On Tuesday 06 October 2009, Frans Pop wrote:
> > I've checked for 2.6.31.1 now and iwlagn is listed high there too when
> > the system is idle, but with normal values of 60-100 ms. And phy0 has
> > normal values of below 10 ms.
> > I've now rebooted with today's mainline git; phy0 now frequently shows
> > with values of around 100 ms too (i.e. higher than last time).
> >
> > Both still go way down as soon as the system is given work to do.
> >
> > With a 5 second sleep I was unable to get any significant latencies (I
> > started perf on a latencytop refresh and did a manual refresh as it
> > finished to see what happened during the perf run). The perf run does
> > seem to affect the latencies.
> > I've uploaded a chart for a 10s sleep during which I got latencies of
> > 101ms for iwlagn and 77ms for phy0:
> > http://people.debian.org/~fjp/tmp/kernel/.
>
> Mike privately sent me a script to try to capture the latencies with perf,
> but the perf output does not show any high latencies at all. It looks as if
> we may have found a bug in latencytop here instead.

Maybe. I have a little perturbation measurement proggy which I just
fired up to verify both perf's and latencytop's numbers here. It's a dirt
simple cycle counter tool, which calibrates itself, sums perturbations
over a period of time and emits stats. Here, all three are in violent
agreement wrt how long "pert" is waiting for cpu.
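
(pert itself isn't posted in this thread; a minimal userspace sketch of
the same idea -- spin reading a clock and charge any gap well above the
loop's own cost to time spent off the CPU -- could look like the below.
It skips the self-calibration and just uses a fixed 20 us threshold,
which is an arbitrary choice:)

#include <stdio.h>
#include <stdint.h>
#include <time.h>

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);	/* -lrt on older glibc */
	return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

int main(void)
{
	uint64_t last = now_ns();
	uint64_t end = last + 10ull * 1000000000ull;	/* run for 10s */
	uint64_t max_gap = 0, total_gap = 0;
	const uint64_t threshold = 20000;	/* 20 us */

	while (last < end) {
		uint64_t t = now_ns();
		uint64_t gap = t - last;

		if (gap > threshold) {	/* we were off the cpu */
			total_gap += gap;
			if (gap > max_gap)
				max_gap = gap;
		}
		last = t;
	}
	printf("max perturbation %.3f ms, sum %.3f ms\n",
	       max_gap / 1e6, total_gap / 1e6);
	return 0;
}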

-Mike


2009-10-08 14:56:15

by Frans Pop

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Thursday 08 October 2009, Arjan van de Ven wrote:
> From: Arjan van de Ven <[email protected]>
> Date: Thu, 24 Sep 2009 13:24:16 +0200
> Subject: [PATCH] x86, timers: check for pending timers after (device)
> interrupts
>
> Now that range timers and deferred timers are common, I found a
> problem with these using the "perf timechart" tool.
>
> It turns out that on x86, these two 'opportunistic' timers only
> get checked when another "real" timer happens.
> These opportunistic timers have the objective to save power by
> hitchhiking on other wakeups, as to avoid CPU wakeups by themselves
> as much as possible.

This patch makes quite a difference for me. iwlagn and phy0 now
consistently show at ~10 ms or lower.

I do still get occasional high latencies, but those are for things like
"[rpc_wait_bit_killable]" or "Writing a page to disk", where I guess you'd
expect them. Those high latencies are mostly only listed for "Global" and
don't translate to individual processes.

On Thursday 08 October 2009, Arjan van de Ven wrote:
> On Thu, 08 Oct 2009 13:24:22 +0200 Mike Galbraith <[email protected]> wrote:
> > Latencytop was accounting uninterruptible and interruptible sleep
> > time up to 5ms, which is not the latency the user is looking for.
>
> it is for everything but the scheduler latency!
>
> latencytop wants to show where you're waiting for disk, etc etc.
> that's not "time on runqueue".

The ~10 ms I still get for iwlagn and phy0 (and sometimes higher (~30 ms)
for others like Xorg and artsd) is still "Scheduler: waiting for cpu". If
it is actually due to (un)interruptible sleep, isn't that a misleading
label? I directly associated that with scheduler latency.

Or are those that are left now real scheduler latencies? The values are now
low enough that they don't indicate a problem.

Thanks,
FJP

P.S. I may be seeing another issue in the latencytop GUI. Sometimes I see
"fsync() on a file (type 'F' for details)". But typing 'F' only gets me a
search for a target starting with "F", no details.

2009-10-08 18:24:27

by Mike Galbraith

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Thu, 2009-10-08 at 16:55 +0200, Frans Pop wrote:
> On Thursday 08 October 2009, Arjan van de Ven wrote:
> > From: Arjan van de Ven <[email protected]>
> > Date: Thu, 24 Sep 2009 13:24:16 +0200
> > Subject: [PATCH] x86, timers: check for pending timers after (device)
> > interrupts
> >
> > Now that range timers and deferred timers are common, I found a
> > problem with these using the "perf timechart" tool.
> >
> > It turns out that on x86, these two 'opportunistic' timers only
> > get checked when another "real" timer happens.
> > These opportunistic timers have the objective to save power by
> > hitchhiking on other wakeups, as to avoid CPU wakeups by themselves
> > as much as possible.
>
> This patch makes quite a difference for me. iwlagn and phy0 now
> consistently show at ~10 ms or lower.
>
> I do still get occasional high latencies, but those are for things like
> "[rpc_wait_bit_killable]" or "Writing a page to disk", where I guess you'd
> expect them. Those high latencies are mostly only listed for "Global" and
> don't translate to individual processes.

I still see very high latencies coming out of idle (last noted was >
300ms, NO_HZ) with this patch, and wonder if the hunk below makes any
difference whatsoever for you. Here, it definitely does. (shouldn't)

Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -495,8 +495,10 @@ static void update_curr(struct cfs_rq *c
 	u64 now = rq_of(cfs_rq)->clock;
 	unsigned long delta_exec;
 
-	if (unlikely(!curr))
+	if (unlikely(!curr)) {
+		update_rq_clock(rq_of(cfs_rq));
 		return;
+	}
 
 	/*
 	 * Get the amount of time the current task was running



2009-10-08 06:24:13

by Mike Galbraith

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Thu, 2009-10-08 at 06:05 +0200, Mike Galbraith wrote:
> On Wed, 2009-10-07 at 20:34 +0200, Frans Pop wrote:
> > On Wednesday 07 October 2009, Frans Pop wrote:
> > > On Tuesday 06 October 2009, Frans Pop wrote:
> > > > I've checked for 2.6.31.1 now and iwlagn is listed high there too when
> > > > the system is idle, but with normal values of 60-100 ms. And phy0 has
> > > > normal values of below 10 ms.
> > > > I've now rebooted with today's mainline git; phy0 now frequently shows
> > > > with values of around 100 ms too (i.e. higher than last time).
> > >
> > > Mike privately sent me a script to try to capture the latencies with
> > > perf, but the perf output does not show any high latencies at all. It
> > > looks as if we may have found a bug in latencytop here instead.
> >
> > Not sure if it's relevant nor what it means, but I frequently see two lines
> > for iwlagn, e.g:
> >
> > Scheduler: waiting for cpu                   102.4 msec        99.7 %
> > .                                              3.3 msec         0.3 %
> >
> > I get the same results with both latencytop 0.4 and 0.5.
>
> OK, I see latencytop spikes here on an idle box too, to the tune of up
> to a _second_. Booting with nohz=off seems to have cured it.
>
> I wanted to see if that's also the perf sched record -C N trouble I
> warned you not to try when recording with the script, but unfortunately,
> after pulling this morning...
>
> marge:/root/tmp # perf sched lat --sort=max
> Segmentation fault
>
> ...perf sched got busted. Seems likely to be the same thing for both
> though, as magnitude/frequency of bogons is very similar.

Reverting the problematic commit showed that nohz isn't the source of the
perf sched record -C N oddity.

However, aside from latencytop seemingly having trouble with nohz, there
appears to be a problem with perf sched record -a as well, the same one
Arjan just fixed for perf timechart.

In thread - [PATCH] perf timechart: Work around commit 42e59d7d19dc4b4,
Arjan said..
<quote>
Commit 42e59d7d19dc4b4 introduced a sample frequency framework..
.. however it unfortunately changed how perf events get recorded,..
</quote>

Monkey see monkey (eep) do, and recorded a mostly idle box, only amarok
doing its thing...

Before:
marge:/root/tmp # perf sched record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.279 MB perf.data (~12208 samples) ]

After:
marge:/root/tmp # perf sched record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.649 MB perf.data (~28349 samples) ]


perf_counter tools: make perf sched pass -F 0 to record

Commit 42e59d7d19dc4b4 introduced a sample frequency framework..
.. however it unfortunately changed how perf events get recorded,
and caused sched to miss events.

This patch causes the sched code to use -F 0 to not use the
new framework when recording sched data.

Signed-off-by: Mike Galbraith <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>

---
tools/perf/builtin-sched.c | 1 +
1 file changed, 1 insertion(+)

Index: linux-2.6/tools/perf/builtin-sched.c
===================================================================
--- linux-2.6.orig/tools/perf/builtin-sched.c
+++ linux-2.6/tools/perf/builtin-sched.c
@@ -1902,6 +1902,7 @@ static const char *record_args[] = {
 	"-f",
 	"-m", "1024",
 	"-c", "1",
+	"-F", "0",
 	"-e", "sched:sched_switch:r",
 	"-e", "sched:sched_stat_wait:r",
 	"-e", "sched:sched_stat_sleep:r",




2009-10-08 13:09:43

by Frans Pop

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Thursday 08 October 2009, Mike Galbraith wrote:
> Care to try a patch?
>
> latencytop: only account on-runqueue wait time as being scheduler
> latency.
>
> Latencytop was accounting uninterruptible and interruptible sleep time
> up to 5ms, which is not the latency the user is looking for. Account
> time waiting on-runqueue instead. Also add a clock update to
> update_curr() for the case where there's nobody home. When coming out
> of idle with NO_HZ, not updating the clock leads to bogus max latency
> report.

With this patch the high values for iwlagn and phy0 are gone. The values
now match roughly with the output of your script.

If (un)interruptible sleep is worth displaying in latencytop then maybe it
should be identified as a separate cause instead of lumped under "waiting
for CPU"?

> TODO: Figure out where the heck empty braces come from.

I also see these lines with "[]" sometimes; looks to come from 'sleep'.

And I still see lines with only a "." as the cause; this seems specific to
targets marked with the penguin (work queues?).

> Signed-off-by: Mike Galbraith <[email protected]>
> Arjan van de Ven <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> LKML-Reference: <new-submission>

If Arjan is happy with the patch, please add my:

Reported-and-tested-by: Frans Pop <[email protected]>

Thanks,
FJP

2009-10-09 20:07:09

by Mike Galbraith

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Fri, 2009-10-09 at 18:27 +0200, Frans Pop wrote:
> On Friday 09 October 2009, Mike Galbraith wrote:
> > sched: update the clock of runqueue select_task_rq() selected.
> >
> > In try_to_wake_up(), we update the runqueue clock, but select_task_rq()
> > may select a different runqueue than the one we updated, leaving the new
> > runqueue's clock stale for a bit.
> >
> > This patch cures occasional huge latencies reported by latencytop when
> > coming out of idle on a mostly idle NO_HZ box.
>
> This patch (without Arjan's previous one applied) also fixes my latency
> issues. Nice work!

Thanks for the confirmation (and kind words).

-Mike


2009-10-08 14:55:43

by Mike Galbraith

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Thu, 2009-10-08 at 16:13 +0200, Mike Galbraith wrote:
> On Thu, 2009-10-08 at 06:40 -0700, Arjan van de Ven wrote:
> > On Thu, 08 Oct 2009 06:05:43 +0200
> > Mike Galbraith <[email protected]> wrote:
>
> > > OK, I see latencytop spikes here on an idle box too, to the tune of up
> > > to a _second_. Booting with nohz=off seems to have cured it.
> >
> > interesting..
> >
> >
> > by chance, does this patch fix it ?
>
> I haven't seen incredible yet, but with idle box, have seen > 200ms, so
> would say no.

But this dinky hunk of my patchlet does indeed still fix it right up.

Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -495,8 +495,10 @@ static void update_curr(struct cfs_rq *c
 	u64 now = rq_of(cfs_rq)->clock;
 	unsigned long delta_exec;
 
-	if (unlikely(!curr))
+	if (unlikely(!curr)) {
+		update_rq_clock(rq_of(cfs_rq));
 		return;
+	}
 
 	/*
 	 * Get the amount of time the current task was running

We have a clock problem with NO_HZ coming out of idle, so we may well
have real latency issues at the same time, not bogons.

-Mike


2009-10-08 20:41:45

by Markus Trippelsdorf

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Thu, Oct 08, 2009 at 08:23:37PM +0200, Mike Galbraith wrote:
> On Thu, 2009-10-08 at 16:55 +0200, Frans Pop wrote:
> > On Thursday 08 October 2009, Arjan van de Ven wrote:
> > > From: Arjan van de Ven <[email protected]>
> > > Date: Thu, 24 Sep 2009 13:24:16 +0200
> > > Subject: [PATCH] x86, timers: check for pending timers after (device)
> > > interrupts
> > >
> > > Now that range timers and deferred timers are common, I found a
> > > problem with these using the "perf timechart" tool.
> > >
> > > It turns out that on x86, these two 'opportunistic' timers only
> > > get checked when another "real" timer happens.
> > > These opportunistic timers have the objective to save power by
> > > hitchhiking on other wakeups, as to avoid CPU wakeups by themselves
> > > as much as possible.
> >
> > This patch makes quite a difference for me. iwlagn and phy0 now
> > consistently show at ~10 ms or lower.
> >
> > I do still get occasional high latencies, but those are for things like
> > "[rpc_wait_bit_killable]" or "Writing a page to disk", where I guess you'd
> > expect them. Those high latencies are mostly only listed for "Global" and
> > don't translate to individual processes.
>
> I still see very high latencies coming out of idle (last noted was >
> 300ms, NO_HZ) with this patch, and wonder if the hunk below makes any
> difference whatsoever for you. Here, it definitely does. (shouldn't)

I'm also seeing these strange, very high latencies here. Your patch
didn't help unfortunately.

This is from an otherwise idle NO_HZ system:

# ./perf sched latency

-----------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms | Maximum delay ms |
-----------------------------------------------------------------------------------------
ksoftirqd/0:4 | 2.216 ms | 170 | avg: 24.235 ms | max: 808.356 ms |
ksoftirqd/1:6 | 2.611 ms | 205 | avg: 4.334 ms | max: 165.553 ms |
migration/2:7 | 0.000 ms | 1 | avg: 3.060 ms | max: 3.060 ms |

With latencytop the ksoftirqd latency is over 1 sec frequently. (Could be
ondemand CPUfreq governor related?)

--
Markus

2009-10-08 11:46:00

by Mike Galbraith

Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Wed, 2009-10-07 at 20:34 +0200, Frans Pop wrote:
> On Wednesday 07 October 2009, Frans Pop wrote:
> > On Tuesday 06 October 2009, Frans Pop wrote:
> > > I've checked for 2.6.31.1 now and iwlagn is listed high there too when
> > > the system is idle, but with normal values of 60-100 ms. And phy0 has
> > > normal values of below 10 ms.
> > > I've now rebooted with today's mainline git; phy0 now frequently shows
> > > with values of around 100 ms too (i.e. higher than last time).
> >
> > Mike privately sent me a script to try to capture the latencies with
> > perf, but the perf output does not show any high latencies at all. It
> > looks as if we may have found a bug in latencytop here instead.
>
> Not sure if it's relevant or what it means, but I frequently see two lines
> for iwlagn, e.g.:
>
> Scheduler: waiting for cpu 102.4 msec 99.7 %
> . 3.3 msec 0.3 %
>
> I get the same results with both latencytop 0.4 and 0.5.

Care to try a patch?


latencytop: only account on-runqueue wait time as being scheduler latency.

Latencytop was accounting uninterruptible sleep time, and interruptible sleep
time up to 5 ms, as scheduler latency, which is not the latency the user is
looking for. Account time spent waiting on-runqueue instead. Also add a clock
update to update_curr() for the case where there's nobody home: when coming
out of idle with NO_HZ, not updating the clock leads to bogus max latency
reports.

After, on a mostly idle NO_HZ box that previously showed up to a second of
"waiting for cpu":

LatencyTOP version 0.5 (C) 2008 Intel Corporation

Cause Maximum Percentage
Scheduler: waiting for cpu 9.3 msec 50.3 %
[sysret_careful] 9.1 msec 21.6 %
Waiting for data on unix socket 7.9 msec 0.5 %
Waiting for event (select) 3.8 msec 21.8 %
[retint_careful] 1.2 msec 2.4 %
[] 0.4 msec 0.0 %
Waiting for event (poll) 0.4 msec 0.4 %
[__cond_resched] 0.4 msec 0.0 %
Userspace lock contention 0.2 msec 3.0 %

TODO: Examine migration, pull may create max latency fib.
TODO: Figure out where the heck empty braces come from.

Signed-off-by: Mike Galbraith <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>

---
include/linux/latencytop.h | 6 +++---
kernel/latencytop.c | 19 +------------------
kernel/sched_fair.c | 16 +++++++++-------
3 files changed, 13 insertions(+), 28 deletions(-)

Index: linux-2.6/include/linux/latencytop.h
===================================================================
--- linux-2.6.orig/include/linux/latencytop.h
+++ linux-2.6/include/linux/latencytop.h
@@ -26,12 +26,12 @@ struct latency_record {
struct task_struct;

extern int latencytop_enabled;
-void __account_scheduler_latency(struct task_struct *task, int usecs, int inter);
+void __account_scheduler_latency(struct task_struct *task, unsigned long usecs);
static inline void
-account_scheduler_latency(struct task_struct *task, int usecs, int inter)
+account_scheduler_latency(struct task_struct *task, unsigned long usecs)
{
if (unlikely(latencytop_enabled))
- __account_scheduler_latency(task, usecs, inter);
+ __account_scheduler_latency(task, usecs);
}

void clear_all_latency_tracing(struct task_struct *p);
Index: linux-2.6/kernel/latencytop.c
===================================================================
--- linux-2.6.orig/kernel/latencytop.c
+++ linux-2.6/kernel/latencytop.c
@@ -157,34 +157,17 @@ static inline void store_stacktrace(stru
* __account_scheduler_latency - record an occured latency
* @tsk - the task struct of the task hitting the latency
* @usecs - the duration of the latency in microseconds
- * @inter - 1 if the sleep was interruptible, 0 if uninterruptible
*
* This function is the main entry point for recording latency entries
* as called by the scheduler.
- *
- * This function has a few special cases to deal with normal 'non-latency'
- * sleeps: specifically, interruptible sleep longer than 5 msec is skipped
- * since this usually is caused by waiting for events via select() and co.
- *
- * Negative latencies (caused by time going backwards) are also explicitly
- * skipped.
*/
void __sched
-__account_scheduler_latency(struct task_struct *tsk, int usecs, int inter)
+__account_scheduler_latency(struct task_struct *tsk, unsigned long usecs)
{
unsigned long flags;
int i, q;
struct latency_record lat;

- /* Long interruptible waits are generally user requested... */
- if (inter && usecs > 5000)
- return;
-
- /* Negative sleeps are time going backwards */
- /* Zero-time sleeps are non-interesting */
- if (usecs <= 0)
- return;
-
memset(&lat, 0, sizeof(lat));
lat.count = 1;
lat.time = usecs;
Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -495,8 +495,10 @@ static void update_curr(struct cfs_rq *c
u64 now = rq_of(cfs_rq)->clock;
unsigned long delta_exec;

- if (unlikely(!curr))
+ if (unlikely(!curr)) {
+ update_rq_clock(rq_of(cfs_rq));
return;
+ }

/*
* Get the amount of time the current task was running
@@ -548,8 +550,11 @@ update_stats_wait_end(struct cfs_rq *cfs
rq_of(cfs_rq)->clock - se->wait_start);
#ifdef CONFIG_SCHEDSTATS
if (entity_is_task(se)) {
- trace_sched_stat_wait(task_of(se),
- rq_of(cfs_rq)->clock - se->wait_start);
+ struct task_struct *tsk = task_of(se);
+ u64 delta = rq_of(cfs_rq)->clock - se->wait_start;
+
+ trace_sched_stat_wait(tsk, delta);
+ account_scheduler_latency(tsk, delta >> 10);
}
#endif
schedstat_set(se->wait_start, 0);
@@ -643,10 +648,8 @@ static void enqueue_sleeper(struct cfs_r
se->sleep_start = 0;
se->sum_sleep_runtime += delta;

- if (tsk) {
- account_scheduler_latency(tsk, delta >> 10, 1);
+ if (tsk)
trace_sched_stat_sleep(tsk, delta);
- }
}
if (se->block_start) {
u64 delta = rq_of(cfs_rq)->clock - se->block_start;
@@ -677,7 +680,6 @@ static void enqueue_sleeper(struct cfs_r
(void *)get_wchan(tsk),
delta >> 20);
}
- account_scheduler_latency(tsk, delta >> 10, 0);
}
}
#endif
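
For reference, the "delta >> 10" above converts the nanosecond rq clock delta
to approximate microseconds: a shift by 10 divides by 1024, a cheap stand-in
for /1000 that reads about 2.3% low. A minimal standalone illustration:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        uint64_t delta_ns = 102400000ULL;   /* a 102.4 ms wait, in ns */

        /* exact: 102400 us, shifted: 100000 us -- close enough for
         * latency reporting purposes. */
        printf("exact: %llu us, shifted: %llu us\n",
               (unsigned long long)(delta_ns / 1000),
               (unsigned long long)(delta_ns >> 10));
        return 0;
}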



2009-10-05 14:14:10

by Frans Pop

[permalink] [raw]
Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Monday 05 October 2009, Frans Pop wrote:
> With .32-rc3 I'm getting occasional skips from amarok while playing
> music from an NFS share (3 today so far), something I've not had in a
> long time.
>
> The reason looks to be that latencytop consistently shows 'iwlagn' as
> the top affected process with "Scheduler: waiting for CPU".
> Values of 100-140 ms occur frequently, but I've also seen higher values
> (180, 207). I don't remember seeing iwlagn high in the list before, but
> I have not checked earlier kernels yet.
>
> Added to that 'phy0' also frequently shows high with values of 25-75 ms.
>
> The system is otherwise essentially idle. Other processes typically show
> a latency < 5 ms.

If I give the system some work to do (like compiling a kernel), the
latencies on iwlagn and phy0 disappear (values < 5 ms).
Is this related to ondemand frequency scaling?
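
A quick way to rule the governor in or out would be to pin it to performance
for a while (assuming the standard cpufreq sysfs layout) and see whether the
latencytop numbers change:

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor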

2009-10-07 18:35:34

by Frans Pop

[permalink] [raw]
Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Wednesday 07 October 2009, Frans Pop wrote:
> On Tuesday 06 October 2009, Frans Pop wrote:
> > I've checked for 2.6.31.1 now and iwlagn is listed high there too when
> > the system is idle, but with normal values of 60-100 ms. And phy0 has
> > normal values of below 10 ms.
> > I've now rebooted with today's mainline git; phy0 now frequently shows
> > with values of around 100 ms too (i.e. higher than last time).
>
> Mike privately sent me a script to try to capture the latencies with
> perf, but the perf output does not show any high latencies at all. It
> looks as if we may have found a bug in latencytop here instead.

Not sure if it's relevant or what it means, but I frequently see two lines
for iwlagn, e.g.:

Scheduler: waiting for cpu 102.4 msec 99.7 %
. 3.3 msec 0.3 %

I get the same results with both latencytop 0.4 and 0.5.

2009-10-08 14:14:03

by Mike Galbraith

[permalink] [raw]
Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Thu, 2009-10-08 at 06:40 -0700, Arjan van de Ven wrote:
> On Thu, 08 Oct 2009 06:05:43 +0200
> Mike Galbraith <[email protected]> wrote:

> > OK, I see latencytop spikes here on an idle box too, to the tune of up
> > to a _second_. Booting with nohz=off seems to have cured it.
>
> interesting..
>
>
> by chance, does this patch fix it ?

I haven't seen anything incredible yet, but on an idle box I have seen
latencies > 200 ms, so I would say no.

-Mike


2009-10-08 21:00:13

by Frans Pop

[permalink] [raw]
Subject: Re: [.32-rc3] scheduler: iwlagn consistently high in "waiting for CPU"

On Thursday 08 October 2009, Mike Galbraith wrote:
> I still see very high latencies coming out of idle (last noted was >
> 300ms, NO_HZ) with this patch,
>
> Like this:
>
> Cause Maximum Percentage
> Scheduler: waiting for cpu 604.2 msec 49.0 %

I'm not seeing anything even remotely like that.

I've tried quieting the system down as much as possible, but that made
no difference. I do have NO_HZ set; will send my full config privately.

If you'd still like me to try your patch I will, but I'm not sure it has
much value.

Cheers,
FJP