2010-11-09 18:55:30

by Kyle McMartin

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

On Fri, Oct 29, 2010 at 09:42:23PM +0200, Peter Zijlstra wrote:
>
> not tested other than compile, but how does that look...?
>

Hi Peter,

I finally got around to building a test kernel for a user who had been
reporting this against F-14 (2.6.35) but they report there's not much
improvement.

Sorry. :/

<quote>
Date: Tue, 09 Nov 2010 12:46:53 -0600
From: Michael Cronenworth <[email protected]>
To: [email protected]
Subject: Re: Virtually impossible to keep load under 0.5 on idle desktop with
F14+kms

On 11/09/2010 11:55 AM, Kyle McMartin wrote:
> http://kyle.fedorapeople.org/kernel/2.6.35.6-52.sched1.fc14/

Running this kernel is not any better. After leaving my system idle for
10 minutes I finally saw it dip below 0.10, but after leaving the
computer, and coming back 10 minutes later, I see a load of 0.25. No
programs running.

Before I hit "send" on this message, I ran uptime again and got this:
$ uptime
12:45:24 up 32 min, 3 users, load average: 0.82, 0.43, 0.34

Thunderbird was the only program running. 99.8% idle.
</quote>

regards, Kyle


2010-11-09 19:02:28

by Peter Zijlstra

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

On Tue, 2010-11-09 at 13:55 -0500, Kyle McMartin wrote:
> On Fri, Oct 29, 2010 at 09:42:23PM +0200, Peter Zijlstra wrote:
> >
> > not tested other than compile, but how does that look...?
> >
>
> Hi Peter,
>
> I finally got around to building a test kernel for a user who had been
> reporting this against F-14 (2.6.35) but they report there's not much
> improvement.
>
> Sorry. :/

Right, I tested it myself yesterday and got similar results. Does the
'simple' patch from a few posts back work for people?

http://lkml.org/lkml/2010/10/26/150

If that does work I fudged the fancy math, if that doesn't work either
we're back to square zero.

2010-11-10 02:38:07

by tmhikaru

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

On Tue, Nov 09, 2010 at 08:02:28PM +0100, Peter Zijlstra wrote:
> On Tue, 2010-11-09 at 13:55 -0500, Kyle McMartin wrote:
> > On Fri, Oct 29, 2010 at 09:42:23PM +0200, Peter Zijlstra wrote:
> > >
> > > not tested other than compile, but how does that look...?
> > >
> >
> > Hi Peter,
> >
> > I finally got around to building a test kernel for a user who had been
> > reporting this against F-14 (2.6.35) but they report there's not much
> > improvement.
> >
> > Sorry. :/
>
> Right, I tested it myself yesterday and got similar results. Does the
> 'simple' patch from a few posts back work for people?
>
> http://lkml.org/lkml/2010/10/26/150
>
> If that does work I fudged the fancy math, if that doesn't work either
> we're back to square zero.

I seem to remember someone saying this would only work on x86-64; if this is
meant to work with a 32bit intel single core processor, please let me know -
I'd like to try it. It would also help if I knew which kernel version this
simpler patch is meant to apply to.

Thank you,
Tim McGrath



2010-11-10 03:45:21

by Kyle McMartin

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

On Tue, Nov 09, 2010 at 08:02:28PM +0100, Peter Zijlstra wrote:
> Right, I tested it myself yesterday and got similar results. Does the
> 'simple' patch from a few posts back work for people?
>

Apparently it doesn't make a large amount of difference. :/

regards, Kyle

2010-11-10 12:00:27

by Peter Zijlstra

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

On Tue, 2010-11-09 at 22:45 -0500, Kyle McMartin wrote:
> On Tue, Nov 09, 2010 at 08:02:28PM +0100, Peter Zijlstra wrote:
> > Right, I tested it myself yesterday and got similar results. Does the
> > 'simple' patch from a few posts back work for people?
> >
>
> Apparently it doesn't make a large amount of difference. :/

Crud, ok, back to basics it is.. thanks for testing!

2010-11-10 12:01:00

by Peter Zijlstra

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

On Tue, 2010-11-09 at 21:37 -0500, [email protected] wrote:
> On Tue, Nov 09, 2010 at 08:02:28PM +0100, Peter Zijlstra wrote:
> > On Tue, 2010-11-09 at 13:55 -0500, Kyle McMartin wrote:
> > > On Fri, Oct 29, 2010 at 09:42:23PM +0200, Peter Zijlstra wrote:
> > > >
> > > > not tested other than compile, but how does that look...?
> > > >
> > >
> > > Hi Peter,
> > >
> > > I finally got around to building a test kernel for a user who had been
> > > reporting this against F-14 (2.6.35) but they report there's not much
> > > improvement.
> > >
> > > Sorry. :/
> >
> > Right, I tested it myself yesterday and got similar results. Does the
> > 'simple' patch from a few posts back work for people?
> >
> > http://lkml.org/lkml/2010/10/26/150
> >
> > If that does work I fudged the fancy math, if that doesn't work either
> > we're back to square zero.
>
> I seem to remember someone saying this would only work on x86-64; if this is
> meant to work with a 32bit intel single core processor, please let me know -
> I'd like to try it. It would also help if I knew which kernel version this
> simpler patch is meant to apply to.

Venki's patch was; the one I linked should work on every arch.

2010-11-14 05:14:16

by tmhikaru

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

On Wed, Nov 10, 2010 at 01:00:24PM +0100, Peter Zijlstra wrote:
> On Tue, 2010-11-09 at 22:45 -0500, Kyle McMartin wrote:
> > On Tue, Nov 09, 2010 at 08:02:28PM +0100, Peter Zijlstra wrote:
> > > Right, I tested it myself yesterday and got similar results. Does the
> > > 'simple' patch from a few posts back work for people?
> > >
> >
> > Apparently it doesn't make a large amount of difference. :/
>
> Crud, ok, back to basics it is.. thanks for testing!

Much like the other tester, this doesn't make any difference at all for me:
load fluctuates semi-randomly as it did before, with an average of around
0.8 after 20 minutes with the machine idle. I have to wonder, is there
currently a reason NOT to revert this commit? Does the commit that's causing
problems for me and other people actually fix problems that existed before
for some people? I just want to know why there seems to be a struggle to fix
the current implementation rather than backing out the change that caused
the problem I'm dealing with and then trying to work from there.

If you've forgotten, I'm talking specifically about this commit:

[74f5187ac873042f502227701ed1727e7c5fbfa9] sched: Cure load average vs NO_HZ woes

We've gone through two kernel releases with this commit, and I think I'm
understandably curious why it's not been reverted if it's causing problems
with no solution in sight.

Note that I'm currently running 2.6.35.8 with CONFIG_NO_HZ unset and have
no problems with the load average, so at least that works properly, and it
proves that I'm not being affected by some other quirk...

Machine's only been up ten minutes or so, but here's the uptime data anyway
for the curious:

00:10:02 up 12 min, 4 users, load average: 0.00, 0.11, 0.14

Tim McGrath



2010-11-25 13:40:08

by Damien Wyart

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

Hi Peter,

Do you have a plan for solving this problem in the coming weeks, or
will it still be present in 2.6.37 final? I am not implying any
criticism in any way, I just want to know the status because, compared
to other bugs discussed on lkml, this one has seen very few technical
proposals towards a resolution (except your attempts). The people who
signed off on the discussed commit did not make any technical analysis
of the reported bug, and this gives a strange feeling. Earth will not
stop rotating if it is not solved, but I still think it is quite
important and affects many users.

> [74f5187ac873042f502227701ed1727e7c5fbfa9] sched: Cure load average vs NO_HZ woes

> We've gone through two kernel releases with this commit, and I think
> I'm understandably curious why it's not been reverted if it's causing
> problems with no solution in sight.

Even with this commit reverted, the load is incorrect in NO_HZ mode, so
revert only is not enough.

--
Damien Wyart

2010-11-25 14:03:20

by Peter Zijlstra

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

On Thu, 2010-11-25 at 14:31 +0100, Damien Wyart wrote:
> Hi Peter,
>
> Do you have a plan for solving this problem in the coming weeks, or
> will it still be present in 2.6.37 final? I am not implying any
> criticism in any way, I just want to know the status because, compared
> to other bugs discussed on lkml, this one has seen very few technical
> proposals towards a resolution (except your attempts). The people who
> signed off on the discussed commit did not make any technical analysis
> of the reported bug, and this gives a strange feeling. Earth will not
> stop rotating if it is not solved, but I still think it is quite
> important and affects many users.

Yes I agree, I've put the problem on the back burner for a bit so that I
can forget all my tinkering and try again with a fresher mind :-)

I'll try and get around to it somewhere this week again.

2010-11-27 20:15:13

by Peter Zijlstra

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

On Thu, 2010-11-25 at 15:03 +0100, Peter Zijlstra wrote:

> I'll try and get around to it somewhere this week again.

How does this work for you? It's hideous but let's start simple.

---
kernel/sched.c | 5 +++++
kernel/timer.c | 14 ++++++++++++--
2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index dc91a4d..2ebb07c 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3223,6 +3223,11 @@ static void calc_load_account_active(struct rq *this_rq)
this_rq->calc_load_update += LOAD_FREQ;
}

+void calc_load_account_this(void)
+{
+ calc_load_account_active(this_rq());
+}
+
/*
* The exact cpuload at various idx values, calculated at every tick would be
* load = (2^idx - 1) / 2^idx * load + 1 / 2^idx * cur_load
diff --git a/kernel/timer.c b/kernel/timer.c
index 68a9ae7..af095c0 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1315,11 +1315,21 @@ void run_local_timers(void)
* jiffies is defined in the linker script...
*/

+extern void calc_load_account_this(void);
+
void do_timer(unsigned long ticks)
{
- jiffies_64 += ticks;
+ if (ticks > 1) {
+ while (ticks--) {
+ jiffies_64++;
+ calc_load_account_this();
+ calc_global_load();
+ }
+ } else {
+ jiffies_64 += ticks;
+ calc_global_load();
+ }
update_wall_time();
- calc_global_load();
}

#ifdef __ARCH_WANT_SYS_ALARM
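
A quick aside on the idea in the do_timer() loop above (not part of the
original mail; a user-space sketch with the FSHIFT/FIXED_1/EXP_1 constants
copied from include/linux/sched.h and an arbitrary 2-minute idle period):
replaying every missed tick lets the 1-minute average decay once per
5-second LOAD_FREQ window, whereas a single catch-up update decays it only
once for the whole idle period.

#include <stdio.h>

#define FSHIFT   11                    /* nr of bits of precision */
#define FIXED_1  (1 << FSHIFT)         /* 1.0 as fixed point */
#define EXP_1    1884                  /* 1/exp(5sec/1min) as fixed point */

/* Same shape as the kernel's calc_load(): load = load*exp + active*(1 - exp). */
static unsigned long calc_load(unsigned long load, unsigned long exp,
                               unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    return load >> FSHIFT;
}

int main(void)
{
    unsigned long once = 2 * FIXED_1;      /* 1-min load of 2.00, then idle */
    unsigned long replayed = 2 * FIXED_1;
    int i, missed = 24;                    /* 24 missed 5s windows = 2 minutes */

    /* One lump update at wakeup: a single decay step for the whole idle period. */
    once = calc_load(once, EXP_1, 0);

    /* Replaying the idle period window by window, which is what the patch's
     * per-tick replay achieves at each LOAD_FREQ boundary. */
    for (i = 0; i < missed; i++)
        replayed = calc_load(replayed, EXP_1, 0);

    printf("single update: %lu.%02lu\n", once >> FSHIFT,
           ((once & (FIXED_1 - 1)) * 100) >> FSHIFT);
    printf("replayed     : %lu.%02lu\n", replayed >> FSHIFT,
           ((replayed & (FIXED_1 - 1)) * 100) >> FSHIFT);
    return 0;
}

With these numbers the single update leaves the average around 1.83 while
the replayed version drops to about 0.26, which is the flavour of "idle load
that will not come down" being reported in this thread.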

2010-11-28 04:26:15

by Kyle McMartin

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

On Sat, Nov 27, 2010 at 09:15:20PM +0100, Peter Zijlstra wrote:
> > I'll try and get around to it somewhere this week again.
>
> How does this work for you? It's hideous but let's start simple.
>

At least one user reports a noticeably lower idle load average using a
kernel with that patch.

Thanks, looks promising!

Kyle

2010-11-28 11:40:52

by Damien Wyart

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

Hi,

* Peter Zijlstra <[email protected]> [2010-11-27 21:15]:
> How does this work for you? It's hideous but let's start simple.
> [...]

Doesn't give wrong numbers like initial bug and tentative patches, but
feels a bit too slow when numbers go up and down. Correct values are
reached when waiting long enough, but it feels slow.

As I've tested many combinations, maybe this is an impression because
I do not remember about "normal" delays for the load to rise and fall,
but this still feels slow.

Thanks for working on this problem again.

--
Damien Wyart

2010-11-28 18:08:21

by Valdis Klētnieks

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

On Sun, 28 Nov 2010 12:40:27 +0100, Damien Wyart said:
> Hi,
>
> * Peter Zijlstra <[email protected]> [2010-11-27 21:15]:
> > How does this work for you? It's hideous but let's start simple.
> > [...]
>
> Doesn't give wrong numbers like initial bug and tentative patches, but
> feels a bit too slow when numbers go up and down. Correct values are
> reached when waiting long enough, but it feels slow.

Umm... maybe the code still averages the 1/5/15 over that many minutes'
worth of ticks, which may be longer for a tickless kernel?
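
As an aside (not part of the original mail): the 1, 5 and 15 minute figures
are exponential averages that the kernel updates once per 5-second LOAD_FREQ
window rather than per tick, with the decay factors kept as 11-bit fixed
point. A tiny sketch reproducing the EXP_1/EXP_5/EXP_15 constants from
include/linux/sched.h (build with -lm):

#include <math.h>
#include <stdio.h>

int main(void)
{
    double fixed_1 = 2048.0;    /* FIXED_1 = 1 << FSHIFT, FSHIFT = 11 */

    /* One sample every 5 seconds, decaying over 1, 5 and 15 minutes. */
    printf("EXP_1  ~ %.0f\n", fixed_1 * exp(-5.0 / 60.0));   /* 1884 */
    printf("EXP_5  ~ %.0f\n", fixed_1 * exp(-5.0 / 300.0));  /* 2014 */
    printf("EXP_15 ~ %.0f\n", fixed_1 * exp(-5.0 / 900.0));  /* 2037 */
    return 0;
}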



2010-11-29 11:38:34

by Peter Zijlstra

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

On Sun, 2010-11-28 at 12:40 +0100, Damien Wyart wrote:
> Hi,
>
> * Peter Zijlstra <[email protected]> [2010-11-27 21:15]:
> > How does this work for you? It's hideous but let's start simple.
> > [...]
>
> Doesn't give wrong numbers like initial bug and tentative patches, but
> feels a bit too slow when numbers go up and down. Correct values are
> reached when waiting long enough, but it feels slow.
>
> As I've tested many combinations, maybe this is an impression because
> I do not remember about "normal" delays for the load to rise and fall,
> but this still feels slow.

You can test this by either booting with nohz=off, or building with
CONFIG_NO_HZ=n and then comparing the result, something like

make O=defconfig clean; while sleep 10; do uptime >> load.log; done &
make -j32 O=defconfig; kill %1

And comparing the curves between the NO_HZ and !NO_HZ kernels.

I'll try and make the patch less hideous ;-)
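
For anyone who prefers a single binary over the shell loop above, a roughly
equivalent sampler using getloadavg(3) (a glibc/BSD extension; the 10-second
interval mirrors the loop) might look like this:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    double avg[3];

    for (;;) {
        if (getloadavg(avg, 3) < 0) {   /* 1, 5 and 15 minute averages */
            perror("getloadavg");
            return EXIT_FAILURE;
        }
        printf("%ld %.2f %.2f %.2f\n",
               (long)time(NULL), avg[0], avg[1], avg[2]);
        fflush(stdout);
        sleep(10);
    }
}

Run it under both the NO_HZ and !NO_HZ kernels and compare the two logs,
just as with the uptime loop.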

2010-11-29 19:40:50

by tmhikaru

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

On Mon, Nov 29, 2010 at 12:38:46PM +0100, Peter Zijlstra wrote:
> On Sun, 2010-11-28 at 12:40 +0100, Damien Wyart wrote:
> > Hi,
> >
> > * Peter Zijlstra <[email protected]> [2010-11-27 21:15]:
> > > How does this work for you? It's hideous but let's start simple.
> > > [...]
> >
> > Doesn't give wrong numbers like initial bug and tentative patches, but
> > feels a bit too slow when numbers go up and down. Correct values are
> > reached when waiting long enough, but it feels slow.
> >
> > As I've tested many combinations, maybe this is an impression because
> > I do not remember about "normal" delays for the load to rise and fall,
> > but this still feels slow.
>
> You can test this by either booting with nohz=off, or building with
> CONFIG_NO_HZ=n and then comparing the result, something like
>
> make O=defconfig clean; while sleep 10; do uptime >> load.log; done &
> make -j32 O=defconfig; kill %1
>
> And comparing the curves between the NO_HZ and !NO_HZ kernels.
>
> I'll try and make the patch less hideous ;-)

I've tested this patch on my own use case, and it seems to work for the most
part - it's still not settling as low as the previous implementation used
to, nor is it settling as low as CONFIG_NO_HZ=N (that is to say, 0.00 across
the board when not being used) however, this is definitely an improvement:

14:26:04 up 9:08, 5 users, load average: 0.05, 0.01, 0.00

This is the result of running uptime on a checked out version of
[74f5187ac873042f502227701ed1727e7c5fbfa9] sched: Cure load average vs NO_HZ woes

with the patch applied, starting X, and simply letting the machine sit idle
for nine hours. For the brief period I spent watching it after boot, it
quickly began settling down to a reasonable value, I only let it sit idle
this long to verify the loadavg was consistently low. (the loadavg was
consistently erratic, anywhere from 0.6 to 1.2 with the machine idle without
this patch)

Thank you for the hard work,
Tim McGrath



2010-11-29 23:01:09

by Peter Zijlstra

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

On Mon, 2010-11-29 at 14:40 -0500, [email protected] wrote:
> On Mon, Nov 29, 2010 at 12:38:46PM +0100, Peter Zijlstra wrote:
> > On Sun, 2010-11-28 at 12:40 +0100, Damien Wyart wrote:
> > > Hi,
> > >
> > > * Peter Zijlstra <[email protected]> [2010-11-27 21:15]:
> > > > How does this work for you? It's hideous but let's start simple.
> > > > [...]
> > >
> > > Doesn't give wrong numbers like initial bug and tentative patches, but
> > > feels a bit too slow when numbers go up and down. Correct values are
> > > reached when waiting long enough, but it feels slow.
> > >
> > > As I've tested many combinations, maybe this is an impression because
> > > I do not remember about "normal" delays for the load to rise and fall,
> > > but this still feels slow.
> >
> > You can test this by either booting with nohz=off, or building with
> > CONFIG_NO_HZ=n and then comparing the result, something like
> >
> > make O=defconfig clean; while sleep 10; do uptime >> load.log; done &
> > make -j32 O=defconfig; kill %1
> >
> > And comparing the curves between the NO_HZ and !NO_HZ kernels.
> >
> > I'll try and make the patch less hideous ;-)
>
> I've tested this patch on my own use case, and it seems to work for the most
> part - it's still not settling as low as the previous implementation used
> to, nor is it settling as low as CONFIG_NO_HZ=N (that is to say, 0.00 across
> the board when not being used) however, this is definitely an improvement:
>
> 14:26:04 up 9:08, 5 users, load average: 0.05, 0.01, 0.00
>
> This is the result of running uptime on a checked out version of
> [74f5187ac873042f502227701ed1727e7c5fbfa9] sched: Cure load average vs NO_HZ woes
>
> with the patch applied, starting X, and simply letting the machine sit idle
> for nine hours. For the brief period I spent watching it after boot, it
> quickly began settling down to a reasonable value, I only let it sit idle
> this long to verify the loadavg was consistently low. (the loadavg was
> consistently erratic, anywhere from 0.6 to 1.2 with the machine idle without
> this patch)

Ok, that's good testing.. so it's still not quite the same as NO_HZ=n,
how about this one?

(it seems to drop down to 0.00 if I wait a few minutes with top -d5)

---
kernel/sched.c | 5 +++++
kernel/time/tick-sched.c | 4 +++-
kernel/timer.c | 12 ++++++++++++
3 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 864040c..a859158 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3082,6 +3082,11 @@ static void calc_load_account_active(struct rq *this_rq)
this_rq->calc_load_update += LOAD_FREQ;
}

+void calc_load_account_this(void)
+{
+ calc_load_account_active(this_rq());
+}
+
/*
* The exact cpuload at various idx values, calculated at every tick would be
* load = (2^idx - 1) / 2^idx * load + 1 / 2^idx * cur_load
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 3e216e0..1e6d384 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -41,6 +41,8 @@ struct tick_sched *tick_get_tick_sched(int cpu)
return &per_cpu(tick_cpu_sched, cpu);
}

+extern void do_timer_nohz(unsigned long ticks);
+
/*
* Must be called with interrupts disabled !
*/
@@ -75,7 +77,7 @@ static void tick_do_update_jiffies64(ktime_t now)
last_jiffies_update = ktime_add_ns(last_jiffies_update,
incr * ticks);
}
- do_timer(++ticks);
+ do_timer_nohz(++ticks);

/* Keep the tick_next_period variable up to date */
tick_next_period = ktime_add(last_jiffies_update, tick_period);
diff --git a/kernel/timer.c b/kernel/timer.c
index d6ccb90..eb2646f 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1300,6 +1300,18 @@ void do_timer(unsigned long ticks)
calc_global_load();
}

+extern void calc_load_account_this(void);
+
+void do_timer_nohz(unsigned long ticks)
+{
+ while (ticks--) {
+ jiffies_64++;
+ calc_load_account_this();
+ calc_global_load();
+ }
+ update_wall_time();
+}
+
#ifdef __ARCH_WANT_SYS_ALARM

/*

2010-11-30 14:58:57

by Peter Zijlstra

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

On Tue, 2010-11-30 at 00:01 +0100, Peter Zijlstra wrote:
>
> Ok, that's good testing.. so it's still not quite the same as NO_HZ=n,
> how about this one?
>
> (it seems to drop down to 0.00 if I wait a few minutes with top -d5)

OK, so here's a less crufty patch that gets the same result on my
machine, load drops down to 0.00 after a while.

It seems a bit slower to reach 0.00, but that could be because I
actually changed the load computation for NO_HZ=n as well, I added a
rounding factor in calc_load(), we no longer truncate the division.

If people want to compare, simply remove the third line from
calc_load(): load += 1UL << (FSHIFT - 1), to restore the old behaviour.

---
include/linux/sched.h | 2 +-
kernel/sched.c | 127 ++++++++++++++++++++++++++++++++++++++++++++----
kernel/timer.c | 2 +-
3 files changed, 118 insertions(+), 13 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index de4d68f..a6e0c62 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -143,7 +143,7 @@ extern unsigned long nr_iowait_cpu(int cpu);
extern unsigned long this_cpu_load(void);


-extern void calc_global_load(void);
+extern void calc_global_load(unsigned long ticks);

extern unsigned long get_parent_ip(unsigned long addr);

diff --git a/kernel/sched.c b/kernel/sched.c
index 864040c..7868a18 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2978,6 +2978,15 @@ static long calc_load_fold_active(struct rq *this_rq)
return delta;
}

+static unsigned long
+calc_load(unsigned long load, unsigned long exp, unsigned long active)
+{
+ load *= exp;
+ load += active * (FIXED_1 - exp);
+ load += 1UL << (FSHIFT - 1);
+ return load >> FSHIFT;
+}
+
#ifdef CONFIG_NO_HZ
/*
* For NO_HZ we delay the active fold to the next LOAD_FREQ update.
@@ -3007,6 +3016,105 @@ static long calc_load_fold_idle(void)

return delta;
}
+
+/**
+ * fixed_power_int - compute: x^n, in O(log n) time
+ *
+ * @x: base of the power
+ * @frac_bits: fractional bits of @x
+ * @n: power to raise @x to.
+ *
+ * By exploiting the relation between the definition of the natural power
+ * function: x^n := x*x*...*x (x multiplied by itself for n times), and
+ * the binary encoding of numbers used by computers: n := \Sum n_i * 2^i,
+ * (where: n_i \elem {0, 1}, the binary vector representing n),
+ * we find: x^n := x^(\Sum n_i * 2^i) := \Prod x^(n_i * 2^i), which is
+ * of course trivially computable in O(log_2 n), the length of our binary
+ * vector.
+ */
+static unsigned long
+fixed_power_int(unsigned long x, unsigned int frac_bits, unsigned int n)
+{
+ unsigned long result = 1UL << frac_bits;
+
+ if (n) for (;;) {
+ if (n & 1) {
+ result *= x;
+ result += 1UL << (frac_bits - 1);
+ result >>= frac_bits;
+ }
+ n >>= 1;
+ if (!n)
+ break;
+ x *= x;
+ x += 1UL << (frac_bits - 1);
+ x >>= frac_bits;
+ }
+
+ return result;
+}
+
+/*
+ * a1 = a0 * e + a * (1 - e)
+ *
+ * a2 = a1 * e + a * (1 - e)
+ * = (a0 * e + a * (1 - e)) * e + a * (1 - e)
+ * = a0 * e^2 + a * (1 - e) * (1 + e)
+ *
+ * a3 = a2 * e + a * (1 - e)
+ * = (a0 * e^2 + a * (1 - e) * (1 + e)) * e + a * (1 - e)
+ * = a0 * e^3 + a * (1 - e) * (1 + e + e^2)
+ *
+ * ...
+ *
+ * an = a0 * e^n + a * (1 - e) * (1 + e + ... + e^n-1) [1]
+ * = a0 * e^n + a * (1 - e) * (1 - e^n)/(1 - e)
+ * = a0 * e^n + a * (1 - e^n)
+ *
+ * [1] application of the geometric series:
+ *
+ * n 1 - x^(n+1)
+ * S_n := \Sum x^i = -------------
+ * i=0 1 - x
+ */
+static unsigned long
+calc_load_n(unsigned long load, unsigned long exp,
+ unsigned long active, unsigned int n)
+{
+
+ return calc_load(load, fixed_power_int(exp, FSHIFT, n), active);
+}
+
+static void calc_global_nohz(unsigned long ticks)
+{
+ long delta, active, n;
+
+ if (time_before(jiffies, calc_load_update))
+ return;
+
+ /*
+ * If we crossed a calc_load_update boundary, make sure to fold
+ * any pending idle changes, the respective CPUs might have
+ * missed the tick driven calc_load_account_active() update
+ * due to NO_HZ
+ */
+ delta = calc_load_fold_idle();
+ if (delta)
+ atomic_long_add(delta, &calc_load_tasks);
+
+ if (ticks >= LOAD_FREQ) {
+ n = ticks / LOAD_FREQ;
+
+ active = atomic_long_read(&calc_load_tasks);
+ active = active > 0 ? active * FIXED_1 : 0;
+
+ avenrun[0] = calc_load_n(avenrun[0], EXP_1, active, n);
+ avenrun[1] = calc_load_n(avenrun[1], EXP_5, active, n);
+ avenrun[2] = calc_load_n(avenrun[2], EXP_15, active, n);
+
+ calc_load_update += n * LOAD_FREQ;
+ }
+}
#else
static void calc_load_account_idle(struct rq *this_rq)
{
@@ -3016,6 +3124,10 @@ static inline long calc_load_fold_idle(void)
{
return 0;
}
+
+static void calc_global_nohz(unsigned long ticks)
+{
+}
#endif

/**
@@ -3033,24 +3145,17 @@ void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
loads[2] = (avenrun[2] + offset) << shift;
}

-static unsigned long
-calc_load(unsigned long load, unsigned long exp, unsigned long active)
-{
- load *= exp;
- load += active * (FIXED_1 - exp);
- return load >> FSHIFT;
-}
-
/*
* calc_load - update the avenrun load estimates 10 ticks after the
* CPUs have updated calc_load_tasks.
*/
-void calc_global_load(void)
+void calc_global_load(unsigned long ticks)
{
- unsigned long upd = calc_load_update + 10;
long active;

- if (time_before(jiffies, upd))
+ calc_global_nohz(ticks);
+
+ if (time_before(jiffies, calc_load_update + 10))
return;

active = atomic_long_read(&calc_load_tasks);
diff --git a/kernel/timer.c b/kernel/timer.c
index d6ccb90..9f82b2a 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1297,7 +1297,7 @@ void do_timer(unsigned long ticks)
{
jiffies_64 += ticks;
update_wall_time();
- calc_global_load();
+ calc_global_load(ticks);
}

#ifdef __ARCH_WANT_SYS_ALARM
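
An aside on the rounding term added to calc_load() in the patch above (not
part of the original mail; constants copied from include/linux/sched.h,
starting value and idle length arbitrary): the truncating variant drifts all
the way down to zero, while the rounded variant settles at a small non-zero
floor, roughly 0.04 for the 15-minute figure.

#include <stdio.h>

#define FSHIFT   11
#define FIXED_1  (1 << FSHIFT)
#define EXP_15   2037                  /* 1/exp(5sec/15min) as fixed point */

/* Old behaviour: the final shift simply truncates. */
static unsigned long calc_load_trunc(unsigned long load, unsigned long exp,
                                     unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    return load >> FSHIFT;
}

/* Patched behaviour: add half an LSB first, i.e. round to nearest. */
static unsigned long calc_load_round(unsigned long load, unsigned long exp,
                                     unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    load += 1UL << (FSHIFT - 1);
    return load >> FSHIFT;
}

int main(void)
{
    unsigned long t = FIXED_1, r = FIXED_1;  /* 15-min average at 1.00, then idle */
    int i;

    for (i = 0; i < 1440; i++) {             /* two hours of 5-second samples */
        t = calc_load_trunc(t, EXP_15, 0);
        r = calc_load_round(r, EXP_15, 0);
    }
    printf("truncating: %lu.%02lu\n", t >> FSHIFT,
           ((t & (FIXED_1 - 1)) * 100) >> FSHIFT);
    printf("rounding  : %lu.%02lu\n", r >> FSHIFT,
           ((r & (FIXED_1 - 1)) * 100) >> FSHIFT);
    return 0;
}

Whether the old truncating behaviour (which biases every step slightly
downward) or the rounded one (which can linger just above zero) is
preferable is exactly the trade-off discussed in the follow-ups below.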

2010-11-30 15:39:49

by Kyle McMartin

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

On Tue, Nov 30, 2010 at 03:59:05PM +0100, Peter Zijlstra wrote:
> OK, so here's a less crufty patch that gets the same result on my
> machine, load drops down to 0.00 after a while.
>
> It seems a bit slower to reach 0.00, but that could be because I
> actually changed the load computation for NO_HZ=n as well, I added a
> rounding factor in calc_load(), we no longer truncate the division.
>
> If people want to compare, simply remove the third line from
> calc_load(): load += 1UL << (FSHIFT - 1), to restore the old behaviour.
>

Thanks Peter, I'll build a couple kernels for the bugzilla reporters and
see what they turn up.

--Kyle

> ---
> include/linux/sched.h | 2 +-
> kernel/sched.c | 127 ++++++++++++++++++++++++++++++++++++++++++++----
> kernel/timer.c | 2 +-
> 3 files changed, 118 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index de4d68f..a6e0c62 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -143,7 +143,7 @@ extern unsigned long nr_iowait_cpu(int cpu);
> extern unsigned long this_cpu_load(void);
>
>
> -extern void calc_global_load(void);
> +extern void calc_global_load(unsigned long ticks);
>
> extern unsigned long get_parent_ip(unsigned long addr);
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 864040c..7868a18 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -2978,6 +2978,15 @@ static long calc_load_fold_active(struct rq *this_rq)
> return delta;
> }
>
> +static unsigned long
> +calc_load(unsigned long load, unsigned long exp, unsigned long active)
> +{
> + load *= exp;
> + load += active * (FIXED_1 - exp);
> + load += 1UL << (FSHIFT - 1);
> + return load >> FSHIFT;
> +}
> +
> #ifdef CONFIG_NO_HZ
> /*
> * For NO_HZ we delay the active fold to the next LOAD_FREQ update.
> @@ -3007,6 +3016,105 @@ static long calc_load_fold_idle(void)
>
> return delta;
> }
> +
> +/**
> + * fixed_power_int - compute: x^n, in O(log n) time
> + *
> + * @x: base of the power
> + * @frac_bits: fractional bits of @x
> + * @n: power to raise @x to.
> + *
> + * By exploiting the relation between the definition of the natural power
> + * function: x^n := x*x*...*x (x multiplied by itself for n times), and
> + * the binary encoding of numbers used by computers: n := \Sum n_i * 2^i,
> + * (where: n_i \elem {0, 1}, the binary vector representing n),
> + * we find: x^n := x^(\Sum n_i * 2^i) := \Prod x^(n_i * 2^i), which is
> + * of course trivially computable in O(log_2 n), the length of our binary
> + * vector.
> + */
> +static unsigned long
> +fixed_power_int(unsigned long x, unsigned int frac_bits, unsigned int n)
> +{
> + unsigned long result = 1UL << frac_bits;
> +
> + if (n) for (;;) {
> + if (n & 1) {
> + result *= x;
> + result += 1UL << (frac_bits - 1);
> + result >>= frac_bits;
> + }
> + n >>= 1;
> + if (!n)
> + break;
> + x *= x;
> + x += 1UL << (frac_bits - 1);
> + x >>= frac_bits;
> + }
> +
> + return result;
> +}
> +
> +/*
> + * a1 = a0 * e + a * (1 - e)
> + *
> + * a2 = a1 * e + a * (1 - e)
> + * = (a0 * e + a * (1 - e)) * e + a * (1 - e)
> + * = a0 * e^2 + a * (1 - e) * (1 + e)
> + *
> + * a3 = a2 * e + a * (1 - e)
> + * = (a0 * e^2 + a * (1 - e) * (1 + e)) * e + a * (1 - e)
> + * = a0 * e^3 + a * (1 - e) * (1 + e + e^2)
> + *
> + * ...
> + *
> + * an = a0 * e^n + a * (1 - e) * (1 + e + ... + e^n-1) [1]
> + * = a0 * e^n + a * (1 - e) * (1 - e^n)/(1 - e)
> + * = a0 * e^n + a * (1 - e^n)
> + *
> + * [1] application of the geometric series:
> + *
> + * n 1 - x^(n+1)
> + * S_n := \Sum x^i = -------------
> + * i=0 1 - x
> + */
> +static unsigned long
> +calc_load_n(unsigned long load, unsigned long exp,
> + unsigned long active, unsigned int n)
> +{
> +
> + return calc_load(load, fixed_power_int(exp, FSHIFT, n), active);
> +}
> +
> +static void calc_global_nohz(unsigned long ticks)
> +{
> + long delta, active, n;
> +
> + if (time_before(jiffies, calc_load_update))
> + return;
> +
> + /*
> + * If we crossed a calc_load_update boundary, make sure to fold
> + * any pending idle changes, the respective CPUs might have
> + * missed the tick driven calc_load_account_active() update
> + * due to NO_HZ
> + */
> + delta = calc_load_fold_idle();
> + if (delta)
> + atomic_long_add(delta, &calc_load_tasks);
> +
> + if (ticks >= LOAD_FREQ) {
> + n = ticks / LOAD_FREQ;
> +
> + active = atomic_long_read(&calc_load_tasks);
> + active = active > 0 ? active * FIXED_1 : 0;
> +
> + avenrun[0] = calc_load_n(avenrun[0], EXP_1, active, n);
> + avenrun[1] = calc_load_n(avenrun[1], EXP_5, active, n);
> + avenrun[2] = calc_load_n(avenrun[2], EXP_15, active, n);
> +
> + calc_load_update += n * LOAD_FREQ;
> + }
> +}
> #else
> static void calc_load_account_idle(struct rq *this_rq)
> {
> @@ -3016,6 +3124,10 @@ static inline long calc_load_fold_idle(void)
> {
> return 0;
> }
> +
> +static void calc_global_nohz(unsigned long ticks)
> +{
> +}
> #endif
>
> /**
> @@ -3033,24 +3145,17 @@ void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
> loads[2] = (avenrun[2] + offset) << shift;
> }
>
> -static unsigned long
> -calc_load(unsigned long load, unsigned long exp, unsigned long active)
> -{
> - load *= exp;
> - load += active * (FIXED_1 - exp);
> - return load >> FSHIFT;
> -}
> -
> /*
> * calc_load - update the avenrun load estimates 10 ticks after the
> * CPUs have updated calc_load_tasks.
> */
> -void calc_global_load(void)
> +void calc_global_load(unsigned long ticks)
> {
> - unsigned long upd = calc_load_update + 10;
> long active;
>
> - if (time_before(jiffies, upd))
> + calc_global_nohz(ticks);
> +
> + if (time_before(jiffies, calc_load_update + 10))
> return;
>
> active = atomic_long_read(&calc_load_tasks);
> diff --git a/kernel/timer.c b/kernel/timer.c
> index d6ccb90..9f82b2a 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1297,7 +1297,7 @@ void do_timer(unsigned long ticks)
> {
> jiffies_64 += ticks;
> update_wall_time();
> - calc_global_load();
> + calc_global_load(ticks);
> }
>
> #ifdef __ARCH_WANT_SYS_ALARM
>
>

2010-11-30 16:53:37

by Damien Wyart

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

* Peter Zijlstra <[email protected]> [101130 15:59]:
> OK, so here's a less crufty patch that gets the same result on my
> machine, load drops down to 0.00 after a while.

Seems OK too here.

> It seems a bit slower to reach 0.00, but that could be because
> I actually changed the load computation for NO_HZ=n as well, I added
> a rounding factor in calc_load(), we no longer truncate the division.

Yes, it really feels slower, but in both configurations, as you write.
So this is a matter of convention; I do not know which version is more
"correct" or "natural", but it is consistent between the two modes (HZ
and NO_HZ).

Do you plan to push this one to upstream/stable, or do you plan further
modifications?


Thanks for your patches,
--
Damien Wyart

2010-11-30 17:00:10

by Damien Wyart

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

* Peter Zijlstra <[email protected]> [101129 12:38]:
> > Doesn't give wrong numbers like initial bug and tentative patches, but
> > feels a bit too slow when numbers go up and down. Correct values are
> > reached when waiting long enough, but it feels slow.

> > As I've tested many combinations, maybe this is an impression because
> > I do not remember about "normal" delays for the load to rise and fall,
> > but this still feels slow.

> You can test this by either booting with nohz=off, or building with
> CONFIG_NO_HZ=n and then comparing the result, something like

> make O=defconfig clean; while sleep 10; do uptime >> load.log; done &
> make -j32 O=defconfig; kill %1

> And comparing the curves between the NO_HZ and !NO_HZ kernels.

In fact, timings were very similar between the two configurations, so
I guess my feelings were wrong... They were also similar with the next
version of the patch.

--
Damien Wyart

2010-11-30 17:29:18

by Peter Zijlstra

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

On Tue, 2010-11-30 at 17:53 +0100, Damien Wyart wrote:
> * Peter Zijlstra <[email protected]> [101130 15:59]:
> > OK, so here's a less crufty patch that gets the same result on my
> > machine, load drops down to 0.00 after a while.
>
> Seems OK too here.
>
> > It seems a bit slower to reach 0.00, but that could be because
> > I actually changed the load computation for NO_HZ=n as well, I added
> > a rounding factor in calc_load(), we no longer truncate the division.
>
> Yes, it really feels slower, but in both configurations, as you write.
> So this is a matter of convention; I do not know which version is more
> "correct" or "natural", but it is consistent between the two modes (HZ
> and NO_HZ).
>
> Do you plan to push this one to upstream/stable, or do you plan further
> modifications?

No, this is about it. I've written a few more comments and I'll need to
come up with a changelog; I'll also wait for Kyle's testers to come
back, but yes, then I'll push it upstream/stable.

2010-11-30 20:01:20

by tmhikaru

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

On Tue, Nov 30, 2010 at 12:01:17AM +0100, Peter Zijlstra wrote:
> On Mon, 2010-11-29 at 14:40 -0500, [email protected] wrote:
> > On Mon, Nov 29, 2010 at 12:38:46PM +0100, Peter Zijlstra wrote:
> > > On Sun, 2010-11-28 at 12:40 +0100, Damien Wyart wrote:
> > > > Hi,
> > > >
> > > > * Peter Zijlstra <[email protected]> [2010-11-27 21:15]:
> > > > > How does this work for you? It's hideous but let's start simple.
> > > > > [...]
> > > >
> > > > Doesn't give wrong numbers like initial bug and tentative patches, but
> > > > feels a bit too slow when numbers go up and down. Correct values are
> > > > reached when waiting long enough, but it feels slow.
> > > >
> > > > As I've tested many combinations, maybe this is an impression because
> > > > I do not remember about "normal" delays for the load to rise and fall,
> > > > but this still feels slow.
> > >
> > > You can test this by either booting with nohz=off, or building with
> > > CONFIG_NO_HZ=n and then comparing the result, something like
> > >
> > > make O=defconfig clean; while sleep 10; do uptime >> load.log; done &
> > > make -j32 O=defconfig; kill %1
> > >
> > > And comparing the curves between the NO_HZ and !NO_HZ kernels.
> > >
> > > I'll try and make the patch less hideous ;-)
> >
> > I've tested this patch on my own use case, and it seems to work for the most
> > part - it's still not settling as low as the previous implementation used
> > to, nor is it settling as low as CONFIG_NO_HZ=N (that is to say, 0.00 across
> > the board when not being used) however, this is definitely an improvement:
> >
> > 14:26:04 up 9:08, 5 users, load average: 0.05, 0.01, 0.00
> >
> > This is the result of running uptime on a checked out version of
> > [74f5187ac873042f502227701ed1727e7c5fbfa9] sched: Cure load average vs NO_HZ woes
> >
> > with the patch applied, starting X, and simply letting the machine sit idle
> > for nine hours. For the brief period I spent watching it after boot, it
> > quickly began settling down to a reasonable value, I only let it sit idle
> > this long to verify the loadavg was consistently low. (the loadavg was
> > consistently erratic, anywhere from 0.6 to 1.2 with the machine idle without
> > this patch)
>
> > Ok, that's good testing.. so it's still not quite the same as NO_HZ=n,
> how about this one?
>
> (it seems to drop down to 0.00 if I wait a few minutes with top -d5)


I haven't had time to test your further patches but THIS works!

14:57:03 up 14:01, 4 users, load average: 0.00, 0.00, 0.00

Load finally seems to be accurate on my machine relative to the processes
running and whatever else is using the CPU. This is again testing against
the original commit that caused the problems for me:

[74f5187ac873042f502227701ed1727e7c5fbfa9] sched: Cure load average vs NO_HZ woes

so I know I'm testing apples to apples here.


As time permits I'll test the later replies you made to yourself.

Thank you,
Tim McGrath



2010-11-30 20:04:22

by Kyle McMartin

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

On Tue, Nov 30, 2010 at 10:39:31AM -0500, Kyle McMartin wrote:
> Thanks Peter, I'll build a couple kernels for the bugzilla reporters and
> see what they turn up.
>

Looks good, one tester says things are promising at least.

Thanks so much for all your effort, Peter.

cheers, Kyle

2010-12-01 21:27:46

by tmhikaru

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

On Tue, Nov 30, 2010 at 03:59:05PM +0100, Peter Zijlstra wrote:
> On Tue, 2010-11-30 at 00:01 +0100, Peter Zijlstra wrote:
> >
> > > Ok, that's good testing.. so it's still not quite the same as NO_HZ=n,
> > how about this one?
> >
> > (it seems to drop down to 0.00 if I wait a few minutes with top -d5)
>
> OK, so here's a less crufty patch that gets the same result on my
> machine, load drops down to 0.00 after a while.
>
> It seems a bit slower to reach 0.00, but that could be because I
> actually changed the load computation for NO_HZ=n as well, I added a
> rounding factor in calc_load(), we no longer truncate the division.
>
> If people want to compare, simply remove the third line from
> calc_load(): load += 1UL << (FSHIFT - 1), to restore the old behaviour.

For some bizarre reason, this version has a small but noticeable amount of
jitter and never really seems to hit 0.00 on my machine; it tends to jump
around at low values between 0.03 and 0.08 on a routine basis:

16:20:42 up 16:31, 4 users, load average: 0.00, 0.01, 0.05

the jitter seems to have no visible reason for it happening; with no
networking, disk access or a process waking up and demanding attention from
the cpu, it goes back up.

Mind this is obviously NOT as horrible as it was originally, but I'd like to
find out why it's acting so differently.

I'm going to try this variant again with that line you were talking about
disabled and see if it reacts differently. I get the feeling if it's the
rounding factor - since you say that was changed for BOTH nohz=y and n, that
it's not really a problem in the first place, and likely is very low load
that wasn't being accurately reported before.


Tim McGrath



2010-12-02 10:17:08

by tmhikaru

Subject: Re: High CPU load when machine is idle (related to PROBLEM: Unusually high load average when idle in 2.6.35, 2.6.35.1 and later)

On Wed, Dec 01, 2010 at 04:27:38PM -0500, [email protected] wrote:
> On Tue, Nov 30, 2010 at 03:59:05PM +0100, Peter Zijlstra wrote:
> > On Tue, 2010-11-30 at 00:01 +0100, Peter Zijlstra wrote:
> > >
> > > Ok, that's good testing.. so its still not quite the same as NO_HZ=n,
> > > how about this one?
> > >
> > > (it seems to drop down to 0.00 if I wait a few minutes with top -d5)
> >
> > OK, so here's a less crufty patch that gets the same result on my
> > machine, load drops down to 0.00 after a while.
> >
> > It seems a bit slower to reach 0.00, but that could be because I
> > actually changed the load computation for NO_HZ=n as well, I added a
> > rounding factor in calc_load(), we no longer truncate the division.
> >
> > If people want to compare, simply remove the third line from
> > calc_load(): load += 1UL << (FSHIFT - 1), to restore the old behaviour.
>
> For some bizarre reason, this version has a small but noticeable amount of
> jitter and never really seems to hit 0.00 on my machine; it tends to jump
> around at low values between 0.03 and 0.08 on a routine basis:
>
> 16:20:42 up 16:31, 4 users, load average: 0.00, 0.01, 0.05
>
> the jitter seems to have no visible reason for it happening; with no
> networking, disk access or a process waking up and demanding attention from
> the cpu, it goes back up.
>
> Mind this is obviously NOT as horrible as it was originally, but I'd like to
> find out why it's acting so differently.
>
> I'm going to try this variant again with that line you were talking about
> disabled and see if it reacts differently. I get the feeling if it's the
> rounding factor - since you say that was changed for BOTH nohz=y and n, that
> it's not really a problem in the first place, and likely is very low load
> that wasn't being accurately reported before.

Indeed, this seems to be the case:

04:50:14 up 5:45, 5 users, load average: 0.00, 0.00, 0.00

The average seems not to be jittery, or at least not noticeably, and reacts
as I would have expected it to in the past with that single line disabled.
Since you have said that this change affects all load calculations, I have
not tested how this patch reacts with nohz=n with the line enabled or
disabled; please let me know if you would like me to test that condition
anyway.

Personally, since it changes the previous behavior of the load calculation,
I'd prefer that the rounding not be done.

Tim McGrath



2010-12-08 20:40:55

by Peter Zijlstra

Subject: [tip:sched/urgent] sched: Cure more NO_HZ load average woes

Commit-ID: 0f004f5a696a9434b7214d0d3cbd0525ee77d428
Gitweb: http://git.kernel.org/tip/0f004f5a696a9434b7214d0d3cbd0525ee77d428
Author: Peter Zijlstra <[email protected]>
AuthorDate: Tue, 30 Nov 2010 19:48:45 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 8 Dec 2010 20:15:04 +0100

sched: Cure more NO_HZ load average woes

There's a long-running regression that proved difficult to fix and
which is hitting certain people and is rather annoying in its effects.

Damien reported that after 74f5187ac8 (sched: Cure load average vs
NO_HZ woes) his load average is unnaturally high; he also noted that
even with that patch reverted the load average numbers are not
correct.

The problem is that the previous patch only solved half the NO_HZ
problem: it addressed going into NO_HZ mode, but not coming out of
NO_HZ mode. This patch implements that missing half.

When coming out of NO_HZ mode there are two important things to take
care of:

- Folding the pending idle delta into the global active count.
- Correctly aging the averages for the idle-duration.

So with this patch the NO_HZ interaction should be complete and
behaviour between CONFIG_NO_HZ=[yn] should be equivalent.

Furthermore, this patch slightly changes the load average computation
by adding a rounding term to the fixed point multiplication.

Reported-by: Damien Wyart <[email protected]>
Reported-by: Tim McGrath <[email protected]>
Tested-by: Damien Wyart <[email protected]>
Tested-by: Orion Poplawski <[email protected]>
Tested-by: Kyle McMartin <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: Chase Douglas <[email protected]>
LKML-Reference: <1291129145.32004.874.camel@laptop>
Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/sched.h | 2 +-
kernel/sched.c | 150 +++++++++++++++++++++++++++++++++++++++++++++----
kernel/timer.c | 2 +-
3 files changed, 141 insertions(+), 13 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2c79e92..2238745 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -143,7 +143,7 @@ extern unsigned long nr_iowait_cpu(int cpu);
extern unsigned long this_cpu_load(void);


-extern void calc_global_load(void);
+extern void calc_global_load(unsigned long ticks);

extern unsigned long get_parent_ip(unsigned long addr);

diff --git a/kernel/sched.c b/kernel/sched.c
index dc91a4d..6b7c26a 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3119,6 +3119,15 @@ static long calc_load_fold_active(struct rq *this_rq)
return delta;
}

+static unsigned long
+calc_load(unsigned long load, unsigned long exp, unsigned long active)
+{
+ load *= exp;
+ load += active * (FIXED_1 - exp);
+ load += 1UL << (FSHIFT - 1);
+ return load >> FSHIFT;
+}
+
#ifdef CONFIG_NO_HZ
/*
* For NO_HZ we delay the active fold to the next LOAD_FREQ update.
@@ -3148,6 +3157,128 @@ static long calc_load_fold_idle(void)

return delta;
}
+
+/**
+ * fixed_power_int - compute: x^n, in O(log n) time
+ *
+ * @x: base of the power
+ * @frac_bits: fractional bits of @x
+ * @n: power to raise @x to.
+ *
+ * By exploiting the relation between the definition of the natural power
+ * function: x^n := x*x*...*x (x multiplied by itself for n times), and
+ * the binary encoding of numbers used by computers: n := \Sum n_i * 2^i,
+ * (where: n_i \elem {0, 1}, the binary vector representing n),
+ * we find: x^n := x^(\Sum n_i * 2^i) := \Prod x^(n_i * 2^i), which is
+ * of course trivially computable in O(log_2 n), the length of our binary
+ * vector.
+ */
+static unsigned long
+fixed_power_int(unsigned long x, unsigned int frac_bits, unsigned int n)
+{
+ unsigned long result = 1UL << frac_bits;
+
+ if (n) for (;;) {
+ if (n & 1) {
+ result *= x;
+ result += 1UL << (frac_bits - 1);
+ result >>= frac_bits;
+ }
+ n >>= 1;
+ if (!n)
+ break;
+ x *= x;
+ x += 1UL << (frac_bits - 1);
+ x >>= frac_bits;
+ }
+
+ return result;
+}
+
+/*
+ * a1 = a0 * e + a * (1 - e)
+ *
+ * a2 = a1 * e + a * (1 - e)
+ * = (a0 * e + a * (1 - e)) * e + a * (1 - e)
+ * = a0 * e^2 + a * (1 - e) * (1 + e)
+ *
+ * a3 = a2 * e + a * (1 - e)
+ * = (a0 * e^2 + a * (1 - e) * (1 + e)) * e + a * (1 - e)
+ * = a0 * e^3 + a * (1 - e) * (1 + e + e^2)
+ *
+ * ...
+ *
+ * an = a0 * e^n + a * (1 - e) * (1 + e + ... + e^n-1) [1]
+ * = a0 * e^n + a * (1 - e) * (1 - e^n)/(1 - e)
+ * = a0 * e^n + a * (1 - e^n)
+ *
+ * [1] application of the geometric series:
+ *
+ * n 1 - x^(n+1)
+ * S_n := \Sum x^i = -------------
+ * i=0 1 - x
+ */
+static unsigned long
+calc_load_n(unsigned long load, unsigned long exp,
+ unsigned long active, unsigned int n)
+{
+
+ return calc_load(load, fixed_power_int(exp, FSHIFT, n), active);
+}
+
+/*
+ * NO_HZ can leave us missing all per-cpu ticks calling
+ * calc_load_account_active(), but since an idle CPU folds its delta into
+ * calc_load_tasks_idle per calc_load_account_idle(), all we need to do is fold
+ * in the pending idle delta if our idle period crossed a load cycle boundary.
+ *
+ * Once we've updated the global active value, we need to apply the exponential
+ * weights adjusted to the number of cycles missed.
+ */
+static void calc_global_nohz(unsigned long ticks)
+{
+ long delta, active, n;
+
+ if (time_before(jiffies, calc_load_update))
+ return;
+
+ /*
+ * If we crossed a calc_load_update boundary, make sure to fold
+ * any pending idle changes, the respective CPUs might have
+ * missed the tick driven calc_load_account_active() update
+ * due to NO_HZ.
+ */
+ delta = calc_load_fold_idle();
+ if (delta)
+ atomic_long_add(delta, &calc_load_tasks);
+
+ /*
+ * If we were idle for multiple load cycles, apply them.
+ */
+ if (ticks >= LOAD_FREQ) {
+ n = ticks / LOAD_FREQ;
+
+ active = atomic_long_read(&calc_load_tasks);
+ active = active > 0 ? active * FIXED_1 : 0;
+
+ avenrun[0] = calc_load_n(avenrun[0], EXP_1, active, n);
+ avenrun[1] = calc_load_n(avenrun[1], EXP_5, active, n);
+ avenrun[2] = calc_load_n(avenrun[2], EXP_15, active, n);
+
+ calc_load_update += n * LOAD_FREQ;
+ }
+
+ /*
+ * Its possible the remainder of the above division also crosses
+ * a LOAD_FREQ period, the regular check in calc_global_load()
+ * which comes after this will take care of that.
+ *
+ * Consider us being 11 ticks before a cycle completion, and us
+ * sleeping for 4*LOAD_FREQ + 22 ticks, then the above code will
+ * age us 4 cycles, and the test in calc_global_load() will
+ * pick up the final one.
+ */
+}
#else
static void calc_load_account_idle(struct rq *this_rq)
{
@@ -3157,6 +3288,10 @@ static inline long calc_load_fold_idle(void)
{
return 0;
}
+
+static void calc_global_nohz(unsigned long ticks)
+{
+}
#endif

/**
@@ -3174,24 +3309,17 @@ void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
loads[2] = (avenrun[2] + offset) << shift;
}

-static unsigned long
-calc_load(unsigned long load, unsigned long exp, unsigned long active)
-{
- load *= exp;
- load += active * (FIXED_1 - exp);
- return load >> FSHIFT;
-}
-
/*
* calc_load - update the avenrun load estimates 10 ticks after the
* CPUs have updated calc_load_tasks.
*/
-void calc_global_load(void)
+void calc_global_load(unsigned long ticks)
{
- unsigned long upd = calc_load_update + 10;
long active;

- if (time_before(jiffies, upd))
+ calc_global_nohz(ticks);
+
+ if (time_before(jiffies, calc_load_update + 10))
return;

active = atomic_long_read(&calc_load_tasks);
diff --git a/kernel/timer.c b/kernel/timer.c
index 68a9ae7..7bd715f 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1319,7 +1319,7 @@ void do_timer(unsigned long ticks)
{
jiffies_64 += ticks;
update_wall_time();
- calc_global_load();
+ calc_global_load(ticks);
}

#ifdef __ARCH_WANT_SYS_ALARM
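
A closing aside on the calc_load_n()/fixed_power_int() machinery in the
committed patch (not part of the original mail; the two helpers are copied
from the patch, the starting load and idle length are arbitrary): the closed
form a_n = a0*e^n + a*(1 - e^n) should agree, up to fixed-point rounding,
with applying the one-step update n times.

#include <stdio.h>

#define FSHIFT   11
#define FIXED_1  (1 << FSHIFT)
#define EXP_1    1884                  /* 1/exp(5sec/1min) as fixed point */

/* The rounded calc_load() from the patch. */
static unsigned long calc_load(unsigned long load, unsigned long exp,
                               unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    load += 1UL << (FSHIFT - 1);
    return load >> FSHIFT;
}

/* x^n in fixed point, O(log n), copied from the patch. */
static unsigned long fixed_power_int(unsigned long x, unsigned int frac_bits,
                                     unsigned int n)
{
    unsigned long result = 1UL << frac_bits;

    if (n) for (;;) {
        if (n & 1) {
            result *= x;
            result += 1UL << (frac_bits - 1);
            result >>= frac_bits;
        }
        n >>= 1;
        if (!n)
            break;
        x *= x;
        x += 1UL << (frac_bits - 1);
        x >>= frac_bits;
    }
    return result;
}

/* a_n = a0*e^n + a*(1 - e^n), evaluated with one calc_load() call. */
static unsigned long calc_load_n(unsigned long load, unsigned long exp,
                                 unsigned long active, unsigned int n)
{
    return calc_load(load, fixed_power_int(exp, FSHIFT, n), active);
}

int main(void)
{
    unsigned long iterated = 3 * FIXED_1;  /* 1-min load of 3.00, then idle */
    unsigned int i, n = 36;                /* 36 missed 5s windows = 3 minutes */

    for (i = 0; i < n; i++)
        iterated = calc_load(iterated, EXP_1, 0);

    printf("iterated   : %lu\n", iterated);
    printf("closed form: %lu\n", calc_load_n(3 * FIXED_1, EXP_1, 0, n));
    return 0;
}

The two printed values come out within a count or two of each other (out of
2048ths), which is the point of the O(log n) aging: a CPU waking from a long
NO_HZ sleep can age the averages in a handful of multiplications instead of
looping over every missed window.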