LinuxLists.cc - [PATCH 0/3] cpuidle: small improvements & fixes for menu governor (resend)

2015-11-03 22:34:29

Subject: [PATCH 0/3] cpuidle: small improvements & fixes for menu governor (resend)

While working on a paravirt cpuidle driver for KVM guests, I
noticed a number of small logic errors in the menu governor
code.

These patches should get rid of some artifacts that can break
the logic in the menu governor under certain corner cases, and
make idle state selection work better on CPUs with long C1 exit
latencies.

I have not seen any adverse effects with them in my (quick)
tests. As expected, they do not seem to do much on systems with
many power states and very low C1 exit latencies and target residencies.

2015-11-03 22:34:25

by Rik van Riel

Subject: [PATCH 1/3] cpuidle,x86: increase forced cut-off for polling to 20us

From: Rik van Riel <[email protected]>

The cpuidle menu governor has a forced cut-off for polling at 5us,
in order to deal with firmware that gives the OS bad information
on cpuidle states, leading to the system spending way too much time
in polling.

However, at least one x86 CPU family (Atom) has chips that have
a 20us break-even point for C1. Forcing the polling cut-off to
less than that wastes performance and power.

Increase the polling cut-off to 20us.

Systems with a lower C1 latency will be found in the states table by
the menu governor, which will pick those states as appropriate.
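
A minimal C sketch of the decision this patch touches. It assumes a simplified two-state model (index 0 = busy polling, index 1 = C1) and keeps only the "C1 disabled" check. `pick_default_state`, `POLL_IDX` and `C1_IDX` are hypothetical names, not kernel API. The real `menu_select()` also goes on to consider deeper states after choosing this default.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state indices for this sketch: 0 = busy polling, 1 = C1 (hlt). */
enum { POLL_IDX = 0, C1_IDX = 1 };

/*
 * Simplified model of the forced polling cut-off in menu_select():
 * default to C1 unless the next timer fires within cutoff_us,
 * or C1 has been disabled.
 */
int pick_default_state(unsigned int next_timer_us, unsigned int cutoff_us,
                       bool c1_disabled)
{
    if (next_timer_us > cutoff_us && !c1_disabled)
        return C1_IDX;
    return POLL_IDX;
}
```

With a timer 10us away, the old 5us cut-off defaults to C1, while a 20us cut-off keeps polling. On a CPU whose C1 only breaks even at 20us, entering C1 for a 10us sleep costs more than it saves.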

Signed-off-by: Rik van Riel <[email protected]>
---
drivers/cpuidle/governors/menu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 22e4463d1787..ecc242a586c9 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -330,7 +330,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
* We want to default to C1 (hlt), not to busy polling
* unless the timer is happening really really soon.
*/
- if (data->next_timer_us > 5 &&
+ if (data->next_timer_us > 20 &&
!drv->states[CPUIDLE_DRIVER_STATE_START].disabled &&
dev->states_usage[CPUIDLE_DRIVER_STATE_START].disable == 0)
data->last_state_idx = CPUIDLE_DRIVER_STATE_START;
--
2.1.0

2015-11-03 22:34:27

by Rik van Riel

Subject: [PATCH 2/3] cpuidle,menu: use interactivity_req to disable polling

From: Rik van Riel <[email protected]>

The menu governor carefully figures out how much time we typically
sleep for an estimated sleep interval, or whether there is a repeating
pattern going on, and corrects that estimate for the CPU load.

Then it proceeds to ignore that information when determining whether
or not to consider polling. This is not a big deal on most x86 CPUs,
which have very low C1 latencies, and the patch should not have any
effect on those CPUs.

However, certain CPUs (e.g. Atom) have much higher C1 latencies, and
it would be good to not waste performance and power on those CPUs if
we are expecting a very low wakeup latency.

Disable polling based on the estimated interactivity requirement, not
on the time to the next timer interrupt.
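
A small C sketch of the difference between the two inputs, under simplifying assumptions. The load correction is reduced to a single hypothetical divisor (`load_multiplier`), standing in for the governor's own performance multiplier. `estimate_interactivity_req`, `skip_polling_old` and `skip_polling_new` are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Polling cut-off after PATCH 1/3. */
#define POLL_CUTOFF_US 20u

/*
 * Simplified model, not the kernel code: the predicted sleep length is
 * scaled down by one load-dependent divisor (hypothetical parameter).
 */
unsigned int estimate_interactivity_req(unsigned int predicted_us,
                                        unsigned int load_multiplier)
{
    assert(load_multiplier > 0);
    return predicted_us / load_multiplier;
}

/* Old rule: only the distance to the next timer decides. */
bool skip_polling_old(unsigned int next_timer_us)
{
    return next_timer_us > POLL_CUTOFF_US;
}

/* New rule: the load-corrected sleep estimate decides. */
bool skip_polling_new(unsigned int interactivity_req)
{
    return interactivity_req > POLL_CUTOFF_US;
}
```

Consider a hypothetical case where the next timer is 4000us away but a repeating pattern predicts a 30us sleep and the multiplier is 2. The old rule skips polling because the timer is far off. The new rule sees an interactivity requirement of 15us and keeps polling as an option.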

Signed-off-by: Rik van Riel <[email protected]>
---
drivers/cpuidle/governors/menu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index ecc242a586c9..b1a55731f921 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -330,7 +330,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
* We want to default to C1 (hlt), not to busy polling
* unless the timer is happening really really soon.
*/
- if (data->next_timer_us > 20 &&
+ if (interactivity_req > 20 &&
!drv->states[CPUIDLE_DRIVER_STATE_START].disabled &&
dev->states_usage[CPUIDLE_DRIVER_STATE_START].disable == 0)
data->last_state_idx = CPUIDLE_DRIVER_STATE_START;
--
2.1.0

2015-11-03 22:34:26

by Rik van Riel

Subject: [PATCH 3/3] cpuidle,menu: smooth out measured_us calculation

From: Rik van Riel <[email protected]>

The cpuidle state tables contain the maximum exit latency for each
cpuidle state. On x86, that is the exit latency for when the entire
package goes into that same idle state.

However, a lot of the time we only go into the core idle state,
not the package idle state. This means we see a much smaller exit
latency.

We have no way to detect whether we went into the core or package
idle state while idle, and that is ok.

However, the current menu_update logic does have the potential to
trip up the repeating pattern detection in get_typical_interval.
If the system is experiencing an exit latency near the idle state's
exit latency, some of the samples will have exit_us subtracted,
while others will not. This turns a repeating pattern into mush,
potentially breaking get_typical_interval.
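
A C sketch that makes the "mush" concrete, using hypothetical numbers: an exit latency of 20us and raw samples hovering between 19us and 21us. `deduct_old` mirrors the pre-patch deduction in `menu_update()`. `corrected_spread` is a hypothetical helper whose max-minus-min spread is only a crude stand-in for the statistics `get_typical_interval()` actually computes.

```c
#include <assert.h>
#include <stddef.h>

/* Deduction in menu_update() before PATCH 3/3. */
unsigned int deduct_old(unsigned int measured_us, unsigned int exit_latency_us)
{
    if (measured_us > exit_latency_us)
        measured_us -= exit_latency_us;
    return measured_us;
}

/* Max minus min of the corrected samples (illustrative spread measure). */
unsigned int corrected_spread(const unsigned int *samples, size_t n,
                              unsigned int exit_latency_us)
{
    assert(n > 0);
    unsigned int lo = deduct_old(samples[0], exit_latency_us);
    unsigned int hi = lo;

    for (size_t i = 1; i < n; i++) {
        unsigned int v = deduct_old(samples[i], exit_latency_us);
        if (v < lo)
            lo = v;
        if (v > hi)
            hi = v;
    }
    return hi - lo;
}
```

For the samples {19, 21, 19, 21, 20, 21, 19, 20}, the raw spread is only 2us. With a 20us exit latency, however, the corrected values become a mix of 1us and 19-20us, a spread of 19us. That is the kind of bimodal input that can hide an otherwise regular pattern.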

Furthermore, for smaller sleep intervals, we know the chance that
all the cores in the package went to the same idle state is fairly
small. Dividing the measured_us by two, instead of subtracting the
full exit latency when hitting a small measured_us, will reduce the
error.
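
The patched rule can be sketched as a standalone C function. `deduct_new` mirrors the lines added to `menu_update()`, and the 20us exit latency in the checks is a hypothetical value. At the switch point, `measured_us == 2 * exit_latency`, halving and subtracting both give `exit_latency`. The corrected value therefore grows steadily with the raw measurement, instead of jumping from about `exit_latency` down to near zero as it did under the old rule.

```c
#include <assert.h>

/* Deduction in menu_update() after PATCH 3/3. */
unsigned int deduct_new(unsigned int measured_us, unsigned int exit_latency_us)
{
    if (measured_us > 2 * exit_latency_us)
        measured_us -= exit_latency_us; /* long sleep: subtract full exit latency */
    else
        measured_us /= 2;               /* short sleep: halve instead */
    return measured_us;
}
```

For raw samples of 19us and 21us with a 20us exit latency, this yields 9us and 10us. Small raw variation thus stays small after the correction.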

Signed-off-by: Rik van Riel <[email protected]>
---
drivers/cpuidle/governors/menu.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index b1a55731f921..7b0971d97cc3 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -404,8 +404,10 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
measured_us = cpuidle_get_last_residency(dev);

/* Deduct exit latency */
- if (measured_us > target->exit_latency)
+ if (measured_us > 2 * target->exit_latency)
measured_us -= target->exit_latency;
+ else
+ measured_us /= 2;

/* Make sure our coefficients do not exceed unity */
if (measured_us > data->next_timer_us)
--
2.1.0

2015-11-04 16:01:25

by Arjan van de Ven

Subject: Re: [PATCH 1/3] cpuidle,x86: increase forced cut-off for polling to 20us

Acked-by: Arjan van de Ven <[email protected]>

2015-11-04 16:01:52

by Arjan van de Ven

Subject: Re: [PATCH 2/3] cpuidle,menu: use interactivity_req to disable polling

On 11/3/2015 2:34 PM, [email protected] wrote:
> From: Rik van Riel <[email protected]>
>
> The menu governor carefully figures out how much time we typically
> sleep for an estimated sleep interval, or whether there is a repeating
> pattern going on, and corrects that estimate for the CPU load.
>
> Then it proceeds to ignore that information when determining whether
> or not to consider polling. This is not a big deal on most x86 CPUs,
> which have very low C1 latencies, and the patch should not have any
> effect on those CPUs.
>
> However, certain CPUs (eg. Atom) have much higher C1 latencies, and
> it would be good to not waste performance and power on those CPUs if
> we are expecting a very low wakeup latency.
>
> Disable polling based on the estimated interactivity requirement, not
> on the time to the next timer interrupt.
>
good catch!

Acked-by: Arjan van de Ven <[email protected]>

2015-11-04 16:02:56

by Arjan van de Ven

Subject: Re: [PATCH 3/3] cpuidle,menu: smooth out measured_us calculation

On 11/3/2015 2:34 PM, [email protected] wrote:

> Furthermore, for smaller sleep intervals, we know the chance that
> all the cores in the package went to the same idle state are fairly
> small. Dividing the measured_us by two, instead of subtracting the
> full exit latency when hitting a small measured_us, will reduce the
> error.

There is no perfect answer for this issue, but at least this makes the situation
a lot better, so

Acked-by: Arjan van de Ven <[email protected]>

2015-11-05 22:05:20

by Rafael J. Wysocki

Subject: Re: [PATCH 0/3] cpuidle: small improvements & fixes for menu governor (resend)

On Tuesday, November 03, 2015 05:34:16 PM [email protected] wrote:
> While working on a paravirt cpuidle driver for KVM guests, I
> noticed a number of small logic errors in the menu governor
> code.
>
> These patches should get rid of some artifacts that can break
> the logic in the menu governor under certain corner cases, and
> make idle state selection work better on CPUs with long C1 exit
> latencies.
>
> I have not seen any adverse effects with them in my (quick)
> tests. As expected, they do not seem to do much on systems with
> many power states and very low C1 exit latencies and target residencies.

Thanks!

The patches look good to me.

I might apply [1-2/3] right away, but I'm a bit hesitant about the [3/3] (I'd
like it to spend some time in linux-next before it goes to Linus). Also, we've
lived without these changes for quite some time and I don't want to stretch the
process too much, so I'll queue them up for v4.5 if that's not a problem.

Thanks,
Rafael

2015-11-06 02:26:52

by Rik van Riel

Subject: Re: [PATCH 0/3] cpuidle: small improvements & fixes for menu governor (resend)

On 11/05/2015 05:34 PM, Rafael J. Wysocki wrote:
> On Tuesday, November 03, 2015 05:34:16 PM [email protected] wrote:
>> While working on a paravirt cpuidle driver for KVM guests, I
>> noticed a number of small logic errors in the menu governor
>> code.
>>
>> These patches should get rid of some artifacts that can break
>> the logic in the menu governor under certain corner cases, and
>> make idle state selection work better on CPUs with long C1 exit
>> latencies.
>>
>> I have not seen any adverse effects with them in my (quick)
>> tests. As expected, they do not seem to do much on systems with
>> many power states and very low C1 exit latencies and target residencies.
>
> Thanks!
>
> The patches look good to me.
>
> I might apply [1-2/3] right away, but I'm a bit hesitant about the [3/3] (I'd
> like it to spend some time in linux-next before it goes to Linus). Also, we've
> lived without these changes for quite some time and I don't want to stretch the
> process too much, so I'll queue them up for v4.5 if that's not a problem.

Not a problem at all. I am all for taking these changes carefully,
and seeing what happens.

I did some basic testing with it, but the permutations of what
can happen with cpuidle management are just too many to predict
in advance everything that could happen.

--
All rights reversed