2014-02-24 08:13:08

by Tuukka Tikkanen

[permalink] [raw]
Subject: [PATCH 0/7] Cpuidle: Minor fixes

This set of patches makes some minor changes to menu governor and the poll
idle state.

Patch 1 is simply a rename of a variable to make the name better represent
the contained information.

Patch 2 fixes calculating actual residency in cases where the entered state
is different from the state decided by the menu governor.

Patch 3 makes sure the menu governor timer coefficients are not updated
with values that might cause a coefficient to reach a value greater than
unity.

Patch 4 fixes calculation actual residency in cases where the entered state
does not support measuring residency. In such cases the residency time
is taken from time remaining until next timer expiry. The timer is expected
to go off at the start of exit latency, not after it. Therefore the exit
latency must not be substracted from the assumed value.

Patch 5 moves the performance multiplier based comparison out of the state
selection loop by changing it into a latency requirement. This allows
using a generic state selection function accepting only (duration, latency)
tuple as input. The change is possible by noting that performance multiplier
is used only to check for a minimum predicted idle duration to exit latency
ratio. As predicted idle duration is a constant for the loop, the maximum
allowed latency can be calculated outside of the loop.

Patch 6 prevents using negative values from tick_nohz_get_sleep_length()
in the menu governor. If unchecked, the negative values are used as huge
unsigned values. Negative values occur fairly often (e.g. on x86_64 I've
seen this happen several times per minute) on a busy system, allowing
the deepest state to win the selection while the shallowest should be picked.

Patch 7 adds CPUIDLE_FLAG_TIME_VALID to poll_idle. I do not know of any
platfrom where cpu_relax() would break ktime_get() and in fact poll_idle
uses ktime_get itself.
(Note: poll_idle updates dev->last_residency for some reason. Does it ever
get called without going through cpuidle_enter_state, which will overwrite
the value? Even if some state redirects to this state, the call will
eventually return to the framework. The redundant time measurement could
be removed, unless there is some obscure way of getting called on some
platform that I am unable to figure out.)

Tuukka Tikkanen (7):
Cpuidle: rename expected_us to next_timer_us in menu governor
Cpuidle: Use actual state latency in menu governor
Cpuidle: Ensure menu coefficients stay within domain
Cpuidle: Do not substract exit latency from assumed sleep length
Cpuidle: Move perf multiplier calculation out of the selection loop
Cpuidle: Deal with timer expiring in the past
Cpuidle: poll state can measure residency

drivers/cpuidle/driver.c | 2 +-
drivers/cpuidle/governors/menu.c | 85 ++++++++++++++++++++++++--------------
2 files changed, 54 insertions(+), 33 deletions(-)

--
1.7.9.5


2014-02-24 07:43:55

by Tuukka Tikkanen

[permalink] [raw]
Subject: [PATCH 4/7] Cpuidle: Do not substract exit latency from assumed sleep length

The menu governor statistics update function tries to determine the
amount of time between entry to low power state and the occurrence
of the wakeup event. However, the time measured by the framework
includes exit latency on top of the desired value. This exit latency
is substracted from the measured value to obtain the desired value.

When measured value is not available, the menu governor assumes
the wakeup was caused by the timer and the time is equal to remaining
timer length. No exit latency should be substracted from this value.

This patch prevents the erroneous substraction and clarifies the
associated comment. It also removes one intermediate variable that
serves no purpose.

Signed-off-by: Tuukka Tikkanen <[email protected]>
---
drivers/cpuidle/governors/menu.c | 44 ++++++++++++++++++++++----------------
1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index f0995dd..b347c10 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -387,32 +387,40 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
{
struct menu_device *data = &__get_cpu_var(menu_devices);
int last_idx = data->last_state_idx;
- unsigned int last_idle_us = cpuidle_get_last_residency(dev);
struct cpuidle_state *target = &drv->states[last_idx];
unsigned int measured_us;
unsigned int new_factor;

/*
- * Ugh, this idle state doesn't support residency measurements, so we
- * are basically lost in the dark. As a compromise, assume we slept
- * for the whole expected time.
+ * Try to figure out how much time passed between entry to low
+ * power state and occurrence of the wakeup event.
+ *
+ * If the entered idle state didn't support residency measurements,
+ * we are basically lost in the dark how much time passed.
+ * As a compromise, assume we slept for the whole expected time.
+ *
+ * Any measured amount of time will include the exit latency.
+ * Since we are interested in when the wakeup begun, not when it
+ * was completed, we must substract the exit latency. However, if
+ * the measured amount of time is less than the exit latency,
+ * assume the state was never reached and the exit latency is 0.
*/
- if (unlikely(!(target->flags & CPUIDLE_FLAG_TIME_VALID)))
- last_idle_us = data->next_timer_us;
-
+ if (unlikely(!(target->flags & CPUIDLE_FLAG_TIME_VALID))) {
+ /* Use timer value as is */
+ measured_us = data->next_timer_us;

- measured_us = last_idle_us;
+ } else {
+ /* Use measured value */
+ measured_us = cpuidle_get_last_residency(dev);

- /*
- * We correct for the exit latency; we are assuming here that the
- * exit latency happens after the event that we're interested in.
- */
- if (measured_us > target->exit_latency)
- measured_us -= target->exit_latency;
+ /* Deduct exit latency */
+ if (measured_us > target->exit_latency)
+ measured_us -= target->exit_latency;

- /* Make sure our coefficients do not exceed unity */
- if (measured_us > data->next_timer_us)
- measured_us = data->next_timer_us;
+ /* Make sure our coefficients do not exceed unity */
+ if (measured_us > data->next_timer_us)
+ measured_us = data->next_timer_us;
+ }

/* Update our correction ratio */
new_factor = data->correction_factor[data->bucket];
@@ -439,7 +447,7 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
data->correction_factor[data->bucket] = new_factor;

/* update the repeating-pattern data */
- data->intervals[data->interval_ptr++] = last_idle_us;
+ data->intervals[data->interval_ptr++] = measured_us;
if (data->interval_ptr >= INTERVALS)
data->interval_ptr = 0;
}
--
1.7.9.5

2014-02-24 07:44:07

by Tuukka Tikkanen

[permalink] [raw]
Subject: [PATCH 6/7] Cpuidle: Deal with timer expiring in the past

Sometimes (fairly often) when the cpuidle menu governor is making a decision
about idle state to enter the next timer for the cpu appears to expire in
the past. The menu governor expects the expiry to always be in the future
and in fact stores the time delta in an unsigned variable. However, when
the expiry is in the past, the value returned by tick_nohz_get_sleep_length
can be negative. This patch prevents using negative values, instead making
the governor return immediately similar to having latency requirement set
to 0.

Note: As with latency == 0, the return value is 0 with no check to see if
the state 0 has been disabled or not.

Signed-off-by: Tuukka Tikkanen <[email protected]>
---
drivers/cpuidle/governors/menu.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 71b5232..c414468 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -302,8 +302,16 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
if (unlikely(latency_req == 0))
return 0;

- /* determine the expected residency time, round up */
+ /*
+ * Determine the expected residency time. If the time is negative,
+ * a timer interrupt has probably just expired after disabling
+ * interrupts. Return as quickly as possible in the most shallow
+ * state possible. tv_nsec is always positive, so only check the
+ * seconds.
+ */
t = ktime_to_timespec(tick_nohz_get_sleep_length());
+ if (t.tv_sec < 0)
+ return 0;
data->next_timer_us =
t.tv_sec * USEC_PER_SEC + t.tv_nsec / NSEC_PER_USEC;

--
1.7.9.5

2014-02-24 07:45:52

by Tuukka Tikkanen

[permalink] [raw]
Subject: [PATCH 1/7] Cpuidle: rename expected_us to next_timer_us in menu governor

The field expected_us is used to store the time remaining until next
timer expiry. The name is inaccurate, as we really do not expect all
wakeups to be caused by timers. In addition, another field with a very
similar name (predicted_us) is used to store the predicted time
remaining until any wakeup source being active.

This patch renames expected_us to next_timer_us in order to better
reflect the contained information.

Signed-off-by: Tuukka Tikkanen <[email protected]>
---
drivers/cpuidle/governors/menu.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index cf7f2f0..e9a2a27 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -122,7 +122,7 @@ struct menu_device {
int last_state_idx;
int needs_update;

- unsigned int expected_us;
+ unsigned int next_timer_us;
unsigned int predicted_us;
unsigned int exit_us;
unsigned int bucket;
@@ -257,7 +257,7 @@ again:
stddev = int_sqrt(stddev);
if (((avg > stddev * 6) && (divisor * 4 >= INTERVALS * 3))
|| stddev <= 20) {
- if (data->expected_us > avg)
+ if (data->next_timer_us > avg)
data->predicted_us = avg;
return;
}
@@ -306,11 +306,11 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)

/* determine the expected residency time, round up */
t = ktime_to_timespec(tick_nohz_get_sleep_length());
- data->expected_us =
+ data->next_timer_us =
t.tv_sec * USEC_PER_SEC + t.tv_nsec / NSEC_PER_USEC;


- data->bucket = which_bucket(data->expected_us);
+ data->bucket = which_bucket(data->next_timer_us);

multiplier = performance_multiplier();

@@ -326,7 +326,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
* operands are 32 bits.
* Make sure to round up for half microseconds.
*/
- data->predicted_us = div_round64((uint64_t)data->expected_us *
+ data->predicted_us = div_round64((uint64_t)data->next_timer_us *
data->correction_factor[data->bucket],
RESOLUTION * DECAY);

@@ -336,7 +336,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
* We want to default to C1 (hlt), not to busy polling
* unless the timer is happening really really soon.
*/
- if (data->expected_us > 5 &&
+ if (data->next_timer_us > 5 &&
!drv->states[CPUIDLE_DRIVER_STATE_START].disabled &&
dev->states_usage[CPUIDLE_DRIVER_STATE_START].disable == 0)
data->last_state_idx = CPUIDLE_DRIVER_STATE_START;
@@ -401,7 +401,7 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
* for the whole expected time.
*/
if (unlikely(!(target->flags & CPUIDLE_FLAG_TIME_VALID)))
- last_idle_us = data->expected_us;
+ last_idle_us = data->next_timer_us;


measured_us = last_idle_us;
@@ -418,8 +418,8 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
new_factor = data->correction_factor[data->bucket];
new_factor -= new_factor / DECAY;

- if (data->expected_us > 0 && measured_us < MAX_INTERESTING)
- new_factor += RESOLUTION * measured_us / data->expected_us;
+ if (data->next_timer_us > 0 && measured_us < MAX_INTERESTING)
+ new_factor += RESOLUTION * measured_us / data->next_timer_us;
else
/*
* we were idle so long that we count it as a perfect
--
1.7.9.5

2014-02-24 07:48:01

by Tuukka Tikkanen

[permalink] [raw]
Subject: [PATCH 2/7] Cpuidle: Use actual state latency in menu governor

Currently menu governor records the exit latency of the state it has
chosen for the idle period. The stored latency value is then later
used to calculate the actual length of the idle period. This value
may however be incorrect, as the entered state may not be the one
chosen by the governor. The entered state information is available,
so we can use that to obtain the real exit latency.

Signed-off-by: Tuukka Tikkanen <[email protected]>
---
drivers/cpuidle/governors/menu.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index e9a2a27..115386a 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -124,7 +124,6 @@ struct menu_device {

unsigned int next_timer_us;
unsigned int predicted_us;
- unsigned int exit_us;
unsigned int bucket;
unsigned int correction_factor[BUCKETS];
unsigned int intervals[INTERVALS];
@@ -298,7 +297,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
}

data->last_state_idx = 0;
- data->exit_us = 0;

/* Special case when user has set very strict latency requirement */
if (unlikely(latency_req == 0))
@@ -359,7 +357,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
continue;

data->last_state_idx = i;
- data->exit_us = s->exit_latency;
}

return data->last_state_idx;
@@ -410,8 +407,8 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
* We correct for the exit latency; we are assuming here that the
* exit latency happens after the event that we're interested in.
*/
- if (measured_us > data->exit_us)
- measured_us -= data->exit_us;
+ if (measured_us > target->exit_latency)
+ measured_us -= target->exit_latency;


/* Update our correction ratio */
--
1.7.9.5

2014-02-24 08:13:11

by Tuukka Tikkanen

[permalink] [raw]
Subject: [PATCH 5/7] Cpuidle: Move perf multiplier calculation out of the selection loop

The menu governor performance multiplier defines a minimum predicted
idle duration to latency ratio. Instead of checking this separately
in every iteration of the state selection loop, adjust the overall
latency restriction for the whole loop if this restriction is tighter
than what is set by the QoS subsystem.

The original test
s->exit_latency * multiplier > data->predicted_us
becomes
s->exit_latency > data->predicted_us / multiplier
by dividing both sides of the comparison by "multiplier".

While division is likely to be several times slower than multiplication,
the minor performance hit allows making a generic sleep state selection
function based on (sleep duration, maximum latency) tuple.

Signed-off-by: Tuukka Tikkanen <[email protected]>
---
drivers/cpuidle/governors/menu.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index b347c10..71b5232 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -288,7 +288,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
struct menu_device *data = &__get_cpu_var(menu_devices);
int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
int i;
- int multiplier;
+ unsigned int interactivity_req;
struct timespec t;

if (data->needs_update) {
@@ -310,8 +310,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)

data->bucket = which_bucket(data->next_timer_us);

- multiplier = performance_multiplier();
-
/*
* if the correction factor is 0 (eg first time init or cpu hotplug
* etc), we actually want to start out with a unity factor.
@@ -331,6 +329,15 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
get_typical_interval(data);

/*
+ * Performance multiplier defines a minimum predicted idle
+ * duration / latency ratio. Adjust the latency limit if
+ * necessary.
+ */
+ interactivity_req = data->predicted_us / performance_multiplier();
+ if (latency_req > interactivity_req)
+ latency_req = interactivity_req;
+
+ /*
* We want to default to C1 (hlt), not to busy polling
* unless the timer is happening really really soon.
*/
@@ -353,8 +360,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
continue;
if (s->exit_latency > latency_req)
continue;
- if (s->exit_latency * multiplier > data->predicted_us)
- continue;

data->last_state_idx = i;
}
--
1.7.9.5

2014-02-24 08:13:10

by Tuukka Tikkanen

[permalink] [raw]
Subject: [PATCH 3/7] Cpuidle: Ensure menu coefficients stay within domain

The menu governor uses coefficients as one method of actual idle
period length estimation. The coefficients are, as detailed below,
multipliers giving expected idle period length from time until next
timer expiry. The multipliers are supposed to have domain of (0..1].

The coefficients are fractions where only the numerators are stored
and denominators are a shared constant RESOLUTION*DECAY. Since the
value of the coefficient should always be greater than 0 and less
than or equal to 1, the numerator must have a value greater than
0 and less than or equal to RESOLUTION*DECAY.

If the coefficients are updated with measured idle durations exceeding
timer length, the multiplier may reach values exceeding unity (i.e.
the stored numerator exceeds RESOLUTION*DECAY). This patch ensures that
the multipliers are updated with durations capped to timer length.

Signed-off-by: Tuukka Tikkanen <[email protected]>
---
drivers/cpuidle/governors/menu.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 115386a..f0995dd 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -410,6 +410,9 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
if (measured_us > target->exit_latency)
measured_us -= target->exit_latency;

+ /* Make sure our coefficients do not exceed unity */
+ if (measured_us > data->next_timer_us)
+ measured_us = data->next_timer_us;

/* Update our correction ratio */
new_factor = data->correction_factor[data->bucket];
--
1.7.9.5

2014-02-24 08:13:07

by Tuukka Tikkanen

[permalink] [raw]
Subject: [PATCH 7/7] Cpuidle: poll state can measure residency

For some platforms, a poll state is inserted in the cpuidle driver states.
The flags for the state do not indicate that timekeeping is not affected.
As the state does not do anything apart from calling cpu_relax(), the
times returned by ktime_get should remain valid. Add the missing flag.

Signed-off-by: Tuukka Tikkanen <[email protected]>
---
drivers/cpuidle/driver.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index 06dbe7c..136d6a2 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -209,7 +209,7 @@ static void poll_idle_init(struct cpuidle_driver *drv)
state->exit_latency = 0;
state->target_residency = 0;
state->power_usage = -1;
- state->flags = 0;
+ state->flags = CPUIDLE_FLAG_TIME_VALID;
state->enter = poll_idle;
state->disabled = false;
}
--
1.7.9.5

2014-02-24 16:52:29

by Nicolas Pitre

[permalink] [raw]
Subject: Re: [PATCH 1/7] Cpuidle: rename expected_us to next_timer_us in menu governor

On Mon, 24 Feb 2014, Tuukka Tikkanen wrote:

> The field expected_us is used to store the time remaining until next
> timer expiry. The name is inaccurate, as we really do not expect all
> wakeups to be caused by timers. In addition, another field with a very
> similar name (predicted_us) is used to store the predicted time
> remaining until any wakeup source being active.
>
> This patch renames expected_us to next_timer_us in order to better
> reflect the contained information.
>
> Signed-off-by: Tuukka Tikkanen <[email protected]>

Acked-by: Nicolas Pitre <[email protected]>

> ---
> drivers/cpuidle/governors/menu.c | 18 +++++++++---------
> 1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index cf7f2f0..e9a2a27 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -122,7 +122,7 @@ struct menu_device {
> int last_state_idx;
> int needs_update;
>
> - unsigned int expected_us;
> + unsigned int next_timer_us;
> unsigned int predicted_us;
> unsigned int exit_us;
> unsigned int bucket;
> @@ -257,7 +257,7 @@ again:
> stddev = int_sqrt(stddev);
> if (((avg > stddev * 6) && (divisor * 4 >= INTERVALS * 3))
> || stddev <= 20) {
> - if (data->expected_us > avg)
> + if (data->next_timer_us > avg)
> data->predicted_us = avg;
> return;
> }
> @@ -306,11 +306,11 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>
> /* determine the expected residency time, round up */
> t = ktime_to_timespec(tick_nohz_get_sleep_length());
> - data->expected_us =
> + data->next_timer_us =
> t.tv_sec * USEC_PER_SEC + t.tv_nsec / NSEC_PER_USEC;
>
>
> - data->bucket = which_bucket(data->expected_us);
> + data->bucket = which_bucket(data->next_timer_us);
>
> multiplier = performance_multiplier();
>
> @@ -326,7 +326,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> * operands are 32 bits.
> * Make sure to round up for half microseconds.
> */
> - data->predicted_us = div_round64((uint64_t)data->expected_us *
> + data->predicted_us = div_round64((uint64_t)data->next_timer_us *
> data->correction_factor[data->bucket],
> RESOLUTION * DECAY);
>
> @@ -336,7 +336,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> * We want to default to C1 (hlt), not to busy polling
> * unless the timer is happening really really soon.
> */
> - if (data->expected_us > 5 &&
> + if (data->next_timer_us > 5 &&
> !drv->states[CPUIDLE_DRIVER_STATE_START].disabled &&
> dev->states_usage[CPUIDLE_DRIVER_STATE_START].disable == 0)
> data->last_state_idx = CPUIDLE_DRIVER_STATE_START;
> @@ -401,7 +401,7 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> * for the whole expected time.
> */
> if (unlikely(!(target->flags & CPUIDLE_FLAG_TIME_VALID)))
> - last_idle_us = data->expected_us;
> + last_idle_us = data->next_timer_us;
>
>
> measured_us = last_idle_us;
> @@ -418,8 +418,8 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> new_factor = data->correction_factor[data->bucket];
> new_factor -= new_factor / DECAY;
>
> - if (data->expected_us > 0 && measured_us < MAX_INTERESTING)
> - new_factor += RESOLUTION * measured_us / data->expected_us;
> + if (data->next_timer_us > 0 && measured_us < MAX_INTERESTING)
> + new_factor += RESOLUTION * measured_us / data->next_timer_us;
> else
> /*
> * we were idle so long that we count it as a perfect
> --
> 1.7.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2014-02-24 16:57:18

by Nicolas Pitre

[permalink] [raw]
Subject: Re: [PATCH 3/7] Cpuidle: Ensure menu coefficients stay within domain

On Mon, 24 Feb 2014, Tuukka Tikkanen wrote:

> The menu governor uses coefficients as one method of actual idle
> period length estimation. The coefficients are, as detailed below,
> multipliers giving expected idle period length from time until next
> timer expiry. The multipliers are supposed to have domain of (0..1].
>
> The coefficients are fractions where only the numerators are stored
> and denominators are a shared constant RESOLUTION*DECAY. Since the
> value of the coefficient should always be greater than 0 and less
> than or equal to 1, the numerator must have a value greater than
> 0 and less than or equal to RESOLUTION*DECAY.
>
> If the coefficients are updated with measured idle durations exceeding
> timer length, the multiplier may reach values exceeding unity (i.e.
> the stored numerator exceeds RESOLUTION*DECAY). This patch ensures that
> the multipliers are updated with durations capped to timer length.
>
> Signed-off-by: Tuukka Tikkanen <[email protected]>
Acked-by: Nicolas Pitre <[email protected]>


> ---
> drivers/cpuidle/governors/menu.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 115386a..f0995dd 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -410,6 +410,9 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> if (measured_us > target->exit_latency)
> measured_us -= target->exit_latency;
>
> + /* Make sure our coefficients do not exceed unity */
> + if (measured_us > data->next_timer_us)
> + measured_us = data->next_timer_us;
>
> /* Update our correction ratio */
> new_factor = data->correction_factor[data->bucket];
> --
> 1.7.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2014-02-24 17:05:43

by Nicolas Pitre

[permalink] [raw]
Subject: Re: [PATCH 6/7] Cpuidle: Deal with timer expiring in the past

On Mon, 24 Feb 2014, Tuukka Tikkanen wrote:

> Sometimes (fairly often) when the cpuidle menu governor is making a decision
> about idle state to enter the next timer for the cpu appears to expire in
> the past. The menu governor expects the expiry to always be in the future
> and in fact stores the time delta in an unsigned variable. However, when
> the expiry is in the past, the value returned by tick_nohz_get_sleep_length
> can be negative. This patch prevents using negative values, instead making
> the governor return immediately similar to having latency requirement set
> to 0.
>
> Note: As with latency == 0, the return value is 0 with no check to see if
> the state 0 has been disabled or not.

In your cover letter you mention some occurrences of the negative result
being observed on x86. That information is worth capturing in the
commit log as well to distinguish between theoretical problems from
actual observations.

>
> Signed-off-by: Tuukka Tikkanen <[email protected]>
> ---
> drivers/cpuidle/governors/menu.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 71b5232..c414468 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -302,8 +302,16 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> if (unlikely(latency_req == 0))
> return 0;
>
> - /* determine the expected residency time, round up */
> + /*
> + * Determine the expected residency time. If the time is negative,
> + * a timer interrupt has probably just expired after disabling
> + * interrupts. Return as quickly as possible in the most shallow
> + * state possible. tv_nsec is always positive, so only check the
> + * seconds.
> + */
> t = ktime_to_timespec(tick_nohz_get_sleep_length());
> + if (t.tv_sec < 0)
> + return 0;
> data->next_timer_us =
> t.tv_sec * USEC_PER_SEC + t.tv_nsec / NSEC_PER_USEC;
>
> --
> 1.7.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2014-02-24 21:24:44

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH 0/7] Cpuidle: Minor fixes

On 02/24/2014 07:29 AM, Tuukka Tikkanen wrote:
> This set of patches makes some minor changes to menu governor and the poll
> idle state.
>
> Patch 1 is simply a rename of a variable to make the name better represent
> the contained information.
>
> Patch 2 fixes calculating actual residency in cases where the entered state
> is different from the state decided by the menu governor.
>
> Patch 3 makes sure the menu governor timer coefficients are not updated
> with values that might cause a coefficient to reach a value greater than
> unity.
>
> Patch 4 fixes calculation actual residency in cases where the entered state
> does not support measuring residency. In such cases the residency time
> is taken from time remaining until next timer expiry. The timer is expected
> to go off at the start of exit latency, not after it. Therefore the exit
> latency must not be substracted from the assumed value.
>
> Patch 5 moves the performance multiplier based comparison out of the state
> selection loop by changing it into a latency requirement. This allows
> using a generic state selection function accepting only (duration, latency)
> tuple as input. The change is possible by noting that performance multiplier
> is used only to check for a minimum predicted idle duration to exit latency
> ratio. As predicted idle duration is a constant for the loop, the maximum
> allowed latency can be calculated outside of the loop.
>
> Patch 6 prevents using negative values from tick_nohz_get_sleep_length()
> in the menu governor. If unchecked, the negative values are used as huge
> unsigned values. Negative values occur fairly often (e.g. on x86_64 I've
> seen this happen several times per minute) on a busy system, allowing
> the deepest state to win the selection while the shallowest should be picked.
>
> Patch 7 adds CPUIDLE_FLAG_TIME_VALID to poll_idle. I do not know of any
> platfrom where cpu_relax() would break ktime_get() and in fact poll_idle
> uses ktime_get itself.
> (Note: poll_idle updates dev->last_residency for some reason. Does it ever
> get called without going through cpuidle_enter_state, which will overwrite
> the value? Even if some state redirects to this state, the call will
> eventually return to the framework. The redundant time measurement could
> be removed, unless there is some obscure way of getting called on some
> platform that I am unable to figure out.)
>
> Tuukka Tikkanen (7):
> Cpuidle: rename expected_us to next_timer_us in menu governor
> Cpuidle: Use actual state latency in menu governor
> Cpuidle: Ensure menu coefficients stay within domain
> Cpuidle: Do not substract exit latency from assumed sleep length
> Cpuidle: Move perf multiplier calculation out of the selection loop
> Cpuidle: Deal with timer expiring in the past
> Cpuidle: poll state can measure residency
>
> drivers/cpuidle/driver.c | 2 +-
> drivers/cpuidle/governors/menu.c | 85 ++++++++++++++++++++++++--------------
> 2 files changed, 54 insertions(+), 33 deletions(-)

You should put Arjan van de Ven in Cc for the menu governor changes.

Thanks
-- Daniel


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2014-02-24 22:35:42

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH 2/7] Cpuidle: Use actual state latency in menu governor

On 02/24/2014 07:29 AM, Tuukka Tikkanen wrote:
> Currently menu governor records the exit latency of the state it has
> chosen for the idle period. The stored latency value is then later
> used to calculate the actual length of the idle period. This value
> may however be incorrect, as the entered state may not be the one
> chosen by the governor. The entered state information is available,
> so we can use that to obtain the real exit latency.
>
> Signed-off-by: Tuukka Tikkanen <[email protected]>

Acked-by: Daniel Lezcano <[email protected]>

> ---
> drivers/cpuidle/governors/menu.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index e9a2a27..115386a 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -124,7 +124,6 @@ struct menu_device {
>
> unsigned int next_timer_us;
> unsigned int predicted_us;
> - unsigned int exit_us;
> unsigned int bucket;
> unsigned int correction_factor[BUCKETS];
> unsigned int intervals[INTERVALS];
> @@ -298,7 +297,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> }
>
> data->last_state_idx = 0;
> - data->exit_us = 0;
>
> /* Special case when user has set very strict latency requirement */
> if (unlikely(latency_req == 0))
> @@ -359,7 +357,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> continue;
>
> data->last_state_idx = i;
> - data->exit_us = s->exit_latency;
> }
>
> return data->last_state_idx;
> @@ -410,8 +407,8 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> * We correct for the exit latency; we are assuming here that the
> * exit latency happens after the event that we're interested in.
> */
> - if (measured_us > data->exit_us)
> - measured_us -= data->exit_us;
> + if (measured_us > target->exit_latency)
> + measured_us -= target->exit_latency;
>
>
> /* Update our correction ratio */
>


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2014-02-25 00:05:33

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 6/7] Cpuidle: Deal with timer expiring in the past

On Monday, February 24, 2014 12:05:39 PM Nicolas Pitre wrote:
> On Mon, 24 Feb 2014, Tuukka Tikkanen wrote:
>
> > Sometimes (fairly often) when the cpuidle menu governor is making a decision
> > about idle state to enter the next timer for the cpu appears to expire in
> > the past. The menu governor expects the expiry to always be in the future
> > and in fact stores the time delta in an unsigned variable. However, when
> > the expiry is in the past, the value returned by tick_nohz_get_sleep_length
> > can be negative. This patch prevents using negative values, instead making
> > the governor return immediately similar to having latency requirement set
> > to 0.
> >
> > Note: As with latency == 0, the return value is 0 with no check to see if
> > the state 0 has been disabled or not.
>
> In your cover letter you mention some occurrences of the negative result
> being observed on x86. That information is worth capturing in the
> commit log as well to distinguish between theoretical problems from
> actual observations.

In my opinion that applies to the other patches in the series too. In all
cases it would be good to know whether or not the problems addressed are
theoretical or they can be reproduced in practice (and if so, then how), or
they already have been reported by somebody.

In the two latter cases the patches would be good candidates for -stable,
in particular.

Thanks,
Rafael

2014-02-25 10:56:42

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH 7/7] Cpuidle: poll state can measure residency

On 02/24/2014 07:29 AM, Tuukka Tikkanen wrote:
> For some platforms, a poll state is inserted in the cpuidle driver states.
> The flags for the state do not indicate that timekeeping is not affected.
> As the state does not do anything apart from calling cpu_relax(), the
> times returned by ktime_get should remain valid. Add the missing flag.

Yes, but it is done with the interrupt enabled, so the interrupt handler
+ softirq handler times will be accounted in the residency time.

I am not sure adding this flag is good.

> Signed-off-by: Tuukka Tikkanen <[email protected]>
> ---
> drivers/cpuidle/driver.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
> index 06dbe7c..136d6a2 100644
> --- a/drivers/cpuidle/driver.c
> +++ b/drivers/cpuidle/driver.c
> @@ -209,7 +209,7 @@ static void poll_idle_init(struct cpuidle_driver *drv)
> state->exit_latency = 0;
> state->target_residency = 0;
> state->power_usage = -1;
> - state->flags = 0;
> + state->flags = CPUIDLE_FLAG_TIME_VALID;
> state->enter = poll_idle;
> state->disabled = false;
> }
>


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2014-02-25 15:40:11

by Tuukka Tikkanen

[permalink] [raw]
Subject: Re: [PATCH 7/7] Cpuidle: poll state can measure residency

On 25 February 2014 12:56, Daniel Lezcano <[email protected]> wrote:
> On 02/24/2014 07:29 AM, Tuukka Tikkanen wrote:
>>
>> For some platforms, a poll state is inserted in the cpuidle driver states.
>> The flags for the state do not indicate that timekeeping is not affected.
>> As the state does not do anything apart from calling cpu_relax(), the
>> times returned by ktime_get should remain valid. Add the missing flag.
>
>
> Yes, but it is done with the interrupt enabled, so the interrupt handler +
> softirq handler times will be accounted in the residency time.
>
> I am not sure adding this flag is good.

Granted, with a slow CPU and very complex interrupt handing there
might be some extra time added. So let's consider what might happen:

Menu: Not having this flag assumes the residency matches time until
next timer expiry. Any measured amount of time less than that
indicates that the residency was shorter. We might not know exactly
how much, but we are closer to the truth and everything is fine. So
that leaves reporting time that exceeds what was established as the
time until next timer expiry. That too is OK (assuming another patch
in the series) as the value is limited to the time until next timer
expiry. In short, we get the same or better outcome and have no
issues.

Ladder: Last residency is assumed to be
last_state->threshold.promotion_time + 1 if the flag is not set.
Obviously we won't be considering demotion in that case and will go
into the promotion path. Any measured value below that is again closer
to the truth and any value at or above that is simply going to result
in the same outcome.

The only remaining issue is any hypothetical governor not currently in
the kernel tree. It would have to depend on accurate measurements
while being able to work with states that do not have the valid time
flag. I'm not sure if I can picture a scenario like that.

Tuukka

>
>> Signed-off-by: Tuukka Tikkanen <[email protected]>
>> ---
>> drivers/cpuidle/driver.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
>> index 06dbe7c..136d6a2 100644
>> --- a/drivers/cpuidle/driver.c
>> +++ b/drivers/cpuidle/driver.c
>> @@ -209,7 +209,7 @@ static void poll_idle_init(struct cpuidle_driver *drv)
>> state->exit_latency = 0;
>> state->target_residency = 0;
>> state->power_usage = -1;
>> - state->flags = 0;
>> + state->flags = CPUIDLE_FLAG_TIME_VALID;
>> state->enter = poll_idle;
>> state->disabled = false;
>> }
>>
>
>
> --
> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>
> Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-02-26 00:31:28

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 0/7] Cpuidle: Minor fixes

On Monday, February 24, 2014 08:29:30 AM Tuukka Tikkanen wrote:
> This set of patches makes some minor changes to menu governor and the poll
> idle state.
>
> Patch 1 is simply a rename of a variable to make the name better represent
> the contained information.
>
> Patch 2 fixes calculating actual residency in cases where the entered state
> is different from the state decided by the menu governor.
>
> Patch 3 makes sure the menu governor timer coefficients are not updated
> with values that might cause a coefficient to reach a value greater than
> unity.
>
> Patch 4 fixes calculation actual residency in cases where the entered state
> does not support measuring residency. In such cases the residency time
> is taken from time remaining until next timer expiry. The timer is expected
> to go off at the start of exit latency, not after it. Therefore the exit
> latency must not be substracted from the assumed value.
>
> Patch 5 moves the performance multiplier based comparison out of the state
> selection loop by changing it into a latency requirement. This allows
> using a generic state selection function accepting only (duration, latency)
> tuple as input. The change is possible by noting that performance multiplier
> is used only to check for a minimum predicted idle duration to exit latency
> ratio. As predicted idle duration is a constant for the loop, the maximum
> allowed latency can be calculated outside of the loop.
>
> Patch 6 prevents using negative values from tick_nohz_get_sleep_length()
> in the menu governor. If unchecked, the negative values are used as huge
> unsigned values. Negative values occur fairly often (e.g. on x86_64 I've
> seen this happen several times per minute) on a busy system, allowing
> the deepest state to win the selection while the shallowest should be picked.
>
> Patch 7 adds CPUIDLE_FLAG_TIME_VALID to poll_idle. I do not know of any
> platfrom where cpu_relax() would break ktime_get() and in fact poll_idle
> uses ktime_get itself.
> (Note: poll_idle updates dev->last_residency for some reason. Does it ever
> get called without going through cpuidle_enter_state, which will overwrite
> the value? Even if some state redirects to this state, the call will
> eventually return to the framework. The redundant time measurement could
> be removed, unless there is some obscure way of getting called on some
> platform that I am unable to figure out.)
>
> Tuukka Tikkanen (7):
> Cpuidle: rename expected_us to next_timer_us in menu governor
> Cpuidle: Use actual state latency in menu governor
> Cpuidle: Ensure menu coefficients stay within domain
> Cpuidle: Do not substract exit latency from assumed sleep length
> Cpuidle: Move perf multiplier calculation out of the selection loop
> Cpuidle: Deal with timer expiring in the past
> Cpuidle: poll state can measure residency
>
> drivers/cpuidle/driver.c | 2 +-
> drivers/cpuidle/governors/menu.c | 85 ++++++++++++++++++++++++--------------
> 2 files changed, 54 insertions(+), 33 deletions(-)

I'm queuing this series up for 3.15.

It looks like at least some of these patches should go into 3.14 and -stable,
but for that I'd need some references like links to bug reports or documented
reproducibility (on what hardware, what it takes to reproduce etc.).

Thanks!

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.