2014-10-23 09:01:29

by Daniel Lezcano

Subject: [PATCH V2 1/5] sched: idle: cpuidle: Check the latency req before idle

When the PM QoS latency requirement is set to zero, that means "poll in all
cases".

That is correctly implemented on x86 but not on the other archs.

As the code is written, if the latency request is zero, the governor returns
zero. On x86 that index corresponds to the poll function, but on the other
archs it corresponds to the default idle function. For example, on ARM this is
wait-for-interrupt with a latency of '1', which violates the constraint.

In order to fix that, do the latency requirement check *before* calling the
cpuidle framework in order to jump to the poll function without entering
cpuidle. That has several benefits:

1. It clarifies and unifies the code
2. It fixes x86 vs other archs behavior
3. It factors out the call to the same function
4. It avoids entering the cpuidle framework, with its expensive
calculations

As latency_req is needed in all cases, change the select API to take
latency_req as a parameter for the case where it is not equal to zero.

As a positive side effect, the latency constraint is now specified
externally, which is one more step toward cpuidle/scheduler integration.
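
A minimal sketch of the idea, outside the kernel and with made-up names
(toy_state, toy_select, toy_idle_decision and TOY_POLL are all hypothetical,
not kernel identifiers). It assumes states are ordered from shallowest to
deepest and that a governor falls back to index 0 when nothing fits, which is
how a zero request ends up in a state with a non-zero latency:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an idle state; only the exit latency matters here. */
struct toy_state {
	unsigned int exit_latency_us;
};

#define TOY_POLL (-1)	/* hypothetical marker: "spin in the poll loop" */

/*
 * Toy governor: return the deepest state whose exit latency fits the
 * request, falling back to state 0 when none fits.
 */
static int toy_select(const struct toy_state *states, size_t count,
		      unsigned int latency_req_us)
{
	int chosen = 0;
	size_t i;

	assert(states != NULL && count > 0);

	for (i = 1; i < count; i++)
		if (states[i].exit_latency_us <= latency_req_us)
			chosen = (int)i;

	return chosen;
}

/*
 * Check the latency request before asking the governor at all, so a zero
 * request always polls regardless of what state 0 happens to be.
 */
static int toy_idle_decision(const struct toy_state *states, size_t count,
			     unsigned int latency_req_us, bool force_poll)
{
	if (latency_req_us == 0 || force_poll)
		return TOY_POLL;

	return toy_select(states, count, latency_req_us);
}
```

With a two-entry table whose state 0 has an exit latency of 1, calling
toy_select() directly with a zero request still returns index 0, while
toy_idle_decision() returns TOY_POLL without consulting the governor.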

Signed-off-by: Daniel Lezcano <[email protected]>
Acked-by: Nicolas Pitre <[email protected]>
---
drivers/cpuidle/cpuidle.c | 5 +++--
drivers/cpuidle/governors/ladder.c | 9 +--------
drivers/cpuidle/governors/menu.c | 8 ++------
include/linux/cpuidle.h | 7 ++++---
kernel/sched/idle.c | 18 ++++++++++++++----
5 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index ee9df5e..372c36f 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -158,7 +158,8 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
*
* Returns the index of the idle state.
*/
-int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
+int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
+ int latency_req)
{
if (off || !initialized)
return -ENODEV;
@@ -169,7 +170,7 @@ int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
if (unlikely(use_deepest_state))
return cpuidle_find_deepest_state(drv, dev);

- return cpuidle_curr_governor->select(drv, dev);
+ return cpuidle_curr_governor->select(drv, dev, latency_req);
}

/**
diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
index 044ee0d..18f0da9 100644
--- a/drivers/cpuidle/governors/ladder.c
+++ b/drivers/cpuidle/governors/ladder.c
@@ -64,18 +64,11 @@ static inline void ladder_do_selection(struct ladder_device *ldev,
* @dev: the CPU
*/
static int ladder_select_state(struct cpuidle_driver *drv,
- struct cpuidle_device *dev)
+ struct cpuidle_device *dev, int latency_req)
{
struct ladder_device *ldev = &__get_cpu_var(ladder_devices);
struct ladder_device_state *last_state;
int last_residency, last_idx = ldev->last_state_idx;
- int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
-
- /* Special case when user has set very strict latency requirement */
- if (unlikely(latency_req == 0)) {
- ladder_do_selection(ldev, last_idx, 0);
- return 0;
- }

last_state = &ldev->states[last_idx];

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 34db2fb..96f8fb0 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -287,10 +287,10 @@ again:
* @drv: cpuidle driver containing state data
* @dev: the CPU
*/
-static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
+static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
+ int latency_req)
{
struct menu_device *data = &__get_cpu_var(menu_devices);
- int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
int i;
unsigned int interactivity_req;
unsigned long nr_iowaiters, cpu_load;
@@ -302,10 +302,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)

data->last_state_idx = CPUIDLE_DRIVER_STATE_START - 1;

- /* Special case when user has set very strict latency requirement */
- if (unlikely(latency_req == 0))
- return 0;
-
/* determine the expected residency time, round up */
data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length());

diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 25e0df6..fb465c1 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -122,7 +122,7 @@ struct cpuidle_driver {
extern void disable_cpuidle(void);

extern int cpuidle_select(struct cpuidle_driver *drv,
- struct cpuidle_device *dev);
+ struct cpuidle_device *dev, int latency_req);
extern int cpuidle_enter(struct cpuidle_driver *drv,
struct cpuidle_device *dev, int index);
extern void cpuidle_reflect(struct cpuidle_device *dev, int index);
@@ -150,7 +150,7 @@ extern struct cpuidle_driver *cpuidle_get_cpu_driver(struct cpuidle_device *dev)
#else
static inline void disable_cpuidle(void) { }
static inline int cpuidle_select(struct cpuidle_driver *drv,
- struct cpuidle_device *dev)
+ struct cpuidle_device *dev, int latency_req)
{return -ENODEV; }
static inline int cpuidle_enter(struct cpuidle_driver *drv,
struct cpuidle_device *dev, int index)
@@ -205,7 +205,8 @@ struct cpuidle_governor {
struct cpuidle_device *dev);

int (*select) (struct cpuidle_driver *drv,
- struct cpuidle_device *dev);
+ struct cpuidle_device *dev,
+ int latency_req);
void (*reflect) (struct cpuidle_device *dev, int index);

struct module *owner;
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 11e7bc4..25ba94d 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -5,6 +5,7 @@
#include <linux/cpu.h>
#include <linux/cpuidle.h>
#include <linux/tick.h>
+#include <linux/pm_qos.h>
#include <linux/mm.h>
#include <linux/stackprotector.h>

@@ -74,7 +75,7 @@ void __weak arch_cpu_idle(void)
* set, and it returns with polling set. If it ever stops polling, it
* must clear the polling bit.
*/
-static void cpuidle_idle_call(void)
+static void cpuidle_idle_call(unsigned int latency_req)
{
struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
@@ -107,7 +108,7 @@ static void cpuidle_idle_call(void)
* Ask the cpuidle framework to choose a convenient idle state.
* Fall back to the default arch idle method on errors.
*/
- next_state = cpuidle_select(drv, dev);
+ next_state = cpuidle_select(drv, dev, latency_req);
if (next_state < 0) {
use_default:
/*
@@ -182,6 +183,8 @@ exit_idle:
*/
static void cpu_idle_loop(void)
{
+ unsigned int latency_req;
+
while (1) {
/*
* If the arch has a polling bit, we maintain an invariant:
@@ -205,19 +208,26 @@ static void cpu_idle_loop(void)
local_irq_disable();
arch_cpu_idle_enter();

+ latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
+
/*
* In poll mode we reenable interrupts and spin.
*
+ * If the latency req is zero, we don't want to
+ * enter any idle state and we jump to the poll
+ * function directly
+ *
* Also if we detected in the wakeup from idle
* path that the tick broadcast device expired
* for us, we don't want to go deep idle as we
* know that the IPI is going to arrive right
* away
*/
- if (cpu_idle_force_poll || tick_check_broadcast_expired())
+ if (!latency_req || cpu_idle_force_poll ||
+ tick_check_broadcast_expired())
cpu_idle_poll();
else
- cpuidle_idle_call();
+ cpuidle_idle_call(latency_req);

arch_cpu_idle_exit();
}
--
1.9.1


2014-10-23 09:01:40

by Daniel Lezcano

Subject: [PATCH V2 3/5] cpuidle: idle: menu: Don't reflect when a state selection failed

In the current code, whether or not to reflect the resulting state is decided
by checking the value of the idle state that was chosen.

Instead of doing a check in each of the reflect functions, just don't call
reflect if something went wrong in the idle path.
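
As an illustration only, the shape of the change can be sketched with a toy
governor. The names toy_governor, toy_reflect and toy_after_idle are
hypothetical, and the structure is a simplification rather than the ladder or
menu data:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-CPU governor data, reduced to what the sketch needs. */
struct toy_governor {
	int last_state_idx;
	int reflect_calls;
};

/* The reflect hook no longer validates its argument itself. */
static void toy_reflect(struct toy_governor *gov, int index)
{
	assert(index >= 0);	/* the caller guarantees a valid state */
	gov->last_state_idx = index;
	gov->reflect_calls++;
}

/* Single check in the caller: skip reflect when entering idle failed. */
static void toy_after_idle(struct toy_governor *gov, int entered_state)
{
	if (entered_state >= 0)
		toy_reflect(gov, entered_state);
}
```

A negative entered_state (for example -EBUSY) leaves the governor data
untouched; a valid index is recorded as before.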

Signed-off-by: Daniel Lezcano <[email protected]>
Acked-by: Nicolas Pitre <[email protected]>
---
drivers/cpuidle/governors/ladder.c | 3 +--
drivers/cpuidle/governors/menu.c | 4 +---
kernel/sched/idle.c | 3 ++-
3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
index fb396d6..c0b36a8 100644
--- a/drivers/cpuidle/governors/ladder.c
+++ b/drivers/cpuidle/governors/ladder.c
@@ -165,8 +165,7 @@ static int ladder_enable_device(struct cpuidle_driver *drv,
static void ladder_reflect(struct cpuidle_device *dev, int index)
{
struct ladder_device *ldev = &__get_cpu_var(ladder_devices);
- if (index > 0)
- ldev->last_state_idx = index;
+ ldev->last_state_idx = index;
}

static struct cpuidle_governor ladder_governor = {
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index a17515f..3907301 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -365,9 +365,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
static void menu_reflect(struct cpuidle_device *dev, int index)
{
struct menu_device *data = &__get_cpu_var(menu_devices);
- data->last_state_idx = index;
- if (index >= 0)
- data->needs_update = 1;
+ data->needs_update = 1;
}

/**
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 58c7522..49dcc7d 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -162,7 +162,8 @@ use_default:
/*
* Give the governor an opportunity to reflect on the outcome
*/
- cpuidle_reflect(dev, entered_state);
+ if (entered_state >= 0)
+ cpuidle_reflect(dev, entered_state);

exit_idle:
__current_set_polling();
--
1.9.1

2014-10-23 09:01:39

by Daniel Lezcano

Subject: [PATCH V2 4/5] cpuidle: menu: Fix the get_typical_interval

The first time 'get_typical_interval' is called, it computes an average
of zero as no data is filled yet. That leads the 'data->predicted_us' variable
to be set to zero too.

The caller, 'menu_select' will then do:

	interactivity_req = data->predicted_us /
		performance_multiplier(nr_iowaiters, cpu_load);

That sets the interactivity_req to zero (0/performance...).

and then

	if (latency_req > interactivity_req)
		latency_req = interactivity_req;

... setting 'latency_req' to zero too.

No idle state will fulfill this constraint, so we will go to the C1 state by
default, which leads to an update. So the next calls will compute a non-zero
average.

That works with the current code, although with broken semantics, but it will
break with the next patches, where we are stricter with the latency check:
the first check will fail (latency_req is zero), then no update will occur,
leading to always falling to choose an idle state.

As there are no previous values, it is pointless to compute a standard
deviation for these nonexistent values. Just return without setting
'data->predicted_us' to zero.
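
The arithmetic can be checked with a small standalone sketch. Both functions
are hypothetical simplifications: clamp_latency_req mirrors the two quoted
statements with the multiplier passed in as a plain number, and
refine_prediction keeps only the averaging step and the early return,
ignoring the standard deviation test entirely:

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the clamp done in the select path, with plain unsigned values. */
static unsigned int clamp_latency_req(unsigned int latency_req,
				      unsigned int predicted_us,
				      unsigned int multiplier)
{
	unsigned int interactivity_req;

	assert(multiplier > 0);

	interactivity_req = predicted_us / multiplier;
	if (latency_req > interactivity_req)
		latency_req = interactivity_req;

	return latency_req;
}

/*
 * Simplified stand-in for the interval averaging: with an all-zero
 * history the previous prediction is kept instead of being overwritten
 * with zero.
 */
static unsigned int refine_prediction(unsigned int predicted_us,
				      const unsigned int *intervals,
				      size_t count)
{
	unsigned long long sum = 0;
	size_t i;

	assert(intervals != NULL && count > 0);

	for (i = 0; i < count; i++)
		sum += intervals[i];

	if (sum / count == 0)
		return predicted_us;	/* no data yet: keep the prediction */

	return (unsigned int)(sum / count);
}
```

A prediction of zero clamps any latency request to zero, whatever the
multiplier is. Keeping the earlier prediction when the history is empty
leaves the request non-zero.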

Signed-off-by: Daniel Lezcano <[email protected]>
---
drivers/cpuidle/governors/menu.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 3907301..6ae8390 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -226,6 +226,15 @@ again:
else
do_div(avg, divisor);

+ /*
+ * We are at the very beginning and no data have been filled
+ * yet. Let's skip the standard deviation computation
+ * otherwise the data->predicted_us will be zero and that will
+ * lead to a zero latency req in the select function
+ */
+ if (!avg)
+ return;
+
/* Then try to determine standard deviation */
stddev = 0;
for (i = 0; i < INTERVALS; i++) {
--
1.9.1

2014-10-23 09:02:26

by Daniel Lezcano

Subject: [PATCH V2 5/5] cpuidle: menu: Move the update function before its declaration

In order to avoid a pointless forward declaration, just move the function
to the beginning of the file.

This patch does not change the behavior of the governor; it is just code
reordering.

Signed-off-by: Daniel Lezcano <[email protected]>
---
drivers/cpuidle/governors/menu.c | 149 +++++++++++++++++++--------------------
1 file changed, 74 insertions(+), 75 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 6ae8390..0ac76b1 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -184,7 +184,6 @@ static inline int performance_multiplier(unsigned long nr_iowaiters, unsigned lo

static DEFINE_PER_CPU(struct menu_device, menu_devices);

-static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev);

/* This implements DIV_ROUND_CLOSEST but avoids 64 bit division */
static u64 div_round64(u64 dividend, u32 divisor)
@@ -192,6 +191,80 @@ static u64 div_round64(u64 dividend, u32 divisor)
return div_u64(dividend + (divisor / 2), divisor);
}

+/**
+ * menu_update - attempts to guess what happened after entry
+ * @drv: cpuidle driver containing state data
+ * @dev: the CPU
+ */
+static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
+{
+ struct menu_device *data = &__get_cpu_var(menu_devices);
+ int last_idx = data->last_state_idx;
+ struct cpuidle_state *target = &drv->states[last_idx];
+ unsigned int measured_us;
+ unsigned int new_factor;
+
+ /*
+ * Try to figure out how much time passed between entry to low
+ * power state and occurrence of the wakeup event.
+ *
+ * If the entered idle state didn't support residency measurements,
+ * we are basically lost in the dark how much time passed.
+ * As a compromise, assume we slept for the whole expected time.
+ *
+ * Any measured amount of time will include the exit latency.
+ * Since we are interested in when the wakeup begun, not when it
+ * was completed, we must subtract the exit latency. However, if
+ * the measured amount of time is less than the exit latency,
+ * assume the state was never reached and the exit latency is 0.
+ */
+ if (unlikely(!(target->flags & CPUIDLE_FLAG_TIME_VALID))) {
+ /* Use timer value as is */
+ measured_us = data->next_timer_us;
+
+ } else {
+ /* Use measured value */
+ measured_us = cpuidle_get_last_residency(dev);
+
+ /* Deduct exit latency */
+ if (measured_us > target->exit_latency)
+ measured_us -= target->exit_latency;
+
+ /* Make sure our coefficients do not exceed unity */
+ if (measured_us > data->next_timer_us)
+ measured_us = data->next_timer_us;
+ }
+
+ /* Update our correction ratio */
+ new_factor = data->correction_factor[data->bucket];
+ new_factor -= new_factor / DECAY;
+
+ if (data->next_timer_us > 0 && measured_us < MAX_INTERESTING)
+ new_factor += RESOLUTION * measured_us / data->next_timer_us;
+ else
+ /*
+ * we were idle so long that we count it as a perfect
+ * prediction
+ */
+ new_factor += RESOLUTION;
+
+ /*
+ * We don't want 0 as factor; we always want at least
+ * a tiny bit of estimated time. Fortunately, due to rounding,
+ * new_factor will stay nonzero regardless of measured_us values
+ * and the compiler can eliminate this test as long as DECAY > 1.
+ */
+ if (DECAY == 1 && unlikely(new_factor == 0))
+ new_factor = 1;
+
+ data->correction_factor[data->bucket] = new_factor;
+
+ /* update the repeating-pattern data */
+ data->intervals[data->interval_ptr++] = measured_us;
+ if (data->interval_ptr >= INTERVALS)
+ data->interval_ptr = 0;
+}
+
/*
* Try detecting repeating patterns by keeping track of the last 8
* intervals, and checking if the standard deviation of that set
@@ -378,80 +451,6 @@ static void menu_reflect(struct cpuidle_device *dev, int index)
}

/**
- * menu_update - attempts to guess what happened after entry
- * @drv: cpuidle driver containing state data
- * @dev: the CPU
- */
-static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
-{
- struct menu_device *data = &__get_cpu_var(menu_devices);
- int last_idx = data->last_state_idx;
- struct cpuidle_state *target = &drv->states[last_idx];
- unsigned int measured_us;
- unsigned int new_factor;
-
- /*
- * Try to figure out how much time passed between entry to low
- * power state and occurrence of the wakeup event.
- *
- * If the entered idle state didn't support residency measurements,
- * we are basically lost in the dark how much time passed.
- * As a compromise, assume we slept for the whole expected time.
- *
- * Any measured amount of time will include the exit latency.
- * Since we are interested in when the wakeup begun, not when it
- * was completed, we must subtract the exit latency. However, if
- * the measured amount of time is less than the exit latency,
- * assume the state was never reached and the exit latency is 0.
- */
- if (unlikely(!(target->flags & CPUIDLE_FLAG_TIME_VALID))) {
- /* Use timer value as is */
- measured_us = data->next_timer_us;
-
- } else {
- /* Use measured value */
- measured_us = cpuidle_get_last_residency(dev);
-
- /* Deduct exit latency */
- if (measured_us > target->exit_latency)
- measured_us -= target->exit_latency;
-
- /* Make sure our coefficients do not exceed unity */
- if (measured_us > data->next_timer_us)
- measured_us = data->next_timer_us;
- }
-
- /* Update our correction ratio */
- new_factor = data->correction_factor[data->bucket];
- new_factor -= new_factor / DECAY;
-
- if (data->next_timer_us > 0 && measured_us < MAX_INTERESTING)
- new_factor += RESOLUTION * measured_us / data->next_timer_us;
- else
- /*
- * we were idle so long that we count it as a perfect
- * prediction
- */
- new_factor += RESOLUTION;
-
- /*
- * We don't want 0 as factor; we always want at least
- * a tiny bit of estimated time. Fortunately, due to rounding,
- * new_factor will stay nonzero regardless of measured_us values
- * and the compiler can eliminate this test as long as DECAY > 1.
- */
- if (DECAY == 1 && unlikely(new_factor == 0))
- new_factor = 1;
-
- data->correction_factor[data->bucket] = new_factor;
-
- /* update the repeating-pattern data */
- data->intervals[data->interval_ptr++] = measured_us;
- if (data->interval_ptr >= INTERVALS)
- data->interval_ptr = 0;
-}
-
-/**
* menu_enable_device - scans a CPU's states and does setup
* @drv: cpuidle driver
* @dev: the CPU
--
1.9.1

2014-10-23 09:02:55

by Daniel Lezcano

Subject: [PATCH V2 2/5] sched: idle: Get the next timer event and pass it to the cpuidle framework

Following the logic of the previous patch, retrieve from the idle task the
expected timer sleep duration and pass it to the cpuidle framework.

Take the opportunity to remove the unused headers in the menu.c file.

This patch does not change the current behavior.
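
To illustrate what a selection routine looks like once both inputs come from
the caller, here is a standalone sketch. The names timed_state and
select_with_inputs are hypothetical, the selection rule is deliberately naive
(it is not the menu or ladder algorithm), and states are assumed to be
ordered from shallowest to deepest:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical idle state description, values in microseconds. */
struct timed_state {
	unsigned int exit_latency_us;
	unsigned int target_residency_us;
};

/*
 * Pick the deepest state that satisfies both the latency constraint and
 * the time left before the next timer event. Neither value is looked up
 * here; the caller supplies them.
 */
static int select_with_inputs(const struct timed_state *states, size_t count,
			      int latency_req, int next_timer_event)
{
	int chosen = 0;
	size_t i;

	assert(states != NULL && count > 0);
	assert(latency_req > 0 && next_timer_event >= 0);

	for (i = 1; i < count; i++) {
		if (states[i].exit_latency_us > (unsigned int)latency_req)
			continue;
		if (states[i].target_residency_us >
		    (unsigned int)next_timer_event)
			continue;
		chosen = (int)i;
	}

	return chosen;
}
```

Because the routine takes its inputs as parameters, it can be exercised with
arbitrary latency and timer values without any timer or PM QoS
infrastructure behind it.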

Signed-off-by: Daniel Lezcano <[email protected]>
Acked-by: Nicolas Pitre <[email protected]>
---
drivers/cpuidle/cpuidle.c | 11 +++++------
drivers/cpuidle/governors/ladder.c | 3 ++-
drivers/cpuidle/governors/menu.c | 8 ++------
include/linux/cpuidle.h | 8 +++++---
kernel/sched/idle.c | 13 +++++++++----
5 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 372c36f..64f5800 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -8,16 +8,12 @@
* This code is licenced under the GPL.
*/

-#include <linux/clockchips.h>
#include <linux/kernel.h>
#include <linux/mutex.h>
-#include <linux/sched.h>
#include <linux/notifier.h>
#include <linux/pm_qos.h>
#include <linux/cpu.h>
#include <linux/cpuidle.h>
-#include <linux/ktime.h>
-#include <linux/hrtimer.h>
#include <linux/module.h>
#include <trace/events/power.h>

@@ -155,11 +151,13 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
*
* @drv: the cpuidle driver
* @dev: the cpuidle device
+ * @latency_req: the latency constraint when choosing an idle state
+ * @next_timer_event: the duration until the timer expires
*
* Returns the index of the idle state.
*/
int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
- int latency_req)
+ int latency_req, int next_timer_event)
{
if (off || !initialized)
return -ENODEV;
@@ -170,7 +168,8 @@ int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
if (unlikely(use_deepest_state))
return cpuidle_find_deepest_state(drv, dev);

- return cpuidle_curr_governor->select(drv, dev, latency_req);
+ return cpuidle_curr_governor->select(drv, dev, latency_req,
+ next_timer_event);
}

/**
diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
index 18f0da9..fb396d6 100644
--- a/drivers/cpuidle/governors/ladder.c
+++ b/drivers/cpuidle/governors/ladder.c
@@ -64,7 +64,8 @@ static inline void ladder_do_selection(struct ladder_device *ldev,
* @dev: the CPU
*/
static int ladder_select_state(struct cpuidle_driver *drv,
- struct cpuidle_device *dev, int latency_req)
+ struct cpuidle_device *dev,
+ int latency_req, int next_timer_event)
{
struct ladder_device *ldev = &__get_cpu_var(ladder_devices);
struct ladder_device_state *last_state;
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 96f8fb0..a17515f 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -13,10 +13,6 @@
#include <linux/kernel.h>
#include <linux/cpuidle.h>
#include <linux/pm_qos.h>
-#include <linux/time.h>
-#include <linux/ktime.h>
-#include <linux/hrtimer.h>
-#include <linux/tick.h>
#include <linux/sched.h>
#include <linux/math64.h>
#include <linux/module.h>
@@ -288,7 +284,7 @@ again:
* @dev: the CPU
*/
static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
- int latency_req)
+ int latency_req, int next_timer_event)
{
struct menu_device *data = &__get_cpu_var(menu_devices);
int i;
@@ -303,7 +299,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
data->last_state_idx = CPUIDLE_DRIVER_STATE_START - 1;

/* determine the expected residency time, round up */
- data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length());
+ data->next_timer_us = next_timer_event;

get_iowait_load(&nr_iowaiters, &cpu_load);
data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index fb465c1..d477746 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -122,7 +122,8 @@ struct cpuidle_driver {
extern void disable_cpuidle(void);

extern int cpuidle_select(struct cpuidle_driver *drv,
- struct cpuidle_device *dev, int latency_req);
+ struct cpuidle_device *dev,
+ int latency_req, int next_timer_event);
extern int cpuidle_enter(struct cpuidle_driver *drv,
struct cpuidle_device *dev, int index);
extern void cpuidle_reflect(struct cpuidle_device *dev, int index);
@@ -150,7 +151,8 @@ extern struct cpuidle_driver *cpuidle_get_cpu_driver(struct cpuidle_device *dev)
#else
static inline void disable_cpuidle(void) { }
static inline int cpuidle_select(struct cpuidle_driver *drv,
- struct cpuidle_device *dev, int latency_req)
+ struct cpuidle_device *dev,
+ int latency_req, int next_timer_event)
{return -ENODEV; }
static inline int cpuidle_enter(struct cpuidle_driver *drv,
struct cpuidle_device *dev, int index)
@@ -206,7 +208,7 @@ struct cpuidle_governor {

int (*select) (struct cpuidle_driver *drv,
struct cpuidle_device *dev,
- int latency_req);
+ int latency_req, int next_timer_event);
void (*reflect) (struct cpuidle_device *dev, int index);

struct module *owner;
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 25ba94d..58c7522 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -75,7 +75,8 @@ void __weak arch_cpu_idle(void)
* set, and it returns with polling set. If it ever stops polling, it
* must clear the polling bit.
*/
-static void cpuidle_idle_call(unsigned int latency_req)
+static void cpuidle_idle_call(unsigned int latency_req,
+ unsigned int next_timer_event)
{
struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
@@ -108,7 +109,7 @@ static void cpuidle_idle_call(unsigned int latency_req)
* Ask the cpuidle framework to choose a convenient idle state.
* Fall back to the default arch idle method on errors.
*/
- next_state = cpuidle_select(drv, dev, latency_req);
+ next_state = cpuidle_select(drv, dev, latency_req, next_timer_event);
if (next_state < 0) {
use_default:
/*
@@ -183,7 +184,7 @@ exit_idle:
*/
static void cpu_idle_loop(void)
{
- unsigned int latency_req;
+ unsigned int latency_req, next_timer_event;

while (1) {
/*
@@ -210,6 +211,9 @@ static void cpu_idle_loop(void)

latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);

+ next_timer_event =
+ ktime_to_us(tick_nohz_get_sleep_length());
+
/*
* In poll mode we reenable interrupts and spin.
*
@@ -227,7 +231,8 @@ static void cpu_idle_loop(void)
tick_check_broadcast_expired())
cpu_idle_poll();
else
- cpuidle_idle_call(latency_req);
+ cpuidle_idle_call(latency_req,
+ next_timer_event);

arch_cpu_idle_exit();
}
--
1.9.1

2014-10-23 16:44:04

by Nicolas Pitre

Subject: Re: [PATCH V2 4/5] cpuidle: menu: Fix the get_typical_interval

On Thu, 23 Oct 2014, Daniel Lezcano wrote:

> The first time the 'get_typical_function' is called, it computes an average
> of zero as no data is filled yet. That leads the 'data->predicted_us' variable
> to be set to zero too.
>
> The caller, 'menu_select' will then do:
>
> interactivity_req = data->predicted_us /
> performance_multiplier(nr_iowaiters, cpu_load);
>
> That sets the interactivity_req to zero (0/performance...).
>
> and then
>
> if (latency_req > interactivity_req)
> latency_req = interactivity_req;
>
> ... setting 'latency_req' to zero too.
>
> No idle state will fulfill this constraint and we will go the C1 state as
> default and leading to an update. So the next calls will compute an average
> different from zero.
>
> Even if that works with the current code but with a broken semantic, it will
> just break with the next patches where we are stricter with the latencies
> check: the first check will fail (latency_req is zero), then no update will
> occur leading to always falling to choose an idle state.

s/falling/failing/

>
> As there are no previous values and it is pointless to compute a standard
> deviation for these unexisting values. Just return without setting the
> 'data->predicted_us' to zero.
>
> Signed-off-by: Daniel Lezcano <[email protected]>

Acked-by: Nicolas Pitre <[email protected]>

> ---
> drivers/cpuidle/governors/menu.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 3907301..6ae8390 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -226,6 +226,15 @@ again:
> else
> do_div(avg, divisor);
>
> + /*
> + * We are at the very beginning and no data have been filled
> + * yet. Let's skip the standard deviation computation
> + * otherwise the data->predicted_us will be zero and that will
> + * lead to a zero latency req in the select function
> + */
> + if (!avg)
> + return;
> +
> /* Then try to determine standard deviation */
> stddev = 0;
> for (i = 0; i < INTERVALS; i++) {
> --
> 1.9.1
>
>

2014-10-23 16:47:28

by Nicolas Pitre

Subject: Re: [PATCH V2 5/5] cpuidle: menu: Move the update function before its declaration

On Thu, 23 Oct 2014, Daniel Lezcano wrote:

> In order to prevent a pointless forward declaration, just move the function
> at the beginning of the file.
>
> This patch does not change the behavior of the governor, it is just code
> reordering.
>
> Signed-off-by: Daniel Lezcano <[email protected]>

Acked-by: Nicolas Pitre <[email protected]>

> ---
> drivers/cpuidle/governors/menu.c | 149 +++++++++++++++++++--------------------
> 1 file changed, 74 insertions(+), 75 deletions(-)
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 6ae8390..0ac76b1 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -184,7 +184,6 @@ static inline int performance_multiplier(unsigned long nr_iowaiters, unsigned lo
>
> static DEFINE_PER_CPU(struct menu_device, menu_devices);
>
> -static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev);
>
> /* This implements DIV_ROUND_CLOSEST but avoids 64 bit division */
> static u64 div_round64(u64 dividend, u32 divisor)
> @@ -192,6 +191,80 @@ static u64 div_round64(u64 dividend, u32 divisor)
> return div_u64(dividend + (divisor / 2), divisor);
> }
>
> +/**
> + * menu_update - attempts to guess what happened after entry
> + * @drv: cpuidle driver containing state data
> + * @dev: the CPU
> + */
> +static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> +{
> + struct menu_device *data = &__get_cpu_var(menu_devices);
> + int last_idx = data->last_state_idx;
> + struct cpuidle_state *target = &drv->states[last_idx];
> + unsigned int measured_us;
> + unsigned int new_factor;
> +
> + /*
> + * Try to figure out how much time passed between entry to low
> + * power state and occurrence of the wakeup event.
> + *
> + * If the entered idle state didn't support residency measurements,
> + * we are basically lost in the dark how much time passed.
> + * As a compromise, assume we slept for the whole expected time.
> + *
> + * Any measured amount of time will include the exit latency.
> + * Since we are interested in when the wakeup begun, not when it
> + * was completed, we must subtract the exit latency. However, if
> + * the measured amount of time is less than the exit latency,
> + * assume the state was never reached and the exit latency is 0.
> + */
> + if (unlikely(!(target->flags & CPUIDLE_FLAG_TIME_VALID))) {
> + /* Use timer value as is */
> + measured_us = data->next_timer_us;
> +
> + } else {
> + /* Use measured value */
> + measured_us = cpuidle_get_last_residency(dev);
> +
> + /* Deduct exit latency */
> + if (measured_us > target->exit_latency)
> + measured_us -= target->exit_latency;
> +
> + /* Make sure our coefficients do not exceed unity */
> + if (measured_us > data->next_timer_us)
> + measured_us = data->next_timer_us;
> + }
> +
> + /* Update our correction ratio */
> + new_factor = data->correction_factor[data->bucket];
> + new_factor -= new_factor / DECAY;
> +
> + if (data->next_timer_us > 0 && measured_us < MAX_INTERESTING)
> + new_factor += RESOLUTION * measured_us / data->next_timer_us;
> + else
> + /*
> + * we were idle so long that we count it as a perfect
> + * prediction
> + */
> + new_factor += RESOLUTION;
> +
> + /*
> + * We don't want 0 as factor; we always want at least
> + * a tiny bit of estimated time. Fortunately, due to rounding,
> + * new_factor will stay nonzero regardless of measured_us values
> + * and the compiler can eliminate this test as long as DECAY > 1.
> + */
> + if (DECAY == 1 && unlikely(new_factor == 0))
> + new_factor = 1;
> +
> + data->correction_factor[data->bucket] = new_factor;
> +
> + /* update the repeating-pattern data */
> + data->intervals[data->interval_ptr++] = measured_us;
> + if (data->interval_ptr >= INTERVALS)
> + data->interval_ptr = 0;
> +}
> +
> /*
> * Try detecting repeating patterns by keeping track of the last 8
> * intervals, and checking if the standard deviation of that set
> @@ -378,80 +451,6 @@ static void menu_reflect(struct cpuidle_device *dev, int index)
> }
>
> /**
> - * menu_update - attempts to guess what happened after entry
> - * @drv: cpuidle driver containing state data
> - * @dev: the CPU
> - */
> -static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> -{
> - struct menu_device *data = &__get_cpu_var(menu_devices);
> - int last_idx = data->last_state_idx;
> - struct cpuidle_state *target = &drv->states[last_idx];
> - unsigned int measured_us;
> - unsigned int new_factor;
> -
> - /*
> - * Try to figure out how much time passed between entry to low
> - * power state and occurrence of the wakeup event.
> - *
> - * If the entered idle state didn't support residency measurements,
> - * we are basically lost in the dark how much time passed.
> - * As a compromise, assume we slept for the whole expected time.
> - *
> - * Any measured amount of time will include the exit latency.
> - * Since we are interested in when the wakeup begun, not when it
> - * was completed, we must subtract the exit latency. However, if
> - * the measured amount of time is less than the exit latency,
> - * assume the state was never reached and the exit latency is 0.
> - */
> - if (unlikely(!(target->flags & CPUIDLE_FLAG_TIME_VALID))) {
> - /* Use timer value as is */
> - measured_us = data->next_timer_us;
> -
> - } else {
> - /* Use measured value */
> - measured_us = cpuidle_get_last_residency(dev);
> -
> - /* Deduct exit latency */
> - if (measured_us > target->exit_latency)
> - measured_us -= target->exit_latency;
> -
> - /* Make sure our coefficients do not exceed unity */
> - if (measured_us > data->next_timer_us)
> - measured_us = data->next_timer_us;
> - }
> -
> - /* Update our correction ratio */
> - new_factor = data->correction_factor[data->bucket];
> - new_factor -= new_factor / DECAY;
> -
> - if (data->next_timer_us > 0 && measured_us < MAX_INTERESTING)
> - new_factor += RESOLUTION * measured_us / data->next_timer_us;
> - else
> - /*
> - * we were idle so long that we count it as a perfect
> - * prediction
> - */
> - new_factor += RESOLUTION;
> -
> - /*
> - * We don't want 0 as factor; we always want at least
> - * a tiny bit of estimated time. Fortunately, due to rounding,
> - * new_factor will stay nonzero regardless of measured_us values
> - * and the compiler can eliminate this test as long as DECAY > 1.
> - */
> - if (DECAY == 1 && unlikely(new_factor == 0))
> - new_factor = 1;
> -
> - data->correction_factor[data->bucket] = new_factor;
> -
> - /* update the repeating-pattern data */
> - data->intervals[data->interval_ptr++] = measured_us;
> - if (data->interval_ptr >= INTERVALS)
> - data->interval_ptr = 0;
> -}
> -
> -/**
> * menu_enable_device - scans a CPU's states and does setup
> * @drv: cpuidle driver
> * @dev: the CPU
> --
> 1.9.1
>
>

2014-10-28 02:01:41

by Len Brown

[permalink] [raw]
Subject: Re: [PATCH V2 3/5] cpuidle: idle: menu: Don't reflect when a state selection failed

On Thu, Oct 23, 2014 at 5:01 AM, Daniel Lezcano
<[email protected]> wrote:
> In the current code, the check to reflect or not the outcoming state is done
> against the idle state which has been chosen and its value.
>
> Instead of doing a check in each of the reflect functions, just don't call reflect
> if something went wrong in the idle path.
>
> Signed-off-by: Daniel Lezcano <[email protected]>
> Acked-by: Nicolas Pitre <[email protected]>
> ---
> drivers/cpuidle/governors/ladder.c | 3 +--
> drivers/cpuidle/governors/menu.c | 4 +---
> kernel/sched/idle.c | 3 ++-
> 3 files changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
> index fb396d6..c0b36a8 100644
> --- a/drivers/cpuidle/governors/ladder.c
> +++ b/drivers/cpuidle/governors/ladder.c
> @@ -165,8 +165,7 @@ static int ladder_enable_device(struct cpuidle_driver *drv,
> static void ladder_reflect(struct cpuidle_device *dev, int index)
> {
> struct ladder_device *ldev = &__get_cpu_var(ladder_devices);
> - if (index > 0)
> - ldev->last_state_idx = index;

Before this patch, last_state_idx was never set to 0 here.
After this patch, last_state_idx will be set to 0 when entered_state is 0.
Is that okay?

thanks,
-Len

> + ldev->last_state_idx = index;
> }
>
> static struct cpuidle_governor ladder_governor = {
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index a17515f..3907301 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -365,9 +365,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> static void menu_reflect(struct cpuidle_device *dev, int index)
> {
> struct menu_device *data = &__get_cpu_var(menu_devices);
> - data->last_state_idx = index;
> - if (index >= 0)
> - data->needs_update = 1;
> + data->needs_update = 1;
> }
>
> /**
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 58c7522..49dcc7d 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -162,7 +162,8 @@ use_default:
> /*
> * Give the governor an opportunity to reflect on the outcome
> */
> - cpuidle_reflect(dev, entered_state);
> + if (entered_state >= 0)
> + cpuidle_reflect(dev, entered_state);
>
> exit_idle:
> __current_set_polling();
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html



--
Len Brown, Intel Open Source Technology Center

2014-10-28 02:48:59

by Len Brown

[permalink] [raw]
Subject: Re: [PATCH V2 4/5] cpuidle: menu: Fix the get_typical_interval

On Thu, Oct 23, 2014 at 5:01 AM, Daniel Lezcano
<[email protected]> wrote:
> The first time the 'get_typical_function' is called, it computes an average
> of zero as no data is filled yet. That leads the 'data->predicted_us' variable
> to be set to zero too.
>
> The caller, 'menu_select' will then do:
>
> interactivity_req = data->predicted_us /
> performance_multiplier(nr_iowaiters, cpu_load);
>
> That sets the interactivity_req to zero (0/performance...).
>
> and then
>
> if (latency_req > interactivity_req)
> latency_req = interactivity_req;
>
> ... setting 'latency_req' to zero too.
>
> No idle state will fulfill this constraint and we will go the C1 state as
> default and leading to an update. So the next calls will compute an average
> different from zero.
>
> Even if that works with the current code but with a broken semantic, it will
> just break with the next patches where we are stricter with the latencies
> check: the first check will fail (latency_req is zero), then no update will
> occur leading to always falling to choose an idle state.
>
> As there are no previous values and it is pointless to compute a standard
> deviation for these unexisting values. Just return without setting the
> 'data->predicted_us' to zero.
>
> Signed-off-by: Daniel Lezcano <[email protected]>
> ---
> drivers/cpuidle/governors/menu.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 3907301..6ae8390 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -226,6 +226,15 @@ again:
> else
> do_div(avg, divisor);
>
> + /*
> + * We are at the very beginning and no data have been filled
> + * yet. Let's skip the standard deviation computation
> + * otherwise the data->predicted_us will be zero and that will
> + * lead to a zero latency req in the select function
> + */
> + if (!avg)
> + return;
> +

Unfortunately, you've touched ugly code,
and your (correct) patch makes it ever so slightly uglier
instead of clearer.

I think the code would read more clearly, and your patch would be
less obscure, if the code read something like this, so that it is
clear at the menu_select level when and where we monkey
with predicted_us:

menu_select()...
...
data->predicted_us = div_round64(bla bla bla

interactivity_override_us = get_typical_interval(data);

if (interactivity_override_us)
if (interactivity_override_us < data->predicted_us)
data->predicted_us = interactivity_override_us;

And, of course, down inside get_typical_interval()
...
if (!avg)
return 0;
...
if (likely(stddev <= ULONG_MAX)) {
...
return avg;

thanks,
Len Brown, Intel Open Source Technology Center
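
A minimal userspace sketch of the shape suggested above, under stated
assumptions: struct menu_sketch, typical_interval_sketch() and
apply_override_sketch() are hypothetical stand-ins, not the menu governor's
real code, and the averaging is reduced to a plain mean (the real
get_typical_interval() also looks at the standard deviation). It only
illustrates that a zero return means "no data yet" and that the caller then
leaves predicted_us alone.

```c
#include <assert.h>
#include <stdint.h>

#define INTERVALS 8

/* Hypothetical, stripped-down stand-in for struct menu_device. */
struct menu_sketch {
	unsigned int intervals[INTERVALS];	/* last measured idle times, us */
	unsigned int predicted_us;		/* timer-based prediction, us */
};

/*
 * Simplified stand-in for get_typical_interval(): returns the plain
 * average of the recorded intervals, which is 0 while the history is
 * still empty.
 */
static unsigned int typical_interval_sketch(const struct menu_sketch *data)
{
	uint64_t sum = 0;
	int i;

	assert(data);
	for (i = 0; i < INTERVALS; i++)
		sum += data->intervals[i];

	return (unsigned int)(sum / INTERVALS);
}

/* The override is applied by the caller, and only when there is data. */
static void apply_override_sketch(struct menu_sketch *data)
{
	unsigned int override_us = typical_interval_sketch(data);

	if (override_us && override_us < data->predicted_us)
		data->predicted_us = override_us;
}
```

With an empty history the prediction is untouched, so it can no longer be
dragged down to zero on the first call.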

2014-10-28 02:53:43

by Len Brown

[permalink] [raw]
Subject: Re: [PATCH V2 5/5] cpuidle: menu: Move the update function before its declaration

>> This patch does not change the behavior of the governor, it is just code
>> reordering.
>>
>> Signed-off-by: Daniel Lezcano <[email protected]>


Acked-by: Len Brown <[email protected]>

2014-10-28 06:32:42

by Preeti Murthy

[permalink] [raw]
Subject: Re: [PATCH V2 1/5] sched: idle: cpuidle: Check the latency req before idle

Hi Daniel,

On Thu, Oct 23, 2014 at 2:31 PM, Daniel Lezcano
<[email protected]> wrote:
> When the pmqos latency requirement is set to zero that means "poll in all the
> cases".
>
> That is correctly implemented on x86 but not on the other archs.
>
> As how is written the code, if the latency request is zero, the governor will
> return zero, so corresponding, for x86, to the poll function, but for the
> others arch the default idle function. For example, on ARM this is wait-for-
> interrupt with a latency of '1', so violating the constraint.

This is not true actually. On PowerPC the idle state 0 has an exit_latency of 0.

>
> In order to fix that, do the latency requirement check *before* calling the
> cpuidle framework in order to jump to the poll function without entering
> cpuidle. That has several benefits:

Doing so actually hurts on PowerPC, because the idle loop defined for
idle state 0 is different from what cpu_relax() does in cpu_idle_loop().
The spinning is more power efficient in the former case. Moreover, we also set
certain register values which indicate an idle cpu. The ppc_runlatch bits
do precisely this. These register values are read by some user space
tools, so we will end up breaking them with this patch.

My suggestion is to keep the latency requirement check in
kernel/sched/idle.c as you are doing in this patch. But before jumping to
cpu_idle_loop, verify that idle state 0 has an exit_latency > 0, in
addition to your check on latency_req == 0.
If not, you can fall through to the regular path of calling into the
cpuidle driver.
The scheduler can query the cpuidle_driver structure anyway.

What do you think?

Regards
Preeti U Murthy
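
The check proposed above could be sketched as follows. This is only an
illustration under assumptions: struct state_sketch, struct driver_sketch
and use_generic_poll() are hypothetical names, reduced to the one field the
decision needs, and are not the kernel's cpuidle structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced views of the cpuidle structures. */
struct state_sketch {
	unsigned int exit_latency;	/* in microseconds */
};

struct driver_sketch {
	struct state_sketch states[4];
	int state_count;
};

/*
 * Decide whether the idle loop should bypass cpuidle and use the generic
 * poll loop. It does so only when a zero latency is requested AND the
 * driver's state 0 cannot honour it (exit_latency > 0).
 */
static bool use_generic_poll(const struct driver_sketch *drv, int latency_req)
{
	assert(latency_req >= 0);

	if (latency_req != 0)
		return false;

	/* No driver or no states: nothing better than polling is available. */
	if (!drv || drv->state_count == 0)
		return true;

	return drv->states[0].exit_latency > 0;
}
```

A driver whose state 0 has an exit_latency of 0 keeps going through the
regular cpuidle path, so its own polling loop is still used.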

2014-10-28 07:01:36

by Preeti Murthy

[permalink] [raw]
Subject: Re: [PATCH V2 3/5] cpuidle: idle: menu: Don't reflect when a state selection failed

On Thu, Oct 23, 2014 at 2:31 PM, Daniel Lezcano
<[email protected]> wrote:
> In the current code, the check to reflect or not the outcoming state is done
> against the idle state which has been chosen and its value.
>
> Instead of doing a check in each of the reflect functions, just don't call reflect
> if something went wrong in the idle path.
>
> Signed-off-by: Daniel Lezcano <[email protected]>
> Acked-by: Nicolas Pitre <[email protected]>
> ---
> drivers/cpuidle/governors/ladder.c | 3 +--
> drivers/cpuidle/governors/menu.c | 4 +---
> kernel/sched/idle.c | 3 ++-
> 3 files changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
> index fb396d6..c0b36a8 100644
> --- a/drivers/cpuidle/governors/ladder.c
> +++ b/drivers/cpuidle/governors/ladder.c
> @@ -165,8 +165,7 @@ static int ladder_enable_device(struct cpuidle_driver *drv,
> static void ladder_reflect(struct cpuidle_device *dev, int index)
> {
> struct ladder_device *ldev = &__get_cpu_var(ladder_devices);
> - if (index > 0)
> - ldev->last_state_idx = index;
> + ldev->last_state_idx = index;
> }
>
> static struct cpuidle_governor ladder_governor = {
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index a17515f..3907301 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -365,9 +365,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> static void menu_reflect(struct cpuidle_device *dev, int index)
> {
> struct menu_device *data = &__get_cpu_var(menu_devices);
> - data->last_state_idx = index;
> - if (index >= 0)
> - data->needs_update = 1;
> + data->needs_update = 1;

Why is the last_state_idx not getting updated ?

Regards
Preeti U Murthy

2014-10-28 18:28:10

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH V2 3/5] cpuidle: idle: menu: Don't reflect when a state selection failed

On 10/28/2014 08:01 AM, Preeti Murthy wrote:
> On Thu, Oct 23, 2014 at 2:31 PM, Daniel Lezcano
> <[email protected]> wrote:
>> In the current code, the check to reflect or not the outcoming state is done
>> against the idle state which has been chosen and its value.
>>
>> Instead of doing a check in each of the reflect functions, just don't call reflect
>> if something went wrong in the idle path.
>>
>> Signed-off-by: Daniel Lezcano <[email protected]>
>> Acked-by: Nicolas Pitre <[email protected]>
>> ---
>> drivers/cpuidle/governors/ladder.c | 3 +--
>> drivers/cpuidle/governors/menu.c | 4 +---
>> kernel/sched/idle.c | 3 ++-
>> 3 files changed, 4 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
>> index fb396d6..c0b36a8 100644
>> --- a/drivers/cpuidle/governors/ladder.c
>> +++ b/drivers/cpuidle/governors/ladder.c
>> @@ -165,8 +165,7 @@ static int ladder_enable_device(struct cpuidle_driver *drv,
>> static void ladder_reflect(struct cpuidle_device *dev, int index)
>> {
>> struct ladder_device *ldev = &__get_cpu_var(ladder_devices);
>> - if (index > 0)
>> - ldev->last_state_idx = index;
>> + ldev->last_state_idx = index;
>> }
>>
>> static struct cpuidle_governor ladder_governor = {
>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>> index a17515f..3907301 100644
>> --- a/drivers/cpuidle/governors/menu.c
>> +++ b/drivers/cpuidle/governors/menu.c
>> @@ -365,9 +365,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>> static void menu_reflect(struct cpuidle_device *dev, int index)
>> {
>> struct menu_device *data = &__get_cpu_var(menu_devices);
>> - data->last_state_idx = index;
>> - if (index >= 0)
>> - data->needs_update = 1;
>> + data->needs_update = 1;
>
> Why is the last_state_idx not getting updated ?

Oops, right. This is missing.

Thanks for pointing this out.

By the way, I don't think any back-end driver currently changes the
selected state, and I am not sure that would be desirable, since we want
to trust the state we are entering (as a best effort). So if the 'enter'
function does not change the index, last_state_idx does not need to be
updated here, since it has already been assigned in the 'select' function.



--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2014-10-28 18:59:22

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH V2 1/5] sched: idle: cpuidle: Check the latency req before idle

On 10/28/2014 04:51 AM, Preeti Murthy wrote:
> Hi Daniel,
>
> On Thu, Oct 23, 2014 at 2:31 PM, Daniel Lezcano
> <[email protected]> wrote:
>> When the pmqos latency requirement is set to zero that means "poll in all the
>> cases".
>>
>> That is correctly implemented on x86 but not on the other archs.
>>
>> As how is written the code, if the latency request is zero, the governor will
>> return zero, so corresponding, for x86, to the poll function, but for the
>> others arch the default idle function. For example, on ARM this is wait-for-
>> interrupt with a latency of '1', so violating the constraint.
>
> This is not true actually. On PowerPC the idle state 0 has an exit_latency of 0.
>
>>
>> In order to fix that, do the latency requirement check *before* calling the
>> cpuidle framework in order to jump to the poll function without entering
>> cpuidle. That has several benefits:
>
> Doing so actually hurts on PowerPC. Because the idle loop defined for
> idle state 0 is different from what cpu_relax() does in cpu_idle_loop().
> The spinning is more power efficient in the former case. Moreover we also set
> certain register values which indicate an idle cpu. The ppc_runlatch bits
> do precisely this. These register values are being read by some user space
> tools. So we will end up breaking them with this patch
>
> My suggestion is very well keep the latency requirement check in
> kernel/sched/idle.c
> like your doing in this patch. But before jumping to cpu_idle_loop verify if the
> idle state 0 has an exit_latency > 0 in addition to your check on the
> latency_req == 0.
> If not, you can fall through to the regular path of calling into the
> cpuidle driver.
> The scheduler can query the cpuidle_driver structure anyway.
>
> What do you think?

Thanks for reviewing the patch and spotting this.

Wouldn't it make sense to create:

void __weak cpu_idle_poll(void)

and override it with your specific poll function?
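
For reference, a minimal sketch of the weak-symbol mechanism being
proposed, assuming GCC or Clang. The names cpu_idle_poll_sketch(),
generic_poll_calls and idle_with_zero_latency_sketch() are hypothetical;
the attribute is what the kernel's __weak macro expands to.

```c
#include <assert.h>

/* Counts how often the generic fallback ran (for illustration only). */
static int generic_poll_calls;

/*
 * Generic fallback. Because the definition is weak, a non-weak definition
 * of the same symbol in another object file (for example under arch/)
 * replaces it at link time.
 */
__attribute__((weak)) void cpu_idle_poll_sketch(void)
{
	generic_poll_calls++;
}

/* Caller side: with a zero latency request, poll without entering cpuidle. */
static void idle_with_zero_latency_sketch(int latency_req)
{
	assert(latency_req == 0);
	cpu_idle_poll_sketch();
}
```

With no override linked in, the call resolves to the generic fallback.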


2014-10-28 19:15:58

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH V2 3/5] cpuidle: idle: menu: Don't reflect when a state selection failed

On 10/28/2014 03:01 AM, Len Brown wrote:
> On Thu, Oct 23, 2014 at 5:01 AM, Daniel Lezcano
> <[email protected]> wrote:
>> In the current code, the check to reflect or not the outcoming state is done
>> against the idle state which has been chosen and its value.
>>
>> Instead of doing a check in each of the reflect functions, just don't call reflect
>> if something went wrong in the idle path.
>>
>> Signed-off-by: Daniel Lezcano <[email protected]>
>> Acked-by: Nicolas Pitre <[email protected]>
>> ---
>> drivers/cpuidle/governors/ladder.c | 3 +--
>> drivers/cpuidle/governors/menu.c | 4 +---
>> kernel/sched/idle.c | 3 ++-
>> 3 files changed, 4 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
>> index fb396d6..c0b36a8 100644
>> --- a/drivers/cpuidle/governors/ladder.c
>> +++ b/drivers/cpuidle/governors/ladder.c
>> @@ -165,8 +165,7 @@ static int ladder_enable_device(struct cpuidle_driver *drv,
>> static void ladder_reflect(struct cpuidle_device *dev, int index)
>> {
>> struct ladder_device *ldev = &__get_cpu_var(ladder_devices);
>> - if (index > 0)
>> - ldev->last_state_idx = index;
>
> Before this patch, last_state_idx was never set to 0 here.
> After this patch, last_state_idx will be set to 0 when entered_state is 0.
> Is that okay?

Yes, I think so, because state zero is never selected on x86,
but on the other archs it is. So before this patch, on the other archs,
state 0 was never reflected as it should have been.

This results from the CPUIDLE_DRIVER_STATE_START macro (I hope I
can kill this macro a couple of patchsets after this one).
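
A small userspace sketch of the behavior change being discussed, assuming
a hypothetical struct ladder_sketch in place of struct ladder_device. The
two reflect variants mirror the removed and added lines of the patch, and
reflect_if_entered() mirrors the guard added in kernel/sched/idle.c.

```c
#include <assert.h>

/* Hypothetical stand-in for struct ladder_device. */
struct ladder_sketch {
	int last_state_idx;
};

/* Before the patch: index 0 (and errors) were silently ignored. */
static void ladder_reflect_old(struct ladder_sketch *ldev, int index)
{
	if (index > 0)
		ldev->last_state_idx = index;
}

/* After the patch: whatever the caller passes is recorded. */
static void ladder_reflect_new(struct ladder_sketch *ldev, int index)
{
	ldev->last_state_idx = index;
}

/* Caller-side guard: negative values (failures) never reach reflect. */
static void reflect_if_entered(struct ladder_sketch *ldev, int entered_state)
{
	assert(ldev);
	if (entered_state >= 0)
		ladder_reflect_new(ldev, entered_state);
}
```

The only observable difference is for entered_state == 0, which is now
recorded instead of being dropped.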

>> + ldev->last_state_idx = index;
>> }
>>
>> static struct cpuidle_governor ladder_governor = {
>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>> index a17515f..3907301 100644
>> --- a/drivers/cpuidle/governors/menu.c
>> +++ b/drivers/cpuidle/governors/menu.c
>> @@ -365,9 +365,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>> static void menu_reflect(struct cpuidle_device *dev, int index)
>> {
>> struct menu_device *data = &__get_cpu_var(menu_devices);
>> - data->last_state_idx = index;
>> - if (index >= 0)
>> - data->needs_update = 1;
>> + data->needs_update = 1;
>> }
>>
>> /**
>> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
>> index 58c7522..49dcc7d 100644
>> --- a/kernel/sched/idle.c
>> +++ b/kernel/sched/idle.c
>> @@ -162,7 +162,8 @@ use_default:
>> /*
>> * Give the governor an opportunity to reflect on the outcome
>> */
>> - cpuidle_reflect(dev, entered_state);
>> + if (entered_state >= 0)
>> + cpuidle_reflect(dev, entered_state);
>>
>> exit_idle:
>> __current_set_polling();
>> --
>> 1.9.1
>>
>
>
>



2014-10-29 01:48:22

by Preeti U Murthy

[permalink] [raw]
Subject: Re: [PATCH V2 3/5] cpuidle: idle: menu: Don't reflect when a state selection failed

On 10/28/2014 11:58 PM, Daniel Lezcano wrote:
> On 10/28/2014 08:01 AM, Preeti Murthy wrote:
>> On Thu, Oct 23, 2014 at 2:31 PM, Daniel Lezcano
>> <[email protected]> wrote:
>>> In the current code, the check to reflect or not the outcoming state
>>> is done
>>> against the idle state which has been chosen and its value.
>>>
>>> Instead of doing a check in each of the reflect functions, just don't
>>> call reflect
>>> if something went wrong in the idle path.
>>>
>>> Signed-off-by: Daniel Lezcano <[email protected]>
>>> Acked-by: Nicolas Pitre <[email protected]>
>>> ---
>>> drivers/cpuidle/governors/ladder.c | 3 +--
>>> drivers/cpuidle/governors/menu.c | 4 +---
>>> kernel/sched/idle.c | 3 ++-
>>> 3 files changed, 4 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/cpuidle/governors/ladder.c
>>> b/drivers/cpuidle/governors/ladder.c
>>> index fb396d6..c0b36a8 100644
>>> --- a/drivers/cpuidle/governors/ladder.c
>>> +++ b/drivers/cpuidle/governors/ladder.c
>>> @@ -165,8 +165,7 @@ static int ladder_enable_device(struct
>>> cpuidle_driver *drv,
>>> static void ladder_reflect(struct cpuidle_device *dev, int index)
>>> {
>>> struct ladder_device *ldev = &__get_cpu_var(ladder_devices);
>>> - if (index > 0)
>>> - ldev->last_state_idx = index;
>>> + ldev->last_state_idx = index;
>>> }
>>>
>>> static struct cpuidle_governor ladder_governor = {
>>> diff --git a/drivers/cpuidle/governors/menu.c
>>> b/drivers/cpuidle/governors/menu.c
>>> index a17515f..3907301 100644
>>> --- a/drivers/cpuidle/governors/menu.c
>>> +++ b/drivers/cpuidle/governors/menu.c
>>> @@ -365,9 +365,7 @@ static int menu_select(struct cpuidle_driver
>>> *drv, struct cpuidle_device *dev,
>>> static void menu_reflect(struct cpuidle_device *dev, int index)
>>> {
>>> struct menu_device *data = &__get_cpu_var(menu_devices);
>>> - data->last_state_idx = index;
>>> - if (index >= 0)
>>> - data->needs_update = 1;
>>> + data->needs_update = 1;
>>
>> Why is the last_state_idx not getting updated ?
>
> Oups, right. This is missing.
>
> Thanks for pointing this out.
>
> By the way, I don't think a back end driver is changing the selected
> state currently and I am not sure this is desirable since we want to
> trust the state we are going (as a best effort). So if the 'enter'
> function does not change the index, that means the last_state_idx has
> not to be changed since it has been assigned in the 'select' function.

Hmm, right. So you might want to remove the last_state_idx update in
ladder_reflect() also?

Regards
Preeti U Murthy
>
>
>

2014-10-29 02:05:08

by Preeti U Murthy

[permalink] [raw]
Subject: Re: [PATCH V2 1/5] sched: idle: cpuidle: Check the latency req before idle

On 10/29/2014 12:29 AM, Daniel Lezcano wrote:
> On 10/28/2014 04:51 AM, Preeti Murthy wrote:
>> Hi Daniel,
>>
>> On Thu, Oct 23, 2014 at 2:31 PM, Daniel Lezcano
>> <[email protected]> wrote:
>>> When the pmqos latency requirement is set to zero that means "poll in
>>> all the
>>> cases".
>>>
>>> That is correctly implemented on x86 but not on the other archs.
>>>
>>> As how is written the code, if the latency request is zero, the
>>> governor will
>>> return zero, so corresponding, for x86, to the poll function, but for
>>> the
>>> others arch the default idle function. For example, on ARM this is
>>> wait-for-
>>> interrupt with a latency of '1', so violating the constraint.
>>
>> This is not true actually. On PowerPC the idle state 0 has an
>> exit_latency of 0.
>>
>>>
>>> In order to fix that, do the latency requirement check *before*
>>> calling the
>>> cpuidle framework in order to jump to the poll function without entering
>>> cpuidle. That has several benefits:
>>
>> Doing so actually hurts on PowerPC. Because the idle loop defined for
>> idle state 0 is different from what cpu_relax() does in cpu_idle_loop().
>> The spinning is more power efficient in the former case. Moreover we
>> also set
>> certain register values which indicate an idle cpu. The ppc_runlatch bits
>> do precisely this. These register values are being read by some user
>> space
>> tools. So we will end up breaking them with this patch
>>
>> My suggestion is very well keep the latency requirement check in
>> kernel/sched/idle.c
>> like your doing in this patch. But before jumping to cpu_idle_loop
>> verify if the
>> idle state 0 has an exit_latency > 0 in addition to your check on the
>> latency_req == 0.
>> If not, you can fall through to the regular path of calling into the
>> cpuidle driver.
>> The scheduler can query the cpuidle_driver structure anyway.
>>
>> What do you think?
>
> Thanks for reviewing the patch and spotting this.
>
> Wouldn't make sense to create:
>
> void __weak_cpu_idle_poll(void) ?
>
> and override it with your specific poll function ?
>

No, this would become ugly as far as I can see. A weak function has to be
defined under arch/* code. We would either need to duplicate the idle
loop that we already have in the drivers, or point the weak function to
the first idle state defined by our driver. Neither is
desirable (calling into the driver from arch code is ugly). Another
reason why I don't like the idea of a weak function is that if you have
missed looking at a specific driver and it has an idle loop with
features similar to those on powerpc, you will have to spot it yourself and
include the arch specific cpu_idle_poll() for it.

But by having a check on the exit_latency, you are claiming that since
the driver's 0th idle state is no better than the generic idle loop in
cases of 0 latency req, we are better off calling the latter, which
looks reasonable. That way you don't have to bother about worsening the
idle loop behavior on any other driver.

Regards
Preeti U Murthy

2014-10-29 16:54:50

by Kevin Hilman

[permalink] [raw]
Subject: Re: [PATCH V2 3/5] cpuidle: idle: menu: Don't reflect when a state selection failed

Daniel Lezcano <[email protected]> writes:

> On 10/28/2014 08:01 AM, Preeti Murthy wrote:
>> On Thu, Oct 23, 2014 at 2:31 PM, Daniel Lezcano
>> <[email protected]> wrote:
>>> In the current code, the check to reflect or not the outcoming state is done
>>> against the idle state which has been chosen and its value.
>>>
>>> Instead of doing a check in each of the reflect functions, just don't call reflect
>>> if something went wrong in the idle path.
>>>
>>> Signed-off-by: Daniel Lezcano <[email protected]>
>>> Acked-by: Nicolas Pitre <[email protected]>
>>> ---
>>> drivers/cpuidle/governors/ladder.c | 3 +--
>>> drivers/cpuidle/governors/menu.c | 4 +---
>>> kernel/sched/idle.c | 3 ++-
>>> 3 files changed, 4 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
>>> index fb396d6..c0b36a8 100644
>>> --- a/drivers/cpuidle/governors/ladder.c
>>> +++ b/drivers/cpuidle/governors/ladder.c
>>> @@ -165,8 +165,7 @@ static int ladder_enable_device(struct cpuidle_driver *drv,
>>> static void ladder_reflect(struct cpuidle_device *dev, int index)
>>> {
>>> struct ladder_device *ldev = &__get_cpu_var(ladder_devices);
>>> - if (index > 0)
>>> - ldev->last_state_idx = index;
>>> + ldev->last_state_idx = index;
>>> }
>>>
>>> static struct cpuidle_governor ladder_governor = {
>>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>>> index a17515f..3907301 100644
>>> --- a/drivers/cpuidle/governors/menu.c
>>> +++ b/drivers/cpuidle/governors/menu.c
>>> @@ -365,9 +365,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>>> static void menu_reflect(struct cpuidle_device *dev, int index)
>>> {
>>> struct menu_device *data = &__get_cpu_var(menu_devices);
>>> - data->last_state_idx = index;
>>> - if (index >= 0)
>>> - data->needs_update = 1;
>>> + data->needs_update = 1;
>>
>> Why is the last_state_idx not getting updated ?
>
> Oups, right. This is missing.
>
> Thanks for pointing this out.
>
> By the way, I don't think a back end driver is changing the selected
> state currently and I am not sure this is desirable since we want to
> trust the state we are going (as a best effort).

FYI, the OMAP3 backend driver does not always obey the selected
state, and will return a different state than the one requested.

Kevin
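
To illustrate why that matters for the governor bookkeeping, here is a
minimal sketch under assumptions: struct gov_sketch and the *_sketch()
functions are hypothetical, and the "back end" simply clamps the request to
a deepest allowed state. It does not model the OMAP3 driver itself.

```c
#include <assert.h>

/* Hypothetical governor bookkeeping. */
struct gov_sketch {
	int last_state_idx;
	int needs_update;
};

/* 'select' records the state it picked. */
static int select_sketch(struct gov_sketch *g, int wanted)
{
	g->last_state_idx = wanted;
	return wanted;
}

/* Hypothetical back end that may enter a shallower state than requested. */
static int enter_sketch(int index, int deepest_allowed)
{
	return index > deepest_allowed ? deepest_allowed : index;
}

/* 'reflect' records the state that was actually entered. */
static void reflect_sketch(struct gov_sketch *g, int index)
{
	g->last_state_idx = index;
	g->needs_update = 1;
}

/* One pass through the idle path: select, enter, reflect on success. */
static int idle_once_sketch(struct gov_sketch *g, int wanted, int deepest_allowed)
{
	int next, entered;

	assert(g);
	next = select_sketch(g, wanted);
	entered = enter_sketch(next, deepest_allowed);
	if (entered >= 0)
		reflect_sketch(g, entered);

	return entered;
}
```

If reflect did not store the index, last_state_idx would keep the selected
value rather than the entered one whenever the two differ.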

2014-10-29 18:15:39

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH V2 4/5] cpuidle: menu: Fix the get_typical_interval

On 10/28/2014 03:48 AM, Len Brown wrote:
> On Thu, Oct 23, 2014 at 5:01 AM, Daniel Lezcano
> <[email protected]> wrote:
>> The first time the 'get_typical_function' is called, it computes an average
>> of zero as no data is filled yet. That leads the 'data->predicted_us' variable
>> to be set to zero too.
>>
>> The caller, 'menu_select' will then do:
>>
>> interactivity_req = data->predicted_us /
>> performance_multiplier(nr_iowaiters, cpu_load);
>>
>> That sets the interactivity_req to zero (0/performance...).
>>
>> and then
>>
>> if (latency_req > interactivity_req)
>> latency_req = interactivity_req;
>>
>> ... setting 'latency_req' to zero too.
>>
>> No idle state will fulfill this constraint and we will go the C1 state as
>> default and leading to an update. So the next calls will compute an average
>> different from zero.
>>
>> Even if that works with the current code but with a broken semantic, it will
>> just break with the next patches where we are stricter with the latencies
>> check: the first check will fail (latency_req is zero), then no update will
>> occur leading to always falling to choose an idle state.
>>
>> As there are no previous values and it is pointless to compute a standard
>> deviation for these unexisting values. Just return without setting the
>> 'data->predicted_us' to zero.
>>
>> Signed-off-by: Daniel Lezcano <[email protected]>
>> ---
>> drivers/cpuidle/governors/menu.c | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>>
>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>> index 3907301..6ae8390 100644
>> --- a/drivers/cpuidle/governors/menu.c
>> +++ b/drivers/cpuidle/governors/menu.c
>> @@ -226,6 +226,15 @@ again:
>> else
>> do_div(avg, divisor);
>>
>> + /*
>> + * We are at the very beginning and no data have been filled
>> + * yet. Let's skip the standard deviation computation
>> + * otherwise the data->predicted_us will be zero and that will
>> + * lead to a zero latency req in the select function
>> + */
>> + if (!avg)
>> + return;
>> +
>
> Unfortunately, you've touched ugly code,
> and your (correct) patch makes it ever-so slightly more ugly,
> instead of more clear.
>
> I think the code would read more clearly, and your patch would
> less obscure, if the code read something like this sow that it is
> clear at the menu_select level when and where we monkey
> with predicted_us:
>
> menu_select()...
> ...
> data->predicted_us = div_round64(bla bla bla
>
> interactivity_overrride_us = get_typical_interval(data);
>
> if (interactivity_override_us)
> if (interactivity_predicted_us < data->predicted_us)
> data->predicted_us = interactivity_override_us;
>
> And, of course, down inside get_typical_interval()
> ...
> if (!avg)
> return 0;
> ...
> if (likely(stddev <= ULONG_MAX)) {
> ...
> return avg;


Ok, thanks for the suggestion. I will look at reworking this patch.

-- Daniel



2014-10-29 20:50:27

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH V2 3/5] cpuidle: idle: menu: Don't reflect when a state selection failed

On Wednesday, October 29, 2014 09:54:43 AM Kevin Hilman wrote:
> Daniel Lezcano <[email protected]> writes:
>
> > On 10/28/2014 08:01 AM, Preeti Murthy wrote:
> >> On Thu, Oct 23, 2014 at 2:31 PM, Daniel Lezcano
> >> <[email protected]> wrote:
> >>> In the current code, the check to reflect or not the outcoming state is done
> >>> against the idle state which has been chosen and its value.
> >>>
> >>> Instead of doing a check in each of the reflect functions, just don't call reflect
> >>> if something went wrong in the idle path.
> >>>
> >>> Signed-off-by: Daniel Lezcano <[email protected]>
> >>> Acked-by: Nicolas Pitre <[email protected]>
> >>> ---
> >>> drivers/cpuidle/governors/ladder.c | 3 +--
> >>> drivers/cpuidle/governors/menu.c | 4 +---
> >>> kernel/sched/idle.c | 3 ++-
> >>> 3 files changed, 4 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
> >>> index fb396d6..c0b36a8 100644
> >>> --- a/drivers/cpuidle/governors/ladder.c
> >>> +++ b/drivers/cpuidle/governors/ladder.c
> >>> @@ -165,8 +165,7 @@ static int ladder_enable_device(struct cpuidle_driver *drv,
> >>> static void ladder_reflect(struct cpuidle_device *dev, int index)
> >>> {
> >>> struct ladder_device *ldev = &__get_cpu_var(ladder_devices);
> >>> - if (index > 0)
> >>> - ldev->last_state_idx = index;
> >>> + ldev->last_state_idx = index;
> >>> }
> >>>
> >>> static struct cpuidle_governor ladder_governor = {
> >>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> >>> index a17515f..3907301 100644
> >>> --- a/drivers/cpuidle/governors/menu.c
> >>> +++ b/drivers/cpuidle/governors/menu.c
> >>> @@ -365,9 +365,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> >>> static void menu_reflect(struct cpuidle_device *dev, int index)
> >>> {
> >>> struct menu_device *data = &__get_cpu_var(menu_devices);
> >>> - data->last_state_idx = index;
> >>> - if (index >= 0)
> >>> - data->needs_update = 1;
> >>> + data->needs_update = 1;
> >>
> >> Why is the last_state_idx not getting updated ?
> >
> > Oops, right. This is missing.
> >
> > Thanks for pointing this out.
> >
> > By the way, I don't think a back end driver is changing the selected
> > state currently and I am not sure this is desirable since we want to
> > trust the state we are going (as a best effort).
>
> FYI, the OMAP3 backend driver does not always obey the selected
> state, and will return a different state than the one requested.

The ACPI driver can do that as well.
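
Since a backend driver may enter a different state than the one selected, the caller has to reflect the index the driver returned and skip the reflect call when the entry failed. A minimal sketch of that idea, assuming a simplified governor structure and hypothetical helper names (gov_data, idle_once, enter_demotes, enter_fails are not kernel symbols):

```c
#include <assert.h>

/* Simplified governor bookkeeping, modelled on the fields in the patch. */
struct gov_data {
	int last_state_idx;
	int needs_update;
};

/* Governor callback: unconditionally record the state that was entered. */
void gov_reflect(struct gov_data *data, int index)
{
	data->last_state_idx = index;
	data->needs_update = 1;
}

/*
 * Hypothetical driver hook: returns the index of the state actually
 * entered (possibly different from the request) or a negative error.
 */
typedef int (*enter_fn)(int requested);

/* Caller side: reflect only on success, using the entered index. */
int idle_once(struct gov_data *data, enter_fn enter, int selected)
{
	int entered;

	assert(data && enter);

	entered = enter(selected);
	if (entered >= 0)
		gov_reflect(data, entered);

	return entered;
}

/* Example backend that demotes the request by one state. */
int enter_demotes(int requested)
{
	return requested > 0 ? requested - 1 : 0;
}

/* Example backend whose state entry fails. */
int enter_fails(int requested)
{
	(void)requested;
	return -1;
}
```

The check for a valid index lives in one place in the caller, so the reflect functions of the governors no longer need their own test.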

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2014-11-05 14:28:42

by Daniel Lezcano

Subject: Re: [PATCH V2 1/5] sched: idle: cpuidle: Check the latency req before idle

On 10/29/2014 03:01 AM, Preeti U Murthy wrote:
> On 10/29/2014 12:29 AM, Daniel Lezcano wrote:
>> On 10/28/2014 04:51 AM, Preeti Murthy wrote:
>>> Hi Daniel,
>>>
>>> On Thu, Oct 23, 2014 at 2:31 PM, Daniel Lezcano
>>> <[email protected]> wrote:
>>>> When the pmqos latency requirement is set to zero that means "poll in
>>>> all the
>>>> cases".
>>>>
>>>> That is correctly implemented on x86 but not on the other archs.
>>>>
>>>> As how is written the code, if the latency request is zero, the
>>>> governor will
>>>> return zero, so corresponding, for x86, to the poll function, but for
>>>> the
>>>> others arch the default idle function. For example, on ARM this is
>>>> wait-for-
>>>> interrupt with a latency of '1', so violating the constraint.
>>>
>>> This is not true actually. On PowerPC the idle state 0 has an
>>> exit_latency of 0.
>>>
>>>>
>>>> In order to fix that, do the latency requirement check *before*
>>>> calling the
>>>> cpuidle framework in order to jump to the poll function without entering
>>>> cpuidle. That has several benefits:
>>>
>>> Doing so actually hurts on PowerPC. Because the idle loop defined for
>>> idle state 0 is different from what cpu_relax() does in cpu_idle_loop().
>>> The spinning is more power efficient in the former case. Moreover we
>>> also set
>>> certain register values which indicate an idle cpu. The ppc_runlatch bits
>>> do precisely this. These register values are being read by some user
>>> space
>>> tools. So we will end up breaking them with this patch
>>>
>>> My suggestion is very well keep the latency requirement check in
>>> kernel/sched/idle.c
>>> like your doing in this patch. But before jumping to cpu_idle_loop
>>> verify if the
>>> idle state 0 has an exit_latency > 0 in addition to your check on the
>>> latency_req == 0.
>>> If not, you can fall through to the regular path of calling into the
>>> cpuidle driver.
>>> The scheduler can query the cpuidle_driver structure anyway.
>>>
>>> What do you think?
>>
>> Thanks for reviewing the patch and spotting this.
>>
>> Wouldn't make sense to create:
>>
>> void __weak_cpu_idle_poll(void) ?
>>
>> and override it with your specific poll function ?
>>
>
> No this would become ugly as far as I can see. A weak function has to be
> defined under arch/* code. We will either need to duplicate the idle
> loop that we already have in the drivers or point the weak function to
> the first idle state defined by our driver. Both of which is not
> desirable (calling into the driver from arch code is ugly). Another
> reason why I don't like the idea of a weak function is that if you have
> missed looking at a specific driver and they have an idle loop with
> features similar to on powerpc, you will have to spot it yourself and
> include the arch specific cpu_idle_poll() for them.

Yes, I agree this is a fair point. But actually I don't see the point
of having the poll loop in the cpuidle driver. These cleanups are
preparing the removal of the CPUIDLE_DRIVER_STATE_START macro, which
leads to a lot of mess in the cpuidle code.

With the removal of this macro, we should be able to move the select
loop out of the menu governor and use it everywhere else. Furthermore,
this state is flagged with TIME_VALID although its time is not valid:
the local interrupts are enabled, so we are measuring the interrupt
processing time. Besides that, the idle loop for x86 is mostly not used.

So the idea would be to extract those idle loops from the drivers and use
them directly when:
1. the idle selection fails (use the poll loop under certain
circumstances we have to redefine)
2. the latency req is zero

That will result in cleaner code in cpuidle and in the governor.

Do you agree with that?
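
The two cases listed above can be sketched as a small decision function. This is a user-space model, not the code from the patch: idle_decide(), select_fn and the two sample governors are hypothetical names, and the "certain circumstances" of case 1 are reduced here to "the governor returned a negative index".

```c
#include <assert.h>

enum idle_action { IDLE_POLL, IDLE_ENTER_STATE };

/* Hypothetical governor hook: a state index, or negative if none fits. */
typedef int (*select_fn)(int latency_req);

/*
 * Decide what the idle path does. On IDLE_ENTER_STATE the chosen index
 * is stored in *state; on IDLE_POLL *state is left untouched.
 */
enum idle_action idle_decide(int latency_req, select_fn select_state,
			     int *state)
{
	int idx;

	assert(select_state && state);

	/* Case 2: zero latency requirement, never ask the governor. */
	if (latency_req == 0)
		return IDLE_POLL;

	/* Case 1: the selection failed, fall back to polling. */
	idx = select_state(latency_req);
	if (idx < 0)
		return IDLE_POLL;

	*state = idx;
	return IDLE_ENTER_STATE;
}

/* Sample governor: deeper state when the latency budget allows it. */
int select_ok(int latency_req)
{
	return latency_req >= 100 ? 2 : 1;
}

/* Sample governor that finds no suitable state. */
int select_none(int latency_req)
{
	(void)latency_req;
	return -1;
}
```

Because the zero latency check comes first, the cost of the governor computation is avoided entirely in that case.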

> But by having a check on the exit_latency, you are claiming that since
> the driver's 0th idle state is no better than the generic idle loop in
> cases of 0 latency req, we are better off calling the latter, which
> looks reasonable. That way you don't have to bother about worsening the
> idle loop behavior on any other driver.






2014-11-05 21:37:07

by Rafael J. Wysocki

Subject: Re: [PATCH V2 1/5] sched: idle: cpuidle: Check the latency req before idle

On Thursday, October 23, 2014 11:01:17 AM Daniel Lezcano wrote:
> When the pmqos latency requirement is set to zero that means "poll in all the
> cases".
>
> That is correctly implemented on x86 but not on the other archs.
>
> As how is written the code, if the latency request is zero, the governor will
> return zero, so corresponding, for x86, to the poll function, but for the
> others arch the default idle function. For example, on ARM this is wait-for-
> interrupt with a latency of '1', so violating the constraint.
>
> In order to fix that, do the latency requirement check *before* calling the
> cpuidle framework in order to jump to the poll function without entering
> cpuidle. That has several benefits:
>
> 1. It clarifies and unifies the code
> 2. It fixes x86 vs other archs behavior
> 3. Factors out the call to the same function
> 4. Prevent to enter the cpuidle framework with its expensive cost in
> calculation
>
> As the latency_req is needed in all the cases, change the select API to take
> the latency_req as parameter in case it is not equal to zero.
>
> As a positive side effect, it introduces the latency constraint specified
> externally, so one more step to the cpuidle/scheduler integration.

I'm expecting to see a new version of this patchset relatively soon.

Are you planning to send one?

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2014-11-05 21:41:13

by Daniel Lezcano

Subject: Re: [PATCH V2 1/5] sched: idle: cpuidle: Check the latency req before idle

On 11/05/2014 10:57 PM, Rafael J. Wysocki wrote:
> On Thursday, October 23, 2014 11:01:17 AM Daniel Lezcano wrote:
>> When the pmqos latency requirement is set to zero that means "poll in all the
>> cases".
>>
>> That is correctly implemented on x86 but not on the other archs.
>>
>> As how is written the code, if the latency request is zero, the governor will
>> return zero, so corresponding, for x86, to the poll function, but for the
>> others arch the default idle function. For example, on ARM this is wait-for-
>> interrupt with a latency of '1', so violating the constraint.
>>
>> In order to fix that, do the latency requirement check *before* calling the
>> cpuidle framework in order to jump to the poll function without entering
>> cpuidle. That has several benefits:
>>
>> 1. It clarifies and unifies the code
>> 2. It fixes x86 vs other archs behavior
>> 3. Factors out the call to the same function
>> 4. Prevent to enter the cpuidle framework with its expensive cost in
>> calculation
>>
>> As the latency_req is needed in all the cases, change the select API to take
>> the latency_req as parameter in case it is not equal to zero.
>>
>> As a positive side effect, it introduces the latency constraint specified
>> externally, so one more step to the cpuidle/scheduler integration.
>
> I'm expecting to see a new version of this patchset relatively soon.
>
> Are you planning to send one?

I would like to find an agreement with Preeti. But, yes, I am on it.

-- Daniel


2014-11-06 04:08:35

by Preeti U Murthy

Subject: Re: [PATCH V2 1/5] sched: idle: cpuidle: Check the latency req before idle

On 11/05/2014 07:58 PM, Daniel Lezcano wrote:
> On 10/29/2014 03:01 AM, Preeti U Murthy wrote:
>> On 10/29/2014 12:29 AM, Daniel Lezcano wrote:
>>> On 10/28/2014 04:51 AM, Preeti Murthy wrote:
>>>> Hi Daniel,
>>>>
>>>> On Thu, Oct 23, 2014 at 2:31 PM, Daniel Lezcano
>>>> <[email protected]> wrote:
>>>>> When the pmqos latency requirement is set to zero that means "poll in
>>>>> all the
>>>>> cases".
>>>>>
>>>>> That is correctly implemented on x86 but not on the other archs.
>>>>>
>>>>> As how is written the code, if the latency request is zero, the
>>>>> governor will
>>>>> return zero, so corresponding, for x86, to the poll function, but for
>>>>> the
>>>>> others arch the default idle function. For example, on ARM this is
>>>>> wait-for-
>>>>> interrupt with a latency of '1', so violating the constraint.
>>>>
>>>> This is not true actually. On PowerPC the idle state 0 has an
>>>> exit_latency of 0.
>>>>
>>>>>
>>>>> In order to fix that, do the latency requirement check *before*
>>>>> calling the
>>>>> cpuidle framework in order to jump to the poll function without
>>>>> entering
>>>>> cpuidle. That has several benefits:
>>>>
>>>> Doing so actually hurts on PowerPC. Because the idle loop defined for
>>>> idle state 0 is different from what cpu_relax() does in
>>>> cpu_idle_loop().
>>>> The spinning is more power efficient in the former case. Moreover we
>>>> also set
>>>> certain register values which indicate an idle cpu. The ppc_runlatch
>>>> bits
>>>> do precisely this. These register values are being read by some user
>>>> space
>>>> tools. So we will end up breaking them with this patch
>>>>
>>>> My suggestion is very well keep the latency requirement check in
>>>> kernel/sched/idle.c
>>>> like your doing in this patch. But before jumping to cpu_idle_loop
>>>> verify if the
>>>> idle state 0 has an exit_latency > 0 in addition to your check on the
>>>> latency_req == 0.
>>>> If not, you can fall through to the regular path of calling into the
>>>> cpuidle driver.
>>>> The scheduler can query the cpuidle_driver structure anyway.
>>>>
>>>> What do you think?
>>>
>>> Thanks for reviewing the patch and spotting this.
>>>
>>> Wouldn't make sense to create:
>>>
>>> void __weak_cpu_idle_poll(void) ?
>>>
>>> and override it with your specific poll function ?
>>>
>>
>> No this would become ugly as far as I can see. A weak function has to be
>> defined under arch/* code. We will either need to duplicate the idle
>> loop that we already have in the drivers or point the weak function to
>> the first idle state defined by our driver. Both of which is not
>> desirable (calling into the driver from arch code is ugly). Another
>> reason why I don't like the idea of a weak function is that if you have
>> missed looking at a specific driver and they have an idle loop with
>> features similar to on powerpc, you will have to spot it yourself and
>> include the arch specific cpu_idle_poll() for them.
>
> Yes, I agree this is a fair point. But actually I don't see the interest
> of having the poll loop in the cpuidle driver. These cleanups are

We can't do that simply because the idle poll loop has arch specific
bits on powerpc.

> preparing the removal of the CPUIDLE_DRIVER_STATE_START macro which
> leads to a lot of mess in the cpuidle code.

How is the suggestion to check the exit_latency of idle state 0 when
latency_req == 0 going to hinder this removal?

>
> With the removal of this macro, we should be able to move the select
> loop from the menu governor and use it everywhere else. Furthermore,
> this state which is flagged with TIME_VALID, isn't because the local
> interrupt are enabled so we are measuring the interrupt time processing.
> Beside that the idle loop for x86 is mostly not used.
>
> So the idea would be to extract those idle loop from the drivers and use
> them directly when:
> 1. the idle selection fails (use the poll loop under certain
> circumstances we have to redefine)

This behavior will not change as per my suggestion.

> 2. when the latency req is zero

It's only here that I suggested you also verify state 0's exit_latency,
because the arch may have a more optimized idle poll loop,
which we cannot override with the generic cpuidle poll loop.

Regards
Preeti U Murthy
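
The additional check suggested here can be sketched as a predicate. It is a minimal model with hypothetical types and names (idle_state, idle_driver, should_use_generic_poll); only the exit latency of state 0 matters, and the values for the deeper states in any test data are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down view of a driver's idle state table. */
struct idle_state {
	unsigned int exit_latency;	/* in microseconds */
};

struct idle_driver {
	const struct idle_state *states;
	size_t state_count;
};

/*
 * Use the generic poll loop only when the latency requirement is zero
 * AND state 0 of the driver could not honour it. If state 0 has a zero
 * exit latency (a polling state, as described for PowerPC), fall
 * through to the driver so its arch specific loop is kept.
 */
bool should_use_generic_poll(int latency_req, const struct idle_driver *drv)
{
	assert(latency_req >= 0);

	if (latency_req != 0)
		return false;

	/* No driver or no states: the generic loop is all there is. */
	if (!drv || drv->state_count == 0)
		return true;

	return drv->states[0].exit_latency > 0;
}
```

With a state 0 exit latency of 0 the driver path is taken, while a WFI-like state 0 with an exit latency of 1 is bypassed in favour of the generic poll.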
>
> That will result in a cleaner code in cpuidle and in the governor.
>
> Do you agree with that ?
>
>> But by having a check on the exit_latency, you are claiming that since
>> the driver's 0th idle state is no better than the generic idle loop in
>> cases of 0 latency req, we are better off calling the latter, which
>> looks reasonable. That way you don't have to bother about worsening the
>> idle loop behavior on any other driver.
>
>
>
>
>

2014-11-06 12:27:38

by Daniel Lezcano

Subject: Re: [PATCH V2 1/5] sched: idle: cpuidle: Check the latency req before idle

On 11/06/2014 05:08 AM, Preeti U Murthy wrote:
> On 11/05/2014 07:58 PM, Daniel Lezcano wrote:
>> On 10/29/2014 03:01 AM, Preeti U Murthy wrote:
>>> On 10/29/2014 12:29 AM, Daniel Lezcano wrote:
>>>> On 10/28/2014 04:51 AM, Preeti Murthy wrote:
>>>>> Hi Daniel,
>>>>>
>>>>> On Thu, Oct 23, 2014 at 2:31 PM, Daniel Lezcano
>>>>> <[email protected]> wrote:
>>>>>> When the pmqos latency requirement is set to zero that means "poll in
>>>>>> all the
>>>>>> cases".
>>>>>>
>>>>>> That is correctly implemented on x86 but not on the other archs.
>>>>>>
>>>>>> As how is written the code, if the latency request is zero, the
>>>>>> governor will
>>>>>> return zero, so corresponding, for x86, to the poll function, but for
>>>>>> the
>>>>>> others arch the default idle function. For example, on ARM this is
>>>>>> wait-for-
>>>>>> interrupt with a latency of '1', so violating the constraint.
>>>>>
>>>>> This is not true actually. On PowerPC the idle state 0 has an
>>>>> exit_latency of 0.
>>>>>
>>>>>>
>>>>>> In order to fix that, do the latency requirement check *before*
>>>>>> calling the
>>>>>> cpuidle framework in order to jump to the poll function without
>>>>>> entering
>>>>>> cpuidle. That has several benefits:
>>>>>
>>>>> Doing so actually hurts on PowerPC. Because the idle loop defined for
>>>>> idle state 0 is different from what cpu_relax() does in
>>>>> cpu_idle_loop().
>>>>> The spinning is more power efficient in the former case. Moreover we
>>>>> also set
>>>>> certain register values which indicate an idle cpu. The ppc_runlatch
>>>>> bits
>>>>> do precisely this. These register values are being read by some user
>>>>> space
>>>>> tools. So we will end up breaking them with this patch
>>>>>
>>>>> My suggestion is very well keep the latency requirement check in
>>>>> kernel/sched/idle.c
>>>>> like your doing in this patch. But before jumping to cpu_idle_loop
>>>>> verify if the
>>>>> idle state 0 has an exit_latency > 0 in addition to your check on the
>>>>> latency_req == 0.
>>>>> If not, you can fall through to the regular path of calling into the
>>>>> cpuidle driver.
>>>>> The scheduler can query the cpuidle_driver structure anyway.
>>>>>
>>>>> What do you think?
>>>>
>>>> Thanks for reviewing the patch and spotting this.
>>>>
>>>> Wouldn't make sense to create:
>>>>
>>>> void __weak_cpu_idle_poll(void) ?
>>>>
>>>> and override it with your specific poll function ?
>>>>
>>>
>>> No this would become ugly as far as I can see. A weak function has to be
>>> defined under arch/* code. We will either need to duplicate the idle
>>> loop that we already have in the drivers or point the weak function to
>>> the first idle state defined by our driver. Both of which is not
>>> desirable (calling into the driver from arch code is ugly). Another
>>> reason why I don't like the idea of a weak function is that if you have
>>> missed looking at a specific driver and they have an idle loop with
>>> features similar to on powerpc, you will have to spot it yourself and
>>> include the arch specific cpu_idle_poll() for them.
>>
>> Yes, I agree this is a fair point. But actually I don't see the interest
>> of having the poll loop in the cpuidle driver. These cleanups are
>
> We can't do that simply because the idle poll loop has arch specific
> bits on powerpc.

I am not sure.

Could you describe the difference between the arch_cpu_idle
function in arch/powerpc/kernel/idle.c and the 0th PowerPC idle state?

Is it a kind of duplicate?

And for polling, do you really want to use while (...); cpu_relax(); as
it is x86 specific, instead of powerpc's arch_idle?

Today, if latency_req == 0, it returns the 0th idle state, so polling.

If we jump to arch_cpu_idle_poll, the result will be the same for
all architectures.

>> preparing the removal of the CPUIDLE_DRIVER_STATE_START macro which
>> leads to a lot of mess in the cpuidle code.
>
> How is the suggestion to check the exit_latency of idle state 0 when
> latency_req == 0 going to hinder this removal?

It sounds a bit hackish. I prefer to sort out the current situation.

And by the way, what is the reasoning behind having a target_residency /
exit_latency equal to zero for an idle state ?

All this sounds really fuzzy to me.

>> With the removal of this macro, we should be able to move the select
>> loop from the menu governor and use it everywhere else. Furthermore,
>> this state which is flagged with TIME_VALID, isn't because the local
>> interrupt are enabled so we are measuring the interrupt time processing.
>> Beside that the idle loop for x86 is mostly not used.
>>
>> So the idea would be to extract those idle loop from the drivers and use
>> them directly when:
>> 1. the idle selection fails (use the poll loop under certain
>> circumstances we have to redefine)
>
> This behavior will not change as per my suggestion.
>
>> 2. when the latency req is zero
>
> Its only here that I suggested you also verify state 0's exit_latency.
> For the reason that the arch may have a more optimized idle poll loop,
> which we cannot override with the generic cpuidle poll loop.
>
> Regards
> Preeti U Murthy
>>
>> That will result in a cleaner code in cpuidle and in the governor.
>>
>> Do you agree with that ?
>>
>>> But by having a check on the exit_latency, you are claiming that since
>>> the driver's 0th idle state is no better than the generic idle loop in
>>> cases of 0 latency req, we are better off calling the latter, which
>>> looks reasonable. That way you don't have to bother about worsening the
>>> idle loop behavior on any other driver.
>>
>>
>>
>>
>>
>



2014-11-06 13:42:15

by Daniel Lezcano

Subject: Re: [PATCH V2 1/5] sched: idle: cpuidle: Check the latency req before idle


Preeti,

I am wondering if we aren't heading into a false debate.

If the latency_req is 0, we should just poll and not enter any idle
state, even if one has zero exit latency. With a zero latency req, we
want full reactivity on the system, not to enter an idle state with all
the computation in the menu governor, no?

I agree this patch changes the behavior on PowerPC, but only if the
latency_req is set to zero. I don't think we are worried about power
saving when setting this value.

Couldn't the patch be accepted as it is for the sake of consistency
across all the platforms, and then we optimize cleanly for the special
latency zero case?



2014-11-07 04:23:51

by Preeti U Murthy

Subject: Re: [PATCH V2 1/5] sched: idle: cpuidle: Check the latency req before idle

On 11/06/2014 05:57 PM, Daniel Lezcano wrote:
> On 11/06/2014 05:08 AM, Preeti U Murthy wrote:
>> On 11/05/2014 07:58 PM, Daniel Lezcano wrote:
>>> On 10/29/2014 03:01 AM, Preeti U Murthy wrote:
>>>> On 10/29/2014 12:29 AM, Daniel Lezcano wrote:
>>>>> On 10/28/2014 04:51 AM, Preeti Murthy wrote:
>>>>>> Hi Daniel,
>>>>>>
>>>>>> On Thu, Oct 23, 2014 at 2:31 PM, Daniel Lezcano
>>>>>> <[email protected]> wrote:
>>>>>>> When the pmqos latency requirement is set to zero that means
>>>>>>> "poll in
>>>>>>> all the
>>>>>>> cases".
>>>>>>>
>>>>>>> That is correctly implemented on x86 but not on the other archs.
>>>>>>>
>>>>>>> As how is written the code, if the latency request is zero, the
>>>>>>> governor will
>>>>>>> return zero, so corresponding, for x86, to the poll function, but
>>>>>>> for
>>>>>>> the
>>>>>>> others arch the default idle function. For example, on ARM this is
>>>>>>> wait-for-
>>>>>>> interrupt with a latency of '1', so violating the constraint.
>>>>>>
>>>>>> This is not true actually. On PowerPC the idle state 0 has an
>>>>>> exit_latency of 0.
>>>>>>
>>>>>>>
>>>>>>> In order to fix that, do the latency requirement check *before*
>>>>>>> calling the
>>>>>>> cpuidle framework in order to jump to the poll function without
>>>>>>> entering
>>>>>>> cpuidle. That has several benefits:
>>>>>>
>>>>>> Doing so actually hurts on PowerPC. Because the idle loop defined for
>>>>>> idle state 0 is different from what cpu_relax() does in
>>>>>> cpu_idle_loop().
>>>>>> The spinning is more power efficient in the former case. Moreover we
>>>>>> also set
>>>>>> certain register values which indicate an idle cpu. The ppc_runlatch
>>>>>> bits
>>>>>> do precisely this. These register values are being read by some user
>>>>>> space
>>>>>> tools. So we will end up breaking them with this patch
>>>>>>
>>>>>> My suggestion is very well keep the latency requirement check in
>>>>>> kernel/sched/idle.c
>>>>>> like your doing in this patch. But before jumping to cpu_idle_loop
>>>>>> verify if the
>>>>>> idle state 0 has an exit_latency > 0 in addition to your check on the
>>>>>> latency_req == 0.
>>>>>> If not, you can fall through to the regular path of calling into the
>>>>>> cpuidle driver.
>>>>>> The scheduler can query the cpuidle_driver structure anyway.
>>>>>>
>>>>>> What do you think?
>>>>>
>>>>> Thanks for reviewing the patch and spotting this.
>>>>>
>>>>> Wouldn't make sense to create:
>>>>>
>>>>> void __weak_cpu_idle_poll(void) ?
>>>>>
>>>>> and override it with your specific poll function ?
>>>>>
>>>>
>>>> No this would become ugly as far as I can see. A weak function has
>>>> to be
>>>> defined under arch/* code. We will either need to duplicate the idle
>>>> loop that we already have in the drivers or point the weak function to
>>>> the first idle state defined by our driver. Both of which is not
>>>> desirable (calling into the driver from arch code is ugly). Another
>>>> reason why I don't like the idea of a weak function is that if you have
>>>> missed looking at a specific driver and they have an idle loop with
>>>> features similar to on powerpc, you will have to spot it yourself and
>>>> include the arch specific cpu_idle_poll() for them.
>>>
>>> Yes, I agree this is a fair point. But actually I don't see the interest
>>> of having the poll loop in the cpuidle driver. These cleanups are
>>
>> We can't do that simply because the idle poll loop has arch specific
>> bits on powerpc.
>
> I am not sure.
>
> Could you describe what is the difference between the arch_cpu_idle
> function in arch/arm/powerpc/kernel/idle.c and the 0th power PC idle
> state ?

arch_cpu_idle() is the arch specific idle routine. It goes into a deeper
idle state. I am guessing you meant to ask about the difference between
the PowerPC 0th idle state and the polling logic in cpu_idle_poll().

The 0th idle state is also a polling loop. Additionally, it sets a couple
of registers to indicate idleness.

>
> Is it kind of duplicate ?
>
> And for polling, do you really want to use while (...); cpu_relax(); as
> it is x86 specific ? instead of the powerpc's arch_idle ?
>
> Today, if latency_req == 0, it returns the 0th idle state, so polling.
>
> If we jump to the arch_cpu_idle_poll, the result will be the same for
> all architecture.


So you propose creating a weak arch_cpu_idle_poll()? OK, if it is going
to make the cleanup easier, go ahead. I can add arch_cpu_idle_poll() in
the core code on powerpc.

>
>>> preparing the removal of the CPUIDLE_DRIVER_STATE_START macro which
>>> leads to a lot of mess in the cpuidle code.
>>
>> How is the suggestion to check the exit_latency of idle state 0 when
>> latency_req == 0 going to hinder this removal?
>
> It sounds a bit hackish. I prefer to sort out the current situation.
>
> And by the way, what is the reasoning behind having a target_residency /
> exit_latency equal to zero for an idle state ?

It's a polling idle state, hence the exit_latency is 0.

Regards
Preeti U Murthy

2014-11-07 04:29:40

by Preeti U Murthy

Subject: Re: [PATCH V2 1/5] sched: idle: cpuidle: Check the latency req before idle

On 11/06/2014 07:12 PM, Daniel Lezcano wrote:
>
> Preeti,
>
> I am wondering if we aren't heading into a false debate.
>
> If the latency_req is 0, we should just poll and not enter any idle
> state even if one has zero exit latency. With a zero latency req, we
> want full reactivity on the system, not enter an idle state with all the
> computation in the menu governor, no ?
>
> I agree this patch changes the behavior on PowerPC, but only if the
> latency_req is set to zero. I don't think we are worried about power
> saving when setting this value.
>
> Couldn't the patch be accepted as it is for the sake of consistency on
> all the platforms and then we optimize cleanly for the special latency
> zero case ?

Alright Daniel, you can go ahead. I was thinking this patch through and
now realize that, as you point out, the logic will only get more
complicated with all the additional hacks.
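
For illustration, the check discussed in the quoted text (poll directly
when latency_req is 0, and only consult the governor otherwise) could be
sketched in user-space C roughly as follows. This is not the kernel code
from the patch: choose_idle(), governor_select(), struct idle_decision and
the exit latency table are hypothetical stand-ins.

```c
/* Hypothetical model of the decision, not the kernel implementation. */
enum idle_action { IDLE_POLL, IDLE_ENTER_STATE };

struct idle_decision {
	enum idle_action action;
	int state;		/* index of the chosen state, -1 when polling */
};

/* Assumed exit latencies (in us); state 0 has a latency of 1, like WFI on ARM. */
static const unsigned int exit_latency_us[] = { 1, 50, 300 };

/* Counts governor invocations so the shortcut can be observed. */
int governor_calls;

/* Stand-in governor: deepest state whose exit latency fits the request. */
static int governor_select(unsigned int latency_req)
{
	int i, best = 0;

	governor_calls++;
	for (i = 0; i < 3; i++)
		if (exit_latency_us[i] <= latency_req)
			best = i;
	return best;
}

/* Check the latency requirement *before* involving the governor. */
struct idle_decision choose_idle(unsigned int latency_req)
{
	struct idle_decision d = { IDLE_POLL, -1 };

	if (latency_req == 0)
		return d;	/* poll, the governor is never consulted */

	d.action = IDLE_ENTER_STATE;
	d.state = governor_select(latency_req);
	return d;
}
```

With a request of 0 the governor counter stays untouched, which is the
point of doing the check first: no selection cost is paid when full
reactivity is requested.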

But would it be possible to add the weak arch_cpu_idle_poll() call for
the cases where the latency requirement is 0, like you had suggested
earlier? This would ensure the polling logic does not break on PowerPC
and we don't even bother the governor. I will add the function in the
core PowerPC code. If an arch does not define this function, it will
fall back to cpu_idle_poll(). Fair enough?
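
As a rough sketch of the weak-function fallback proposed here, assuming a
GCC/Clang toolchain that supports __attribute__((weak)): the name
arch_cpu_idle_poll() is the one used earlier in this thread, while
need_resched_flag, generic_poll_calls and idle_with_zero_latency() are
hypothetical helpers for a user-space model, not kernel symbols.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the scheduler's "work is pending" flag. */
volatile bool need_resched_flag;

/* Counts runs of the generic loop, only to make the fallback visible. */
int generic_poll_calls;

/* Generic polling loop: spin until there is work to do. */
void cpu_idle_poll(void)
{
	generic_poll_calls++;
	while (!need_resched_flag)
		;	/* the kernel would call cpu_relax() here */
}

/*
 * Weak default: used only when no architecture provides a strong
 * definition of arch_cpu_idle_poll() in another object file.
 */
__attribute__((weak)) void arch_cpu_idle_poll(void)
{
	cpu_idle_poll();
}

/* Idle path for latency_req == 0: no governor involved, just poll. */
void idle_with_zero_latency(void)
{
	arch_cpu_idle_poll();
}
```

An architecture that needs extra steps around polling would supply its own
non-weak arch_cpu_idle_poll(), and the linker would pick that one instead
of the default above.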

Regards
Preeti U Murthy

2014-11-07 09:41:43

by Daniel Lezcano

Subject: Re: [PATCH V2 1/5] sched: idle: cpuidle: Check the latency req before idle

On 11/07/2014 05:29 AM, Preeti U Murthy wrote:
> On 11/06/2014 07:12 PM, Daniel Lezcano wrote:
>>
>> Preeti,
>>
>> I am wondering if we aren't heading into a false debate.
>>
>> If the latency_req is 0, we should just poll and not enter any idle
>> state even if one has zero exit latency. With a zero latency req, we
>> want full reactivity on the system, not enter an idle state with all the
>> computation in the menu governor, no ?
>>
>> I agree this patch changes the behavior on PowerPC, but only if the
>> latency_req is set to zero. I don't think we are worried about power
>> saving when setting this value.
>>
>> Couldn't the patch be accepted as it is for the sake of consistency on
>> all the platforms and then we optimize cleanly for the special latency
>> zero case ?
>
> Alright Daniel, you can go ahead. I was thinking this patch through and
> now realize that, as you point out, the logic will only get more
> complicated with all the additional hacks.
>
> But would it be possible to add the weak arch_cpu_idle_poll() call for
> the cases where the latency requirement is 0, like you had suggested
> earlier? This would ensure the polling logic does not break on PowerPC
> and we don't even bother the governor. I will add the function in the
> core PowerPC code. If an arch does not define this function, it will
> fall back to cpu_idle_poll(). Fair enough?

Yes, sounds good.

I will add the weak function as the first patch in the series.

Thanks for your reviews.

-- Daniel
