2012-10-16 22:40:28

by Julius Werner

[permalink] [raw]
Subject: [PATCH] cpuidle: reinitialize power_usage values when adding/removing C-states

When cpuidle drivers do not supply explicit power_usage values,
cpuidle/driver.c inserts dummy values instead. When a running processor
dynamically gains new C-states (e.g. after ACPI events), the power_usage
values of those states will stay uninitialized, and cpuidle governors
will never choose to enter them.

This patch moves the dummy value initialization from
cpuidle_register_driver to cpuidle_enable_device, which drivers such as
acpi/processor_idle.c will call again when they add or remove C-states.
Tested and verified on an Acer AC700 Chromebook, which supports the C3
state only when it switches from AC to battery power.

Signed-off-by: Julius Werner <[email protected]>
---
drivers/cpuidle/cpuidle.c | 24 ++++++++++++++++++++++++
drivers/cpuidle/driver.c | 25 -------------------------
2 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 7f15b85..bef3a31 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -298,6 +298,28 @@ static void poll_idle_init(struct cpuidle_driver *drv)
static void poll_idle_init(struct cpuidle_driver *drv) {}
#endif /* CONFIG_ARCH_HAS_CPU_RELAX */

+static void set_power_states(struct cpuidle_driver *drv)
+{
+ int i;
+
+ /*
+ * cpuidle driver should set the drv->power_specified bit
+ * before registering if the driver provides
+ * power_usage numbers.
+ *
+ * If power_specified is not set,
+ * we fill in power_usage with decreasing values as the
+ * cpuidle code has an implicit assumption that state Cn
+ * uses less power than C(n-1).
+ *
+ * With CONFIG_ARCH_HAS_CPU_RELAX, C0 is already assigned
+ * an power value of -1. So we use -2, -3, etc, for other
+ * c-states.
+ */
+ for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++)
+ drv->states[i].power_usage = -1 - i;
+}
+
/**
* cpuidle_enable_device - enables idle PM for a CPU
* @dev: the CPU
@@ -330,6 +352,8 @@ int cpuidle_enable_device(struct cpuidle_device *dev)
cpuidle_enter_tk : cpuidle_enter;

poll_idle_init(drv);
+ if (!drv->power_specified)
+ set_power_states(drv);

if ((ret = cpuidle_add_state_sysfs(dev)))
return ret;
diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index 87db387..caaed27 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -18,28 +18,6 @@ static struct cpuidle_driver *cpuidle_curr_driver;
DEFINE_SPINLOCK(cpuidle_driver_lock);
int cpuidle_driver_refcount;

-static void set_power_states(struct cpuidle_driver *drv)
-{
- int i;
-
- /*
- * cpuidle driver should set the drv->power_specified bit
- * before registering if the driver provides
- * power_usage numbers.
- *
- * If power_specified is not set,
- * we fill in power_usage with decreasing values as the
- * cpuidle code has an implicit assumption that state Cn
- * uses less power than C(n-1).
- *
- * With CONFIG_ARCH_HAS_CPU_RELAX, C0 is already assigned
- * an power value of -1. So we use -2, -3, etc, for other
- * c-states.
- */
- for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++)
- drv->states[i].power_usage = -1 - i;
-}
-
/**
* cpuidle_register_driver - registers a driver
* @drv: the driver
@@ -58,9 +36,6 @@ int cpuidle_register_driver(struct cpuidle_driver *drv)
return -EBUSY;
}

- if (!drv->power_specified)
- set_power_states(drv);
-
cpuidle_curr_driver = drv;

spin_unlock(&cpuidle_driver_lock);
--
1.7.8.6


2012-10-17 10:31:22

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH] cpuidle: reinitialize power_usage values when adding/removing C-states

On 10/17/2012 12:39 AM, Julius Werner wrote:
> When cpuidle drivers do not supply explicit power_usage values,
> cpuidle/driver.c inserts dummy values instead. When a running processor
> dynamically gains new C-states (e.g. after ACPI events), the power_usage
> values of those states will stay uninitialized, and cpuidle governors
> will never choose to enter them.
>
> This patch moves the dummy value initialization from
> cpuidle_register_driver to cpuidle_enable_device, which drivers such as
> acpi/processor_idle.c will call again when they add or remove C-states.
> Tested and verified on an Acer AC700 Chromebook, which supports the C3
> state only when it switches from AC to battery power.
>
> Signed-off-by: Julius Werner <[email protected]>
> ---

This is specific to the acpi and should be handled in the
processor_idle.c file instead of the cpuidle core code.

Could be the function 'acpi_processor_cst_has_changed' the right place
to set a dummy power value for the power in the new C-state ?




> drivers/cpuidle/cpuidle.c | 24 ++++++++++++++++++++++++
> drivers/cpuidle/driver.c | 25 -------------------------
> 2 files changed, 24 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 7f15b85..bef3a31 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -298,6 +298,28 @@ static void poll_idle_init(struct cpuidle_driver *drv)
> static void poll_idle_init(struct cpuidle_driver *drv) {}
> #endif /* CONFIG_ARCH_HAS_CPU_RELAX */
>
> +static void set_power_states(struct cpuidle_driver *drv)
> +{
> + int i;
> +
> + /*
> + * cpuidle driver should set the drv->power_specified bit
> + * before registering if the driver provides
> + * power_usage numbers.
> + *
> + * If power_specified is not set,
> + * we fill in power_usage with decreasing values as the
> + * cpuidle code has an implicit assumption that state Cn
> + * uses less power than C(n-1).
> + *
> + * With CONFIG_ARCH_HAS_CPU_RELAX, C0 is already assigned
> + * an power value of -1. So we use -2, -3, etc, for other
> + * c-states.
> + */
> + for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++)
> + drv->states[i].power_usage = -1 - i;
> +}
> +
> /**
> * cpuidle_enable_device - enables idle PM for a CPU
> * @dev: the CPU
> @@ -330,6 +352,8 @@ int cpuidle_enable_device(struct cpuidle_device *dev)
> cpuidle_enter_tk : cpuidle_enter;
>
> poll_idle_init(drv);
> + if (!drv->power_specified)
> + set_power_states(drv);
>
> if ((ret = cpuidle_add_state_sysfs(dev)))
> return ret;
> diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
> index 87db387..caaed27 100644
> --- a/drivers/cpuidle/driver.c
> +++ b/drivers/cpuidle/driver.c
> @@ -18,28 +18,6 @@ static struct cpuidle_driver *cpuidle_curr_driver;
> DEFINE_SPINLOCK(cpuidle_driver_lock);
> int cpuidle_driver_refcount;
>
> -static void set_power_states(struct cpuidle_driver *drv)
> -{
> - int i;
> -
> - /*
> - * cpuidle driver should set the drv->power_specified bit
> - * before registering if the driver provides
> - * power_usage numbers.
> - *
> - * If power_specified is not set,
> - * we fill in power_usage with decreasing values as the
> - * cpuidle code has an implicit assumption that state Cn
> - * uses less power than C(n-1).
> - *
> - * With CONFIG_ARCH_HAS_CPU_RELAX, C0 is already assigned
> - * an power value of -1. So we use -2, -3, etc, for other
> - * c-states.
> - */
> - for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++)
> - drv->states[i].power_usage = -1 - i;
> -}
> -
> /**
> * cpuidle_register_driver - registers a driver
> * @drv: the driver
> @@ -58,9 +36,6 @@ int cpuidle_register_driver(struct cpuidle_driver *drv)
> return -EBUSY;
> }
>
> - if (!drv->power_specified)
> - set_power_states(drv);
> -
> cpuidle_curr_driver = drv;
>
> spin_unlock(&cpuidle_driver_lock);


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2012-10-17 10:44:31

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH] cpuidle: reinitialize power_usage values when adding/removing C-states

On 10/17/2012 12:31 PM, Daniel Lezcano wrote:
> On 10/17/2012 12:39 AM, Julius Werner wrote:
>> When cpuidle drivers do not supply explicit power_usage values,
>> cpuidle/driver.c inserts dummy values instead. When a running processor
>> dynamically gains new C-states (e.g. after ACPI events), the power_usage
>> values of those states will stay uninitialized, and cpuidle governors
>> will never choose to enter them.
>>
>> This patch moves the dummy value initialization from
>> cpuidle_register_driver to cpuidle_enable_device, which drivers such as
>> acpi/processor_idle.c will call again when they add or remove C-states.
>> Tested and verified on an Acer AC700 Chromebook, which supports the C3
>> state only when it switches from AC to battery power.
>>
>> Signed-off-by: Julius Werner <[email protected]>
>> ---
> This is specific to the acpi and should be handled in the
> processor_idle.c file instead of the cpuidle core code.
>
> Could be the function 'acpi_processor_cst_has_changed' the right place
> to set a dummy power value for the power in the new C-state ?

btw, good catch ! :)



--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2012-10-17 18:43:37

by Julius Werner

[permalink] [raw]
Subject: Re: [PATCH] cpuidle: reinitialize power_usage values when adding/removing C-states

> This is specific to the acpi and should be handled in the
> processor_idle.c file instead of the cpuidle core code.
>
> Could be the function 'acpi_processor_cst_has_changed' the right place
> to set a dummy power value for the power in the new C-state ?

Thanks for your feedback. I think it wouldn't be wise to split the
dummy power value logic over two places, but I could submit a patch
that makes set_power_states globally accessible and calls it from
acpi_processor_cst_has_changed instead.

However, I do not think this should really be ACPI specific. It
applies to any cpuidle driver that wants to change its idle states at
runtime. Currently only the ACPI one does, but the future might bring
others that would run into the same problem. I also think that
set_power_states fits much better into cpuidle_enable_device
conceptually anyway (right next to poll_idle_init which also does
state initialization).

Let me know what you think.

2012-10-18 08:21:59

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH] cpuidle: reinitialize power_usage values when adding/removing C-states

On 10/17/2012 08:43 PM, Julius Werner wrote:
>> This is specific to the acpi and should be handled in the
>> processor_idle.c file instead of the cpuidle core code.
>>
>> Could be the function 'acpi_processor_cst_has_changed' the right place
>> to set a dummy power value for the power in the new C-state ?
>
> Thanks for your feedback. I think it wouldn't be wise to split the
> dummy power value logic over two places, but I could submit a patch
> that makes set_power_states globally accessible and calls it from
> acpi_processor_cst_has_changed instead.

No please, do not export this function. That will add more confusion on
the acpi code and more generally in the cpuidle core code.

IIUC, a new state is inserted/deleted and we will set the entire array
of states to setup the power.

You have the acpi_processor_cst_has_changed function calling the
acpi_processor_setup_cpuidle_states function. It seems to be a good
candidate to setup the power of the new state. All the states are filled
again AFAICS under the lock.

> However, I do not think this should really be ACPI specific. It
> applies to any cpuidle driver that wants to change its idle states at
> runtime. Currently only the ACPI one does, but the future might bring
> others that would run into the same problem. I also think that
> set_power_states fits much better into cpuidle_enable_device
> conceptually anyway (right next to poll_idle_init which also does
> state initialization).

The states are now part of the cpuidle driver and the set_power_states
should remain to this file. The dynamic C-states brought some complexity
in the acpi code and honestly this code is very confusing.

Maybe one day, that would make sense but until then I am in favor of
keeping the arch specific bits in the drivers, especially when they are
*hum* so "complex".

poll_idle_init looks hackish for me and probably move it to the arch
would also make sense.

Thanks
-- Daniel

--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2012-10-19 21:52:14

by Julius Werner

[permalink] [raw]
Subject: [PATCH] acpi/cpuidle: reinitialize power_usage values when adding/removing C-states

When cpuidle drivers do not supply explicit power_usage values,
cpuidle/driver.c inserts dummy values instead. When a running processor
dynamically gains new C-states (e.g. after ACPI events), the power_usage
values of those states will stay uninitialized, and cpuidle governors
will never choose to enter them.

This patch ensures that the ACPI cpuidle driver sets those dummy power
values itself whenever it (re-)initializes its idle states.
Tested and verified on an Acer AC700 Chromebook, which supports the C3
state only when it switches from AC to battery power.

Signed-off-by: Julius Werner <[email protected]>
---
drivers/acpi/processor_idle.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index e8086c7..078e22f 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -1090,6 +1090,9 @@ static int acpi_processor_setup_cpuidle_states(struct acpi_processor *pr)
state->exit_latency = cx->latency;
state->target_residency = cx->latency * latency_factor;

+ /* reinitialize this in case new states are added after boot */
+ state->power_usage = -1 - count;
+
state->flags = 0;
switch (cx->type) {
case ACPI_STATE_C1:
--
1.7.8.6

2012-10-20 21:50:34

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH] acpi/cpuidle: reinitialize power_usage values when adding/removing C-states

On 10/19/2012 11:50 PM, Julius Werner wrote:
> When cpuidle drivers do not supply explicit power_usage values,
> cpuidle/driver.c inserts dummy values instead. When a running processor
> dynamically gains new C-states (e.g. after ACPI events), the power_usage
> values of those states will stay uninitialized, and cpuidle governors
> will never choose to enter them.
>
> This patch ensures that the ACPI cpuidle driver sets those dummy power
> values itself whenever it (re-)initializes its idle states.
> Tested and verified on an Acer AC700 Chromebook, which supports the C3
> state only when it switches from AC to battery power.
>
> Signed-off-by: Julius Werner <[email protected]>
> ---

I am not against this patch but I am wondering if it is not time to do
some cleanup around this.

The flag 'power_specified' is never used in any driver.

And the field 'power_usage' is used only in the tegra3 driver where
logically as power_specified is not set, it will be overwritten at the
init (could someone could check the
/sys/devices/system/cpu/cpu0/cpuidle/state1/power is different from 600
on tegra3 ?)

The drivers define their states in a power consumption descendant order
making de facto the same ordering than the 'set_power_state' function in
driver.c

The governor looks at the power_usage (which is always filled by
'set_power_state').

static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device
*dev)

[ ... ]

if (s->power_usage < power_usage) {
power_usage = s->power_usage;
data->last_state_idx = i;
data->exit_us = s->exit_latency;
}

[ ... ]

Could we just say this is always true because state[i+1] consumes less
than state[i] ?

And then just remove the 'set_power_state' function, and the field
'driver->power_specified' ?

That will cleanup the code and fix this problem, no ?

Thanks
-- Daniel

> drivers/acpi/processor_idle.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index e8086c7..078e22f 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -1090,6 +1090,9 @@ static int acpi_processor_setup_cpuidle_states(struct acpi_processor *pr)
> state->exit_latency = cx->latency;
> state->target_residency = cx->latency * latency_factor;
>
> + /* reinitialize this in case new states are added after boot */
> + state->power_usage = -1 - count;
> +
> state->flags = 0;
> switch (cx->type) {
> case ACPI_STATE_C1:


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2012-10-22 17:13:04

by Julius Werner

[permalink] [raw]
Subject: Re: [PATCH] acpi/cpuidle: reinitialize power_usage values when adding/removing C-states

> Could we just say this is always true because state[i+1] consumes less
> than state[i] ?
>
> And then just remove the 'set_power_state' function, and the field
> 'driver->power_specified' ?
>
> That will cleanup the code and fix this problem, no ?

I totally agree with your analysis. Even if a driver were to set
proper usage values (and the power_specified bit), none of the
existing governors would care about those actual numbers (and since
the vast majority of drivers uses fake values anyway, this is not
likely to change in the future). This seems to be a classic example of
unnecessary over-engineering.

I am mostly interested in getting that bug fixed right now, but
removing unnecessary code is always a good thing. If you think it
would have a good chance of getting merged, I would be happy to draft
up a larger patch that refactors power_usage away completely.

2012-10-22 17:21:23

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH] acpi/cpuidle: reinitialize power_usage values when adding/removing C-states

On 10/22/2012 07:13 PM, Julius Werner wrote:
>> Could we just say this is always true because state[i+1] consumes less
>> than state[i] ?
>>
>> And then just remove the 'set_power_state' function, and the field
>> 'driver->power_specified' ?
>>
>> That will cleanup the code and fix this problem, no ?
>
> I totally agree with your analysis. Even if a driver were to set
> proper usage values (and the power_specified bit), none of the
> existing governors would care about those actual numbers (and since
> the vast majority of drivers uses fake values anyway, this is not
> likely to change in the future). This seems to be a classic example of
> unnecessary over-engineering.
>
> I am mostly interested in getting that bug fixed right now, but
> removing unnecessary code is always a good thing. If you think it
> would have a good chance of getting merged, I would be happy to draft
> up a larger patch that refactors power_usage away completely.

I am in favor of removing the unnecessary code as it fixes a bug also
but I am not a maintainer, so I can't tell if it has a good chance to be
merged as a bug fix.

I think Rafael can tell us what approach he would prefer.

Thanks
-- Daniel


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2012-11-12 20:27:04

by Daniel Lezcano

[permalink] [raw]
Subject: [RFC] cpuidle - remove the power_specified field in the driver

This patch follows the discussion about reinitializing the power usage
when a C-state is added/removed.

https://lkml.org/lkml/2012/10/16/518

We realized the power usage field is never filled and when it is
filled for tegra, the power_specified flag is not set making all these
values to be resetted when the driver is initialized with the set_power_state
function.

Julius and I feel this is over-engineered and the power_specified
flag could be simply removed and continue assuming the states are
backward sorted.

The menu governor select function is simplified as the power is ordered.
Actually the condition is always true with the current code.

The cpuidle_play_dead function is also simplified by doing a reverse lookup
on the array.

The set_power_states function is removed as it does no make sense anymore.

Signed-off-by: Daniel Lezcano <[email protected]>
---
drivers/cpuidle/cpuidle.c | 17 ++++-------------
drivers/cpuidle/driver.c | 25 -------------------------
drivers/cpuidle/governors/menu.c | 8 ++------
include/linux/cpuidle.h | 2 +-
4 files changed, 7 insertions(+), 45 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 711dd83..f983262 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -69,24 +69,15 @@ int cpuidle_play_dead(void)
{
struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
- int i, dead_state = -1;
- int power_usage = -1;
+ int i;

if (!drv)
return -ENODEV;

/* Find lowest-power state that supports long-term idle */
- for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) {
- struct cpuidle_state *s = &drv->states[i];
-
- if (s->power_usage < power_usage && s->enter_dead) {
- power_usage = s->power_usage;
- dead_state = i;
- }
- }
-
- if (dead_state != -1)
- return drv->states[dead_state].enter_dead(dev, dead_state);
+ for (i = drv->state_count; i >= CPUIDLE_DRIVER_STATE_START; i--)
+ if (drv->states[i].play_dead)
+ return drv->states[i].enter_dead(dev, i);

return -ENODEV;
}
diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index 3af841f..bb045f1 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -19,34 +19,9 @@ DEFINE_SPINLOCK(cpuidle_driver_lock);
static void __cpuidle_set_cpu_driver(struct cpuidle_driver *drv, int cpu);
static struct cpuidle_driver * __cpuidle_get_cpu_driver(int cpu);

-static void set_power_states(struct cpuidle_driver *drv)
-{
- int i;
-
- /*
- * cpuidle driver should set the drv->power_specified bit
- * before registering if the driver provides
- * power_usage numbers.
- *
- * If power_specified is not set,
- * we fill in power_usage with decreasing values as the
- * cpuidle code has an implicit assumption that state Cn
- * uses less power than C(n-1).
- *
- * With CONFIG_ARCH_HAS_CPU_RELAX, C0 is already assigned
- * an power value of -1. So we use -2, -3, etc, for other
- * c-states.
- */
- for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++)
- drv->states[i].power_usage = -1 - i;
-}
-
static void __cpuidle_driver_init(struct cpuidle_driver *drv)
{
drv->refcnt = 0;
-
- if (!drv->power_specified)
- set_power_states(drv);
}

static int __cpuidle_register_driver(struct cpuidle_driver *drv, int cpu)
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 2efee27..14eb11f 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -312,7 +312,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
{
struct menu_device *data = &__get_cpu_var(menu_devices);
int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
- int power_usage = -1;
int i;
int multiplier;
struct timespec t;
@@ -383,11 +382,8 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
if (s->exit_latency * multiplier > data->predicted_us)
continue;

- if (s->power_usage < power_usage) {
- power_usage = s->power_usage;
- data->last_state_idx = i;
- data->exit_us = s->exit_latency;
- }
+ data->last_state_idx = i;
+ data->exit_us = s->exit_latency;
}

/* not deepest C-state chosen for low predicted residency */
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 3711b34..24cd103 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -126,9 +126,9 @@ struct cpuidle_driver {
struct module *owner;
int refcnt;

- unsigned int power_specified:1;
/* set to 1 to use the core cpuidle time keeping (for all states). */
unsigned int en_core_tk_irqen:1;
+ /* states array must be ordered in decreasing power consumption */
struct cpuidle_state states[CPUIDLE_STATE_MAX];
int state_count;
int safe_state_index;
--
1.7.5.4

2012-11-12 21:09:43

by Julius Werner

[permalink] [raw]
Subject: Re: [RFC] cpuidle - remove the power_specified field in the driver

Thanks for moving this along, Daniel. I think this is the right
approach... the cpuidle driver shouldn't be more complex than
necessary.

Note that you are starting your loop too high in cpuidle_play_dead...
states[state_count] is not an actual state anymore, it should start at
state_count - 1. Also, I think you can go ahead and do the same
last-to-first loop transformation with immediate return in the menu
governor, for an extra tiny bit of performance.

On Mon, Nov 12, 2012 at 12:26 PM, Daniel Lezcano
<[email protected]> wrote:
> This patch follows the discussion about reinitializing the power usage
> when a C-state is added/removed.
>
> https://lkml.org/lkml/2012/10/16/518
>
> We realized the power usage field is never filled and when it is
> filled for tegra, the power_specified flag is not set making all these
> values to be resetted when the driver is initialized with the set_power_state
> function.
>
> Julius and I feel this is over-engineered and the power_specified
> flag could be simply removed and continue assuming the states are
> backward sorted.
>
> The menu governor select function is simplified as the power is ordered.
> Actually the condition is always true with the current code.
>
> The cpuidle_play_dead function is also simplified by doing a reverse lookup
> on the array.
>
> The set_power_states function is removed as it does no make sense anymore.
>
> Signed-off-by: Daniel Lezcano <[email protected]>
> ---
> drivers/cpuidle/cpuidle.c | 17 ++++-------------
> drivers/cpuidle/driver.c | 25 -------------------------
> drivers/cpuidle/governors/menu.c | 8 ++------
> include/linux/cpuidle.h | 2 +-
> 4 files changed, 7 insertions(+), 45 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 711dd83..f983262 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -69,24 +69,15 @@ int cpuidle_play_dead(void)
> {
> struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
> struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
> - int i, dead_state = -1;
> - int power_usage = -1;
> + int i;
>
> if (!drv)
> return -ENODEV;
>
> /* Find lowest-power state that supports long-term idle */
> - for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) {
> - struct cpuidle_state *s = &drv->states[i];
> -
> - if (s->power_usage < power_usage && s->enter_dead) {
> - power_usage = s->power_usage;
> - dead_state = i;
> - }
> - }
> -
> - if (dead_state != -1)
> - return drv->states[dead_state].enter_dead(dev, dead_state);
> + for (i = drv->state_count; i >= CPUIDLE_DRIVER_STATE_START; i--)
> + if (drv->states[i].play_dead)
> + return drv->states[i].enter_dead(dev, i);
>
> return -ENODEV;
> }
> diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
> index 3af841f..bb045f1 100644
> --- a/drivers/cpuidle/driver.c
> +++ b/drivers/cpuidle/driver.c
> @@ -19,34 +19,9 @@ DEFINE_SPINLOCK(cpuidle_driver_lock);
> static void __cpuidle_set_cpu_driver(struct cpuidle_driver *drv, int cpu);
> static struct cpuidle_driver * __cpuidle_get_cpu_driver(int cpu);
>
> -static void set_power_states(struct cpuidle_driver *drv)
> -{
> - int i;
> -
> - /*
> - * cpuidle driver should set the drv->power_specified bit
> - * before registering if the driver provides
> - * power_usage numbers.
> - *
> - * If power_specified is not set,
> - * we fill in power_usage with decreasing values as the
> - * cpuidle code has an implicit assumption that state Cn
> - * uses less power than C(n-1).
> - *
> - * With CONFIG_ARCH_HAS_CPU_RELAX, C0 is already assigned
> - * an power value of -1. So we use -2, -3, etc, for other
> - * c-states.
> - */
> - for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++)
> - drv->states[i].power_usage = -1 - i;
> -}
> -
> static void __cpuidle_driver_init(struct cpuidle_driver *drv)
> {
> drv->refcnt = 0;
> -
> - if (!drv->power_specified)
> - set_power_states(drv);
> }
>
> static int __cpuidle_register_driver(struct cpuidle_driver *drv, int cpu)
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 2efee27..14eb11f 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -312,7 +312,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> {
> struct menu_device *data = &__get_cpu_var(menu_devices);
> int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
> - int power_usage = -1;
> int i;
> int multiplier;
> struct timespec t;
> @@ -383,11 +382,8 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> if (s->exit_latency * multiplier > data->predicted_us)
> continue;
>
> - if (s->power_usage < power_usage) {
> - power_usage = s->power_usage;
> - data->last_state_idx = i;
> - data->exit_us = s->exit_latency;
> - }
> + data->last_state_idx = i;
> + data->exit_us = s->exit_latency;
> }
>
> /* not deepest C-state chosen for low predicted residency */
> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> index 3711b34..24cd103 100644
> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -126,9 +126,9 @@ struct cpuidle_driver {
> struct module *owner;
> int refcnt;
>
> - unsigned int power_specified:1;
> /* set to 1 to use the core cpuidle time keeping (for all states). */
> unsigned int en_core_tk_irqen:1;
> + /* states array must be ordered in decreasing power consumption */
> struct cpuidle_state states[CPUIDLE_STATE_MAX];
> int state_count;
> int safe_state_index;
> --
> 1.7.5.4
>

2012-11-12 22:08:27

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [RFC] cpuidle - remove the power_specified field in the driver

On 11/12/2012 10:09 PM, Julius Werner wrote:
> Thanks for moving this along, Daniel. I think this is the right
> approach... the cpuidle driver shouldn't be more complex than
> necessary.
>
> Note that you are starting your loop too high in cpuidle_play_dead...
> states[state_count] is not an actual state anymore, it should start at
> state_count - 1.

Yep. Thanks for catching this.

> Also, I think you can go ahead and do the same
> last-to-first loop transformation with immediate return in the menu
> governor, for an extra tiny bit of performance.

Yes, that makes sense.

Thanks for the review.

-- Daniel

> On Mon, Nov 12, 2012 at 12:26 PM, Daniel Lezcano
> <[email protected]> wrote:
>> This patch follows the discussion about reinitializing the power usage
>> when a C-state is added/removed.
>>
>> https://lkml.org/lkml/2012/10/16/518
>>
>> We realized the power usage field is never filled and when it is
>> filled for tegra, the power_specified flag is not set making all these
>> values to be resetted when the driver is initialized with the set_power_state
>> function.
>>
>> Julius and I feel this is over-engineered and the power_specified
>> flag could be simply removed and continue assuming the states are
>> backward sorted.
>>
>> The menu governor select function is simplified as the power is ordered.
>> Actually the condition is always true with the current code.
>>
>> The cpuidle_play_dead function is also simplified by doing a reverse lookup
>> on the array.
>>
>> The set_power_states function is removed as it does no make sense anymore.
>>
>> Signed-off-by: Daniel Lezcano <[email protected]>
>> ---
>> drivers/cpuidle/cpuidle.c | 17 ++++-------------
>> drivers/cpuidle/driver.c | 25 -------------------------
>> drivers/cpuidle/governors/menu.c | 8 ++------
>> include/linux/cpuidle.h | 2 +-
>> 4 files changed, 7 insertions(+), 45 deletions(-)
>>
>> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
>> index 711dd83..f983262 100644
>> --- a/drivers/cpuidle/cpuidle.c
>> +++ b/drivers/cpuidle/cpuidle.c
>> @@ -69,24 +69,15 @@ int cpuidle_play_dead(void)
>> {
>> struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
>> struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
>> - int i, dead_state = -1;
>> - int power_usage = -1;
>> + int i;
>>
>> if (!drv)
>> return -ENODEV;
>>
>> /* Find lowest-power state that supports long-term idle */
>> - for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) {
>> - struct cpuidle_state *s = &drv->states[i];
>> -
>> - if (s->power_usage < power_usage && s->enter_dead) {
>> - power_usage = s->power_usage;
>> - dead_state = i;
>> - }
>> - }
>> -
>> - if (dead_state != -1)
>> - return drv->states[dead_state].enter_dead(dev, dead_state);
>> + for (i = drv->state_count; i >= CPUIDLE_DRIVER_STATE_START; i--)
>> + if (drv->states[i].play_dead)
>> + return drv->states[i].enter_dead(dev, i);
>>
>> return -ENODEV;
>> }
>> diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
>> index 3af841f..bb045f1 100644
>> --- a/drivers/cpuidle/driver.c
>> +++ b/drivers/cpuidle/driver.c
>> @@ -19,34 +19,9 @@ DEFINE_SPINLOCK(cpuidle_driver_lock);
>> static void __cpuidle_set_cpu_driver(struct cpuidle_driver *drv, int cpu);
>> static struct cpuidle_driver * __cpuidle_get_cpu_driver(int cpu);
>>
>> -static void set_power_states(struct cpuidle_driver *drv)
>> -{
>> - int i;
>> -
>> - /*
>> - * cpuidle driver should set the drv->power_specified bit
>> - * before registering if the driver provides
>> - * power_usage numbers.
>> - *
>> - * If power_specified is not set,
>> - * we fill in power_usage with decreasing values as the
>> - * cpuidle code has an implicit assumption that state Cn
>> - * uses less power than C(n-1).
>> - *
>> - * With CONFIG_ARCH_HAS_CPU_RELAX, C0 is already assigned
>> - * an power value of -1. So we use -2, -3, etc, for other
>> - * c-states.
>> - */
>> - for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++)
>> - drv->states[i].power_usage = -1 - i;
>> -}
>> -
>> static void __cpuidle_driver_init(struct cpuidle_driver *drv)
>> {
>> drv->refcnt = 0;
>> -
>> - if (!drv->power_specified)
>> - set_power_states(drv);
>> }
>>
>> static int __cpuidle_register_driver(struct cpuidle_driver *drv, int cpu)
>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>> index 2efee27..14eb11f 100644
>> --- a/drivers/cpuidle/governors/menu.c
>> +++ b/drivers/cpuidle/governors/menu.c
>> @@ -312,7 +312,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>> {
>> struct menu_device *data = &__get_cpu_var(menu_devices);
>> int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
>> - int power_usage = -1;
>> int i;
>> int multiplier;
>> struct timespec t;
>> @@ -383,11 +382,8 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>> if (s->exit_latency * multiplier > data->predicted_us)
>> continue;
>>
>> - if (s->power_usage < power_usage) {
>> - power_usage = s->power_usage;
>> - data->last_state_idx = i;
>> - data->exit_us = s->exit_latency;
>> - }
>> + data->last_state_idx = i;
>> + data->exit_us = s->exit_latency;
>> }
>>
>> /* not deepest C-state chosen for low predicted residency */
>> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
>> index 3711b34..24cd103 100644
>> --- a/include/linux/cpuidle.h
>> +++ b/include/linux/cpuidle.h
>> @@ -126,9 +126,9 @@ struct cpuidle_driver {
>> struct module *owner;
>> int refcnt;
>>
>> - unsigned int power_specified:1;
>> /* set to 1 to use the core cpuidle time keeping (for all states). */
>> unsigned int en_core_tk_irqen:1;
>> + /* states array must be ordered in decreasing power consumption */
>> struct cpuidle_state states[CPUIDLE_STATE_MAX];
>> int state_count;
>> int safe_state_index;
>> --
>> 1.7.5.4
>>


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2012-11-18 08:41:55

by Francesco Lavra

[permalink] [raw]
Subject: Re: [RFC] cpuidle - remove the power_specified field in the driver

Hi,

On 11/12/2012 09:26 PM, Daniel Lezcano wrote:
> This patch follows the discussion about reinitializing the power usage
> when a C-state is added/removed.
>
> https://lkml.org/lkml/2012/10/16/518
>
> We realized the power usage field is never filled and when it is
> filled for tegra, the power_specified flag is not set making all these
> values to be resetted when the driver is initialized with the set_power_state
> function.
>
> Julius and I feel this is over-engineered and the power_specified
> flag could be simply removed and continue assuming the states are
> backward sorted.
>
> The menu governor select function is simplified as the power is ordered.
> Actually the condition is always true with the current code.
>
> The cpuidle_play_dead function is also simplified by doing a reverse lookup
> on the array.
>
> The set_power_states function is removed as it does no make sense anymore.
>
> Signed-off-by: Daniel Lezcano <[email protected]>
> ---
> drivers/cpuidle/cpuidle.c | 17 ++++-------------
> drivers/cpuidle/driver.c | 25 -------------------------
> drivers/cpuidle/governors/menu.c | 8 ++------
> include/linux/cpuidle.h | 2 +-
> 4 files changed, 7 insertions(+), 45 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 711dd83..f983262 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -69,24 +69,15 @@ int cpuidle_play_dead(void)
> {
> struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
> struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
> - int i, dead_state = -1;
> - int power_usage = -1;
> + int i;
>
> if (!drv)
> return -ENODEV;
>
> /* Find lowest-power state that supports long-term idle */
> - for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) {
> - struct cpuidle_state *s = &drv->states[i];
> -
> - if (s->power_usage < power_usage && s->enter_dead) {
> - power_usage = s->power_usage;
> - dead_state = i;
> - }
> - }
> -
> - if (dead_state != -1)
> - return drv->states[dead_state].enter_dead(dev, dead_state);
> + for (i = drv->state_count; i >= CPUIDLE_DRIVER_STATE_START; i--)
> + if (drv->states[i].play_dead)

I guess you meant drv->states[i].enter_dead

> + return drv->states[i].enter_dead(dev, i);

--
Francesco

2012-11-18 09:17:53

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [RFC] cpuidle - remove the power_specified field in the driver

On 11/18/2012 09:40 AM, Francesco Lavra wrote:
> Hi,
>
> On 11/12/2012 09:26 PM, Daniel Lezcano wrote:
>> This patch follows the discussion about reinitializing the power usage
>> when a C-state is added/removed.
>>
>> https://lkml.org/lkml/2012/10/16/518
>>
>> We realized the power usage field is never filled and when it is
>> filled for tegra, the power_specified flag is not set making all these
>> values to be resetted when the driver is initialized with the set_power_state
>> function.
>>
>> Julius and I feel this is over-engineered and the power_specified
>> flag could be simply removed and continue assuming the states are
>> backward sorted.
>>
>> The menu governor select function is simplified as the power is ordered.
>> Actually the condition is always true with the current code.
>>
>> The cpuidle_play_dead function is also simplified by doing a reverse lookup
>> on the array.
>>
>> The set_power_states function is removed as it does no make sense anymore.
>>
>> Signed-off-by: Daniel Lezcano <[email protected]>
>> ---
>> drivers/cpuidle/cpuidle.c | 17 ++++-------------
>> drivers/cpuidle/driver.c | 25 -------------------------
>> drivers/cpuidle/governors/menu.c | 8 ++------
>> include/linux/cpuidle.h | 2 +-
>> 4 files changed, 7 insertions(+), 45 deletions(-)
>>
>> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
>> index 711dd83..f983262 100644
>> --- a/drivers/cpuidle/cpuidle.c
>> +++ b/drivers/cpuidle/cpuidle.c
>> @@ -69,24 +69,15 @@ int cpuidle_play_dead(void)
>> {
>> struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
>> struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
>> - int i, dead_state = -1;
>> - int power_usage = -1;
>> + int i;
>>
>> if (!drv)
>> return -ENODEV;
>>
>> /* Find lowest-power state that supports long-term idle */
>> - for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) {
>> - struct cpuidle_state *s = &drv->states[i];
>> -
>> - if (s->power_usage < power_usage && s->enter_dead) {
>> - power_usage = s->power_usage;
>> - dead_state = i;
>> - }
>> - }
>> -
>> - if (dead_state != -1)
>> - return drv->states[dead_state].enter_dead(dev, dead_state);
>> + for (i = drv->state_count; i >= CPUIDLE_DRIVER_STATE_START; i--)
>> + if (drv->states[i].play_dead)
>
> I guess you meant drv->states[i].enter_dead

Yep :)



--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog