2011-04-20 06:55:26

by Trinabh Gupta

Subject: [RFC PATCH V3 0/4] cpuidle: global registration of idle states with per-cpu statistics

The core change in this series is to split the cpuidle_device structure
into parts that can be global and parts that have to remain per-cpu.
The per-cpu pieces are mostly generic statistics that are independent
of the currently running driver. As a result of these changes, there is a
single copy of the cpuidle_states structure and a single registration done
by one cpu. The low level driver is still free to set per-cpu driver data
on each cpu if needed using cpuidle_set_statedata(), as is the case
today. Asymmetric C-states exist only in very rare cases, and those can be
handled within the cpuidle driver. Most architectures do not have
asymmetric C-states.
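
To make the intended split concrete, here is a minimal userspace sketch, not the kernel code from this series. All sketch_* names, the array sizes and the latency numbers are made up for illustration; only the idea (one read-only state table, per-cpu usage/time counters) comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_STATE_MAX 4
#define SKETCH_NR_CPUS   2

/* Read-only description of an idle state: a single global copy. */
struct sketch_state {
	const char *name;
	unsigned int exit_latency;     /* in us */
	unsigned int target_residency; /* in us */
};

/* Read/write statistics: one copy per cpu and per state. */
struct sketch_state_usage {
	unsigned long long usage; /* number of times the state was entered */
	unsigned long long time;  /* total residency, in us */
};

/* Global part, registered once (hypothetical layout). */
struct sketch_driver {
	int state_count;
	struct sketch_state states[SKETCH_STATE_MAX];
};

/* Per-cpu part (hypothetical layout). */
struct sketch_device {
	unsigned int cpu;
	int last_residency;
	struct sketch_state_usage states_usage[SKETCH_STATE_MAX];
};

/* Illustrative values only, not taken from any real platform. */
static const struct sketch_driver sketch_drv = {
	.state_count = 2,
	.states = {
		{ "C1", 1, 1 },
		{ "C2", 20, 80 },
	},
};

static struct sketch_device sketch_devs[SKETCH_NR_CPUS];

/* Account one idle period against one cpu's private counters. */
static void sketch_account(struct sketch_device *dev, int index,
			   int residency_us)
{
	assert(index >= 0 && index < sketch_drv.state_count);

	dev->last_residency = residency_us;
	dev->states_usage[index].time += (unsigned long long)residency_us;
	dev->states_usage[index].usage++;
}
```

Because sketch_drv is never written after initialization, every cpu can share it; only sketch_devs[] grows with the number of cpus.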

This patch series along with Len Brown's pm_idle() cleanup
(ref:https://lkml.org/lkml/2011/4/2/8) will simplify the cpuidle framework
and make it easy to port to other architectures like POWER.

References:
https://lkml.org/lkml/2011/2/10/37
https://lkml.org/lkml/2011/3/25/52

The first two patches in the series facilitate splitting of the
cpuidle_states and cpuidle_device structures, and the next two patches do
the actual split, change the APIs and make existing code follow the changed
API.

[1/4] - Moves the idle residency accounting part from cpuidle.c to
the respective low level drivers, so that the accounting can
be accurately maintained if the driver decides to demote the
state chosen (suggested) by the governor.

[2/4] - Removes the cpuidle_device->prepare() API since it is not
widely used and the only use case was to allow software
demotion using the CPUIDLE_FLAG_IGNORE flag. Both these
functions can be absorbed within the cpuidle back-end
driver, hence the prepare routine and the
CPUIDLE_FLAG_IGNORE flag are removed.

- Ref: https://lkml.org/lkml/2011/3/25/52

[3/4] - Splits the usage statistics (read/write) part out of
cpuidle_state structure, so that the states can become read
only and hence made global.

[4/4] - most APIs will now need to pass pointer to both global
cpuidle_driver and per-cpu cpuidle_device structure.

Version 1 is at https://lkml.org/lkml/2011/3/22/161
Version 2 is at https://lkml.org/lkml/2011/4/11/32

Changes from V2:

1. Enabled global registration for arm at91_idle_driver,
davinci_idle_driver, kirkwood_idle_driver, omap3_idle_driver cpuidle
drivers. Enabled global registration for x86 intel_idle_driver also.

2. Made ladder governor follow new changed API. Thus both menu and ladder
governors work with these changes.

This patch series applies on top of 2.6.39-rc2 and is tested on x86 Nehalem
system with multiple ACPI C-States with both acpi_idle and intel_idle
cpuidle drivers. Note that this code is not tested for arm yet.

Thanks,
-Trinabh


2011-04-20 06:55:48

by Trinabh Gupta

Subject: [RFC PATCH V3 1/4] cpuidle: Move dev->last_residency update to driver enter routine; remove dev->last_state

The cpuidle subsystem only suggests the state to enter and does not
guarantee that the suggested state is entered. The actual entered state
may be different because of software or hardware demotion. Software
demotion is done by the back-end cpuidle driver and can be accounted
correctly. Current cpuidle code uses the last_state field to capture the
actual state entered and, based on that, updates the statistics for the
state entered.

Ideally the driver enter routine should update the counters,
and it should return the state actually entered rather than the time
spent there. The generic cpuidle code should simply handle where
the counters live in the sysfs namespace, and should not update the
counters itself.
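
The calling convention described above can be sketched in plain userspace C as follows. This is only an illustration under stated assumptions: the demo_* names, the demo_bus_busy condition and the fake residency values are hypothetical stand-ins, not code from the patch.

```c
#include <assert.h>

#define DEMO_STATE_MAX 3

struct demo_state {
	unsigned long long usage;
	unsigned long long time; /* in us */
};

struct demo_device {
	int last_residency;
	int safe_state_index;
	struct demo_state states[DEMO_STATE_MAX];
};

/* Hypothetical condition under which deep states must be avoided. */
static int demo_bus_busy;

/* Hypothetical stand-in for the hardware entry; returns residency in us. */
static int demo_hw_idle(int index)
{
	return 10 * (index + 1);
}

/* Driver enter routine: may demote, accounts, returns the entered index. */
static int demo_enter(struct demo_device *dev, int index)
{
	int idle_time;

	assert(index >= 0 && index < DEMO_STATE_MAX);

	/* Software demotion: fall back to the safe state if needed. */
	if (demo_bus_busy && index > dev->safe_state_index)
		index = dev->safe_state_index;

	idle_time = demo_hw_idle(index);

	/* Counters are charged to the state that was really entered. */
	dev->last_residency = idle_time;
	dev->states[index].time += (unsigned long long)dev->last_residency;
	dev->states[index].usage++;

	return index;
}

/* Governor hook: remembers which state was actually entered. */
static int demo_last_state_idx;

static void demo_reflect(struct demo_device *dev, int index)
{
	(void)dev;
	if (index >= 0)
		demo_last_state_idx = index;
}

/* Generic code: no accounting here, it only forwards the result. */
static void demo_idle_call(struct demo_device *dev, int next_state)
{
	int entered_state = demo_enter(dev, next_state);

	demo_reflect(dev, entered_state);
}
```

With demo_bus_busy set, a request for state 2 is charged to the safe state instead, and the governor hook sees the demoted index rather than the suggested one.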

Reference:
https://lkml.org/lkml/2011/3/25/52

Signed-off-by: Trinabh Gupta <[email protected]>
---

arch/arm/mach-at91/cpuidle.c | 14 ++++--
arch/arm/mach-davinci/cpuidle.c | 12 ++++-
arch/arm/mach-kirkwood/cpuidle.c | 14 ++++--
arch/arm/mach-omap2/cpuidle34xx.c | 55 ++++++++++++++--------
arch/sh/kernel/cpu/shmobile/cpuidle.c | 15 ++++--
drivers/acpi/processor_idle.c | 81 ++++++++++++++++++++++-----------
drivers/cpuidle/cpuidle.c | 27 ++++-------
drivers/cpuidle/governors/ladder.c | 13 +++++
drivers/cpuidle/governors/menu.c | 7 ++-
drivers/idle/intel_idle.c | 14 ++++--
include/linux/cpuidle.h | 7 +--
11 files changed, 169 insertions(+), 90 deletions(-)

diff --git a/arch/arm/mach-at91/cpuidle.c b/arch/arm/mach-at91/cpuidle.c
index 1cfeac1..c85da01 100644
--- a/arch/arm/mach-at91/cpuidle.c
+++ b/arch/arm/mach-at91/cpuidle.c
@@ -33,7 +33,7 @@ static struct cpuidle_driver at91_idle_driver = {

/* Actual code that puts the SoC in different idle states */
static int at91_enter_idle(struct cpuidle_device *dev,
- struct cpuidle_state *state)
+ int index)
{
struct timeval before, after;
int idle_time;
@@ -41,10 +41,10 @@ static int at91_enter_idle(struct cpuidle_device *dev,

local_irq_disable();
do_gettimeofday(&before);
- if (state == &dev->states[0])
+ if (index == 0)
/* Wait for interrupt state */
cpu_do_idle();
- else if (state == &dev->states[1]) {
+ else if (index == 1) {
asm("b 1f; .align 5; 1:");
asm("mcr p15, 0, r0, c7, c10, 4"); /* drain write buffer */
saved_lpr = sdram_selfrefresh_enable();
@@ -55,7 +55,13 @@ static int at91_enter_idle(struct cpuidle_device *dev,
local_irq_enable();
idle_time = (after.tv_sec - before.tv_sec) * USEC_PER_SEC +
(after.tv_usec - before.tv_usec);
- return idle_time;
+
+ /* Update cpuidle counters */
+ dev->last_residency = idle_time;
+ dev->states[index].time += (unsigned long long)dev->last_residency;
+ dev->states[index].usage++;
+
+ return index;
}

/* Initialize CPU idle by registering the idle states */
diff --git a/arch/arm/mach-davinci/cpuidle.c b/arch/arm/mach-davinci/cpuidle.c
index bd59f31..0053aaa 100644
--- a/arch/arm/mach-davinci/cpuidle.c
+++ b/arch/arm/mach-davinci/cpuidle.c
@@ -78,9 +78,9 @@ static struct davinci_ops davinci_states[DAVINCI_CPUIDLE_MAX_STATES] = {

/* Actual code that puts the SoC in different idle states */
static int davinci_enter_idle(struct cpuidle_device *dev,
- struct cpuidle_state *state)
+ int index)
{
- struct davinci_ops *ops = cpuidle_get_statedata(state);
+ struct davinci_ops *ops = cpuidle_get_statedata(&dev->states[index]);
struct timeval before, after;
int idle_time;

@@ -98,7 +98,13 @@ static int davinci_enter_idle(struct cpuidle_device *dev,
local_irq_enable();
idle_time = (after.tv_sec - before.tv_sec) * USEC_PER_SEC +
(after.tv_usec - before.tv_usec);
- return idle_time;
+
+ /* Update cpuidle counters */
+ dev->last_residency = idle_time;
+ dev->states[index].time += (unsigned long long)dev->last_residency;
+ dev->states[index].usage++;
+
+ return index;
}

static int __init davinci_cpuidle_probe(struct platform_device *pdev)
diff --git a/arch/arm/mach-kirkwood/cpuidle.c b/arch/arm/mach-kirkwood/cpuidle.c
index f68d33f..a5f6fef 100644
--- a/arch/arm/mach-kirkwood/cpuidle.c
+++ b/arch/arm/mach-kirkwood/cpuidle.c
@@ -32,17 +32,17 @@ static DEFINE_PER_CPU(struct cpuidle_device, kirkwood_cpuidle_device);

/* Actual code that puts the SoC in different idle states */
static int kirkwood_enter_idle(struct cpuidle_device *dev,
- struct cpuidle_state *state)
+ int index)
{
struct timeval before, after;
int idle_time;

local_irq_disable();
do_gettimeofday(&before);
- if (state == &dev->states[0])
+ if (index == 0)
/* Wait for interrupt state */
cpu_do_idle();
- else if (state == &dev->states[1]) {
+ else if (index == 1) {
/*
* Following write will put DDR in self refresh.
* Note that we have 256 cycles before DDR puts it
@@ -57,7 +57,13 @@ static int kirkwood_enter_idle(struct cpuidle_device *dev,
local_irq_enable();
idle_time = (after.tv_sec - before.tv_sec) * USEC_PER_SEC +
(after.tv_usec - before.tv_usec);
- return idle_time;
+
+ /* Update cpuidle counters */
+ dev->last_residency = idle_time;
+ dev->states[index].time += (unsigned long long)dev->last_residency;
+ dev->states[index].usage++;
+
+ return index;
}

/* Initialize CPU idle by registering the idle states */
diff --git a/arch/arm/mach-omap2/cpuidle34xx.c b/arch/arm/mach-omap2/cpuidle34xx.c
index 1c240ef..0b00872 100644
--- a/arch/arm/mach-omap2/cpuidle34xx.c
+++ b/arch/arm/mach-omap2/cpuidle34xx.c
@@ -114,17 +114,19 @@ static int _cpuidle_deny_idle(struct powerdomain *pwrdm,
/**
* omap3_enter_idle - Programs OMAP3 to enter the specified state
* @dev: cpuidle device
- * @state: The target state to be programmed
+ * @index: the index of state to be entered
*
* Called from the CPUidle framework to program the device to the
* specified target state selected by the governor.
*/
static int omap3_enter_idle(struct cpuidle_device *dev,
- struct cpuidle_state *state)
+ int index)
{
- struct omap3_processor_cx *cx = cpuidle_get_statedata(state);
+ struct omap3_processor_cx *cx =
+ cpuidle_get_statedata(&dev->states[index]);
struct timespec ts_preidle, ts_postidle, ts_idle;
u32 mpu_state = cx->mpu_state, core_state = cx->core_state;
+ int idle_time;

current_cx_state = *cx;

@@ -160,29 +162,38 @@ return_sleep_time:
local_irq_enable();
local_fiq_enable();

- return ts_idle.tv_nsec / NSEC_PER_USEC + ts_idle.tv_sec * USEC_PER_SEC;
+ idle_time = ts_idle.tv_nsec / NSEC_PER_USEC + ts_idle.tv_sec * USEC_PER_SEC;
+
+ /* Update cpuidle counters */
+ dev->last_residency = idle_time;
+ dev->states[index].time += (unsigned long long)dev->last_residency;
+ dev->states[index].usage++;
+
+ return index;
}

/**
* next_valid_state - Find next valid c-state
* @dev: cpuidle device
- * @state: Currently selected c-state
+ * @index: Index of currently selected c-state
*
- * If the current state is valid, it is returned back to the caller.
- * Else, this function searches for a lower c-state which is still
- * valid (as defined in omap3_power_states[]).
+ * If the state corresponding to index is valid, index is returned back
+ * to the caller. Else, this function searches for a lower c-state which is
+ * still valid (as defined in omap3_power_states[]) and returns its index.
*/
-static struct cpuidle_state *next_valid_state(struct cpuidle_device *dev,
- struct cpuidle_state *curr)
+static int next_valid_state(struct cpuidle_device *dev,
+ int index)
{
+ struct cpuidle_state *curr = &dev->states[index];
struct cpuidle_state *next = NULL;
struct omap3_processor_cx *cx;
+ int next_index;

cx = (struct omap3_processor_cx *)cpuidle_get_statedata(curr);

/* Check if current state is valid */
if (cx->valid) {
- return curr;
+ return index;
} else {
u8 idx = OMAP3_STATE_MAX;

@@ -192,6 +203,7 @@ static struct cpuidle_state *next_valid_state(struct cpuidle_device *dev,
for (; idx >= OMAP3_STATE_C1; idx--) {
if (&dev->states[idx] == curr) {
next = &dev->states[idx];
+ next_index = idx;
break;
}
}
@@ -212,6 +224,7 @@ static struct cpuidle_state *next_valid_state(struct cpuidle_device *dev,
cx = cpuidle_get_statedata(&dev->states[idx]);
if (cx->valid) {
next = &dev->states[idx];
+ next_index = idx;
break;
}
}
@@ -221,30 +234,31 @@ static struct cpuidle_state *next_valid_state(struct cpuidle_device *dev,
*/
}

- return next;
+ return next_index;
}

/**
* omap3_enter_idle_bm - Checks for any bus activity
* @dev: cpuidle device
- * @state: The target state to be programmed
+ * @index: array index of target state to be programmed
*
* Used for C states with CPUIDLE_FLAG_CHECK_BM flag set. This
* function checks for any pending activity and then programs the
* device to the specified or a safer state.
*/
static int omap3_enter_idle_bm(struct cpuidle_device *dev,
- struct cpuidle_state *state)
+ int index)
{
- struct cpuidle_state *new_state = next_valid_state(dev, state);
+ struct cpuidle_state *state = &dev->states[index];
+ int new_state_idx = next_valid_state(dev, index);
u32 core_next_state, per_next_state = 0, per_saved_state = 0;
u32 cam_state;
struct omap3_processor_cx *cx;
int ret;

if ((state->flags & CPUIDLE_FLAG_CHECK_BM) && omap3_idle_bm_check()) {
- BUG_ON(!dev->safe_state);
- new_state = dev->safe_state;
+ BUG_ON(dev->safe_state_index < 0);
+ new_state_idx = dev->safe_state_index;
goto select_state;
}

@@ -265,7 +279,7 @@ static int omap3_enter_idle_bm(struct cpuidle_device *dev,
*/
cam_state = pwrdm_read_pwrst(cam_pd);
if (cam_state == PWRDM_POWER_ON) {
- new_state = dev->safe_state;
+ new_state_idx = dev->safe_state_index;
goto select_state;
}

@@ -283,8 +297,7 @@ static int omap3_enter_idle_bm(struct cpuidle_device *dev,
pwrdm_set_next_pwrst(per_pd, per_next_state);

select_state:
- dev->last_state = new_state;
- ret = omap3_enter_idle(dev, new_state);
+ ret = omap3_enter_idle(dev, new_state_idx);

/* Restore original PER state if it was modified */
if (per_next_state != per_saved_state)
@@ -518,7 +531,7 @@ int __init omap3_idle_init(void)
state->enter = (state->flags & CPUIDLE_FLAG_CHECK_BM) ?
omap3_enter_idle_bm : omap3_enter_idle;
if (cx->type == OMAP3_STATE_C1)
- dev->safe_state = state;
+ dev->safe_state_index = count;
sprintf(state->name, "C%d", count+1);
strncpy(state->desc, cx->desc, CPUIDLE_DESC_LEN);
count++;
diff --git a/arch/sh/kernel/cpu/shmobile/cpuidle.c b/arch/sh/kernel/cpu/shmobile/cpuidle.c
index e4469e7..89ac9f4 100644
--- a/arch/sh/kernel/cpu/shmobile/cpuidle.c
+++ b/arch/sh/kernel/cpu/shmobile/cpuidle.c
@@ -25,11 +25,11 @@ static unsigned long cpuidle_mode[] = {
};

static int cpuidle_sleep_enter(struct cpuidle_device *dev,
- struct cpuidle_state *state)
+ int index)
{
unsigned long allowed_mode = arch_hwblk_sleep_mode();
ktime_t before, after;
- int requested_state = state - &dev->states[0];
+ int requested_state = index;
int allowed_state;
int k;

@@ -46,11 +46,16 @@ static int cpuidle_sleep_enter(struct cpuidle_device *dev,
*/
k = min_t(int, allowed_state, requested_state);

- dev->last_state = &dev->states[k];
before = ktime_get();
sh_mobile_call_standby(cpuidle_mode[k]);
after = ktime_get();
- return ktime_to_ns(ktime_sub(after, before)) >> 10;
+
+ /* Update cpuidle counters */
+ dev->last_residency = (int)ktime_to_ns(ktime_sub(after, before)) >> 10;
+ dev->states[k].time += (unsigned long long)dev->last_residency;
+ dev->states[k].usage++;
+
+ return k;
}

static struct cpuidle_device cpuidle_dev;
@@ -84,7 +89,7 @@ void sh_mobile_setup_cpuidle(void)
state->flags |= CPUIDLE_FLAG_TIME_VALID;
state->enter = cpuidle_sleep_enter;

- dev->safe_state = state;
+ dev->safe_state_index = i-1;

if (sh_mobile_sleep_supported & SUSP_SH_SF) {
state = &dev->states[i++];
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index d615b7d..00712a7 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -741,22 +741,24 @@ static inline void acpi_idle_do_entry(struct acpi_processor_cx *cx)
/**
* acpi_idle_enter_c1 - enters an ACPI C1 state-type
* @dev: the target CPU
- * @state: the state data
+ * @index: index of target state
*
* This is equivalent to the HALT instruction.
*/
static int acpi_idle_enter_c1(struct cpuidle_device *dev,
- struct cpuidle_state *state)
+ int index)
{
ktime_t kt1, kt2;
s64 idle_time;
struct acpi_processor *pr;
+ struct cpuidle_state *state = &dev->states[index];
struct acpi_processor_cx *cx = cpuidle_get_statedata(state);

pr = __this_cpu_read(processors);
+ dev->last_residency = 0;

if (unlikely(!pr))
- return 0;
+ return -EINVAL;

local_irq_disable();

@@ -764,7 +766,7 @@ static int acpi_idle_enter_c1(struct cpuidle_device *dev,
if (acpi_idle_suspend) {
local_irq_enable();
cpu_relax();
- return 0;
+ return -EINVAL;
}

lapic_timer_state_broadcast(pr, cx, 1);
@@ -773,37 +775,48 @@ static int acpi_idle_enter_c1(struct cpuidle_device *dev,
kt2 = ktime_get_real();
idle_time = ktime_to_us(ktime_sub(kt2, kt1));

+ /* Update device last_residency and state counters*/
+ dev->last_residency = (int)idle_time;
+ dev->states[index].time += (unsigned long long)dev->last_residency;
+ dev->states[index].usage++;
+
local_irq_enable();
cx->usage++;
lapic_timer_state_broadcast(pr, cx, 0);

- return idle_time;
+ return index;
}

/**
* acpi_idle_enter_simple - enters an ACPI state without BM handling
* @dev: the target CPU
- * @state: the state data
+ * @index: the index of suggested state
*/
static int acpi_idle_enter_simple(struct cpuidle_device *dev,
- struct cpuidle_state *state)
+ int index)
{
struct acpi_processor *pr;
+ struct cpuidle_state *state = &dev->states[index];
struct acpi_processor_cx *cx = cpuidle_get_statedata(state);
ktime_t kt1, kt2;
s64 idle_time_ns;
s64 idle_time;

pr = __this_cpu_read(processors);
+ dev->last_residency = 0;

if (unlikely(!pr))
- return 0;
-
- if (acpi_idle_suspend)
- return(acpi_idle_enter_c1(dev, state));
+ return -EINVAL;

local_irq_disable();

+ if (acpi_idle_suspend) {
+ local_irq_enable();
+ cpu_relax();
+ return -EINVAL;
+ }
+
+
if (cx->entry_method != ACPI_CSTATE_FFH) {
current_thread_info()->status &= ~TS_POLLING;
/*
@@ -815,7 +828,7 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,
if (unlikely(need_resched())) {
current_thread_info()->status |= TS_POLLING;
local_irq_enable();
- return 0;
+ return -EINVAL;
}
}

@@ -837,6 +850,11 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,
idle_time = idle_time_ns;
do_div(idle_time, NSEC_PER_USEC);

+ /* Update device last_residency and state counters*/
+ dev->last_residency = (int)idle_time;
+ dev->states[index].time += (unsigned long long)dev->last_residency;
+ dev->states[index].usage++;
+
/* Tell the scheduler how much we idled: */
sched_clock_idle_wakeup_event(idle_time_ns);

@@ -848,7 +866,7 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,

lapic_timer_state_broadcast(pr, cx, 0);
cx->time += idle_time;
- return idle_time;
+ return index;
}

static int c3_cpu_count;
@@ -857,14 +875,15 @@ static DEFINE_SPINLOCK(c3_lock);
/**
* acpi_idle_enter_bm - enters C3 with proper BM handling
* @dev: the target CPU
- * @state: the state data
+ * @index: the index of suggested state
*
* If BM is detected, the deepest non-C3 idle state is entered instead.
*/
static int acpi_idle_enter_bm(struct cpuidle_device *dev,
- struct cpuidle_state *state)
+ int index)
{
struct acpi_processor *pr;
+ struct cpuidle_state *state = &dev->states[index];
struct acpi_processor_cx *cx = cpuidle_get_statedata(state);
ktime_t kt1, kt2;
s64 idle_time_ns;
@@ -872,22 +891,26 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,


pr = __this_cpu_read(processors);
+ dev->last_residency = 0;

if (unlikely(!pr))
- return 0;
+ return -EINVAL;

- if (acpi_idle_suspend)
- return(acpi_idle_enter_c1(dev, state));
+
+ if (acpi_idle_suspend) {
+ cpu_relax();
+ return -EINVAL;
+ }

if (!cx->bm_sts_skip && acpi_idle_bm_check()) {
- if (dev->safe_state) {
- dev->last_state = dev->safe_state;
- return dev->safe_state->enter(dev, dev->safe_state);
+ if (dev->safe_state_index >= 0) {
+ return dev->states[dev->safe_state_index].enter(dev,
+ dev->safe_state_index);
} else {
local_irq_disable();
acpi_safe_halt();
local_irq_enable();
- return 0;
+ return -EINVAL;
}
}

@@ -904,7 +927,7 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
if (unlikely(need_resched())) {
current_thread_info()->status |= TS_POLLING;
local_irq_enable();
- return 0;
+ return -EINVAL;
}
}

@@ -954,6 +977,11 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
idle_time = idle_time_ns;
do_div(idle_time, NSEC_PER_USEC);

+ /* Update device last_residency and state counters*/
+ dev->last_residency = (int)idle_time;
+ dev->states[index].time += (unsigned long long)dev->last_residency;
+ dev->states[index].usage++;
+
/* Tell the scheduler how much we idled: */
sched_clock_idle_wakeup_event(idle_time_ns);

@@ -965,7 +993,7 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,

lapic_timer_state_broadcast(pr, cx, 0);
cx->time += idle_time;
- return idle_time;
+ return index;
}

struct cpuidle_driver acpi_idle_driver = {
@@ -992,6 +1020,7 @@ static int acpi_processor_setup_cpuidle(struct acpi_processor *pr)
}

dev->cpu = pr->id;
+ dev->safe_state_index = -1;
for (i = 0; i < CPUIDLE_STATE_MAX; i++) {
dev->states[i].name[0] = '\0';
dev->states[i].desc[0] = '\0';
@@ -1027,13 +1056,13 @@ static int acpi_processor_setup_cpuidle(struct acpi_processor *pr)
state->flags |= CPUIDLE_FLAG_TIME_VALID;

state->enter = acpi_idle_enter_c1;
- dev->safe_state = state;
+ dev->safe_state_index = count;
break;

case ACPI_STATE_C2:
state->flags |= CPUIDLE_FLAG_TIME_VALID;
state->enter = acpi_idle_enter_simple;
- dev->safe_state = state;
+ dev->safe_state_index = count;
break;

case ACPI_STATE_C3:
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index bf50924..355b078 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -51,7 +51,7 @@ static void cpuidle_idle_call(void)
{
struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
struct cpuidle_state *target_state;
- int next_state;
+ int next_state, entered_state;

/* check if the device is ready */
if (!dev || !dev->enabled) {
@@ -94,26 +94,18 @@ static void cpuidle_idle_call(void)

target_state = &dev->states[next_state];

- /* enter the state and update stats */
- dev->last_state = target_state;
-
+ /* Is using next_state here correct?? */
trace_power_start(POWER_CSTATE, next_state, dev->cpu);
trace_cpu_idle(next_state, dev->cpu);

- dev->last_residency = target_state->enter(dev, target_state);
+ entered_state = target_state->enter(dev, next_state);

trace_power_end(dev->cpu);
trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);

- if (dev->last_state)
- target_state = dev->last_state;
-
- target_state->time += (unsigned long long)dev->last_residency;
- target_state->usage++;
-
/* give the governor an opportunity to reflect on the outcome */
if (cpuidle_curr_governor->reflect)
- cpuidle_curr_governor->reflect(dev);
+ cpuidle_curr_governor->reflect(dev, entered_state);
}

/**
@@ -162,11 +154,10 @@ void cpuidle_resume_and_unlock(void)
EXPORT_SYMBOL_GPL(cpuidle_resume_and_unlock);

#ifdef CONFIG_ARCH_HAS_CPU_RELAX
-static int poll_idle(struct cpuidle_device *dev, struct cpuidle_state *st)
+static int poll_idle(struct cpuidle_device *dev, int index)
{
ktime_t t1, t2;
s64 diff;
- int ret;

t1 = ktime_get();
local_irq_enable();
@@ -178,8 +169,11 @@ static int poll_idle(struct cpuidle_device *dev, struct cpuidle_state *st)
if (diff > INT_MAX)
diff = INT_MAX;

- ret = (int) diff;
- return ret;
+ dev->last_residency = (int) diff;
+ dev->states[index].time += (unsigned long long)dev->last_residency;
+ dev->states[index].usage++;
+
+ return index;
}

static void poll_idle_init(struct cpuidle_device *dev)
@@ -238,7 +232,6 @@ int cpuidle_enable_device(struct cpuidle_device *dev)
dev->states[i].time = 0;
}
dev->last_residency = 0;
- dev->last_state = NULL;

smp_wmb();

diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
index 12c9890..6a686a7 100644
--- a/drivers/cpuidle/governors/ladder.c
+++ b/drivers/cpuidle/governors/ladder.c
@@ -153,11 +153,24 @@ static int ladder_enable_device(struct cpuidle_device *dev)
return 0;
}

+/**
+ * ladder_reflect - update the correct last_state_idx
+ * @dev: the CPU
+ * @index: the index of actual state entered
+ */
+static void ladder_reflect(struct cpuidle_device *dev, int index)
+{
+ struct ladder_device *ldev = &__get_cpu_var(ladder_devices);
+ if (index > 0)
+ ldev->last_state_idx = index;
+}
+
static struct cpuidle_governor ladder_governor = {
.name = "ladder",
.rating = 10,
.enable = ladder_enable_device,
.select = ladder_select_state,
+ .reflect = ladder_reflect,
.owner = THIS_MODULE,
};

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index f508690..70d9982 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -308,14 +308,17 @@ static int menu_select(struct cpuidle_device *dev)
/**
* menu_reflect - records that data structures need update
* @dev: the CPU
+ * @index: the index of actual entered state
*
* NOTE: it's important to be fast here because this operation will add to
* the overall exit latency.
*/
-static void menu_reflect(struct cpuidle_device *dev)
+static void menu_reflect(struct cpuidle_device *dev, int index)
{
struct menu_device *data = &__get_cpu_var(menu_devices);
- data->needs_update = 1;
+ data->last_state_idx = index;
+ if (index >= 0)
+ data->needs_update = 1;
}

/**
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index a46dddf..add225c 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -81,7 +81,7 @@ static unsigned int mwait_substates;
static unsigned int lapic_timer_reliable_states = (1 << 1); /* Default to only C1 */

static struct cpuidle_device __percpu *intel_idle_cpuidle_devices;
-static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state);
+static int intel_idle(struct cpuidle_device *dev, int index);

static struct cpuidle_state *cpuidle_state_table;

@@ -209,12 +209,13 @@ static struct cpuidle_state atom_cstates[MWAIT_MAX_NUM_CSTATES] = {
/**
* intel_idle
* @dev: cpuidle_device
- * @state: cpuidle state
+ * @index: index of cpuidle state
*
*/
-static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
+static int intel_idle(struct cpuidle_device *dev, int index)
{
unsigned long ecx = 1; /* break on interrupt flag */
+ struct cpuidle_state *state = &dev->states[index];
unsigned long eax = (unsigned long)cpuidle_get_statedata(state);
unsigned int cstate;
ktime_t kt_before, kt_after;
@@ -256,7 +257,12 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
if (!(lapic_timer_reliable_states & (1 << (cstate))))
clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);

- return usec_delta;
+ /* Update cpuidle counters */
+ dev->last_residency = (int)usec_delta;
+ state->time += (unsigned long long)dev->last_residency;
+ state->usage++;
+
+ return index;
}

static void __setup_broadcast_timer(void *arg)
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 36719ea..45eef60 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -42,7 +42,7 @@ struct cpuidle_state {
unsigned long long time; /* in US */

int (*enter) (struct cpuidle_device *dev,
- struct cpuidle_state *state);
+ int index);
};

/* Idle State Flags */
@@ -87,13 +87,12 @@ struct cpuidle_device {
int state_count;
struct cpuidle_state states[CPUIDLE_STATE_MAX];
struct cpuidle_state_kobj *kobjs[CPUIDLE_STATE_MAX];
- struct cpuidle_state *last_state;

struct list_head device_list;
struct kobject kobj;
struct completion kobj_unregister;
void *governor_data;
- struct cpuidle_state *safe_state;
+ int safe_state_index;

int (*prepare) (struct cpuidle_device *dev);
};
@@ -165,7 +164,7 @@ struct cpuidle_governor {
void (*disable) (struct cpuidle_device *dev);

int (*select) (struct cpuidle_device *dev);
- void (*reflect) (struct cpuidle_device *dev);
+ void (*reflect) (struct cpuidle_device *dev, int index);

struct module *owner;
};

2011-04-20 06:55:59

by Trinabh Gupta

Subject: [RFC PATCH V3 2/4] cpuidle: Remove CPUIDLE_FLAG_IGNORE and dev->prepare()

The cpuidle_device->prepare() mechanism causes updates to the
cpuidle_state[].flags, setting and clearing CPUIDLE_FLAG_IGNORE
to tell the governor not to choose a state on a per-cpu basis at
run-time. State demotion is now handled by the driver, which returns
the actual state entered. Hence, this mechanism is no longer required.
Removing it also removes per-cpu flags from cpuidle_state, enabling
it to be made global.
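
As a rough illustration of what selection looks like once the per-cpu IGNORE flag is gone, the following userspace sketch picks a state purely from read-only properties. It is a simplified stand-in, not the menu governor: sel_state, sel_select and the fixed fallback to index 0 are assumptions made for this example.

```c
#include <assert.h>

/* Read-only per-state properties; nothing here changes at run-time. */
struct sel_state {
	unsigned int exit_latency;     /* in us */
	unsigned int target_residency; /* in us */
};

/*
 * Pick the deepest state whose target residency fits the predicted idle
 * time and whose exit latency meets the latency requirement. Index 0 is
 * treated as the always-available fallback (an assumption of this sketch).
 */
static int sel_select(const struct sel_state *states, int state_count,
		      unsigned int predicted_us, unsigned int latency_req)
{
	int i, chosen = 0;

	assert(state_count > 0);

	for (i = 1; i < state_count; i++) {
		const struct sel_state *s = &states[i];

		if (s->target_residency > predicted_us)
			continue;
		if (s->exit_latency > latency_req)
			continue;
		chosen = i;
	}

	return chosen;
}
```

Since the loop only reads the table, the same table can be shared by all cpus; any per-cpu restriction is applied later, by the driver demoting the suggested state.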

Signed-off-by: Trinabh Gupta <[email protected]>
---

drivers/cpuidle/cpuidle.c | 10 ----------
drivers/cpuidle/governors/menu.c | 2 --
include/linux/cpuidle.h | 3 ---
3 files changed, 0 insertions(+), 15 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 355b078..92a6216 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -75,16 +75,6 @@ static void cpuidle_idle_call(void)
hrtimer_peek_ahead_timers();
#endif

- /*
- * Call the device's prepare function before calling the
- * governor's select function. ->prepare gives the device's
- * cpuidle driver a chance to update any dynamic information
- * of its cpuidle states for the current idle period, e.g.
- * state availability, latencies, residencies, etc.
- */
- if (dev->prepare)
- dev->prepare(dev);
-
/* ask the governor for the next state */
next_state = cpuidle_curr_governor->select(dev);
if (need_resched()) {
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 70d9982..40b5630 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -286,8 +286,6 @@ static int menu_select(struct cpuidle_device *dev)
for (i = CPUIDLE_DRIVER_STATE_START; i < dev->state_count; i++) {
struct cpuidle_state *s = &dev->states[i];

- if (s->flags & CPUIDLE_FLAG_IGNORE)
- continue;
if (s->target_residency > data->predicted_us)
continue;
if (s->exit_latency > latency_req)
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 45eef60..a3306be 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -47,7 +47,6 @@ struct cpuidle_state {

/* Idle State Flags */
#define CPUIDLE_FLAG_TIME_VALID (0x01) /* is residency time measurable? */
-#define CPUIDLE_FLAG_IGNORE (0x100) /* ignore during this idle period */

#define CPUIDLE_DRIVER_FLAGS_MASK (0xFFFF0000)

@@ -93,8 +92,6 @@ struct cpuidle_device {
struct completion kobj_unregister;
void *governor_data;
int safe_state_index;
-
- int (*prepare) (struct cpuidle_device *dev);
};

DECLARE_PER_CPU(struct cpuidle_device *, cpuidle_devices);

2011-04-20 06:56:20

by Trinabh Gupta

Subject: [RFC PATCH V3 3/4] Split cpuidle_state structure and move per-cpu statistics fields

This is the first step towards global registration of cpuidle
states. The statistics used primarily by the governor are per-cpu
and have to be split from the rest of the fields inside cpuidle_state,
which will be made global, i.e. kept as a single copy. The driver_data
field is also per-cpu and is moved along with the statistics.
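
A minimal sketch of how driver data could hang off the per-cpu usage structure is shown below. The names usage_sketch, ops_sketch and the helper functions are hypothetical and only mirror the shape of the accessors used in the diff; the field layout is an assumption, not the layout introduced by this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Per-cpu, per-state data: statistics plus an opaque driver pointer. */
struct usage_sketch {
	void *driver_data;
	unsigned long long usage;
	unsigned long long time; /* in us */
};

/* Accessors modelled on the get/set statedata helpers (hypothetical). */
static inline void *usage_get_statedata(struct usage_sketch *su)
{
	return su->driver_data;
}

static inline void usage_set_statedata(struct usage_sketch *su, void *data)
{
	su->driver_data = data;
}

/* Hypothetical driver-private description of a state. */
struct ops_sketch {
	unsigned int flags;
};

/* Enter-style helper: looks up private data, then updates the counters. */
static unsigned int usage_enter(struct usage_sketch *su, int residency_us)
{
	struct ops_sketch *ops = usage_get_statedata(su);

	assert(ops != NULL);

	su->time += (unsigned long long)residency_us;
	su->usage++;

	return ops->flags;
}
```

Keeping driver_data next to the counters means the global state table never needs a per-cpu pointer, which is what allows it to stay read-only.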

Signed-off-by: Trinabh Gupta <[email protected]>
---

arch/arm/mach-at91/cpuidle.c | 4 +--
arch/arm/mach-davinci/cpuidle.c | 9 +++---
arch/arm/mach-kirkwood/cpuidle.c | 4 +--
arch/arm/mach-omap2/cpuidle34xx.c | 17 ++++++-----
arch/sh/kernel/cpu/shmobile/cpuidle.c | 4 +--
drivers/acpi/processor_idle.c | 37 +++++++++++------------
drivers/cpuidle/cpuidle.c | 12 ++++----
drivers/cpuidle/sysfs.c | 15 ++++++----
drivers/idle/intel_idle.c | 52 ++++++++++++++++++++++++---------
include/linux/cpuidle.h | 25 ++++++++++------
10 files changed, 108 insertions(+), 71 deletions(-)

diff --git a/arch/arm/mach-at91/cpuidle.c b/arch/arm/mach-at91/cpuidle.c
index c85da01..ed38e3c 100644
--- a/arch/arm/mach-at91/cpuidle.c
+++ b/arch/arm/mach-at91/cpuidle.c
@@ -58,8 +58,8 @@ static int at91_enter_idle(struct cpuidle_device *dev,

/* Update cpuidle counters */
dev->last_residency = idle_time;
- dev->states[index].time += (unsigned long long)dev->last_residency;
- dev->states[index].usage++;
+ dev->states_usage[index].time += (unsigned long long)dev->last_residency;
+ dev->states_usage[index].usage++;

return index;
}
diff --git a/arch/arm/mach-davinci/cpuidle.c b/arch/arm/mach-davinci/cpuidle.c
index 0053aaa..e3aebe6 100644
--- a/arch/arm/mach-davinci/cpuidle.c
+++ b/arch/arm/mach-davinci/cpuidle.c
@@ -80,7 +80,8 @@ static struct davinci_ops davinci_states[DAVINCI_CPUIDLE_MAX_STATES] = {
static int davinci_enter_idle(struct cpuidle_device *dev,
int index)
{
- struct davinci_ops *ops = cpuidle_get_statedata(&dev->states[index]);
+ struct cpuidle_state_usage *state_usage = &dev->states_usage[index];
+ struct davinci_ops *ops = cpuidle_get_statedata(state_usage);
struct timeval before, after;
int idle_time;

@@ -101,8 +102,8 @@ static int davinci_enter_idle(struct cpuidle_device *dev,

/* Update cpuidle counters */
dev->last_residency = idle_time;
- dev->states[index].time += (unsigned long long)dev->last_residency;
- dev->states[index].usage++;
+ state_usage->time += (unsigned long long)dev->last_residency;
+ state_usage->usage++;

return index;
}
@@ -145,7 +146,7 @@ static int __init davinci_cpuidle_probe(struct platform_device *pdev)
strcpy(device->states[1].desc, "WFI and DDR Self Refresh");
if (pdata->ddr2_pdown)
davinci_states[1].flags |= DAVINCI_CPUIDLE_FLAGS_DDR2_PWDN;
- cpuidle_set_statedata(&device->states[1], &davinci_states[1]);
+ cpuidle_set_statedata(&device->states_usage[1], &davinci_states[1]);

device->state_count = DAVINCI_CPUIDLE_MAX_STATES;

diff --git a/arch/arm/mach-kirkwood/cpuidle.c b/arch/arm/mach-kirkwood/cpuidle.c
index a5f6fef..d135a41 100644
--- a/arch/arm/mach-kirkwood/cpuidle.c
+++ b/arch/arm/mach-kirkwood/cpuidle.c
@@ -60,8 +60,8 @@ static int kirkwood_enter_idle(struct cpuidle_device *dev,

/* Update cpuidle counters */
dev->last_residency = idle_time;
- dev->states[index].time += (unsigned long long)dev->last_residency;
- dev->states[index].usage++;
+ dev->states_usage[index].time += (unsigned long long)dev->last_residency;
+ dev->states_usage[index].usage++;

return index;
}
diff --git a/arch/arm/mach-omap2/cpuidle34xx.c b/arch/arm/mach-omap2/cpuidle34xx.c
index 0b00872..4282420 100644
--- a/arch/arm/mach-omap2/cpuidle34xx.c
+++ b/arch/arm/mach-omap2/cpuidle34xx.c
@@ -123,7 +123,7 @@ static int omap3_enter_idle(struct cpuidle_device *dev,
int index)
{
struct omap3_processor_cx *cx =
- cpuidle_get_statedata(&dev->states[index]);
+ cpuidle_get_statedata(&dev->states_usage[index]);
struct timespec ts_preidle, ts_postidle, ts_idle;
u32 mpu_state = cx->mpu_state, core_state = cx->core_state;
int idle_time;
@@ -166,8 +166,8 @@ return_sleep_time:

/* Update cpuidle counters */
dev->last_residency = idle_time;
- dev->states[index].time += (unsigned long long)dev->last_residency;
- dev->states[index].usage++;
+ dev->states_usage[index].time += (unsigned long long)dev->last_residency;
+ dev->states_usage[index].usage++;

return index;
}
@@ -185,11 +185,12 @@ static int next_valid_state(struct cpuidle_device *dev,
int index)
{
struct cpuidle_state *curr = &dev->states[index];
+ struct cpuidle_state_usage *curr_usage = &dev->states_usage[index];
struct cpuidle_state *next = NULL;
struct omap3_processor_cx *cx;
int next_index;

- cx = (struct omap3_processor_cx *)cpuidle_get_statedata(curr);
+ cx = (struct omap3_processor_cx *)cpuidle_get_statedata(curr_usage);

/* Check if current state is valid */
if (cx->valid) {
@@ -221,7 +222,7 @@ static int next_valid_state(struct cpuidle_device *dev,
for (; idx >= OMAP3_STATE_C1; idx--) {
struct omap3_processor_cx *cx;

- cx = cpuidle_get_statedata(&dev->states[idx]);
+ cx = cpuidle_get_statedata(&dev->states_usage[idx]);
if (cx->valid) {
next = &dev->states[idx];
next_index = idx;
@@ -262,7 +263,7 @@ static int omap3_enter_idle_bm(struct cpuidle_device *dev,
goto select_state;
}

- cx = cpuidle_get_statedata(state);
+ cx = cpuidle_get_statedata(&dev->states_usage[index]);
core_next_state = cx->core_state;

/*
@@ -506,6 +507,7 @@ int __init omap3_idle_init(void)
int i, count = 0;
struct omap3_processor_cx *cx;
struct cpuidle_state *state;
+ struct cpuidle_state_usage *state_usage;
struct cpuidle_device *dev;

mpu_pd = pwrdm_lookup("mpu_pwrdm");
@@ -521,10 +523,11 @@ int __init omap3_idle_init(void)
for (i = OMAP3_STATE_C1; i < OMAP3_MAX_STATES; i++) {
cx = &omap3_power_states[i];
state = &dev->states[count];
+ state_usage = &dev->states_usage[count];

if (!cx->valid)
continue;
- cpuidle_set_statedata(state, cx);
+ cpuidle_set_statedata(state_usage, cx);
state->exit_latency = cx->sleep_latency + cx->wakeup_latency;
state->target_residency = cx->threshold;
state->flags = cx->flags;
diff --git a/arch/sh/kernel/cpu/shmobile/cpuidle.c b/arch/sh/kernel/cpu/shmobile/cpuidle.c
index 89ac9f4..2340d62 100644
--- a/arch/sh/kernel/cpu/shmobile/cpuidle.c
+++ b/arch/sh/kernel/cpu/shmobile/cpuidle.c
@@ -52,8 +52,8 @@ static int cpuidle_sleep_enter(struct cpuidle_device *dev,

/* Update cpuidle counters */
dev->last_residency = (int)ktime_to_ns(ktime_sub(after, before)) >> 10;
- dev->states[k].time += (unsigned long long)dev->last_residency;
- dev->states[k].usage++;
+ dev->states_usage[k].time += (unsigned long long)dev->last_residency;
+ dev->states_usage[k].usage++;

return k;
}
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 00712a7..bd29363 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -745,14 +745,13 @@ static inline void acpi_idle_do_entry(struct acpi_processor_cx *cx)
*
* This is equivalent to the HALT instruction.
*/
-static int acpi_idle_enter_c1(struct cpuidle_device *dev,
- int index)
+static int acpi_idle_enter_c1(struct cpuidle_device *dev, int index)
{
ktime_t kt1, kt2;
s64 idle_time;
struct acpi_processor *pr;
- struct cpuidle_state *state = &dev->states[index];
- struct acpi_processor_cx *cx = cpuidle_get_statedata(state);
+ struct cpuidle_state_usage *state_usage = &dev->states_usage[index];
+ struct acpi_processor_cx *cx = cpuidle_get_statedata(state_usage);

pr = __this_cpu_read(processors);
dev->last_residency = 0;
@@ -777,8 +776,8 @@ static int acpi_idle_enter_c1(struct cpuidle_device *dev,

/* Update device last_residency and state counters*/
dev->last_residency = (int)idle_time;
- dev->states[index].time += (unsigned long long)dev->last_residency;
- dev->states[index].usage++;
+ state_usage->time += (unsigned long long)dev->last_residency;
+ state_usage->usage++;

local_irq_enable();
cx->usage++;
@@ -792,12 +791,11 @@ static int acpi_idle_enter_c1(struct cpuidle_device *dev,
* @dev: the target CPU
* @index: the index of suggested state
*/
-static int acpi_idle_enter_simple(struct cpuidle_device *dev,
- int index)
+static int acpi_idle_enter_simple(struct cpuidle_device *dev, int index)
{
struct acpi_processor *pr;
- struct cpuidle_state *state = &dev->states[index];
- struct acpi_processor_cx *cx = cpuidle_get_statedata(state);
+ struct cpuidle_state_usage *state_usage = &dev->states_usage[index];
+ struct acpi_processor_cx *cx = cpuidle_get_statedata(state_usage);
ktime_t kt1, kt2;
s64 idle_time_ns;
s64 idle_time;
@@ -852,8 +850,8 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,

/* Update device last_residency and state counters*/
dev->last_residency = (int)idle_time;
- dev->states[index].time += (unsigned long long)dev->last_residency;
- dev->states[index].usage++;
+ state_usage->time += (unsigned long long)dev->last_residency;
+ state_usage->usage++;

/* Tell the scheduler how much we idled: */
sched_clock_idle_wakeup_event(idle_time_ns);
@@ -879,12 +877,11 @@ static DEFINE_SPINLOCK(c3_lock);
*
* If BM is detected, the deepest non-C3 idle state is entered instead.
*/
-static int acpi_idle_enter_bm(struct cpuidle_device *dev,
- int index)
+static int acpi_idle_enter_bm(struct cpuidle_device *dev, int index)
{
struct acpi_processor *pr;
- struct cpuidle_state *state = &dev->states[index];
- struct acpi_processor_cx *cx = cpuidle_get_statedata(state);
+ struct cpuidle_state_usage *state_usage = &dev->states_usage[index];
+ struct acpi_processor_cx *cx = cpuidle_get_statedata(state_usage);
ktime_t kt1, kt2;
s64 idle_time_ns;
s64 idle_time;
@@ -979,8 +976,8 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,

/* Update device last_residency and state counters*/
dev->last_residency = (int)idle_time;
- dev->states[index].time += (unsigned long long)dev->last_residency;
- dev->states[index].usage++;
+ state_usage->time += (unsigned long long)dev->last_residency;
+ state_usage->usage++;

/* Tell the scheduler how much we idled: */
sched_clock_idle_wakeup_event(idle_time_ns);
@@ -1010,6 +1007,7 @@ static int acpi_processor_setup_cpuidle(struct acpi_processor *pr)
int i, count = CPUIDLE_DRIVER_STATE_START;
struct acpi_processor_cx *cx;
struct cpuidle_state *state;
+ struct cpuidle_state_usage *state_usage;
struct cpuidle_device *dev = &pr->power.dev;

if (!pr->flags.power_setup_done)
@@ -1032,6 +1030,7 @@ static int acpi_processor_setup_cpuidle(struct acpi_processor *pr)
for (i = 1; i < ACPI_PROCESSOR_MAX_POWER && i <= max_cstate; i++) {
cx = &pr->power.states[i];
state = &dev->states[count];
+ state_usage = &dev->states_usage[count];

if (!cx->valid)
continue;
@@ -1042,7 +1041,7 @@ static int acpi_processor_setup_cpuidle(struct acpi_processor *pr)
!(acpi_gbl_FADT.flags & ACPI_FADT_C2_MP_SUPPORTED))
continue;
#endif
- cpuidle_set_statedata(state, cx);
+ cpuidle_set_statedata(state_usage, cx);

snprintf(state->name, CPUIDLE_NAME_LEN, "C%d", i);
strncpy(state->desc, cx->desc, CPUIDLE_DESC_LEN);
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 92a6216..5d6f98d 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -160,8 +160,9 @@ static int poll_idle(struct cpuidle_device *dev, int index)
diff = INT_MAX;

dev->last_residency = (int) diff;
- dev->states[index].time += (unsigned long long)dev->last_residency;
- dev->states[index].usage++;
+ dev->states_usage[index].time +=
+ (unsigned long long)dev->last_residency;
+ dev->states_usage[index].usage++;

return index;
}
@@ -169,8 +170,9 @@ static int poll_idle(struct cpuidle_device *dev, int index)
static void poll_idle_init(struct cpuidle_device *dev)
{
struct cpuidle_state *state = &dev->states[0];
+ struct cpuidle_state_usage *state_usage = &dev->states_usage[0];

- cpuidle_set_statedata(state, NULL);
+ cpuidle_set_statedata(state_usage, NULL);

snprintf(state->name, CPUIDLE_NAME_LEN, "POLL");
snprintf(state->desc, CPUIDLE_DESC_LEN, "CPUIDLE CORE POLL IDLE");
@@ -218,8 +220,8 @@ int cpuidle_enable_device(struct cpuidle_device *dev)
goto fail_sysfs;

for (i = 0; i < dev->state_count; i++) {
- dev->states[i].usage = 0;
- dev->states[i].time = 0;
+ dev->states_usage[i].usage = 0;
+ dev->states_usage[i].time = 0;
}
dev->last_residency = 0;

diff --git a/drivers/cpuidle/sysfs.c b/drivers/cpuidle/sysfs.c
index be7917e..09c9c77 100644
--- a/drivers/cpuidle/sysfs.c
+++ b/drivers/cpuidle/sysfs.c
@@ -216,7 +216,7 @@ static struct kobj_type ktype_cpuidle = {

struct cpuidle_state_attr {
struct attribute attr;
- ssize_t (*show)(struct cpuidle_state *, char *);
+ ssize_t (*show)(struct cpuidle_state *, struct cpuidle_state_usage *, char *);
ssize_t (*store)(struct cpuidle_state *, const char *, size_t);
};

@@ -224,19 +224,19 @@ struct cpuidle_state_attr {
static struct cpuidle_state_attr attr_##_name = __ATTR(_name, 0444, show, NULL)

#define define_show_state_function(_name) \
-static ssize_t show_state_##_name(struct cpuidle_state *state, char *buf) \
+static ssize_t show_state_##_name(struct cpuidle_state *state, struct cpuidle_state_usage *state_usage, char *buf) \
{ \
return sprintf(buf, "%u\n", state->_name);\
}

#define define_show_state_ull_function(_name) \
-static ssize_t show_state_##_name(struct cpuidle_state *state, char *buf) \
+static ssize_t show_state_##_name(struct cpuidle_state *state, struct cpuidle_state_usage *state_usage, char *buf) \
{ \
- return sprintf(buf, "%llu\n", state->_name);\
+ return sprintf(buf, "%llu\n", state_usage->_name);\
}

#define define_show_state_str_function(_name) \
-static ssize_t show_state_##_name(struct cpuidle_state *state, char *buf) \
+static ssize_t show_state_##_name(struct cpuidle_state *state, struct cpuidle_state_usage *state_usage, char *buf) \
{ \
if (state->_name[0] == '\0')\
return sprintf(buf, "<null>\n");\
@@ -269,16 +269,18 @@ static struct attribute *cpuidle_state_default_attrs[] = {

#define kobj_to_state_obj(k) container_of(k, struct cpuidle_state_kobj, kobj)
#define kobj_to_state(k) (kobj_to_state_obj(k)->state)
+#define kobj_to_state_usage(k) (kobj_to_state_obj(k)->state_usage)
#define attr_to_stateattr(a) container_of(a, struct cpuidle_state_attr, attr)
static ssize_t cpuidle_state_show(struct kobject * kobj,
struct attribute * attr ,char * buf)
{
int ret = -EIO;
struct cpuidle_state *state = kobj_to_state(kobj);
+ struct cpuidle_state_usage *state_usage = kobj_to_state_usage(kobj);
struct cpuidle_state_attr * cattr = attr_to_stateattr(attr);

if (cattr->show)
- ret = cattr->show(state, buf);
+ ret = cattr->show(state, state_usage, buf);

return ret;
}
@@ -323,6 +325,7 @@ int cpuidle_add_state_sysfs(struct cpuidle_device *device)
if (!kobj)
goto error_state;
kobj->state = &device->states[i];
+ kobj->state_usage = &device->states_usage[i];
init_completion(&kobj->kobj_unregister);

ret = kobject_init_and_add(&kobj->kobj, &ktype_state_cpuidle, &device->kobj,
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index add225c..4f92d96 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -109,7 +109,6 @@ static struct cpuidle_state nehalem_cstates[MWAIT_MAX_NUM_CSTATES] = {
{ /* MWAIT C1 */
.name = "C1-NHM",
.desc = "MWAIT 0x00",
- .driver_data = (void *) 0x00,
.flags = CPUIDLE_FLAG_TIME_VALID,
.exit_latency = 3,
.target_residency = 6,
@@ -117,7 +116,6 @@ static struct cpuidle_state nehalem_cstates[MWAIT_MAX_NUM_CSTATES] = {
{ /* MWAIT C2 */
.name = "C3-NHM",
.desc = "MWAIT 0x10",
- .driver_data = (void *) 0x10,
.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
.exit_latency = 20,
.target_residency = 80,
@@ -125,7 +123,6 @@ static struct cpuidle_state nehalem_cstates[MWAIT_MAX_NUM_CSTATES] = {
{ /* MWAIT C3 */
.name = "C6-NHM",
.desc = "MWAIT 0x20",
- .driver_data = (void *) 0x20,
.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
.exit_latency = 200,
.target_residency = 800,
@@ -137,7 +134,6 @@ static struct cpuidle_state snb_cstates[MWAIT_MAX_NUM_CSTATES] = {
{ /* MWAIT C1 */
.name = "C1-SNB",
.desc = "MWAIT 0x00",
- .driver_data = (void *) 0x00,
.flags = CPUIDLE_FLAG_TIME_VALID,
.exit_latency = 1,
.target_residency = 1,
@@ -145,7 +141,6 @@ static struct cpuidle_state snb_cstates[MWAIT_MAX_NUM_CSTATES] = {
{ /* MWAIT C2 */
.name = "C3-SNB",
.desc = "MWAIT 0x10",
- .driver_data = (void *) 0x10,
.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
.exit_latency = 80,
.target_residency = 211,
@@ -153,7 +148,6 @@ static struct cpuidle_state snb_cstates[MWAIT_MAX_NUM_CSTATES] = {
{ /* MWAIT C3 */
.name = "C6-SNB",
.desc = "MWAIT 0x20",
- .driver_data = (void *) 0x20,
.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
.exit_latency = 104,
.target_residency = 345,
@@ -161,7 +155,6 @@ static struct cpuidle_state snb_cstates[MWAIT_MAX_NUM_CSTATES] = {
{ /* MWAIT C4 */
.name = "C7-SNB",
.desc = "MWAIT 0x30",
- .driver_data = (void *) 0x30,
.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
.exit_latency = 109,
.target_residency = 345,
@@ -173,7 +166,6 @@ static struct cpuidle_state atom_cstates[MWAIT_MAX_NUM_CSTATES] = {
{ /* MWAIT C1 */
.name = "C1-ATM",
.desc = "MWAIT 0x00",
- .driver_data = (void *) 0x00,
.flags = CPUIDLE_FLAG_TIME_VALID,
.exit_latency = 1,
.target_residency = 4,
@@ -181,7 +173,6 @@ static struct cpuidle_state atom_cstates[MWAIT_MAX_NUM_CSTATES] = {
{ /* MWAIT C2 */
.name = "C2-ATM",
.desc = "MWAIT 0x10",
- .driver_data = (void *) 0x10,
.flags = CPUIDLE_FLAG_TIME_VALID,
.exit_latency = 20,
.target_residency = 80,
@@ -190,7 +181,6 @@ static struct cpuidle_state atom_cstates[MWAIT_MAX_NUM_CSTATES] = {
{ /* MWAIT C4 */
.name = "C4-ATM",
.desc = "MWAIT 0x30",
- .driver_data = (void *) 0x30,
.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
.exit_latency = 100,
.target_residency = 400,
@@ -199,13 +189,43 @@ static struct cpuidle_state atom_cstates[MWAIT_MAX_NUM_CSTATES] = {
{ /* MWAIT C6 */
.name = "C6-ATM",
.desc = "MWAIT 0x52",
- .driver_data = (void *) 0x52,
.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
.exit_latency = 140,
.target_residency = 560,
.enter = &intel_idle },
};

+static int get_driver_data(int cstate)
+{
+ int driver_data;
+ switch (cstate) {
+
+ case 0: /* MWAIT C0 */
+ return -EINVAL;
+ case 1: /* MWAIT C1 */
+ driver_data = 0x00;
+ break;
+ case 2: /* MWAIT C2 */
+ driver_data = 0x10;
+ break;
+ case 3: /* MWAIT C3 */
+ driver_data = 0x20;
+ break;
+ case 4: /* MWAIT C4 */
+ driver_data = 0x30;
+ break;
+ case 5: /* MWAIT C5 */
+ driver_data = 0x40;
+ break;
+ case 6: /* MWAIT C6 */
+ driver_data = 0x52;
+ break;
+ default:
+ return -EINVAL;
+ }
+ return driver_data;
+}
+
/**
* intel_idle
* @dev: cpuidle_device
@@ -216,7 +236,8 @@ static int intel_idle(struct cpuidle_device *dev, int index)
{
unsigned long ecx = 1; /* break on interrupt flag */
struct cpuidle_state *state = &dev->states[index];
- unsigned long eax = (unsigned long)cpuidle_get_statedata(state);
+ struct cpuidle_state_usage *state_usage = &dev->states_usage[index];
+ unsigned long eax = (unsigned long)cpuidle_get_statedata(state_usage);
unsigned int cstate;
ktime_t kt_before, kt_after;
s64 usec_delta;
@@ -259,8 +280,8 @@ static int intel_idle(struct cpuidle_device *dev, int index)

/* Update cpuidle counters */
dev->last_residency = (int)usec_delta;
- state->time += (unsigned long long)dev->last_residency;
- state->usage++;
+ state_usage->time += (unsigned long long)dev->last_residency;
+ state_usage->usage++;

return index;
}
@@ -453,6 +474,9 @@ static int intel_idle_cpuidle_devices_init(void)
dev->states[dev->state_count] = /* structure copy */
cpuidle_state_table[cstate];

+ dev->states_usage[dev->state_count].driver_data =
+ (void *)get_driver_data(cstate);
+
dev->state_count += 1;
}

diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index a3306be..5a1a238 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -28,19 +28,22 @@ struct cpuidle_device;
* CPUIDLE DEVICE INTERFACE *
****************************/

+struct cpuidle_state_usage {
+ void *driver_data;
+
+ unsigned long long usage;
+ unsigned long long time; /* in US */
+};
+
struct cpuidle_state {
char name[CPUIDLE_NAME_LEN];
char desc[CPUIDLE_DESC_LEN];
- void *driver_data;

unsigned int flags;
unsigned int exit_latency; /* in US */
unsigned int power_usage; /* in mW */
unsigned int target_residency; /* in US */

- unsigned long long usage;
- unsigned long long time; /* in US */
-
int (*enter) (struct cpuidle_device *dev,
int index);
};
@@ -52,26 +55,27 @@ struct cpuidle_state {

/**
* cpuidle_get_statedata - retrieves private driver state data
- * @state: the state
+ * @st_usage: the state usage statistics
*/
-static inline void * cpuidle_get_statedata(struct cpuidle_state *state)
+static inline void *cpuidle_get_statedata(struct cpuidle_state_usage *st_usage)
{
- return state->driver_data;
+ return st_usage->driver_data;
}

/**
* cpuidle_set_statedata - stores private driver state data
- * @state: the state
+ * @st_usage: the state usage statistics
* @data: the private data
*/
static inline void
-cpuidle_set_statedata(struct cpuidle_state *state, void *data)
+cpuidle_set_statedata(struct cpuidle_state_usage *st_usage, void *data)
{
- state->driver_data = data;
+ st_usage->driver_data = data;
}

struct cpuidle_state_kobj {
struct cpuidle_state *state;
+ struct cpuidle_state_usage *state_usage;
struct completion kobj_unregister;
struct kobject kobj;
};
@@ -85,6 +89,7 @@ struct cpuidle_device {
int last_residency;
int state_count;
struct cpuidle_state states[CPUIDLE_STATE_MAX];
+ struct cpuidle_state_usage states_usage[CPUIDLE_STATE_MAX];
struct cpuidle_state_kobj *kobjs[CPUIDLE_STATE_MAX];

struct list_head device_list;

2011-04-20 06:56:46

by Trinabh Gupta

Subject: [RFC PATCH V3 4/4] cpuidle: Single/Global registration of idle states

With this patch there is a single copy of the cpuidle_states structure
instead of one per cpu. The statistics needed on a per-cpu basis
by the governor are kept per-cpu. This simplifies the cpuidle
subsystem, as state registration is done by a single cpu only.
Having a single copy of cpuidle_states also saves memory. The rare case
of asymmetric C-states can be handled within the cpuidle driver, and
architectures such as POWER do not have asymmetric C-states.
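
As a rough illustration of the split described above, here is a minimal
user-space sketch, not the kernel code itself. All names in it
(idle_state, idle_state_usage, idle_driver, idle_device, demo_*) and the
latency/residency numbers are made up for the example; it only assumes
what the description says: one global, read-only state table owned by
the driver, per-cpu usage statistics owned by the device, and an enter
callback that receives both.

```c
#include <stdio.h>

#define NR_CPUS    4	/* assumption: small fixed cpu count for the demo */
#define MAX_STATES 2

struct idle_device;
struct idle_driver;

/* Read-only description of one idle state; a single global copy. */
struct idle_state {
	const char *name;
	unsigned int exit_latency;	/* in us */
	unsigned int target_residency;	/* in us */
	int (*enter)(struct idle_device *dev, struct idle_driver *drv,
		     int index);
};

/* Read/write statistics for one idle state; one copy per cpu. */
struct idle_state_usage {
	void *driver_data;
	unsigned long long usage;
	unsigned long long time;	/* in us */
};

/* Global part: registered once, shared by all cpus. */
struct idle_driver {
	int state_count;
	struct idle_state states[MAX_STATES];
};

/* Per-cpu part: only the pieces that must differ between cpus. */
struct idle_device {
	int cpu;
	int last_residency;
	struct idle_state_usage states_usage[MAX_STATES];
};

/* Hypothetical enter routine: pretends the cpu slept for exactly the
 * target residency and does the accounting in the per-cpu structure. */
int demo_enter(struct idle_device *dev, struct idle_driver *drv, int index)
{
	dev->last_residency = (int)drv->states[index].target_residency;
	dev->states_usage[index].time +=
		(unsigned long long)dev->last_residency;
	dev->states_usage[index].usage++;
	return index;
}

struct idle_driver demo_driver = {
	.state_count = MAX_STATES,
	.states = {
		{ .name = "WFI", .exit_latency = 1,
		  .target_residency = 10, .enter = demo_enter },
		{ .name = "RAM_SR", .exit_latency = 10,
		  .target_residency = 100, .enter = demo_enter },
	},
};

struct idle_device demo_devices[NR_CPUS];

/* Enter state 1 twice on cpu 0 and state 0 once on cpu 2. */
void demo_simulate(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		demo_devices[cpu].cpu = cpu;

	demo_driver.states[1].enter(&demo_devices[0], &demo_driver, 1);
	demo_driver.states[1].enter(&demo_devices[0], &demo_driver, 1);
	demo_driver.states[0].enter(&demo_devices[2], &demo_driver, 0);
}

/* Print the per-cpu counters next to the shared state names. */
void demo_print(void)
{
	int cpu, i;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		for (i = 0; i < demo_driver.state_count; i++)
			printf("cpu%d %-6s usage=%llu time=%llu\n", cpu,
			       demo_driver.states[i].name,
			       demo_devices[cpu].states_usage[i].usage,
			       demo_devices[cpu].states_usage[i].time);
}
```

Calling demo_simulate() followed by demo_print() from a main() shows
that the state names and latencies exist only once, while the usage and
time counters advance independently on each cpu.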

To Do:

1. Handle the case when idle states may change at run time
and acpi_processor_cst_has_changed() routine is called.

2. Handle acpi_processor_power_exit() correctly i.e. ensure
unregistration of cpuidle driver since registration is now
moved inside acpi_processor_power_init().

Signed-off-by: Trinabh Gupta <[email protected]>
---

arch/arm/mach-at91/cpuidle.c | 31 +++++----
arch/arm/mach-davinci/cpuidle.c | 39 ++++++-----
arch/arm/mach-kirkwood/cpuidle.c | 30 +++++---
arch/arm/mach-omap2/cpuidle34xx.c | 97 +++++++++++++++++++++------
arch/sh/kernel/cpu/shmobile/cpuidle.c | 18 +++--
drivers/acpi/processor_driver.c | 18 +----
drivers/acpi/processor_idle.c | 117 +++++++++++++++++++++++++++------
drivers/cpuidle/cpuidle.c | 46 ++++---------
drivers/cpuidle/driver.c | 25 +++++++
drivers/cpuidle/governors/ladder.c | 26 ++++---
drivers/cpuidle/governors/menu.c | 20 +++---
drivers/cpuidle/sysfs.c | 3 +
drivers/idle/intel_idle.c | 80 +++++++++++++++++------
include/linux/cpuidle.h | 21 ++++--
14 files changed, 378 insertions(+), 193 deletions(-)

diff --git a/arch/arm/mach-at91/cpuidle.c b/arch/arm/mach-at91/cpuidle.c
index ed38e3c..039b378 100644
--- a/arch/arm/mach-at91/cpuidle.c
+++ b/arch/arm/mach-at91/cpuidle.c
@@ -33,6 +33,7 @@ static struct cpuidle_driver at91_idle_driver = {

/* Actual code that puts the SoC in different idle states */
static int at91_enter_idle(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv,
int index)
{
struct timeval before, after;
@@ -68,27 +69,29 @@ static int at91_enter_idle(struct cpuidle_device *dev,
static int at91_init_cpuidle(void)
{
struct cpuidle_device *device;
-
- cpuidle_register_driver(&at91_idle_driver);
+ struct cpuidle_driver *driver = &at91_idle_driver;

device = &per_cpu(at91_cpuidle_device, smp_processor_id());
device->state_count = AT91_MAX_STATES;
+ driver->state_count = AT91_MAX_STATES;

/* Wait for interrupt state */
- device->states[0].enter = at91_enter_idle;
- device->states[0].exit_latency = 1;
- device->states[0].target_residency = 10000;
- device->states[0].flags = CPUIDLE_FLAG_TIME_VALID;
- strcpy(device->states[0].name, "WFI");
- strcpy(device->states[0].desc, "Wait for interrupt");
+ driver->states[0].enter = at91_enter_idle;
+ driver->states[0].exit_latency = 1;
+ driver->states[0].target_residency = 10000;
+ driver->states[0].flags = CPUIDLE_FLAG_TIME_VALID;
+ strcpy(driver->states[0].name, "WFI");
+ strcpy(driver->states[0].desc, "Wait for interrupt");

/* Wait for interrupt and RAM self refresh state */
- device->states[1].enter = at91_enter_idle;
- device->states[1].exit_latency = 10;
- device->states[1].target_residency = 10000;
- device->states[1].flags = CPUIDLE_FLAG_TIME_VALID;
- strcpy(device->states[1].name, "RAM_SR");
- strcpy(device->states[1].desc, "WFI and RAM Self Refresh");
+ driver->states[1].enter = at91_enter_idle;
+ driver->states[1].exit_latency = 10;
+ driver->states[1].target_residency = 10000;
+ driver->states[1].flags = CPUIDLE_FLAG_TIME_VALID;
+ strcpy(driver->states[1].name, "RAM_SR");
+ strcpy(driver->states[1].desc, "WFI and RAM Self Refresh");
+
+ cpuidle_register_driver(&at91_idle_driver);

if (cpuidle_register_device(device)) {
printk(KERN_ERR "at91_init_cpuidle: Failed registering\n");
diff --git a/arch/arm/mach-davinci/cpuidle.c b/arch/arm/mach-davinci/cpuidle.c
index e3aebe6..ba46a83 100644
--- a/arch/arm/mach-davinci/cpuidle.c
+++ b/arch/arm/mach-davinci/cpuidle.c
@@ -78,6 +78,7 @@ static struct davinci_ops davinci_states[DAVINCI_CPUIDLE_MAX_STATES] = {

/* Actual code that puts the SoC in different idle states */
static int davinci_enter_idle(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv,
int index)
{
struct cpuidle_state_usage *state_usage = &dev->states_usage[index];
@@ -112,6 +113,7 @@ static int __init davinci_cpuidle_probe(struct platform_device *pdev)
{
int ret;
struct cpuidle_device *device;
+ struct cpuidle_driver *driver = &davinci_idle_driver;
struct davinci_cpuidle_config *pdata = pdev->dev.platform_data;

device = &per_cpu(davinci_cpuidle_device, smp_processor_id());
@@ -123,32 +125,33 @@ static int __init davinci_cpuidle_probe(struct platform_device *pdev)

ddr2_reg_base = pdata->ddr2_ctlr_base;

- ret = cpuidle_register_driver(&davinci_idle_driver);
- if (ret) {
- dev_err(&pdev->dev, "failed to register driver\n");
- return ret;
- }
-
/* Wait for interrupt state */
- device->states[0].enter = davinci_enter_idle;
- device->states[0].exit_latency = 1;
- device->states[0].target_residency = 10000;
- device->states[0].flags = CPUIDLE_FLAG_TIME_VALID;
- strcpy(device->states[0].name, "WFI");
- strcpy(device->states[0].desc, "Wait for interrupt");
+ driver->states[0].enter = davinci_enter_idle;
+ driver->states[0].exit_latency = 1;
+ driver->states[0].target_residency = 10000;
+ driver->states[0].flags = CPUIDLE_FLAG_TIME_VALID;
+ strcpy(driver->states[0].name, "WFI");
+ strcpy(driver->states[0].desc, "Wait for interrupt");

/* Wait for interrupt and DDR self refresh state */
- device->states[1].enter = davinci_enter_idle;
- device->states[1].exit_latency = 10;
- device->states[1].target_residency = 10000;
- device->states[1].flags = CPUIDLE_FLAG_TIME_VALID;
- strcpy(device->states[1].name, "DDR SR");
- strcpy(device->states[1].desc, "WFI and DDR Self Refresh");
+ driver->states[1].enter = davinci_enter_idle;
+ driver->states[1].exit_latency = 10;
+ driver->states[1].target_residency = 10000;
+ driver->states[1].flags = CPUIDLE_FLAG_TIME_VALID;
+ strcpy(driver->states[1].name, "DDR SR");
+ strcpy(driver->states[1].desc, "WFI and DDR Self Refresh");
if (pdata->ddr2_pdown)
davinci_states[1].flags |= DAVINCI_CPUIDLE_FLAGS_DDR2_PWDN;
cpuidle_set_statedata(&device->states_usage[1], &davinci_states[1]);

device->state_count = DAVINCI_CPUIDLE_MAX_STATES;
+ driver->state_count = DAVINCI_CPUIDLE_MAX_STATES;
+
+ ret = cpuidle_register_driver(&davinci_idle_driver);
+ if (ret) {
+ dev_err(&pdev->dev, "failed to register driver\n");
+ return ret;
+ }

ret = cpuidle_register_device(device);
if (ret) {
diff --git a/arch/arm/mach-kirkwood/cpuidle.c b/arch/arm/mach-kirkwood/cpuidle.c
index d135a41..f04f6b1 100644
--- a/arch/arm/mach-kirkwood/cpuidle.c
+++ b/arch/arm/mach-kirkwood/cpuidle.c
@@ -32,6 +32,7 @@ static DEFINE_PER_CPU(struct cpuidle_device, kirkwood_cpuidle_device);

/* Actual code that puts the SoC in different idle states */
static int kirkwood_enter_idle(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv,
int index)
{
struct timeval before, after;
@@ -70,28 +71,29 @@ static int kirkwood_enter_idle(struct cpuidle_device *dev,
static int kirkwood_init_cpuidle(void)
{
struct cpuidle_device *device;
-
- cpuidle_register_driver(&kirkwood_idle_driver);
+ struct cpuidle_driver *driver = &kirkwood_idle_driver;

device = &per_cpu(kirkwood_cpuidle_device, smp_processor_id());
device->state_count = KIRKWOOD_MAX_STATES;
+ driver->state_count = KIRKWOOD_MAX_STATES;

/* Wait for interrupt state */
- device->states[0].enter = kirkwood_enter_idle;
- device->states[0].exit_latency = 1;
- device->states[0].target_residency = 10000;
- device->states[0].flags = CPUIDLE_FLAG_TIME_VALID;
- strcpy(device->states[0].name, "WFI");
- strcpy(device->states[0].desc, "Wait for interrupt");
+ driver->states[0].enter = kirkwood_enter_idle;
+ driver->states[0].exit_latency = 1;
+ driver->states[0].target_residency = 10000;
+ driver->states[0].flags = CPUIDLE_FLAG_TIME_VALID;
+ strcpy(driver->states[0].name, "WFI");
+ strcpy(driver->states[0].desc, "Wait for interrupt");

/* Wait for interrupt and DDR self refresh state */
- device->states[1].enter = kirkwood_enter_idle;
- device->states[1].exit_latency = 10;
- device->states[1].target_residency = 10000;
- device->states[1].flags = CPUIDLE_FLAG_TIME_VALID;
- strcpy(device->states[1].name, "DDR SR");
- strcpy(device->states[1].desc, "WFI and DDR Self Refresh");
+ driver->states[1].enter = kirkwood_enter_idle;
+ driver->states[1].exit_latency = 10;
+ driver->states[1].target_residency = 10000;
+ driver->states[1].flags = CPUIDLE_FLAG_TIME_VALID;
+ strcpy(driver->states[1].name, "DDR SR");
+ strcpy(driver->states[1].desc, "WFI and DDR Self Refresh");

+ cpuidle_register_driver(&kirkwood_idle_driver);
if (cpuidle_register_device(device)) {
printk(KERN_ERR "kirkwood_init_cpuidle: Failed registering\n");
return -EIO;
diff --git a/arch/arm/mach-omap2/cpuidle34xx.c b/arch/arm/mach-omap2/cpuidle34xx.c
index 4282420..6641574 100644
--- a/arch/arm/mach-omap2/cpuidle34xx.c
+++ b/arch/arm/mach-omap2/cpuidle34xx.c
@@ -114,12 +114,14 @@ static int _cpuidle_deny_idle(struct powerdomain *pwrdm,
/**
* omap3_enter_idle - Programs OMAP3 to enter the specified state
* @dev: cpuidle device
+ * @drv: cpuidle driver
* @index: the index of state to be entered
*
* Called from the CPUidle framework to program the device to the
* specified target state selected by the governor.
*/
static int omap3_enter_idle(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv,
int index)
{
struct omap3_processor_cx *cx =
@@ -175,6 +177,7 @@ return_sleep_time:
/**
* next_valid_state - Find next valid c-state
* @dev: cpuidle device
+ * @drv: cpuidle driver
* @index: Index of currently selected c-state
*
* If the state corresponding to index is valid, index is returned back
@@ -182,9 +185,10 @@ return_sleep_time:
* still valid (as defined in omap3_power_states[]) and returns its index.
*/
static int next_valid_state(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv,
int index)
{
- struct cpuidle_state *curr = &dev->states[index];
+ struct cpuidle_state *curr = &drv->states[index];
struct cpuidle_state_usage *curr_usage = &dev->states_usage[index];
struct cpuidle_state *next = NULL;
struct omap3_processor_cx *cx;
@@ -202,8 +206,8 @@ static int next_valid_state(struct cpuidle_device *dev,
* Reach the current state starting at highest C-state
*/
for (; idx >= OMAP3_STATE_C1; idx--) {
- if (&dev->states[idx] == curr) {
- next = &dev->states[idx];
+ if (&drv->states[idx] == curr) {
+ next = &drv->states[idx];
next_index = idx;
break;
}
@@ -224,7 +228,7 @@ static int next_valid_state(struct cpuidle_device *dev,

cx = cpuidle_get_statedata(&dev->states_usage[idx]);
if (cx->valid) {
- next = &dev->states[idx];
+ next = &drv->states[idx];
next_index = idx;
break;
}
@@ -241,6 +245,7 @@ static int next_valid_state(struct cpuidle_device *dev,
/**
* omap3_enter_idle_bm - Checks for any bus activity
* @dev: cpuidle device
+ * @drv: cpuidle driver
* @index: array index of target state to be programmed
*
* Used for C states with CPUIDLE_FLAG_CHECK_BM flag set. This
@@ -248,18 +253,19 @@ static int next_valid_state(struct cpuidle_device *dev,
* device to the specified or a safer state.
*/
static int omap3_enter_idle_bm(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv,
int index)
{
- struct cpuidle_state *state = &dev->states[index];
- int new_state_idx = next_valid_state(dev, index);
+ struct cpuidle_state *state = &drv->states[index];
+ int new_state_idx = next_valid_state(dev, drv, index);
u32 core_next_state, per_next_state = 0, per_saved_state = 0;
u32 cam_state;
struct omap3_processor_cx *cx;
int ret;

if ((state->flags & CPUIDLE_FLAG_CHECK_BM) && omap3_idle_bm_check()) {
- BUG_ON(dev->safe_state_index < 0);
- new_state_idx = dev->safe_state_index;
+ BUG_ON(drv->safe_state_index < 0);
+ new_state_idx = drv->safe_state_index;
goto select_state;
}

@@ -280,7 +286,7 @@ static int omap3_enter_idle_bm(struct cpuidle_device *dev,
*/
cam_state = pwrdm_read_pwrst(cam_pd);
if (cam_state == PWRDM_POWER_ON) {
- new_state_idx = dev->safe_state_index;
+ new_state_idx = drv->safe_state_index;
goto select_state;
}

@@ -298,7 +304,7 @@ static int omap3_enter_idle_bm(struct cpuidle_device *dev,
pwrdm_set_next_pwrst(per_pd, per_next_state);

select_state:
- ret = omap3_enter_idle(dev, new_state_idx);
+ ret = omap3_enter_idle(dev, drv, new_state_idx);

/* Restore original PER state if it was modified */
if (per_next_state != per_saved_state)
@@ -497,18 +503,15 @@ struct cpuidle_driver omap3_idle_driver = {
};

/**
- * omap3_idle_init - Init routine for OMAP3 idle
+ * omap3_cpuidle_driver_init
*
- * Registers the OMAP3 specific cpuidle driver with the cpuidle
- * framework with the valid set of states.
+ * Sets up and registers omap3 idle driver
*/
-int __init omap3_idle_init(void)
+static int omap3_cpuidle_driver_init(void)
{
- int i, count = 0;
+ int i, retval, count = 0;
struct omap3_processor_cx *cx;
struct cpuidle_state *state;
- struct cpuidle_state_usage *state_usage;
- struct cpuidle_device *dev;

mpu_pd = pwrdm_lookup("mpu_pwrdm");
core_pd = pwrdm_lookup("core_pwrdm");
@@ -516,18 +519,13 @@ int __init omap3_idle_init(void)
cam_pd = pwrdm_lookup("cam_pwrdm");

omap_init_power_states();
- cpuidle_register_driver(&omap3_idle_driver);
-
- dev = &per_cpu(omap3_idle_dev, smp_processor_id());

for (i = OMAP3_STATE_C1; i < OMAP3_MAX_STATES; i++) {
cx = &omap3_power_states[i];
- state = &dev->states[count];
- state_usage = &dev->states_usage[count];
+ state = &drv->states[count];

if (!cx->valid)
continue;
- cpuidle_set_statedata(state_usage, cx);
state->exit_latency = cx->sleep_latency + cx->wakeup_latency;
state->target_residency = cx->threshold;
state->flags = cx->flags;
@@ -542,13 +540,64 @@ int __init omap3_idle_init(void)

if (!count)
return -EINVAL;
- dev->state_count = count;
+ drv->state_count = count;

if (enable_off_mode)
omap3_cpuidle_update_states(PWRDM_POWER_OFF, PWRDM_POWER_OFF);
else
omap3_cpuidle_update_states(PWRDM_POWER_RET, PWRDM_POWER_RET);

+ retval = cpuidle_register_driver(&omap3_idle_driver);
+ if (retval) {
+ printk(KERN_ERR "%s: CPUidle register driver failed\n",
+ __func__);
+ return retval;
+ }
+
+ return 0;
+}
+
+/**
+ * omap3_idle_init
+ *
+ * Registers the OMAP3 specific cpuidle driver with the cpuidle
+ * framework with the valid set of states, registers per cpu
+ * cpuidle device.
+ */
+int __init omap3_idle_init(void)
+{
+ int i, retval, count = 0;
+ static int cpuidle_drv_init;
+ struct omap3_processor_cx *cx;
+ struct cpuidle_state_usage *state_usage;
+ struct cpuidle_device *dev;
+
+ if (!cpuidle_drv_init) {
+ retval = omap3_cpuidle_driver_init();
+ if (retval) {
+ printk(KERN_ERR "%s: CPUidle register driver failed\n",
+ __func__);
+ return retval;
+ }
+ cpuidle_drv_init++;
+ }
+
+ dev = &per_cpu(omap3_idle_dev, smp_processor_id());
+
+ for (i = OMAP3_STATE_C1; i < OMAP3_MAX_STATES; i++) {
+ cx = &omap3_power_states[i];
+ state_usage = &dev->states_usage[count];
+
+ if (!cx->valid)
+ continue;
+ cpuidle_set_statedata(state_usage, cx);
+ count++;
+ }
+
+ if (!count)
+ return -EINVAL;
+ dev->state_count = count;
+
if (cpuidle_register_device(dev)) {
printk(KERN_ERR "%s: CPUidle register device failed\n",
__func__);
diff --git a/arch/sh/kernel/cpu/shmobile/cpuidle.c b/arch/sh/kernel/cpu/shmobile/cpuidle.c
index 2340d62..b689f17 100644
--- a/arch/sh/kernel/cpu/shmobile/cpuidle.c
+++ b/arch/sh/kernel/cpu/shmobile/cpuidle.c
@@ -25,6 +25,7 @@ static unsigned long cpuidle_mode[] = {
};

static int cpuidle_sleep_enter(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv,
int index)
{
unsigned long allowed_mode = arch_hwblk_sleep_mode();
@@ -67,19 +68,19 @@ static struct cpuidle_driver cpuidle_driver = {
void sh_mobile_setup_cpuidle(void)
{
struct cpuidle_device *dev = &cpuidle_dev;
+ struct cpuidle_driver *drv = &cpuidle_driver;
struct cpuidle_state *state;
int i;

- cpuidle_register_driver(&cpuidle_driver);

for (i = 0; i < CPUIDLE_STATE_MAX; i++) {
- dev->states[i].name[0] = '\0';
- dev->states[i].desc[0] = '\0';
+ drv->states[i].name[0] = '\0';
+ drv->states[i].desc[0] = '\0';
}

i = CPUIDLE_DRIVER_STATE_START;

- state = &dev->states[i++];
+ state = &drv->states[i++];
snprintf(state->name, CPUIDLE_NAME_LEN, "C1");
strncpy(state->desc, "SuperH Sleep Mode", CPUIDLE_DESC_LEN);
state->exit_latency = 1;
@@ -89,10 +90,10 @@ void sh_mobile_setup_cpuidle(void)
state->flags |= CPUIDLE_FLAG_TIME_VALID;
state->enter = cpuidle_sleep_enter;

- dev->safe_state_index = i-1;
+ drv->safe_state_index = i-1;

if (sh_mobile_sleep_supported & SUSP_SH_SF) {
- state = &dev->states[i++];
+ state = &drv->states[i++];
snprintf(state->name, CPUIDLE_NAME_LEN, "C2");
strncpy(state->desc, "SuperH Sleep Mode [SF]",
CPUIDLE_DESC_LEN);
@@ -105,7 +106,7 @@ void sh_mobile_setup_cpuidle(void)
}

if (sh_mobile_sleep_supported & SUSP_SH_STANDBY) {
- state = &dev->states[i++];
+ state = &drv->states[i++];
snprintf(state->name, CPUIDLE_NAME_LEN, "C3");
strncpy(state->desc, "SuperH Mobile Standby Mode [SF]",
CPUIDLE_DESC_LEN);
@@ -117,7 +118,10 @@ void sh_mobile_setup_cpuidle(void)
state->enter = cpuidle_sleep_enter;
}

+ drv->state_count = i;
dev->state_count = i;

+ cpuidle_register_driver(&cpuidle_driver);
+
cpuidle_register_device(dev);
}
diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
index a4e0f1b..91464b4 100644
--- a/drivers/acpi/processor_driver.c
+++ b/drivers/acpi/processor_driver.c
@@ -503,8 +503,7 @@ static int __cpuinit acpi_processor_add(struct acpi_device *device)
acpi_processor_get_throttling_info(pr);
acpi_processor_get_limit_info(pr);

-
- if (cpuidle_get_driver() == &acpi_idle_driver)
+ if (!cpuidle_get_driver() || cpuidle_get_driver() == &acpi_idle_driver)
acpi_processor_power_init(pr, device);

pr->cdev = thermal_cooling_device_register("Processor", device,
@@ -800,17 +799,9 @@ static int __init acpi_processor_init(void)

memset(&errata, 0, sizeof(errata));

- if (!cpuidle_register_driver(&acpi_idle_driver)) {
- printk(KERN_DEBUG "ACPI: %s registered with cpuidle\n",
- acpi_idle_driver.name);
- } else {
- printk(KERN_DEBUG "ACPI: acpi_idle yielding to %s\n",
- cpuidle_get_driver()->name);
- }
-
result = acpi_bus_register_driver(&acpi_processor_driver);
if (result < 0)
- goto out_cpuidle;
+ return result;

acpi_processor_install_hotplug_notify();

@@ -821,11 +812,6 @@ static int __init acpi_processor_init(void)
acpi_processor_throttling_init();

return 0;
-
-out_cpuidle:
- cpuidle_unregister_driver(&acpi_idle_driver);
-
- return result;
}

static void __exit acpi_processor_exit(void)
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index bd29363..505355d 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -741,11 +741,13 @@ static inline void acpi_idle_do_entry(struct acpi_processor_cx *cx)
/**
* acpi_idle_enter_c1 - enters an ACPI C1 state-type
* @dev: the target CPU
+ * @drv: cpuidle driver containing cpuidle state info
* @index: index of target state
*
* This is equivalent to the HALT instruction.
*/
-static int acpi_idle_enter_c1(struct cpuidle_device *dev, int index)
+static int acpi_idle_enter_c1(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int index)
{
ktime_t kt1, kt2;
s64 idle_time;
@@ -789,9 +791,11 @@ static int acpi_idle_enter_c1(struct cpuidle_device *dev, int index)
/**
* acpi_idle_enter_simple - enters an ACPI state without BM handling
* @dev: the target CPU
+ * @drv: cpuidle driver with cpuidle state information
* @index: the index of suggested state
*/
-static int acpi_idle_enter_simple(struct cpuidle_device *dev, int index)
+static int acpi_idle_enter_simple(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int index)
{
struct acpi_processor *pr;
struct cpuidle_state_usage *state_usage = &dev->states_usage[index];
@@ -873,11 +877,13 @@ static DEFINE_SPINLOCK(c3_lock);
/**
* acpi_idle_enter_bm - enters C3 with proper BM handling
* @dev: the target CPU
+ * @drv: cpuidle driver containing state data
* @index: the index of suggested state
*
* If BM is detected, the deepest non-C3 idle state is entered instead.
*/
-static int acpi_idle_enter_bm(struct cpuidle_device *dev, int index)
+static int acpi_idle_enter_bm(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int index)
{
struct acpi_processor *pr;
struct cpuidle_state_usage *state_usage = &dev->states_usage[index];
@@ -900,9 +906,9 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev, int index)
}

if (!cx->bm_sts_skip && acpi_idle_bm_check()) {
- if (dev->safe_state_index >= 0) {
- return dev->states[dev->safe_state_index].enter(dev,
- dev->safe_state_index);
+ if (drv->safe_state_index >= 0) {
+ return drv->states[drv->safe_state_index].enter(dev,
+ drv, drv->safe_state_index);
} else {
local_irq_disable();
acpi_safe_halt();
@@ -999,14 +1005,15 @@ struct cpuidle_driver acpi_idle_driver = {
};

/**
- * acpi_processor_setup_cpuidle - prepares and configures CPUIDLE
+ * acpi_processor_setup_cpuidle_cx - prepares and configures CPUIDLE
+ * device i.e. per-cpu data
+ *
* @pr: the ACPI processor
*/
-static int acpi_processor_setup_cpuidle(struct acpi_processor *pr)
+static int acpi_processor_setup_cpuidle_cx(struct acpi_processor *pr)
{
int i, count = CPUIDLE_DRIVER_STATE_START;
struct acpi_processor_cx *cx;
- struct cpuidle_state *state;
struct cpuidle_state_usage *state_usage;
struct cpuidle_device *dev = &pr->power.dev;

@@ -1018,18 +1025,12 @@ static int acpi_processor_setup_cpuidle(struct acpi_processor *pr)
}

dev->cpu = pr->id;
- dev->safe_state_index = -1;
- for (i = 0; i < CPUIDLE_STATE_MAX; i++) {
- dev->states[i].name[0] = '\0';
- dev->states[i].desc[0] = '\0';
- }

if (max_cstate == 0)
max_cstate = 1;

for (i = 1; i < ACPI_PROCESSOR_MAX_POWER && i <= max_cstate; i++) {
cx = &pr->power.states[i];
- state = &dev->states[count];
state_usage = &dev->states_usage[count];

if (!cx->valid)
@@ -1041,8 +1042,61 @@ static int acpi_processor_setup_cpuidle(struct acpi_processor *pr)
!(acpi_gbl_FADT.flags & ACPI_FADT_C2_MP_SUPPORTED))
continue;
#endif
+
cpuidle_set_statedata(state_usage, cx);

+ count++;
+ if (count == CPUIDLE_STATE_MAX)
+ break;
+ }
+
+ dev->state_count = count;
+
+ if (!count)
+ return -EINVAL;
+
+ return 0;
+}
+
+/**
+ * acpi_processor_setup_cpuidle_states - prepares and configures cpuidle
+ * global state data i.e. idle routines
+ *
+ * @pr: the ACPI processor
+ */
+static int acpi_processor_setup_cpuidle_states(struct acpi_processor *pr)
+{
+ int i, count = CPUIDLE_DRIVER_STATE_START;
+ struct acpi_processor_cx *cx;
+ struct cpuidle_state *state;
+ struct cpuidle_driver *drv = &acpi_idle_driver;
+
+ if (!pr->flags.power_setup_done)
+ return -EINVAL;
+
+ if (pr->flags.power == 0) {
+ return -EINVAL;
+ }
+
+ drv->safe_state_index = -1;
+
+ if (max_cstate == 0)
+ max_cstate = 1;
+
+ for (i = 1; i < ACPI_PROCESSOR_MAX_POWER && i <= max_cstate; i++) {
+ cx = &pr->power.states[i];
+
+ if (!cx->valid)
+ continue;
+
+#ifdef CONFIG_HOTPLUG_CPU
+ if ((cx->type != ACPI_STATE_C1) && (num_online_cpus() > 1) &&
+ !pr->flags.has_cst &&
+ !(acpi_gbl_FADT.flags & ACPI_FADT_C2_MP_SUPPORTED))
+ continue;
+#endif
+
+ state = &drv->states[count];
snprintf(state->name, CPUIDLE_NAME_LEN, "C%d", i);
strncpy(state->desc, cx->desc, CPUIDLE_DESC_LEN);
state->exit_latency = cx->latency;
@@ -1055,13 +1109,13 @@ static int acpi_processor_setup_cpuidle(struct acpi_processor *pr)
state->flags |= CPUIDLE_FLAG_TIME_VALID;

state->enter = acpi_idle_enter_c1;
- dev->safe_state_index = count;
+ drv->safe_state_index = count;
break;

case ACPI_STATE_C2:
state->flags |= CPUIDLE_FLAG_TIME_VALID;
state->enter = acpi_idle_enter_simple;
- dev->safe_state_index = count;
+ drv->safe_state_index = count;
break;

case ACPI_STATE_C3:
@@ -1077,7 +1131,7 @@ static int acpi_processor_setup_cpuidle(struct acpi_processor *pr)
break;
}

- dev->state_count = count;
+ drv->state_count = count;

if (!count)
return -EINVAL;
@@ -1106,7 +1160,15 @@ int acpi_processor_cst_has_changed(struct acpi_processor *pr)
cpuidle_disable_device(&pr->power.dev);
acpi_processor_get_power_info(pr);
if (pr->flags.power) {
- acpi_processor_setup_cpuidle(pr);
+ /*
+ * To Do: Currently state data within driver
+ * is not updated. So change this to also update
+ * actual state data i.e. what this routine is
+ * meant for. Maybe complete unregistration and
+ * re-registration.
+ *
+ */
+ acpi_processor_setup_cpuidle_cx(pr);
ret = cpuidle_enable_device(&pr->power.dev);
}
cpuidle_resume_and_unlock();
@@ -1118,7 +1180,7 @@ int __cpuinit acpi_processor_power_init(struct acpi_processor *pr,
struct acpi_device *device)
{
acpi_status status = 0;
- static int first_run;
+ static int first_run, acpi_processor_registered;

if (disabled_by_idle_boot_param())
return 0;
@@ -1154,7 +1216,15 @@ int __cpuinit acpi_processor_power_init(struct acpi_processor *pr,
* platforms that only support C1.
*/
if (pr->flags.power) {
- acpi_processor_setup_cpuidle(pr);
+ if (!acpi_processor_registered) {
+ acpi_processor_setup_cpuidle_states(pr);
+ if (!cpuidle_register_driver(&acpi_idle_driver)) {
+ printk(KERN_DEBUG "ACPI: %s registered with cpuidle\n",
+ acpi_idle_driver.name);
+ acpi_processor_registered = 1;
+ }
+ }
+ acpi_processor_setup_cpuidle_cx(pr);
if (cpuidle_register_device(&pr->power.dev))
return -EIO;
}
@@ -1168,6 +1238,11 @@ int acpi_processor_power_exit(struct acpi_processor *pr,
return 0;

cpuidle_unregister_device(&pr->power.dev);
+ /* We will have to unregister the driver here as well,
+ * since register_driver moved to power_init. Maybe
+ * just do an unregister every time, which will succeed
+ * only during the last call.
+ */
pr->flags.power_setup_done = 0;

return 0;
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 5d6f98d..845d3ef 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -50,6 +50,7 @@ static int __cpuidle_register_device(struct cpuidle_device *dev);
static void cpuidle_idle_call(void)
{
struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
+ struct cpuidle_driver *drv = cpuidle_get_driver();
struct cpuidle_state *target_state;
int next_state, entered_state;

@@ -76,19 +77,19 @@ static void cpuidle_idle_call(void)
#endif

/* ask the governor for the next state */
- next_state = cpuidle_curr_governor->select(dev);
+ next_state = cpuidle_curr_governor->select(drv, dev);
if (need_resched()) {
local_irq_enable();
return;
}

- target_state = &dev->states[next_state];
+ target_state = &drv->states[next_state];

/* Is using next_state here correct?? */
trace_power_start(POWER_CSTATE, next_state, dev->cpu);
trace_cpu_idle(next_state, dev->cpu);

- entered_state = target_state->enter(dev, next_state);
+ entered_state = target_state->enter(dev, drv, next_state);

trace_power_end(dev->cpu);
trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
@@ -144,7 +145,8 @@ void cpuidle_resume_and_unlock(void)
EXPORT_SYMBOL_GPL(cpuidle_resume_and_unlock);

#ifdef CONFIG_ARCH_HAS_CPU_RELAX
-static int poll_idle(struct cpuidle_device *dev, int index)
+static int poll_idle(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int index)
{
ktime_t t1, t2;
s64 diff;
@@ -167,12 +169,9 @@ static int poll_idle(struct cpuidle_device *dev, int index)
return index;
}

-static void poll_idle_init(struct cpuidle_device *dev)
+static void poll_idle_init(struct cpuidle_driver *drv)
{
- struct cpuidle_state *state = &dev->states[0];
- struct cpuidle_state_usage *state_usage = &dev->states_usage[0];
-
- cpuidle_set_statedata(state_usage, NULL);
+ struct cpuidle_state *state = &drv->states[0];

snprintf(state->name, CPUIDLE_NAME_LEN, "POLL");
snprintf(state->desc, CPUIDLE_DESC_LEN, "CPUIDLE CORE POLL IDLE");
@@ -183,7 +182,7 @@ static void poll_idle_init(struct cpuidle_device *dev)
state->enter = poll_idle;
}
#else
-static void poll_idle_init(struct cpuidle_device *dev) {}
+static void poll_idle_init(struct cpuidle_driver *drv) {}
#endif /* CONFIG_ARCH_HAS_CPU_RELAX */

/**
@@ -210,13 +209,14 @@ int cpuidle_enable_device(struct cpuidle_device *dev)
return ret;
}

- poll_idle_init(dev);
+ poll_idle_init(cpuidle_get_driver());
+ cpuidle_set_statedata(&dev->states_usage[0], NULL);

if ((ret = cpuidle_add_state_sysfs(dev)))
return ret;

if (cpuidle_curr_governor->enable &&
- (ret = cpuidle_curr_governor->enable(dev)))
+ (ret = cpuidle_curr_governor->enable(cpuidle_get_driver(), dev)))
goto fail_sysfs;

for (i = 0; i < dev->state_count; i++) {
@@ -257,7 +257,7 @@ void cpuidle_disable_device(struct cpuidle_device *dev)
dev->enabled = 0;

if (cpuidle_curr_governor->disable)
- cpuidle_curr_governor->disable(dev);
+ cpuidle_curr_governor->disable(cpuidle_get_driver(), dev);

cpuidle_remove_state_sysfs(dev);
enabled_devices--;
@@ -285,26 +285,6 @@ static int __cpuidle_register_device(struct cpuidle_device *dev)

init_completion(&dev->kobj_unregister);

- /*
- * cpuidle driver should set the dev->power_specified bit
- * before registering the device if the driver provides
- * power_usage numbers.
- *
- * For those devices whose ->power_specified is not set,
- * we fill in power_usage with decreasing values as the
- * cpuidle code has an implicit assumption that state Cn
- * uses less power than C(n-1).
- *
- * With CONFIG_ARCH_HAS_CPU_RELAX, C0 is already assigned
- * an power value of -1. So we use -2, -3, etc, for other
- * c-states.
- */
- if (!dev->power_specified) {
- int i;
- for (i = CPUIDLE_DRIVER_STATE_START; i < dev->state_count; i++)
- dev->states[i].power_usage = -1 - i;
- }
-
per_cpu(cpuidle_devices, dev->cpu) = dev;
list_add(&dev->device_list, &cpuidle_detected_devices);
if ((ret = cpuidle_add_sysfs(sys_dev))) {
diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index fd1601e..4b04129 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -17,6 +17,30 @@
static struct cpuidle_driver *cpuidle_curr_driver;
DEFINE_SPINLOCK(cpuidle_driver_lock);

+static void __cpuidle_register_driver(struct cpuidle_driver *drv)
+{
+ /*
+ * cpuidle driver should set the drv->power_specified bit
+ * before registering if the driver provides
+ * power_usage numbers.
+ *
+ * If power_specified is not set,
+ * we fill in power_usage with decreasing values as the
+ * cpuidle code has an implicit assumption that state Cn
+ * uses less power than C(n-1).
+ *
+ * With CONFIG_ARCH_HAS_CPU_RELAX, C0 is already assigned
+ * a power value of -1. So we use -2, -3, etc, for other
+ * c-states.
+ */
+ if (!drv->power_specified) {
+ int i;
+ for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++)
+ drv->states[i].power_usage = -1 - i;
+ }
+}
+
+
/**
* cpuidle_register_driver - registers a driver
* @drv: the driver
@@ -31,6 +55,7 @@ int cpuidle_register_driver(struct cpuidle_driver *drv)
spin_unlock(&cpuidle_driver_lock);
return -EBUSY;
}
+ __cpuidle_register_driver(drv);
cpuidle_curr_driver = drv;
spin_unlock(&cpuidle_driver_lock);

diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
index 6a686a7..988d813 100644
--- a/drivers/cpuidle/governors/ladder.c
+++ b/drivers/cpuidle/governors/ladder.c
@@ -60,9 +60,11 @@ static inline void ladder_do_selection(struct ladder_device *ldev,

/**
* ladder_select_state - selects the next state to enter
+ * @drv: cpuidle driver
* @dev: the CPU
*/
-static int ladder_select_state(struct cpuidle_device *dev)
+static int ladder_select_state(struct cpuidle_driver *drv,
+ struct cpuidle_device *dev)
{
struct ladder_device *ldev = &__get_cpu_var(ladder_devices);
struct ladder_device_state *last_state;
@@ -77,15 +79,15 @@ static int ladder_select_state(struct cpuidle_device *dev)

last_state = &ldev->states[last_idx];

- if (dev->states[last_idx].flags & CPUIDLE_FLAG_TIME_VALID)
- last_residency = cpuidle_get_last_residency(dev) - dev->states[last_idx].exit_latency;
+ if (drv->states[last_idx].flags & CPUIDLE_FLAG_TIME_VALID)
+ last_residency = cpuidle_get_last_residency(dev) - drv->states[last_idx].exit_latency;
else
last_residency = last_state->threshold.promotion_time + 1;

/* consider promotion */
- if (last_idx < dev->state_count - 1 &&
+ if (last_idx < drv->state_count - 1 &&
last_residency > last_state->threshold.promotion_time &&
- dev->states[last_idx + 1].exit_latency <= latency_req) {
+ drv->states[last_idx + 1].exit_latency <= latency_req) {
last_state->stats.promotion_count++;
last_state->stats.demotion_count = 0;
if (last_state->stats.promotion_count >= last_state->threshold.promotion_count) {
@@ -96,11 +98,11 @@ static int ladder_select_state(struct cpuidle_device *dev)

/* consider demotion */
if (last_idx > CPUIDLE_DRIVER_STATE_START &&
- dev->states[last_idx].exit_latency > latency_req) {
+ drv->states[last_idx].exit_latency > latency_req) {
int i;

for (i = last_idx - 1; i > CPUIDLE_DRIVER_STATE_START; i--) {
- if (dev->states[i].exit_latency <= latency_req)
+ if (drv->states[i].exit_latency <= latency_req)
break;
}
ladder_do_selection(ldev, last_idx, i);
@@ -123,9 +125,11 @@ static int ladder_select_state(struct cpuidle_device *dev)

/**
* ladder_enable_device - setup for the governor
+ * @drv: cpuidle driver
* @dev: the CPU
*/
-static int ladder_enable_device(struct cpuidle_device *dev)
+static int ladder_enable_device(struct cpuidle_driver *drv,
+ struct cpuidle_device *dev)
{
int i;
struct ladder_device *ldev = &per_cpu(ladder_devices, dev->cpu);
@@ -134,8 +138,8 @@ static int ladder_enable_device(struct cpuidle_device *dev)

ldev->last_state_idx = CPUIDLE_DRIVER_STATE_START;

- for (i = 0; i < dev->state_count; i++) {
- state = &dev->states[i];
+ for (i = 0; i < drv->state_count; i++) {
+ state = &drv->states[i];
lstate = &ldev->states[i];

lstate->stats.promotion_count = 0;
@@ -144,7 +148,7 @@ static int ladder_enable_device(struct cpuidle_device *dev)
lstate->threshold.promotion_count = PROMOTION_COUNT;
lstate->threshold.demotion_count = DEMOTION_COUNT;

- if (i < dev->state_count - 1)
+ if (i < drv->state_count - 1)
lstate->threshold.promotion_time = state->exit_latency;
if (i > 0)
lstate->threshold.demotion_time = state->exit_latency;
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 40b5630..968149b 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -182,7 +182,7 @@ static inline int performance_multiplier(void)

static DEFINE_PER_CPU(struct menu_device, menu_devices);

-static void menu_update(struct cpuidle_device *dev);
+static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev);

/* This implements DIV_ROUND_CLOSEST but avoids 64 bit division */
static u64 div_round64(u64 dividend, u32 divisor)
@@ -228,9 +228,10 @@ static void detect_repeating_patterns(struct menu_device *data)

/**
* menu_select - selects the next idle state to enter
+ * @drv: cpuidle driver containing state data
* @dev: the CPU
*/
-static int menu_select(struct cpuidle_device *dev)
+static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
{
struct menu_device *data = &__get_cpu_var(menu_devices);
int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
@@ -239,7 +240,7 @@ static int menu_select(struct cpuidle_device *dev)
int multiplier;

if (data->needs_update) {
- menu_update(dev);
+ menu_update(drv, dev);
data->needs_update = 0;
}

@@ -283,8 +284,8 @@ static int menu_select(struct cpuidle_device *dev)
* Find the idle state with the lowest power while satisfying
* our constraints.
*/
- for (i = CPUIDLE_DRIVER_STATE_START; i < dev->state_count; i++) {
- struct cpuidle_state *s = &dev->states[i];
+ for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) {
+ struct cpuidle_state *s = &drv->states[i];

if (s->target_residency > data->predicted_us)
continue;
@@ -321,14 +322,15 @@ static void menu_reflect(struct cpuidle_device *dev, int index)

/**
* menu_update - attempts to guess what happened after entry
+ * @drv: cpuidle driver containing state data
* @dev: the CPU
*/
-static void menu_update(struct cpuidle_device *dev)
+static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
{
struct menu_device *data = &__get_cpu_var(menu_devices);
int last_idx = data->last_state_idx;
unsigned int last_idle_us = cpuidle_get_last_residency(dev);
- struct cpuidle_state *target = &dev->states[last_idx];
+ struct cpuidle_state *target = &drv->states[last_idx];
unsigned int measured_us;
u64 new_factor;

@@ -382,9 +384,11 @@ static void menu_update(struct cpuidle_device *dev)

/**
* menu_enable_device - scans a CPU's states and does setup
+ * @drv: cpuidle driver
* @dev: the CPU
*/
-static int menu_enable_device(struct cpuidle_device *dev)
+static int menu_enable_device(struct cpuidle_driver *drv,
+ struct cpuidle_device *dev)
{
struct menu_device *data = &per_cpu(menu_devices, dev->cpu);

diff --git a/drivers/cpuidle/sysfs.c b/drivers/cpuidle/sysfs.c
index 09c9c77..90be2ad 100644
--- a/drivers/cpuidle/sysfs.c
+++ b/drivers/cpuidle/sysfs.c
@@ -318,13 +318,14 @@ int cpuidle_add_state_sysfs(struct cpuidle_device *device)
{
int i, ret = -ENOMEM;
struct cpuidle_state_kobj *kobj;
+ struct cpuidle_driver *drv = cpuidle_get_driver();

/* state statistics */
for (i = 0; i < device->state_count; i++) {
kobj = kzalloc(sizeof(struct cpuidle_state_kobj), GFP_KERNEL);
if (!kobj)
goto error_state;
- kobj->state = &device->states[i];
+ kobj->state = &drv->states[i];
kobj->state_usage = &device->states_usage[i];
init_completion(&kobj->kobj_unregister);

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 4f92d96..3bcd8c2 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -81,7 +81,8 @@ static unsigned int mwait_substates;
static unsigned int lapic_timer_reliable_states = (1 << 1); /* Default to only C1 */

static struct cpuidle_device __percpu *intel_idle_cpuidle_devices;
-static int intel_idle(struct cpuidle_device *dev, int index);
+static int intel_idle(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int index);

static struct cpuidle_state *cpuidle_state_table;

@@ -229,13 +230,15 @@ static int get_driver_data(int cstate)
/**
* intel_idle
* @dev: cpuidle_device
+ * @drv: cpuidle driver
* @index: index of cpuidle state
*
*/
-static int intel_idle(struct cpuidle_device *dev, int index)
+static int intel_idle(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int index)
{
unsigned long ecx = 1; /* break on interrupt flag */
- struct cpuidle_state *state = &dev->states[index];
+ struct cpuidle_state *state = &drv->states[index];
struct cpuidle_state_usage *state_usage = &dev->states_usage[index];
unsigned long eax = (unsigned long)cpuidle_get_statedata(state_usage);
unsigned int cstate;
@@ -424,6 +427,60 @@ static void intel_idle_cpuidle_devices_uninit(void)
return;
}
/*
+ * intel_idle_cpuidle_driver_init()
+ * allocate, initialize cpuidle_states
+ */
+static int intel_idle_cpuidle_driver_init(void)
+{
+ int cstate;
+ struct cpuidle_driver *drv = &intel_idle_driver;
+
+ drv->state_count = 1;
+
+ for (cstate = 1; cstate < MWAIT_MAX_NUM_CSTATES; ++cstate) {
+ int num_substates;
+
+ if (cstate > max_cstate) {
+ printk(PREFIX "max_cstate %d reached\n",
+ max_cstate);
+ break;
+ }
+
+ /* does the state exist in CPUID.MWAIT? */
+ num_substates = (mwait_substates >> ((cstate) * 4))
+ & MWAIT_SUBSTATE_MASK;
+ if (num_substates == 0)
+ continue;
+ /* is the state not enabled? */
+ if (cpuidle_state_table[cstate].enter == NULL) {
+ /* does the driver not know about the state? */
+ if (*cpuidle_state_table[cstate].name == '\0')
+ pr_debug(PREFIX "unaware of model 0x%x"
+ " MWAIT %d please"
+ " contact [email protected]",
+ boot_cpu_data.x86_model, cstate);
+ continue;
+ }
+
+ if ((cstate > 2) &&
+ !boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
+ mark_tsc_unstable("TSC halts in idle"
+ " states deeper than C2");
+
+ drv->states[drv->state_count] = /* structure copy */
+ cpuidle_state_table[cstate];
+
+ drv->state_count += 1;
+ }
+
+ if (auto_demotion_disable_flags)
+ smp_call_function(auto_demotion_disable, NULL, 1);
+
+ return 0;
+}
+
+
+/*
* intel_idle_cpuidle_devices_init()
* allocate, initialize, register cpuidle_devices
*/
@@ -457,23 +514,9 @@ static int intel_idle_cpuidle_devices_init(void)
continue;
/* is the state not enabled? */
if (cpuidle_state_table[cstate].enter == NULL) {
- /* does the driver not know about the state? */
- if (*cpuidle_state_table[cstate].name == '\0')
- pr_debug(PREFIX "unaware of model 0x%x"
- " MWAIT %d please"
- " contact [email protected]",
- boot_cpu_data.x86_model, cstate);
continue;
}

- if ((cstate > 2) &&
- !boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
- mark_tsc_unstable("TSC halts in idle"
- " states deeper than C2");
-
- dev->states[dev->state_count] = /* structure copy */
- cpuidle_state_table[cstate];
-
dev->states_usage[dev->state_count].driver_data =
(void *)get_driver_data(cstate);

@@ -488,8 +531,6 @@ static int intel_idle_cpuidle_devices_init(void)
return -EIO;
}
}
- if (auto_demotion_disable_flags)
- smp_call_function(auto_demotion_disable, NULL, 1);

return 0;
}
@@ -507,6 +548,7 @@ static int __init intel_idle_init(void)
if (retval)
return retval;

+ intel_idle_cpuidle_driver_init();
retval = cpuidle_register_driver(&intel_idle_driver);
if (retval) {
printk(KERN_DEBUG PREFIX "intel_idle yielding to %s",
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 5a1a238..c70fb8c 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -22,6 +22,7 @@
#define CPUIDLE_DESC_LEN 32

struct cpuidle_device;
+struct cpuidle_driver;


/****************************
@@ -45,6 +46,7 @@ struct cpuidle_state {
unsigned int target_residency; /* in US */

int (*enter) (struct cpuidle_device *dev,
+ struct cpuidle_driver *drv,
int index);
};

@@ -83,20 +85,17 @@ struct cpuidle_state_kobj {
struct cpuidle_device {
unsigned int registered:1;
unsigned int enabled:1;
- unsigned int power_specified:1;
unsigned int cpu;

int last_residency;
int state_count;
- struct cpuidle_state states[CPUIDLE_STATE_MAX];
struct cpuidle_state_usage states_usage[CPUIDLE_STATE_MAX];
struct cpuidle_state_kobj *kobjs[CPUIDLE_STATE_MAX];

- struct list_head device_list;
+ struct list_head device_list;
struct kobject kobj;
struct completion kobj_unregister;
void *governor_data;
- int safe_state_index;
};

DECLARE_PER_CPU(struct cpuidle_device *, cpuidle_devices);
@@ -120,6 +119,11 @@ static inline int cpuidle_get_last_residency(struct cpuidle_device *dev)
struct cpuidle_driver {
char name[CPUIDLE_NAME_LEN];
struct module *owner;
+
+ unsigned int power_specified:1;
+ struct cpuidle_state states[CPUIDLE_STATE_MAX];
+ int state_count;
+ int safe_state_index;
};

#ifdef CONFIG_CPU_IDLE
@@ -162,10 +166,13 @@ struct cpuidle_governor {
struct list_head governor_list;
unsigned int rating;

- int (*enable) (struct cpuidle_device *dev);
- void (*disable) (struct cpuidle_device *dev);
+ int (*enable) (struct cpuidle_driver *drv,
+ struct cpuidle_device *dev);
+ void (*disable) (struct cpuidle_driver *drv,
+ struct cpuidle_device *dev);

- int (*select) (struct cpuidle_device *dev);
+ int (*select) (struct cpuidle_driver *drv,
+ struct cpuidle_device *dev);
void (*reflect) (struct cpuidle_device *dev, int index);

struct module *owner;

2011-04-20 17:27:22

by Kevin Hilman

Subject: Re: [linux-pm] [RFC PATCH V3 1/4] cpuidle: Move dev->last_residency update to driver enter routine; remove dev->last_state

Trinabh Gupta writes:

> Cpuidle subsystem only suggests the state to enter and does not
> guarantee if the suggested state is entered. The actual entered state
> may be different because of software or hardware demotion. Software
> demotion is done by the back-end cpuidle driver and can be accounted
> correctly. Current cpuidle code uses last_state field to capture the
> actual state entered and based on that updates the statistics for the
> state entered.
>
> Ideally the driver enter routine should update the counters,
> and it should return the state actually entered rather than the time
> spent there.

OK, the return type was changed to return the state index instead of the
time, but since the governors are still relying on dev->last_residency,
drivers are required to update it.

Because of that, why not keep the update of the time/usage counters
in common code rather than duplicating it (9 times in this patch) into
all the drivers?

Something like the patch below should suffice.

Kevin


diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 845d3ef..875d241 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -91,6 +91,11 @@ static void cpuidle_idle_call(void)

entered_state = target_state->enter(dev, drv, next_state);

+ /* Update cpuidle counters */
+ dev->states_usage[entered_state].time +=
+ (unsigned long long)dev->last_residency;
+ dev->states_usage[entered_state].usage++;
+
trace_power_end(dev->cpu);
trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
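
For illustration only, the idea of accounting in common code can be modelled in user space. This is a minimal sketch, not kernel code: every sketch_* name, the SKETCH_STATE_MAX bound, and the demoting driver are hypothetical stand-ins for the structures discussed in this thread. It assumes, as described above, that the enter routine returns the index of the state actually entered and sets last_residency itself.

```c
#include <assert.h>

#define SKETCH_STATE_MAX 8	/* hypothetical bound, stands in for CPUIDLE_STATE_MAX */

/* Simplified stand-ins for the per-cpu structures; not the kernel definitions. */
struct sketch_state_usage {
	unsigned long long usage;	/* number of times the state was entered */
	unsigned long long time;	/* total residency, in microseconds */
};

struct sketch_device {
	int last_residency;		/* set by the driver's enter routine */
	int state_count;
	struct sketch_state_usage states_usage[SKETCH_STATE_MAX];
};

/* An enter routine returns the index of the state actually entered. */
typedef int (*sketch_enter_fn)(struct sketch_device *dev, int index);

/*
 * Hypothetical driver: demotes any request deeper than state 1 and
 * pretends the CPU then slept for 100us.
 */
static int sketch_enter_demoting(struct sketch_device *dev, int index)
{
	int entered = index > 1 ? 1 : index;

	dev->last_residency = 100;
	return entered;
}

/*
 * Common-code path: call the driver, then account against the state
 * the driver reports, not the one that was requested.
 */
static int sketch_idle_call(struct sketch_device *dev, sketch_enter_fn enter,
			    int next_state)
{
	int entered = enter(dev, next_state);

	assert(entered >= 0 && entered < dev->state_count);
	dev->states_usage[entered].time +=
		(unsigned long long)dev->last_residency;
	dev->states_usage[entered].usage++;
	return entered;
}
```

With this shape, a driver that demotes a request for state 2 down to state 1 still has its residency charged to state 1, and no driver needs its own copy of the counter update.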

2011-04-20 17:39:25

by Kevin Hilman

Subject: Re: [linux-pm] [RFC PATCH V3 4/4] cpuidle: Single/Global registration of idle states

Trinabh Gupta <[email protected]> writes:

> With this patch there is single copy of cpuidle_states structure
> instead of per-cpu. The statistics needed on per-cpu basis
> by the governor are kept per-cpu. This simplifies the cpuidle
> subsystem as state registration is done by single cpu only.
> Having single copy of cpuidle_states saves memory. Rare case
> of asymmetric C-states can be handled within the cpuidle driver and
> architectures such as POWER do not have asymmetric C-states.

I haven't actually tested this series on OMAP yet, but it currently
doesn't compile.

The patch below (on top of your series) is required to compile on OMAP,
I think it's doing what you intended, but please confirm.

Kevin

diff --git a/arch/arm/mach-omap2/cpuidle34xx.c b/arch/arm/mach-omap2/cpuidle34xx.c
index 6641574..ab77ba3 100644
--- a/arch/arm/mach-omap2/cpuidle34xx.c
+++ b/arch/arm/mach-omap2/cpuidle34xx.c
@@ -512,6 +512,7 @@ static int omap3_cpuidle_driver_init(void)
int i, retval, count = 0;
struct omap3_processor_cx *cx;
struct cpuidle_state *state;
+ struct cpuidle_driver *drv = &omap3_idle_driver;

mpu_pd = pwrdm_lookup("mpu_pwrdm");
core_pd = pwrdm_lookup("core_pwrdm");
@@ -532,7 +533,7 @@ static int omap3_cpuidle_driver_init(void)
state->enter = (state->flags & CPUIDLE_FLAG_CHECK_BM) ?
omap3_enter_idle_bm : omap3_enter_idle;
if (cx->type == OMAP3_STATE_C1)
- dev->safe_state_index = count;
+ drv->safe_state_index = count;
sprintf(state->name, "C%d", count+1);
strncpy(state->desc, cx->desc, CPUIDLE_DESC_LEN);
count++;

2011-04-21 04:43:10

by Trinabh Gupta

Subject: Re: [linux-pm] [RFC PATCH V3 1/4] cpuidle: Move dev->last_residency update to driver enter routine; remove dev->last_state



On 04/20/2011 10:57 PM, Kevin Hilman wrote:
> Trinabh Gupta <[email protected]> writes:
>
>> Cpuidle subsystem only suggests the state to enter and does not
>> guarantee if the suggested state is entered. The actual entered state
>> may be different because of software or hardware demotion. Software
>> demotion is done by the back-end cpuidle driver and can be accounted
>> correctly. Current cpuidle code uses last_state field to capture the
>> actual state entered and based on that updates the statistics for the
>> state entered.
>>
>> Ideally the driver enter routine should update the counters,
>> and it should return the state actually entered rather than the time
>> spent there.
>
> OK, the return type was changed to return the state index instead of the
> time, but since the governors are still relying on dev->last_residency,
> drivers are required to update it.
>
> Because of that, why not keep the update of the time/usage counters
> in common code rather than duplicating it (9 times in this patch) into
> all the drivers?

Hi Kevin,

Thanks for your review. Yes, we can do it this way, and it would prevent
duplication of code. I just wanted cpuidle to handle only the selection
and entering of a state, and leave everything else to the driver, which
knows best about these things.

But both approaches are functionally the same, and updating the statistics
inside cpuidle_idle_call() definitely prevents duplication of code.

Thanks,
-Trinabh

2011-04-21 04:53:41

by Trinabh Gupta

Subject: Re: [linux-pm] [RFC PATCH V3 4/4] cpuidle: Single/Global registration of idle states



On 04/20/2011 11:03 PM, Kevin Hilman wrote:
> Trinabh Gupta <[email protected]> writes:
>
>> With this patch there is single copy of cpuidle_states structure
>> instead of per-cpu. The statistics needed on per-cpu basis
>> by the governor are kept per-cpu. This simplifies the cpuidle
>> subsystem as state registration is done by single cpu only.
>> Having single copy of cpuidle_states saves memory. Rare case
>> of asymmetric C-states can be handled within the cpuidle driver and
>> architectures such as POWER do not have asymmetric C-states.
>
> I haven't actually tested this series on OMAP yet, but it currently
> doesn't compile.

Hi Kevin,

Yes, I tested it only for x86 (as I had mentioned in the description
of the patch series). I just wanted to get comments on the design
and understand how it affects various architectures in question. It
looks to me as if the design should be okay and in fact better for
architectures like ARM since they do not have different idle
states for different cpus and thus do not require per-cpu registration.
Global registration would work and be simpler; please correct me if I am
wrong.

>
> The patch below (on top of your series) is required to compile on OMAP,
> I think it's doing what you intended, but please confirm.

Thanks for helping with this.

-Trinabh

2011-04-22 23:06:49

by Kevin Hilman

Subject: Re: [linux-pm] [RFC PATCH V3 4/4] cpuidle: Single/Global registration of idle states

Hi Trinabh,

Trinabh Gupta <[email protected]> writes:

[...]

> I just wanted to get comments on the design and understand how it
> affects various architectures in question. It looks to me as if the
> design should be okay and in fact better for architectures like ARM
> since they do not have different idle states for different cpus and
> thus do not require per-cpu registration. Global registration would
> work and be simpler; please correct me if I am wrong.

Yes, I agree that the new design is better, I especially like that it's
more clear (and expected) that final state decision making is to be done
directly in the driver without the back-and-forth in the current setup.

Thanks,

Kevin
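
As an illustration of "final state decision making in the driver", here is
a minimal userspace sketch. It assumes a hypothetical deep_state_blocked
flag standing in for whatever hardware condition forces demotion;
idle_driver and demo_enter() are illustrative names, not the kernel API.
The driver picks the final state itself, using the global
safe_state_index, and returns the index it actually entered.

```c
#include <assert.h>
#include <stdbool.h>

#define STATE_MAX 8

struct idle_state {
	int exit_latency;
};

/* Global driver data, shared by all CPUs. */
struct idle_driver {
	struct idle_state states[STATE_MAX];
	int state_count;
	int safe_state_index;	/* -1 if no safe state was registered */
};

/* Hypothetical stand-in for a hardware check that blocks deep states. */
static bool deep_state_blocked;

/* Driver enter routine: makes the final decision and returns the index
 * of the state actually entered. */
static int demo_enter(struct idle_driver *drv, int index)
{
	assert(index >= 0 && index < drv->state_count);

	if (deep_state_blocked && drv->safe_state_index >= 0)
		index = drv->safe_state_index;	/* software demotion */

	/* A real driver would execute the idle instruction here. */
	return index;
}
```

Because the routine returns the entered index, no prepare callback or
ignore flag is needed for the caller to learn about the demotion.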

2011-04-25 12:00:16

by Trinabh Gupta

Subject: Re: [linux-pm] [RFC PATCH V3 4/4] cpuidle: Single/Global registration of idle states


[...]

> - * acpi_processor_setup_cpuidle - prepares and configures CPUIDLE
> + * acpi_processor_setup_cpuidle_cx - prepares and configures CPUIDLE
> + * device i.e. per-cpu data
> + *
> * @pr: the ACPI processor
> */
> -static int acpi_processor_setup_cpuidle(struct acpi_processor *pr)
> +static int acpi_processor_setup_cpuidle_cx(struct acpi_processor *pr)
> {
> int i, count = CPUIDLE_DRIVER_STATE_START;
> struct acpi_processor_cx *cx;
> - struct cpuidle_state *state;
> struct cpuidle_state_usage *state_usage;
> struct cpuidle_device *dev = &pr->power.dev;
>
> @@ -1018,18 +1025,12 @@ static int acpi_processor_setup_cpuidle(struct acpi_processor *pr)
> }
>
> dev->cpu = pr->id;
> - dev->safe_state_index = -1;
> - for (i = 0; i < CPUIDLE_STATE_MAX; i++) {
> - dev->states[i].name[0] = '\0';
> - dev->states[i].desc[0] = '\0';
> - }
>
> if (max_cstate == 0)
> max_cstate = 1;
>
> for (i = 1; i < ACPI_PROCESSOR_MAX_POWER && i <= max_cstate; i++) {
> cx = &pr->power.states[i];
> - state = &dev->states[count];
> state_usage = &dev->states_usage[count];
>
> if (!cx->valid)
> @@ -1041,8 +1042,61 @@ static int acpi_processor_setup_cpuidle(struct acpi_processor *pr)
> !(acpi_gbl_FADT.flags & ACPI_FADT_C2_MP_SUPPORTED))
> continue;
> #endif
> +
> cpuidle_set_statedata(state_usage, cx);
>
> + count++;
> + if (count == CPUIDLE_STATE_MAX)
> + break;
> + }
> +
> + dev->state_count = count;
> +
> + if (!count)
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> +/**
> + * acpi_processor_setup_cpuidle_states - prepares and configures cpuidle
> + * global state data i.e. idle routines
> + *
> + * @pr: the ACPI processor
> + */
> +static int acpi_processor_setup_cpuidle_states(struct acpi_processor *pr)
> +{
> + int i, count = CPUIDLE_DRIVER_STATE_START;
> + struct acpi_processor_cx *cx;
> + struct cpuidle_state *state;
> + struct cpuidle_driver *drv = &acpi_idle_driver;
> +
> + if (!pr->flags.power_setup_done)
> + return -EINVAL;
> +
> + if (pr->flags.power == 0) {
> + return -EINVAL;
> + }
> +
> + drv->safe_state_index = -1;
> +
> + if (max_cstate == 0)
> + max_cstate = 1;
> +
> + for (i = 1; i < ACPI_PROCESSOR_MAX_POWER && i <= max_cstate; i++) {
> + cx = &pr->power.states[i];
> +
> + if (!cx->valid)
> + continue;
> +
> +#ifdef CONFIG_HOTPLUG_CPU
> + if ((cx->type != ACPI_STATE_C1) && (num_online_cpus() > 1) &&
> + !pr->flags.has_cst &&
> + !(acpi_gbl_FADT.flags & ACPI_FADT_C2_MP_SUPPORTED))
> + continue;
> +#endif
> +
> + state = &drv->states[count];
> snprintf(state->name, CPUIDLE_NAME_LEN, "C%d", i);
> strncpy(state->desc, cx->desc, CPUIDLE_DESC_LEN);
> state->exit_latency = cx->latency;
> @@ -1055,13 +1109,13 @@ static int acpi_processor_setup_cpuidle(struct acpi_processor *pr)
> state->flags |= CPUIDLE_FLAG_TIME_VALID;
>
> state->enter = acpi_idle_enter_c1;
> - dev->safe_state_index = count;
> + drv->safe_state_index = count;
> break;
>
> case ACPI_STATE_C2:
> state->flags |= CPUIDLE_FLAG_TIME_VALID;
> state->enter = acpi_idle_enter_simple;
> - dev->safe_state_index = count;
> + drv->safe_state_index = count;
> break;
>
> case ACPI_STATE_C3:
> @@ -1077,7 +1131,7 @@ static int acpi_processor_setup_cpuidle(struct acpi_processor *pr)
> break;
> }
>
> - dev->state_count = count;
> + drv->state_count = count;
>
> if (!count)
> return -EINVAL;
> @@ -1106,7 +1160,15 @@ int acpi_processor_cst_has_changed(struct acpi_processor *pr)
> cpuidle_disable_device(&pr->power.dev);
> acpi_processor_get_power_info(pr);
> if (pr->flags.power) {
> - acpi_processor_setup_cpuidle(pr);
> + /*
> + * To Do: Currently state data within driver
> + * is not updated. So change this to also update
> + * actual state data i.e. what this routine is
> + * meant for. Maybe complete unregistration and
> + * re-registration.
> + *
> + */

Hi Len, Arjan

acpi_processor_cst_has_changed() is called for each cpu on a switch
between AC power and battery. As a result, each cpu disables its
cpuidle device, populates the idle state info again and enables the
device. This all works fine for per-cpu registration.

But if we do global registration of idle states, then during the
first notification itself we need to disable all devices, re-populate
the states structure and enable all devices. Because each cpu is
currently notified in no particular order, this is not possible.
Do you have any suggestions on how to design this?
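
The sequence described above (disable all devices, re-populate the single
states table, enable all devices) could look roughly like the userspace
sketch below. Everything in it is an assumption made for illustration:
devices[], driver, NR_CPUS_DEMO and rebuild_global_states() are
hypothetical names, and the sketch deliberately leaves open which
notification should trigger the rebuild.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_DEMO 4
#define STATE_MAX 8

/* Per-CPU device; only the enabled flag matters for this sketch. */
struct idle_device {
	bool enabled;
};

/* Single, global state table shared by every CPU. */
struct idle_driver {
	int state_count;
	int exit_latency[STATE_MAX];
};

static struct idle_device devices[NR_CPUS_DEMO];
static struct idle_driver driver;

/* Rebuild the shared state table while no CPU is allowed to use it. */
static void rebuild_global_states(const int *latency, int count)
{
	int cpu, i;

	assert(count >= 0 && count <= STATE_MAX);

	/* 1. Disable every device, not just the notified CPU's. */
	for (cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
		devices[cpu].enabled = false;

	/* 2. Re-populate the one global table. */
	for (i = 0; i < count; i++)
		driver.exit_latency[i] = latency[i];
	driver.state_count = count;

	/* 3. Enable all devices again. */
	for (cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
		devices[cpu].enabled = true;
}
```

In this model the whole rebuild happens in one call, which is why a single
notification would be sufficient if the event could be delivered that way.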

Reading through the ACPI notification code it looks as if
drivers/acpi/acpica/evxface.c:acpi_install_notify_handler()
registers the device notification for root object i.e. basically
all objects of type CPU. With global registration only one
notification is required and sufficient. Maybe we need to
change the event type to system notification just like
for sleep button?

Thanks,
-Trinabh



> + acpi_processor_setup_cpuidle_cx(pr);
> ret = cpuidle_enable_device(&pr->power.dev);
> }
> cpuidle_resume_and_unlock();
> @@ -1118,7 +1180,7 @@ int __cpuinit acpi_processor_power_init(struct acpi_processor *pr,
> struct acpi_device *device)
> {
> acpi_status status = 0;
> - static int first_run;
> + static int first_run, acpi_processor_registered;
>
> if (disabled_by_idle_boot_param())
> return 0;
> @@ -1154,7 +1216,15 @@ int __cpuinit acpi_processor_power_init(struct acpi_processor *pr,
> * platforms that only support C1.
> */
> if (pr->flags.power) {
> - acpi_processor_setup_cpuidle(pr);
> + if (!acpi_processor_registered) {
> + acpi_processor_setup_cpuidle_states(pr);
> + if (!cpuidle_register_driver(&acpi_idle_driver)) {
> + printk(KERN_DEBUG "ACPI: %s registered with cpuidle\n",
> + acpi_idle_driver.name);
> + acpi_processor_registered = 1;
> + }
> + }
> + acpi_processor_setup_cpuidle_cx(pr);
> if (cpuidle_register_device(&pr->power.dev))
> return -EIO;
> }
> @@ -1168,6 +1238,11 @@ int acpi_processor_power_exit(struct acpi_processor *pr,
> return 0;
>
> cpuidle_unregister_device(&pr->power.dev);
> + /* We will have to unregister the driver as well here
> + * since we move register_driver to power_init. Maybe
> + * just do an unregister every time, which will be successful
> + * only during the last call.
> + */
> pr->flags.power_setup_done = 0;
>
> return 0;

[...]