2009-09-22 11:25:38

by Arun R Bharadwaj

[permalink] [raw]
Subject: [v6 PATCH 0/7]: cpuidle/x86/POWER: Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER.

Hi,

This patchset introduces cpuidle infrastructure to POWER, prototyping
for pSeries, and also does a major refactoring of current x86 idle
power management and a cleanup of cpuidle infrastructure.

My earlier iterations can be found at:

v5 --> http://lkml.org/lkml/2009/9/22/26
v4 --> http://lkml.org/lkml/2009/9/1/133
v3 --> http://lkml.org/lkml/2009/8/27/124
v2 --> http://lkml.org/lkml/2009/8/26/233
v1 --> http://lkml.org/lkml/2009/8/19/150


Changes in this version:
------------------------------------------
Remove the bug from previous iteration in the routine
cpuidle_remove_from_list(), which was causing the kernel to panic on
platform supporting multiple sleep states.

Add the routine cpuidle_kick_cpus() in POWER, which is needed to kick
the cpus out of their idle when changing the idle routines.


TODO:
-------------------------------------------
Peterz suggested it would be nice to have a sysfs interface through
which an idle routine can be forced at runtime.

Also, current implementation registers every cpu as a cpuidle_device,
but this is an overkill and the registering mechanism should be a
systemwide process and not per-cpu. (probably one of the original
cpuidle authors can reply to this - Venki, Shaohua Li? ).

ppc_md.power_save has been replaced by cpuidle_idle_call only for
pseries. So this needs to be done for all POWER platforms so that
ppc_md.power_save is completely removed.


Any comments on the design is welcome.

--arun


2009-09-22 11:27:25

by Arun R Bharadwaj

[permalink] [raw]
Subject: [v6 PATCH 1/7]: cpuidle: cleanup drivers/cpuidle/cpuidle.c

* Arun R Bharadwaj <[email protected]> [2009-09-22 16:55:27]:

This patch cleans up drivers/cpuidle/cpuidle.c
Earlier cpuidle assumed pm_idle as the default idle loop. Break that
assumption and make it more generic. cpuidle_idle_call() which is the
main idle loop of cpuidle is to be called by architectures which have
registered to cpuidle.

Remove routines cpuidle_install/uninstall_idle_handler() and
cpuidle_kick_cpus() which are not needed anymore.

Signed-off-by: Arun R Bharadwaj <[email protected]>
---
drivers/cpuidle/cpuidle.c | 48 +++++++--------------------------------------
drivers/cpuidle/cpuidle.h | 1
drivers/cpuidle/governor.c | 3 --
include/linux/cpuidle.h | 3 ++
4 files changed, 11 insertions(+), 44 deletions(-)

Index: linux.trees.git/drivers/cpuidle/cpuidle.c
===================================================================
--- linux.trees.git.orig/drivers/cpuidle/cpuidle.c
+++ linux.trees.git/drivers/cpuidle/cpuidle.c
@@ -24,9 +24,6 @@ DEFINE_PER_CPU(struct cpuidle_device *,

DEFINE_MUTEX(cpuidle_lock);
LIST_HEAD(cpuidle_detected_devices);
-static void (*pm_idle_old)(void);
-
-static int enabled_devices;

#if defined(CONFIG_ARCH_HAS_CPU_IDLE_WAIT)
static void cpuidle_kick_cpus(void)
@@ -46,7 +43,7 @@ static int __cpuidle_register_device(str
*
* NOTE: no locks or semaphores should be used here
*/
-static void cpuidle_idle_call(void)
+void cpuidle_idle_call(void)
{
struct cpuidle_device *dev = __get_cpu_var(cpuidle_devices);
struct cpuidle_state *target_state;
@@ -54,13 +51,10 @@ static void cpuidle_idle_call(void)

/* check if the device is ready */
if (!dev || !dev->enabled) {
- if (pm_idle_old)
- pm_idle_old();
- else
#if defined(CONFIG_ARCH_HAS_DEFAULT_IDLE)
- default_idle();
+ default_idle();
#else
- local_irq_enable();
+ local_irq_enable();
#endif
return;
}
@@ -74,7 +68,11 @@ static void cpuidle_idle_call(void)
hrtimer_peek_ahead_timers();
#endif
/* ask the governor for the next state */
- next_state = cpuidle_curr_governor->select(dev);
+ if (dev->state_count > 1)
+ next_state = cpuidle_curr_governor->select(dev);
+ else
+ next_state = 0;
+
if (need_resched())
return;
target_state = &dev->states[next_state];
@@ -94,35 +92,11 @@ static void cpuidle_idle_call(void)
}

/**
- * cpuidle_install_idle_handler - installs the cpuidle idle loop handler
- */
-void cpuidle_install_idle_handler(void)
-{
- if (enabled_devices && (pm_idle != cpuidle_idle_call)) {
- /* Make sure all changes finished before we switch to new idle */
- smp_wmb();
- pm_idle = cpuidle_idle_call;
- }
-}
-
-/**
- * cpuidle_uninstall_idle_handler - uninstalls the cpuidle idle loop handler
- */
-void cpuidle_uninstall_idle_handler(void)
-{
- if (enabled_devices && pm_idle_old && (pm_idle != pm_idle_old)) {
- pm_idle = pm_idle_old;
- cpuidle_kick_cpus();
- }
-}
-
-/**
* cpuidle_pause_and_lock - temporarily disables CPUIDLE
*/
void cpuidle_pause_and_lock(void)
{
mutex_lock(&cpuidle_lock);
- cpuidle_uninstall_idle_handler();
}

EXPORT_SYMBOL_GPL(cpuidle_pause_and_lock);
@@ -132,7 +106,6 @@ EXPORT_SYMBOL_GPL(cpuidle_pause_and_lock
*/
void cpuidle_resume_and_unlock(void)
{
- cpuidle_install_idle_handler();
mutex_unlock(&cpuidle_lock);
}

@@ -180,7 +153,6 @@ int cpuidle_enable_device(struct cpuidle

dev->enabled = 1;

- enabled_devices++;
return 0;

fail_sysfs:
@@ -211,7 +183,6 @@ void cpuidle_disable_device(struct cpuid
cpuidle_curr_governor->disable(dev);

cpuidle_remove_state_sysfs(dev);
- enabled_devices--;
}

EXPORT_SYMBOL_GPL(cpuidle_disable_device);
@@ -303,7 +274,6 @@ int cpuidle_register_device(struct cpuid
}

cpuidle_enable_device(dev);
- cpuidle_install_idle_handler();

mutex_unlock(&cpuidle_lock);

@@ -382,8 +352,6 @@ static int __init cpuidle_init(void)
{
int ret;

- pm_idle_old = pm_idle;
-
ret = cpuidle_add_class_sysfs(&cpu_sysdev_class);
if (ret)
return ret;
Index: linux.trees.git/drivers/cpuidle/governor.c
===================================================================
--- linux.trees.git.orig/drivers/cpuidle/governor.c
+++ linux.trees.git/drivers/cpuidle/governor.c
@@ -48,8 +48,6 @@ int cpuidle_switch_governor(struct cpuid
if (gov == cpuidle_curr_governor)
return 0;

- cpuidle_uninstall_idle_handler();
-
if (cpuidle_curr_governor) {
list_for_each_entry(dev, &cpuidle_detected_devices, device_list)
cpuidle_disable_device(dev);
@@ -63,7 +61,6 @@ int cpuidle_switch_governor(struct cpuid
return -EINVAL;
list_for_each_entry(dev, &cpuidle_detected_devices, device_list)
cpuidle_enable_device(dev);
- cpuidle_install_idle_handler();
printk(KERN_INFO "cpuidle: using governor %s\n", gov->name);
}

Index: linux.trees.git/include/linux/cpuidle.h
===================================================================
--- linux.trees.git.orig/include/linux/cpuidle.h
+++ linux.trees.git/include/linux/cpuidle.h
@@ -112,6 +112,9 @@ static inline int cpuidle_get_last_resid
return dev->last_residency;
}

+extern void cpuidle_idle_call(void);
+extern struct cpuidle_driver *cpuidle_curr_driver;
+

/****************************
* CPUIDLE DRIVER INTERFACE *
Index: linux.trees.git/drivers/cpuidle/cpuidle.h
===================================================================
--- linux.trees.git.orig/drivers/cpuidle/cpuidle.h
+++ linux.trees.git/drivers/cpuidle/cpuidle.h
@@ -9,7 +9,6 @@

/* For internal use only */
extern struct cpuidle_governor *cpuidle_curr_governor;
-extern struct cpuidle_driver *cpuidle_curr_driver;
extern struct list_head cpuidle_governors;
extern struct list_head cpuidle_detected_devices;
extern struct mutex cpuidle_lock;

2009-09-22 11:28:26

by Arun R Bharadwaj

[permalink] [raw]
Subject: [v6 PATCH 2/7]: cpuidle: implement a list based approach to register a set of idle routines.

* Arun R Bharadwaj <[email protected]> [2009-09-22 16:55:27]:

Implement a list based registering mechanism for architectures which
have multiple sets of idle routines which are to be registered.

Currently, in x86 it is done by merely setting pm_idle = idle_routine
and managing this pm_idle pointer is messy.

To give an example of how this mechanism works:
In x86, initially, idle routine is selected from the set of poll/mwait/
c1e/default idle loops. So the selected idle loop is registered in cpuidle
as one idle state cpuidle devices. Once ACPI comes up, it registers
another set of idle states on top of this state. Again, suppose a module
registers another set of idle loops, it is added to this list.

This provides a clean way of registering and unregistering idle state
routines.

In the current implementation, pm_idle is set as the current idle routine
being used and the old idle routine has to be maintained and when a module
registers/unregisters an idle routine, confusion arises.


Signed-off-by: Arun R Bharadwaj <[email protected]>
---
drivers/cpuidle/cpuidle.c | 51 ++++++++++++++++++++++++++++++++++++++++++----
include/linux/cpuidle.h | 1
2 files changed, 48 insertions(+), 4 deletions(-)

Index: linux.trees.git/drivers/cpuidle/cpuidle.c
===================================================================
--- linux.trees.git.orig/drivers/cpuidle/cpuidle.c
+++ linux.trees.git/drivers/cpuidle/cpuidle.c
@@ -21,6 +21,7 @@
#include "cpuidle.h"

DEFINE_PER_CPU(struct cpuidle_device *, cpuidle_devices);
+DEFINE_PER_CPU(struct list_head, cpuidle_devices_list);

DEFINE_MUTEX(cpuidle_lock);
LIST_HEAD(cpuidle_detected_devices);
@@ -111,6 +112,45 @@ void cpuidle_resume_and_unlock(void)

EXPORT_SYMBOL_GPL(cpuidle_resume_and_unlock);

+int cpuidle_add_to_list(struct cpuidle_device *dev)
+{
+ int ret, cpu = dev->cpu;
+ struct cpuidle_device *old_dev;
+
+ if (!list_empty(&per_cpu(cpuidle_devices_list, cpu))) {
+ old_dev = list_first_entry(&per_cpu(cpuidle_devices_list, cpu),
+ struct cpuidle_device, idle_list);
+ cpuidle_remove_state_sysfs(old_dev);
+ }
+
+ list_add(&dev->idle_list, &per_cpu(cpuidle_devices_list, cpu));
+ ret = cpuidle_add_state_sysfs(dev);
+ return ret;
+}
+
+void cpuidle_remove_from_list(struct cpuidle_device *dev)
+{
+ struct cpuidle_device *temp_dev;
+ struct list_head *pos;
+ int ret, cpu = dev->cpu;
+
+ list_for_each(pos, &per_cpu(cpuidle_devices_list, cpu)) {
+ temp_dev = container_of(pos, struct cpuidle_device, idle_list);
+ if (dev == temp_dev) {
+ list_del(&temp_dev->idle_list);
+ cpuidle_remove_state_sysfs(temp_dev);
+ break;
+ }
+ }
+
+ if (!list_empty(&per_cpu(cpuidle_devices_list, cpu))) {
+ temp_dev = list_first_entry(&per_cpu(cpuidle_devices_list, cpu),
+ struct cpuidle_device, idle_list);
+ ret = cpuidle_add_state_sysfs(temp_dev);
+ }
+ cpuidle_kick_cpus();
+}
+
/**
* cpuidle_enable_device - enables idle PM for a CPU
* @dev: the CPU
@@ -135,7 +175,7 @@ int cpuidle_enable_device(struct cpuidle
return ret;
}

- if ((ret = cpuidle_add_state_sysfs(dev)))
+ if ((cpuidle_add_to_list(dev)))
return ret;

if (cpuidle_curr_governor->enable &&
@@ -156,7 +196,7 @@ int cpuidle_enable_device(struct cpuidle
return 0;

fail_sysfs:
- cpuidle_remove_state_sysfs(dev);
+ cpuidle_remove_from_list(dev);

return ret;
}
@@ -182,7 +222,7 @@ void cpuidle_disable_device(struct cpuid
if (cpuidle_curr_governor->disable)
cpuidle_curr_governor->disable(dev);

- cpuidle_remove_state_sysfs(dev);
+ cpuidle_remove_from_list(dev);
}

EXPORT_SYMBOL_GPL(cpuidle_disable_device);
@@ -350,12 +390,15 @@ static inline void latency_notifier_init
*/
static int __init cpuidle_init(void)
{
- int ret;
+ int ret, cpu;

ret = cpuidle_add_class_sysfs(&cpu_sysdev_class);
if (ret)
return ret;

+ for_each_possible_cpu(cpu)
+ INIT_LIST_HEAD(&per_cpu(cpuidle_devices_list, cpu));
+
latency_notifier_init(&cpuidle_latency_notifier);

return 0;
Index: linux.trees.git/include/linux/cpuidle.h
===================================================================
--- linux.trees.git.orig/include/linux/cpuidle.h
+++ linux.trees.git/include/linux/cpuidle.h
@@ -93,6 +93,7 @@ struct cpuidle_device {
struct cpuidle_state *last_state;

struct list_head device_list;
+ struct list_head idle_list;
struct kobject kobj;
struct completion kobj_unregister;
void *governor_data;

2009-09-22 11:29:40

by Arun R Bharadwaj

[permalink] [raw]
Subject: [v6 PATCH 3/7]: x86: refactor x86 idle power management code and remove all instances of pm_idle.

* Arun R Bharadwaj <[email protected]> [2009-09-22 16:55:27]:

This patch cleans up x86 of all instances of pm_idle.

pm_idle which was earlier called from cpu_idle() idle loop
is replaced by cpuidle_idle_call.

x86 also registers to cpuidle when the idle routine is selected,
by populating the cpuidle_device data structure for each cpu.

This is replicated for apm module and for xen, which also used pm_idle.


Signed-off-by: Arun R Bharadwaj <[email protected]>
---
arch/x86/kernel/apm_32.c | 37 +++++++++++++++++++++--
arch/x86/kernel/process.c | 69 ++++++++++++++++++++++++++++++++++---------
arch/x86/kernel/process_32.c | 3 +
arch/x86/kernel/process_64.c | 3 +
arch/x86/xen/setup.c | 22 +++++++++++++
5 files changed, 114 insertions(+), 20 deletions(-)

Index: linux.trees.git/arch/x86/kernel/process.c
===================================================================
--- linux.trees.git.orig/arch/x86/kernel/process.c
+++ linux.trees.git/arch/x86/kernel/process.c
@@ -9,6 +9,8 @@
#include <linux/pm.h>
#include <linux/clockchips.h>
#include <linux/random.h>
+#include <linux/cpuidle.h>
+
#include <trace/power.h>
#include <asm/system.h>
#include <asm/apic.h>
@@ -247,12 +249,6 @@ int sys_vfork(struct pt_regs *regs)
unsigned long boot_option_idle_override = 0;
EXPORT_SYMBOL(boot_option_idle_override);

-/*
- * Powermanagement idle function, if any..
- */
-void (*pm_idle)(void);
-EXPORT_SYMBOL(pm_idle);
-
#ifdef CONFIG_X86_32
/*
* This halt magic was a workaround for ancient floppy DMA
@@ -531,15 +527,58 @@ static void c1e_idle(void)
default_idle();
}

+static void (*local_idle)(void);
+DEFINE_PER_CPU(struct cpuidle_device, idle_devices);
+
+struct cpuidle_driver cpuidle_default_driver = {
+ .name = "cpuidle_default",
+};
+
+static int local_idle_loop(struct cpuidle_device *dev, struct cpuidle_state *st)
+{
+ ktime_t t1, t2;
+ s64 diff;
+ int ret;
+
+ t1 = ktime_get();
+ local_idle();
+ t2 = ktime_get();
+
+ diff = ktime_to_us(ktime_sub(t2, t1));
+ if (diff > INT_MAX)
+ diff = INT_MAX;
+ ret = (int) diff;
+
+ return ret;
+}
+static int __cpuinit setup_cpuidle_simple(void)
+{
+ struct cpuidle_device *dev;
+ int cpu;
+
+ if (!cpuidle_curr_driver)
+ cpuidle_register_driver(&cpuidle_default_driver);
+
+ for_each_online_cpu(cpu) {
+ dev = &per_cpu(idle_devices, cpu);
+ dev->cpu = cpu;
+ dev->states[0].enter = local_idle_loop;
+ dev->state_count = 1;
+ cpuidle_register_device(dev);
+ }
+ return 0;
+}
+late_initcall(setup_cpuidle_simple);
+
void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
{
#ifdef CONFIG_SMP
- if (pm_idle == poll_idle && smp_num_siblings > 1) {
+ if (local_idle == poll_idle && smp_num_siblings > 1) {
printk(KERN_WARNING "WARNING: polling idle and HT enabled,"
" performance may degrade.\n");
}
#endif
- if (pm_idle)
+ if (local_idle)
return;

if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
@@ -547,18 +586,20 @@ void __cpuinit select_idle_routine(const
* One CPU supports mwait => All CPUs supports mwait
*/
printk(KERN_INFO "using mwait in idle threads.\n");
- pm_idle = mwait_idle;
+ local_idle = mwait_idle;
} else if (check_c1e_idle(c)) {
printk(KERN_INFO "using C1E aware idle routine\n");
- pm_idle = c1e_idle;
+ local_idle = c1e_idle;
} else
- pm_idle = default_idle;
+ local_idle = default_idle;
+
+ return;
}

void __init init_c1e_mask(void)
{
/* If we're using c1e_idle, we need to allocate c1e_mask. */
- if (pm_idle == c1e_idle) {
+ if (local_idle == c1e_idle) {
alloc_cpumask_var(&c1e_mask, GFP_KERNEL);
cpumask_clear(c1e_mask);
}
@@ -571,7 +612,7 @@ static int __init idle_setup(char *str)

if (!strcmp(str, "poll")) {
printk("using polling idle threads.\n");
- pm_idle = poll_idle;
+ local_idle = poll_idle;
} else if (!strcmp(str, "mwait"))
force_mwait = 1;
else if (!strcmp(str, "halt")) {
@@ -582,7 +623,7 @@ static int __init idle_setup(char *str)
* To continue to load the CPU idle driver, don't touch
* the boot_option_idle_override.
*/
- pm_idle = default_idle;
+ local_idle = default_idle;
idle_halt = 1;
return 0;
} else if (!strcmp(str, "nomwait")) {
Index: linux.trees.git/arch/x86/kernel/process_32.c
===================================================================
--- linux.trees.git.orig/arch/x86/kernel/process_32.c
+++ linux.trees.git/arch/x86/kernel/process_32.c
@@ -40,6 +40,7 @@
#include <linux/uaccess.h>
#include <linux/io.h>
#include <linux/kdebug.h>
+#include <linux/cpuidle.h>

#include <asm/pgtable.h>
#include <asm/system.h>
@@ -113,7 +114,7 @@ void cpu_idle(void)
local_irq_disable();
/* Don't trace irqs off for idle */
stop_critical_timings();
- pm_idle();
+ cpuidle_idle_call();
start_critical_timings();
}
tick_nohz_restart_sched_tick();
Index: linux.trees.git/arch/x86/kernel/process_64.c
===================================================================
--- linux.trees.git.orig/arch/x86/kernel/process_64.c
+++ linux.trees.git/arch/x86/kernel/process_64.c
@@ -39,6 +39,7 @@
#include <linux/io.h>
#include <linux/ftrace.h>
#include <linux/dmi.h>
+#include <linux/cpuidle.h>

#include <asm/pgtable.h>
#include <asm/system.h>
@@ -142,7 +143,7 @@ void cpu_idle(void)
enter_idle();
/* Don't trace irqs off for idle */
stop_critical_timings();
- pm_idle();
+ cpuidle_idle_call();
start_critical_timings();
/* In many cases the interrupt that ended idle
has already called exit_idle. But some idle
Index: linux.trees.git/arch/x86/kernel/apm_32.c
===================================================================
--- linux.trees.git.orig/arch/x86/kernel/apm_32.c
+++ linux.trees.git/arch/x86/kernel/apm_32.c
@@ -2257,6 +2257,38 @@ static struct dmi_system_id __initdata a
{ }
};

+DEFINE_PER_CPU(struct cpuidle_device, apm_idle_devices);
+
+struct cpuidle_driver cpuidle_apm_driver = {
+ .name = "cpuidle_apm",
+};
+
+void __cpuinit setup_cpuidle_apm(void)
+{
+ struct cpuidle_device *dev;
+
+ if (!cpuidle_curr_driver)
+ cpuidle_register_driver(&cpuidle_apm_driver);
+
+ dev = &per_cpu(apm_idle_devices, smp_processor_id());
+ dev->cpu = smp_processor_id();
+ dev->states[0].enter = apm_cpu_idle;
+ dev->state_count = 1;
+ cpuidle_register_device(dev);
+}
+
+void exit_cpuidle_apm(void)
+{
+ struct cpuidle_device *dev;
+ int cpu;
+
+ for_each_online_cpu(cpu) {
+ dev = &per_cpu(apm_idle_devices, cpu);
+ cpuidle_unregister_device(dev);
+ }
+}
+
+
/*
* Just start the APM thread. We do NOT want to do APM BIOS
* calls from anything but the APM thread, if for no other reason
@@ -2394,8 +2426,7 @@ static int __init apm_init(void)
if (HZ != 100)
idle_period = (idle_period * HZ) / 100;
if (idle_threshold < 100) {
- original_pm_idle = pm_idle;
- pm_idle = apm_cpu_idle;
+ setup_cpuidle_apm();
set_pm_idle = 1;
}

@@ -2407,7 +2438,7 @@ static void __exit apm_exit(void)
int error;

if (set_pm_idle) {
- pm_idle = original_pm_idle;
+ exit_cpuidle_apm();
/*
* We are about to unload the current idle thread pm callback
* (pm_idle), Wait for all processors to update cached/local
Index: linux.trees.git/arch/x86/xen/setup.c
===================================================================
--- linux.trees.git.orig/arch/x86/xen/setup.c
+++ linux.trees.git/arch/x86/xen/setup.c
@@ -8,6 +8,7 @@
#include <linux/sched.h>
#include <linux/mm.h>
#include <linux/pm.h>
+#include <linux/cpuidle.h>

#include <asm/elf.h>
#include <asm/vdso.h>
@@ -151,6 +152,25 @@ void __cpuinit xen_enable_syscall(void)
#endif /* CONFIG_X86_64 */
}

+DEFINE_PER_CPU(struct cpuidle_device, idle_devices);
+struct cpuidle_driver cpuidle_xen_driver = {
+ .name = "cpuidle_xen",
+};
+
+void __cpuinit setup_cpuidle_xen(void)
+{
+ struct cpuidle_device *dev;
+
+ if (!cpuidle_curr_driver)
+ cpuidle_register_driver(&cpuidle_xen_driver);
+
+ dev = &per_cpu(idle_devices, smp_processor_id());
+ dev->cpu = smp_processor_id();
+ dev->states[0].enter = xen_idle;
+ dev->state_count = 1;
+ cpuidle_register_device(dev);
+}
+
void __init xen_arch_setup(void)
{
struct physdev_set_iopl set_iopl;
@@ -186,7 +206,7 @@ void __init xen_arch_setup(void)
MAX_GUEST_CMDLINE > COMMAND_LINE_SIZE ?
COMMAND_LINE_SIZE : MAX_GUEST_CMDLINE);

- pm_idle = xen_idle;
+ setup_cpuidle_xen();

paravirt_disable_iospace();

2009-09-22 11:30:43

by Arun R Bharadwaj

[permalink] [raw]
Subject: [v6 PATCH 4/7]: POWER: enable cpuidle for POWER.

* Arun R Bharadwaj <[email protected]> [2009-09-22 16:55:27]:

This patch enables the cpuidle option in Kconfig for pSeries.

Currently cpuidle infrastructure is enabled only for x86 and ARM.
This code is almost completely borrowed from x86 to enable
cpuidle for pSeries.

Signed-off-by: Arun R Bharadwaj <[email protected]>
---
arch/powerpc/Kconfig | 17 +++++++++++++++++
arch/powerpc/include/asm/system.h | 2 ++
arch/powerpc/kernel/idle.c | 19 +++++++++++++++++++
3 files changed, 38 insertions(+)

Index: linux.trees.git/arch/powerpc/Kconfig
===================================================================
--- linux.trees.git.orig/arch/powerpc/Kconfig
+++ linux.trees.git/arch/powerpc/Kconfig
@@ -88,6 +88,9 @@ config ARCH_HAS_ILOG2_U64
bool
default y if 64BIT

+config ARCH_HAS_CPU_IDLE_WAIT
+ def_bool y
+
config GENERIC_HWEIGHT
bool
default y
@@ -243,6 +246,20 @@ source "kernel/Kconfig.freezer"
source "arch/powerpc/sysdev/Kconfig"
source "arch/powerpc/platforms/Kconfig"

+menu "Power management options"
+
+source "drivers/cpuidle/Kconfig"
+
+config PSERIES_PROCESSOR_IDLE
+ bool "Idle Power Management Support for pSeries"
+ depends on PPC_PSERIES && CPU_IDLE
+ default y
+ help
+ Idle Power Management Support for pSeries. This hooks onto cpuidle
+ infrastructure to help in idle cpu power management.
+
+endmenu
+
menu "Kernel options"

config HIGHMEM
Index: linux.trees.git/arch/powerpc/include/asm/system.h
===================================================================
--- linux.trees.git.orig/arch/powerpc/include/asm/system.h
+++ linux.trees.git/arch/powerpc/include/asm/system.h
@@ -546,5 +546,7 @@ extern void account_system_vtime(struct

extern struct dentry *powerpc_debugfs_root;

+void cpu_idle_wait(void);
+
#endif /* __KERNEL__ */
#endif /* _ASM_POWERPC_SYSTEM_H */
Index: linux.trees.git/arch/powerpc/kernel/idle.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/kernel/idle.c
+++ linux.trees.git/arch/powerpc/kernel/idle.c
@@ -102,6 +102,25 @@ void cpu_idle(void)
}
}

+static void do_nothing(void *unused)
+{
+}
+
+/*
+ * cpu_idle_wait - Used to ensure that all the CPUs come out of the old
+ * idle loop and start using the new idle loop.
+ * Required while changing idle handler on SMP systems.
+ * Caller must have changed idle handler to the new value before the call.
+ */
+void cpu_idle_wait(void)
+{
+ /* Ensure that new value of ppc_md.power_save is set */
+ smp_mb();
+ /* kick all the CPUs so that they exit out of ppc_md.power_save */
+ smp_call_function(do_nothing, NULL, 1);
+}
+EXPORT_SYMBOL_GPL(cpu_idle_wait);
+
int powersave_nap;

#ifdef CONFIG_SYSCTL

2009-09-22 11:31:53

by Arun R Bharadwaj

[permalink] [raw]
Subject: [v6 PATCH 5/7]: pSeries/cpuidle: remove dedicate/shared idle loops, which will be moved to arch/powerpc/platforms/pseries/processor_idle.c

* Arun R Bharadwaj <[email protected]> [2009-09-22 16:55:27]:

This patch removes the routines, pseries_shared_idle_sleep and
pseries_dedicated_idle_sleep, since this is implemented as a part
of arch/powerpc/platform/pseries/processor_idle.c

Also, similar to x86, call cpuidle_idle_call from cpu_idle() idle
loop instead of ppc_md.power_save.


Signed-off-by: Arun R Bharadwaj <[email protected]>
---
arch/powerpc/kernel/idle.c | 50 +++++++-----------
arch/powerpc/platforms/pseries/setup.c | 89 ---------------------------------
2 files changed, 22 insertions(+), 117 deletions(-)

Index: linux.trees.git/arch/powerpc/platforms/pseries/setup.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/pseries/setup.c
+++ linux.trees.git/arch/powerpc/platforms/pseries/setup.c
@@ -75,9 +75,6 @@ EXPORT_SYMBOL(CMO_PageSize);

int fwnmi_active; /* TRUE if an FWNMI handler is present */

-static void pseries_shared_idle_sleep(void);
-static void pseries_dedicated_idle_sleep(void);
-
static struct device_node *pSeries_mpic_node;

static void pSeries_show_cpuinfo(struct seq_file *m)
@@ -301,18 +298,8 @@ static void __init pSeries_setup_arch(vo
pSeries_nvram_init();

/* Choose an idle loop */
- if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
+ if (firmware_has_feature(FW_FEATURE_SPLPAR))
vpa_init(boot_cpuid);
- if (get_lppaca()->shared_proc) {
- printk(KERN_DEBUG "Using shared processor idle loop\n");
- ppc_md.power_save = pseries_shared_idle_sleep;
- } else {
- printk(KERN_DEBUG "Using dedicated idle loop\n");
- ppc_md.power_save = pseries_dedicated_idle_sleep;
- }
- } else {
- printk(KERN_DEBUG "Using default idle loop\n");
- }

if (firmware_has_feature(FW_FEATURE_LPAR))
ppc_md.enable_pmcs = pseries_lpar_enable_pmcs;
@@ -500,80 +487,6 @@ static int __init pSeries_probe(void)
return 1;
}

-
-DECLARE_PER_CPU(unsigned long, smt_snooze_delay);
-
-static void pseries_dedicated_idle_sleep(void)
-{
- unsigned int cpu = smp_processor_id();
- unsigned long start_snooze;
- unsigned long in_purr, out_purr;
-
- /*
- * Indicate to the HV that we are idle. Now would be
- * a good time to find other work to dispatch.
- */
- get_lppaca()->idle = 1;
- get_lppaca()->donate_dedicated_cpu = 1;
- in_purr = mfspr(SPRN_PURR);
-
- /*
- * We come in with interrupts disabled, and need_resched()
- * has been checked recently. If we should poll for a little
- * while, do so.
- */
- if (__get_cpu_var(smt_snooze_delay)) {
- start_snooze = get_tb() +
- __get_cpu_var(smt_snooze_delay) * tb_ticks_per_usec;
- local_irq_enable();
- set_thread_flag(TIF_POLLING_NRFLAG);
-
- while (get_tb() < start_snooze) {
- if (need_resched() || cpu_is_offline(cpu))
- goto out;
- ppc64_runlatch_off();
- HMT_low();
- HMT_very_low();
- }
-
- HMT_medium();
- clear_thread_flag(TIF_POLLING_NRFLAG);
- smp_mb();
- local_irq_disable();
- if (need_resched() || cpu_is_offline(cpu))
- goto out;
- }
-
- cede_processor();
-
-out:
- HMT_medium();
- out_purr = mfspr(SPRN_PURR);
- get_lppaca()->wait_state_cycles += out_purr - in_purr;
- get_lppaca()->donate_dedicated_cpu = 0;
- get_lppaca()->idle = 0;
-}
-
-static void pseries_shared_idle_sleep(void)
-{
- /*
- * Indicate to the HV that we are idle. Now would be
- * a good time to find other work to dispatch.
- */
- get_lppaca()->idle = 1;
-
- /*
- * Yield the processor to the hypervisor. We return if
- * an external interrupt occurs (which are driven prior
- * to returning here) or if a prod occurs from another
- * processor. When returning here, external interrupts
- * are enabled.
- */
- cede_processor();
-
- get_lppaca()->idle = 0;
-}
-
static int pSeries_pci_probe_mode(struct pci_bus *bus)
{
if (firmware_has_feature(FW_FEATURE_LPAR))
Index: linux.trees.git/arch/powerpc/kernel/idle.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/kernel/idle.c
+++ linux.trees.git/arch/powerpc/kernel/idle.c
@@ -25,6 +25,7 @@
#include <linux/cpu.h>
#include <linux/sysctl.h>
#include <linux/tick.h>
+#include <linux/cpuidle.h>

#include <asm/system.h>
#include <asm/processor.h>
@@ -60,35 +61,26 @@ void cpu_idle(void)
while (!need_resched() && !cpu_should_die()) {
ppc64_runlatch_off();

- if (ppc_md.power_save) {
- clear_thread_flag(TIF_POLLING_NRFLAG);
- /*
- * smp_mb is so clearing of TIF_POLLING_NRFLAG
- * is ordered w.r.t. need_resched() test.
- */
- smp_mb();
- local_irq_disable();
-
- /* Don't trace irqs off for idle */
- stop_critical_timings();
-
- /* check again after disabling irqs */
- if (!need_resched() && !cpu_should_die())
- ppc_md.power_save();
-
- start_critical_timings();
-
- local_irq_enable();
- set_thread_flag(TIF_POLLING_NRFLAG);
-
- } else {
- /*
- * Go into low thread priority and possibly
- * low power mode.
- */
- HMT_low();
- HMT_very_low();
- }
+ clear_thread_flag(TIF_POLLING_NRFLAG);
+ /*
+ * smp_mb is so clearing of TIF_POLLING_NRFLAG
+ * is ordered w.r.t. need_resched() test.
+ */
+ smp_mb();
+ local_irq_disable();
+
+ /* Don't trace irqs off for idle */
+ stop_critical_timings();
+
+ /* check again after disabling irqs */
+ if (!need_resched() && !cpu_should_die())
+ cpuidle_idle_call();
+
+ start_critical_timings();
+
+ local_irq_enable();
+ set_thread_flag(TIF_POLLING_NRFLAG);
+
}

HMT_medium();

2009-09-22 11:32:56

by Arun R Bharadwaj

[permalink] [raw]
Subject: [v6 PATCH 6/7]: POWER: add a default_idle idle loop for POWER.

* Arun R Bharadwaj <[email protected]> [2009-09-22 16:55:27]:

In arch/powerpc/kernel/idle.c create a default_idle() routine by moving
the failover condition of the cpu_idle() idle loop. This is needed by
cpuidle infrastructure to call default_idle when other idle routines
are not yet registered. Functionality remains the same, but the code is
slightly moved around.


Signed-off-by: Arun R Bharadwaj <[email protected]>
---
arch/powerpc/Kconfig | 3 +++
arch/powerpc/include/asm/system.h | 1 +
arch/powerpc/kernel/idle.c | 6 ++++++
3 files changed, 10 insertions(+)

Index: linux.trees.git/arch/powerpc/Kconfig
===================================================================
--- linux.trees.git.orig/arch/powerpc/Kconfig
+++ linux.trees.git/arch/powerpc/Kconfig
@@ -91,6 +91,9 @@ config ARCH_HAS_ILOG2_U64
config ARCH_HAS_CPU_IDLE_WAIT
def_bool y

+config ARCH_HAS_DEFAULT_IDLE
+ def_bool y
+
config GENERIC_HWEIGHT
bool
default y
Index: linux.trees.git/arch/powerpc/include/asm/system.h
===================================================================
--- linux.trees.git.orig/arch/powerpc/include/asm/system.h
+++ linux.trees.git/arch/powerpc/include/asm/system.h
@@ -218,6 +218,7 @@ extern unsigned long klimit;
extern void *alloc_maybe_bootmem(size_t size, gfp_t mask);
extern void *zalloc_maybe_bootmem(size_t size, gfp_t mask);

+extern void default_idle(void);
extern int powersave_nap; /* set if nap mode can be used in idle loop */

/*
Index: linux.trees.git/arch/powerpc/kernel/idle.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/kernel/idle.c
+++ linux.trees.git/arch/powerpc/kernel/idle.c
@@ -113,6 +113,12 @@ void cpu_idle_wait(void)
}
EXPORT_SYMBOL_GPL(cpu_idle_wait);

+void default_idle(void)
+{
+ HMT_low();
+ HMT_very_low();
+}
+
int powersave_nap;

#ifdef CONFIG_SYSCTL

2009-09-22 11:33:47

by Arun R Bharadwaj

[permalink] [raw]
Subject: [v6 PATCH 7/7]: pSeries: implement pSeries processor idle module.

* Arun R Bharadwaj <[email protected]> [2009-09-22 16:55:27]:

This patch creates arch/powerpc/platforms/pseries/processor_idle.c,
which implements the cpuidle infrastructure for pseries.
It implements a pseries_cpuidle_loop() which would be the main idle loop
called from cpu_idle(). It makes decision of entering either cede1 or cede2
for dedicated lpar and shared_cede for shared lpar processor based on the
decision taken by the cpuidle governor.

Signed-off-by: Arun R Bharadwaj <[email protected]>
---
arch/powerpc/platforms/pseries/Makefile | 1
arch/powerpc/platforms/pseries/processor_idle.c | 191 ++++++++++++++++++++++++
arch/powerpc/platforms/pseries/pseries.h | 9 +
arch/powerpc/platforms/pseries/setup.c | 8 -
4 files changed, 207 insertions(+), 2 deletions(-)

Index: linux.trees.git/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/pseries/Makefile
+++ linux.trees.git/arch/powerpc/platforms/pseries/Makefile
@@ -26,3 +26,4 @@ obj-$(CONFIG_HCALL_STATS) += hvCall_inst
obj-$(CONFIG_PHYP_DUMP) += phyp_dump.o
obj-$(CONFIG_CMM) += cmm.o
obj-$(CONFIG_DTL) += dtl.o
+obj-$(CONFIG_PSERIES_PROCESSOR_IDLE) += processor_idle.o
Index: linux.trees.git/arch/powerpc/platforms/pseries/pseries.h
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/pseries/pseries.h
+++ linux.trees.git/arch/powerpc/platforms/pseries/pseries.h
@@ -10,6 +10,8 @@
#ifndef _PSERIES_PSERIES_H
#define _PSERIES_PSERIES_H

+#include <linux/cpuidle.h>
+
extern void __init fw_feature_init(const char *hypertas, unsigned long len);

struct pt_regs;
@@ -40,4 +42,11 @@ extern unsigned long rtas_poweron_auto;

extern void find_udbg_vterm(void);

+DECLARE_PER_CPU(unsigned long, smt_snooze_delay);
+
+#ifdef CONFIG_PSERIES_PROCESSOR_IDLE
+int pseries_processor_idle_init(void);
+extern struct cpuidle_driver pseries_idle_driver;
+#endif
+
#endif /* _PSERIES_PSERIES_H */
Index: linux.trees.git/arch/powerpc/platforms/pseries/processor_idle.c
===================================================================
--- /dev/null
+++ linux.trees.git/arch/powerpc/platforms/pseries/processor_idle.c
@@ -0,0 +1,191 @@
+/*
+ * processor_idle - idle state cpuidle driver.
+ * Adapted from drivers/acpi/processor_idle.c
+ *
+ * Arun R Bharadwaj <[email protected]>
+ *
+ * Copyright (C) 2009 IBM Corporation.
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or (at
+ * your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/moduleparam.h>
+#include <linux/cpuidle.h>
+
+#include <asm/paca.h>
+#include <asm/reg.h>
+#include <asm/system.h>
+#include <asm/machdep.h>
+#include <asm/firmware.h>
+
+#include "plpar_wrappers.h"
+#include "pseries.h"
+
+MODULE_AUTHOR("Arun R Bharadwaj");
+MODULE_DESCRIPTION("pSeries Idle State Driver");
+MODULE_LICENSE("GPL");
+
+struct cpuidle_driver pseries_idle_driver = {
+ .name = "pseries_idle",
+ .owner = THIS_MODULE,
+};
+
+DEFINE_PER_CPU(struct cpuidle_device, pseries_dev);
+
+#define IDLE_STATE_COUNT 2
+
+static int pseries_idle_init(struct cpuidle_device *dev)
+{
+ return cpuidle_register_device(dev);
+}
+
+static void shared_cede(void)
+{
+ get_lppaca()->idle = 1;
+ cede_processor();
+ get_lppaca()->idle = 0;
+}
+
+static void cede1(void)
+{
+ local_irq_enable();
+ set_thread_flag(TIF_POLLING_NRFLAG);
+ while (!need_resched()) {
+ ppc64_runlatch_off();
+ HMT_low();
+ HMT_very_low();
+ }
+ HMT_medium();
+ clear_thread_flag(TIF_POLLING_NRFLAG);
+ smp_mb();
+ local_irq_disable();
+}
+
+static void cede2(void)
+{
+ ppc64_runlatch_off();
+ HMT_medium();
+ cede_processor();
+}
+
+static int pseries_cpuidle_loop(struct cpuidle_device *dev,
+ struct cpuidle_state *st)
+{
+ ktime_t t1, t2;
+ s64 diff;
+ int ret;
+ unsigned long in_purr, out_purr;
+
+ get_lppaca()->idle = 1;
+ get_lppaca()->donate_dedicated_cpu = 1;
+ in_purr = mfspr(SPRN_PURR);
+
+ t1 = ktime_get();
+
+ if (strcmp(st->desc, "shared_cede") == 0)
+ shared_cede();
+ else if (strcmp(st->desc, "cede1") == 0)
+ cede1();
+ else
+ cede2();
+
+ t2 = ktime_get();
+ diff = ktime_to_us(ktime_sub(t2, t1));
+ if (diff > INT_MAX)
+ diff = INT_MAX;
+
+ ret = (int) diff;
+
+ out_purr = mfspr(SPRN_PURR);
+ get_lppaca()->wait_state_cycles += out_purr - in_purr;
+ get_lppaca()->donate_dedicated_cpu = 0;
+ get_lppaca()->idle = 0;
+
+ return ret;
+}
+
+static int pseries_setup_cpuidle(struct cpuidle_device *dev, int cpu)
+{
+ int i;
+ struct cpuidle_state *state;
+
+ dev->cpu = cpu;
+
+ if (get_lppaca()->shared_proc) {
+ state = &dev->states[0];
+ snprintf(state->name, CPUIDLE_NAME_LEN, "IDLE");
+ state->enter = pseries_cpuidle_loop;
+ strncpy(state->desc, "shared_cede", CPUIDLE_DESC_LEN);
+ state->exit_latency = 0;
+ state->target_residency = 0;
+ return 0;
+ }
+
+ for (i = 0; i < IDLE_STATE_COUNT; i++) {
+ state = &dev->states[i];
+
+ snprintf(state->name, CPUIDLE_NAME_LEN, "IDLE%d", i);
+ state->enter = pseries_cpuidle_loop;
+
+ switch (i) {
+ case 0:
+ strncpy(state->desc, "cede1", CPUIDLE_DESC_LEN);
+ state->exit_latency = 0;
+ state->target_residency = 0;
+ break;
+
+ case 1:
+ strncpy(state->desc, "cede2", CPUIDLE_DESC_LEN);
+ state->exit_latency = 1;
+ state->target_residency =
+ __get_cpu_var(smt_snooze_delay);
+ break;
+ }
+ }
+ dev->state_count = IDLE_STATE_COUNT;
+
+ return 0;
+}
+
+int __init pseries_processor_idle_init(void)
+{
+ int cpu;
+ int result = cpuidle_register_driver(&pseries_idle_driver);
+
+ if (result < 0)
+ return result;
+
+ printk(KERN_DEBUG "pSeries idle driver registered\n");
+
+ if (!firmware_has_feature(FW_FEATURE_SPLPAR)) {
+ printk(KERN_DEBUG "Using default idle\n");
+ return 0;
+ }
+
+ for_each_online_cpu(cpu) {
+ pseries_setup_cpuidle(&per_cpu(pseries_dev, cpu), cpu);
+ pseries_idle_init(&per_cpu(pseries_dev, cpu));
+ }
+
+ printk(KERN_DEBUG "Using cpuidle idle loop\n");
+
+ return 0;
+}
Index: linux.trees.git/arch/powerpc/platforms/pseries/setup.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/pseries/setup.c
+++ linux.trees.git/arch/powerpc/platforms/pseries/setup.c
@@ -297,9 +297,13 @@ static void __init pSeries_setup_arch(vo

pSeries_nvram_init();

- /* Choose an idle loop */
- if (firmware_has_feature(FW_FEATURE_SPLPAR))
+ /* Register an idle loop with cpuidle */
+ if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
vpa_init(boot_cpuid);
+#ifdef CONFIG_PSERIES_PROCESSOR_IDLE
+ pseries_processor_idle_init();
+#endif
+ }

if (firmware_has_feature(FW_FEATURE_LPAR))
ppc_md.enable_pmcs = pseries_lpar_enable_pmcs;

2009-09-24 05:13:06

by Arun R Bharadwaj

[permalink] [raw]
Subject: Re: [v6 PATCH 0/7]: cpuidle/x86/POWER: Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER.

* Arun R Bharadwaj <[email protected]> [2009-09-22 16:55:27]:

Hi Len, (or other acpi folks),

I had a question regarding ACPI-cpuidle interaction in the current
implementation.

Currently, every cpu (i.e. acpi_processor) registers to cpuidle as
a cpuidle_device. So every cpu has to go through the process of
setting up the idle states and then registering as a cpuidle device.

What exactly is the reason behind this?

Is this really necessary or can we have a system-wide one-time registering
to cpuidle by ACPI?

I'm currently in the process of enabling cpuidle for POWER systems and
find that having a system-wide registering mechanism to be a cleaner
design.

--arun

2009-09-24 12:25:50

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [v6 PATCH 0/7]: cpuidle/x86/POWER: Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER.

On Thu, 24 Sep 2009 10:42:41 +0530
Arun R Bharadwaj <[email protected]> wrote:

> * Arun R Bharadwaj <[email protected]> [2009-09-22 16:55:27]:
>
> Hi Len, (or other acpi folks),
>
> I had a question regarding ACPI-cpuidle interaction in the current
> implementation.
>
> Currently, every cpu (i.e. acpi_processor) registers to cpuidle as
> a cpuidle_device. So every cpu has to go through the process of
> setting up the idle states and then registering as a cpuidle device.
>
> What exactly is the reason behind this?
>

technically a BIOS can opt to give you C states via ACPI on some cpus,
but not on others.

in practice when this happens it tends to be a bug.. but it's
technically a valid configuration



--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-09-25 07:06:48

by Vaidyanathan Srinivasan

[permalink] [raw]
Subject: Re: [v6 PATCH 0/7]: cpuidle/x86/POWER: Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER.

* Arjan van de Ven <[email protected]> [2009-09-24 14:22:28]:

> On Thu, 24 Sep 2009 10:42:41 +0530
> Arun R Bharadwaj <[email protected]> wrote:
>
> > * Arun R Bharadwaj <[email protected]> [2009-09-22 16:55:27]:
> >
> > Hi Len, (or other acpi folks),
> >
> > I had a question regarding ACPI-cpuidle interaction in the current
> > implementation.
> >
> > Currently, every cpu (i.e. acpi_processor) registers to cpuidle as
> > a cpuidle_device. So every cpu has to go through the process of
> > setting up the idle states and then registering as a cpuidle device.
> >
> > What exactly is the reason behind this?
> >
>
> technically a BIOS can opt to give you C states via ACPI on some cpus,
> but not on others.
>
> in practice when this happens it tends to be a bug.. but it's
> technically a valid configuration

So we will need to keep the per-cpu registration as of now because we
may have such buggy BIOS in the field and we don't want the cpuidle
framework to malfunction there.

--Vaidy

2009-09-25 07:20:43

by Balbir Singh

[permalink] [raw]
Subject: Re: [v6 PATCH 0/7]: cpuidle/x86/POWER: Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER.

On Thu, Sep 24, 2009 at 5:52 PM, Arjan van de Ven <[email protected]> wrote:
> On Thu, 24 Sep 2009 10:42:41 +0530
> Arun R Bharadwaj <[email protected]> wrote:
>
>> * Arun R Bharadwaj <[email protected]> [2009-09-22 16:55:27]:
>>
>> Hi Len, (or other acpi folks),
>>
>> I had a question regarding ACPI-cpuidle interaction in the current
>> implementation.
>>
>> Currently, every cpu (i.e. acpi_processor) registers to cpuidle as
>> a cpuidle_device. So every cpu has to go through the process of
>> setting up the idle states and then registering as a cpuidle device.
>>
>> What exactly is the reason behind this?
>>
>
> technically a BIOS can opt to give you C states via ACPI on some cpus,
> but not on others.
>
> in practice when this happens it tends to be a bug.. but it's
> technically a valid configuration

In this day and age of flashable BIOS with recovery BIOS built in,
can't we just print out a big far warning, asking users of such
systems to go back to their vendors and ask for updates or find the
updates and apply them? Does the OS have to do the heavy lifting and
allow users to live with buggy BIOS's.

When you say it is a technically valid configuration, you mean that
the ACPI spec allows for such inconsistency?

Balbir Singh

2009-09-25 08:53:59

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [v6 PATCH 0/7]: cpuidle/x86/POWER: Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER.

On Fri, 2009-09-25 at 12:36 +0530, Vaidyanathan Srinivasan wrote:
> * Arjan van de Ven <[email protected]> [2009-09-24 14:22:28]:
>
> > On Thu, 24 Sep 2009 10:42:41 +0530
> > Arun R Bharadwaj <[email protected]> wrote:
> >
> > > * Arun R Bharadwaj <[email protected]> [2009-09-22 16:55:27]:
> > >
> > > Hi Len, (or other acpi folks),
> > >
> > > I had a question regarding ACPI-cpuidle interaction in the current
> > > implementation.
> > >
> > > Currently, every cpu (i.e. acpi_processor) registers to cpuidle as
> > > a cpuidle_device. So every cpu has to go through the process of
> > > setting up the idle states and then registering as a cpuidle device.
> > >
> > > What exactly is the reason behind this?
> > >
> >
> > technically a BIOS can opt to give you C states via ACPI on some cpus,
> > but not on others.
> >
> > in practice when this happens it tends to be a bug.. but it's
> > technically a valid configuration
>
> So we will need to keep the per-cpu registration as of now because we
> may have such buggy BIOS in the field and we don't want the cpuidle
> framework to malfunction there.

If the BIOS doesn't mention a certain C state on a cpu, and you try to
set it anyway, does that go boom?

This whole per-cpu registration thing is horridly ugly, can't you have a
per-cpu C state exception mask and leave it at that -- if its really
needed?


2009-09-25 09:36:21

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [v6 PATCH 0/7]: cpuidle/x86/POWER: Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER.

On Fri, 25 Sep 2009 10:54:24 +0200
Peter Zijlstra <[email protected]> wrote:

> On Fri, 2009-09-25 at 12:36 +0530, Vaidyanathan Srinivasan wrote:
> > * Arjan van de Ven <[email protected]> [2009-09-24 14:22:28]:
> >
> > > On Thu, 24 Sep 2009 10:42:41 +0530
> > > Arun R Bharadwaj <[email protected]> wrote:
> > >
> > > > * Arun R Bharadwaj <[email protected]> [2009-09-22
> > > > 16:55:27]:
> > > >
> > > > Hi Len, (or other acpi folks),
> > > >
> > > > I had a question regarding ACPI-cpuidle interaction in the
> > > > current implementation.
> > > >
> > > > Currently, every cpu (i.e. acpi_processor) registers to cpuidle
> > > > as a cpuidle_device. So every cpu has to go through the process
> > > > of setting up the idle states and then registering as a cpuidle
> > > > device.
> > > >
> > > > What exactly is the reason behind this?
> > > >
> > >
> > > technically a BIOS can opt to give you C states via ACPI on some
> > > cpus, but not on others.
> > >
> > > in practice when this happens it tends to be a bug.. but it's
> > > technically a valid configuration
> >
> > So we will need to keep the per-cpu registration as of now because
> > we may have such buggy BIOS in the field and we don't want the
> > cpuidle framework to malfunction there.
>
> If the BIOS doesn't mention a certain C state on a cpu, and you try to
> set it anyway, does that go boom?
>
> This whole per-cpu registration thing is horridly ugly, can't you
> have a per-cpu C state exception mask and leave it at that -- if its
> really needed?

the real solution is to make the acpi code always know about C1, even
if the bios doesn't.... That's one for Len :)

(C1 is just "hlt", what we do in the other idle loop ;-)


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-09-25 17:09:05

by Arun R Bharadwaj

[permalink] [raw]
Subject: Re: [v6 PATCH 0/7]: cpuidle/x86/POWER: Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER.

* Arun R Bharadwaj <[email protected]> [2009-09-22 16:55:27]:
Hi,

I have done the following experiments and have posted the results
below.


Average of 5 iterations:
------------------------------------------------------------------------------
------------------------------------------------------------------------------

Kernbench make -j16 results on a
16 core x86 machine _with_deep_sleep_ support (C1,C2,C3)


Without the patches applied With the patches applied


31.8s 30.4s

------------------------------------------------------------------------------
------------------------------------------------------------------------------


Kernbench make -j8 results on a
8 core x86 machine _without_deep_sleep_ support (only mwait)


Without the patches applied With the patches applied


20.2s 20.4s

------------------------------------------------------------------------------
------------------------------------------------------------------------------

Kernbench make -j8 results on a 8 core _dedicated_lpar_pSeries_ machine


Without the patches applied With the patches applied


4m, 37s 4m, 36s

------------------------------------------------------------------------------
------------------------------------------------------------------------------

Please let me know if any other kind of testing is necessary.


Based on the feedback, I will post out the next iteration with a few
minor bug fixes.


thanks,
arun