Hi,
******** This is an RFC, not for inclusion **********
This patchset introduces cpuidle infrastructure to POWER, prototyping
for pseries and currently in the process of porting to x86 and hence
will *not* build on x86/other POWER platforms.
This is to get initial comments on the redesign of my earlier implementation
which can be found at http://lkml.org/lkml/2009/8/27/124
Major changes from last iteration:
----------------------------------
* Cleanup drivers/cpuidle/cpuidle.c
Currently, the cpuidle implementation has weakness in the
framework where an exported pm_idle function pointer is
manipulated by various subsystem. The proposed framework has
a registration architecture to cleanly add and remove new idle
routines from different subsystems.
* Introduce [un]register_idle_function() routines
Implement a LIFO based approach for registering architecture
dependent idle routines.
* Sample implementation of register_idle_function for pSeries
TODO:
-----
* Extend this prototype to cover x86 and other archs that use cpuidle.
Currently, in x86, the cpu_idle() idle loop doesn't have a
default idle loop to fall back to if pm_idle is NULL, unlike
the corresponding implementation in pseries, where
ppc_md.power_save can be NULL and there is a fallback.
So we need to create a similar fork in cpu_idle() idle loop of
x86.
Patches included in this series:
--------------------------------
1/5 - Cleanup drivers/cpuidle/cpuidle.c
2/5 - Implement routines to register and unregister idle function.
3/5 - Incorporate registering of idle loop for pSeries.
4/5 - Add Kconfig entry to enable cpuidle for POWER.
5/5 - Implement pSeries processor idle module.
Any comments on the design is welcome.
--arun
* Arun R Bharadwaj <[email protected]> [2009-09-01 17:07:04]:
Cleanup drivers/cpuidle/cpuidle.c
Cpuidle maintains a pm_idle_old void pointer because, currently in x86
there is no clean way of registering and unregistering a idle function.
So remove pm_idle_old and leave the responsibility of maintaining the
list of registered idle loops to the architecture specific code. If the
architecture registers cpuidle_idle_call as its idle loop, only then
this loop is called.
Also remove unwanted functions cpuidle_[un]install_idle_handler,
cpuidle_kick_cpus()
Signed-off-by: Arun R Bharadwaj <[email protected]>
---
drivers/cpuidle/cpuidle.c | 51 +++++++++++++++------------------------------
drivers/cpuidle/governor.c | 3 --
2 files changed, 17 insertions(+), 37 deletions(-)
Index: linux.trees.git/drivers/cpuidle/cpuidle.c
===================================================================
--- linux.trees.git.orig/drivers/cpuidle/cpuidle.c
+++ linux.trees.git/drivers/cpuidle/cpuidle.c
@@ -24,9 +24,14 @@ DEFINE_PER_CPU(struct cpuidle_device *,
DEFINE_MUTEX(cpuidle_lock);
LIST_HEAD(cpuidle_detected_devices);
-static void (*pm_idle_old)(void);
static int enabled_devices;
+static int idle_function_registered;
+
+struct idle_function_desc cpuidle_idle_desc = {
+ .name = "cpuidle_loop",
+ .idle_func = cpuidle_idle_call,
+};
#if defined(CONFIG_ARCH_HAS_CPU_IDLE_WAIT)
static void cpuidle_kick_cpus(void)
@@ -54,13 +59,10 @@ static void cpuidle_idle_call(void)
/* check if the device is ready */
if (!dev || !dev->enabled) {
- if (pm_idle_old)
- pm_idle_old();
- else
#if defined(CONFIG_ARCH_HAS_DEFAULT_IDLE)
- default_idle();
+ default_idle();
#else
- local_irq_enable();
+ local_irq_enable();
#endif
return;
}
@@ -94,35 +96,11 @@ static void cpuidle_idle_call(void)
}
/**
- * cpuidle_install_idle_handler - installs the cpuidle idle loop handler
- */
-void cpuidle_install_idle_handler(void)
-{
- if (enabled_devices && (pm_idle != cpuidle_idle_call)) {
- /* Make sure all changes finished before we switch to new idle */
- smp_wmb();
- pm_idle = cpuidle_idle_call;
- }
-}
-
-/**
- * cpuidle_uninstall_idle_handler - uninstalls the cpuidle idle loop handler
- */
-void cpuidle_uninstall_idle_handler(void)
-{
- if (enabled_devices && pm_idle_old && (pm_idle != pm_idle_old)) {
- pm_idle = pm_idle_old;
- cpuidle_kick_cpus();
- }
-}
-
-/**
* cpuidle_pause_and_lock - temporarily disables CPUIDLE
*/
void cpuidle_pause_and_lock(void)
{
mutex_lock(&cpuidle_lock);
- cpuidle_uninstall_idle_handler();
}
EXPORT_SYMBOL_GPL(cpuidle_pause_and_lock);
@@ -132,7 +110,6 @@ EXPORT_SYMBOL_GPL(cpuidle_pause_and_lock
*/
void cpuidle_resume_and_unlock(void)
{
- cpuidle_install_idle_handler();
mutex_unlock(&cpuidle_lock);
}
@@ -287,6 +264,12 @@ static int __cpuidle_register_device(str
return 0;
}
+static void register_cpuidle_idle_function(void)
+{
+ register_idle_function(&cpuidle_idle_desc);
+
+ idle_function_registered = 1;
+}
/**
* cpuidle_register_device - registers a CPU's idle PM feature
* @dev: the cpu
@@ -303,7 +286,9 @@ int cpuidle_register_device(struct cpuid
}
cpuidle_enable_device(dev);
- cpuidle_install_idle_handler();
+
+ if (!idle_function_registered)
+ register_cpuidle_idle_function();
mutex_unlock(&cpuidle_lock);
@@ -382,8 +367,6 @@ static int __init cpuidle_init(void)
{
int ret;
- pm_idle_old = pm_idle;
-
ret = cpuidle_add_class_sysfs(&cpu_sysdev_class);
if (ret)
return ret;
Index: linux.trees.git/drivers/cpuidle/governor.c
===================================================================
--- linux.trees.git.orig/drivers/cpuidle/governor.c
+++ linux.trees.git/drivers/cpuidle/governor.c
@@ -48,8 +48,6 @@ int cpuidle_switch_governor(struct cpuid
if (gov == cpuidle_curr_governor)
return 0;
- cpuidle_uninstall_idle_handler();
-
if (cpuidle_curr_governor) {
list_for_each_entry(dev, &cpuidle_detected_devices, device_list)
cpuidle_disable_device(dev);
@@ -63,7 +61,6 @@ int cpuidle_switch_governor(struct cpuid
return -EINVAL;
list_for_each_entry(dev, &cpuidle_detected_devices, device_list)
cpuidle_enable_device(dev);
- cpuidle_install_idle_handler();
printk(KERN_INFO "cpuidle: using governor %s\n", gov->name);
}
* Arun R Bharadwaj <[email protected]> [2009-09-01 17:07:04]:
Implement a LIFO based approach for registering arch dependent
idle routines.
This is a prototype for pseries, needs to be extended
for other platforms.
Signed-off-by: Arun R Bharadwaj <[email protected]>
---
arch/powerpc/kernel/idle.c | 5 +++++
drivers/cpuidle/cpuidle.c | 37 +++++++++++++++++++++++++++++++++++++
include/linux/pm.h | 10 ++++++++++
3 files changed, 52 insertions(+)
Index: linux.trees.git/arch/powerpc/kernel/idle.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/kernel/idle.c
+++ linux.trees.git/arch/powerpc/kernel/idle.c
@@ -46,6 +46,11 @@ static int __init powersave_off(char *ar
}
__setup("powersave=off", powersave_off);
+void set_arch_idle(void (*idle)(void))
+{
+ ppc_md.power_save = idle;
+}
+
/*
* The body of the idle task.
*/
Index: linux.trees.git/include/linux/pm.h
===================================================================
--- linux.trees.git.orig/include/linux/pm.h
+++ linux.trees.git/include/linux/pm.h
@@ -30,6 +30,16 @@ extern void (*pm_idle)(void);
extern void (*pm_power_off)(void);
extern void (*pm_power_off_prepare)(void);
+struct idle_function_desc {
+ char *name;
+ void (*idle_func)(void);
+ struct list_head idle_list;
+};
+
+extern void set_arch_idle(void (*idle)(void));
+extern void register_idle_function(struct idle_function_desc *desc);
+extern void unregister_idle_function(struct idle_function_desc *desc);
+
/*
* Device power management
*/
Index: linux.trees.git/drivers/cpuidle/cpuidle.c
===================================================================
--- linux.trees.git.orig/drivers/cpuidle/cpuidle.c
+++ linux.trees.git/drivers/cpuidle/cpuidle.c
@@ -44,6 +44,43 @@ static void cpuidle_kick_cpus(void)
static void cpuidle_kick_cpus(void) {}
#endif
+LIST_HEAD(idle_function_list);
+static DEFINE_MUTEX(idle_list_mutex);
+
+void register_idle_function(struct idle_function_desc *desc)
+{
+ mutex_lock(&idle_list_mutex);
+
+ list_add(&desc->idle_list, &idle_function_list);
+ set_arch_idle(desc->idle_func);
+ cpuidle_kick_cpus();
+
+ mutex_unlock(&idle_list_mutex);
+}
+
+void unregister_idle_function(struct idle_function_desc *desc)
+{
+ struct list_head *pos;
+ struct idle_function_desc *temp_desc;
+
+ mutex_lock(&idle_list_mutex);
+ WARN_ON_ONCE(list_empty(&desc->idle_list) || desc != NULL);
+
+ list_for_each(pos, &idle_function_list) {
+ temp_desc = container_of(pos, struct idle_function_desc,
+ idle_list);
+ if (temp_desc == desc) {
+ list_del(&temp_desc->idle_list);
+ /* Re-using temp_desc here */
+ temp_desc = list_first_entry(&idle_function_list,
+ struct idle_function_desc, idle_list);
+ set_arch_idle(temp_desc->idle_func);
+ cpuidle_kick_cpus();
+ }
+ }
+ mutex_unlock(&idle_list_mutex);
+}
+
static int __cpuidle_register_device(struct cpuidle_device *dev);
/**
* Arun R Bharadwaj <[email protected]> [2009-09-01 17:07:04]:
Platform needs to register its idle function via register_idle_function()
in order to provide a clean way of handling the ppc_md.power_save
Signed-off-by: Arun R Bharadwaj <[email protected]>
---
arch/powerpc/platforms/pseries/setup.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
Index: linux.trees.git/arch/powerpc/platforms/pseries/setup.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/pseries/setup.c
+++ linux.trees.git/arch/powerpc/platforms/pseries/setup.c
@@ -280,6 +280,8 @@ static struct notifier_block pci_dn_reco
static void __init pSeries_setup_arch(void)
{
+ struct idle_function_desc pseries_idle_desc;
+
/* Discover PIC type and setup ppc_md accordingly */
pseries_discover_pic();
@@ -305,10 +307,17 @@ static void __init pSeries_setup_arch(vo
vpa_init(boot_cpuid);
if (get_lppaca()->shared_proc) {
printk(KERN_DEBUG "Using shared processor idle loop\n");
- ppc_md.power_save = pseries_shared_idle_sleep;
+ //snprintf(pseries_idle_desc.name, 16, "shared_loop");
+ pseries_idle_desc.name = "shared_loop";
+ pseries_idle_desc.idle_func = pseries_shared_idle_sleep;
+ register_idle_function(&pseries_idle_desc);
} else {
printk(KERN_DEBUG "Using dedicated idle loop\n");
- ppc_md.power_save = pseries_dedicated_idle_sleep;
+ //snprintf(pseries_idle_desc.name, 16, "dedicated_loop");
+ pseries_idle_desc.name = "dedicated_loop";
+ pseries_idle_desc.idle_func =
+ pseries_dedicated_idle_sleep;
+ register_idle_function(&pseries_idle_desc);
}
} else {
printk(KERN_DEBUG "Using default idle loop\n");
* Arun R Bharadwaj <[email protected]> [2009-09-01 17:07:04]:
This patch enables the cpuidle option in Kconfig for pSeries.
Currently cpuidle infrastructure is enabled only for x86 and ARM.
This code is almost completely borrowed from x86 to enable
cpuidle for pSeries.
Signed-off-by: Arun R Bharadwaj <[email protected]>
---
arch/powerpc/Kconfig | 17 +++++++++++++++++
arch/powerpc/include/asm/system.h | 2 ++
arch/powerpc/kernel/idle.c | 19 +++++++++++++++++++
3 files changed, 38 insertions(+)
Index: linux.trees.git/arch/powerpc/Kconfig
===================================================================
--- linux.trees.git.orig/arch/powerpc/Kconfig
+++ linux.trees.git/arch/powerpc/Kconfig
@@ -88,6 +88,9 @@ config ARCH_HAS_ILOG2_U64
bool
default y if 64BIT
+config ARCH_HAS_CPU_IDLE_WAIT
+ def_bool y
+
config GENERIC_HWEIGHT
bool
default y
@@ -243,6 +246,20 @@ source "kernel/Kconfig.freezer"
source "arch/powerpc/sysdev/Kconfig"
source "arch/powerpc/platforms/Kconfig"
+menu "Power management options"
+
+source "drivers/cpuidle/Kconfig"
+
+config PSERIES_PROCESSOR_IDLE
+ bool "Idle Power Management Support for pSeries"
+ depends on PPC_PSERIES && CPU_IDLE
+ default y
+ help
+ Idle Power Management Support for pSeries. This hooks onto cpuidle
+ infrastructure to help in idle cpu power management.
+
+endmenu
+
menu "Kernel options"
config HIGHMEM
Index: linux.trees.git/arch/powerpc/include/asm/system.h
===================================================================
--- linux.trees.git.orig/arch/powerpc/include/asm/system.h
+++ linux.trees.git/arch/powerpc/include/asm/system.h
@@ -546,5 +546,7 @@ extern void account_system_vtime(struct
extern struct dentry *powerpc_debugfs_root;
+void cpu_idle_wait(void);
+
#endif /* __KERNEL__ */
#endif /* _ASM_POWERPC_SYSTEM_H */
Index: linux.trees.git/arch/powerpc/kernel/idle.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/kernel/idle.c
+++ linux.trees.git/arch/powerpc/kernel/idle.c
@@ -107,6 +107,25 @@ void cpu_idle(void)
}
}
+static void do_nothing(void *unused)
+{
+}
+
+/*
+ * cpu_idle_wait - Used to ensure that all the CPUs discard old value of
+ * ppc_md.power_save and update to new value.
+ * Required while changing ppc_md.power_save handler on SMP systems.
+ * Caller must have changed ppc_md.power_save to the new value before the call.
+ */
+void cpu_idle_wait(void)
+{
+ /* Ensure that new value of ppc_md.power_save is set */
+ smp_mb();
+ /* kick all the CPUs so that they exit out of ppc_md.power_save */
+ smp_call_function(do_nothing, NULL, 1);
+}
+EXPORT_SYMBOL_GPL(cpu_idle_wait);
+
int powersave_nap;
#ifdef CONFIG_SYSCTL
* Arun R Bharadwaj <[email protected]> [2009-09-01 17:07:04]:
This patch creates arch/powerpc/platforms/pseries/processor_idle.c,
which implements the cpuidle infrastructure for pseries.
It implements a pseries_cpuidle_loop() which would be the main idle loop
called from cpu_idle(). It makes decision of entering either snooze or nap
state based on the decision taken by the cpuidle governor.
Signed-off-by: Arun R Bharadwaj <[email protected]>
---
arch/powerpc/platforms/pseries/Makefile | 1
arch/powerpc/platforms/pseries/processor_idle.c | 179 ++++++++++++++++++++++++
arch/powerpc/platforms/pseries/pseries.h | 14 +
arch/powerpc/platforms/pseries/setup.c | 3
4 files changed, 194 insertions(+), 3 deletions(-)
Index: linux.trees.git/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/pseries/Makefile
+++ linux.trees.git/arch/powerpc/platforms/pseries/Makefile
@@ -26,3 +26,4 @@ obj-$(CONFIG_HCALL_STATS) += hvCall_inst
obj-$(CONFIG_PHYP_DUMP) += phyp_dump.o
obj-$(CONFIG_CMM) += cmm.o
obj-$(CONFIG_DTL) += dtl.o
+obj-$(CONFIG_PSERIES_PROCESSOR_IDLE) += processor_idle.o
Index: linux.trees.git/arch/powerpc/platforms/pseries/pseries.h
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/pseries/pseries.h
+++ linux.trees.git/arch/powerpc/platforms/pseries/pseries.h
@@ -10,6 +10,8 @@
#ifndef _PSERIES_PSERIES_H
#define _PSERIES_PSERIES_H
+#include <linux/cpuidle.h>
+
extern void __init fw_feature_init(const char *hypertas, unsigned long len);
struct pt_regs;
@@ -40,4 +42,16 @@ extern unsigned long rtas_poweron_auto;
extern void find_udbg_vterm(void);
+DECLARE_PER_CPU(unsigned long, smt_snooze_delay);
+
+#ifdef CONFIG_PSERIES_PROCESSOR_IDLE
+struct pseries_processor_power {
+ struct cpuidle_device dev;
+ int count;
+ int id;
+};
+
+extern struct cpuidle_driver pseries_idle_driver;
+#endif
+
#endif /* _PSERIES_PSERIES_H */
Index: linux.trees.git/arch/powerpc/platforms/pseries/processor_idle.c
===================================================================
--- /dev/null
+++ linux.trees.git/arch/powerpc/platforms/pseries/processor_idle.c
@@ -0,0 +1,179 @@
+/*
+ * processor_idle - idle state cpuidle driver.
+ * Adapted from drivers/acpi/processor_idle.c
+ *
+ * Arun R Bharadwaj <[email protected]>
+ *
+ * Copyright (C) 2009 IBM Corporation.
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or (at
+ * your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/moduleparam.h>
+#include <linux/cpuidle.h>
+
+#include <asm/paca.h>
+#include <asm/reg.h>
+#include <asm/system.h>
+#include <asm/machdep.h>
+
+#include "plpar_wrappers.h"
+#include "pseries.h"
+
+MODULE_AUTHOR("Arun R Bharadwaj");
+MODULE_DESCRIPTION("pSeries Idle State Driver");
+MODULE_LICENSE("GPL");
+
+struct cpuidle_driver pseries_idle_driver = {
+ .name = "pseries_idle",
+ .owner = THIS_MODULE,
+};
+
+DEFINE_PER_CPU(struct pseries_processor_power, power);
+
+#define IDLE_STATE_COUNT 2
+
+static int pseries_idle_init(struct pseries_processor_power *power)
+{
+ return cpuidle_register_device(&power->dev);
+}
+
+static void cede1(void)
+{
+ local_irq_enable();
+ set_thread_flag(TIF_POLLING_NRFLAG);
+ while (!need_resched()) {
+ ppc64_runlatch_off();
+ HMT_low();
+ HMT_very_low();
+ }
+ HMT_medium();
+ clear_thread_flag(TIF_POLLING_NRFLAG);
+ smp_mb();
+ local_irq_disable();
+}
+
+static void cede2(void)
+{
+ ppc64_runlatch_off();
+ HMT_medium();
+ cede_processor();
+}
+
+static int pseries_cpuidle_loop(struct cpuidle_device *dev,
+ struct cpuidle_state *st)
+{
+ ktime_t t1, t2;
+ s64 diff;
+ int ret;
+ unsigned long in_purr, out_purr;
+
+ get_lppaca()->idle = 1;
+ get_lppaca()->donate_dedicated_cpu = 1;
+ in_purr = mfspr(SPRN_PURR);
+
+ t1 = ktime_get();
+
+ if (strcmp(st->desc, "cede1") == 0)
+ cede1();
+ else
+ cede2();
+
+ t2 = ktime_get();
+ diff = ktime_to_us(ktime_sub(t2, t1));
+ if (diff > INT_MAX)
+ diff = INT_MAX;
+
+ ret = (int) diff;
+
+ out_purr = mfspr(SPRN_PURR);
+ get_lppaca()->wait_state_cycles += out_purr - in_purr;
+ get_lppaca()->donate_dedicated_cpu = 0;
+ get_lppaca()->idle = 0;
+
+ return ret;
+}
+
+static int pseries_setup_cpuidle(struct pseries_processor_power *power)
+{
+ int i;
+ struct cpuidle_state *state;
+ struct cpuidle_device *dev = &power->dev;
+
+ dev->cpu = power->id;
+
+ dev->enabled = 0;
+ for (i = 0; i < IDLE_STATE_COUNT; i++) {
+ state = &dev->states[i];
+
+ snprintf(state->name, CPUIDLE_NAME_LEN, "IDLE%d", i);
+ state->enter = pseries_cpuidle_loop;
+
+ switch (i) {
+ case 0:
+ strncpy(state->desc, "cede1", CPUIDLE_DESC_LEN);
+ state->exit_latency = 0;
+ state->target_residency = 0;
+ break;
+
+ case 1:
+ strncpy(state->desc, "cede2", CPUIDLE_DESC_LEN);
+ state->exit_latency = 1;
+ state->target_residency =
+ __get_cpu_var(smt_snooze_delay);
+ break;
+ }
+ }
+
+ power->dev.state_count = i;
+ return 0;
+}
+
+static int pseries_get_power_info(struct pseries_processor_power *power,
+ int cpu)
+{
+ power->id = cpu;
+ power->count = IDLE_STATE_COUNT;
+ return 0;
+}
+
+static int __init pseries_processor_idle_init(void)
+{
+ int cpu;
+ int result = cpuidle_register_driver(&pseries_idle_driver);
+
+ if (result < 0)
+ return result;
+
+ printk(KERN_DEBUG "pSeries idle driver registered\n");
+
+ for_each_online_cpu(cpu) {
+ pseries_get_power_info(&per_cpu(power, cpu), cpu);
+ pseries_setup_cpuidle(&per_cpu(power, cpu));
+ pseries_idle_init(&per_cpu(power, cpu));
+ }
+
+ printk(KERN_DEBUG "Using cpuidle idle loop\n");
+
+ return 0;
+}
+
+late_initcall(pseries_processor_idle_init);
Index: linux.trees.git/arch/powerpc/platforms/pseries/setup.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/pseries/setup.c
+++ linux.trees.git/arch/powerpc/platforms/pseries/setup.c
@@ -509,9 +509,6 @@ static int __init pSeries_probe(void)
return 1;
}
-
-DECLARE_PER_CPU(unsigned long, smt_snooze_delay);
-
static void pseries_dedicated_idle_sleep(void)
{
unsigned int cpu = smp_processor_id();
* Arun R B <[email protected]> [2009-09-01 17:08:40]:
> * Arun R Bharadwaj <[email protected]> [2009-09-01 17:07:04]:
>
> Cleanup drivers/cpuidle/cpuidle.c
>
> Cpuidle maintains a pm_idle_old void pointer because, currently in x86
> there is no clean way of registering and unregistering a idle function.
>
> So remove pm_idle_old and leave the responsibility of maintaining the
> list of registered idle loops to the architecture specific code. If the
> architecture registers cpuidle_idle_call as its idle loop, only then
> this loop is called.
>
It sounds as if there is a side-effect of this
patch on x86 (am I reading it incorrectly), which can be fixed, but
it will need a patch or so to get back the old behaviour on x86.
> Also remove unwanted functions cpuidle_[un]install_idle_handler,
> cpuidle_kick_cpus()
>
> Signed-off-by: Arun R Bharadwaj <[email protected]>
> ---
> drivers/cpuidle/cpuidle.c | 51 +++++++++++++++------------------------------
> drivers/cpuidle/governor.c | 3 --
> 2 files changed, 17 insertions(+), 37 deletions(-)
>
> Index: linux.trees.git/drivers/cpuidle/cpuidle.c
> ===================================================================
> --- linux.trees.git.orig/drivers/cpuidle/cpuidle.c
> +++ linux.trees.git/drivers/cpuidle/cpuidle.c
> @@ -24,9 +24,14 @@ DEFINE_PER_CPU(struct cpuidle_device *,
>
> DEFINE_MUTEX(cpuidle_lock);
> LIST_HEAD(cpuidle_detected_devices);
> -static void (*pm_idle_old)(void);
>
> static int enabled_devices;
> +static int idle_function_registered;
> +
> +struct idle_function_desc cpuidle_idle_desc = {
> + .name = "cpuidle_loop",
> + .idle_func = cpuidle_idle_call,
> +};
>
> #if defined(CONFIG_ARCH_HAS_CPU_IDLE_WAIT)
> static void cpuidle_kick_cpus(void)
> @@ -54,13 +59,10 @@ static void cpuidle_idle_call(void)
>
> /* check if the device is ready */
> if (!dev || !dev->enabled) {
> - if (pm_idle_old)
> - pm_idle_old();
> - else
> #if defined(CONFIG_ARCH_HAS_DEFAULT_IDLE)
> - default_idle();
> + default_idle();
> #else
> - local_irq_enable();
> + local_irq_enable();
> #endif
> return;
> }
> @@ -94,35 +96,11 @@ static void cpuidle_idle_call(void)
> }
>
> /**
> - * cpuidle_install_idle_handler - installs the cpuidle idle loop handler
> - */
> -void cpuidle_install_idle_handler(void)
> -{
> - if (enabled_devices && (pm_idle != cpuidle_idle_call)) {
> - /* Make sure all changes finished before we switch to new idle */
> - smp_wmb();
> - pm_idle = cpuidle_idle_call;
> - }
> -}
> -
> -/**
> - * cpuidle_uninstall_idle_handler - uninstalls the cpuidle idle loop handler
> - */
> -void cpuidle_uninstall_idle_handler(void)
> -{
> - if (enabled_devices && pm_idle_old && (pm_idle != pm_idle_old)) {
> - pm_idle = pm_idle_old;
> - cpuidle_kick_cpus();
> - }
> -}
> -
> -/**
> * cpuidle_pause_and_lock - temporarily disables CPUIDLE
> */
> void cpuidle_pause_and_lock(void)
> {
> mutex_lock(&cpuidle_lock);
> - cpuidle_uninstall_idle_handler();
> }
>
> EXPORT_SYMBOL_GPL(cpuidle_pause_and_lock);
> @@ -132,7 +110,6 @@ EXPORT_SYMBOL_GPL(cpuidle_pause_and_lock
> */
> void cpuidle_resume_and_unlock(void)
> {
> - cpuidle_install_idle_handler();
> mutex_unlock(&cpuidle_lock);
> }
>
What does this mean for users of cpuidle_pause_and_lock/unlock?
Should we be calling register/unregister_idle_function here?
> @@ -287,6 +264,12 @@ static int __cpuidle_register_device(str
> return 0;
> }
>
> +static void register_cpuidle_idle_function(void)
> +{
> + register_idle_function(&cpuidle_idle_desc);
> +
> + idle_function_registered = 1;
Use booleans if possible, unless you intend to extend the meaning of
registered someday.
> +}
> /**
> * cpuidle_register_device - registers a CPU's idle PM feature
> * @dev: the cpu
> @@ -303,7 +286,9 @@ int cpuidle_register_device(struct cpuid
> }
>
> cpuidle_enable_device(dev);
> - cpuidle_install_idle_handler();
> +
> + if (!idle_function_registered)
> + register_cpuidle_idle_function();
>
> mutex_unlock(&cpuidle_lock);
>
> @@ -382,8 +367,6 @@ static int __init cpuidle_init(void)
> {
> int ret;
>
> - pm_idle_old = pm_idle;
> -
> ret = cpuidle_add_class_sysfs(&cpu_sysdev_class);
> if (ret)
> return ret;
> Index: linux.trees.git/drivers/cpuidle/governor.c
> ===================================================================
> --- linux.trees.git.orig/drivers/cpuidle/governor.c
> +++ linux.trees.git/drivers/cpuidle/governor.c
> @@ -48,8 +48,6 @@ int cpuidle_switch_governor(struct cpuid
> if (gov == cpuidle_curr_governor)
> return 0;
>
> - cpuidle_uninstall_idle_handler();
> -
> if (cpuidle_curr_governor) {
> list_for_each_entry(dev, &cpuidle_detected_devices, device_list)
> cpuidle_disable_device(dev);
> @@ -63,7 +61,6 @@ int cpuidle_switch_governor(struct cpuid
> return -EINVAL;
> list_for_each_entry(dev, &cpuidle_detected_devices, device_list)
> cpuidle_enable_device(dev);
> - cpuidle_install_idle_handler();
> printk(KERN_INFO "cpuidle: using governor %s\n", gov->name);
> }
>
--
Balbir
* Balbir Singh <[email protected]> [2009-09-01 22:58:25]:
> * Arun R B <[email protected]> [2009-09-01 17:08:40]:
>
> > * Arun R Bharadwaj <[email protected]> [2009-09-01 17:07:04]:
> >
> > Cleanup drivers/cpuidle/cpuidle.c
> >
> > Cpuidle maintains a pm_idle_old void pointer because, currently in x86
> > there is no clean way of registering and unregistering a idle function.
> >
> > So remove pm_idle_old and leave the responsibility of maintaining the
> > list of registered idle loops to the architecture specific code. If the
> > architecture registers cpuidle_idle_call as its idle loop, only then
> > this loop is called.
> >
>
> It sounds as if there is a side-effect of this
> patch on x86 (am I reading it incorrectly), which can be fixed, but
> it will need a patch or so to get back the old behaviour on x86.
>
Hi Balbir,
Yes, your understanding is correct. Currently, x86 exports pm_idle and
this pm_idle is set to cpuidle_idle_call inside cpuidle.c
So instead of that x86 should just export a function called
set_arch_idle() which will be called from within
register_idle_function() and set pm_idle to the idle handler which is
currently being registered.
I have implemented this for pseries, and in the process of doing it
for x86 too.
> > Also remove unwanted functions cpuidle_[un]install_idle_handler,
> > cpuidle_kick_cpus()
> >
> > Signed-off-by: Arun R Bharadwaj <[email protected]>
> > ---
> > drivers/cpuidle/cpuidle.c | 51 +++++++++++++++------------------------------
> > drivers/cpuidle/governor.c | 3 --
> > 2 files changed, 17 insertions(+), 37 deletions(-)
> >
> > Index: linux.trees.git/drivers/cpuidle/cpuidle.c
> > ===================================================================
> > --- linux.trees.git.orig/drivers/cpuidle/cpuidle.c
> > +++ linux.trees.git/drivers/cpuidle/cpuidle.c
> > @@ -24,9 +24,14 @@ DEFINE_PER_CPU(struct cpuidle_device *,
> >
> > DEFINE_MUTEX(cpuidle_lock);
> > LIST_HEAD(cpuidle_detected_devices);
> > -static void (*pm_idle_old)(void);
> >
> > static int enabled_devices;
> > +static int idle_function_registered;
> > +
> > +struct idle_function_desc cpuidle_idle_desc = {
> > + .name = "cpuidle_loop",
> > + .idle_func = cpuidle_idle_call,
> > +};
> >
> > #if defined(CONFIG_ARCH_HAS_CPU_IDLE_WAIT)
> > static void cpuidle_kick_cpus(void)
> > @@ -54,13 +59,10 @@ static void cpuidle_idle_call(void)
> >
> > /* check if the device is ready */
> > if (!dev || !dev->enabled) {
> > - if (pm_idle_old)
> > - pm_idle_old();
> > - else
> > #if defined(CONFIG_ARCH_HAS_DEFAULT_IDLE)
> > - default_idle();
> > + default_idle();
> > #else
> > - local_irq_enable();
> > + local_irq_enable();
> > #endif
> > return;
> > }
> > @@ -94,35 +96,11 @@ static void cpuidle_idle_call(void)
> > }
> >
> > /**
> > - * cpuidle_install_idle_handler - installs the cpuidle idle loop handler
> > - */
> > -void cpuidle_install_idle_handler(void)
> > -{
> > - if (enabled_devices && (pm_idle != cpuidle_idle_call)) {
> > - /* Make sure all changes finished before we switch to new idle */
> > - smp_wmb();
> > - pm_idle = cpuidle_idle_call;
> > - }
> > -}
> > -
> > -/**
> > - * cpuidle_uninstall_idle_handler - uninstalls the cpuidle idle loop handler
> > - */
> > -void cpuidle_uninstall_idle_handler(void)
> > -{
> > - if (enabled_devices && pm_idle_old && (pm_idle != pm_idle_old)) {
> > - pm_idle = pm_idle_old;
> > - cpuidle_kick_cpus();
> > - }
> > -}
> > -
> > -/**
> > * cpuidle_pause_and_lock - temporarily disables CPUIDLE
> > */
> > void cpuidle_pause_and_lock(void)
> > {
> > mutex_lock(&cpuidle_lock);
> > - cpuidle_uninstall_idle_handler();
> > }
> >
> > EXPORT_SYMBOL_GPL(cpuidle_pause_and_lock);
> > @@ -132,7 +110,6 @@ EXPORT_SYMBOL_GPL(cpuidle_pause_and_lock
> > */
> > void cpuidle_resume_and_unlock(void)
> > {
> > - cpuidle_install_idle_handler();
> > mutex_unlock(&cpuidle_lock);
> > }
> >
>
> What does this mean for users of cpuidle_pause_and_lock/unlock?
> Should we be calling register/unregister_idle_function here?
>
Yes, you are right. I have missed out on this part.
register/unregister_idle_function should replace
install/uninstall_idle_handler at those places. Thanks.
>
> > @@ -287,6 +264,12 @@ static int __cpuidle_register_device(str
> > return 0;
> > }
> >
> > +static void register_cpuidle_idle_function(void)
> > +{
> > + register_idle_function(&cpuidle_idle_desc);
> > +
> > + idle_function_registered = 1;
>
> Use booleans if possible, unless you intend to extend the meaning of
> registered someday.
>
I don't intend to extend the meaning of idle_function_registered.
Will use boolean here.
> > +}
> > /**
> > * cpuidle_register_device - registers a CPU's idle PM feature
> > * @dev: the cpu
> > @@ -303,7 +286,9 @@ int cpuidle_register_device(struct cpuid
> > }
> >
> > cpuidle_enable_device(dev);
> > - cpuidle_install_idle_handler();
> > +
> > + if (!idle_function_registered)
> > + register_cpuidle_idle_function();
> >
> > mutex_unlock(&cpuidle_lock);
> >
> > @@ -382,8 +367,6 @@ static int __init cpuidle_init(void)
> > {
> > int ret;
> >
> > - pm_idle_old = pm_idle;
> > -
> > ret = cpuidle_add_class_sysfs(&cpu_sysdev_class);
> > if (ret)
> > return ret;
> > Index: linux.trees.git/drivers/cpuidle/governor.c
> > ===================================================================
> > --- linux.trees.git.orig/drivers/cpuidle/governor.c
> > +++ linux.trees.git/drivers/cpuidle/governor.c
> > @@ -48,8 +48,6 @@ int cpuidle_switch_governor(struct cpuid
> > if (gov == cpuidle_curr_governor)
> > return 0;
> >
> > - cpuidle_uninstall_idle_handler();
> > -
> > if (cpuidle_curr_governor) {
> > list_for_each_entry(dev, &cpuidle_detected_devices, device_list)
> > cpuidle_disable_device(dev);
> > @@ -63,7 +61,6 @@ int cpuidle_switch_governor(struct cpuid
> > return -EINVAL;
> > list_for_each_entry(dev, &cpuidle_detected_devices, device_list)
> > cpuidle_enable_device(dev);
> > - cpuidle_install_idle_handler();
> > printk(KERN_INFO "cpuidle: using governor %s\n", gov->name);
> > }
> >
>
> --
> Balbir
Thanks for the review!
--arun
On Tue, 2009-09-01 at 17:08 +0530, Arun R Bharadwaj wrote:
> * Arun R Bharadwaj <[email protected]> [2009-09-01 17:07:04]:
>
> Cleanup drivers/cpuidle/cpuidle.c
>
> Cpuidle maintains a pm_idle_old void pointer because, currently in x86
> there is no clean way of registering and unregistering a idle function.
Right, and instead of fixing that, they build this cpuidle crap on top,
instead of replacing the current crap with it.
> So remove pm_idle_old and leave the responsibility of maintaining the
> list of registered idle loops to the architecture specific code. If the
> architecture registers cpuidle_idle_call as its idle loop, only then
> this loop is called.
OK, that's a start I guess. Best would be to replace all of pm_idle with
cpuidle, which is what should have been done from the very start.
If cpuidle cannot fully replace the pm_idle functionality, then it needs
to fix that. But having two layers of idle functions is just silly.
Looking at patch 2 and 3, you're making the same mistake on power, after
those patches there are multiple ways of registering idle functions, one
through some native interface and one through cpuidle, this strikes me
as undesirable.
If cpuidle is a good idle function manager, then it should be good
enough to be the sole one, if its not, then why bother with it at all.
* Balbir Singh <[email protected]> [2009-09-01 22:58:25]:
> * Arun R B <[email protected]> [2009-09-01 17:08:40]:
>
> > * Arun R Bharadwaj <[email protected]> [2009-09-01 17:07:04]:
> >
> > Cleanup drivers/cpuidle/cpuidle.c
> >
> > Cpuidle maintains a pm_idle_old void pointer because, currently in x86
> > there is no clean way of registering and unregistering a idle function.
> >
> > So remove pm_idle_old and leave the responsibility of maintaining the
> > list of registered idle loops to the architecture specific code. If the
> > architecture registers cpuidle_idle_call as its idle loop, only then
> > this loop is called.
> >
>
> It sounds as if there is a side-effect of this
> patch on x86 (am I reading it incorrectly), which can be fixed, but
> it will need a patch or so to get back the old behaviour on x86.
>
> > Also remove unwanted functions cpuidle_[un]install_idle_handler,
> > cpuidle_kick_cpus()
> >
> > Signed-off-by: Arun R Bharadwaj <[email protected]>
> > ---
> > drivers/cpuidle/cpuidle.c | 51 +++++++++++++++------------------------------
> > drivers/cpuidle/governor.c | 3 --
> > 2 files changed, 17 insertions(+), 37 deletions(-)
> >
> > Index: linux.trees.git/drivers/cpuidle/cpuidle.c
> > ===================================================================
> > --- linux.trees.git.orig/drivers/cpuidle/cpuidle.c
> > +++ linux.trees.git/drivers/cpuidle/cpuidle.c
> > @@ -24,9 +24,14 @@ DEFINE_PER_CPU(struct cpuidle_device *,
> >
> > DEFINE_MUTEX(cpuidle_lock);
> > LIST_HEAD(cpuidle_detected_devices);
> > -static void (*pm_idle_old)(void);
> >
> > static int enabled_devices;
> > +static int idle_function_registered;
> > +
> > +struct idle_function_desc cpuidle_idle_desc = {
> > + .name = "cpuidle_loop",
> > + .idle_func = cpuidle_idle_call,
> > +};
> >
> > #if defined(CONFIG_ARCH_HAS_CPU_IDLE_WAIT)
> > static void cpuidle_kick_cpus(void)
> > @@ -54,13 +59,10 @@ static void cpuidle_idle_call(void)
> >
> > /* check if the device is ready */
> > if (!dev || !dev->enabled) {
> > - if (pm_idle_old)
> > - pm_idle_old();
> > - else
> > #if defined(CONFIG_ARCH_HAS_DEFAULT_IDLE)
> > - default_idle();
> > + default_idle();
> > #else
> > - local_irq_enable();
> > + local_irq_enable();
> > #endif
> > return;
> > }
> > @@ -94,35 +96,11 @@ static void cpuidle_idle_call(void)
> > }
> >
> > /**
> > - * cpuidle_install_idle_handler - installs the cpuidle idle loop handler
> > - */
> > -void cpuidle_install_idle_handler(void)
> > -{
> > - if (enabled_devices && (pm_idle != cpuidle_idle_call)) {
> > - /* Make sure all changes finished before we switch to new idle */
> > - smp_wmb();
> > - pm_idle = cpuidle_idle_call;
> > - }
> > -}
> > -
> > -/**
> > - * cpuidle_uninstall_idle_handler - uninstalls the cpuidle idle loop handler
> > - */
> > -void cpuidle_uninstall_idle_handler(void)
> > -{
> > - if (enabled_devices && pm_idle_old && (pm_idle != pm_idle_old)) {
> > - pm_idle = pm_idle_old;
> > - cpuidle_kick_cpus();
> > - }
> > -}
> > -
> > -/**
> > * cpuidle_pause_and_lock - temporarily disables CPUIDLE
> > */
> > void cpuidle_pause_and_lock(void)
> > {
> > mutex_lock(&cpuidle_lock);
> > - cpuidle_uninstall_idle_handler();
> > }
> >
> > EXPORT_SYMBOL_GPL(cpuidle_pause_and_lock);
> > @@ -132,7 +110,6 @@ EXPORT_SYMBOL_GPL(cpuidle_pause_and_lock
> > */
> > void cpuidle_resume_and_unlock(void)
> > {
> > - cpuidle_install_idle_handler();
> > mutex_unlock(&cpuidle_lock);
> > }
> >
>
> What does this mean for users of cpuidle_pause_and_lock/unlock?
> Should we be calling register/unregister_idle_function here?
>
Just observed the use case for cpuidle_pause_and_lock/unlock.
It is not clear as to why we need to switch back to the old idle
handler and then again back to cpuidle's idle handler. Wouldn't it
make more sense to just register the idle handler when the first
cpuidle device is being registered and unregister the idle handler
when the last cpuidle device is unregistered?
--arun
>
> > @@ -287,6 +264,12 @@ static int __cpuidle_register_device(str
> > return 0;
> > }
> >
> > +static void register_cpuidle_idle_function(void)
> > +{
> > + register_idle_function(&cpuidle_idle_desc);
> > +
> > + idle_function_registered = 1;
>
> Use booleans if possible, unless you intend to extend the meaning of
> registered someday.
>
> > +}
> > /**
> > * cpuidle_register_device - registers a CPU's idle PM feature
> > * @dev: the cpu
> > @@ -303,7 +286,9 @@ int cpuidle_register_device(struct cpuid
> > }
> >
> > cpuidle_enable_device(dev);
> > - cpuidle_install_idle_handler();
> > +
> > + if (!idle_function_registered)
> > + register_cpuidle_idle_function();
> >
> > mutex_unlock(&cpuidle_lock);
> >
> > @@ -382,8 +367,6 @@ static int __init cpuidle_init(void)
> > {
> > int ret;
> >
> > - pm_idle_old = pm_idle;
> > -
> > ret = cpuidle_add_class_sysfs(&cpu_sysdev_class);
> > if (ret)
> > return ret;
> > Index: linux.trees.git/drivers/cpuidle/governor.c
> > ===================================================================
> > --- linux.trees.git.orig/drivers/cpuidle/governor.c
> > +++ linux.trees.git/drivers/cpuidle/governor.c
> > @@ -48,8 +48,6 @@ int cpuidle_switch_governor(struct cpuid
> > if (gov == cpuidle_curr_governor)
> > return 0;
> >
> > - cpuidle_uninstall_idle_handler();
> > -
> > if (cpuidle_curr_governor) {
> > list_for_each_entry(dev, &cpuidle_detected_devices, device_list)
> > cpuidle_disable_device(dev);
> > @@ -63,7 +61,6 @@ int cpuidle_switch_governor(struct cpuid
> > return -EINVAL;
> > list_for_each_entry(dev, &cpuidle_detected_devices, device_list)
> > cpuidle_enable_device(dev);
> > - cpuidle_install_idle_handler();
> > printk(KERN_INFO "cpuidle: using governor %s\n", gov->name);
> > }
> >
>
> --
> Balbir
* Peter Zijlstra <[email protected]> [2009-09-02 07:42:24]:
> On Tue, 2009-09-01 at 17:08 +0530, Arun R Bharadwaj wrote:
> > * Arun R Bharadwaj <[email protected]> [2009-09-01 17:07:04]:
> >
> > Cleanup drivers/cpuidle/cpuidle.c
> >
> > Cpuidle maintains a pm_idle_old void pointer because, currently in x86
> > there is no clean way of registering and unregistering a idle function.
>
> Right, and instead of fixing that, they build this cpuidle crap on top,
> instead of replacing the current crap with it.
>
> > So remove pm_idle_old and leave the responsibility of maintaining the
> > list of registered idle loops to the architecture specific code. If the
> > architecture registers cpuidle_idle_call as its idle loop, only then
> > this loop is called.
>
> OK, that's a start I guess. Best would be to replace all of pm_idle with
> cpuidle, which is what should have been done from the very start.
>
> If cpuidle cannot fully replace the pm_idle functionality, then it needs
> to fix that. But having two layers of idle functions is just silly.
>
> Looking at patch 2 and 3, you're making the same mistake on power, after
> those patches there are multiple ways of registering idle functions, one
> through some native interface and one through cpuidle, this strikes me
> as undesirable.
>
> If cpuidle is a good idle function manager, then it should be good
> enough to be the sole one, if its not, then why bother with it at all.
>
Okay, I'm giving this approach a shot now. i.e. trying to make cpuidle
as _the_ sole idle function manager. This would mean doing away with
pm_idle and ppc_md.power_save. And, cpuidle_idle_call() which is the
main idle loop of cpuidle, present in drivers/cpuidle/cpuidle.c will
have to be called from arch specific code of cpu_idle()
Also this would mean enabling cpuidle for all platforms, even if the
platform doesn't have multiple idle states. So suppose a platform doesnt
have multiple states, it wouldn't want the bloated code of cpuidle
governors, and would want just a simple cpuidle loop.
--arun
>
On Thu, 2009-09-03 at 10:12 +0530, Arun R Bharadwaj wrote:
> > OK, that's a start I guess. Best would be to replace all of pm_idle with
> > cpuidle, which is what should have been done from the very start.
> >
> > If cpuidle cannot fully replace the pm_idle functionality, then it needs
> > to fix that. But having two layers of idle functions is just silly.
> >
> > Looking at patch 2 and 3, you're making the same mistake on power, after
> > those patches there are multiple ways of registering idle functions, one
> > through some native interface and one through cpuidle, this strikes me
> > as undesirable.
> >
> > If cpuidle is a good idle function manager, then it should be good
> > enough to be the sole one, if its not, then why bother with it at all.
> >
>
> Okay, I'm giving this approach a shot now. i.e. trying to make cpuidle
> as _the_ sole idle function manager. This would mean doing away with
> pm_idle and ppc_md.power_save. And, cpuidle_idle_call() which is the
> main idle loop of cpuidle, present in drivers/cpuidle/cpuidle.c will
> have to be called from arch specific code of cpu_idle()
>
> Also this would mean enabling cpuidle for all platforms, even if the
> platform doesn't have multiple idle states. So suppose a platform doesnt
> have multiple states, it wouldn't want the bloated code of cpuidle
> governors, and would want just a simple cpuidle loop.
Do talk to the powerpc maintainers about this. But yes, something like
that should be doable.
AFAICT the whole governor thing is optional and cpuidle provides a
spinning idle loop by default, and platforms can always register a
simple alternative when they set up bits -- the only thing to be careful
about is not creating a chicken-egg problem where the platform setup
runs before cpuidle is able to register a new handler or something.
I'd be delighted to see the end of pm_idle on x86.