LinuxLists.cc - [RFC PATCH V4 0/5] cpuidle: Cleanup pm_idle and include driver/cpuidle.c in-kernel

2011-03-22 12:32:25

Subject: [RFC PATCH V4 0/5] cpuidle: Cleanup pm_idle and include driver/cpuidle.c in-kernel

Changes in V4:
* Implemented cpuidle driver for xen. Earlier pm_idle
function pointer would be set to default idle. Now a cpuidle_driver
encapsulating the idle routine is cleanly registered for this
using cpuidle_register_driver API.

* Implemented a cpuidle driver for apm_cpu_idle() as part of
APM BIOS driver. Earlier APM BIOS driver would flip
pm_idle function pointer to apm_cpu_idle.

* This patch series applies on 2.6.38, and was tested on x86 system
with multiple sleep states.

Goal:
This patch series tries to achieve the goal of having cpuidle manage
all idle routine for x86. It removes pm_idle function pointer which
causes problems discussed at http://lkml.org/lkml/2009/8/28/43 and
http://lkml.org/lkml/2009/8/28/50.

V1 of this series is at https://lkml.org/lkml/2010/10/19/449,
V2 is at https://lkml.org/lkml/2011/1/13/98 and
V3 is at https://lkml.org/lkml/2011/2/8/73

---

Trinabh Gupta (5):
cpuidle: cpuidle driver for apm
cpuidle: driver for xen
cpuidle: default idle driver for x86
cpuidle: list based cpuidle driver registration and selection
cpuidle: Remove pm_idle pointer for x86

arch/x86/Kconfig | 2
arch/x86/kernel/apm_32.c | 75 ++++++-
arch/x86/kernel/process.c | 339 --------------------------------
arch/x86/kernel/process_32.c | 4
arch/x86/kernel/process_64.c | 4
arch/x86/xen/setup.c | 42 ++++
drivers/acpi/processor_idle.c | 2
drivers/cpuidle/Kconfig | 9 +
drivers/cpuidle/cpuidle.c | 50 ++++-
drivers/cpuidle/driver.c | 114 ++++++++++-
drivers/idle/Makefile | 1
drivers/idle/default_driver.c | 437 +++++++++++++++++++++++++++++++++++++++++
include/linux/cpuidle.h | 3
13 files changed, 707 insertions(+), 375 deletions(-)
create mode 100644 drivers/idle/default_driver.c

--
-Trinabh

2011-03-22 12:32:44

by Trinabh Gupta

[permalink] [raw]

Subject: [RFC PATCH V4 1/5] cpuidle: Remove pm_idle pointer for x86

This patch removes pm_idle function pointer and directly calls
cpuidle_idle_call from the idle loop on x86. Hence, CPUIdle has
to be built into the kernel for x86 and optional for other
archs.

Archs that still use pm_idle can continue to set
pm_idle=cpuidle_idle_call() and co-exist. The
cpuidle_(un)install_idle_handler() is defined per arch in
cpuidle.c and would set pm_idle pointer on non-x86 architectures
until they are incrementally converted to directly call
cpuidle_idle_call() and not use the pm_idle pointer.

Signed-off-by: Trinabh Gupta <[email protected]>
---

arch/x86/kernel/process_32.c | 4 +++-
arch/x86/kernel/process_64.c | 4 +++-
drivers/cpuidle/Kconfig | 3 ++-
drivers/cpuidle/cpuidle.c | 2 +-
4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 8d12878..17b7101 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -74,6 +74,8 @@ static inline void play_dead(void)
}
#endif

+extern void cpuidle_idle_call(void);
+
/*
* The idle thread. There's no useful work to be
* done, so just try to conserve power and have a
@@ -109,7 +111,7 @@ void cpu_idle(void)
local_irq_disable();
/* Don't trace irqs off for idle */
stop_critical_timings();
- pm_idle();
+ cpuidle_idle_call();
start_critical_timings();
}
tick_nohz_restart_sched_tick();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index bd387e8..c736bf3 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -99,6 +99,8 @@ static inline void play_dead(void)
}
#endif

+extern void cpuidle_idle_call(void);
+
/*
* The idle thread. There's no useful work to be
* done, so just try to conserve power and have a
@@ -136,7 +138,7 @@ void cpu_idle(void)
enter_idle();
/* Don't trace irqs off for idle */
stop_critical_timings();
- pm_idle();
+ cpuidle_idle_call();
start_critical_timings();

/* In many cases the interrupt that ended idle
diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig
index 7dbc4a8..e67c258 100644
--- a/drivers/cpuidle/Kconfig
+++ b/drivers/cpuidle/Kconfig
@@ -1,7 +1,8 @@

config CPU_IDLE
bool "CPU idle PM support"
- default ACPI
+ default y if X86
+ default y if ACPI
help
CPU idle is a generic framework for supporting software-controlled
idle processor power management. It includes modular cross-platform
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index bf50924..8baaa04 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -47,7 +47,7 @@ static int __cpuidle_register_device(struct cpuidle_device *dev);
*
* NOTE: no locks or semaphores should be used here
*/
-static void cpuidle_idle_call(void)
+void cpuidle_idle_call(void)
{
struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
struct cpuidle_state *target_state;

2011-03-22 12:32:51

by Trinabh Gupta

[permalink] [raw]

Subject: [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection

A cpuidle_driver structure represents a cpuidle driver like
acpi_idle, intel_idle providing low level idle routines.
A cpuidle_driver is global in nature as it provides routines
for all the CPUS. Each CPU registered with the cpuidle subsystem is
represented as a cpuidle_device. A cpuidle_device structure
points to the low level idle routines for that CPU provided by
a certain driver. In other words, a cpuidle driver creates a
cpuidle_device structure for each CPU that it registers with the
cpuidle subsystem. Whenever cpuidle idle loop is called, the cpuidle
subsystem picks the cpuidle_device structure for that cpu and
calls one of the low level idle routines through that structure.

In the current design, only one cpuidle_driver may be registered
and registration of any subsequent driver fails. The same registered
driver provides low level idle routines for each cpuidle_device.

A list based registration for cpuidle_driver provides a clean
mechanism for multiple subsystems/modules to register their own idle
routines and thus avoids using pm_idle.

This patch implements a list based registration for cpuidle
driver. Different drivers can be registered and the driver to
be used is selected based on a per driver priority. On a
cpuidle driver registration or unregistration cpuidle_device
structure for each CPU is changed. Each cpuidle_device points
to its driver to facilitate this registration and unregistration.

--------- ---------
|cpuidle| |cpuidle|
|driver |----------------------- |driver |
|default| |acpi |
--------- ---------
^ ^
| |
---------------------------- ---------------------------
^ ^ ^ ^
| | | |
| | | |
--------- --------- --------- ----------
|cpuidle| |cpuidle| |cpuidle| |cpuidle |
|device | |device | .... |device | |device | .....
| CPU0 | | CPU1 | |CPU0 | |CPU1 |
--------- --------- --------- ----------
Signed-off-by: Trinabh Gupta <[email protected]>
---

drivers/acpi/processor_idle.c | 2 +
drivers/cpuidle/Kconfig | 6 ++
drivers/cpuidle/cpuidle.c | 46 ++++++++++++++---
drivers/cpuidle/driver.c | 114 ++++++++++++++++++++++++++++++++++++++---
include/linux/cpuidle.h | 3 +
5 files changed, 157 insertions(+), 14 deletions(-)

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index d615b7d..65dde00 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -971,6 +971,7 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
struct cpuidle_driver acpi_idle_driver = {
.name = "acpi_idle",
.owner = THIS_MODULE,
+ .priority = 10,
};

/**
@@ -992,6 +993,7 @@ static int acpi_processor_setup_cpuidle(struct acpi_processor *pr)
}

dev->cpu = pr->id;
+ dev->drv = &acpi_idle_driver;
for (i = 0; i < CPUIDLE_STATE_MAX; i++) {
dev->states[i].name[0] = '\0';
dev->states[i].desc[0] = '\0';
diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig
index e67c258..8cc73c8 100644
--- a/drivers/cpuidle/Kconfig
+++ b/drivers/cpuidle/Kconfig
@@ -19,3 +19,9 @@ config CPU_IDLE_GOV_MENU
bool
depends on CPU_IDLE && NO_HZ
default y
+
+config ARCH_USES_PMIDLE
+ bool
+ depends on CPUIDLE
+ depends on !X86
+ default y
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 8baaa04..91ef1bf 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -116,6 +116,7 @@ void cpuidle_idle_call(void)
cpuidle_curr_governor->reflect(dev);
}

+#ifdef CONFIG_ARCH_USES_PMIDLE
/**
* cpuidle_install_idle_handler - installs the cpuidle idle loop handler
*/
@@ -138,6 +139,10 @@ void cpuidle_uninstall_idle_handler(void)
cpuidle_kick_cpus();
}
}
+#else
+void cpuidle_install_idle_handler(void) {}
+void cpuidle_uninstall_idle_handler(void) {}
+#endif

/**
* cpuidle_pause_and_lock - temporarily disables CPUIDLE
@@ -293,9 +298,21 @@ static int __cpuidle_register_device(struct cpuidle_device *dev)
struct sys_device *sys_dev = get_cpu_sysdev((unsigned long)dev->cpu);
struct cpuidle_driver *cpuidle_driver = cpuidle_get_driver();

+ if (dev->registered)
+ return 0;
if (!sys_dev)
return -EINVAL;
- if (!try_module_get(cpuidle_driver->owner))
+ /*
+ * This will maintain compatibility with old cpuidle_device
+ * structure where driver pointer is not set.
+ *
+ * To Do: Change all call sites to set dev->drv pointer.
+ * This can be changed to BUG() later in case somebody
+ * registers without a driver pointer.
+ */
+ if (!dev->drv)
+ dev->drv = cpuidle_driver;
+ if (!try_module_get(dev->drv->owner))
return -EINVAL;

init_completion(&dev->kobj_unregister);
@@ -320,10 +337,11 @@ static int __cpuidle_register_device(struct cpuidle_device *dev)
dev->states[i].power_usage = -1 - i;
}

- per_cpu(cpuidle_devices, dev->cpu) = dev;
+ if (cpuidle_driver == dev->drv)
+ per_cpu(cpuidle_devices, dev->cpu) = dev;
list_add(&dev->device_list, &cpuidle_detected_devices);
if ((ret = cpuidle_add_sysfs(sys_dev))) {
- module_put(cpuidle_driver->owner);
+ module_put(dev->drv->owner);
return ret;
}

@@ -374,13 +392,18 @@ void cpuidle_unregister_device(struct cpuidle_device *dev)
cpuidle_disable_device(dev);

cpuidle_remove_sysfs(sys_dev);
- list_del(&dev->device_list);
wait_for_completion(&dev->kobj_unregister);
- per_cpu(cpuidle_devices, dev->cpu) = NULL;
+ if (cpuidle_driver == dev->drv)
+ per_cpu(cpuidle_devices, dev->cpu) = NULL;

+ /*
+ * To Do: Ensure each CPU has exited the current
+ * idle routine which may belong to this module/driver.
+ * cpuidle_kick_cpus() equivalent is required.
+ */
cpuidle_resume_and_unlock();

- module_put(cpuidle_driver->owner);
+ module_put(dev->drv->owner);
}

EXPORT_SYMBOL_GPL(cpuidle_unregister_device);
@@ -420,6 +443,15 @@ static inline void latency_notifier_init(struct notifier_block *n)

#endif /* CONFIG_SMP */

+#ifdef CONFIG_ARCH_USES_PMIDLE
+static inline void save_pm_idle(void)
+{
+ pm_idle_old = pm_idle;
+}
+#else
+static inline void save_pm_idle(void) {}
+#endif
+
/**
* cpuidle_init - core initializer
*/
@@ -427,7 +459,7 @@ static int __init cpuidle_init(void)
{
int ret;

- pm_idle_old = pm_idle;
+ save_pm_idle();

ret = cpuidle_add_class_sysfs(&cpu_sysdev_class);
if (ret)
diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index fd1601e..c235286 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -14,8 +14,83 @@

#include "cpuidle.h"

+#define MAX_PRIORITY 1000
+#define DEFAULT_PRIORITY 100
+
static struct cpuidle_driver *cpuidle_curr_driver;
DEFINE_SPINLOCK(cpuidle_driver_lock);
+LIST_HEAD(registered_cpuidle_drivers);
+
+static struct cpuidle_driver *select_cpuidle_driver(void)
+{
+ struct cpuidle_driver *item = NULL, *selected = NULL;
+ unsigned int min_priority = MAX_PRIORITY;
+
+ list_for_each_entry(item, &registered_cpuidle_drivers,
+ driver_list) {
+ if (item->priority <= min_priority) {
+ selected = item;
+ min_priority = item->priority;
+ }
+ }
+ return selected;
+}
+
+static void set_current_cpuidle_driver(struct cpuidle_driver *drv)
+{
+ struct cpuidle_device *item = NULL;
+ if (drv == cpuidle_curr_driver)
+ return;
+
+ /* Unregister the previous drivers devices */
+ /* Do we need to take cpuidle_lock ? {un}register_device takes lock*/
+ if (cpuidle_curr_driver) {
+ list_for_each_entry(item, &cpuidle_detected_devices,
+ device_list) {
+ if (item->drv == cpuidle_curr_driver)
+ cpuidle_unregister_device(item);
+ }
+ }
+
+ cpuidle_curr_driver = drv;
+
+ if (drv == NULL)
+ return;
+ else {
+ /* Register the new driver devices */
+ list_for_each_entry(item, &cpuidle_detected_devices,
+ device_list) {
+ if (item->drv == drv)
+ cpuidle_register_device(item);
+ }
+ }
+}
+
+static inline int arch_allow_register(void)
+{
+#ifdef CONFIG_ARCH_USES_PMIDLE
+ if (cpuidle_curr_driver)
+ return -EPERM;
+ else
+ return 0;
+#else
+ return 0;
+#endif
+}
+
+static inline int arch_allow_unregister(struct cpuidle_driver *drv)
+{
+#ifdef CONFIG_ARCH_USES_PMIDLE
+ if (drv != cpuidle_curr_driver) {
+ WARN(1, "invalid cpuidle_unregister_driver(%s)\n",
+ drv->name);
+ return -EPERM;
+ } else
+ return 0;
+#else
+ return 0;
+#endif
+}

/**
* cpuidle_register_driver - registers a driver
@@ -23,15 +98,29 @@ DEFINE_SPINLOCK(cpuidle_driver_lock);
*/
int cpuidle_register_driver(struct cpuidle_driver *drv)
{
+ struct cpuidle_driver *item = NULL;
if (!drv)
return -EINVAL;
+ if (drv->priority == 0)
+ drv->priority = DEFAULT_PRIORITY;

spin_lock(&cpuidle_driver_lock);
- if (cpuidle_curr_driver) {
+
+ if (arch_allow_register()) {
spin_unlock(&cpuidle_driver_lock);
return -EBUSY;
}
- cpuidle_curr_driver = drv;
+
+ /* Check if driver already registered */
+ list_for_each_entry(item, &registered_cpuidle_drivers, driver_list) {
+ if (item == drv) {
+ spin_unlock(&cpuidle_driver_lock);
+ return -EINVAL;
+ }
+ }
+
+ list_add(&drv->driver_list, &registered_cpuidle_drivers);
+ set_current_cpuidle_driver(select_cpuidle_driver());
spin_unlock(&cpuidle_driver_lock);

return 0;
@@ -54,14 +143,25 @@ EXPORT_SYMBOL_GPL(cpuidle_get_driver);
*/
void cpuidle_unregister_driver(struct cpuidle_driver *drv)
{
- if (drv != cpuidle_curr_driver) {
- WARN(1, "invalid cpuidle_unregister_driver(%s)\n",
- drv->name);
+ struct cpuidle_device *item = NULL;
+
+ if (arch_allow_unregister(drv))
return;
- }

spin_lock(&cpuidle_driver_lock);
- cpuidle_curr_driver = NULL;
+
+ /* Set some other driver as current */
+ list_del(&drv->driver_list);
+ set_current_cpuidle_driver(select_cpuidle_driver());
+
+ /* Delete all devices corresponding to this driver */
+ mutex_lock(&cpuidle_lock);
+ list_for_each_entry(item, &cpuidle_detected_devices, device_list) {
+ if (item->drv == drv)
+ list_del(&item->device_list);
+ }
+ mutex_unlock(&cpuidle_lock);
+
spin_unlock(&cpuidle_driver_lock);
}

diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 36719ea..a5f5fd0 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -90,6 +90,7 @@ struct cpuidle_device {
struct cpuidle_state *last_state;

struct list_head device_list;
+ struct cpuidle_driver *drv;
struct kobject kobj;
struct completion kobj_unregister;
void *governor_data;
@@ -119,6 +120,8 @@ static inline int cpuidle_get_last_residency(struct cpuidle_device *dev)
struct cpuidle_driver {
char name[CPUIDLE_NAME_LEN];
struct module *owner;
+ unsigned int priority;
+ struct list_head driver_list;
};

#ifdef CONFIG_CPU_IDLE

2011-03-22 12:33:28

by Trinabh Gupta

[permalink] [raw]

Subject: [RFC PATCH V4 3/5] cpuidle: default idle driver for x86

This default cpuidle_driver parses idle= boot parameters, selects
the optimal idle routine for x86 during bootup and registers with
cpuidle. The code for idle routines and the selection of optimal
routine is moved from arch/x86/kernel/process.c . At module_init this
default driver is registered with cpuidle and for non ACPI platforms
it continues to be used. For ACPI platforms, acpi_idle driver would
replace this driver at a later point in time during bootup. Until
this driver's registration, architecture supplied compile time
default idle routine is called from within cpuidle_idle_call().

Signed-off-by: Trinabh Gupta <[email protected]>
---

arch/x86/Kconfig | 2
arch/x86/kernel/process.c | 339 --------------------------------
drivers/cpuidle/cpuidle.c | 2
drivers/idle/Makefile | 1
drivers/idle/default_driver.c | 437 +++++++++++++++++++++++++++++++++++++++++
5 files changed, 440 insertions(+), 341 deletions(-)
create mode 100644 drivers/idle/default_driver.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d5ed94d..6c03c92 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -148,7 +148,7 @@ config RWSEM_XCHGADD_ALGORITHM
def_bool X86_XADD

config ARCH_HAS_CPU_IDLE_WAIT
- def_bool y
+ def_bool n

config GENERIC_CALIBRATE_DELAY
def_bool y
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index ff45541..27950dc 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -7,7 +7,6 @@
#include <linux/sched.h>
#include <linux/module.h>
#include <linux/pm.h>
-#include <linux/clockchips.h>
#include <linux/random.h>
#include <linux/user-return-notifier.h>
#include <linux/dmi.h>
@@ -330,344 +329,6 @@ long sys_execve(const char __user *name,
return error;
}

-/*
- * Idle related variables and functions
- */
-unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
-EXPORT_SYMBOL(boot_option_idle_override);
-
-/*
- * Powermanagement idle function, if any..
- */
-void (*pm_idle)(void);
-EXPORT_SYMBOL(pm_idle);
-
-#ifdef CONFIG_X86_32
-/*
- * This halt magic was a workaround for ancient floppy DMA
- * wreckage. It should be safe to remove.
- */
-static int hlt_counter;
-void disable_hlt(void)
-{
- hlt_counter++;
-}
-EXPORT_SYMBOL(disable_hlt);
-
-void enable_hlt(void)
-{
- hlt_counter--;
-}
-EXPORT_SYMBOL(enable_hlt);
-
-static inline int hlt_use_halt(void)
-{
- return (!hlt_counter && boot_cpu_data.hlt_works_ok);
-}
-#else
-static inline int hlt_use_halt(void)
-{
- return 1;
-}
-#endif
-
-/*
- * We use this if we don't have any better
- * idle routine..
- */
-void default_idle(void)
-{
- if (hlt_use_halt()) {
- trace_power_start(POWER_CSTATE, 1, smp_processor_id());
- trace_cpu_idle(1, smp_processor_id());
- current_thread_info()->status &= ~TS_POLLING;
- /*
- * TS_POLLING-cleared state must be visible before we
- * test NEED_RESCHED:
- */
- smp_mb();
-
- if (!need_resched())
- safe_halt(); /* enables interrupts racelessly */
- else
- local_irq_enable();
- current_thread_info()->status |= TS_POLLING;
- trace_power_end(smp_processor_id());
- trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
- } else {
- local_irq_enable();
- /* loop is done by the caller */
- cpu_relax();
- }
-}
-#ifdef CONFIG_APM_MODULE
-EXPORT_SYMBOL(default_idle);
-#endif
-
-void stop_this_cpu(void *dummy)
-{
- local_irq_disable();
- /*
- * Remove this CPU:
- */
- set_cpu_online(smp_processor_id(), false);
- disable_local_APIC();
-
- for (;;) {
- if (hlt_works(smp_processor_id()))
- halt();
- }
-}
-
-static void do_nothing(void *unused)
-{
-}
-
-/*
- * cpu_idle_wait - Used to ensure that all the CPUs discard old value of
- * pm_idle and update to new pm_idle value. Required while changing pm_idle
- * handler on SMP systems.
- *
- * Caller must have changed pm_idle to the new value before the call. Old
- * pm_idle value will not be used by any CPU after the return of this function.
- */
-void cpu_idle_wait(void)
-{
- smp_mb();
- /* kick all the CPUs so that they exit out of pm_idle */
- smp_call_function(do_nothing, NULL, 1);
-}
-EXPORT_SYMBOL_GPL(cpu_idle_wait);
-
-/*
- * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
- * which can obviate IPI to trigger checking of need_resched.
- * We execute MONITOR against need_resched and enter optimized wait state
- * through MWAIT. Whenever someone changes need_resched, we would be woken
- * up from MWAIT (without an IPI).
- *
- * New with Core Duo processors, MWAIT can take some hints based on CPU
- * capability.
- */
-void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
-{
- if (!need_resched()) {
- if (cpu_has(__this_cpu_ptr(&cpu_info), X86_FEATURE_CLFLUSH_MONITOR))
- clflush((void *)&current_thread_info()->flags);
-
- __monitor((void *)&current_thread_info()->flags, 0, 0);
- smp_mb();
- if (!need_resched())
- __mwait(ax, cx);
- }
-}
-
-/* Default MONITOR/MWAIT with no hints, used for default C1 state */
-static void mwait_idle(void)
-{
- if (!need_resched()) {
- trace_power_start(POWER_CSTATE, 1, smp_processor_id());
- trace_cpu_idle(1, smp_processor_id());
- if (cpu_has(__this_cpu_ptr(&cpu_info), X86_FEATURE_CLFLUSH_MONITOR))
- clflush((void *)&current_thread_info()->flags);
-
- __monitor((void *)&current_thread_info()->flags, 0, 0);
- smp_mb();
- if (!need_resched())
- __sti_mwait(0, 0);
- else
- local_irq_enable();
- trace_power_end(smp_processor_id());
- trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
- } else
- local_irq_enable();
-}
-
-/*
- * On SMP it's slightly faster (but much more power-consuming!)
- * to poll the ->work.need_resched flag instead of waiting for the
- * cross-CPU IPI to arrive. Use this option with caution.
- */
-static void poll_idle(void)
-{
- trace_power_start(POWER_CSTATE, 0, smp_processor_id());
- trace_cpu_idle(0, smp_processor_id());
- local_irq_enable();
- while (!need_resched())
- cpu_relax();
- trace_power_end(smp_processor_id());
- trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
-}
-
-/*
- * mwait selection logic:
- *
- * It depends on the CPU. For AMD CPUs that support MWAIT this is
- * wrong. Family 0x10 and 0x11 CPUs will enter C1 on HLT. Powersavings
- * then depend on a clock divisor and current Pstate of the core. If
- * all cores of a processor are in halt state (C1) the processor can
- * enter the C1E (C1 enhanced) state. If mwait is used this will never
- * happen.
- *
- * idle=mwait overrides this decision and forces the usage of mwait.
- */
-
-#define MWAIT_INFO 0x05
-#define MWAIT_ECX_EXTENDED_INFO 0x01
-#define MWAIT_EDX_C1 0xf0
-
-int mwait_usable(const struct cpuinfo_x86 *c)
-{
- u32 eax, ebx, ecx, edx;
-
- if (boot_option_idle_override == IDLE_FORCE_MWAIT)
- return 1;
-
- if (c->cpuid_level < MWAIT_INFO)
- return 0;
-
- cpuid(MWAIT_INFO, &eax, &ebx, &ecx, &edx);
- /* Check, whether EDX has extended info about MWAIT */
- if (!(ecx & MWAIT_ECX_EXTENDED_INFO))
- return 1;
-
- /*
- * edx enumeratios MONITOR/MWAIT extensions. Check, whether
- * C1 supports MWAIT
- */
- return (edx & MWAIT_EDX_C1);
-}
-
-bool c1e_detected;
-EXPORT_SYMBOL(c1e_detected);
-
-static cpumask_var_t c1e_mask;
-
-void c1e_remove_cpu(int cpu)
-{
- if (c1e_mask != NULL)
- cpumask_clear_cpu(cpu, c1e_mask);
-}
-
-/*
- * C1E aware idle routine. We check for C1E active in the interrupt
- * pending message MSR. If we detect C1E, then we handle it the same
- * way as C3 power states (local apic timer and TSC stop)
- */
-static void c1e_idle(void)
-{
- if (need_resched())
- return;
-
- if (!c1e_detected) {
- u32 lo, hi;
-
- rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
-
- if (lo & K8_INTP_C1E_ACTIVE_MASK) {
- c1e_detected = true;
- if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
- mark_tsc_unstable("TSC halt in AMD C1E");
- printk(KERN_INFO "System has AMD C1E enabled\n");
- }
- }
-
- if (c1e_detected) {
- int cpu = smp_processor_id();
-
- if (!cpumask_test_cpu(cpu, c1e_mask)) {
- cpumask_set_cpu(cpu, c1e_mask);
- /*
- * Force broadcast so ACPI can not interfere.
- */
- clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_FORCE,
- &cpu);
- printk(KERN_INFO "Switch to broadcast mode on CPU%d\n",
- cpu);
- }
- clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
-
- default_idle();
-
- /*
- * The switch back from broadcast mode needs to be
- * called with interrupts disabled.
- */
- local_irq_disable();
- clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
- local_irq_enable();
- } else
- default_idle();
-}
-
-void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
-{
-#ifdef CONFIG_SMP
- if (pm_idle == poll_idle && smp_num_siblings > 1) {
- printk_once(KERN_WARNING "WARNING: polling idle and HT enabled,"
- " performance may degrade.\n");
- }
-#endif
- if (pm_idle)
- return;
-
- if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
- /*
- * One CPU supports mwait => All CPUs supports mwait
- */
- printk(KERN_INFO "using mwait in idle threads.\n");
- pm_idle = mwait_idle;
- } else if (cpu_has_amd_erratum(amd_erratum_400)) {
- /* E400: APIC timer interrupt does not wake up CPU from C1e */
- printk(KERN_INFO "using C1E aware idle routine\n");
- pm_idle = c1e_idle;
- } else
- pm_idle = default_idle;
-}
-
-void __init init_c1e_mask(void)
-{
- /* If we're using c1e_idle, we need to allocate c1e_mask. */
- if (pm_idle == c1e_idle)
- zalloc_cpumask_var(&c1e_mask, GFP_KERNEL);
-}
-
-static int __init idle_setup(char *str)
-{
- if (!str)
- return -EINVAL;
-
- if (!strcmp(str, "poll")) {
- printk("using polling idle threads.\n");
- pm_idle = poll_idle;
- boot_option_idle_override = IDLE_POLL;
- } else if (!strcmp(str, "mwait")) {
- boot_option_idle_override = IDLE_FORCE_MWAIT;
- } else if (!strcmp(str, "halt")) {
- /*
- * When the boot option of idle=halt is added, halt is
- * forced to be used for CPU idle. In such case CPU C2/C3
- * won't be used again.
- * To continue to load the CPU idle driver, don't touch
- * the boot_option_idle_override.
- */
- pm_idle = default_idle;
- boot_option_idle_override = IDLE_HALT;
- } else if (!strcmp(str, "nomwait")) {
- /*
- * If the boot option of "idle=nomwait" is added,
- * it means that mwait will be disabled for CPU C2/C3
- * states. In such case it won't touch the variable
- * of boot_option_idle_override.
- */
- boot_option_idle_override = IDLE_NOMWAIT;
- } else
- return -1;
-
- return 0;
-}
-early_param("idle", idle_setup);
-
unsigned long arch_align_stack(unsigned long sp)
{
if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 91ef1bf..7486e0f 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -34,7 +34,7 @@ static void cpuidle_kick_cpus(void)
{
cpu_idle_wait();
}
-#elif defined(CONFIG_SMP)
+#elif defined(CONFIG_SMP) && defined(CONFIG_ARCH_USES_PMIDLE)
# error "Arch needs cpu_idle_wait() equivalent here"
#else /* !CONFIG_ARCH_HAS_CPU_IDLE_WAIT && !CONFIG_SMP */
static void cpuidle_kick_cpus(void) {}
diff --git a/drivers/idle/Makefile b/drivers/idle/Makefile
index 23d295c..f16e866 100644
--- a/drivers/idle/Makefile
+++ b/drivers/idle/Makefile
@@ -1,3 +1,4 @@
obj-$(CONFIG_I7300_IDLE) += i7300_idle.o
obj-$(CONFIG_INTEL_IDLE) += intel_idle.o
+obj-$(CONFIG_X86) += default_driver.o

diff --git a/drivers/idle/default_driver.c b/drivers/idle/default_driver.c
new file mode 100644
index 0000000..21de286
--- /dev/null
+++ b/drivers/idle/default_driver.c
@@ -0,0 +1,437 @@
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/sched.h>
+#include <linux/cpuidle.h>
+#include <linux/clockchips.h>
+#include <linux/slab.h>
+#include <trace/events/power.h>
+#include <asm/mwait.h>
+
+
+/*
+ * Idle related variables and functions
+ */
+unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
+EXPORT_SYMBOL(boot_option_idle_override);
+
+static struct cpuidle_state *opt_state;
+
+#ifdef CONFIG_X86_32
+/*
+ * This halt magic was a workaround for ancient floppy DMA
+ * wreckage. It should be safe to remove.
+ */
+static int hlt_counter;
+void disable_hlt(void)
+{
+ hlt_counter++;
+}
+EXPORT_SYMBOL(disable_hlt);
+
+void enable_hlt(void)
+{
+ hlt_counter--;
+}
+EXPORT_SYMBOL(enable_hlt);
+
+static inline int hlt_use_halt(void)
+{
+ return (!hlt_counter && boot_cpu_data.hlt_works_ok);
+}
+#else
+static inline int hlt_use_halt(void)
+{
+ return 1;
+}
+#endif
+
+/*
+ * We use this if we don't have any better
+ * idle routine..
+ */
+void default_idle(void)
+{
+ if (hlt_use_halt()) {
+ trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+ trace_cpu_idle(1, smp_processor_id());
+ current_thread_info()->status &= ~TS_POLLING;
+ /*
+ * TS_POLLING-cleared state must be visible before we
+ * test NEED_RESCHED:
+ */
+ smp_mb();
+
+ if (!need_resched())
+ safe_halt(); /* enables interrupts racelessly */
+ else
+ local_irq_enable();
+ current_thread_info()->status |= TS_POLLING;
+ trace_power_end(smp_processor_id());
+ trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
+ } else {
+ local_irq_enable();
+ /* loop is done by the caller */
+ cpu_relax();
+ }
+}
+#ifdef CONFIG_APM_MODULE
+EXPORT_SYMBOL(default_idle);
+#endif
+
+void stop_this_cpu(void *dummy)
+{
+ local_irq_disable();
+ /*
+ * Remove this CPU:
+ */
+ set_cpu_online(smp_processor_id(), false);
+ disable_local_APIC();
+
+ for (;;) {
+ if (hlt_works(smp_processor_id()))
+ halt();
+ }
+}
+
+/*
+ * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
+ * which can obviate IPI to trigger checking of need_resched.
+ * We execute MONITOR against need_resched and enter optimized wait state
+ * through MWAIT. Whenever someone changes need_resched, we would be woken
+ * up from MWAIT (without an IPI).
+ *
+ * New with Core Duo processors, MWAIT can take some hints based on CPU
+ * capability.
+ */
+void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
+{
+ if (!need_resched()) {
+ if (cpu_has(__this_cpu_ptr(&cpu_info), X86_FEATURE_CLFLUSH_MONITOR))
+ clflush((void *)&current_thread_info()->flags);
+
+ __monitor((void *)&current_thread_info()->flags, 0, 0);
+ smp_mb();
+ if (!need_resched())
+ __mwait(ax, cx);
+ }
+}
+
+/* Default MONITOR/MWAIT with no hints, used for default C1 state */
+static void mwait_idle(void)
+{
+ if (!need_resched()) {
+ trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+ trace_cpu_idle(1, smp_processor_id());
+ if (cpu_has(__this_cpu_ptr(&cpu_info), X86_FEATURE_CLFLUSH_MONITOR))
+ clflush((void *)&current_thread_info()->flags);
+
+ __monitor((void *)&current_thread_info()->flags, 0, 0);
+ smp_mb();
+ if (!need_resched())
+ __sti_mwait(0, 0);
+ else
+ local_irq_enable();
+ trace_power_end(smp_processor_id());
+ trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
+ } else
+ local_irq_enable();
+}
+
+/*
+ * On SMP it's slightly faster (but much more power-consuming!)
+ * to poll the ->work.need_resched flag instead of waiting for the
+ * cross-CPU IPI to arrive. Use this option with caution.
+ */
+static void poll_idle(void)
+{
+ trace_power_start(POWER_CSTATE, 0, smp_processor_id());
+ trace_cpu_idle(0, smp_processor_id());
+ local_irq_enable();
+ while (!need_resched())
+ cpu_relax();
+ trace_power_end(smp_processor_id());
+ trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
+}
+
+/*
+ * mwait selection logic:
+ *
+ * It depends on the CPU. For AMD CPUs that support MWAIT this is
+ * wrong. Family 0x10 and 0x11 CPUs will enter C1 on HLT. Powersavings
+ * then depend on a clock divisor and current Pstate of the core. If
+ * all cores of a processor are in halt state (C1) the processor can
+ * enter the C1E (C1 enhanced) state. If mwait is used this will never
+ * happen.
+ *
+ * idle=mwait overrides this decision and forces the usage of mwait.
+ */
+
+#define MWAIT_INFO 0x05
+#define MWAIT_ECX_EXTENDED_INFO 0x01
+#define MWAIT_EDX_C1 0xf0
+
+int mwait_usable(const struct cpuinfo_x86 *c)
+{
+ u32 eax, ebx, ecx, edx;
+
+ if (boot_option_idle_override == IDLE_FORCE_MWAIT)
+ return 1;
+
+ if (c->cpuid_level < MWAIT_INFO)
+ return 0;
+
+ cpuid(MWAIT_INFO, &eax, &ebx, &ecx, &edx);
+ /* Check, whether EDX has extended info about MWAIT */
+ if (!(ecx & MWAIT_ECX_EXTENDED_INFO))
+ return 1;
+
+ /*
+ * edx enumeratios MONITOR/MWAIT extensions. Check, whether
+ * C1 supports MWAIT
+ */
+ return (edx & MWAIT_EDX_C1);
+}
+
+bool c1e_detected;
+EXPORT_SYMBOL(c1e_detected);
+
+static cpumask_var_t c1e_mask;
+
+void c1e_remove_cpu(int cpu)
+{
+ if (c1e_mask != NULL)
+ cpumask_clear_cpu(cpu, c1e_mask);
+}
+
+/*
+ * C1E aware idle routine. We check for C1E active in the interrupt
+ * pending message MSR. If we detect C1E, then we handle it the same
+ * way as C3 power states (local apic timer and TSC stop)
+ */
+static void c1e_idle(void)
+{
+ if (need_resched())
+ return;
+
+ if (!c1e_detected) {
+ u32 lo, hi;
+
+ rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
+
+ if (lo & K8_INTP_C1E_ACTIVE_MASK) {
+ c1e_detected = true;
+ if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
+ mark_tsc_unstable("TSC halt in AMD C1E");
+ printk(KERN_INFO "System has AMD C1E enabled\n");
+ }
+ }
+
+ if (c1e_detected) {
+ int cpu = smp_processor_id();
+
+ if (!cpumask_test_cpu(cpu, c1e_mask)) {
+ cpumask_set_cpu(cpu, c1e_mask);
+ /*
+ * Force broadcast so ACPI can not interfere.
+ */
+ clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_FORCE,
+ &cpu);
+ printk(KERN_INFO "Switch to broadcast mode on CPU%d\n",
+ cpu);
+ }
+ clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
+
+ default_idle();
+
+ /*
+ * The switch back from broadcast mode needs to be
+ * called with interrupts disabled.
+ */
+ local_irq_disable();
+ clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
+ local_irq_enable();
+ } else
+ default_idle();
+}
+
+static int poll_idle_wrapper(struct cpuidle_device *dev,
+ struct cpuidle_state *state)
+{
+ poll_idle();
+ return 0;
+}
+
+static int mwait_idle_wrapper(struct cpuidle_device *dev,
+ struct cpuidle_state *state)
+{
+ mwait_idle();
+ return 0;
+}
+
+static int c1e_idle_wrapper(struct cpuidle_device *dev,
+ struct cpuidle_state *state)
+{
+ c1e_idle();
+ return 0;
+}
+
+int default_idle_wrapper(struct cpuidle_device *dev,
+ struct cpuidle_state *state)
+{
+ default_idle();
+ return 0;
+}
+
+static struct cpuidle_state state_poll = {
+ .name = "POLL",
+ .desc = "POLL",
+ .driver_data = (void *) 0x00,
+ .flags = CPUIDLE_FLAG_TIME_VALID,
+ .exit_latency = 1,
+ .target_residency = 1,
+ .enter = &poll_idle_wrapper,
+};
+
+static struct cpuidle_state state_mwait = {
+ .name = "C1",
+ .desc = "MWAIT No Hints",
+ .driver_data = (void *) 0x01,
+ .flags = CPUIDLE_FLAG_TIME_VALID,
+ .exit_latency = 1,
+ .target_residency = 1,
+ .enter = &mwait_idle_wrapper,
+};
+
+static struct cpuidle_state state_c1e = {
+ .name = "C1E",
+ .desc = "C1E",
+ .driver_data = (void *) 0x02,
+ .flags = CPUIDLE_FLAG_TIME_VALID,
+ .exit_latency = 1,
+ .target_residency = 1,
+ .enter = &c1e_idle_wrapper,
+};
+
+struct cpuidle_state state_default_idle = {
+ .name = "DEFAULT-IDLE",
+ .desc = "Default idle routine",
+ .driver_data = (void *) 0x03,
+ .flags = CPUIDLE_FLAG_TIME_VALID,
+ .exit_latency = 1,
+ .target_residency = 1,
+ .enter = &default_idle_wrapper,
+};
+
+void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
+{
+#ifdef CONFIG_SMP
+ if (opt_state == &state_poll && smp_num_siblings > 1) {
+ printk_once(KERN_WARNING "WARNING: polling idle and HT enabled,"
+ " performance may degrade.\n");
+ }
+#endif
+ if (opt_state)
+ return;
+
+ if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
+ /*
+ * One CPU supports mwait => All CPUs supports mwait
+ */
+ printk(KERN_INFO "using mwait in idle threads.\n");
+ opt_state = &state_mwait;
+ } else if (cpu_has_amd_erratum(amd_erratum_400)) {
+ /* E400: APIC timer interrupt does not wake up CPU from C1e */
+ printk(KERN_INFO "using C1E aware idle routine\n");
+ opt_state = &state_c1e;
+ } else
+ opt_state = &state_default_idle;
+}
+
+void __init init_c1e_mask(void)
+{
+ /* If we're using c1e_idle, we need to allocate c1e_mask. */
+ if (opt_state == &state_c1e)
+ zalloc_cpumask_var(&c1e_mask, GFP_KERNEL);
+}
+
+static int __init idle_setup(char *str)
+{
+ if (!str)
+ return -EINVAL;
+
+ if (!strcmp(str, "poll")) {
+ printk("using polling idle threads.\n");
+ opt_state = &state_poll;
+ boot_option_idle_override = IDLE_POLL;
+ } else if (!strcmp(str, "mwait")) {
+ boot_option_idle_override = IDLE_FORCE_MWAIT;
+ } else if (!strcmp(str, "halt")) {
+ /*
+ * When the boot option of idle=halt is added, halt is
+ * forced to be used for CPU idle. In such case CPU C2/C3
+ * won't be used again.
+ * To continue to load the CPU idle driver, don't touch
+ * the boot_option_idle_override.
+ */
+ opt_state = &state_default_idle;
+ boot_option_idle_override = IDLE_HALT;
+ } else if (!strcmp(str, "nomwait")) {
+ /*
+ * If the boot option of "idle=nomwait" is added,
+ * it means that mwait will be disabled for CPU C2/C3
+ * states. In such case it won't touch the variable
+ * of boot_option_idle_override.
+ */
+ boot_option_idle_override = IDLE_NOMWAIT;
+ } else
+ return -1;
+
+ return 0;
+}
+early_param("idle", idle_setup);
+
+static struct cpuidle_driver default_idle_driver = {
+ .name = "default_idle",
+ .owner = THIS_MODULE,
+ .priority = 100,
+};
+
+static int setup_cpuidle(int cpu)
+{
+ struct cpuidle_device *dev = kzalloc(sizeof(struct cpuidle_device),
+ GFP_KERNEL);
+ int count = CPUIDLE_DRIVER_STATE_START;
+ dev->cpu = cpu;
+ dev->drv = &default_idle_driver;
+
+ BUG_ON(opt_state == NULL);
+ dev->states[count] = *opt_state;
+ count++;
+
+ dev->state_count = count;
+
+ if (cpuidle_register_device(dev))
+ return -EIO;
+ return 0;
+}
+
+static int __init default_idle_init(void)
+{
+ int retval, i;
+ retval = cpuidle_register_driver(&default_idle_driver);
+
+ for_each_online_cpu(i) {
+ setup_cpuidle(i);
+ }
+
+ return 0;
+}
+
+
+static void __exit default_idle_exit(void)
+{
+ cpuidle_unregister_driver(&default_idle_driver);
+ return;
+}
+
+device_initcall(default_idle_init);

2011-03-22 12:33:37

by Trinabh Gupta

[permalink] [raw]

Subject: [RFC PATCH V4 4/5] cpuidle: driver for xen

This patch implements a default cpuidle driver for xen.
Earlier pm_idle was flipped to default_idle. Maybe there
is a better way to ensure default_idle is called
without using this cpuidle driver.

Signed-off-by: Trinabh Gupta <[email protected]>
---

arch/x86/xen/setup.c | 42 +++++++++++++++++++++++++++++++++++++++++-
1 files changed, 41 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index a8a66a5..4fce4cd 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -9,6 +9,8 @@
#include <linux/mm.h>
#include <linux/pm.h>
#include <linux/memblock.h>
+#include <linux/cpuidle.h>
+#include <linux/slab.h>

#include <asm/elf.h>
#include <asm/vdso.h>
@@ -321,6 +323,44 @@ void __cpuinit xen_enable_syscall(void)
#endif /* CONFIG_X86_64 */
}

+static struct cpuidle_driver xen_idle_driver = {
+ .name = "xen_idle",
+ .owner = THIS_MODULE,
+ .priority = 1,
+};
+
+extern struct cpuidle_state state_default_state;
+
+static int setup_cpuidle(int cpu)
+{
+ struct cpuidle_device *dev = kzalloc(sizeof(struct cpuidle_device),
+ GFP_KERNEL);
+ int count = CPUIDLE_DRIVER_STATE_START;
+ dev->cpu = cpu;
+ dev->drv = &xen_idle_driver;
+
+ dev->states[count] = state_default_idle;
+ count++;
+
+ dev->state_count = count;
+
+ if (cpuidle_register_device(dev))
+ return -EIO;
+ return 0;
+}
+
+static int xen_idle_init(void)
+{
+ int retval, i;
+ retval = cpuidle_register_driver(&xen_idle_driver);
+
+ for_each_online_cpu(i) {
+ setup_cpuidle(i);
+ }
+
+ return 0;
+}
+
void __init xen_arch_setup(void)
{
xen_panic_handler_init();
@@ -354,7 +394,7 @@ void __init xen_arch_setup(void)
#ifdef CONFIG_X86_32
boot_cpu_data.hlt_works_ok = 1;
#endif
- pm_idle = default_idle;
+ xen_idle_init();
boot_option_idle_override = IDLE_HALT;

fiddle_vdso();

2011-03-22 12:33:49

by Trinabh Gupta

[permalink] [raw]

Subject: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm

Signed-off-by: Trinabh Gupta <[email protected]>
---

arch/x86/kernel/apm_32.c | 75 +++++++++++++++++++++++++++++++++++++---------
1 files changed, 60 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/apm_32.c b/arch/x86/kernel/apm_32.c
index 0e4f24c..22d40bf 100644
--- a/arch/x86/kernel/apm_32.c
+++ b/arch/x86/kernel/apm_32.c
@@ -882,8 +882,6 @@ static void apm_do_busy(void)
#define IDLE_CALC_LIMIT (HZ * 100)
#define IDLE_LEAKY_MAX 16

-static void (*original_pm_idle)(void) __read_mostly;
-
/**
* apm_cpu_idle - cpu idling for APM capable Linux
*
@@ -947,10 +945,7 @@ recalc:
break;
}
}
- if (original_pm_idle)
- original_pm_idle();
- else
- default_idle();
+ default_idle();
local_irq_disable();
jiffies_since_last_check = jiffies - last_jiffies;
if (jiffies_since_last_check > idle_period)
@@ -2256,6 +2251,63 @@ static struct dmi_system_id __initdata apm_dmi_table[] = {
{ }
};

+int apm_idle_wrapper(struct cpuidle_device *dev,
+ struct cpuidle_state *state)
+{
+ apm_cpu_idle();
+ return 0;
+}
+
+struct cpuidle_state state_apm_idle = {
+ .name = "APM-IDLE",
+ .desc = "APM idle routine",
+ .exit_latency = 1,
+ .target_residency = 1,
+ .enter = &apm_idle_wrapper,
+};
+
+static struct cpuidle_driver apm_idle_driver = {
+ .name = "apm_idle",
+ .owner = THIS_MODULE,
+ .priority = 1,
+};
+
+static int apm_setup_cpuidle(int cpu)
+{
+ struct cpuidle_device *dev = kzalloc(sizeof(struct cpuidle_device),
+ GFP_KERNEL);
+ int count = CPUIDLE_DRIVER_STATE_START;
+ dev->cpu = cpu;
+ dev->drv = &apm_idle_driver;
+
+ dev->states[count] = state_apm_idle;
+ count++;
+
+ dev->state_count = count;
+
+ if (cpuidle_register_device(dev))
+ return -EIO;
+ return 0;
+}
+
+static int apm_idle_init(void)
+{
+ int retval, i;
+ retval = cpuidle_register_driver(&apm_idle_driver);
+
+ for_each_online_cpu(i) {
+ apm_setup_cpuidle(i);
+ }
+
+ return 0;
+}
+
+static void apm_idle_exit(void)
+{
+ cpuidle_unregister_driver(&apm_idle_driver);
+ return;
+}
+
/*
* Just start the APM thread. We do NOT want to do APM BIOS
* calls from anything but the APM thread, if for no other reason
@@ -2393,8 +2445,7 @@ static int __init apm_init(void)
if (HZ != 100)
idle_period = (idle_period * HZ) / 100;
if (idle_threshold < 100) {
- original_pm_idle = pm_idle;
- pm_idle = apm_cpu_idle;
+ apm_idle_init();
set_pm_idle = 1;
}

@@ -2406,13 +2457,7 @@ static void __exit apm_exit(void)
int error;

if (set_pm_idle) {
- pm_idle = original_pm_idle;
- /*
- * We are about to unload the current idle thread pm callback
- * (pm_idle), Wait for all processors to update cached/local
- * copies of pm_idle before proceeding.
- */
- cpu_idle_wait();
+ apm_idle_exit();
}
if (((apm_info.bios.flags & APM_BIOS_DISENGAGED) == 0)
&& (apm_info.connection_version > 0x0100)) {

2011-03-22 14:53:08

by Konrad Rzeszutek Wilk

[permalink] [raw]

Subject: Re: [RFC PATCH V4 4/5] cpuidle: driver for xen

On Tue, Mar 22, 2011 at 06:03:28PM +0530, Trinabh Gupta wrote:
> This patch implements a default cpuidle driver for xen.
> Earlier pm_idle was flipped to default_idle. Maybe there
> is a better way to ensure default_idle is called
> without using this cpuidle driver.

Please also CC the Xen devel mailing list (I did this for you)

I couldn't find it in the description, but I was wondering
what is that you are trying to fix? What is the problem description?

Two, you mention in your description that it was tested on x86 systems. did
you test this with Xen? If so, what version.

>
> Signed-off-by: Trinabh Gupta <[email protected]>
> ---
>
> arch/x86/xen/setup.c | 42 +++++++++++++++++++++++++++++++++++++++++-
> 1 files changed, 41 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index a8a66a5..4fce4cd 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -9,6 +9,8 @@
> #include <linux/mm.h>
> #include <linux/pm.h>
> #include <linux/memblock.h>
> +#include <linux/cpuidle.h>
> +#include <linux/slab.h>
>
> #include <asm/elf.h>
> #include <asm/vdso.h>
> @@ -321,6 +323,44 @@ void __cpuinit xen_enable_syscall(void)
> #endif /* CONFIG_X86_64 */
> }
>
> +static struct cpuidle_driver xen_idle_driver = {
> + .name = "xen_idle",
> + .owner = THIS_MODULE,
> + .priority = 1,
> +};
> +
> +extern struct cpuidle_state state_default_state;
> +
> +static int setup_cpuidle(int cpu)
> +{
> + struct cpuidle_device *dev = kzalloc(sizeof(struct cpuidle_device),
> + GFP_KERNEL);

No checking to see if dev == NULL?
> + int count = CPUIDLE_DRIVER_STATE_START;
> + dev->cpu = cpu;
> + dev->drv = &xen_idle_driver;
> +
> + dev->states[count] = state_default_idle;
> + count++;
> +
> + dev->state_count = count;
> +
> + if (cpuidle_register_device(dev))
> + return -EIO;
No cleanup of the 'dev' so that we don't leak memory?

> + return 0;
> +}
> +
> +static int xen_idle_init(void)
> +{
> + int retval, i;
> + retval = cpuidle_register_driver(&xen_idle_driver);
> +
> + for_each_online_cpu(i) {
> + setup_cpuidle(i);
> + }
> +
> + return 0;
> +}
> +
> void __init xen_arch_setup(void)
> {
> xen_panic_handler_init();
> @@ -354,7 +394,7 @@ void __init xen_arch_setup(void)
> #ifdef CONFIG_X86_32
> boot_cpu_data.hlt_works_ok = 1;
> #endif
> - pm_idle = default_idle;
> + xen_idle_init();
> boot_option_idle_override = IDLE_HALT;
>
> fiddle_vdso();
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2011-03-23 01:00:58

by Stephen Rothwell

[permalink] [raw]

Subject: Re: [RFC PATCH V4 1/5] cpuidle: Remove pm_idle pointer for x86

Hi,

Just some simple comments.

Does having this patch first in the series break APM idle?

On Tue, 22 Mar 2011 18:02:27 +0530 Trinabh Gupta <[email protected]> wrote:
>
> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index 8d12878..17b7101 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -74,6 +74,8 @@ static inline void play_dead(void)
> }
> #endif
>
> +extern void cpuidle_idle_call(void);

Put this declaration in a header file and include that header file here
and in the file that defines that function.

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/

Attachments:

(No filename) (696.00 B)
(No filename) (490.00 B)
Download all attachments

2011-03-23 01:15:09

by Stephen Rothwell

[permalink] [raw]

Subject: Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm

Hi,

On Tue, 22 Mar 2011 18:03:40 +0530 Trinabh Gupta <[email protected]> wrote:
>
> +static int apm_setup_cpuidle(int cpu)
> +{
> + struct cpuidle_device *dev = kzalloc(sizeof(struct cpuidle_device),
> + GFP_KERNEL);

Same as xen comments: no NULL check.

> + int count = CPUIDLE_DRIVER_STATE_START;
> + dev->cpu = cpu;
> + dev->drv = &apm_idle_driver;

Also wondering why you would ever have a different idle routine on
different cpus?

> +
> + dev->states[count] = state_apm_idle;
> + count++;
> +
> + dev->state_count = count;
> +
> + if (cpuidle_register_device(dev))
> + return -EIO;

And we leak dev.

> +static void apm_idle_exit(void)
> +{
> + cpuidle_unregister_driver(&apm_idle_driver);

Do we leak the per cpu device structure here?

> + return;

Unnecessary return statement.

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/

Attachments:

(No filename) (907.00 B)
(No filename) (490.00 B)
Download all attachments

2011-03-23 02:59:51

Subject: [RFC PATCH V4 0/5] cpuidle: Cleanup pm_idle and include driver/cpuidle.c in-kernel

Subject: [RFC PATCH V4 1/5] cpuidle: Remove pm_idle pointer for x86

Subject: [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection

Subject: [RFC PATCH V4 3/5] cpuidle: default idle driver for x86

Subject: [RFC PATCH V4 4/5] cpuidle: driver for xen

Subject: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm

Subject: Re: [RFC PATCH V4 4/5] cpuidle: driver for xen

Subject: Re: [RFC PATCH V4 1/5] cpuidle: Remove pm_idle pointer for x86

Attachments:

Subject: Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm

Attachments:

Subject: Re: [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection

Subject: Re: [RFC PATCH V4 3/5] cpuidle: default idle driver for x86

Subject: Re: [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection

Subject: Re: [RFC PATCH V4 3/5] cpuidle: default idle driver for x86

Subject: Re: [RFC PATCH V4 4/5] cpuidle: driver for xen

Subject: Re: [RFC PATCH V4 1/5] cpuidle: Remove pm_idle pointer for x86

Subject: Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm

Subject: Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm

Subject: Re: [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection

Subject: Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm

Subject: Re: [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection

Subject: Re: [RFC PATCH V4 4/5] cpuidle: driver for xen

Subject: Re: [RFC PATCH V4 4/5] cpuidle: driver for xen

Subject: Re: [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection

Subject: Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm

Subject: Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm

Subject: Re: [RFC PATCH V4 3/5] cpuidle: default idle driver for x86

Subject: Re: [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection

Subject: Re: [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection

Subject: Re: [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection

Subject: Re: [RFC PATCH V4 4/5] cpuidle: driver for xen

Subject: Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm

Subject: Re: [Xen-devel] Re: [RFC PATCH V4 4/5] cpuidle: driver for xen

Subject: Re: [Xen-devel] Re: [RFC PATCH V4 4/5] cpuidle: driver for xen

Subject: Re: [Xen-devel] Re: [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection

Subject: Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm

Subject: Re: [Xen-devel] Re: [RFC PATCH V4 4/5] cpuidle: driver for xen

Subject: cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm)

Subject: Re: [Xen-devel] Re: [RFC PATCH V4 2/5] cpuidle: list based cpuidle driver registration and selection

Subject: Re: cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm)

Subject: Re: [Xen-devel] Re: [RFC PATCH V4 4/5] cpuidle: driver for xen

Subject: Re: [Xen-devel] Re: [RFC PATCH V4 4/5] cpuidle: driver for xen

Subject: Re: [Xen-devel] Re: [RFC PATCH V4 4/5] cpuidle: driver for xen

Subject: Re: cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm)

Subject: Re: cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm)

Subject: Re: cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm)

Subject: Re: cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm)

Subject: Re: cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm)

Subject: Re: cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm)

Subject: Re: cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm)

Subject: Re: cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm)

Subject: Re: cpuidle asymmetry (was Re: [RFC PATCH V4 5/5] cpuidle: cpuidle driver for apm)