2009-12-02 09:54:34

by Arun R Bharadwaj

[permalink] [raw]
Subject: [v10 PATCH 0/9] cpuidle: cleanup cpuidle/ introduce cpuidle to POWER

Hi,

This patchset introduces cpuidle infrastructure to POWER, prototyping
for pSeries, and also does a major refactoring of current x86 idle
power management and a cleanup of cpuidle infrastructure.

This patch series has been in discussion for quite a while now and
below are the links to the previous discussions.

Please consider this for inclusion into the -tip tree.

v9 --> http://lkml.org/lkml/2009/10/16/63
v8 --> http://lkml.org/lkml/2009/10/8/82
v7 --> http://lkml.org/lkml/2009/10/6/278
v6 --> http://lkml.org/lkml/2009/9/22/180
v5 --> http://lkml.org/lkml/2009/9/22/26
v4 --> http://lkml.org/lkml/2009/9/1/133
v3 --> http://lkml.org/lkml/2009/8/27/124
v2 --> http://lkml.org/lkml/2009/8/26/233
v1 --> http://lkml.org/lkml/2009/8/19/150


Change in this version:

Pavel noticed that the code which calls the cpuidle's idle
loop was repeated at many places. So this set optimizes it so
that we dont have repetition of code. The rest of the patches
are same as the earlier iteration.


arun


2009-12-02 09:55:52

by Arun R Bharadwaj

[permalink] [raw]
Subject: [v10 PATCH 1/9]: cpuidle: Design documentation patch

* Arun R Bharadwaj <[email protected]> [2009-12-02 15:24:27]:

This patch adds a little information about the redesigned cpuidle
infrastructure in Documentation/cpuidle/core.txt

Signed-off-by: Arun R Bharadwaj <[email protected]>
---
Documentation/cpuidle/core.txt | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)

Index: linux.trees.git/Documentation/cpuidle/core.txt
===================================================================
--- linux.trees.git.orig/Documentation/cpuidle/core.txt
+++ linux.trees.git/Documentation/cpuidle/core.txt
@@ -21,3 +21,38 @@ which can be used to switch governors at
is meant for developer testing only. In normal usage, kernel picks the
best governor based on governor ratings.
SEE ALSO: sysfs.txt in this directory.
+
+Design:
+
+Cpuidle allows for registration of multiple sets of idle routines.
+The latest registered set is used by cpuidle governors as the current
+active set to choose the right idle state. This set is managed as a
+list and each time the newly registered set is added to the head of the
+list and made the current active set.
+
+An example of how this would work on x86 is shown below.
+
+----------------- -----------------
+| | | |
+| choose b/w | mwait is chosen | mwait |
+| mwait, poll, |-------------------------------------> |(current active|
+| default, c1e | register to cpuidle | set) |
+| | with mwait as the idle routine | |
+----------------- -----------------
+
+
+----------------- -----------------
+| | | c1, c2, c3 |
+| ACPI | register to cpuidle | (current) |
+| discovery |-------------------------------------> |---------------|
+| | with c1, c2, c3 | mwait |
+| | as set of idle routines | |
+----------------- -----------------
+
+With this mechanism, a module can register and unregister its set of
+idle routines at run time in a clean manner.
+
+The main idle routine called inside cpu_idle() of every arch is defined in
+driver/cpuidle/cpuidle.c which would in turn call the idle routine selected
+by the governor. If the CONFIG_CPU_IDLE is disabled, the arch needs to
+provide an alternate definition for cpuidle_idle_call().

2009-12-02 09:58:42

by Arun R Bharadwaj

[permalink] [raw]
Subject: [v10 PATCH 2/9]: cpuidle: cleanup drivers/cpuidle/cpuidle.c

* Arun R Bharadwaj <[email protected]> [2009-12-02 15:24:27]:

This patch cleans up drivers/cpuidle/cpuidle.c
Earlier cpuidle assumed pm_idle as the default idle loop. Break that
assumption and make it more generic. cpuidle_idle_call() which is the
main idle loop of cpuidle is to be called by architectures which have
registered to cpuidle.

Signed-off-by: Arun R Bharadwaj <[email protected]>
---
drivers/cpuidle/cpuidle.c | 93 +++++++++++----------------------------------
drivers/cpuidle/cpuidle.h | 6 --
drivers/cpuidle/driver.c | 4 -
drivers/cpuidle/governor.c | 13 ++----
drivers/cpuidle/sysfs.c | 34 +++++++++-------
include/linux/cpuidle.h | 10 +++-
6 files changed, 58 insertions(+), 102 deletions(-)

Index: linux.trees.git/drivers/cpuidle/cpuidle.c
===================================================================
--- linux.trees.git.orig/drivers/cpuidle/cpuidle.c
+++ linux.trees.git/drivers/cpuidle/cpuidle.c
@@ -24,10 +24,6 @@
DEFINE_PER_CPU(struct cpuidle_device *, cpuidle_devices);

DEFINE_MUTEX(cpuidle_lock);
-LIST_HEAD(cpuidle_detected_devices);
-static void (*pm_idle_old)(void);
-
-static int enabled_devices;

#if defined(CONFIG_ARCH_HAS_CPU_IDLE_WAIT)
static void cpuidle_kick_cpus(void)
@@ -47,21 +43,20 @@ static int __cpuidle_register_device(str
*
* NOTE: no locks or semaphores should be used here
*/
-static void cpuidle_idle_call(void)
+void cpuidle_idle_call(void)
{
struct cpuidle_device *dev = __get_cpu_var(cpuidle_devices);
struct cpuidle_state *target_state;
int next_state;
+ ktime_t t1, t2;
+ s64 diff;

/* check if the device is ready */
if (!dev || !dev->enabled) {
- if (pm_idle_old)
- pm_idle_old();
- else
#if defined(CONFIG_ARCH_HAS_DEFAULT_IDLE)
- default_idle();
+ default_idle();
#else
- local_irq_enable();
+ local_irq_enable();
#endif
return;
}
@@ -75,7 +70,11 @@ static void cpuidle_idle_call(void)
hrtimer_peek_ahead_timers();
#endif
/* ask the governor for the next state */
- next_state = cpuidle_curr_governor->select(dev);
+ if (dev->state_count > 1)
+ next_state = cpuidle_curr_governor->select(dev);
+ else
+ next_state = 0;
+
if (need_resched()) {
local_irq_enable();
return;
@@ -85,7 +84,18 @@ static void cpuidle_idle_call(void)

/* enter the state and update stats */
dev->last_state = target_state;
- dev->last_residency = target_state->enter(dev, target_state);
+
+ t1 = ktime_get();
+
+ target_state->enter(dev, target_state);
+
+ t2 = ktime_get();
+ diff = ktime_to_us(ktime_sub(t2, t1));
+ if (diff > INT_MAX)
+ diff = INT_MAX;
+
+ dev->last_residency = (int) diff;
+
if (dev->last_state)
target_state = dev->last_state;

@@ -99,35 +109,12 @@ static void cpuidle_idle_call(void)
}

/**
- * cpuidle_install_idle_handler - installs the cpuidle idle loop handler
- */
-void cpuidle_install_idle_handler(void)
-{
- if (enabled_devices && (pm_idle != cpuidle_idle_call)) {
- /* Make sure all changes finished before we switch to new idle */
- smp_wmb();
- pm_idle = cpuidle_idle_call;
- }
-}
-
-/**
- * cpuidle_uninstall_idle_handler - uninstalls the cpuidle idle loop handler
- */
-void cpuidle_uninstall_idle_handler(void)
-{
- if (enabled_devices && pm_idle_old && (pm_idle != pm_idle_old)) {
- pm_idle = pm_idle_old;
- cpuidle_kick_cpus();
- }
-}
-
-/**
* cpuidle_pause_and_lock - temporarily disables CPUIDLE
*/
void cpuidle_pause_and_lock(void)
{
mutex_lock(&cpuidle_lock);
- cpuidle_uninstall_idle_handler();
+ cpuidle_kick_cpus();
}

EXPORT_SYMBOL_GPL(cpuidle_pause_and_lock);
@@ -137,7 +124,6 @@ EXPORT_SYMBOL_GPL(cpuidle_pause_and_lock
*/
void cpuidle_resume_and_unlock(void)
{
- cpuidle_install_idle_handler();
mutex_unlock(&cpuidle_lock);
}

@@ -185,7 +171,6 @@ int cpuidle_enable_device(struct cpuidle

dev->enabled = 1;

- enabled_devices++;
return 0;

fail_sysfs:
@@ -216,30 +201,16 @@ void cpuidle_disable_device(struct cpuid
cpuidle_curr_governor->disable(dev);

cpuidle_remove_state_sysfs(dev);
- enabled_devices--;
}

EXPORT_SYMBOL_GPL(cpuidle_disable_device);

#ifdef CONFIG_ARCH_HAS_CPU_RELAX
-static int poll_idle(struct cpuidle_device *dev, struct cpuidle_state *st)
+static void poll_idle(struct cpuidle_device *dev, struct cpuidle_state *st)
{
- ktime_t t1, t2;
- s64 diff;
- int ret;
-
- t1 = ktime_get();
local_irq_enable();
while (!need_resched())
cpu_relax();
-
- t2 = ktime_get();
- diff = ktime_to_us(ktime_sub(t2, t1));
- if (diff > INT_MAX)
- diff = INT_MAX;
-
- ret = (int) diff;
- return ret;
}

static void poll_idle_init(struct cpuidle_device *dev)
@@ -269,7 +240,6 @@ static void poll_idle_init(struct cpuidl
*/
static int __cpuidle_register_device(struct cpuidle_device *dev)
{
- int ret;
struct sys_device *sys_dev = get_cpu_sysdev((unsigned long)dev->cpu);

if (!sys_dev)
@@ -277,16 +247,9 @@ static int __cpuidle_register_device(str
if (!try_module_get(cpuidle_curr_driver->owner))
return -EINVAL;

- init_completion(&dev->kobj_unregister);
-
poll_idle_init(dev);

per_cpu(cpuidle_devices, dev->cpu) = dev;
- list_add(&dev->device_list, &cpuidle_detected_devices);
- if ((ret = cpuidle_add_sysfs(sys_dev))) {
- module_put(cpuidle_curr_driver->owner);
- return ret;
- }

dev->registered = 1;
return 0;
@@ -308,7 +271,6 @@ int cpuidle_register_device(struct cpuid
}

cpuidle_enable_device(dev);
- cpuidle_install_idle_handler();

mutex_unlock(&cpuidle_lock);

@@ -324,8 +286,6 @@ EXPORT_SYMBOL_GPL(cpuidle_register_devic
*/
void cpuidle_unregister_device(struct cpuidle_device *dev)
{
- struct sys_device *sys_dev = get_cpu_sysdev((unsigned long)dev->cpu);
-
if (dev->registered == 0)
return;

@@ -333,9 +293,6 @@ void cpuidle_unregister_device(struct cp

cpuidle_disable_device(dev);

- cpuidle_remove_sysfs(sys_dev);
- list_del(&dev->device_list);
- wait_for_completion(&dev->kobj_unregister);
per_cpu(cpuidle_devices, dev->cpu) = NULL;

cpuidle_resume_and_unlock();
@@ -387,8 +344,6 @@ static int __init cpuidle_init(void)
{
int ret;

- pm_idle_old = pm_idle;
-
ret = cpuidle_add_class_sysfs(&cpu_sysdev_class);
if (ret)
return ret;
Index: linux.trees.git/drivers/cpuidle/governor.c
===================================================================
--- linux.trees.git.orig/drivers/cpuidle/governor.c
+++ linux.trees.git/drivers/cpuidle/governor.c
@@ -43,16 +43,14 @@ static struct cpuidle_governor * __cpuid
*/
int cpuidle_switch_governor(struct cpuidle_governor *gov)
{
- struct cpuidle_device *dev;
+ int cpu;

if (gov == cpuidle_curr_governor)
return 0;

- cpuidle_uninstall_idle_handler();
-
if (cpuidle_curr_governor) {
- list_for_each_entry(dev, &cpuidle_detected_devices, device_list)
- cpuidle_disable_device(dev);
+ for_each_online_cpu(cpu)
+ cpuidle_disable_device(per_cpu(cpuidle_devices, cpu));
module_put(cpuidle_curr_governor->owner);
}

@@ -61,9 +59,8 @@ int cpuidle_switch_governor(struct cpuid
if (gov) {
if (!try_module_get(cpuidle_curr_governor->owner))
return -EINVAL;
- list_for_each_entry(dev, &cpuidle_detected_devices, device_list)
- cpuidle_enable_device(dev);
- cpuidle_install_idle_handler();
+ for_each_online_cpu(cpu)
+ cpuidle_enable_device(per_cpu(cpuidle_devices, cpu));
printk(KERN_INFO "cpuidle: using governor %s\n", gov->name);
}

Index: linux.trees.git/include/linux/cpuidle.h
===================================================================
--- linux.trees.git.orig/include/linux/cpuidle.h
+++ linux.trees.git/include/linux/cpuidle.h
@@ -41,7 +41,7 @@ struct cpuidle_state {
unsigned long long usage;
unsigned long long time; /* in US */

- int (*enter) (struct cpuidle_device *dev,
+ void (*enter) (struct cpuidle_device *dev,
struct cpuidle_state *state);
};

@@ -92,7 +92,6 @@ struct cpuidle_device {
struct cpuidle_state_kobj *kobjs[CPUIDLE_STATE_MAX];
struct cpuidle_state *last_state;

- struct list_head device_list;
struct kobject kobj;
struct completion kobj_unregister;
void *governor_data;
@@ -112,6 +111,9 @@ static inline int cpuidle_get_last_resid
return dev->last_residency;
}

+extern struct cpuidle_driver *cpuidle_curr_driver;
+extern void cpuidle_idle_call(void);
+

/****************************
* CPUIDLE DRIVER INTERFACE *
@@ -133,6 +135,8 @@ extern void cpuidle_pause_and_lock(void)
extern void cpuidle_resume_and_unlock(void);
extern int cpuidle_enable_device(struct cpuidle_device *dev);
extern void cpuidle_disable_device(struct cpuidle_device *dev);
+extern int common_idle_loop(struct cpuidle_device *dev,
+ struct cpuidle_state *st, void (*idle)(void));

#else

@@ -148,6 +152,8 @@ static inline void cpuidle_resume_and_un
static inline int cpuidle_enable_device(struct cpuidle_device *dev)
{return 0;}
static inline void cpuidle_disable_device(struct cpuidle_device *dev) { }
+static inline int common_idle_loop(struct cpuidle_device *dev,
+ struct cpuidle_state *st, void (*idle)(void)) { }

#endif

Index: linux.trees.git/drivers/cpuidle/cpuidle.h
===================================================================
--- linux.trees.git.orig/drivers/cpuidle/cpuidle.h
+++ linux.trees.git/drivers/cpuidle/cpuidle.h
@@ -9,9 +9,7 @@

/* For internal use only */
extern struct cpuidle_governor *cpuidle_curr_governor;
-extern struct cpuidle_driver *cpuidle_curr_driver;
extern struct list_head cpuidle_governors;
-extern struct list_head cpuidle_detected_devices;
extern struct mutex cpuidle_lock;
extern spinlock_t cpuidle_driver_lock;

@@ -27,7 +25,7 @@ extern int cpuidle_add_class_sysfs(struc
extern void cpuidle_remove_class_sysfs(struct sysdev_class *cls);
extern int cpuidle_add_state_sysfs(struct cpuidle_device *device);
extern void cpuidle_remove_state_sysfs(struct cpuidle_device *device);
-extern int cpuidle_add_sysfs(struct sys_device *sysdev);
-extern void cpuidle_remove_sysfs(struct sys_device *sysdev);
+extern int cpuidle_add_sysfs(struct cpuidle_device *device);
+extern void cpuidle_remove_sysfs(struct cpuidle_device *device);

#endif /* __DRIVER_CPUIDLE_H */
Index: linux.trees.git/drivers/cpuidle/sysfs.c
===================================================================
--- linux.trees.git.orig/drivers/cpuidle/sysfs.c
+++ linux.trees.git/drivers/cpuidle/sysfs.c
@@ -311,6 +311,13 @@ int cpuidle_add_state_sysfs(struct cpuid
int i, ret = -ENOMEM;
struct cpuidle_state_kobj *kobj;

+ init_completion(&device->kobj_unregister);
+
+ ret = cpuidle_add_sysfs(device);
+ if (ret) {
+ module_put(cpuidle_curr_driver->owner);
+ return ret;
+ }
/* state statistics */
for (i = 0; i < device->state_count; i++) {
kobj = kzalloc(sizeof(struct cpuidle_state_kobj), GFP_KERNEL);
@@ -347,35 +354,32 @@ void cpuidle_remove_state_sysfs(struct c

for (i = 0; i < device->state_count; i++)
cpuidle_free_state_kobj(device, i);
+
+ cpuidle_remove_sysfs(device);
}

/**
* cpuidle_add_sysfs - creates a sysfs instance for the target device
- * @sysdev: the target device
+ * @device: the target device
*/
-int cpuidle_add_sysfs(struct sys_device *sysdev)
+int cpuidle_add_sysfs(struct cpuidle_device *device)
{
- int cpu = sysdev->id;
- struct cpuidle_device *dev;
int error;
+ struct sys_device *sysdev = get_cpu_sysdev((unsigned long)device->cpu);

- dev = per_cpu(cpuidle_devices, cpu);
- error = kobject_init_and_add(&dev->kobj, &ktype_cpuidle, &sysdev->kobj,
- "cpuidle");
+ error = kobject_init_and_add(&device->kobj, &ktype_cpuidle,
+ &sysdev->kobj, "cpuidle");
if (!error)
- kobject_uevent(&dev->kobj, KOBJ_ADD);
+ kobject_uevent(&device->kobj, KOBJ_ADD);
return error;
}

/**
* cpuidle_remove_sysfs - deletes a sysfs instance on the target device
- * @sysdev: the target device
+ * @device: the target device
*/
-void cpuidle_remove_sysfs(struct sys_device *sysdev)
+void cpuidle_remove_sysfs(struct cpuidle_device *device)
{
- int cpu = sysdev->id;
- struct cpuidle_device *dev;
-
- dev = per_cpu(cpuidle_devices, cpu);
- kobject_put(&dev->kobj);
+ kobject_put(&device->kobj);
+ wait_for_completion(&device->kobj_unregister);
}
Index: linux.trees.git/drivers/cpuidle/driver.c
===================================================================
--- linux.trees.git.orig/drivers/cpuidle/driver.c
+++ linux.trees.git/drivers/cpuidle/driver.c
@@ -27,10 +27,6 @@ int cpuidle_register_driver(struct cpuid
return -EINVAL;

spin_lock(&cpuidle_driver_lock);
- if (cpuidle_curr_driver) {
- spin_unlock(&cpuidle_driver_lock);
- return -EBUSY;
- }
cpuidle_curr_driver = drv;
spin_unlock(&cpuidle_driver_lock);

2009-12-02 09:58:05

by Arun R Bharadwaj

[permalink] [raw]
Subject: [v10 PATCH 3/9]: cpuidle: implement a list based approach to register a set of idle routines

* Arun R Bharadwaj <[email protected]> [2009-12-02 15:24:27]:

Implement a list based registering mechanism for architectures which
have multiple sets of idle routines which are to be registered.

Currently, in x86 it is done by merely setting pm_idle = idle_routine
and managing this pm_idle pointer is messy.

To give an example of how this mechanism works:
In x86, initially, idle routine is selected from the set of poll/mwait/
c1e/default idle loops. So the selected idle loop is registered in cpuidle
as one idle state cpuidle devices. Once ACPI comes up, it registers
another set of idle states on top of this state. Again, suppose a module
registers another set of idle loops, it is added to this list.

This provides a clean way of registering and unregistering idle state
routines.

In the current implementation, pm_idle is set as the current idle routine
being used and the old idle routine has to be maintained and when a module
registers/unregisters an idle routine, confusion arises.


Signed-off-by: Arun R Bharadwaj <[email protected]>
---
drivers/cpuidle/cpuidle.c | 54 ++++++++++++++++++++++++++++++++++++++++------
include/linux/cpuidle.h | 1
2 files changed, 48 insertions(+), 7 deletions(-)

Index: linux.trees.git/drivers/cpuidle/cpuidle.c
===================================================================
--- linux.trees.git.orig/drivers/cpuidle/cpuidle.c
+++ linux.trees.git/drivers/cpuidle/cpuidle.c
@@ -22,6 +22,7 @@
#include "cpuidle.h"

DEFINE_PER_CPU(struct cpuidle_device *, cpuidle_devices);
+DEFINE_PER_CPU(struct list_head, cpuidle_devices_list);

DEFINE_MUTEX(cpuidle_lock);

@@ -129,6 +130,45 @@ void cpuidle_resume_and_unlock(void)

EXPORT_SYMBOL_GPL(cpuidle_resume_and_unlock);

+int cpuidle_add_to_list(struct cpuidle_device *dev)
+{
+ int ret, cpu = dev->cpu;
+ struct cpuidle_device *old_dev;
+
+ if (!list_empty(&per_cpu(cpuidle_devices_list, cpu))) {
+ old_dev = list_first_entry(&per_cpu(cpuidle_devices_list, cpu),
+ struct cpuidle_device, idle_list);
+ cpuidle_remove_state_sysfs(old_dev);
+ }
+
+ list_add(&dev->idle_list, &per_cpu(cpuidle_devices_list, cpu));
+ ret = cpuidle_add_state_sysfs(dev);
+ return ret;
+}
+
+void cpuidle_remove_from_list(struct cpuidle_device *dev)
+{
+ struct cpuidle_device *temp_dev;
+ struct list_head *pos;
+ int ret, cpu = dev->cpu;
+
+ list_for_each(pos, &per_cpu(cpuidle_devices_list, cpu)) {
+ temp_dev = container_of(pos, struct cpuidle_device, idle_list);
+ if (dev == temp_dev) {
+ list_del(&temp_dev->idle_list);
+ cpuidle_remove_state_sysfs(temp_dev);
+ break;
+ }
+ }
+
+ if (!list_empty(&per_cpu(cpuidle_devices_list, cpu))) {
+ temp_dev = list_first_entry(&per_cpu(cpuidle_devices_list, cpu),
+ struct cpuidle_device, idle_list);
+ ret = cpuidle_add_state_sysfs(temp_dev);
+ }
+ cpuidle_kick_cpus();
+}
+
/**
* cpuidle_enable_device - enables idle PM for a CPU
* @dev: the CPU
@@ -153,9 +193,6 @@ int cpuidle_enable_device(struct cpuidle
return ret;
}

- if ((ret = cpuidle_add_state_sysfs(dev)))
- return ret;
-
if (cpuidle_curr_governor->enable &&
(ret = cpuidle_curr_governor->enable(dev)))
goto fail_sysfs;
@@ -174,7 +211,7 @@ int cpuidle_enable_device(struct cpuidle
return 0;

fail_sysfs:
- cpuidle_remove_state_sysfs(dev);
+ cpuidle_remove_from_list(dev);

return ret;
}
@@ -199,8 +236,6 @@ void cpuidle_disable_device(struct cpuid

if (cpuidle_curr_governor->disable)
cpuidle_curr_governor->disable(dev);
-
- cpuidle_remove_state_sysfs(dev);
}

EXPORT_SYMBOL_GPL(cpuidle_disable_device);
@@ -271,6 +306,7 @@ int cpuidle_register_device(struct cpuid
}

cpuidle_enable_device(dev);
+ cpuidle_add_to_list(dev);

mutex_unlock(&cpuidle_lock);

@@ -292,6 +328,7 @@ void cpuidle_unregister_device(struct cp
cpuidle_pause_and_lock();

cpuidle_disable_device(dev);
+ cpuidle_remove_from_list(dev);

per_cpu(cpuidle_devices, dev->cpu) = NULL;

@@ -342,12 +379,15 @@ static inline void latency_notifier_init
*/
static int __init cpuidle_init(void)
{
- int ret;
+ int ret, cpu;

ret = cpuidle_add_class_sysfs(&cpu_sysdev_class);
if (ret)
return ret;

+ for_each_possible_cpu(cpu)
+ INIT_LIST_HEAD(&per_cpu(cpuidle_devices_list, cpu));
+
latency_notifier_init(&cpuidle_latency_notifier);

return 0;
Index: linux.trees.git/include/linux/cpuidle.h
===================================================================
--- linux.trees.git.orig/include/linux/cpuidle.h
+++ linux.trees.git/include/linux/cpuidle.h
@@ -92,6 +92,7 @@ struct cpuidle_device {
struct cpuidle_state_kobj *kobjs[CPUIDLE_STATE_MAX];
struct cpuidle_state *last_state;

+ struct list_head idle_list;
struct kobject kobj;
struct completion kobj_unregister;
void *governor_data;

2009-12-02 09:59:44

by Arun R Bharadwaj

[permalink] [raw]
Subject: [v10 PATCH 4/9]: x86: refactor x86 idle power management code, remove all instances of pm_idle

* Arun R Bharadwaj <[email protected]> [2009-12-02 15:24:27]:

This patch cleans up x86 of all instances of pm_idle.

pm_idle which was earlier called from cpu_idle() idle loop
is replaced by cpuidle_idle_call.

x86 also registers to cpuidle when the idle routine is selected,
by populating the cpuidle_device data structure for each cpu.

This is replicated for apm module and for xen, which also used pm_idle.


Signed-off-by: Arun R Bharadwaj <[email protected]>
---
arch/x86/kernel/apm_32.c | 46 +++++++++++++++++++++++-
arch/x86/kernel/process.c | 78 +++++++++++++++++++++++++++++++-----------
arch/x86/kernel/process_32.c | 3 +
arch/x86/kernel/process_64.c | 3 +
arch/x86/xen/setup.c | 30 +++++++++++++++-
drivers/acpi/processor_core.c | 9 +++-
drivers/acpi/processor_idle.c | 44 ++++++++++-------------
7 files changed, 160 insertions(+), 53 deletions(-)

Index: linux.trees.git/arch/x86/kernel/process.c
===================================================================
--- linux.trees.git.orig/arch/x86/kernel/process.c
+++ linux.trees.git/arch/x86/kernel/process.c
@@ -10,6 +10,7 @@
#include <linux/clockchips.h>
#include <linux/random.h>
#include <linux/user-return-notifier.h>
+#include <linux/cpuidle.h>
#include <trace/events/power.h>
#include <linux/hw_breakpoint.h>
#include <asm/system.h>
@@ -241,12 +242,6 @@ int sys_vfork(struct pt_regs *regs)
unsigned long boot_option_idle_override = 0;
EXPORT_SYMBOL(boot_option_idle_override);

-/*
- * Powermanagement idle function, if any..
- */
-void (*pm_idle)(void);
-EXPORT_SYMBOL(pm_idle);
-
#ifdef CONFIG_X86_32
/*
* This halt magic was a workaround for ancient floppy DMA
@@ -326,17 +321,15 @@ static void do_nothing(void *unused)
}

/*
- * cpu_idle_wait - Used to ensure that all the CPUs discard old value of
- * pm_idle and update to new pm_idle value. Required while changing pm_idle
- * handler on SMP systems.
+ * cpu_idle_wait - Required while changing idle routine handler on SMP systems.
*
- * Caller must have changed pm_idle to the new value before the call. Old
- * pm_idle value will not be used by any CPU after the return of this function.
+ * Caller must have changed idle routine to the new value before the call. Old
+ * value will not be used by any CPU after the return of this function.
*/
void cpu_idle_wait(void)
{
smp_mb();
- /* kick all the CPUs so that they exit out of pm_idle */
+ /* kick all the CPUs so that they exit out of idle loop */
smp_call_function(do_nothing, NULL, 1);
}
EXPORT_SYMBOL_GPL(cpu_idle_wait);
@@ -515,15 +508,58 @@ static void c1e_idle(void)
default_idle();
}

+static void (*local_idle)(void);
+
+#ifndef CONFIG_CPU_IDLE
+void cpuidle_idle_call(void)
+{
+ if (local_idle)
+ local_idle();
+ else
+ default_idle();
+}
+#endif
+
+DEFINE_PER_CPU(struct cpuidle_device, idle_devices);
+
+struct cpuidle_driver cpuidle_default_driver = {
+ .name = "cpuidle_default",
+};
+
+static void local_idle_loop(struct cpuidle_device *dev,
+ struct cpuidle_state *st)
+{
+ local_idle();
+}
+
+static int setup_cpuidle_simple(void)
+{
+ struct cpuidle_device *dev;
+ int cpu;
+
+ if (!cpuidle_curr_driver)
+ cpuidle_register_driver(&cpuidle_default_driver);
+
+ for_each_online_cpu(cpu) {
+ dev = &per_cpu(idle_devices, cpu);
+ dev->cpu = cpu;
+ dev->states[0].enter = local_idle_loop;
+ dev->state_count = 1;
+ cpuidle_register_device(dev);
+ }
+ return 0;
+}
+device_initcall(setup_cpuidle_simple);
+
void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
{
#ifdef CONFIG_SMP
- if (pm_idle == poll_idle && smp_num_siblings > 1) {
+ if (local_idle == poll_idle && smp_num_siblings > 1) {
printk(KERN_WARNING "WARNING: polling idle and HT enabled,"
" performance may degrade.\n");
}
#endif
- if (pm_idle)
+ if (local_idle)
return;

if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
@@ -531,18 +567,20 @@ void __cpuinit select_idle_routine(const
* One CPU supports mwait => All CPUs supports mwait
*/
printk(KERN_INFO "using mwait in idle threads.\n");
- pm_idle = mwait_idle;
+ local_idle = mwait_idle;
} else if (check_c1e_idle(c)) {
printk(KERN_INFO "using C1E aware idle routine\n");
- pm_idle = c1e_idle;
+ local_idle = c1e_idle;
} else
- pm_idle = default_idle;
+ local_idle = default_idle;
+
+ return;
}

void __init init_c1e_mask(void)
{
/* If we're using c1e_idle, we need to allocate c1e_mask. */
- if (pm_idle == c1e_idle)
+ if (local_idle == c1e_idle)
zalloc_cpumask_var(&c1e_mask, GFP_KERNEL);
}

@@ -553,7 +591,7 @@ static int __init idle_setup(char *str)

if (!strcmp(str, "poll")) {
printk("using polling idle threads.\n");
- pm_idle = poll_idle;
+ local_idle = poll_idle;
} else if (!strcmp(str, "mwait"))
force_mwait = 1;
else if (!strcmp(str, "halt")) {
@@ -564,7 +602,7 @@ static int __init idle_setup(char *str)
* To continue to load the CPU idle driver, don't touch
* the boot_option_idle_override.
*/
- pm_idle = default_idle;
+ local_idle = default_idle;
idle_halt = 1;
return 0;
} else if (!strcmp(str, "nomwait")) {
Index: linux.trees.git/arch/x86/kernel/process_32.c
===================================================================
--- linux.trees.git.orig/arch/x86/kernel/process_32.c
+++ linux.trees.git/arch/x86/kernel/process_32.c
@@ -40,6 +40,7 @@
#include <linux/uaccess.h>
#include <linux/io.h>
#include <linux/kdebug.h>
+#include <linux/cpuidle.h>

#include <asm/pgtable.h>
#include <asm/system.h>
@@ -112,7 +113,7 @@ void cpu_idle(void)
local_irq_disable();
/* Don't trace irqs off for idle */
stop_critical_timings();
- pm_idle();
+ cpuidle_idle_call();
start_critical_timings();
}
tick_nohz_restart_sched_tick();
Index: linux.trees.git/arch/x86/kernel/process_64.c
===================================================================
--- linux.trees.git.orig/arch/x86/kernel/process_64.c
+++ linux.trees.git/arch/x86/kernel/process_64.c
@@ -39,6 +39,7 @@
#include <linux/io.h>
#include <linux/ftrace.h>
#include <linux/dmi.h>
+#include <linux/cpuidle.h>

#include <asm/pgtable.h>
#include <asm/system.h>
@@ -141,7 +142,7 @@ void cpu_idle(void)
enter_idle();
/* Don't trace irqs off for idle */
stop_critical_timings();
- pm_idle();
+ cpuidle_idle_call();
start_critical_timings();
/* In many cases the interrupt that ended idle
has already called exit_idle. But some idle
Index: linux.trees.git/arch/x86/kernel/apm_32.c
===================================================================
--- linux.trees.git.orig/arch/x86/kernel/apm_32.c
+++ linux.trees.git/arch/x86/kernel/apm_32.c
@@ -227,6 +227,7 @@
#include <linux/suspend.h>
#include <linux/kthread.h>
#include <linux/jiffies.h>
+#include <linux/cpuidle.h>

#include <asm/system.h>
#include <asm/uaccess.h>
@@ -2255,6 +2256,46 @@ static struct dmi_system_id __initdata a
{ }
};

+DEFINE_PER_CPU(struct cpuidle_device, apm_idle_devices);
+
+struct cpuidle_driver cpuidle_apm_driver = {
+ .name = "cpuidle_apm",
+};
+
+static void apm_idle_loop(struct cpuidle_device *dev, struct cpuidle_state *st)
+{
+ apm_cpu_idle();
+}
+
+static void setup_cpuidle_apm(void)
+{
+ struct cpuidle_device *dev;
+ int cpu;
+
+ if (!cpuidle_curr_driver)
+ cpuidle_register_driver(&cpuidle_apm_driver);
+
+ for_each_online_cpu(cpu) {
+ dev = &per_cpu(apm_idle_devices, cpu);
+ dev->cpu = cpu;
+ dev->states[0].enter = apm_idle_loop;
+ dev->state_count = 1;
+ cpuidle_register_device(dev);
+ }
+}
+
+void exit_cpuidle_apm(void)
+{
+ struct cpuidle_device *dev;
+ int cpu;
+
+ for_each_online_cpu(cpu) {
+ dev = &per_cpu(apm_idle_devices, cpu);
+ cpuidle_unregister_device(dev);
+ }
+}
+
+
/*
* Just start the APM thread. We do NOT want to do APM BIOS
* calls from anything but the APM thread, if for no other reason
@@ -2392,8 +2433,7 @@ static int __init apm_init(void)
if (HZ != 100)
idle_period = (idle_period * HZ) / 100;
if (idle_threshold < 100) {
- original_pm_idle = pm_idle;
- pm_idle = apm_cpu_idle;
+ setup_cpuidle_apm();
set_pm_idle = 1;
}

@@ -2405,7 +2445,7 @@ static void __exit apm_exit(void)
int error;

if (set_pm_idle) {
- pm_idle = original_pm_idle;
+ exit_cpuidle_apm();
/*
* We are about to unload the current idle thread pm callback
* (pm_idle), Wait for all processors to update cached/local
Index: linux.trees.git/arch/x86/xen/setup.c
===================================================================
--- linux.trees.git.orig/arch/x86/xen/setup.c
+++ linux.trees.git/arch/x86/xen/setup.c
@@ -8,6 +8,7 @@
#include <linux/sched.h>
#include <linux/mm.h>
#include <linux/pm.h>
+#include <linux/cpuidle.h>

#include <asm/elf.h>
#include <asm/vdso.h>
@@ -151,6 +152,33 @@ void __cpuinit xen_enable_syscall(void)
#endif /* CONFIG_X86_64 */
}

+DEFINE_PER_CPU(struct cpuidle_device, xen_idle_devices);
+struct cpuidle_driver cpuidle_xen_driver = {
+ .name = "cpuidle_xen",
+};
+
+static void xen_idle_loop(struct cpuidle_device *dev, struct cpuidle_state *st)
+{
+ xen_idle();
+}
+
+static void setup_cpuidle_xen(void)
+{
+ struct cpuidle_device *dev;
+ int cpu;
+
+ if (!cpuidle_curr_driver)
+ cpuidle_register_driver(&cpuidle_xen_driver);
+
+ for_each_online_cpu(cpu) {
+ dev = &per_cpu(xen_idle_devices, cpu);
+ dev->cpu = cpu;
+ dev->states[0].enter = xen_idle_loop;
+ dev->state_count = 1;
+ cpuidle_register_device(dev);
+ }
+}
+
void __init xen_arch_setup(void)
{
struct physdev_set_iopl set_iopl;
@@ -186,7 +214,7 @@ void __init xen_arch_setup(void)
MAX_GUEST_CMDLINE > COMMAND_LINE_SIZE ?
COMMAND_LINE_SIZE : MAX_GUEST_CMDLINE);

- pm_idle = xen_idle;
+ setup_cpuidle_xen();

paravirt_disable_iospace();

Index: linux.trees.git/drivers/acpi/processor_core.c
===================================================================
--- linux.trees.git.orig/drivers/acpi/processor_core.c
+++ linux.trees.git/drivers/acpi/processor_core.c
@@ -1150,9 +1150,12 @@ static int __init acpi_processor_init(vo
* should not use mwait for CPU-states.
*/
dmi_check_system(processor_idle_dmi_table);
- result = cpuidle_register_driver(&acpi_idle_driver);
- if (result < 0)
- goto out_proc;
+
+ if (!boot_option_idle_override) {
+ result = cpuidle_register_driver(&acpi_idle_driver);
+ if (result < 0)
+ goto out_proc;
+ }

result = acpi_bus_register_driver(&acpi_processor_driver);
if (result < 0)
Index: linux.trees.git/drivers/acpi/processor_idle.c
===================================================================
--- linux.trees.git.orig/drivers/acpi/processor_idle.c
+++ linux.trees.git/drivers/acpi/processor_idle.c
@@ -821,18 +821,16 @@ static inline void acpi_idle_do_entry(st
*
* This is equivalent to the HALT instruction.
*/
-static int acpi_idle_enter_c1(struct cpuidle_device *dev,
+static void acpi_idle_enter_c1(struct cpuidle_device *dev,
struct cpuidle_state *state)
{
- ktime_t kt1, kt2;
- s64 idle_time;
struct acpi_processor *pr;
struct acpi_processor_cx *cx = cpuidle_get_statedata(state);

pr = __get_cpu_var(processors);

if (unlikely(!pr))
- return 0;
+ return;

local_irq_disable();

@@ -840,20 +838,15 @@ static int acpi_idle_enter_c1(struct cpu
if (acpi_idle_suspend) {
local_irq_enable();
cpu_relax();
- return 0;
+ return;
}

lapic_timer_state_broadcast(pr, cx, 1);
- kt1 = ktime_get_real();
acpi_idle_do_entry(cx);
- kt2 = ktime_get_real();
- idle_time = ktime_to_us(ktime_sub(kt2, kt1));

local_irq_enable();
cx->usage++;
lapic_timer_state_broadcast(pr, cx, 0);
-
- return idle_time;
}

/**
@@ -861,7 +854,7 @@ static int acpi_idle_enter_c1(struct cpu
* @dev: the target CPU
* @state: the state data
*/
-static int acpi_idle_enter_simple(struct cpuidle_device *dev,
+static void acpi_idle_enter_simple(struct cpuidle_device *dev,
struct cpuidle_state *state)
{
struct acpi_processor *pr;
@@ -873,10 +866,12 @@ static int acpi_idle_enter_simple(struct
pr = __get_cpu_var(processors);

if (unlikely(!pr))
- return 0;
+ return;

- if (acpi_idle_suspend)
- return(acpi_idle_enter_c1(dev, state));
+ if (acpi_idle_suspend) {
+ acpi_idle_enter_c1(dev, state);
+ return;
+ }

local_irq_disable();
current_thread_info()->status &= ~TS_POLLING;
@@ -889,7 +884,7 @@ static int acpi_idle_enter_simple(struct
if (unlikely(need_resched())) {
current_thread_info()->status |= TS_POLLING;
local_irq_enable();
- return 0;
+ return;
}

/*
@@ -920,7 +915,6 @@ static int acpi_idle_enter_simple(struct

lapic_timer_state_broadcast(pr, cx, 0);
cx->time += sleep_ticks;
- return idle_time;
}

static int c3_cpu_count;
@@ -933,7 +927,7 @@ static DEFINE_SPINLOCK(c3_lock);
*
* If BM is detected, the deepest non-C3 idle state is entered instead.
*/
-static int acpi_idle_enter_bm(struct cpuidle_device *dev,
+static void acpi_idle_enter_bm(struct cpuidle_device *dev,
struct cpuidle_state *state)
{
struct acpi_processor *pr;
@@ -946,20 +940,23 @@ static int acpi_idle_enter_bm(struct cpu
pr = __get_cpu_var(processors);

if (unlikely(!pr))
- return 0;
+ return;

- if (acpi_idle_suspend)
- return(acpi_idle_enter_c1(dev, state));
+ if (acpi_idle_suspend) {
+ acpi_idle_enter_c1(dev, state);
+ return;
+ }

if (acpi_idle_bm_check()) {
if (dev->safe_state) {
dev->last_state = dev->safe_state;
- return dev->safe_state->enter(dev, dev->safe_state);
+ dev->safe_state->enter(dev, dev->safe_state);
+ return;
} else {
local_irq_disable();
acpi_safe_halt();
local_irq_enable();
- return 0;
+ return;
}
}

@@ -974,7 +971,7 @@ static int acpi_idle_enter_bm(struct cpu
if (unlikely(need_resched())) {
current_thread_info()->status |= TS_POLLING;
local_irq_enable();
- return 0;
+ return;
}

acpi_unlazy_tlb(smp_processor_id());
@@ -1032,7 +1029,6 @@ static int acpi_idle_enter_bm(struct cpu

lapic_timer_state_broadcast(pr, cx, 0);
cx->time += sleep_ticks;
- return idle_time;
}

struct cpuidle_driver acpi_idle_driver = {

2009-12-02 10:00:28

by Arun R Bharadwaj

[permalink] [raw]
Subject: [v10 PATCH 5/9]: POWER: enable cpuidle for POWER.

* Arun R Bharadwaj <[email protected]> [2009-12-02 15:24:27]:

This patch enables the cpuidle option in Kconfig for pSeries.

Currently cpuidle infrastructure is enabled only for x86 and ARM.
This code is almost completely borrowed from x86 to enable
cpuidle for pSeries.

Signed-off-by: Arun R Bharadwaj <[email protected]>
---
arch/powerpc/Kconfig | 9 +++++++++
arch/powerpc/include/asm/system.h | 2 ++
arch/powerpc/kernel/idle.c | 19 +++++++++++++++++++
3 files changed, 30 insertions(+)

Index: linux.trees.git/arch/powerpc/Kconfig
===================================================================
--- linux.trees.git.orig/arch/powerpc/Kconfig
+++ linux.trees.git/arch/powerpc/Kconfig
@@ -91,6 +91,9 @@ config ARCH_HAS_ILOG2_U64
bool
default y if 64BIT

+config ARCH_HAS_CPU_IDLE_WAIT
+ def_bool y
+
config GENERIC_HWEIGHT
bool
default y
@@ -247,6 +250,12 @@ source "kernel/Kconfig.freezer"
source "arch/powerpc/sysdev/Kconfig"
source "arch/powerpc/platforms/Kconfig"

+menu "Power management options"
+
+source "drivers/cpuidle/Kconfig"
+
+endmenu
+
menu "Kernel options"

config HIGHMEM
Index: linux.trees.git/arch/powerpc/include/asm/system.h
===================================================================
--- linux.trees.git.orig/arch/powerpc/include/asm/system.h
+++ linux.trees.git/arch/powerpc/include/asm/system.h
@@ -546,5 +546,7 @@ extern void account_system_vtime(struct

extern struct dentry *powerpc_debugfs_root;

+void cpu_idle_wait(void);
+
#endif /* __KERNEL__ */
#endif /* _ASM_POWERPC_SYSTEM_H */
Index: linux.trees.git/arch/powerpc/kernel/idle.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/kernel/idle.c
+++ linux.trees.git/arch/powerpc/kernel/idle.c
@@ -102,6 +102,25 @@ void cpu_idle(void)
}
}

+static void do_nothing(void *unused)
+{
+}
+
+/*
+ * cpu_idle_wait - Used to ensure that all the CPUs come out of the old
+ * idle loop and start using the new idle loop.
+ * Required while changing idle handler on SMP systems.
+ * Caller must have changed idle handler to the new value before the call.
+ */
+void cpu_idle_wait(void)
+{
+ /* Ensure that new value of idle is set */
+ smp_mb();
+ /* kick all the CPUs so that they exit out of old idle routine */
+ smp_call_function(do_nothing, NULL, 1);
+}
+EXPORT_SYMBOL_GPL(cpu_idle_wait);
+
int powersave_nap;

#ifdef CONFIG_SYSCTL

2009-12-02 10:01:14

by Arun R Bharadwaj

[permalink] [raw]
Subject: [v10 PATCH 6/9]: pSeries/cpuidle: refactor pseries idle loops

* Arun R Bharadwaj <[email protected]> [2009-12-02 15:24:27]:

This patch removes the routines, pseries_shared_idle_sleep and
pseries_dedicated_idle_sleep, since this is implemented as a part
of arch/powerpc/platform/pseries/processor_idle.c

Also, similar to x86, call cpuidle_idle_call from cpu_idle() idle
loop instead of ppc_md.power_save.


Signed-off-by: Arun R Bharadwaj <[email protected]>
---
arch/powerpc/kernel/idle.c | 58 ++++++++++-----------
arch/powerpc/platforms/pseries/setup.c | 89 ---------------------------------
2 files changed, 30 insertions(+), 117 deletions(-)

Index: linux.trees.git/arch/powerpc/platforms/pseries/setup.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/pseries/setup.c
+++ linux.trees.git/arch/powerpc/platforms/pseries/setup.c
@@ -75,9 +75,6 @@ EXPORT_SYMBOL(CMO_PageSize);

int fwnmi_active; /* TRUE if an FWNMI handler is present */

-static void pseries_shared_idle_sleep(void);
-static void pseries_dedicated_idle_sleep(void);
-
static struct device_node *pSeries_mpic_node;

static void pSeries_show_cpuinfo(struct seq_file *m)
@@ -297,18 +294,8 @@ static void __init pSeries_setup_arch(vo
pSeries_nvram_init();

/* Choose an idle loop */
- if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
+ if (firmware_has_feature(FW_FEATURE_SPLPAR))
vpa_init(boot_cpuid);
- if (get_lppaca()->shared_proc) {
- printk(KERN_DEBUG "Using shared processor idle loop\n");
- ppc_md.power_save = pseries_shared_idle_sleep;
- } else {
- printk(KERN_DEBUG "Using dedicated idle loop\n");
- ppc_md.power_save = pseries_dedicated_idle_sleep;
- }
- } else {
- printk(KERN_DEBUG "Using default idle loop\n");
- }

if (firmware_has_feature(FW_FEATURE_LPAR))
ppc_md.enable_pmcs = pseries_lpar_enable_pmcs;
@@ -496,80 +483,6 @@ static int __init pSeries_probe(void)
return 1;
}

-
-DECLARE_PER_CPU(unsigned long, smt_snooze_delay);
-
-static void pseries_dedicated_idle_sleep(void)
-{
- unsigned int cpu = smp_processor_id();
- unsigned long start_snooze;
- unsigned long in_purr, out_purr;
-
- /*
- * Indicate to the HV that we are idle. Now would be
- * a good time to find other work to dispatch.
- */
- get_lppaca()->idle = 1;
- get_lppaca()->donate_dedicated_cpu = 1;
- in_purr = mfspr(SPRN_PURR);
-
- /*
- * We come in with interrupts disabled, and need_resched()
- * has been checked recently. If we should poll for a little
- * while, do so.
- */
- if (__get_cpu_var(smt_snooze_delay)) {
- start_snooze = get_tb() +
- __get_cpu_var(smt_snooze_delay) * tb_ticks_per_usec;
- local_irq_enable();
- set_thread_flag(TIF_POLLING_NRFLAG);
-
- while (get_tb() < start_snooze) {
- if (need_resched() || cpu_is_offline(cpu))
- goto out;
- ppc64_runlatch_off();
- HMT_low();
- HMT_very_low();
- }
-
- HMT_medium();
- clear_thread_flag(TIF_POLLING_NRFLAG);
- smp_mb();
- local_irq_disable();
- if (need_resched() || cpu_is_offline(cpu))
- goto out;
- }
-
- cede_processor();
-
-out:
- HMT_medium();
- out_purr = mfspr(SPRN_PURR);
- get_lppaca()->wait_state_cycles += out_purr - in_purr;
- get_lppaca()->donate_dedicated_cpu = 0;
- get_lppaca()->idle = 0;
-}
-
-static void pseries_shared_idle_sleep(void)
-{
- /*
- * Indicate to the HV that we are idle. Now would be
- * a good time to find other work to dispatch.
- */
- get_lppaca()->idle = 1;
-
- /*
- * Yield the processor to the hypervisor. We return if
- * an external interrupt occurs (which are driven prior
- * to returning here) or if a prod occurs from another
- * processor. When returning here, external interrupts
- * are enabled.
- */
- cede_processor();
-
- get_lppaca()->idle = 0;
-}
-
static int pSeries_pci_probe_mode(struct pci_bus *bus)
{
if (firmware_has_feature(FW_FEATURE_LPAR))
Index: linux.trees.git/arch/powerpc/kernel/idle.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/kernel/idle.c
+++ linux.trees.git/arch/powerpc/kernel/idle.c
@@ -25,6 +25,7 @@
#include <linux/cpu.h>
#include <linux/sysctl.h>
#include <linux/tick.h>
+#include <linux/cpuidle.h>

#include <asm/system.h>
#include <asm/processor.h>
@@ -46,6 +47,14 @@ static int __init powersave_off(char *ar
}
__setup("powersave=off", powersave_off);

+#ifndef CONFIG_CPU_IDLE
+void cpuidle_idle_call(void)
+{
+ local_irq_enable();
+ cpu_relax();
+}
+#endif
+
/*
* The body of the idle task.
*/
@@ -60,35 +69,26 @@ void cpu_idle(void)
while (!need_resched() && !cpu_should_die()) {
ppc64_runlatch_off();

- if (ppc_md.power_save) {
- clear_thread_flag(TIF_POLLING_NRFLAG);
- /*
- * smp_mb is so clearing of TIF_POLLING_NRFLAG
- * is ordered w.r.t. need_resched() test.
- */
- smp_mb();
- local_irq_disable();
-
- /* Don't trace irqs off for idle */
- stop_critical_timings();
-
- /* check again after disabling irqs */
- if (!need_resched() && !cpu_should_die())
- ppc_md.power_save();
-
- start_critical_timings();
-
- local_irq_enable();
- set_thread_flag(TIF_POLLING_NRFLAG);
-
- } else {
- /*
- * Go into low thread priority and possibly
- * low power mode.
- */
- HMT_low();
- HMT_very_low();
- }
+ clear_thread_flag(TIF_POLLING_NRFLAG);
+ /*
+ * smp_mb is so clearing of TIF_POLLING_NRFLAG
+ * is ordered w.r.t. need_resched() test.
+ */
+ smp_mb();
+ local_irq_disable();
+
+ /* Don't trace irqs off for idle */
+ stop_critical_timings();
+
+ /* check again after disabling irqs */
+ if (!need_resched() && !cpu_should_die())
+ cpuidle_idle_call();
+
+ start_critical_timings();
+
+ local_irq_enable();
+ set_thread_flag(TIF_POLLING_NRFLAG);
+
}

HMT_medium();

2009-12-02 10:01:57

by Arun R Bharadwaj

[permalink] [raw]
Subject: [v10 PATCH 7/9]: POWER: add a default_idle idle loop for POWER

* Arun R Bharadwaj <[email protected]> [2009-12-02 15:24:27]:

In arch/powerpc/kernel/idle.c create a default_idle() routine by moving
the failover condition of the cpu_idle() idle loop. This is needed by
cpuidle infrastructure to call default_idle when other idle routines
are not yet registered. Functionality remains the same, but the code is
slightly moved around.


Signed-off-by: Arun R Bharadwaj <[email protected]>
---
arch/powerpc/Kconfig | 3 +++
arch/powerpc/include/asm/system.h | 1 +
arch/powerpc/kernel/idle.c | 6 ++++++
3 files changed, 10 insertions(+)

Index: linux.trees.git/arch/powerpc/Kconfig
===================================================================
--- linux.trees.git.orig/arch/powerpc/Kconfig
+++ linux.trees.git/arch/powerpc/Kconfig
@@ -94,6 +94,9 @@ config ARCH_HAS_ILOG2_U64
config ARCH_HAS_CPU_IDLE_WAIT
def_bool y

+config ARCH_HAS_DEFAULT_IDLE
+ def_bool y
+
config GENERIC_HWEIGHT
bool
default y
Index: linux.trees.git/arch/powerpc/include/asm/system.h
===================================================================
--- linux.trees.git.orig/arch/powerpc/include/asm/system.h
+++ linux.trees.git/arch/powerpc/include/asm/system.h
@@ -218,6 +218,7 @@ extern unsigned long klimit;
extern void *alloc_maybe_bootmem(size_t size, gfp_t mask);
extern void *zalloc_maybe_bootmem(size_t size, gfp_t mask);

+extern void default_idle(void);
extern int powersave_nap; /* set if nap mode can be used in idle loop */

/*
Index: linux.trees.git/arch/powerpc/kernel/idle.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/kernel/idle.c
+++ linux.trees.git/arch/powerpc/kernel/idle.c
@@ -121,6 +121,12 @@ void cpu_idle_wait(void)
}
EXPORT_SYMBOL_GPL(cpu_idle_wait);

+void default_idle(void)
+{
+ HMT_low();
+ HMT_very_low();
+}
+
int powersave_nap;

#ifdef CONFIG_SYSCTL

2009-12-02 10:02:59

by Arun R Bharadwaj

[permalink] [raw]
Subject: [v10 PATCH 8/9]: pSeries: implement pSeries processor idle module

* Arun R Bharadwaj <[email protected]> [2009-12-02 15:24:27]:

This patch creates arch/powerpc/platforms/pseries/processor_idle.c,
which implements the cpuidle infrastructure for pseries.
It implements a pseries_cpuidle_loop() which would be the main idle loop
called from cpu_idle(). It makes decision of entering either
dedicated_snooze_loop or dedicated_cede_loop for dedicated lpar and
shared_cede_loop for shared lpar processor based on the
decision taken by the cpuidle governor.

Signed-off-by: Arun R Bharadwaj <[email protected]>
---
arch/powerpc/include/asm/system.h | 6
arch/powerpc/kernel/sysfs.c | 2
arch/powerpc/platforms/pseries/Makefile | 1
arch/powerpc/platforms/pseries/processor_idle.c | 196 ++++++++++++++++++++++++
arch/powerpc/platforms/pseries/pseries.h | 6
5 files changed, 211 insertions(+)

Index: linux.trees.git/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/pseries/Makefile
+++ linux.trees.git/arch/powerpc/platforms/pseries/Makefile
@@ -26,3 +26,4 @@ obj-$(CONFIG_HCALL_STATS) += hvCall_inst
obj-$(CONFIG_PHYP_DUMP) += phyp_dump.o
obj-$(CONFIG_CMM) += cmm.o
obj-$(CONFIG_DTL) += dtl.o
+obj-$(CONFIG_CPU_IDLE) += processor_idle.o
Index: linux.trees.git/arch/powerpc/platforms/pseries/pseries.h
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/pseries/pseries.h
+++ linux.trees.git/arch/powerpc/platforms/pseries/pseries.h
@@ -10,6 +10,8 @@
#ifndef _PSERIES_PSERIES_H
#define _PSERIES_PSERIES_H

+#include <linux/cpuidle.h>
+
extern void __init fw_feature_init(const char *hypertas, unsigned long len);

struct pt_regs;
@@ -40,4 +42,8 @@ extern unsigned long rtas_poweron_auto;

extern void find_udbg_vterm(void);

+DECLARE_PER_CPU(unsigned long, smt_snooze_delay);
+
+extern struct cpuidle_driver pseries_idle_driver;
+
#endif /* _PSERIES_PSERIES_H */
Index: linux.trees.git/arch/powerpc/platforms/pseries/processor_idle.c
===================================================================
--- /dev/null
+++ linux.trees.git/arch/powerpc/platforms/pseries/processor_idle.c
@@ -0,0 +1,196 @@
+/*
+ * processor_idle - idle state cpuidle driver.
+ * Adapted from drivers/acpi/processor_idle.c
+ *
+ * Arun R Bharadwaj <[email protected]>
+ *
+ * Copyright (C) 2009 IBM Corporation.
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or (at
+ * your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/moduleparam.h>
+#include <linux/cpuidle.h>
+
+#include <asm/paca.h>
+#include <asm/reg.h>
+#include <asm/system.h>
+#include <asm/machdep.h>
+#include <asm/firmware.h>
+
+#include "plpar_wrappers.h"
+#include "pseries.h"
+
+MODULE_AUTHOR("Arun R Bharadwaj");
+MODULE_DESCRIPTION("pSeries Idle State Driver");
+MODULE_LICENSE("GPL");
+
+struct cpuidle_driver pseries_idle_driver = {
+ .name = "pseries_idle",
+ .owner = THIS_MODULE,
+};
+
+DEFINE_PER_CPU(struct cpuidle_device, pseries_dev);
+
+#define IDLE_STATE_COUNT 2
+
+/* pSeries Idle state Flags */
+#define PSERIES_DEDICATED_SNOOZE (0x01)
+#define PSERIES_DEDICATED_CEDE (0x02)
+#define PSERIES_SHARED_CEDE (0x03)
+
+static int pseries_idle_init(struct cpuidle_device *dev)
+{
+ return cpuidle_register_device(dev);
+}
+
+static void shared_cede_loop(void)
+{
+ get_lppaca()->idle = 1;
+ cede_processor();
+ get_lppaca()->idle = 0;
+}
+
+static void dedicated_snooze_loop(void)
+{
+ local_irq_enable();
+ set_thread_flag(TIF_POLLING_NRFLAG);
+ while (!need_resched()) {
+ ppc64_runlatch_off();
+ HMT_low();
+ HMT_very_low();
+ }
+ HMT_medium();
+ clear_thread_flag(TIF_POLLING_NRFLAG);
+ smp_mb();
+ local_irq_disable();
+}
+
+static void dedicated_cede_loop(void)
+{
+ ppc64_runlatch_off();
+ HMT_medium();
+ cede_processor();
+}
+
+static void pseries_cpuidle_loop(struct cpuidle_device *dev,
+ struct cpuidle_state *st)
+{
+ unsigned long in_purr, out_purr;
+
+ get_lppaca()->idle = 1;
+ get_lppaca()->donate_dedicated_cpu = 1;
+ in_purr = mfspr(SPRN_PURR);
+
+ if (st->flags & PSERIES_SHARED_CEDE)
+ shared_cede_loop();
+ else if (st->flags & PSERIES_DEDICATED_SNOOZE)
+ dedicated_snooze_loop();
+ else
+ dedicated_cede_loop();
+
+ out_purr = mfspr(SPRN_PURR);
+ get_lppaca()->wait_state_cycles += out_purr - in_purr;
+ get_lppaca()->donate_dedicated_cpu = 0;
+ get_lppaca()->idle = 0;
+}
+
+static int pseries_setup_cpuidle(struct cpuidle_device *dev, int cpu)
+{
+ int i;
+ struct cpuidle_state *state;
+
+ dev->cpu = cpu;
+
+ if (get_lppaca()->shared_proc) {
+ state = &dev->states[0];
+ snprintf(state->name, CPUIDLE_NAME_LEN, "IDLE");
+ state->enter = pseries_cpuidle_loop;
+ strncpy(state->desc, "shared_cede", CPUIDLE_DESC_LEN);
+ state->flags = PSERIES_SHARED_CEDE;
+ state->exit_latency = 0;
+ state->target_residency = 0;
+ return 0;
+ }
+
+ for (i = 0; i < IDLE_STATE_COUNT; i++) {
+ state = &dev->states[i];
+
+ snprintf(state->name, CPUIDLE_NAME_LEN, "CEDE%d", i);
+ state->enter = pseries_cpuidle_loop;
+
+ switch (i) {
+ case 0:
+ strncpy(state->desc, "snooze", CPUIDLE_DESC_LEN);
+ state->flags = PSERIES_DEDICATED_SNOOZE;
+ state->exit_latency = 0;
+ state->target_residency = 0;
+ break;
+
+ case 1:
+ strncpy(state->desc, "cede", CPUIDLE_DESC_LEN);
+ state->flags = PSERIES_DEDICATED_CEDE;
+ state->exit_latency = 1;
+ state->target_residency =
+ __get_cpu_var(smt_snooze_delay);
+ break;
+ }
+ }
+ dev->state_count = IDLE_STATE_COUNT;
+
+ return 0;
+}
+
+void update_smt_snooze_delay(int snooze)
+{
+ int cpu;
+ for_each_online_cpu(cpu)
+ per_cpu(pseries_dev, cpu).states[0].target_residency = snooze;
+}
+
+static int __init pseries_processor_idle_init(void)
+{
+ int cpu;
+ int result;
+
+ result = cpuidle_register_driver(&pseries_idle_driver);
+
+ if (result < 0)
+ return result;
+
+ printk(KERN_DEBUG "pSeries idle driver registered\n");
+
+ if (!firmware_has_feature(FW_FEATURE_SPLPAR)) {
+ printk(KERN_DEBUG "Using default idle\n");
+ return 0;
+ }
+
+ for_each_online_cpu(cpu) {
+ pseries_setup_cpuidle(&per_cpu(pseries_dev, cpu), cpu);
+ pseries_idle_init(&per_cpu(pseries_dev, cpu));
+ }
+
+ printk(KERN_DEBUG "Using cpuidle idle loop\n");
+
+ return 0;
+}
+
+device_initcall(pseries_processor_idle_init);
Index: linux.trees.git/arch/powerpc/include/asm/system.h
===================================================================
--- linux.trees.git.orig/arch/powerpc/include/asm/system.h
+++ linux.trees.git/arch/powerpc/include/asm/system.h
@@ -549,5 +549,11 @@ extern struct dentry *powerpc_debugfs_ro

void cpu_idle_wait(void);

+#ifdef CONFIG_CPU_IDLE
+extern void update_smt_snooze_delay(int snooze);
+#else
+static inline void update_smt_snooze_delay(int snooze) {}
+#endif
+
#endif /* __KERNEL__ */
#endif /* _ASM_POWERPC_SYSTEM_H */
Index: linux.trees.git/arch/powerpc/kernel/sysfs.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/kernel/sysfs.c
+++ linux.trees.git/arch/powerpc/kernel/sysfs.c
@@ -18,6 +18,7 @@
#include <asm/machdep.h>
#include <asm/smp.h>
#include <asm/pmc.h>
+#include <asm/system.h>

#include "cacheinfo.h"

@@ -51,6 +52,7 @@ static ssize_t store_smt_snooze_delay(st
return -EINVAL;

per_cpu(smt_snooze_delay, cpu->sysdev.id) = snooze;
+ update_smt_snooze_delay(snooze);

return count;
}

2009-12-02 10:03:51

by Arun R Bharadwaj

[permalink] [raw]
Subject: [v10 PATCH 9/9]: POWER: Enable default_idle when power_save=off

* Arun R Bharadwaj <[email protected]> [2009-12-02 15:24:27]:

This patch enables default_idle when power_save=off kernel boot
option is specified.

Earlier, this was done by setting ppc_md.power_save = NULL and hence
HMT_low() and HMT_very_low() was called. Now this is defined under
default_idle() and hence by setting boot_option_idle_override = 1,
the cpuidle registration stuff does not happen and hence default_idle
is chosen in cpuidle_idle_call.

Signed-off-by: Arun R Bharadwaj <[email protected]>
---
arch/powerpc/include/asm/processor.h | 2 ++
arch/powerpc/kernel/idle.c | 4 +++-
arch/powerpc/platforms/pseries/processor_idle.c | 5 +++++
3 files changed, 10 insertions(+), 1 deletion(-)

Index: linux.trees.git/arch/powerpc/include/asm/processor.h
===================================================================
--- linux.trees.git.orig/arch/powerpc/include/asm/processor.h
+++ linux.trees.git/arch/powerpc/include/asm/processor.h
@@ -332,6 +332,8 @@ static inline unsigned long get_clean_sp
}
#endif

+extern int boot_option_idle_override;
+
#endif /* __KERNEL__ */
#endif /* __ASSEMBLY__ */
#endif /* _ASM_POWERPC_PROCESSOR_H */
Index: linux.trees.git/arch/powerpc/kernel/idle.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/kernel/idle.c
+++ linux.trees.git/arch/powerpc/kernel/idle.c
@@ -40,9 +40,11 @@
#define cpu_should_die() 0
#endif

+int boot_option_idle_override = 0;
+
static int __init powersave_off(char *arg)
{
- ppc_md.power_save = NULL;
+ boot_option_idle_override = 1;
return 0;
}
__setup("powersave=off", powersave_off);
Index: linux.trees.git/arch/powerpc/platforms/pseries/processor_idle.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/pseries/processor_idle.c
+++ linux.trees.git/arch/powerpc/platforms/pseries/processor_idle.c
@@ -171,6 +171,11 @@ static int __init pseries_processor_idle
int cpu;
int result;

+ if (boot_option_idle_override) {
+ printk(KERN_DEBUG "Using default idle\n");
+ return 0;
+ }
+
result = cpuidle_register_driver(&pseries_idle_driver);

if (result < 0)

2009-12-02 17:41:43

by Daniel Walker

[permalink] [raw]
Subject: Re: [v10 PATCH 9/9]: POWER: Enable default_idle when power_save=off

On Wed, 2009-12-02 at 15:33 +0530, Arun R Bharadwaj wrote:
> +int boot_option_idle_override = 0;
> +

Doesn't need to be set to zero AFAIK, since we do a mass initialization
to zero during boot up. Did you notice some type of failure when you
didn't initialize that to zero?

(checkpatch output below..)
--

ERROR: do not initialise externals to 0 or NULL
#97: FILE: arch/powerpc/kernel/idle.c:43:
+int boot_option_idle_override = 0;

total: 1 errors, 0 warnings, 0 checks, 31 lines checked

Your patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.


2009-12-03 06:33:06

by Arun R Bharadwaj

[permalink] [raw]
Subject: Re: [v10 PATCH 9/9]: POWER: Enable default_idle when power_save=off

* Arun R Bharadwaj <[email protected]> [2009-12-02 15:33:46]:

Thanks for running checkpatch on the patch Daniel. Will fix this.

arun

2009-12-04 02:48:14

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [v10 PATCH 6/9]: pSeries/cpuidle: refactor pseries idle loops

On Wed, 2009-12-02 at 15:31 +0530, Arun R Bharadwaj wrote:
> * Arun R Bharadwaj <[email protected]> [2009-12-02 15:24:27]:
>
> This patch removes the routines, pseries_shared_idle_sleep and
> pseries_dedicated_idle_sleep, since this is implemented as a part
> of arch/powerpc/platform/pseries/processor_idle.c
>
> Also, similar to x86, call cpuidle_idle_call from cpu_idle() idle
> loop instead of ppc_md.power_save.

Tentative NAK... you remove the code before you put a replacement in, or
did I miss something ?

IE. That patch breaks Idle on all powerpc platform it would think..

Cheers,
Ben.

>
> Signed-off-by: Arun R Bharadwaj <[email protected]>
> ---
> arch/powerpc/kernel/idle.c | 58 ++++++++++-----------
> arch/powerpc/platforms/pseries/setup.c | 89 ---------------------------------
> 2 files changed, 30 insertions(+), 117 deletions(-)
>
> Index: linux.trees.git/arch/powerpc/platforms/pseries/setup.c
> ===================================================================
> --- linux.trees.git.orig/arch/powerpc/platforms/pseries/setup.c
> +++ linux.trees.git/arch/powerpc/platforms/pseries/setup.c
> @@ -75,9 +75,6 @@ EXPORT_SYMBOL(CMO_PageSize);
>
> int fwnmi_active; /* TRUE if an FWNMI handler is present */
>
> -static void pseries_shared_idle_sleep(void);
> -static void pseries_dedicated_idle_sleep(void);
> -
> static struct device_node *pSeries_mpic_node;
>
> static void pSeries_show_cpuinfo(struct seq_file *m)
> @@ -297,18 +294,8 @@ static void __init pSeries_setup_arch(vo
> pSeries_nvram_init();
>
> /* Choose an idle loop */
> - if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
> + if (firmware_has_feature(FW_FEATURE_SPLPAR))
> vpa_init(boot_cpuid);
> - if (get_lppaca()->shared_proc) {
> - printk(KERN_DEBUG "Using shared processor idle loop\n");
> - ppc_md.power_save = pseries_shared_idle_sleep;
> - } else {
> - printk(KERN_DEBUG "Using dedicated idle loop\n");
> - ppc_md.power_save = pseries_dedicated_idle_sleep;
> - }
> - } else {
> - printk(KERN_DEBUG "Using default idle loop\n");
> - }
>
> if (firmware_has_feature(FW_FEATURE_LPAR))
> ppc_md.enable_pmcs = pseries_lpar_enable_pmcs;
> @@ -496,80 +483,6 @@ static int __init pSeries_probe(void)
> return 1;
> }
>
> -
> -DECLARE_PER_CPU(unsigned long, smt_snooze_delay);
> -
> -static void pseries_dedicated_idle_sleep(void)
> -{
> - unsigned int cpu = smp_processor_id();
> - unsigned long start_snooze;
> - unsigned long in_purr, out_purr;
> -
> - /*
> - * Indicate to the HV that we are idle. Now would be
> - * a good time to find other work to dispatch.
> - */
> - get_lppaca()->idle = 1;
> - get_lppaca()->donate_dedicated_cpu = 1;
> - in_purr = mfspr(SPRN_PURR);
> -
> - /*
> - * We come in with interrupts disabled, and need_resched()
> - * has been checked recently. If we should poll for a little
> - * while, do so.
> - */
> - if (__get_cpu_var(smt_snooze_delay)) {
> - start_snooze = get_tb() +
> - __get_cpu_var(smt_snooze_delay) * tb_ticks_per_usec;
> - local_irq_enable();
> - set_thread_flag(TIF_POLLING_NRFLAG);
> -
> - while (get_tb() < start_snooze) {
> - if (need_resched() || cpu_is_offline(cpu))
> - goto out;
> - ppc64_runlatch_off();
> - HMT_low();
> - HMT_very_low();
> - }
> -
> - HMT_medium();
> - clear_thread_flag(TIF_POLLING_NRFLAG);
> - smp_mb();
> - local_irq_disable();
> - if (need_resched() || cpu_is_offline(cpu))
> - goto out;
> - }
> -
> - cede_processor();
> -
> -out:
> - HMT_medium();
> - out_purr = mfspr(SPRN_PURR);
> - get_lppaca()->wait_state_cycles += out_purr - in_purr;
> - get_lppaca()->donate_dedicated_cpu = 0;
> - get_lppaca()->idle = 0;
> -}
> -
> -static void pseries_shared_idle_sleep(void)
> -{
> - /*
> - * Indicate to the HV that we are idle. Now would be
> - * a good time to find other work to dispatch.
> - */
> - get_lppaca()->idle = 1;
> -
> - /*
> - * Yield the processor to the hypervisor. We return if
> - * an external interrupt occurs (which are driven prior
> - * to returning here) or if a prod occurs from another
> - * processor. When returning here, external interrupts
> - * are enabled.
> - */
> - cede_processor();
> -
> - get_lppaca()->idle = 0;
> -}
> -
> static int pSeries_pci_probe_mode(struct pci_bus *bus)
> {
> if (firmware_has_feature(FW_FEATURE_LPAR))
> Index: linux.trees.git/arch/powerpc/kernel/idle.c
> ===================================================================
> --- linux.trees.git.orig/arch/powerpc/kernel/idle.c
> +++ linux.trees.git/arch/powerpc/kernel/idle.c
> @@ -25,6 +25,7 @@
> #include <linux/cpu.h>
> #include <linux/sysctl.h>
> #include <linux/tick.h>
> +#include <linux/cpuidle.h>
>
> #include <asm/system.h>
> #include <asm/processor.h>
> @@ -46,6 +47,14 @@ static int __init powersave_off(char *ar
> }
> __setup("powersave=off", powersave_off);
>
> +#ifndef CONFIG_CPU_IDLE
> +void cpuidle_idle_call(void)
> +{
> + local_irq_enable();
> + cpu_relax();
> +}
> +#endif
> +
> /*
> * The body of the idle task.
> */
> @@ -60,35 +69,26 @@ void cpu_idle(void)
> while (!need_resched() && !cpu_should_die()) {
> ppc64_runlatch_off();
>
> - if (ppc_md.power_save) {
> - clear_thread_flag(TIF_POLLING_NRFLAG);
> - /*
> - * smp_mb is so clearing of TIF_POLLING_NRFLAG
> - * is ordered w.r.t. need_resched() test.
> - */
> - smp_mb();
> - local_irq_disable();
> -
> - /* Don't trace irqs off for idle */
> - stop_critical_timings();
> -
> - /* check again after disabling irqs */
> - if (!need_resched() && !cpu_should_die())
> - ppc_md.power_save();
> -
> - start_critical_timings();
> -
> - local_irq_enable();
> - set_thread_flag(TIF_POLLING_NRFLAG);
> -
> - } else {
> - /*
> - * Go into low thread priority and possibly
> - * low power mode.
> - */
> - HMT_low();
> - HMT_very_low();
> - }
> + clear_thread_flag(TIF_POLLING_NRFLAG);
> + /*
> + * smp_mb is so clearing of TIF_POLLING_NRFLAG
> + * is ordered w.r.t. need_resched() test.
> + */
> + smp_mb();
> + local_irq_disable();
> +
> + /* Don't trace irqs off for idle */
> + stop_critical_timings();
> +
> + /* check again after disabling irqs */
> + if (!need_resched() && !cpu_should_die())
> + cpuidle_idle_call();
> +
> + start_critical_timings();
> +
> + local_irq_enable();
> + set_thread_flag(TIF_POLLING_NRFLAG);
> +
> }
>
> HMT_medium();

2009-12-04 02:49:05

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [v10 PATCH 8/9]: pSeries: implement pSeries processor idle module

On Wed, 2009-12-02 at 15:32 +0530, Arun R Bharadwaj wrote:
> * Arun R Bharadwaj <[email protected]> [2009-12-02 15:24:27]:
>
> This patch creates arch/powerpc/platforms/pseries/processor_idle.c,
> which implements the cpuidle infrastructure for pseries.
> It implements a pseries_cpuidle_loop() which would be the main idle loop
> called from cpu_idle(). It makes decision of entering either
> dedicated_snooze_loop or dedicated_cede_loop for dedicated lpar and
> shared_cede_loop for shared lpar processor based on the
> decision taken by the cpuidle governor.

So unless I'm mistaken, you removed our powerpc "generic" idle loop that
calls into ppc_md.power_save(), and replaced it by some pseries specific
idle loops... Now what about all the other powerpc platforms ? native
970 (aka G5) ? 6xx ? Cell ? Or are you still calling ppc_md.power_save
somewhere that I missed ?

Cheers,
Ben.


> Signed-off-by: Arun R Bharadwaj <[email protected]>
> ---
> arch/powerpc/include/asm/system.h | 6
> arch/powerpc/kernel/sysfs.c | 2
> arch/powerpc/platforms/pseries/Makefile | 1
> arch/powerpc/platforms/pseries/processor_idle.c | 196 ++++++++++++++++++++++++
> arch/powerpc/platforms/pseries/pseries.h | 6
> 5 files changed, 211 insertions(+)
>
> Index: linux.trees.git/arch/powerpc/platforms/pseries/Makefile
> ===================================================================
> --- linux.trees.git.orig/arch/powerpc/platforms/pseries/Makefile
> +++ linux.trees.git/arch/powerpc/platforms/pseries/Makefile
> @@ -26,3 +26,4 @@ obj-$(CONFIG_HCALL_STATS) += hvCall_inst
> obj-$(CONFIG_PHYP_DUMP) += phyp_dump.o
> obj-$(CONFIG_CMM) += cmm.o
> obj-$(CONFIG_DTL) += dtl.o
> +obj-$(CONFIG_CPU_IDLE) += processor_idle.o
> Index: linux.trees.git/arch/powerpc/platforms/pseries/pseries.h
> ===================================================================
> --- linux.trees.git.orig/arch/powerpc/platforms/pseries/pseries.h
> +++ linux.trees.git/arch/powerpc/platforms/pseries/pseries.h
> @@ -10,6 +10,8 @@
> #ifndef _PSERIES_PSERIES_H
> #define _PSERIES_PSERIES_H
>
> +#include <linux/cpuidle.h>
> +
> extern void __init fw_feature_init(const char *hypertas, unsigned long len);
>
> struct pt_regs;
> @@ -40,4 +42,8 @@ extern unsigned long rtas_poweron_auto;
>
> extern void find_udbg_vterm(void);
>
> +DECLARE_PER_CPU(unsigned long, smt_snooze_delay);
> +
> +extern struct cpuidle_driver pseries_idle_driver;
> +
> #endif /* _PSERIES_PSERIES_H */
> Index: linux.trees.git/arch/powerpc/platforms/pseries/processor_idle.c
> ===================================================================
> --- /dev/null
> +++ linux.trees.git/arch/powerpc/platforms/pseries/processor_idle.c
> @@ -0,0 +1,196 @@
> +/*
> + * processor_idle - idle state cpuidle driver.
> + * Adapted from drivers/acpi/processor_idle.c
> + *
> + * Arun R Bharadwaj <[email protected]>
> + *
> + * Copyright (C) 2009 IBM Corporation.
> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or (at
> + * your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, write to the Free Software Foundation, Inc.,
> + * 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
> + *
> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/moduleparam.h>
> +#include <linux/cpuidle.h>
> +
> +#include <asm/paca.h>
> +#include <asm/reg.h>
> +#include <asm/system.h>
> +#include <asm/machdep.h>
> +#include <asm/firmware.h>
> +
> +#include "plpar_wrappers.h"
> +#include "pseries.h"
> +
> +MODULE_AUTHOR("Arun R Bharadwaj");
> +MODULE_DESCRIPTION("pSeries Idle State Driver");
> +MODULE_LICENSE("GPL");
> +
> +struct cpuidle_driver pseries_idle_driver = {
> + .name = "pseries_idle",
> + .owner = THIS_MODULE,
> +};
> +
> +DEFINE_PER_CPU(struct cpuidle_device, pseries_dev);
> +
> +#define IDLE_STATE_COUNT 2
> +
> +/* pSeries Idle state Flags */
> +#define PSERIES_DEDICATED_SNOOZE (0x01)
> +#define PSERIES_DEDICATED_CEDE (0x02)
> +#define PSERIES_SHARED_CEDE (0x03)
> +
> +static int pseries_idle_init(struct cpuidle_device *dev)
> +{
> + return cpuidle_register_device(dev);
> +}
> +
> +static void shared_cede_loop(void)
> +{
> + get_lppaca()->idle = 1;
> + cede_processor();
> + get_lppaca()->idle = 0;
> +}
> +
> +static void dedicated_snooze_loop(void)
> +{
> + local_irq_enable();
> + set_thread_flag(TIF_POLLING_NRFLAG);
> + while (!need_resched()) {
> + ppc64_runlatch_off();
> + HMT_low();
> + HMT_very_low();
> + }
> + HMT_medium();
> + clear_thread_flag(TIF_POLLING_NRFLAG);
> + smp_mb();
> + local_irq_disable();
> +}
> +
> +static void dedicated_cede_loop(void)
> +{
> + ppc64_runlatch_off();
> + HMT_medium();
> + cede_processor();
> +}
> +
> +static void pseries_cpuidle_loop(struct cpuidle_device *dev,
> + struct cpuidle_state *st)
> +{
> + unsigned long in_purr, out_purr;
> +
> + get_lppaca()->idle = 1;
> + get_lppaca()->donate_dedicated_cpu = 1;
> + in_purr = mfspr(SPRN_PURR);
> +
> + if (st->flags & PSERIES_SHARED_CEDE)
> + shared_cede_loop();
> + else if (st->flags & PSERIES_DEDICATED_SNOOZE)
> + dedicated_snooze_loop();
> + else
> + dedicated_cede_loop();
> +
> + out_purr = mfspr(SPRN_PURR);
> + get_lppaca()->wait_state_cycles += out_purr - in_purr;
> + get_lppaca()->donate_dedicated_cpu = 0;
> + get_lppaca()->idle = 0;
> +}
> +
> +static int pseries_setup_cpuidle(struct cpuidle_device *dev, int cpu)
> +{
> + int i;
> + struct cpuidle_state *state;
> +
> + dev->cpu = cpu;
> +
> + if (get_lppaca()->shared_proc) {
> + state = &dev->states[0];
> + snprintf(state->name, CPUIDLE_NAME_LEN, "IDLE");
> + state->enter = pseries_cpuidle_loop;
> + strncpy(state->desc, "shared_cede", CPUIDLE_DESC_LEN);
> + state->flags = PSERIES_SHARED_CEDE;
> + state->exit_latency = 0;
> + state->target_residency = 0;
> + return 0;
> + }
> +
> + for (i = 0; i < IDLE_STATE_COUNT; i++) {
> + state = &dev->states[i];
> +
> + snprintf(state->name, CPUIDLE_NAME_LEN, "CEDE%d", i);
> + state->enter = pseries_cpuidle_loop;
> +
> + switch (i) {
> + case 0:
> + strncpy(state->desc, "snooze", CPUIDLE_DESC_LEN);
> + state->flags = PSERIES_DEDICATED_SNOOZE;
> + state->exit_latency = 0;
> + state->target_residency = 0;
> + break;
> +
> + case 1:
> + strncpy(state->desc, "cede", CPUIDLE_DESC_LEN);
> + state->flags = PSERIES_DEDICATED_CEDE;
> + state->exit_latency = 1;
> + state->target_residency =
> + __get_cpu_var(smt_snooze_delay);
> + break;
> + }
> + }
> + dev->state_count = IDLE_STATE_COUNT;
> +
> + return 0;
> +}
> +
> +void update_smt_snooze_delay(int snooze)
> +{
> + int cpu;
> + for_each_online_cpu(cpu)
> + per_cpu(pseries_dev, cpu).states[0].target_residency = snooze;
> +}
> +
> +static int __init pseries_processor_idle_init(void)
> +{
> + int cpu;
> + int result;
> +
> + result = cpuidle_register_driver(&pseries_idle_driver);
> +
> + if (result < 0)
> + return result;
> +
> + printk(KERN_DEBUG "pSeries idle driver registered\n");
> +
> + if (!firmware_has_feature(FW_FEATURE_SPLPAR)) {
> + printk(KERN_DEBUG "Using default idle\n");
> + return 0;
> + }
> +
> + for_each_online_cpu(cpu) {
> + pseries_setup_cpuidle(&per_cpu(pseries_dev, cpu), cpu);
> + pseries_idle_init(&per_cpu(pseries_dev, cpu));
> + }
> +
> + printk(KERN_DEBUG "Using cpuidle idle loop\n");
> +
> + return 0;
> +}
> +
> +device_initcall(pseries_processor_idle_init);
> Index: linux.trees.git/arch/powerpc/include/asm/system.h
> ===================================================================
> --- linux.trees.git.orig/arch/powerpc/include/asm/system.h
> +++ linux.trees.git/arch/powerpc/include/asm/system.h
> @@ -549,5 +549,11 @@ extern struct dentry *powerpc_debugfs_ro
>
> void cpu_idle_wait(void);
>
> +#ifdef CONFIG_CPU_IDLE
> +extern void update_smt_snooze_delay(int snooze);
> +#else
> +static inline void update_smt_snooze_delay(int snooze) {}
> +#endif
> +
> #endif /* __KERNEL__ */
> #endif /* _ASM_POWERPC_SYSTEM_H */
> Index: linux.trees.git/arch/powerpc/kernel/sysfs.c
> ===================================================================
> --- linux.trees.git.orig/arch/powerpc/kernel/sysfs.c
> +++ linux.trees.git/arch/powerpc/kernel/sysfs.c
> @@ -18,6 +18,7 @@
> #include <asm/machdep.h>
> #include <asm/smp.h>
> #include <asm/pmc.h>
> +#include <asm/system.h>
>
> #include "cacheinfo.h"
>
> @@ -51,6 +52,7 @@ static ssize_t store_smt_snooze_delay(st
> return -EINVAL;
>
> per_cpu(smt_snooze_delay, cpu->sysdev.id) = snooze;
> + update_smt_snooze_delay(snooze);
>
> return count;
> }

2009-12-04 08:15:42

by Arun R Bharadwaj

[permalink] [raw]
Subject: Re: [v10 PATCH 8/9]: pSeries: implement pSeries processor idle module

* Benjamin Herrenschmidt <[email protected]> [2009-12-04 13:47:38]:

> On Wed, 2009-12-02 at 15:32 +0530, Arun R Bharadwaj wrote:
> > * Arun R Bharadwaj <[email protected]> [2009-12-02 15:24:27]:
> >
> > This patch creates arch/powerpc/platforms/pseries/processor_idle.c,
> > which implements the cpuidle infrastructure for pseries.
> > It implements a pseries_cpuidle_loop() which would be the main idle loop
> > called from cpu_idle(). It makes decision of entering either
> > dedicated_snooze_loop or dedicated_cede_loop for dedicated lpar and
> > shared_cede_loop for shared lpar processor based on the
> > decision taken by the cpuidle governor.
>
> So unless I'm mistaken, you removed our powerpc "generic" idle loop that
> calls into ppc_md.power_save(), and replaced it by some pseries specific
> idle loops... Now what about all the other powerpc platforms ? native
> 970 (aka G5) ? 6xx ? Cell ? Or are you still calling ppc_md.power_save
> somewhere that I missed ?
>
> Cheers,
> Ben.

Hi Ben,

I forgot to attach the patch which enables cpuidle for the rest of the
POWER platforms. Attaching it below.

So for these platforms, ppc_md.power_save will be called from from the
cpuidle_idle_call idle loop itself. Also, this cpuidle_idle_call is
not a pseries specific idle loop. It is a common loop for Intel and
PPC which use cpuidle infrastructure.

arun



This patch enables cpuidle for the rest of the POWER platforms like
44x, Cell, Pasemi etc.

Signed-off-by: Arun R Bharadwaj <[email protected]>
---
arch/powerpc/include/asm/system.h | 2 ++
arch/powerpc/kernel/idle.c | 28 ++++++++++++++++++++++++++++
arch/powerpc/kernel/setup_32.c | 8 ++++++--
arch/powerpc/platforms/44x/idle.c | 2 ++
arch/powerpc/platforms/cell/pervasive.c | 2 ++
arch/powerpc/platforms/pasemi/idle.c | 2 ++
arch/powerpc/platforms/ps3/setup.c | 2 ++
7 files changed, 44 insertions(+), 2 deletions(-)

Index: linux.trees.git/arch/powerpc/include/asm/system.h
===================================================================
--- linux.trees.git.orig/arch/powerpc/include/asm/system.h
+++ linux.trees.git/arch/powerpc/include/asm/system.h
@@ -551,8 +551,10 @@ void cpu_idle_wait(void);

#ifdef CONFIG_CPU_IDLE
extern void update_smt_snooze_delay(int snooze);
+extern void setup_cpuidle_ppc(void);
#else
static inline void update_smt_snooze_delay(int snooze) {}
+static inline void setup_cpuidle_ppc(void) {}
#endif

#endif /* __KERNEL__ */
Index: linux.trees.git/arch/powerpc/kernel/idle.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/kernel/idle.c
+++ linux.trees.git/arch/powerpc/kernel/idle.c
@@ -129,6 +129,34 @@ void default_idle(void)
HMT_very_low();
}

+#ifdef CONFIG_CPU_IDLE
+DEFINE_PER_CPU(struct cpuidle_device, ppc_idle_devices);
+struct cpuidle_driver cpuidle_ppc_driver = {
+ .name = "cpuidle_ppc",
+};
+
+static void ppc_idle_loop(struct cpuidle_device *dev, struct cpuidle_state *st)
+{
+ ppc_md.power_save();
+}
+
+void setup_cpuidle_ppc(void)
+{
+ struct cpuidle_device *dev;
+ int cpu;
+
+ cpuidle_register_driver(&cpuidle_ppc_driver);
+
+ for_each_online_cpu(cpu) {
+ dev = &per_cpu(ppc_idle_devices, cpu);
+ dev->cpu = cpu;
+ dev->states[0].enter = ppc_idle_loop;
+ dev->state_count = 1;
+ cpuidle_register_device(dev);
+ }
+}
+#endif
+
int powersave_nap;

#ifdef CONFIG_SYSCTL
Index: linux.trees.git/arch/powerpc/kernel/setup_32.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/kernel/setup_32.c
+++ linux.trees.git/arch/powerpc/kernel/setup_32.c
@@ -133,14 +133,18 @@ notrace void __init machine_init(unsigne

#ifdef CONFIG_6xx
if (cpu_has_feature(CPU_FTR_CAN_DOZE) ||
- cpu_has_feature(CPU_FTR_CAN_NAP))
+ cpu_has_feature(CPU_FTR_CAN_NAP)) {
ppc_md.power_save = ppc6xx_idle;
+ setup_cpuidle_ppc();
+ }
#endif

#ifdef CONFIG_E500
if (cpu_has_feature(CPU_FTR_CAN_DOZE) ||
- cpu_has_feature(CPU_FTR_CAN_NAP))
+ cpu_has_feature(CPU_FTR_CAN_NAP)) {
ppc_md.power_save = e500_idle;
+ setup_cpuidle_ppc();
+ }
#endif
if (ppc_md.progress)
ppc_md.progress("id mach(): done", 0x200);
Index: linux.trees.git/arch/powerpc/platforms/44x/idle.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/44x/idle.c
+++ linux.trees.git/arch/powerpc/platforms/44x/idle.c
@@ -24,6 +24,7 @@
#include <linux/of.h>
#include <linux/kernel.h>
#include <asm/machdep.h>
+#include <asm/system.h>

static int mode_spin;

@@ -46,6 +47,7 @@ int __init ppc44x_idle_init(void)
/* If we are not setting spin mode
then we set to wait mode */
ppc_md.power_save = &ppc44x_idle;
+ setup_cpuidle_ppc();
}

return 0;
Index: linux.trees.git/arch/powerpc/platforms/cell/pervasive.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/cell/pervasive.c
+++ linux.trees.git/arch/powerpc/platforms/cell/pervasive.c
@@ -35,6 +35,7 @@
#include <asm/pgtable.h>
#include <asm/reg.h>
#include <asm/cell-regs.h>
+#include <asm/system.h>

#include "pervasive.h"

@@ -128,5 +129,6 @@ void __init cbe_pervasive_init(void)
}

ppc_md.power_save = cbe_power_save;
+ setup_cpuidle_ppc();
ppc_md.system_reset_exception = cbe_system_reset_exception;
}
Index: linux.trees.git/arch/powerpc/platforms/pasemi/idle.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/pasemi/idle.c
+++ linux.trees.git/arch/powerpc/platforms/pasemi/idle.c
@@ -27,6 +27,7 @@
#include <asm/machdep.h>
#include <asm/reg.h>
#include <asm/smp.h>
+#include <asm/system.h>

#include "pasemi.h"

@@ -81,6 +82,7 @@ static int __init pasemi_idle_init(void)

ppc_md.system_reset_exception = pasemi_system_reset_exception;
ppc_md.power_save = modes[current_mode].entry;
+ setup_cpuidle_ppc();
printk(KERN_INFO "Using PA6T idle loop (%s)\n", modes[current_mode].name);

return 0;
Index: linux.trees.git/arch/powerpc/platforms/ps3/setup.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/ps3/setup.c
+++ linux.trees.git/arch/powerpc/platforms/ps3/setup.c
@@ -33,6 +33,7 @@
#include <asm/prom.h>
#include <asm/lv1call.h>
#include <asm/ps3gpu.h>
+#include <asm/system.h>

#include "platform.h"

@@ -214,6 +215,7 @@ static void __init ps3_setup_arch(void)
prealloc_ps3flash_bounce_buffer();

ppc_md.power_save = ps3_power_save;
+ setup_cpuidle_ppc();
ps3_os_area_init();

DBG(" <- %s:%d\n", __func__, __LINE__);

2009-12-04 10:01:28

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [v10 PATCH 8/9]: pSeries: implement pSeries processor idle module

On Fri, 2009-12-04 at 13:45 +0530, Arun R Bharadwaj wrote:

>
> Hi Ben,
>
> I forgot to attach the patch which enables cpuidle for the rest of the
> POWER platforms. Attaching it below.
>
> So for these platforms, ppc_md.power_save will be called from from the
> cpuidle_idle_call idle loop itself. Also, this cpuidle_idle_call is
> not a pseries specific idle loop. It is a common loop for Intel and
> PPC which use cpuidle infrastructure.

Ok, so there was a missing piece in the puzzle ;-)

I'll review asap.

Cheers,
Ben.

> arun
>
>
>
> This patch enables cpuidle for the rest of the POWER platforms like
> 44x, Cell, Pasemi etc.
>
> Signed-off-by: Arun R Bharadwaj <[email protected]>
> ---
> arch/powerpc/include/asm/system.h | 2 ++
> arch/powerpc/kernel/idle.c | 28 ++++++++++++++++++++++++++++
> arch/powerpc/kernel/setup_32.c | 8 ++++++--
> arch/powerpc/platforms/44x/idle.c | 2 ++
> arch/powerpc/platforms/cell/pervasive.c | 2 ++
> arch/powerpc/platforms/pasemi/idle.c | 2 ++
> arch/powerpc/platforms/ps3/setup.c | 2 ++
> 7 files changed, 44 insertions(+), 2 deletions(-)
>
> Index: linux.trees.git/arch/powerpc/include/asm/system.h
> ===================================================================
> --- linux.trees.git.orig/arch/powerpc/include/asm/system.h
> +++ linux.trees.git/arch/powerpc/include/asm/system.h
> @@ -551,8 +551,10 @@ void cpu_idle_wait(void);
>
> #ifdef CONFIG_CPU_IDLE
> extern void update_smt_snooze_delay(int snooze);
> +extern void setup_cpuidle_ppc(void);
> #else
> static inline void update_smt_snooze_delay(int snooze) {}
> +static inline void setup_cpuidle_ppc(void) {}
> #endif
>
> #endif /* __KERNEL__ */
> Index: linux.trees.git/arch/powerpc/kernel/idle.c
> ===================================================================
> --- linux.trees.git.orig/arch/powerpc/kernel/idle.c
> +++ linux.trees.git/arch/powerpc/kernel/idle.c
> @@ -129,6 +129,34 @@ void default_idle(void)
> HMT_very_low();
> }
>
> +#ifdef CONFIG_CPU_IDLE
> +DEFINE_PER_CPU(struct cpuidle_device, ppc_idle_devices);
> +struct cpuidle_driver cpuidle_ppc_driver = {
> + .name = "cpuidle_ppc",
> +};
> +
> +static void ppc_idle_loop(struct cpuidle_device *dev, struct cpuidle_state *st)
> +{
> + ppc_md.power_save();
> +}
> +
> +void setup_cpuidle_ppc(void)
> +{
> + struct cpuidle_device *dev;
> + int cpu;
> +
> + cpuidle_register_driver(&cpuidle_ppc_driver);
> +
> + for_each_online_cpu(cpu) {
> + dev = &per_cpu(ppc_idle_devices, cpu);
> + dev->cpu = cpu;
> + dev->states[0].enter = ppc_idle_loop;
> + dev->state_count = 1;
> + cpuidle_register_device(dev);
> + }
> +}
> +#endif
> +
> int powersave_nap;
>
> #ifdef CONFIG_SYSCTL
> Index: linux.trees.git/arch/powerpc/kernel/setup_32.c
> ===================================================================
> --- linux.trees.git.orig/arch/powerpc/kernel/setup_32.c
> +++ linux.trees.git/arch/powerpc/kernel/setup_32.c
> @@ -133,14 +133,18 @@ notrace void __init machine_init(unsigne
>
> #ifdef CONFIG_6xx
> if (cpu_has_feature(CPU_FTR_CAN_DOZE) ||
> - cpu_has_feature(CPU_FTR_CAN_NAP))
> + cpu_has_feature(CPU_FTR_CAN_NAP)) {
> ppc_md.power_save = ppc6xx_idle;
> + setup_cpuidle_ppc();
> + }
> #endif
>
> #ifdef CONFIG_E500
> if (cpu_has_feature(CPU_FTR_CAN_DOZE) ||
> - cpu_has_feature(CPU_FTR_CAN_NAP))
> + cpu_has_feature(CPU_FTR_CAN_NAP)) {
> ppc_md.power_save = e500_idle;
> + setup_cpuidle_ppc();
> + }
> #endif
> if (ppc_md.progress)
> ppc_md.progress("id mach(): done", 0x200);
> Index: linux.trees.git/arch/powerpc/platforms/44x/idle.c
> ===================================================================
> --- linux.trees.git.orig/arch/powerpc/platforms/44x/idle.c
> +++ linux.trees.git/arch/powerpc/platforms/44x/idle.c
> @@ -24,6 +24,7 @@
> #include <linux/of.h>
> #include <linux/kernel.h>
> #include <asm/machdep.h>
> +#include <asm/system.h>
>
> static int mode_spin;
>
> @@ -46,6 +47,7 @@ int __init ppc44x_idle_init(void)
> /* If we are not setting spin mode
> then we set to wait mode */
> ppc_md.power_save = &ppc44x_idle;
> + setup_cpuidle_ppc();
> }
>
> return 0;
> Index: linux.trees.git/arch/powerpc/platforms/cell/pervasive.c
> ===================================================================
> --- linux.trees.git.orig/arch/powerpc/platforms/cell/pervasive.c
> +++ linux.trees.git/arch/powerpc/platforms/cell/pervasive.c
> @@ -35,6 +35,7 @@
> #include <asm/pgtable.h>
> #include <asm/reg.h>
> #include <asm/cell-regs.h>
> +#include <asm/system.h>
>
> #include "pervasive.h"
>
> @@ -128,5 +129,6 @@ void __init cbe_pervasive_init(void)
> }
>
> ppc_md.power_save = cbe_power_save;
> + setup_cpuidle_ppc();
> ppc_md.system_reset_exception = cbe_system_reset_exception;
> }
> Index: linux.trees.git/arch/powerpc/platforms/pasemi/idle.c
> ===================================================================
> --- linux.trees.git.orig/arch/powerpc/platforms/pasemi/idle.c
> +++ linux.trees.git/arch/powerpc/platforms/pasemi/idle.c
> @@ -27,6 +27,7 @@
> #include <asm/machdep.h>
> #include <asm/reg.h>
> #include <asm/smp.h>
> +#include <asm/system.h>
>
> #include "pasemi.h"
>
> @@ -81,6 +82,7 @@ static int __init pasemi_idle_init(void)
>
> ppc_md.system_reset_exception = pasemi_system_reset_exception;
> ppc_md.power_save = modes[current_mode].entry;
> + setup_cpuidle_ppc();
> printk(KERN_INFO "Using PA6T idle loop (%s)\n", modes[current_mode].name);
>
> return 0;
> Index: linux.trees.git/arch/powerpc/platforms/ps3/setup.c
> ===================================================================
> --- linux.trees.git.orig/arch/powerpc/platforms/ps3/setup.c
> +++ linux.trees.git/arch/powerpc/platforms/ps3/setup.c
> @@ -33,6 +33,7 @@
> #include <asm/prom.h>
> #include <asm/lv1call.h>
> #include <asm/ps3gpu.h>
> +#include <asm/system.h>
>
> #include "platform.h"
>
> @@ -214,6 +215,7 @@ static void __init ps3_setup_arch(void)
> prealloc_ps3flash_bounce_buffer();
>
> ppc_md.power_save = ps3_power_save;
> + setup_cpuidle_ppc();
> ps3_os_area_init();
>
> DBG(" <- %s:%d\n", __func__, __LINE__);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-arch" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-12-04 22:20:25

by Torsten Duwe

[permalink] [raw]
Subject: Re: [v10 PATCH 2/9]: cpuidle: cleanup drivers/cpuidle/cpuidle.c

On Wednesday 02 December 2009, Arun R Bharadwaj wrote:
> * Arun R Bharadwaj <[email protected]> [2009-12-02 15:24:27]:
>
> This patch cleans up drivers/cpuidle/cpuidle.c
> Earlier cpuidle assumed pm_idle as the default idle loop. Break that
> assumption and make it more generic.

Is there a problem with the old pm_idle? Couldn't it be integrated more
transparently, instead of replacing it this intrusively?

> --- linux.trees.git.orig/include/linux/cpuidle.h
> +++ linux.trees.git/include/linux/cpuidle.h
> @@ -41,7 +41,7 @@ struct cpuidle_state {
> unsigned long long usage;
> unsigned long long time; /* in US */
>
> - int (*enter) (struct cpuidle_device *dev,
> + void (*enter) (struct cpuidle_device *dev,
> struct cpuidle_state *state);
> };

While it may be a good idea to move the residency calculation to one central
place, at least in theory a cpuidle_state->enter() function could have a
better method to determine its value.

Either way you're implicitly introducing an API change here, and you're at
least missing two functions on ARM and SuperH, respectively. Could you
separate this API change out, and not take it for granted in the other
patches?

Torsten

2009-12-06 05:19:36

by Arun R Bharadwaj

[permalink] [raw]
Subject: Re: [v10 PATCH 2/9]: cpuidle: cleanup drivers/cpuidle/cpuidle.c

* Torsten Duwe <[email protected]> [2009-12-04 23:20:00]:

> On Wednesday 02 December 2009, Arun R Bharadwaj wrote:
> > * Arun R Bharadwaj <[email protected]> [2009-12-02 15:24:27]:
> >
> > This patch cleans up drivers/cpuidle/cpuidle.c
> > Earlier cpuidle assumed pm_idle as the default idle loop. Break that
> > assumption and make it more generic.
>
> Is there a problem with the old pm_idle? Couldn't it be integrated more
> transparently, instead of replacing it this intrusively?
>

Hi Torsten,

Peter objected to the idea of integrating this with the old pm_idle
because it has already caused a lot of problems on x86 and we wouldn't
want to be doing the same mistake on POWER. The discussion related to
that could be found here http://lkml.org/lkml/2009/8/26/233

> > --- linux.trees.git.orig/include/linux/cpuidle.h
> > +++ linux.trees.git/include/linux/cpuidle.h
> > @@ -41,7 +41,7 @@ struct cpuidle_state {
> > unsigned long long usage;
> > unsigned long long time; /* in US */
> >
> > - int (*enter) (struct cpuidle_device *dev,
> > + void (*enter) (struct cpuidle_device *dev,
> > struct cpuidle_state *state);
> > };
>
> While it may be a good idea to move the residency calculation to one central
> place, at least in theory a cpuidle_state->enter() function could have a
> better method to determine its value.
>

This would mean a lot of code replication, which Pavel pointed out in
the previous iteration. So I moved the residency calculation to a
central place.

> Either way you're implicitly introducing an API change here, and you're at
> least missing two functions on ARM and SuperH, respectively. Could you
> separate this API change out, and not take it for granted in the other
> patches?
>
> Torsten

2009-12-07 10:18:25

by Torsten Duwe

[permalink] [raw]
Subject: Re: [v10 PATCH 2/9]: cpuidle: cleanup drivers/cpuidle/cpuidle.c

On Sunday 06 December 2009, Arun R Bharadwaj wrote:

> Peter objected to the idea of integrating this with the old pm_idle
> because it has already caused a lot of problems on x86 and we wouldn't
> want to be doing the same mistake on POWER. The discussion related to
> that could be found here http://lkml.org/lkml/2009/8/26/233

And BenH has sketched how it should be done on ppc, in that thread:
http://lkml.org/lkml/2009/8/26/624 AFAIS this comment is still valid for v10.

Not only I would like to understand what is the conceptual idea behind the
other changes. Nothing wrong with cleanups, but there's got to be a purpose
and benefits.

Torsten

2009-12-07 10:56:15

by Arun R Bharadwaj

[permalink] [raw]
Subject: Re: [v10 PATCH 2/9]: cpuidle: cleanup drivers/cpuidle/cpuidle.c

* Torsten Duwe <[email protected]> [2009-12-07 11:17:57]:

> On Sunday 06 December 2009, Arun R Bharadwaj wrote:
>
> > Peter objected to the idea of integrating this with the old pm_idle
> > because it has already caused a lot of problems on x86 and we wouldn't
> > want to be doing the same mistake on POWER. The discussion related to
> > that could be found here http://lkml.org/lkml/2009/8/26/233
>
> And BenH has sketched how it should be done on ppc, in that thread:
> http://lkml.org/lkml/2009/8/26/624 AFAIS this comment is still valid for v10.
>
> Not only I would like to understand what is the conceptual idea behind the
> other changes. Nothing wrong with cleanups, but there's got to be a purpose
> and benefits.
>
> Torsten

The reason for the cleanups is that we should have just one idle
function manager instead of having one for each arch, which needs to
be exported and hence really ugly. So thats why we
decided to do away with pm_idle and make cpuidle as _the_ idle
function manager. So in case of POWER, we have the ppc_md.power_save
which is the pm_idle equivalent. We discussed that in this thread
http://lkml.org/lkml/2009/9/2/20

thanks
arun

2009-12-15 11:49:43

by Arun R Bharadwaj

[permalink] [raw]
Subject: Re: [v10 PATCH 8/9]: pSeries: implement pSeries processor idle module

* Benjamin Herrenschmidt <[email protected]> [2009-12-04 21:00:52]:

> On Fri, 2009-12-04 at 13:45 +0530, Arun R Bharadwaj wrote:
>
> >
> > Hi Ben,
> >
> > I forgot to attach the patch which enables cpuidle for the rest of the
> > POWER platforms. Attaching it below.
> >
> > So for these platforms, ppc_md.power_save will be called from from the
> > cpuidle_idle_call idle loop itself. Also, this cpuidle_idle_call is
> > not a pseries specific idle loop. It is a common loop for Intel and
> > PPC which use cpuidle infrastructure.
>
> Ok, so there was a missing piece in the puzzle ;-)
>
> I'll review asap.
>

Hi Ben,

Did you get time to review this?

thanks
arun

> Cheers,
> Ben.
>
> > arun
> >
> >
> >
> > This patch enables cpuidle for the rest of the POWER platforms like
> > 44x, Cell, Pasemi etc.
> >
> > Signed-off-by: Arun R Bharadwaj <[email protected]>
> > ---
> > arch/powerpc/include/asm/system.h | 2 ++
> > arch/powerpc/kernel/idle.c | 28 ++++++++++++++++++++++++++++
> > arch/powerpc/kernel/setup_32.c | 8 ++++++--
> > arch/powerpc/platforms/44x/idle.c | 2 ++
> > arch/powerpc/platforms/cell/pervasive.c | 2 ++
> > arch/powerpc/platforms/pasemi/idle.c | 2 ++
> > arch/powerpc/platforms/ps3/setup.c | 2 ++
> > 7 files changed, 44 insertions(+), 2 deletions(-)
> >
> > Index: linux.trees.git/arch/powerpc/include/asm/system.h
> > ===================================================================
> > --- linux.trees.git.orig/arch/powerpc/include/asm/system.h
> > +++ linux.trees.git/arch/powerpc/include/asm/system.h
> > @@ -551,8 +551,10 @@ void cpu_idle_wait(void);
> >
> > #ifdef CONFIG_CPU_IDLE
> > extern void update_smt_snooze_delay(int snooze);
> > +extern void setup_cpuidle_ppc(void);
> > #else
> > static inline void update_smt_snooze_delay(int snooze) {}
> > +static inline void setup_cpuidle_ppc(void) {}
> > #endif
> >
> > #endif /* __KERNEL__ */
> > Index: linux.trees.git/arch/powerpc/kernel/idle.c
> > ===================================================================
> > --- linux.trees.git.orig/arch/powerpc/kernel/idle.c
> > +++ linux.trees.git/arch/powerpc/kernel/idle.c
> > @@ -129,6 +129,34 @@ void default_idle(void)
> > HMT_very_low();
> > }
> >
> > +#ifdef CONFIG_CPU_IDLE
> > +DEFINE_PER_CPU(struct cpuidle_device, ppc_idle_devices);
> > +struct cpuidle_driver cpuidle_ppc_driver = {
> > + .name = "cpuidle_ppc",
> > +};
> > +
> > +static void ppc_idle_loop(struct cpuidle_device *dev, struct cpuidle_state *st)
> > +{
> > + ppc_md.power_save();
> > +}
> > +
> > +void setup_cpuidle_ppc(void)
> > +{
> > + struct cpuidle_device *dev;
> > + int cpu;
> > +
> > + cpuidle_register_driver(&cpuidle_ppc_driver);
> > +
> > + for_each_online_cpu(cpu) {
> > + dev = &per_cpu(ppc_idle_devices, cpu);
> > + dev->cpu = cpu;
> > + dev->states[0].enter = ppc_idle_loop;
> > + dev->state_count = 1;
> > + cpuidle_register_device(dev);
> > + }
> > +}
> > +#endif
> > +
> > int powersave_nap;
> >
> > #ifdef CONFIG_SYSCTL
> > Index: linux.trees.git/arch/powerpc/kernel/setup_32.c
> > ===================================================================
> > --- linux.trees.git.orig/arch/powerpc/kernel/setup_32.c
> > +++ linux.trees.git/arch/powerpc/kernel/setup_32.c
> > @@ -133,14 +133,18 @@ notrace void __init machine_init(unsigne
> >
> > #ifdef CONFIG_6xx
> > if (cpu_has_feature(CPU_FTR_CAN_DOZE) ||
> > - cpu_has_feature(CPU_FTR_CAN_NAP))
> > + cpu_has_feature(CPU_FTR_CAN_NAP)) {
> > ppc_md.power_save = ppc6xx_idle;
> > + setup_cpuidle_ppc();
> > + }
> > #endif
> >
> > #ifdef CONFIG_E500
> > if (cpu_has_feature(CPU_FTR_CAN_DOZE) ||
> > - cpu_has_feature(CPU_FTR_CAN_NAP))
> > + cpu_has_feature(CPU_FTR_CAN_NAP)) {
> > ppc_md.power_save = e500_idle;
> > + setup_cpuidle_ppc();
> > + }
> > #endif
> > if (ppc_md.progress)
> > ppc_md.progress("id mach(): done", 0x200);
> > Index: linux.trees.git/arch/powerpc/platforms/44x/idle.c
> > ===================================================================
> > --- linux.trees.git.orig/arch/powerpc/platforms/44x/idle.c
> > +++ linux.trees.git/arch/powerpc/platforms/44x/idle.c
> > @@ -24,6 +24,7 @@
> > #include <linux/of.h>
> > #include <linux/kernel.h>
> > #include <asm/machdep.h>
> > +#include <asm/system.h>
> >
> > static int mode_spin;
> >
> > @@ -46,6 +47,7 @@ int __init ppc44x_idle_init(void)
> > /* If we are not setting spin mode
> > then we set to wait mode */
> > ppc_md.power_save = &ppc44x_idle;
> > + setup_cpuidle_ppc();
> > }
> >
> > return 0;
> > Index: linux.trees.git/arch/powerpc/platforms/cell/pervasive.c
> > ===================================================================
> > --- linux.trees.git.orig/arch/powerpc/platforms/cell/pervasive.c
> > +++ linux.trees.git/arch/powerpc/platforms/cell/pervasive.c
> > @@ -35,6 +35,7 @@
> > #include <asm/pgtable.h>
> > #include <asm/reg.h>
> > #include <asm/cell-regs.h>
> > +#include <asm/system.h>
> >
> > #include "pervasive.h"
> >
> > @@ -128,5 +129,6 @@ void __init cbe_pervasive_init(void)
> > }
> >
> > ppc_md.power_save = cbe_power_save;
> > + setup_cpuidle_ppc();
> > ppc_md.system_reset_exception = cbe_system_reset_exception;
> > }
> > Index: linux.trees.git/arch/powerpc/platforms/pasemi/idle.c
> > ===================================================================
> > --- linux.trees.git.orig/arch/powerpc/platforms/pasemi/idle.c
> > +++ linux.trees.git/arch/powerpc/platforms/pasemi/idle.c
> > @@ -27,6 +27,7 @@
> > #include <asm/machdep.h>
> > #include <asm/reg.h>
> > #include <asm/smp.h>
> > +#include <asm/system.h>
> >
> > #include "pasemi.h"
> >
> > @@ -81,6 +82,7 @@ static int __init pasemi_idle_init(void)
> >
> > ppc_md.system_reset_exception = pasemi_system_reset_exception;
> > ppc_md.power_save = modes[current_mode].entry;
> > + setup_cpuidle_ppc();
> > printk(KERN_INFO "Using PA6T idle loop (%s)\n", modes[current_mode].name);
> >
> > return 0;
> > Index: linux.trees.git/arch/powerpc/platforms/ps3/setup.c
> > ===================================================================
> > --- linux.trees.git.orig/arch/powerpc/platforms/ps3/setup.c
> > +++ linux.trees.git/arch/powerpc/platforms/ps3/setup.c
> > @@ -33,6 +33,7 @@
> > #include <asm/prom.h>
> > #include <asm/lv1call.h>
> > #include <asm/ps3gpu.h>
> > +#include <asm/system.h>
> >
> > #include "platform.h"
> >
> > @@ -214,6 +215,7 @@ static void __init ps3_setup_arch(void)
> > prealloc_ps3flash_bounce_buffer();
> >
> > ppc_md.power_save = ps3_power_save;
> > + setup_cpuidle_ppc();
> > ps3_os_area_init();
> >
> > DBG(" <- %s:%d\n", __func__, __LINE__);
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-arch" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>