2011-05-12 17:11:22

by Raffaele Recalcati

[permalink] [raw]
Subject: pm loss development

What happen normally in runtime pm implementation is that every devices
are switched off and are enabled only when needed.
In our case instead we have a completely functional embedded system and,
when an asyncrhonous event appear, we have only some tens milliseconds
before the actual power failure takes place.
This patchset add a support in order to switch off not vital part of the system,
in order to allow the board to survive longer.
This allow the possibility to save important data.

The implementation has been written by Davide Ciminaghi.
My work instead was about analyzing different previuos implementation,
a first completely custom one, a second using runtime pm, arriving
finally to that one.

I have tested PM loss in our DaVinci dm365 basi board and I write here below a
piece of code showing a possible usage.

-------------------

static int powerfail_status;

static irqreturn_t basi_powerfail_stop(int irq, void *dev_id);

static irqreturn_t basi_powerfail_quick_check_start(int irq, void *dev_id)
{
basi_mask_irq_gpio0(IRQ_DM365_GPIO0_2);
basi_unmask_irq_gpio0(IRQ_DM365_GPIO0_0);

/* PowerFail situation - START: power is going away */
return IRQ_WAKE_THREAD;
}

static irqreturn_t basi_powerfail_start(int irq, void *dev_id)
{
if (powerfail_status)
return IRQ_HANDLED;
powerfail_status = 1;
pm_loss_power_changed(SYS_PWR_FAILING);
return IRQ_HANDLED;
}


static irqreturn_t basi_powerfail_quick_check_stop(int irq, void *dev_id)
{
basi_mask_irq_gpio0(IRQ_DM365_GPIO0_0);
basi_unmask_irq_gpio0(IRQ_DM365_GPIO0_2);

/* PowerFail situation - STOP: power is coming back */
return IRQ_WAKE_THREAD;
}

static irqreturn_t basi_powerfail_stop(int irq, void *dev_id)
{
if (!powerfail_status)
return IRQ_HANDLED;
powerfail_status = 0;
pm_loss_power_changed(SYS_PWR_GOOD);
return IRQ_HANDLED;
}

enum basi_pwrfail_prio {
BASI_PWR_FAIL_PRIO_0,
BASI_PWR_FAIL_MIN_PRIO = BASI_PWR_FAIL_PRIO_0,
BASI_PWR_FAIL_PRIO_1,
BASI_PWR_FAIL_PRIO_2,
BASI_PWR_FAIL_PRIO_3,
BASI_PWR_FAIL_MAX_PRIO = BASI_PWR_FAIL_PRIO_3,
};

struct pm_loss_default_policy_item basi_pm_loss_policy_items[] = {
{
.bus_name = "mmc",
.bus_priority = BASI_PWR_FAIL_PRIO_1,
},
{
.bus_name = "platform",
.bus_priority = BASI_PWR_FAIL_PRIO_2,
},
};

#define BASI_POLICY_NAME "basi-default"

struct pm_loss_default_policy_table basi_pm_loss_policy_table = {
.name = BASI_POLICY_NAME,
.items = basi_pm_loss_policy_items,
.nitems = ARRAY_SIZE(basi_pm_loss_policy_items),
};

static void basi_powerfail_configure(void)
{
int stat;
struct pm_loss_policy *p;
stat = request_threaded_irq(IRQ_DM365_GPIO0_2,
basi_powerfail_quick_check_start,
basi_powerfail_start,
0,
"pwrfail-on", NULL);
if (stat < 0)
printk(KERN_ERR "request_threaded_irq for IRQ%d (pwrfail-on) "
"failed\n", IRQ_DM365_GPIO0_2);
stat = request_threaded_irq(IRQ_DM365_GPIO0_0,
basi_powerfail_quick_check_stop,
basi_powerfail_stop, 0,
"pwrfail-off", NULL);
if (stat < 0)
printk(KERN_ERR "request_threaded_irq for IRQ%d (pwrfail-off) "
"failed\n", IRQ_DM365_GPIO0_0);
basi_mask_irq_gpio0(IRQ_DM365_GPIO0_0);
p = pm_loss_setup_default_policy(&basi_pm_loss_policy_table);

if (!p)
printk(KERN_ERR "Could not register pwr change policy\n");

if (pm_loss_set_policy(BASI_POLICY_NAME) < 0)
printk(KERN_ERR "Could not set %s power loss policy\n",
BASI_POLICY_NAME);
}

int platform_pm_loss_power_changed(struct device *dev,
enum sys_power_state s)
{
int ret = 0;

/* Calling platform bus pm_loss functions */
pr_debug_pm_loss("platform_pm_loss_power_changed(%d)\n", s);

if (dev->driver && dev->driver->pm &&
dev->driver->pm->power_changed)
ret = dev->driver->pm->power_changed(dev, s);
return ret;
}




2011-05-12 17:11:25

by Raffaele Recalcati

[permalink] [raw]
Subject: [PATCH 1/4] export bus_kset

From: Davide Ciminaghi <[email protected]>

Signed-off-by: Davide Ciminaghi <[email protected]>
---
drivers/base/bus.c | 3 ++-
include/linux/kobject.h | 2 ++
2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 000e7b2..2134248 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -158,7 +158,8 @@ static const struct kset_uevent_ops bus_uevent_ops = {
.filter = bus_uevent_filter,
};

-static struct kset *bus_kset;
+struct kset *bus_kset;
+EXPORT_SYMBOL(bus_kset);


#ifdef CONFIG_HOTPLUG
diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index 8f6d121..456b20d 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -205,6 +205,8 @@ extern struct kobject *power_kobj;
/* The global /sys/firmware/ kobject for people to chain off of */
extern struct kobject *firmware_kobj;

+extern struct kset *bus_kset ;
+
#if defined(CONFIG_HOTPLUG)
int kobject_uevent(struct kobject *kobj, enum kobject_action action);
int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
--
1.7.0.4

2011-05-12 17:11:57

by Raffaele Recalcati

[permalink] [raw]
Subject: [PATCH 2/4] PM / Loss: power loss management

From: Davide Ciminaghi <[email protected]>

On some embedded systems, an asynchronous power failure notification event is
available some tens to hundreds milliseconds before the actual power failure
takes place.
Such an event can be used to trigger some actions, typically shutting down
all non-vital power sinks, thus allowing the board to survive longer to
temporary power losses.

Signed-off-by: Davide Ciminaghi <[email protected]>
Signed-off-by: Raffaele Recalcati <[email protected]>
---
Documentation/power/loss.txt | 191 +++++++++++++
drivers/base/bus.c | 6 +
drivers/base/init.c | 1 +
drivers/base/platform.c | 14 +
drivers/base/power/Makefile | 2 +
drivers/base/power/loss.c | 648 ++++++++++++++++++++++++++++++++++++++++++
drivers/base/power/power.h | 14 +
include/linux/pm.h | 16 +
include/linux/pm_loss.h | 109 +++++++
kernel/power/Kconfig | 27 ++
10 files changed, 1028 insertions(+), 0 deletions(-)
create mode 100644 Documentation/power/loss.txt
create mode 100644 drivers/base/power/loss.c
create mode 100644 include/linux/pm_loss.h

diff --git a/Documentation/power/loss.txt b/Documentation/power/loss.txt
new file mode 100644
index 0000000..10da89c
--- /dev/null
+++ b/Documentation/power/loss.txt
@@ -0,0 +1,191 @@
+**** POWER LOSS MANAGEMENT ****
+
+Davide Ciminaghi <[email protected]> 2011
+
+1. Overview
+
+On some embedded systems, an asynchronous power failure notification event is
+available some tens to hundreds milliseconds before the actual power failure
+takes place.
+Such an event can be used to trigger some actions, typically shutting down
+all non-vital power sinks, thus allowing the board to survive longer to
+temporary power losses. Sometimes, also flash-based block devices can be
+stopped after a power loss event notification has been received. This should
+be useful for mmc devices, for which a sudden power failure while a write
+command is being executed can threaten file system integrity even in case a
+journalling fs is in use.
+Generally speaking, one can expect the course of action taken when a power
+failure warning is received to be deeply system specific. Similarly, the
+mechanism used for power failure notifications can equally be board/platform
+specific. For these reasons, support for power loss management has been split
+into three parts:
+
+ + Generic code (board and driver independent).
+ + Board dependent code.
+ + Power loss policy definition.
+
+The generic part of the code is located under drivers/base/power/loss.c .
+On the other hand, board dependent code and power loss policies definitions
+should live somewhere in the platform/board specific files.
+The header file include/linux/pm_loss.h contains all the pm_loss function
+prototypes, together with definitions of data structures.
+For what concerns power loss policies, the framework already contains a couple
+of predefined policies: "nop" and "default" (see later paragraphs for more
+details).
+
+1.1 Sysfs interface.
+
+It can be useful (for instance during tests), to be able to quickly switch
+from one power loss policy to another, or to simulate power fail and resume
+events. To this end, a sysfs interface of the following form has been devised:
+
+/sys/power/loss +
+ |
+ +-- active_policy
+ |
+ +-- policies -+
+ | |
+ | +-- nop
+ | |
+ | +-- default +
+ | | |
+ | | +-- bus_table
+ | |
+ | +-- ....
+ |
+ +-- pwrfail_sim
+
+2. Details
+
+2.1 Sending events to the power loss management core.
+
+The board specific code shall trigger a power failure event notification by
+calling pm_loss_power_changed(SYS_PWR_FAILING).
+In case of a temporary power loss, the same function shall be called with
+SYS_PWR_GOOD argument on power resume. pm_loss_power_changed() can sleep, so
+it shall not be called from interrupt context.
+
+2.3 Effects on bus and device drivers.
+
+One more entry has been added to the device PM callbacks:
+
+ int (*power_changed)(struct device *, enum sys_power_state);
+
+This function can be optionally invoked by power loss policies in case of
+system power changes (loss and/or resume). Of course every bus or device driver
+can react to such events in some specific way. For instance the mmc block
+driver will probably block its request queue during a temporary power loss.
+
+2.3.1 The platform bus.
+
+For what concerns platform bus drivers, platform specific code can override
+the power_changed() callback :
+
+platform_pm_loss_power_changed(struct device *dev, enum sys_power_state s)
+
+whose default (empty) version is defined as a __weak symbol in
+drivers/base/platform.c.
+
+2.4 Power loss policies.
+
+A power loss policy can be registered via the pm_loss_register_policy()
+function, which receives:
+
+ + A pointer to a struct pm_loss_policy_ops , which hosts the pointers
+ to the policy's specific callbacks (see below for more details).
+ + A pointer to a struct kobj_type, which will allow the pm loss core
+ to setup the policy related directory under /sys/power/loss/policies.
+ See Documentation/kobject.txt for more details on struct kobj_type.
+ + A pointer to an array of chars containing the name of the policy.
+ + A pointer to the policy's private data (void *).
+
+A power loss policy can define the following callbacks:
+
+ + int (*bus_added)(struct pm_loss_policy *, struct bus_type *) : this
+ is invoked whenever a new bus has been registered within the system.
+ Since power related events are often managed at bus level, it can be
+ useful for the policy to have a list of available busses within the
+ system. When a policy is registered, this callback is invoked once
+ for every already registered bus.
+ + int (*bus_removed)(struct pm_loss_policy *, struct bus_type *):
+ this is invoked when a bus is removed from the system.
+ + int (*start)(struct pm_loss_policy *): this is called when a policy
+ becomes active.
+ + void (*stop)(struct pm_loss_policy *): this is called when a policy
+ becomes inactive.
+
+2.4.1 The nop predefined policy
+
+The nop policy is the first active policy when the pm loss framework is
+initialized. It just does nothing in case of power loss / power resume
+events.
+
+2.4.2 The default predefined policy
+
+When a power failure warning is received, the default policy walks through a
+list of critical busses and invokes their drivers' power_changed() callback.
+The default policy can be customized and registered by calling:
+
+ pm_loss_setup_default_policy(struct pm_loss_default_policy_table *);
+
+This function receives a pointer to a pm_loss_default_policy_table structure,
+which defines a priority ordered list of critical buffers:
+
+ struct pm_loss_default_policy_table {
+ struct module *owner ;
+ const char *name ;
+ struct pm_loss_default_policy_item *items;
+ int nitems;
+ };
+
+Here's a short description of such structure:
+
+ + owner : shall point to the module registering the table (or NULL
+ if the table is not instantiated by a module).
+ + name : this is the name given to this particular customization of
+ the default policy.
+ + items : pointer to an array of table items.
+ + nitems : number of the items in the array.
+
+Each item is a struct pm_loss_default_policy_item, defined as follows:
+
+ struct pm_loss_default_policy_item {
+ const char *bus_name;
+ int bus_priority ;
+ };
+
+where bus_name is the name of a bus and bus_priority is a numerical priority
+of the bus itself. Numerically higher priorities correspond to more prioritary
+busses.
+
+2.4.3 Activating a specific policy.
+
+A policy can be activated:
+
+ + From within the kernel by calling pm_loss_set_policy(char *name).
+ The argument passed to this function shall be the name of the policy
+ to be activated.
+
+ + From user space by writing the name of the policy to be activated
+ to /sys/power/loss/active_policy.
+
+2.4.4 Unregistering a policy
+
+For a generic user defined policy, just call :
+
+ pm_loss_unregister_policy(struct pm_loss_policy *);
+
+For a default policy customization:
+
+ pm_loss_shutdown_default_policy(struct pm_loss_policy *);
+
+3. Kernel configuration
+
+The following configuration options have been defined:
+
+ CONFIG_PM_LOSS : this enables the generic pm_loss framework.
+ CONFIG_PM_LOSS_SIM : this adds the pwrfail_sim file to the sysfs interface
+ and allows power loss events simulation.
+ CONFIG_PM_LOSS_DEBUG : this option enables some debug printk's .
+ CONFIG_PM_LOSS_TEST: this enables the compilation of a sample test module
+ containing a customized default policy definition
diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 2134248..d67a506 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -910,6 +910,9 @@ int bus_register(struct bus_type *bus)
if (retval)
goto bus_attrs_fail;

+#ifdef CONFIG_PM_LOSS
+ pm_loss_on_bus_added(bus);
+#endif
pr_debug("bus: '%s': registered\n", bus->name);
return 0;

@@ -940,6 +943,9 @@ EXPORT_SYMBOL_GPL(bus_register);
void bus_unregister(struct bus_type *bus)
{
pr_debug("bus: '%s': unregistering\n", bus->name);
+#ifdef CONFIG_PM_LOSS
+ pm_loss_on_bus_removed(bus);
+#endif
bus_remove_attrs(bus);
remove_probe_files(bus);
kset_unregister(bus->p->drivers_kset);
diff --git a/drivers/base/init.c b/drivers/base/init.c
index c8a934e..d979da9 100644
--- a/drivers/base/init.c
+++ b/drivers/base/init.c
@@ -10,6 +10,7 @@
#include <linux/memory.h>

#include "base.h"
+#include "power/power.h"

/**
* driver_init - initialize driver model.
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index f051cff..427b686 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -20,6 +20,7 @@
#include <linux/err.h>
#include <linux/slab.h>
#include <linux/pm_runtime.h>
+#include <linux/pm_loss.h>

#include "base.h"

@@ -947,6 +948,18 @@ int __weak platform_pm_runtime_idle(struct device *dev)

#endif /* !CONFIG_PM_RUNTIME */

+#ifdef CONFIG_PM_LOSS
+
+int __weak platform_pm_loss_power_changed(struct device *dev,
+ enum sys_power_state s)
+{
+ return -ENOSYS;
+}
+
+#else
+#define platform_pm_loss_power_changed NULL
+#endif
+
static const struct dev_pm_ops platform_dev_pm_ops = {
.prepare = platform_pm_prepare,
.complete = platform_pm_complete,
@@ -965,6 +978,7 @@ static const struct dev_pm_ops platform_dev_pm_ops = {
.runtime_suspend = platform_pm_runtime_suspend,
.runtime_resume = platform_pm_runtime_resume,
.runtime_idle = platform_pm_runtime_idle,
+ .power_changed = platform_pm_loss_power_changed,
};

struct bus_type platform_bus_type = {
diff --git a/drivers/base/power/Makefile b/drivers/base/power/Makefile
index abe46ed..6373944 100644
--- a/drivers/base/power/Makefile
+++ b/drivers/base/power/Makefile
@@ -1,6 +1,8 @@
obj-$(CONFIG_PM) += sysfs.o
obj-$(CONFIG_PM_SLEEP) += main.o wakeup.o
obj-$(CONFIG_PM_RUNTIME) += runtime.o
+obj-$(CONFIG_PM_LOSS) += loss.o
+obj-$(CONFIG_PM_LOSS_TEST) += loss_test.o
obj-$(CONFIG_PM_OPS) += generic_ops.o
obj-$(CONFIG_PM_TRACE_RTC) += trace.o
obj-$(CONFIG_PM_OPP) += opp.o
diff --git a/drivers/base/power/loss.c b/drivers/base/power/loss.c
new file mode 100644
index 0000000..a265b8b
--- /dev/null
+++ b/drivers/base/power/loss.c
@@ -0,0 +1,648 @@
+/*
+ * drivers/base/power/loss.c - Power loss management related functions
+ *
+ * Copyright (C) 2011 Davide Ciminaghi <[email protected]>
+ * Copyright (C) 2011 BTicino S.p.A.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifdef CONFIG_PM_LOSS_DEBUG
+#define DEBUG
+#endif
+
+#include <linux/kernel.h>
+#include <linux/kobject.h>
+#include <linux/list.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/pm_loss.h>
+#include "../base.h"
+
+#define LOSS_STR "pm_loss"
+
+#define xstr(s) str(s)
+#define str(s) #s
+#define MAX_POLICY_NAME_LENGHT 20
+
+struct power_loss_state_struct {
+ spinlock_t lock;
+ enum sys_power_state state;
+ struct pm_loss_policy *active_policy;
+ wait_queue_head_t wq;
+ struct kobject *loss_kobj;
+ struct kset *policies;
+};
+
+static struct power_loss_state_struct power_state;
+
+
+void pm_loss_power_changed(enum sys_power_state s)
+{
+ pr_debug_pm_loss("pm_loss_power_changed, "
+ "power_state.active_policy = %p\n",
+ power_state.active_policy);
+ if (!power_state.active_policy ||
+ !power_state.active_policy->ops ||
+ !power_state.active_policy->ops->power_changed) {
+ pr_debug_pm_loss("pm_loss_power_changed with no active "
+ "policy or "
+ "policy with no sys_power_changed cb\n");
+ return ;
+ }
+ pr_debug_pm_loss("pm_loss_power_changed(%d)\n", s);
+ spin_lock_irq(&power_state.lock);
+ if (power_state.state == s) {
+ spin_unlock_irq(&power_state.lock);
+ return ;
+ }
+ if (power_state.state == SYS_PWR_NOTIFYING) {
+ DEFINE_WAIT(wait);
+
+ for (;;) {
+ prepare_to_wait(&power_state.wq, &wait,
+ TASK_UNINTERRUPTIBLE);
+ if (power_state.state != SYS_PWR_NOTIFYING)
+ break;
+ spin_unlock_irq(&power_state.lock);
+ schedule();
+ spin_lock_irq(&power_state.lock);
+ }
+ }
+ power_state.state = SYS_PWR_NOTIFYING ;
+ spin_unlock_irq(&power_state.lock);
+ if (power_state.active_policy->ops->power_changed) {
+ pr_debug_pm_loss("invoking sys_power_changed cb\n");
+ power_state.active_policy->ops->power_changed
+ (power_state.active_policy, s);
+ }
+ power_state.state = s;
+}
+EXPORT_SYMBOL(pm_loss_power_changed);
+
+#define to_bus(obj) container_of(obj, struct subsys_private, subsys.kobj)
+
+static int __bus_added_removed(struct pm_loss_policy *p,
+ struct bus_type *bus, bool added)
+{
+ int ret = 0;
+ spin_lock_irq(&power_state.lock);
+ if (power_state.state == SYS_PWR_NOTIFYING) {
+ DEFINE_WAIT(wait);
+
+ for (;;) {
+ prepare_to_wait(&power_state.wq, &wait,
+ TASK_UNINTERRUPTIBLE);
+ if (power_state.state != SYS_PWR_NOTIFYING)
+ break;
+ spin_unlock_irq(&power_state.lock);
+ schedule();
+ spin_lock_irq(&power_state.lock);
+ }
+ }
+ if (added)
+ ret = p->ops->bus_added ? p->ops->bus_added(p, bus) : 0;
+ else
+ ret = p->ops->bus_removed ? p->ops->bus_removed(p, bus) : 0;
+ spin_unlock_irq(&power_state.lock);
+ return ret;
+}
+
+static int bus_added(struct pm_loss_policy *p, struct bus_type *bus)
+{
+ return __bus_added_removed(p, bus, true);
+}
+
+static int bus_removed(struct pm_loss_policy *p, struct bus_type *bus)
+{
+ return __bus_added_removed(p, bus, false);
+}
+
+static int policy_add_busses(struct pm_loss_policy *p)
+{
+ struct kobject *k;
+ int stat;
+
+ if (!p || !p->ops)
+ return -EINVAL;
+ if (!p->ops->bus_added)
+ return 0;
+
+ pr_debug_pm_loss("policy_add_busses()\n");
+
+ if (!bus_kset) {
+ pr_debug_pm_loss("bus_kset still NULL\n");
+ return 0;
+ }
+
+ spin_lock(&bus_kset->list_lock);
+ list_for_each_entry(k, &bus_kset->list, entry) {
+ spin_unlock(&bus_kset->list_lock);
+ /* This might sleep */
+ pr_debug_pm_loss("invoking bus_added()\n");
+ stat = bus_added(p, to_bus(k)->bus);
+ spin_lock(&bus_kset->list_lock);
+ }
+ spin_unlock(&bus_kset->list_lock);
+ return 0;
+}
+
+static void __pm_loss_on_bus_added_removed(struct bus_type *bus, bool added)
+{
+ struct list_head *i;
+ struct kobject *k;
+
+ if (!power_state.policies)
+ /* Not yet initialized */
+ return ;
+
+ spin_lock(&power_state.policies->list_lock);
+ list_for_each(i, &power_state.policies->list) {
+ k = container_of(i, struct kobject, entry);
+ if (added)
+ bus_added(to_policy(k), bus);
+ else
+ bus_removed(to_policy(k), bus);
+ }
+ spin_unlock(&power_state.policies->list_lock);
+}
+
+void pm_loss_on_bus_added(struct bus_type *bus)
+{
+ __pm_loss_on_bus_added_removed(bus, true);
+}
+EXPORT_SYMBOL(pm_loss_on_bus_added);
+
+void pm_loss_on_bus_removed(struct bus_type *bus)
+{
+ __pm_loss_on_bus_added_removed(bus, false);
+}
+EXPORT_SYMBOL(pm_loss_on_bus_removed);
+
+struct pm_loss_policy *
+pm_loss_register_policy(const struct pm_loss_policy_ops *ops,
+ struct kobj_type *ktype,
+ const char *name, void *priv)
+{
+ struct pm_loss_policy *out = NULL;
+ int ret ;
+
+ if (strlen(name) > MAX_POLICY_NAME_LENGHT)
+ printk(KERN_WARNING LOSS_STR "warning : name %s is too long, "
+ "will be truncated\n", name);
+
+ out = kzalloc(sizeof(struct pm_loss_policy), GFP_KERNEL);
+ if (!out)
+ return NULL;
+ out->ops = ops;
+ out->priv = priv;
+ out->kobj.kset = power_state.policies;
+
+ if (policy_add_busses(out) < 0)
+ goto err0;
+
+ ret = kobject_init_and_add(&out->kobj, ktype, NULL, "%s", name);
+ if (ret < 0)
+ goto err1;
+
+ kobject_uevent(&out->kobj, KOBJ_ADD);
+
+ return out;
+
+ err1:
+ kobject_put(&out->kobj);
+ err0:
+ kfree(out);
+ return NULL;
+}
+EXPORT_SYMBOL(pm_loss_register_policy);
+
+int pm_loss_unregister_policy(struct pm_loss_policy *p)
+{
+ pr_debug_pm_loss("pm_loss_unregister_policy(%p)\n", p);
+ if (power_state.active_policy == p) {
+ pr_debug_pm_loss("cannot unregister policy (busy)\n");
+ return -EBUSY ;
+ }
+ kobject_del(&p->kobj);
+ kobject_put(&p->kobj);
+ return 0;
+}
+EXPORT_SYMBOL(pm_loss_unregister_policy);
+
+int pm_loss_set_policy(const char *name)
+{
+ struct kobject *o;
+ struct pm_loss_policy *p ;
+ spin_lock_irq(&power_state.lock);
+ o = kset_find_obj(power_state.policies, name);
+ if (!o) {
+ spin_unlock_irq(&power_state.lock);
+ return -ENODEV;
+ }
+ p = to_policy(o);
+ if (power_state.active_policy) {
+ if (power_state.active_policy->ops->stop)
+ power_state.active_policy->ops->stop
+ (power_state.active_policy);
+ }
+ power_state.active_policy = p;
+ if (power_state.active_policy->ops->start)
+ power_state.active_policy->ops->start
+ (power_state.active_policy);
+ spin_unlock_irq(&power_state.lock);
+ return 0;
+}
+EXPORT_SYMBOL(pm_loss_set_policy);
+
+/*
+ * Default release function for pm_loss_policy object
+ */
+static void pm_loss_policy_release(struct kobject *kobj)
+{
+ struct pm_loss_policy *p;
+ p = to_policy(kobj);
+ pr_debug_pm_loss("pm_loss_policy_release(%p)\n", p);
+ kfree(p);
+}
+
+/*
+ * Dummy nop policy implementation (all null pointers)
+ */
+static const struct pm_loss_policy_ops nop_policy_ops = {
+};
+
+static const struct sysfs_ops nop_sysfs_ops = {
+ .show = NULL,
+ .store = NULL,
+};
+
+static struct attribute *nop_default_attrs[] = {
+ NULL,
+};
+
+static struct kobj_type nop_ktype = {
+ .sysfs_ops = &nop_sysfs_ops,
+ .release = pm_loss_policy_release,
+ .default_attrs = nop_default_attrs,
+};
+
+/*
+ * Default policy implementation
+ */
+
+struct pm_loss_default_policy_data {
+ struct pm_loss_default_policy_table *table;
+ struct list_head busses ;
+};
+
+struct pm_loss_default_policy_bus_data {
+ struct bus_type *bus;
+ int priority ;
+ struct list_head list;
+};
+
+static struct pm_loss_default_policy_item *
+__find_bus_by_name(const char *name,
+ struct pm_loss_default_policy_table *table)
+{
+ int i;
+ struct pm_loss_default_policy_item *out = NULL;
+ for (i = 0; i < table->nitems; i++)
+ if (!strcmp(table->items[i].bus_name, name)) {
+ out = &table->items[i];
+ break;
+ }
+ return out;
+}
+
+static void __add_bus(struct pm_loss_default_policy_data *data,
+ struct pm_loss_default_policy_bus_data *new)
+{
+ struct pm_loss_default_policy_bus_data *bd;
+ list_for_each_entry(bd, &data->busses, list)
+ if (new->priority > bd->priority) {
+ list_add_tail(&new->list, &bd->list);
+ return;
+ }
+ list_add_tail(&new->list, &bd->list);
+}
+
+static int pm_loss_default_policy_bus_added(struct pm_loss_policy *p,
+ struct bus_type *bus)
+{
+ struct pm_loss_default_policy_item *item ;
+ struct pm_loss_default_policy_data *pd = p->priv;
+ struct pm_loss_default_policy_bus_data *bd =
+ kzalloc(sizeof(struct pm_loss_default_policy_bus_data),
+ GFP_KERNEL);
+
+ BUG_ON(!pd);
+ if (!bd)
+ return -ENOMEM;
+ pr_debug_pm_loss("pm_loss_default_policy_bus_added(), name = %s\n",
+ bus->name);
+ item = __find_bus_by_name(bus->name, pd->table);
+ if (!item)
+ return 0;
+
+ pr_debug_pm_loss("bus_found\n");
+
+ bd->priority = item->bus_priority ;
+ bd->bus = bus ;
+ __add_bus(pd, bd);
+
+ return 0;
+}
+
+static int pm_loss_default_policy_bus_removed(struct pm_loss_policy *p,
+ struct bus_type *bus)
+{
+ struct pm_loss_default_policy_data *pd = p->priv;
+ struct pm_loss_default_policy_bus_data *bd, *tmp;
+
+ BUG_ON(!pd);
+ list_for_each_entry_safe(bd, tmp, &pd->busses, list)
+ if (bd->bus == bus) {
+ list_del(&bd->list);
+ kfree(bd);
+ break;
+ }
+ return 0;
+}
+
+struct bus_sys_pwr_changed_data {
+ struct bus_type *bus;
+ enum sys_power_state s;
+};
+
+static int __bus_sys_pwr_changed(struct device *dev, void *_d)
+{
+ struct bus_sys_pwr_changed_data *data =
+ (struct bus_sys_pwr_changed_data *)_d;
+ struct bus_type *bus = data->bus;
+
+ BUG_ON(!dev);
+ BUG_ON(!bus);
+
+ pr_debug_pm_loss("__bus_sys_pwr_changed(%p), bus %s\n", dev,
+ bus->name);
+
+ if (bus->pm && bus->pm->power_changed)
+ bus->pm->power_changed(dev, data->s);
+ return 0;
+}
+
+
+static void
+pm_loss_default_policy_sys_power_changed(struct pm_loss_policy *p,
+ enum sys_power_state s)
+{
+ struct pm_loss_default_policy_data *pd = p->priv;
+ struct pm_loss_default_policy_bus_data *bd ;
+ struct bus_sys_pwr_changed_data cb_data ;
+
+ pr_debug_pm_loss("pm_loss_default_policy_sys_power_changed()\n");
+
+ BUG_ON(!pd);
+
+ list_for_each_entry(bd, &pd->busses, list) {
+ struct bus_type *bus = bd->bus;
+ const struct dev_pm_ops *pm = bus->pm;
+ pr_debug_pm_loss("bus = %s\n", bus->name);
+ if (pm && pm->power_changed) {
+ cb_data.bus = bus ; cb_data.s = s;
+ pr_debug_pm_loss("before bus_for_each_dev\n");
+ bus_for_each_dev(bus, NULL, (void *)&cb_data,
+ __bus_sys_pwr_changed);
+ }
+ }
+}
+
+static int pm_loss_default_policy_start(struct pm_loss_policy *p)
+{
+ return 0;
+}
+
+static void pm_loss_default_policy_stop(struct pm_loss_policy *p)
+{
+ struct pm_loss_default_policy_data *pd = p->priv;
+ if (pd->table->owner)
+ module_put(pd->table->owner);
+}
+
+static const struct pm_loss_policy_ops default_policy_ops = {
+ .bus_added = pm_loss_default_policy_bus_added,
+ .bus_removed = pm_loss_default_policy_bus_removed,
+ .power_changed = pm_loss_default_policy_sys_power_changed,
+ .start = pm_loss_default_policy_start,
+ .stop = pm_loss_default_policy_stop,
+};
+
+/* sysfs stuff */
+
+struct dflt_bus_table_attribute {
+ struct attribute attr;
+ ssize_t (*show)(struct pm_loss_policy *,
+ struct dflt_bus_table_attribute *attr,
+ char *buf);
+};
+#define to_dflt_bus_table_attr(x) \
+container_of(x, struct dflt_bus_table_attribute, attr)
+
+
+static ssize_t bus_table_show(struct pm_loss_policy *p,
+ struct dflt_bus_table_attribute *attr,
+ char *buf)
+{
+ struct pm_loss_default_policy_data *pd = p->priv;
+ struct pm_loss_default_policy_bus_data *bd ;
+ char *ptr = buf;
+
+ BUG_ON(!pd);
+
+ list_for_each_entry(bd, &pd->busses, list) {
+ struct bus_type *bus = bd->bus;
+ ptr += snprintf(ptr, PAGE_SIZE - (ptr - buf), "%s:%d,",
+ bus->name, bd->priority);
+ }
+ return ptr - buf;
+}
+
+static struct dflt_bus_table_attribute bus_table_attribute =
+__ATTR_RO(bus_table);
+
+
+static struct attribute *dflt_default_attrs[] = {
+ &bus_table_attribute.attr,
+ NULL,
+};
+
+/* dflt sysfs ops */
+
+static ssize_t dflt_show(struct kobject *kobj, struct attribute *_attr,
+ char *buf)
+{
+ if (!strcmp(_attr->name, "bus_table")) {
+ struct dflt_bus_table_attribute *attr =
+ to_dflt_bus_table_attr(_attr);
+ struct pm_loss_policy *p =
+ to_policy(kobj);
+ return attr->show(p, attr, buf);
+ }
+ return -EIO;
+}
+
+static ssize_t dflt_store(struct kobject *kobj,
+ struct attribute *attr,
+ const char *buf, size_t len)
+{
+ return -EIO;
+}
+
+static const struct sysfs_ops dflt_sysfs_ops = {
+ .show = dflt_show,
+ .store = dflt_store,
+};
+
+static struct kobj_type dflt_ktype = {
+ .sysfs_ops = &dflt_sysfs_ops,
+ .release = pm_loss_policy_release,
+ .default_attrs = dflt_default_attrs,
+};
+
+struct pm_loss_policy *
+pm_loss_setup_default_policy(struct pm_loss_default_policy_table *table)
+{
+ struct pm_loss_policy *p ;
+ struct pm_loss_default_policy_data *data ;
+
+ if (!table)
+ return NULL;
+
+ if (table->owner && try_module_get(table->owner) < 0)
+ return NULL;
+
+ data = kzalloc(sizeof(struct pm_loss_default_policy_data), GFP_KERNEL);
+ if (!data)
+ return NULL;
+ INIT_LIST_HEAD(&data->busses);
+ data->table = table;
+
+ p = pm_loss_register_policy(&default_policy_ops, &dflt_ktype,
+ table->name, data);
+ if (!p) {
+ kfree(data);
+ return p;
+ }
+ return p;
+}
+EXPORT_SYMBOL(pm_loss_setup_default_policy);
+
+int pm_loss_shutdown_default_policy(struct pm_loss_policy *p)
+{
+ int ret;
+ pr_debug_pm_loss("pm_loss_shutdown_default_policy()\n");
+ ret = pm_loss_unregister_policy(p);
+ if (ret < 0)
+ pr_debug_pm_loss("cannot unregister policy\n");
+ return ret;
+}
+EXPORT_SYMBOL(pm_loss_shutdown_default_policy);
+
+static ssize_t active_policy_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ return sprintf(buf, "%s\n",
+ kobject_name(&power_state.active_policy->kobj));
+}
+
+
+static ssize_t active_policy_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ char pname[MAX_POLICY_NAME_LENGHT];
+ int ret;
+
+ sscanf(buf, "%" xstr(MAX_POLICY_NAME_LENGHT) "s", pname);
+ ret = pm_loss_set_policy(pname);
+ return ret < 0 ? ret : count;
+}
+
+static struct kobj_attribute active_policy_attribute =
+ __ATTR(active_policy, 0644, active_policy_show, active_policy_store);
+
+#ifdef CONFIG_PM_LOSS_SIM
+static ssize_t pwrfail_sim_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ int n;
+ sscanf(buf, "%d", &n);
+ pm_loss_power_changed(n ? SYS_PWR_FAILING : SYS_PWR_GOOD);
+ return count;
+}
+
+
+static struct kobj_attribute pwrfail_sim_attribute =
+ __ATTR(pwrfail_sim, 0200, NULL, pwrfail_sim_store);
+#endif
+
+
+static struct attribute *loss_attrs[] = {
+ &active_policy_attribute.attr,
+#ifdef CONFIG_PM_LOSS_SIM
+ &pwrfail_sim_attribute.attr,
+#endif
+ NULL,
+};
+
+static struct attribute_group loss_attr_group = {
+ .attrs = loss_attrs,
+};
+
+/*
+ * Init function
+ */
+int __init pm_loss_init(void)
+{
+ struct pm_loss_policy *p ;
+ int ret = 0;
+
+ pr_debug_pm_loss("pm_loss_init()\n");
+ spin_lock_init(&power_state.lock);
+ power_state.state = SYS_PWR_GOOD;
+ init_waitqueue_head(&power_state.wq);
+ power_state.active_policy = NULL;
+ power_state.loss_kobj = kobject_create_and_add("loss", power_kobj);
+ if (!power_state.loss_kobj) {
+ ret = -ENOMEM;
+ goto err0;
+ }
+ power_state.policies =
+ kset_create_and_add("policies", NULL, power_state.loss_kobj);
+ if (!power_state.policies) {
+ ret = -ENOMEM;
+ goto err1;
+ }
+ ret = sysfs_create_group(power_state.loss_kobj, &loss_attr_group);
+ if (ret < 0)
+ goto err2;
+ pr_debug_pm_loss("invoking pm_loss_register_policy (nop)\n");
+ p = pm_loss_register_policy(&nop_policy_ops, &nop_ktype, "nop", NULL);
+ BUG_ON(!p);
+ pr_debug_pm_loss("setting nop policy as current one\n");
+ pm_loss_set_policy("nop");
+ return ret;
+
+ err2:
+ kset_unregister(power_state.policies);
+ err1:
+ kobject_put(power_state.loss_kobj);
+ err0:
+ return ret;
+}
+
+core_initcall(pm_loss_init);
diff --git a/drivers/base/power/power.h b/drivers/base/power/power.h
index 698dde7..ad54ef5 100644
--- a/drivers/base/power/power.h
+++ b/drivers/base/power/power.h
@@ -10,6 +10,20 @@ static inline void pm_runtime_remove(struct device *dev) {}

#endif /* !CONFIG_PM_RUNTIME */

+#ifdef CONFIG_PM_LOSS
+
+extern void pm_loss_init(void) ;
+extern void pm_loss_on_bus_added(struct bus_type *);
+extern void pm_loss_on_bus_removed(struct bus_type *);
+
+#else /* !CONFIG_PM_LOSS */
+
+static inline void pm_loss_init(void) {}
+static inline void pm_loss_on_bus_added(struct bus_type *b) {};
+static inline void pm_loss_on_bus_removed(struct bus_type *b) {};
+
+#endif
+
#ifdef CONFIG_PM_SLEEP

/* kernel/power/main.c */
diff --git a/include/linux/pm.h b/include/linux/pm.h
index 21415cc..0316cd0 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -40,6 +40,7 @@ extern void (*pm_power_off_prepare)(void);
*/

struct device;
+enum sys_power_state ;

#ifdef CONFIG_PM
extern const char power_group_name[]; /* = "power" */
@@ -199,6 +200,10 @@ typedef struct pm_message {
* power state if all of the necessary conditions are satisfied. Check
* these conditions and handle the device as appropriate, possibly queueing
* a suspend request for it. The return value is ignored by the PM core.
+ *
+ * @power_changed: device power has changed state (used by pm_loss). This
+ * is invoked whenever system power changes its state (from GOOD to
+ * FAILING or viceversa)
*/

struct dev_pm_ops {
@@ -219,6 +224,7 @@ struct dev_pm_ops {
int (*runtime_suspend)(struct device *dev);
int (*runtime_resume)(struct device *dev);
int (*runtime_idle)(struct device *dev);
+ int (*power_changed)(struct device *, enum sys_power_state);
};

#ifdef CONFIG_PM_SLEEP
@@ -421,6 +427,16 @@ enum rpm_request {

struct wakeup_source;

+/**
+ * System power states
+ *
+ */
+enum sys_power_state {
+ SYS_PWR_GOOD = 0,
+ SYS_PWR_FAILING,
+ SYS_PWR_NOTIFYING,
+};
+
struct dev_pm_info {
pm_message_t power_state;
unsigned int can_wakeup:1;
diff --git a/include/linux/pm_loss.h b/include/linux/pm_loss.h
new file mode 100644
index 0000000..46af47c
--- /dev/null
+++ b/include/linux/pm_loss.h
@@ -0,0 +1,109 @@
+/*
+ * pm_loss.h - Power loss management related functions, header file
+ *
+ * Copyright (C) 2011 Davide Ciminaghi <[email protected]>
+ * Copyright (C) 2011 BTicino S.p.A.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _LINUX_PM_LOSS_H
+#define _LINUX_PM_LOSS_H
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+struct pm_loss_policy ;
+
+struct pm_loss_policy_ops {
+ int (*bus_added)(struct pm_loss_policy *,
+ struct bus_type *);
+ int (*bus_removed)(struct pm_loss_policy *,
+ struct bus_type *);
+ void (*power_changed)(struct pm_loss_policy *,
+ enum sys_power_state);
+ int (*start)(struct pm_loss_policy *);
+ void (*stop)(struct pm_loss_policy *);
+};
+
+struct pm_loss_policy {
+ const struct pm_loss_policy_ops *ops;
+ void *priv;
+ struct list_head list;
+ struct kobject kobj;
+};
+
+#define to_policy(k) container_of(k, struct pm_loss_policy, kobj)
+
+struct pm_loss_default_policy_table {
+ struct module *owner ;
+ const char *name ;
+ struct pm_loss_default_policy_item *items;
+ int nitems;
+};
+
+#ifdef CONFIG_PM_LOSS
+
+#ifdef CONFIG_PM_LOSS_DEBUG
+#define PM_LOSS_STR "pm_loss:"
+#define pr_debug_pm_loss(a, args...) \
+printk(KERN_INFO PM_LOSS_STR a, ## args)
+#else
+#define pr_debug_pm_loss(a, args...)
+#endif
+
+extern void pm_loss_power_changed(enum sys_power_state s);
+
+extern struct pm_loss_policy *
+pm_loss_register_policy(const struct pm_loss_policy_ops *ops,
+ struct kobj_type *ktype,
+ const char *name, void *priv);
+
+extern int pm_loss_unregister_policy(struct pm_loss_policy *);
+
+extern int pm_loss_set_policy(const char *name);
+
+struct pm_loss_default_policy_item {
+ const char *bus_name;
+ int bus_priority ;
+};
+
+extern struct pm_loss_policy *
+pm_loss_setup_default_policy(struct pm_loss_default_policy_table *);
+
+extern int pm_loss_shutdown_default_policy(struct pm_loss_policy *);
+
+#else /* !CONFIG_PM_LOSS */
+
+static inline struct pm_loss_policy *
+pm_loss_register_policy(const struct pm_loss_policy_ops *ops,
+ void *priv)
+{
+ return NULL ;
+}
+
+static inline int pm_loss_unregister_policy(struct pm_loss_policy *p)
+{
+ return -ENOSYS ;
+}
+
+static inline int pm_loss_set_policy(struct pm_loss_policy *p)
+{
+ return -ENOSYS ;
+}
+
+static inline struct pm_loss_policy *
+pm_loss_setup_default_policy(struct pm_loss_default_policy_table *t)
+{
+ return NULL;
+}
+
+static inline int
+pm_loss_shutdown_default_policy(struct pm_loss_policy *p)
+{
+ return -ENOSYS;
+}
+
+#endif /* !CONFIG_PM_LOSS */
+
+#endif /* _LINUX_PM_LOSS_H */
diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
index 2657299..7527847 100644
--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -236,6 +236,33 @@ config PM_RUNTIME
responsible for the actual handling of the autosuspend requests and
wake-up events.

+config PM_LOSS
+ bool "Help drivers reacting to system power losses"
+ depends on PM
+ ---help---
+ Enable power loss management. On some embedded systems, an
+ asynchronous power failure notification event is sometimes available
+ less than one second before the actual power failure takes place.
+ Custom policies can be setup to react to such an event. See
+ Documentation/power/loss.txt for more details.
+ If unsure, say No.
+
+config PM_LOSS_SIM
+ bool "Simulate power fail events"
+ depends on PM_LOSS
+ ---help---
+ Add the /sys/power/loss/pwrfail_sim file. Do
+ echo 1 > /sys/power/loss/pwrfail_sim
+ to simulate a power failure and
+ echo 0 > /sys/power/loss/pwrfail_sim
+ to simulate a power resume
+
+config PM_LOSS_DEBUG
+ bool "Enable power fail debug"
+ depends on PM_LOSS
+ ---help---
+ This option enables debugging printk's for the PM_LOSS framework
+
config PM_OPS
bool
depends on PM_SLEEP || PM_RUNTIME
--
1.7.0.4

2011-05-12 17:11:31

by Raffaele Recalcati

[permalink] [raw]
Subject: [PATCH 3/4] mmc: bus and block device drivers: support for pm_loss

From: Davide Ciminaghi <[email protected]>

Signed-off-by: Davide Ciminaghi <[email protected]>
Signed-off-by: Raffaele Recalcati <[email protected]>
---
drivers/mmc/card/block.c | 48 +++++++++++++++++++++++++++++++++++++++++++-
drivers/mmc/core/bus.c | 49 +++++++++++++++++++++++++++++++++++++++++----
2 files changed, 90 insertions(+), 7 deletions(-)

diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index bfc8a8a..c88afef 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -37,6 +37,9 @@
#include <linux/mmc/mmc.h>
#include <linux/mmc/sd.h>

+#include <linux/pm.h>
+#include <linux/pm_loss.h>
+
#include <asm/system.h>
#include <asm/uaccess.h>

@@ -755,14 +758,55 @@ static int mmc_blk_resume(struct mmc_card *card)
}
return 0;
}
-#else
+
+#ifdef CONFIG_PM_LOSS
+
+#define dev_to_mmc_card(d) container_of(d, struct mmc_card, dev)
+
+static int mmc_blk_power_changed(struct device *dev,
+ enum sys_power_state s)
+{
+ struct mmc_card *card = dev_to_mmc_card(dev);
+ struct mmc_blk_data *md = mmc_get_drvdata(card);
+
+ switch (s) {
+ case SYS_PWR_GOOD:
+ pr_debug_pm_loss("mmc_blk_power_changed(%d)\n", s);
+ mmc_blk_set_blksize(md, card);
+ mmc_queue_resume(&md->queue);
+ break;
+ case SYS_PWR_FAILING:
+ pr_debug_pm_loss("mmc_blk_power_changed(%d)\n", s);
+ mmc_queue_suspend(&md->queue);
+ break;
+ default:
+ BUG();
+ }
+ return 0;
+}
+
+static const struct dev_pm_ops mmc_blk_pm_ops = {
+ .power_changed = mmc_blk_power_changed,
+};
+
+#define MMC_BLK_DEV_PM_OPS_PTR (&mmc_blk_pm_ops)
+
+#else /* !CONFIG_PM_LOSS */
+
+#define MMC_BLK_DEV_PM_OPS_PTR NULL
+
+#endif /* !CONFIG_PM_LOSS */
+
+#else /* !CONFIG_PM */
#define mmc_blk_suspend NULL
#define mmc_blk_resume NULL
-#endif
+#define MMC_BLK_DEV_PM_OPS_PTR NULL
+#endif /* !CONFIG_PM */

static struct mmc_driver mmc_driver = {
.drv = {
.name = "mmcblk",
+ .pm = MMC_BLK_DEV_PM_OPS_PTR,
},
.probe = mmc_blk_probe,
.remove = mmc_blk_remove,
diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c
index 63667a8..548d3a9 100644
--- a/drivers/mmc/core/bus.c
+++ b/drivers/mmc/core/bus.c
@@ -19,6 +19,8 @@
#include <linux/mmc/card.h>
#include <linux/mmc/host.h>

+#include <linux/pm_loss.h>
+
#include "core.h"
#include "sdio_cis.h"
#include "bus.h"
@@ -163,19 +165,56 @@ static int mmc_runtime_idle(struct device *dev)
return pm_runtime_suspend(dev);
}

+#define MMC_PM_RUNTIME_OPS_INIT \
+.runtime_suspend = mmc_runtime_suspend, \
+.runtime_resume = mmc_runtime_resume, \
+.runtime_idle = mmc_runtime_idle,
+
+#else /* !CONFIG_PM_RUNTIME */
+
+#define MMC_PM_RUNTIME_OPS_INIT
+
+#endif /* !CONFIG_PM_RUNTIME */
+
+#ifdef CONFIG_PM_LOSS
+
+static int mmc_bus_power_changed(struct device *dev,
+ enum sys_power_state s)
+{
+ int ret = 0;
+
+ pr_debug_pm_loss("mmc_bus_power_changed()\n");
+
+ if (dev->driver && dev->driver->pm &&
+ dev->driver->pm->power_changed)
+ ret = dev->driver->pm->power_changed(dev, s);
+
+ return ret;
+}
+
+#define MMC_PM_LOSS_OPS_INIT \
+.power_changed = mmc_bus_power_changed,
+
+#else /* !CONFIG_PM_LOSS */
+
+#define MMC_PM_LOSS_OPS_INIT
+
+#endif /* !CONFIG_PM_LOSS */
+
+#if defined CONFIG_PM_RUNTIME || defined CONFIG_PM_LOSS
+
static const struct dev_pm_ops mmc_bus_pm_ops = {
- .runtime_suspend = mmc_runtime_suspend,
- .runtime_resume = mmc_runtime_resume,
- .runtime_idle = mmc_runtime_idle,
+ MMC_PM_RUNTIME_OPS_INIT
+ MMC_PM_LOSS_OPS_INIT
};

#define MMC_PM_OPS_PTR (&mmc_bus_pm_ops)

-#else /* !CONFIG_PM_RUNTIME */
+#else /* !(CONFIG_PM_RUNTIME || CONFIG_PM_LOSS) */

#define MMC_PM_OPS_PTR NULL

-#endif /* !CONFIG_PM_RUNTIME */
+#endif /* !(CONFIG_PM_RUNTIME || CONFIG_PM_LOSS) */

static struct bus_type mmc_bus_type = {
.name = "mmc",
--
1.7.0.4

2011-05-12 17:11:33

by Raffaele Recalcati

[permalink] [raw]
Subject: [PATCH 4/4] DaVinci: vpfe: support for pm_loss

From: Raffaele Recalcati <[email protected]>

Signed-off-by: Raffaele Recalcati <[email protected]>
---
drivers/media/video/davinci/vpfe_capture.c | 45 ++++++++++++++++++++++++++++
1 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/drivers/media/video/davinci/vpfe_capture.c b/drivers/media/video/davinci/vpfe_capture.c
index 353eada..21fa9bf 100644
--- a/drivers/media/video/davinci/vpfe_capture.c
+++ b/drivers/media/video/davinci/vpfe_capture.c
@@ -74,6 +74,8 @@
#include <media/v4l2-common.h>
#include <linux/io.h>
#include <media/davinci/vpfe_capture.h>
+#include <linux/pm.h>
+#include <linux/pm_loss.h>
#include "ccdc_hw_device.h"

static int debug;
@@ -2051,6 +2053,46 @@ static int __devexit vpfe_remove(struct platform_device *pdev)
return 0;
}

+#ifdef CONFIG_PM_LOSS
+static int vpfe_capture_power_changed(struct device *dev,
+ enum sys_power_state s)
+{
+ int ret;
+ struct vpfe_device *vpfe_dev = dev_get_drvdata(dev);
+ struct vpfe_subdev_info *sdinfo = vpfe_dev->current_subdev;
+
+ if (!sdinfo)
+ return -EINVAL;
+
+ if (!vpfe_dev->started) {
+ pr_debug_pm_loss("vpfe_capture_power_changed(%d) "
+ "BUT NOTHING TO DO\n", s);
+ return 0;
+ }
+
+ sdinfo = vpfe_dev->current_subdev;
+
+ switch (s) {
+ case SYS_PWR_GOOD:
+ pr_debug_pm_loss("vpfe_capture_power_changed(%d)\n", s);
+ ret = v4l2_device_call_until_err(&vpfe_dev->v4l2_dev,
+ sdinfo->grp_id,
+ video, s_stream, 1);
+ break;
+ case SYS_PWR_FAILING:
+ pr_debug_pm_loss("vpfe_capture_power_changed(%d)\n", s);
+ ret = v4l2_device_call_until_err(&vpfe_dev->v4l2_dev,
+ sdinfo->grp_id,
+ video, s_stream, 0);
+ break;
+ default:
+ BUG();
+ ret = -ENODEV;
+ }
+ return ret;
+}
+#endif /* CONFIG_PM_LOSS */
+
static int vpfe_suspend(struct device *dev)
{
return 0;
@@ -2064,6 +2106,9 @@ static int vpfe_resume(struct device *dev)
static const struct dev_pm_ops vpfe_dev_pm_ops = {
.suspend = vpfe_suspend,
.resume = vpfe_resume,
+#ifdef CONFIG_PM_LOSS
+ .power_changed = vpfe_capture_power_changed,
+#endif
};

static struct platform_driver vpfe_driver = {
--
1.7.0.4

2011-05-12 19:27:27

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: pm loss development

On Thursday, May 12, 2011, Raffaele Recalcati wrote:
> What happen normally in runtime pm implementation is that every devices
> are switched off and are enabled only when needed.
> In our case instead we have a completely functional embedded system and,
> when an asyncrhonous event appear, we have only some tens milliseconds
> before the actual power failure takes place.
> This patchset add a support in order to switch off not vital part of the system,
> in order to allow the board to survive longer.
> This allow the possibility to save important data.

OK, so first, who decides what parts of the system are vital and what aren't?

Thanks,
Rafael

2011-05-12 19:28:19

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 1/4] export bus_kset

On Thursday, May 12, 2011, Raffaele Recalcati wrote:
> From: Davide Ciminaghi <[email protected]>

Please explain why you need to export it, what the alternatives are and
why you think this approach is better than the alternatives.

Thanks,
Rafael


> Signed-off-by: Davide Ciminaghi <[email protected]>
> ---
> drivers/base/bus.c | 3 ++-
> include/linux/kobject.h | 2 ++
> 2 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/base/bus.c b/drivers/base/bus.c
> index 000e7b2..2134248 100644
> --- a/drivers/base/bus.c
> +++ b/drivers/base/bus.c
> @@ -158,7 +158,8 @@ static const struct kset_uevent_ops bus_uevent_ops = {
> .filter = bus_uevent_filter,
> };
>
> -static struct kset *bus_kset;
> +struct kset *bus_kset;
> +EXPORT_SYMBOL(bus_kset);
>
>
> #ifdef CONFIG_HOTPLUG
> diff --git a/include/linux/kobject.h b/include/linux/kobject.h
> index 8f6d121..456b20d 100644
> --- a/include/linux/kobject.h
> +++ b/include/linux/kobject.h
> @@ -205,6 +205,8 @@ extern struct kobject *power_kobj;
> /* The global /sys/firmware/ kobject for people to chain off of */
> extern struct kobject *firmware_kobj;
>
> +extern struct kset *bus_kset ;
> +
> #if defined(CONFIG_HOTPLUG)
> int kobject_uevent(struct kobject *kobj, enum kobject_action action);
> int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
>

2011-05-12 19:57:15

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 2/4] PM / Loss: power loss management

Hi,

On Thursday, May 12, 2011, Raffaele Recalcati wrote:
> From: Davide Ciminaghi <[email protected]>
>
> On some embedded systems, an asynchronous power failure notification event is
> available some tens to hundreds milliseconds before the actual power failure
> takes place.
> Such an event can be used to trigger some actions, typically shutting down
> all non-vital power sinks, thus allowing the board to survive longer to
> temporary power losses.
>
> Signed-off-by: Davide Ciminaghi <[email protected]>
> Signed-off-by: Raffaele Recalcati <[email protected]>
> ---
> Documentation/power/loss.txt | 191 +++++++++++++
> drivers/base/bus.c | 6 +
> drivers/base/init.c | 1 +
> drivers/base/platform.c | 14 +
> drivers/base/power/Makefile | 2 +
> drivers/base/power/loss.c | 648 ++++++++++++++++++++++++++++++++++++++++++
> drivers/base/power/power.h | 14 +
> include/linux/pm.h | 16 +
> include/linux/pm_loss.h | 109 +++++++
> kernel/power/Kconfig | 27 ++
> 10 files changed, 1028 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/power/loss.txt
> create mode 100644 drivers/base/power/loss.c
> create mode 100644 include/linux/pm_loss.h
>
> diff --git a/Documentation/power/loss.txt b/Documentation/power/loss.txt
> new file mode 100644
> index 0000000..10da89c
> --- /dev/null
> +++ b/Documentation/power/loss.txt
> @@ -0,0 +1,191 @@
> +**** POWER LOSS MANAGEMENT ****
> +
> +Davide Ciminaghi <[email protected]> 2011
> +
> +1. Overview
> +
> +On some embedded systems, an asynchronous power failure notification event is
> +available some tens to hundreds milliseconds before the actual power failure
> +takes place.
> +Such an event can be used to trigger some actions, typically shutting down
> +all non-vital power sinks, thus allowing the board to survive longer to
> +temporary power losses. Sometimes, also flash-based block devices can be
> +stopped after a power loss event notification has been received. This should
> +be useful for mmc devices, for which a sudden power failure while a write
> +command is being executed can threaten file system integrity even in case a
> +journalling fs is in use.
> +Generally speaking, one can expect the course of action taken when a power
> +failure warning is received to be deeply system specific. Similarly, the
> +mechanism used for power failure notifications can equally be board/platform
> +specific. For these reasons, support for power loss management has been split
> +into three parts:
> +
> + + Generic code (board and driver independent).
> + + Board dependent code.
> + + Power loss policy definition.
> +
> +The generic part of the code is located under drivers/base/power/loss.c .
> +On the other hand, board dependent code and power loss policies definitions
> +should live somewhere in the platform/board specific files.

I'm not really sure about that. Do you want to hardcode policies in some
platform/board-specific files inside of the kernel?

> +The header file include/linux/pm_loss.h contains all the pm_loss function
> +prototypes, together with definitions of data structures.
> +For what concerns power loss policies, the framework already contains a couple
> +of predefined policies: "nop" and "default" (see later paragraphs for more
> +details).

I think it would be cleaner to split the patch so that the actual functionality
is added first and the policy part goes in a separate patch on top of that.

> +
> +1.1 Sysfs interface.
> +
> +It can be useful (for instance during tests), to be able to quickly switch
> +from one power loss policy to another, or to simulate power fail and resume
> +events. To this end, a sysfs interface of the following form has been devised:
> +
> +/sys/power/loss +
> + |
> + +-- active_policy
> + |
> + +-- policies -+
> + | |
> + | +-- nop
> + | |
> + | +-- default +
> + | | |
> + | | +-- bus_table
> + | |
> + | +-- ....
> + |
> + +-- pwrfail_sim
> +
> +2. Details
> +
> +2.1 Sending events to the power loss management core.
> +
> +The board specific code shall trigger a power failure event notification by
> +calling pm_loss_power_changed(SYS_PWR_FAILING).
> +In case of a temporary power loss, the same function shall be called with
> +SYS_PWR_GOOD argument on power resume. pm_loss_power_changed() can sleep, so
> +it shall not be called from interrupt context.
> +
> +2.3 Effects on bus and device drivers.
> +
> +One more entry has been added to the device PM callbacks:
> +
> + int (*power_changed)(struct device *, enum sys_power_state);

Please don't use the second argument. Make it two (or as many as you need)
callbacks each with one struct device * argument instead (following the
convention we have in struct dev_pm_ops already).

That said, I'm not quite sure we _need_ those additional callbacks at all.
Your example mmc implementation appears to use a simple "runtime suspend on
power fail, runtime resume on power good" approach, for which it's not
necessary to add any new callbacks.

> +This function can be optionally invoked by power loss policies in case of
> +system power changes (loss and/or resume). Of course every bus or device driver
> +can react to such events in some specific way. For instance the mmc block
> +driver will probably block its request queue during a temporary power loss.

I think you'd have to deal with user space somehow before suspending devices.
Runtime PM suspends devices when they aren't in use (as indicated by the
usage counters), but in this power loss case we may really end up suspending
devices that _are_ in use, right?

> +2.3.1 The platform bus.
> +
> +For what concerns platform bus drivers, platform specific code can override
> +the power_changed() callback :
> +
> +platform_pm_loss_power_changed(struct device *dev, enum sys_power_state s)
> +
> +whose default (empty) version is defined as a __weak symbol in
> +drivers/base/platform.c.

Please don't use any more __weak symbols in that file.

> +2.4 Power loss policies.
> +
> +A power loss policy can be registered via the pm_loss_register_policy()
> +function, which receives:
> +
> + + A pointer to a struct pm_loss_policy_ops , which hosts the pointers
> + to the policy's specific callbacks (see below for more details).
> + + A pointer to a struct kobj_type, which will allow the pm loss core
> + to setup the policy related directory under /sys/power/loss/policies.
> + See Documentation/kobject.txt for more details on struct kobj_type.
> + + A pointer to an array of chars containing the name of the policy.
> + + A pointer to the policy's private data (void *).

This sounds like quilte a bit of new infrastructure and it seems it might
be implemented in a more straightforward way (presumably in a less general
but still useful manner).

> +A power loss policy can define the following callbacks:
> +
> + + int (*bus_added)(struct pm_loss_policy *, struct bus_type *) : this
> + is invoked whenever a new bus has been registered within the system.
> + Since power related events are often managed at bus level, it can be
> + useful for the policy to have a list of available busses within the
> + system. When a policy is registered, this callback is invoked once
> + for every already registered bus.

Well, we have a list of registered bus types anyway. They may or may not
register "power loss" callbacks. Why don't you simply execute the callbacks
from the bus types that defined them? What exactly is the purpose of the
new list?

> + + int (*bus_removed)(struct pm_loss_policy *, struct bus_type *):
> + this is invoked when a bus is removed from the system.
> + + int (*start)(struct pm_loss_policy *): this is called when a policy
> + becomes active.
> + + void (*stop)(struct pm_loss_policy *): this is called when a policy
> + becomes inactive.

What are the start/stop callbacks needed for? IOW, what's your envisioned
use case for those callbacks?

> +2.4.1 The nop predefined policy
> +
> +The nop policy is the first active policy when the pm loss framework is
> +initialized. It just does nothing in case of power loss / power resume
> +events.
> +
> +2.4.2 The default predefined policy
> +
> +When a power failure warning is received, the default policy walks through a
> +list of critical busses and invokes their drivers' power_changed() callback.

Why not to walk all bus types here?

> +The default policy can be customized and registered by calling:
> +
> + pm_loss_setup_default_policy(struct pm_loss_default_policy_table *);
> +
> +This function receives a pointer to a pm_loss_default_policy_table structure,
> +which defines a priority ordered list of critical buffers:
> +
> + struct pm_loss_default_policy_table {
> + struct module *owner ;
> + const char *name ;
> + struct pm_loss_default_policy_item *items;
> + int nitems;
> + };
> +
> +Here's a short description of such structure:
> +
> + + owner : shall point to the module registering the table (or NULL
> + if the table is not instantiated by a module).
> + + name : this is the name given to this particular customization of
> + the default policy.
> + + items : pointer to an array of table items.
> + + nitems : number of the items in the array.
> +
> +Each item is a struct pm_loss_default_policy_item, defined as follows:
> +
> + struct pm_loss_default_policy_item {
> + const char *bus_name;
> + int bus_priority ;
> + };
> +
> +where bus_name is the name of a bus and bus_priority is a numerical priority
> +of the bus itself. Numerically higher priorities correspond to more prioritary
> +busses.

So I can understand the need for priorities, but why do you use the name
instead of a pointer to struct bus_type?

> +
> +2.4.3 Activating a specific policy.
> +
> +A policy can be activated:
> +
> + + From within the kernel by calling pm_loss_set_policy(char *name).
> + The argument passed to this function shall be the name of the policy
> + to be activated.
> +
> + + From user space by writing the name of the policy to be activated
> + to /sys/power/loss/active_policy.
> +
> +2.4.4 Unregistering a policy
> +
> +For a generic user defined policy, just call :
> +
> + pm_loss_unregister_policy(struct pm_loss_policy *);
> +
> +For a default policy customization:
> +
> + pm_loss_shutdown_default_policy(struct pm_loss_policy *);
> +
> +3. Kernel configuration
> +
> +The following configuration options have been defined:
> +
> + CONFIG_PM_LOSS : this enables the generic pm_loss framework.
> + CONFIG_PM_LOSS_SIM : this adds the pwrfail_sim file to the sysfs interface
> + and allows power loss events simulation.
> + CONFIG_PM_LOSS_DEBUG : this option enables some debug printk's .
> + CONFIG_PM_LOSS_TEST: this enables the compilation of a sample test module
> + containing a customized default policy definition

OK, so I think that this feature may be kind of useful, but it will require
some work before it's ready to go into the kernel.

Please separate the policy part for starters and let's see what's left.

Thanks,
Rafael

2011-05-13 06:39:44

by Raffaele Recalcati

[permalink] [raw]
Subject: Re: pm loss development

Hi Rafael,

2011/5/12 Rafael J. Wysocki <[email protected]>:
> On Thursday, May 12, 2011, Raffaele Recalcati wrote:
>> What happen normally in runtime pm implementation is that every devices
>> are switched off and are enabled only when needed.
>> In our case instead we have a completely functional embedded system and,
>> when an asyncrhonous event appear, we have only some tens milliseconds
>> before the actual power failure takes place.
>> This patchset add a support in order to switch off not vital part of the system,
>> in order to allow the board to survive longer.
>> This allow the possibility to save important data.
>
> OK, so first, who decides what parts of the system are vital and what aren't?

Take a quick look at Documentation/power/loss.txt paragrpah "2.4
Power loss policies".
You can decide what can be powered off.

Thanks,
Raffaele



--
http://www.opensurf.it

2011-05-13 13:14:19

by Davide Ciminaghi

[permalink] [raw]
Subject: Re: [PATCH 2/4] PM / Loss: power loss management

On Thu, May 12, 2011 at 09:57:46PM +0200, Rafael J. Wysocki wrote:
> Hi,
>
> On Thursday, May 12, 2011, Raffaele Recalcati wrote:
> > From: Davide Ciminaghi <[email protected]>
> >

hello, I (Davide) will reply as the patch author.

....................
>
> I'm not really sure about that. Do you want to hardcode policies in some
> platform/board-specific files inside of the kernel?
>

I think a little history of the patch can help: it looks like recent mmc
devices can do any sort of weird things in case of sudden power losses during
write operations. This is still to be confirmed by actual tests on our
particular hardware, but we suspect that even a journalling fs could be
corrupted in such a case.
Stopping the mmc request queue when a power loss is detected, does not
completely solve the problem, but at least mitigates the risk of disasters.
Plus, we needed to get our board to immediately turn off any unneeded power
sink in case of temporary power losses (just try to survive as long as you
can ...).
Our hardware triggers an interrupt about 100 millisenconds before the cpu
dies, so we needed to notify the drivers about the event as fast as we could.
That was one of the two main reasons why we chose to put this stuff inside
the kernel. The other reason was that the mmc request queue can be easily
stopped from kernel space.
Since forcing a given policy seemed the wrong thing to do, we chose to adopt
a modular policy approach.

> > +The header file include/linux/pm_loss.h contains all the pm_loss function
> > +prototypes, together with definitions of data structures.
> > +For what concerns power loss policies, the framework already contains a couple
> > +of predefined policies: "nop" and "default" (see later paragraphs for more
> > +details).
>
> I think it would be cleaner to split the patch so that the actual functionality
> is added first and the policy part goes in a separate patch on top of that.
>

agreed.

> > +
> > +1.1 Sysfs interface.
> > +
> > +It can be useful (for instance during tests), to be able to quickly switch
> > +from one power loss policy to another, or to simulate power fail and resume
> > +events. To this end, a sysfs interface of the following form has been devised:
> > +
> > +/sys/power/loss +
> > + |
> > + +-- active_policy
> > + |
> > + +-- policies -+
> > + | |
> > + | +-- nop
> > + | |
> > + | +-- default +
> > + | | |
> > + | | +-- bus_table
> > + | |
> > + | +-- ....
> > + |
> > + +-- pwrfail_sim
> > +
> > +2. Details
> > +
> > +2.1 Sending events to the power loss management core.
> > +
> > +The board specific code shall trigger a power failure event notification by
> > +calling pm_loss_power_changed(SYS_PWR_FAILING).
> > +In case of a temporary power loss, the same function shall be called with
> > +SYS_PWR_GOOD argument on power resume. pm_loss_power_changed() can sleep, so
> > +it shall not be called from interrupt context.
> > +
> > +2.3 Effects on bus and device drivers.
> > +
> > +One more entry has been added to the device PM callbacks:
> > +
> > + int (*power_changed)(struct device *, enum sys_power_state);
>
> Please don't use the second argument. Make it two (or as many as you need)
> callbacks each with one struct device * argument instead (following the
> convention we have in struct dev_pm_ops already).
>
> That said, I'm not quite sure we _need_ those additional callbacks at all.
> Your example mmc implementation appears to use a simple "runtime suspend on
> power fail, runtime resume on power good" approach, for which it's not
> necessary to add any new callbacks.
>

maybe this is design choice comes from our little knowledge of pm_runtime,
but the reason why we didn't use the existing callbacks was that we wanted
to avoid any possible conflict.
In our understanding, pm_runtime works more or less like this: "when
you're done with a peripheral, turn it off" (well, maybe also wait a little
while to avoid continuously toggling power on and off, thus wasting more
power than just doing nothing). This is because its goal is to save power
during normal operation.
On the other hand, we needed something like this: "when some external event
takes place, just turn off anything you can as fast as you can".
Actually, one of the attempts we made before adding pm_loss was to implement
it via pm_runtime, by just immediately stopping devices on idle and refusing
to turn them on again in case a power loss event took place in the meanwhile.
This worked, and was ok for drivers with no implementation of the pm_runtime
callbacks, but introduced a potential conflict as far as pm_runtime enabled
drivers were concerned. That's why we (maybe wrongfully) thought a new
callback was needed.

> > +This function can be optionally invoked by power loss policies in case of
> > +system power changes (loss and/or resume). Of course every bus or device driver
> > +can react to such events in some specific way. For instance the mmc block
> > +driver will probably block its request queue during a temporary power loss.
>
> I think you'd have to deal with user space somehow before suspending devices.
> Runtime PM suspends devices when they aren't in use (as indicated by the
> usage counters), but in this power loss case we may really end up suspending
> devices that _are_ in use, right?
>
yes. This is not a problem for us right now, because when a power loss happens,
the "power bad" condition just lasts for about 100ms. At present, pm_loss
callbacks are implemented for the mmc block device and a video capture device
only.
In practice what happens on our platform is that processes accessing the mmc
can be put to sleep until either the board dies or a power resume event takes
place. If video capture is running, some "black" frames are generated, but
that is not considered a real fault because power losses should not occur
during normal operation (unless you really turn the board off, of course).

That said, I think you're right anyway, userspace should be notified somehow.
SIGPWR to init ?. As far as I know, there's no single place in the kernel
where SIGPWR is sent to init:

find . -name \*.c | xargs grep SIGPWR
...
./drivers/char/snsc_event.c: kill_cad_pid(SIGPWR, 0);
...
./arch/s390/kernel/nmi.c: kill_cad_pid(SIGPWR, 1);

Maybe pm_loss could become such a place ?

> > +2.3.1 The platform bus.
> > +
> > +For what concerns platform bus drivers, platform specific code can override
> > +the power_changed() callback :
> > +
> > +platform_pm_loss_power_changed(struct device *dev, enum sys_power_state s)
> > +
> > +whose default (empty) version is defined as a __weak symbol in
> > +drivers/base/platform.c.
>
> Please don't use any more __weak symbols in that file.
>
ok.

> > +2.4 Power loss policies.
> > +
> > +A power loss policy can be registered via the pm_loss_register_policy()
> > +function, which receives:
> > +
> > + + A pointer to a struct pm_loss_policy_ops , which hosts the pointers
> > + to the policy's specific callbacks (see below for more details).
> > + + A pointer to a struct kobj_type, which will allow the pm loss core
> > + to setup the policy related directory under /sys/power/loss/policies.
> > + See Documentation/kobject.txt for more details on struct kobj_type.
> > + + A pointer to an array of chars containing the name of the policy.
> > + + A pointer to the policy's private data (void *).
>
> This sounds like quilte a bit of new infrastructure and it seems it might
> be implemented in a more straightforward way (presumably in a less general
> but still useful manner).
>
well, we just followed the most generic approach. Maybe it could be simplified.

> > +A power loss policy can define the following callbacks:
> > +
> > + + int (*bus_added)(struct pm_loss_policy *, struct bus_type *) : this
> > + is invoked whenever a new bus has been registered within the system.
> > + Since power related events are often managed at bus level, it can be
> > + useful for the policy to have a list of available busses within the
> > + system. When a policy is registered, this callback is invoked once
> > + for every already registered bus.
>
> Well, we have a list of registered bus types anyway. They may or may not
> register "power loss" callbacks. Why don't you simply execute the callbacks
> from the bus types that defined them? What exactly is the purpose of the
> new list?
>
this list is ordered by priority.

> > + + int (*bus_removed)(struct pm_loss_policy *, struct bus_type *):
> > + this is invoked when a bus is removed from the system.
> > + + int (*start)(struct pm_loss_policy *): this is called when a policy
> > + becomes active.
> > + + void (*stop)(struct pm_loss_policy *): this is called when a policy
> > + becomes inactive.
>
> What are the start/stop callbacks needed for? IOW, what's your envisioned
> use case for those callbacks?
>

Honestly they were added without having nothing special in mind (it just looked
more flexible), and in fact we were going to remove them, until they showed
useful if a policy table is owned by a module. In such case
you need some place to do module_get/put, and start/stop looked just the
right place. So, since we found at least one possible use for them, we decided
to leave them in place.
Actually, your question made me review the code and I've just found that
try_module_get() is called from pm_loss_setup_default_policy(), not from
the start() callback of the default policy. This looks strange, I think I
should review this part.

> > +2.4.1 The nop predefined policy
> > +
> > +The nop policy is the first active policy when the pm loss framework is
> > +initialized. It just does nothing in case of power loss / power resume
> > +events.
> > +
> > +2.4.2 The default predefined policy
> > +
> > +When a power failure warning is received, the default policy walks through a
> > +list of critical busses and invokes their drivers' power_changed() callback.
>
> Why not to walk all bus types here?
>
because of priorities. More prioritary busses are notified first.

> > +The default policy can be customized and registered by calling:
> > +
> > + pm_loss_setup_default_policy(struct pm_loss_default_policy_table *);
> > +
> > +This function receives a pointer to a pm_loss_default_policy_table structure,
> > +which defines a priority ordered list of critical buffers:
> > +
> > + struct pm_loss_default_policy_table {
> > + struct module *owner ;
> > + const char *name ;
> > + struct pm_loss_default_policy_item *items;
> > + int nitems;
> > + };
> > +
> > +Here's a short description of such structure:
> > +
> > + + owner : shall point to the module registering the table (or NULL
> > + if the table is not instantiated by a module).
> > + + name : this is the name given to this particular customization of
> > + the default policy.
> > + + items : pointer to an array of table items.
> > + + nitems : number of the items in the array.
> > +
> > +Each item is a struct pm_loss_default_policy_item, defined as follows:
> > +
> > + struct pm_loss_default_policy_item {
> > + const char *bus_name;
> > + int bus_priority ;
> > + };
> > +
> > +where bus_name is the name of a bus and bus_priority is a numerical priority
> > +of the bus itself. Numerically higher priorities correspond to more prioritary
> > +busses.
>
> So I can understand the need for priorities, but why do you use the name
> instead of a pointer to struct bus_type?
>
The idea here was to make it simpler writing a bus priority table.
For instance this is how a sample table looks now:

struct pm_loss_default_policy_item my_board_pm_loss_policy_items[] = {
{
.bus_name = "mmc",
.bus_priority = MY_BOARD_PWR_FAIL_PRIO_1,
},
{
.bus_name = "platform",
.bus_priority = MY_BOARD_PWR_FAIL_PRIO_2,
},
};

A struct bus_type * would be less immediate.
Using a name is slower, but the operation of finding a bus given its name
is only done when you register the policy (or when a new bus is added after
policy registration).

> > +
> > +2.4.3 Activating a specific policy.
> > +
> > +A policy can be activated:
> > +
> > + + From within the kernel by calling pm_loss_set_policy(char *name).
> > + The argument passed to this function shall be the name of the policy
> > + to be activated.
> > +
> > + + From user space by writing the name of the policy to be activated
> > + to /sys/power/loss/active_policy.
> > +
> > +2.4.4 Unregistering a policy
> > +
> > +For a generic user defined policy, just call :
> > +
> > + pm_loss_unregister_policy(struct pm_loss_policy *);
> > +
> > +For a default policy customization:
> > +
> > + pm_loss_shutdown_default_policy(struct pm_loss_policy *);
> > +
> > +3. Kernel configuration
> > +
> > +The following configuration options have been defined:
> > +
> > + CONFIG_PM_LOSS : this enables the generic pm_loss framework.
> > + CONFIG_PM_LOSS_SIM : this adds the pwrfail_sim file to the sysfs interface
> > + and allows power loss events simulation.
> > + CONFIG_PM_LOSS_DEBUG : this option enables some debug printk's .
> > + CONFIG_PM_LOSS_TEST: this enables the compilation of a sample test module
> > + containing a customized default policy definition
>
> OK, so I think that this feature may be kind of useful, but it will require
> some work before it's ready to go into the kernel.
>
> Please separate the policy part for starters and let's see what's left.
>
ok.

Thanks and regards
Davide

2011-05-13 16:54:46

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: pm loss development

On Friday, May 13, 2011, Raffaele Recalcati wrote:
> Hi Rafael,
>
> 2011/5/12 Rafael J. Wysocki <[email protected]>:
> > On Thursday, May 12, 2011, Raffaele Recalcati wrote:
> >> What happen normally in runtime pm implementation is that every devices
> >> are switched off and are enabled only when needed.
> >> In our case instead we have a completely functional embedded system and,
> >> when an asyncrhonous event appear, we have only some tens milliseconds
> >> before the actual power failure takes place.
> >> This patchset add a support in order to switch off not vital part of the system,
> >> in order to allow the board to survive longer.
> >> This allow the possibility to save important data.
> >
> > OK, so first, who decides what parts of the system are vital and what aren't?
>
> Take a quick look at Documentation/power/loss.txt paragrpah "2.4
> Power loss policies".
> You can decide what can be powered off.

I read the patches. My question was about the general idea of who should
be responsible of making these decisions.

Thanks,
Rafael

2011-05-13 17:12:05

by Davide Ciminaghi

[permalink] [raw]
Subject: Re: [PATCH 1/4] export bus_kset

On Thu, May 12, 2011 at 09:28:53PM +0200, Rafael J. Wysocki wrote:

hi,

sorry, I missed this message this morning.

> On Thursday, May 12, 2011, Raffaele Recalcati wrote:
> > From: Davide Ciminaghi <[email protected]>
>
> Please explain why you need to export it, what the alternatives are and
> why you think this approach is better than the alternatives.
>

what I needed to do was walking through the list of registered busses,
and invoking the bus_added()/bus_removed() callback of a newly registered
policy. I couldn't find any other simple way to do it.

Regards
Davide

2011-05-14 16:24:33

by mark gross

[permalink] [raw]
Subject: Re: [linux-pm] pm loss development

On Thu, May 12, 2011 at 07:11:01PM +0200, Raffaele Recalcati wrote:
> What happen normally in runtime pm implementation is that every devices
> are switched off and are enabled only when needed.
> In our case instead we have a completely functional embedded system and,
> when an asyncrhonous event appear, we have only some tens milliseconds
> before the actual power failure takes place.

Very interesting! I've been worried about a similar failure on battery
driven devices that can experience significant voltage droops when
battery gets old, or low, and we turn on the flashlight led, vibrator
and make the screen bright while a high volume ring tone gets played.

I think 10ms is a bit unrealistic. I think its more like 300uSec before
you hit brown out.

> This patchset add a support in order to switch off not vital part of the system,
> in order to allow the board to survive longer.
> This allow the possibility to save important data.
>
> The implementation has been written by Davide Ciminaghi.
> My work instead was about analyzing different previuos implementation,
> a first completely custom one, a second using runtime pm, arriving
> finally to that one.
>
> I have tested PM loss in our DaVinci dm365 basi board and I write here below a
> piece of code showing a possible usage.
>
> -------------------
>
> static int powerfail_status;
>
> static irqreturn_t basi_powerfail_stop(int irq, void *dev_id);
>
> static irqreturn_t basi_powerfail_quick_check_start(int irq, void *dev_id)
> {
> basi_mask_irq_gpio0(IRQ_DM365_GPIO0_2);
> basi_unmask_irq_gpio0(IRQ_DM365_GPIO0_0);
>
> /* PowerFail situation - START: power is going away */
> return IRQ_WAKE_THREAD;
> }
>
> static irqreturn_t basi_powerfail_start(int irq, void *dev_id)
> {
> if (powerfail_status)
> return IRQ_HANDLED;
> powerfail_status = 1;
> pm_loss_power_changed(SYS_PWR_FAILING);
> return IRQ_HANDLED;
> }
>
>
> static irqreturn_t basi_powerfail_quick_check_stop(int irq, void *dev_id)
> {
> basi_mask_irq_gpio0(IRQ_DM365_GPIO0_0);
> basi_unmask_irq_gpio0(IRQ_DM365_GPIO0_2);
>
> /* PowerFail situation - STOP: power is coming back */
> return IRQ_WAKE_THREAD;
> }
>
> static irqreturn_t basi_powerfail_stop(int irq, void *dev_id)
> {
> if (!powerfail_status)
> return IRQ_HANDLED;
> powerfail_status = 0;
> pm_loss_power_changed(SYS_PWR_GOOD);
> return IRQ_HANDLED;
> }
>
> enum basi_pwrfail_prio {
> BASI_PWR_FAIL_PRIO_0,
> BASI_PWR_FAIL_MIN_PRIO = BASI_PWR_FAIL_PRIO_0,
> BASI_PWR_FAIL_PRIO_1,
> BASI_PWR_FAIL_PRIO_2,
> BASI_PWR_FAIL_PRIO_3,
> BASI_PWR_FAIL_MAX_PRIO = BASI_PWR_FAIL_PRIO_3,
> };
>
> struct pm_loss_default_policy_item basi_pm_loss_policy_items[] = {
> {
> .bus_name = "mmc",
> .bus_priority = BASI_PWR_FAIL_PRIO_1,
> },
> {
> .bus_name = "platform",
> .bus_priority = BASI_PWR_FAIL_PRIO_2,
> },
> };
>
> #define BASI_POLICY_NAME "basi-default"
>
> struct pm_loss_default_policy_table basi_pm_loss_policy_table = {
> .name = BASI_POLICY_NAME,
> .items = basi_pm_loss_policy_items,
> .nitems = ARRAY_SIZE(basi_pm_loss_policy_items),
> };
>
> static void basi_powerfail_configure(void)
> {
> int stat;
> struct pm_loss_policy *p;
> stat = request_threaded_irq(IRQ_DM365_GPIO0_2,
Is this some comparator device that tugs on this gpio when the voltage
drops or goes to 0? Is threaded irq fast enough?

Could we consider something that includes a hot path ISR based
notification call back to do stuff like blink off devices that don't
need to save state; backlights, vibrators, flashlight LEDs, audio
output drivers <-- I'm not sure about audio HW, and then a slower path
for other things that can be put into lower power states?

the all-clear notification that power is good again should be on a
slower path I would assume.

--mark

> basi_powerfail_quick_check_start,
> basi_powerfail_start,
> 0,
> "pwrfail-on", NULL);
> if (stat < 0)
> printk(KERN_ERR "request_threaded_irq for IRQ%d (pwrfail-on) "
> "failed\n", IRQ_DM365_GPIO0_2);
> stat = request_threaded_irq(IRQ_DM365_GPIO0_0,
> basi_powerfail_quick_check_stop,
> basi_powerfail_stop, 0,
> "pwrfail-off", NULL);
> if (stat < 0)
> printk(KERN_ERR "request_threaded_irq for IRQ%d (pwrfail-off) "
> "failed\n", IRQ_DM365_GPIO0_0);
> basi_mask_irq_gpio0(IRQ_DM365_GPIO0_0);
> p = pm_loss_setup_default_policy(&basi_pm_loss_policy_table);
>
> if (!p)
> printk(KERN_ERR "Could not register pwr change policy\n");
>
> if (pm_loss_set_policy(BASI_POLICY_NAME) < 0)
> printk(KERN_ERR "Could not set %s power loss policy\n",
> BASI_POLICY_NAME);
> }
>
> int platform_pm_loss_power_changed(struct device *dev,
> enum sys_power_state s)
> {
> int ret = 0;
>
> /* Calling platform bus pm_loss functions */
> pr_debug_pm_loss("platform_pm_loss_power_changed(%d)\n", s);
>
> if (dev->driver && dev->driver->pm &&
> dev->driver->pm->power_changed)
> ret = dev->driver->pm->power_changed(dev, s);
> return ret;
> }
>
>
>
>
> _______________________________________________
> linux-pm mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/linux-pm

2011-05-14 16:35:21

by mark gross

[permalink] [raw]
Subject: Re: [linux-pm] pm loss development

On Fri, May 13, 2011 at 06:54:57PM +0200, Rafael J. Wysocki wrote:
> On Friday, May 13, 2011, Raffaele Recalcati wrote:
> > Hi Rafael,
> >
> > 2011/5/12 Rafael J. Wysocki <[email protected]>:
> > > On Thursday, May 12, 2011, Raffaele Recalcati wrote:
> > >> What happen normally in runtime pm implementation is that every devices
> > >> are switched off and are enabled only when needed.
> > >> In our case instead we have a completely functional embedded system and,
> > >> when an asyncrhonous event appear, we have only some tens milliseconds
> > >> before the actual power failure takes place.
> > >> This patchset add a support in order to switch off not vital part of the system,
> > >> in order to allow the board to survive longer.
> > >> This allow the possibility to save important data.
> > >
> > > OK, so first, who decides what parts of the system are vital and what aren't?
> >
> > Take a quick look at Documentation/power/loss.txt paragrpah "2.4
> > Power loss policies".
> > You can decide what can be powered off.
>
> I read the patches. My question was about the general idea of who should
> be responsible of making these decisions.

I would expect the system integrator would based on the application the
device is getting deployed into.

A generic opportunistic policy for peripherals that are stateless and can
be trivially power gated off/on from an ISR could be a default but, for
peripherals that need to do some processing (like waiting on an eMMC DMA
to complete) can take time to power down into a safe state.

--mark

>
> Thanks,
> Rafael
> _______________________________________________
> linux-pm mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/linux-pm

2011-05-14 18:51:47

by Oliver Neukum

[permalink] [raw]
Subject: Re: [linux-pm] pm loss development

Am Donnerstag, 12. Mai 2011, 21:27:44 schrieb Rafael J. Wysocki:
> On Thursday, May 12, 2011, Raffaele Recalcati wrote:
> > What happen normally in runtime pm implementation is that every devices
> > are switched off and are enabled only when needed.
> > In our case instead we have a completely functional embedded system and,
> > when an asyncrhonous event appear, we have only some tens milliseconds
> > before the actual power failure takes place.
> > This patchset add a support in order to switch off not vital part of the system,
> > in order to allow the board to survive longer.
> > This allow the possibility to save important data.
>
> OK, so first, who decides what parts of the system are vital and what aren't?

If you know that power is failing in a few miliseconds, only stuff that can lead to data
corruption is vital. In that timeframe you can't even flush buffers.

Regards
Oliver

2011-05-14 20:22:01

by Raffaele Recalcati

[permalink] [raw]
Subject: Re: pm loss development

> I read the patches.  My question was about the general idea of who should
> be responsible of making these decisions.

The best should be, I think, to have some guidelines and than the
possibility to choose the best policy for each situation.
In my board I needed to shutdown video in capture and demodulator
circuit, so I have implemented vpfe capture switch off, that does
stream_off to all its v4l2 subdevices (a pal decoder and a video
demodulator).
So I can save 30mA, and it allows to my board to survive longer.
I need to do some tests and have some data with and without PM loss.

Bye,
raffaele
--
http://www.opensurf.it

2011-05-14 20:30:38

by Raffaele Recalcati

[permalink] [raw]
Subject: Re: [linux-pm] pm loss development

On Sat, May 14, 2011 at 6:24 PM, mark gross <[email protected]> wrote:
> On Thu, May 12, 2011 at 07:11:01PM +0200, Raffaele Recalcati wrote:
>> What happen normally in runtime pm implementation is that every devices
>> are switched off and are enabled only when needed.
>> In our case instead we have a completely functional embedded system and,
>> when an asyncrhonous event appear, we have only some tens milliseconds
>> before the actual power failure takes place.
>
> Very interesting!  I've been worried about a similar failure on battery
> driven devices that can experience significant voltage droops when
> battery gets old, or low, and we turn on the flashlight led, vibrator
> and make the screen bright while a high volume ring tone gets played.
>
> I think 10ms is a bit unrealistic.  I think its more like 300uSec before
> you hit brown out.

In our circuit it is the real timing, but I can understand your situation.
Are you sure about 300usec?

>> This patchset add a support in order to switch off not vital part of the system,
>> in order to allow the board to survive longer.
>> This allow the possibility to save important data.
>>
>> The implementation has been written by Davide Ciminaghi.
>> My work instead was about analyzing different previuos implementation,
>> a first completely custom one, a second using runtime pm, arriving
>> finally to that one.
>>
>> I have tested PM loss in our DaVinci dm365 basi board and I write here below a
>> piece of code showing a possible usage.
>>
>> -------------------
>>
>> static int powerfail_status;
>>
>> static irqreturn_t basi_powerfail_stop(int irq, void *dev_id);
>>
>> static irqreturn_t basi_powerfail_quick_check_start(int irq, void *dev_id)
>> {
>>         basi_mask_irq_gpio0(IRQ_DM365_GPIO0_2);
>>         basi_unmask_irq_gpio0(IRQ_DM365_GPIO0_0);
>>
>>         /* PowerFail situation - START: power is going away */
>>         return IRQ_WAKE_THREAD;
>> }
>>
>> static irqreturn_t basi_powerfail_start(int irq, void *dev_id)
>> {
>>         if (powerfail_status)
>>                 return IRQ_HANDLED;
>>         powerfail_status = 1;
>>         pm_loss_power_changed(SYS_PWR_FAILING);
>>         return IRQ_HANDLED;
>> }
>>
>>
>> static irqreturn_t basi_powerfail_quick_check_stop(int irq, void *dev_id)
>> {
>>         basi_mask_irq_gpio0(IRQ_DM365_GPIO0_0);
>>         basi_unmask_irq_gpio0(IRQ_DM365_GPIO0_2);
>>
>>         /* PowerFail situation - STOP: power is coming back */
>>         return IRQ_WAKE_THREAD;
>> }
>>
>> static irqreturn_t basi_powerfail_stop(int irq, void *dev_id)
>> {
>>         if (!powerfail_status)
>>                 return IRQ_HANDLED;
>>         powerfail_status = 0;
>>         pm_loss_power_changed(SYS_PWR_GOOD);
>>         return IRQ_HANDLED;
>> }
>>
>> enum basi_pwrfail_prio {
>>         BASI_PWR_FAIL_PRIO_0,
>>         BASI_PWR_FAIL_MIN_PRIO = BASI_PWR_FAIL_PRIO_0,
>>         BASI_PWR_FAIL_PRIO_1,
>>         BASI_PWR_FAIL_PRIO_2,
>>         BASI_PWR_FAIL_PRIO_3,
>>         BASI_PWR_FAIL_MAX_PRIO = BASI_PWR_FAIL_PRIO_3,
>> };
>>
>> struct pm_loss_default_policy_item basi_pm_loss_policy_items[] = {
>>         {
>>                 .bus_name = "mmc",
>>                 .bus_priority = BASI_PWR_FAIL_PRIO_1,
>>         },
>>         {
>>                 .bus_name = "platform",
>>                 .bus_priority = BASI_PWR_FAIL_PRIO_2,
>>         },
>> };
>>
>> #define BASI_POLICY_NAME "basi-default"
>>
>> struct pm_loss_default_policy_table basi_pm_loss_policy_table = {
>>         .name = BASI_POLICY_NAME,
>>         .items = basi_pm_loss_policy_items,
>>         .nitems = ARRAY_SIZE(basi_pm_loss_policy_items),
>> };
>>
>> static void basi_powerfail_configure(void)
>> {
>>         int stat;
>>         struct pm_loss_policy *p;
>>         stat = request_threaded_irq(IRQ_DM365_GPIO0_2,
> Is this some comparator device that tugs on this gpio when the voltage
> drops or goes to 0?  Is threaded irq fast enough?

yes, there is a comparator.
I need more data about timing, I'll try to add this info in some days.

> Could we consider something that includes a hot path ISR based
> notification call back to do stuff like blink off devices that don't
> need to save state;  backlights, vibrators, flashlight LEDs, audio
> output drivers <-- I'm not sure about audio HW, and then a slower path
> for other things that can be put into lower power states?
>
> the all-clear notification that power is good again should be on a
> slower path I would assume.

First I get data, afterwards we can see if your need can be seen as an
extension or something else.

>
> --mark
>
>>                                     basi_powerfail_quick_check_start,
>>                                     basi_powerfail_start,
>>                                     0,
>>                                     "pwrfail-on", NULL);
>>         if (stat < 0)
>>                 printk(KERN_ERR "request_threaded_irq for IRQ%d (pwrfail-on) "
>>                        "failed\n", IRQ_DM365_GPIO0_2);
>>         stat = request_threaded_irq(IRQ_DM365_GPIO0_0,
>>                                     basi_powerfail_quick_check_stop,
>>                                     basi_powerfail_stop, 0,
>>                                     "pwrfail-off", NULL);
>>         if (stat < 0)
>>                 printk(KERN_ERR "request_threaded_irq for IRQ%d (pwrfail-off) "
>>                        "failed\n", IRQ_DM365_GPIO0_0);
>>         basi_mask_irq_gpio0(IRQ_DM365_GPIO0_0);
>>         p = pm_loss_setup_default_policy(&basi_pm_loss_policy_table);
>>
>>         if (!p)
>>                 printk(KERN_ERR "Could not register pwr change policy\n");
>>
>>         if (pm_loss_set_policy(BASI_POLICY_NAME) < 0)
>>                 printk(KERN_ERR "Could not set %s power loss policy\n",
>>                        BASI_POLICY_NAME);
>> }
>>
>> int platform_pm_loss_power_changed(struct device *dev,
>>                                    enum sys_power_state s)
>> {
>>         int ret = 0;
>>
>>         /* Calling platform bus pm_loss functions */
>>         pr_debug_pm_loss("platform_pm_loss_power_changed(%d)\n", s);
>>
>>         if (dev->driver && dev->driver->pm &&
>>                 dev->driver->pm->power_changed)
>>                 ret = dev->driver->pm->power_changed(dev, s);
>>         return ret;
>> }
>>
>>
>>
>>
>> _______________________________________________
>> linux-pm mailing list
>> [email protected]
>> https://lists.linux-foundation.org/mailman/listinfo/linux-pm
>

Bye,
Raffaele

2011-05-14 20:34:47

by Raffaele Recalcati

[permalink] [raw]
Subject: Re: [linux-pm] pm loss development

On Sat, May 14, 2011 at 8:53 PM, Oliver Neukum <[email protected]> wrote:
> Am Donnerstag, 12. Mai 2011, 21:27:44 schrieb Rafael J. Wysocki:
>> On Thursday, May 12, 2011, Raffaele Recalcati wrote:
>> > What happen normally in runtime pm implementation is that every devices
>> > are switched off and are enabled only when needed.
>> > In our case instead we have a completely functional embedded system and,
>> > when an asyncrhonous event appear, we have only some tens milliseconds
>> > before the actual power failure takes place.
>> > This patchset add a support in order to switch off not vital part of the system,
>> > in order to allow the board to survive longer.
>> > This allow the possibility to save important data.
>>
>> OK, so first, who decides what parts of the system are vital and what aren't?
>
> If you know that power is failing in a few miliseconds, only stuff that can lead to data
> corruption is vital. In that timeframe you can't even flush buffers.

Remember that if you switch off some peripherals this timeframe
becomes longer, so maybe you have enough time to sync some storage
devices.

Bye,
Raffaele

2011-05-14 23:34:07

by mark gross

[permalink] [raw]
Subject: Re: [linux-pm] pm loss development

On Sat, May 14, 2011 at 10:30:31PM +0200, Raffaele Recalcati wrote:
> On Sat, May 14, 2011 at 6:24 PM, mark gross <[email protected]> wrote:
> > On Thu, May 12, 2011 at 07:11:01PM +0200, Raffaele Recalcati wrote:
> >> What happen normally in runtime pm implementation is that every devices
> >> are switched off and are enabled only when needed.
> >> In our case instead we have a completely functional embedded system and,
> >> when an asyncrhonous event appear, we have only some tens milliseconds
> >> before the actual power failure takes place.
> >
> > Very interesting! ?I've been worried about a similar failure on battery
> > driven devices that can experience significant voltage droops when
> > battery gets old, or low, and we turn on the flashlight led, vibrator
> > and make the screen bright while a high volume ring tone gets played.
> >
> > I think 10ms is a bit unrealistic. ?I think its more like 300uSec before
> > you hit brown out.
>
> In our circuit it is the real timing, but I can understand your situation.
> Are you sure about 300usec?

not 100%, I'll ask the HW guys what the RC bulk circuit half life is
for the device. This along with the estimate of the burst power load
will define the time constraint for responding to the event.

Note: an LCD will easily pull 600mW to a full watt, and a loud audio
system can pull a watt, and a 3G modem can have some scary big transient
spikes in current. The LED flash light can pull a watt, and I think the
vibrator is up around 500mW. And um, well, my CPU has a power dynamic
range of ~200mW to ~2watts in C0. So, on a bad day a load spike on the
battery could be over 4 watts with all these going off at the same time.

Some of these events are pretty bursty (vibrator, ring tone at high
volume, 3G modem spike, P-states, camera flash) If you are unlucky you
get a brown out as the vbat drops < 3.3 (or whatever)

>
> >> This patchset add a support in order to switch off not vital part of the system,
> >> in order to allow the board to survive longer.
> >> This allow the possibility to save important data.
> >>
> >> The implementation has been written by Davide Ciminaghi.
> >> My work instead was about analyzing different previuos implementation,
> >> a first completely custom one, a second using runtime pm, arriving
> >> finally to that one.
> >>
> >> I have tested PM loss in our DaVinci dm365 basi board and I write here below a
> >> piece of code showing a possible usage.
> >>
> >> -------------------
> >>
> >> static int powerfail_status;
> >>
> >> static irqreturn_t basi_powerfail_stop(int irq, void *dev_id);
> >>
> >> static irqreturn_t basi_powerfail_quick_check_start(int irq, void *dev_id)
> >> {
> >> ? ? ? ? basi_mask_irq_gpio0(IRQ_DM365_GPIO0_2);
> >> ? ? ? ? basi_unmask_irq_gpio0(IRQ_DM365_GPIO0_0);
> >>
> >> ? ? ? ? /* PowerFail situation - START: power is going away */
> >> ? ? ? ? return IRQ_WAKE_THREAD;
> >> }
> >>
> >> static irqreturn_t basi_powerfail_start(int irq, void *dev_id)
> >> {
> >> ? ? ? ? if (powerfail_status)
> >> ? ? ? ? ? ? ? ? return IRQ_HANDLED;
> >> ? ? ? ? powerfail_status = 1;
> >> ? ? ? ? pm_loss_power_changed(SYS_PWR_FAILING);
> >> ? ? ? ? return IRQ_HANDLED;
> >> }
> >>
> >>
> >> static irqreturn_t basi_powerfail_quick_check_stop(int irq, void *dev_id)
> >> {
> >> ? ? ? ? basi_mask_irq_gpio0(IRQ_DM365_GPIO0_0);
> >> ? ? ? ? basi_unmask_irq_gpio0(IRQ_DM365_GPIO0_2);
> >>
> >> ? ? ? ? /* PowerFail situation - STOP: power is coming back */
> >> ? ? ? ? return IRQ_WAKE_THREAD;
> >> }
> >>
> >> static irqreturn_t basi_powerfail_stop(int irq, void *dev_id)
> >> {
> >> ? ? ? ? if (!powerfail_status)
> >> ? ? ? ? ? ? ? ? return IRQ_HANDLED;
> >> ? ? ? ? powerfail_status = 0;
> >> ? ? ? ? pm_loss_power_changed(SYS_PWR_GOOD);
> >> ? ? ? ? return IRQ_HANDLED;
> >> }
> >>
> >> enum basi_pwrfail_prio {
> >> ? ? ? ? BASI_PWR_FAIL_PRIO_0,
> >> ? ? ? ? BASI_PWR_FAIL_MIN_PRIO = BASI_PWR_FAIL_PRIO_0,
> >> ? ? ? ? BASI_PWR_FAIL_PRIO_1,
> >> ? ? ? ? BASI_PWR_FAIL_PRIO_2,
> >> ? ? ? ? BASI_PWR_FAIL_PRIO_3,
> >> ? ? ? ? BASI_PWR_FAIL_MAX_PRIO = BASI_PWR_FAIL_PRIO_3,
> >> };
> >>
> >> struct pm_loss_default_policy_item basi_pm_loss_policy_items[] = {
> >> ? ? ? ? {
> >> ? ? ? ? ? ? ? ? .bus_name = "mmc",
> >> ? ? ? ? ? ? ? ? .bus_priority = BASI_PWR_FAIL_PRIO_1,
> >> ? ? ? ? },
> >> ? ? ? ? {
> >> ? ? ? ? ? ? ? ? .bus_name = "platform",
> >> ? ? ? ? ? ? ? ? .bus_priority = BASI_PWR_FAIL_PRIO_2,
> >> ? ? ? ? },
> >> };
> >>
> >> #define BASI_POLICY_NAME "basi-default"
> >>
> >> struct pm_loss_default_policy_table basi_pm_loss_policy_table = {
> >> ? ? ? ? .name = BASI_POLICY_NAME,
> >> ? ? ? ? .items = basi_pm_loss_policy_items,
> >> ? ? ? ? .nitems = ARRAY_SIZE(basi_pm_loss_policy_items),
> >> };
> >>
> >> static void basi_powerfail_configure(void)
> >> {
> >> ? ? ? ? int stat;
> >> ? ? ? ? struct pm_loss_policy *p;
> >> ? ? ? ? stat = request_threaded_irq(IRQ_DM365_GPIO0_2,
> > Is this some comparator device that tugs on this gpio when the voltage
> > drops or goes to 0? ?Is threaded irq fast enough?
>
> yes, there is a comparator.
> I need more data about timing, I'll try to add this info in some days.

yeah, even if you don't have as fast of a response time need as I think
I have, you do need to consider what if the system is busy at the time
this interrupt comes in. Could you be blocked and fail to save your
emmc from corruption?

>
> > Could we consider something that includes a hot path ISR based
> > notification call back to do stuff like blink off devices that don't
> > need to save state; ?backlights, vibrators, flashlight LEDs, audio
> > output drivers <-- I'm not sure about audio HW, and then a slower path
> > for other things that can be put into lower power states?
> >
> > the all-clear notification that power is good again should be on a
> > slower path I would assume.
>
> First I get data, afterwards we can see if your need can be seen as an
> extension or something else.
>

This is an interesting problem though. ;) It brings a hard real time
requirement to the normal linux kernel from the physics of battery
performance.

--mark

2011-05-15 14:14:15

by Davide Ciminaghi

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH 1/4] export bus_kset

On Fri, May 13, 2011 at 07:11:48PM +0200, Davide Ciminaghi wrote:
> On Thu, May 12, 2011 at 09:28:53PM +0200, Rafael J. Wysocki wrote:
>
> hi,
>
> sorry, I missed this message this morning.
>
> > On Thursday, May 12, 2011, Raffaele Recalcati wrote:
> > > From: Davide Ciminaghi <[email protected]>
> >
> > Please explain why you need to export it, what the alternatives are and
> > why you think this approach is better than the alternatives.
> >
>
> what I needed to do was walking through the list of registered busses,
> and invoking the bus_added()/bus_removed() callback of a newly registered
> policy. I couldn't find any other simple way to do it.
>
well, I there is another way to do that: adding a function like this
(include/linux/device.h) :

/**
* run a callback for each registered bus type
*
* @data : arg passed to callback
* @fn : pointer to callback
*/
int for_each_bus(void *data, int (*fn)(struct bus_type *bus, void *data));

which would be similar to the already existing bus_for_each_dev() and
would allow to avoid exporting a global variable.


Regards
Davide

2011-05-17 23:04:28

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH 1/4] export bus_kset

On Sunday, May 15, 2011, Davide Ciminaghi wrote:
> On Fri, May 13, 2011 at 07:11:48PM +0200, Davide Ciminaghi wrote:
> > On Thu, May 12, 2011 at 09:28:53PM +0200, Rafael J. Wysocki wrote:
> >
> > hi,
> >
> > sorry, I missed this message this morning.
> >
> > > On Thursday, May 12, 2011, Raffaele Recalcati wrote:
> > > > From: Davide Ciminaghi <[email protected]>
> > >
> > > Please explain why you need to export it, what the alternatives are and
> > > why you think this approach is better than the alternatives.
> > >
> >
> > what I needed to do was walking through the list of registered busses,
> > and invoking the bus_added()/bus_removed() callback of a newly registered
> > policy. I couldn't find any other simple way to do it.
> >
> well, I there is another way to do that: adding a function like this
> (include/linux/device.h) :
>
> /**
> * run a callback for each registered bus type
> *
> * @data : arg passed to callback
> * @fn : pointer to callback
> */
> int for_each_bus(void *data, int (*fn)(struct bus_type *bus, void *data));
>
> which would be similar to the already existing bus_for_each_dev() and
> would allow to avoid exporting a global variable.

I really think you'd simply need to browse all devices, like the core PM
code in drivers/base/power/main.c. You can use dpm_list for that just fine.

Thanks,
Rafael

2011-05-17 23:06:17

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] pm loss development

On Saturday, May 14, 2011, Raffaele Recalcati wrote:
> On Sat, May 14, 2011 at 8:53 PM, Oliver Neukum <[email protected]> wrote:
> > Am Donnerstag, 12. Mai 2011, 21:27:44 schrieb Rafael J. Wysocki:
> >> On Thursday, May 12, 2011, Raffaele Recalcati wrote:
> >> > What happen normally in runtime pm implementation is that every devices
> >> > are switched off and are enabled only when needed.
> >> > In our case instead we have a completely functional embedded system and,
> >> > when an asyncrhonous event appear, we have only some tens milliseconds
> >> > before the actual power failure takes place.
> >> > This patchset add a support in order to switch off not vital part of the system,
> >> > in order to allow the board to survive longer.
> >> > This allow the possibility to save important data.
> >>
> >> OK, so first, who decides what parts of the system are vital and what aren't?
> >
> > If you know that power is failing in a few miliseconds, only stuff that can lead to data
> > corruption is vital. In that timeframe you can't even flush buffers.
>
> Remember that if you switch off some peripherals this timeframe
> becomes longer, so maybe you have enough time to sync some storage
> devices.

However, switching off peripherals _also_ takes time. So, it may be more
useful to take care of the stuff leading to data corruption along with the
switching off peripherals.

Thanks,
Rafael

2011-05-17 23:07:28

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] pm loss development

On Saturday, May 14, 2011, mark gross wrote:
> On Fri, May 13, 2011 at 06:54:57PM +0200, Rafael J. Wysocki wrote:
> > On Friday, May 13, 2011, Raffaele Recalcati wrote:
> > > Hi Rafael,
> > >
> > > 2011/5/12 Rafael J. Wysocki <[email protected]>:
> > > > On Thursday, May 12, 2011, Raffaele Recalcati wrote:
> > > >> What happen normally in runtime pm implementation is that every devices
> > > >> are switched off and are enabled only when needed.
> > > >> In our case instead we have a completely functional embedded system and,
> > > >> when an asyncrhonous event appear, we have only some tens milliseconds
> > > >> before the actual power failure takes place.
> > > >> This patchset add a support in order to switch off not vital part of the system,
> > > >> in order to allow the board to survive longer.
> > > >> This allow the possibility to save important data.
> > > >
> > > > OK, so first, who decides what parts of the system are vital and what aren't?
> > >
> > > Take a quick look at Documentation/power/loss.txt paragrpah "2.4
> > > Power loss policies".
> > > You can decide what can be powered off.
> >
> > I read the patches. My question was about the general idea of who should
> > be responsible of making these decisions.
>
> I would expect the system integrator would based on the application the
> device is getting deployed into.
>
> A generic opportunistic policy for peripherals that are stateless and can
> be trivially power gated off/on from an ISR could be a default but, for
> peripherals that need to do some processing (like waiting on an eMMC DMA
> to complete) can take time to power down into a safe state.

What do you mean by safe state?

Rafael

2011-05-17 23:10:33

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: pm loss development

On Saturday, May 14, 2011, Raffaele Recalcati wrote:
> > I read the patches. My question was about the general idea of who should
> > be responsible of making these decisions.
>
> The best should be, I think, to have some guidelines and than the
> possibility to choose the best policy for each situation.

Again, I'd like to know who's supposed to make the choice.

> In my board I needed to shutdown video in capture and demodulator
> circuit, so I have implemented vpfe capture switch off, that does
> stream_off to all its v4l2 subdevices (a pal decoder and a video
> demodulator).
> So I can save 30mA, and it allows to my board to survive longer.
> I need to do some tests and have some data with and without PM loss.

That's fine, but in general we need to take care of a few more things,
like the interactions between the devices we're switching off and user
space (that can be doing just about anything at the moment).

We can't simply switch off devices at will, because that may lead to
breakage too in general.

Thanks,
Rafael

2011-05-17 23:32:04

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 2/4] PM / Loss: power loss management

Hi,

Sorry for the delay.

On Friday, May 13, 2011, Davide Ciminaghi wrote:
> On Thu, May 12, 2011 at 09:57:46PM +0200, Rafael J. Wysocki wrote:
> > Hi,
> >
> > On Thursday, May 12, 2011, Raffaele Recalcati wrote:
> > > From: Davide Ciminaghi <[email protected]>
> > >
>
> hello, I (Davide) will reply as the patch author.
>
> ....................
> >
> > I'm not really sure about that. Do you want to hardcode policies in some
> > platform/board-specific files inside of the kernel?
> >
>
> I think a little history of the patch can help: it looks like recent mmc
> devices can do any sort of weird things in case of sudden power losses during
> write operations. This is still to be confirmed by actual tests on our
> particular hardware, but we suspect that even a journalling fs could be
> corrupted in such a case.

That very well may be the case.

> Stopping the mmc request queue when a power loss is detected, does not
> completely solve the problem, but at least mitigates the risk of disasters.
> Plus, we needed to get our board to immediately turn off any unneeded power
> sink in case of temporary power losses (just try to survive as long as you
> can ...).

I understand the motivation I think. :-)

> Our hardware triggers an interrupt about 100 millisenconds before the cpu
> dies, so we needed to notify the drivers about the event as fast as we could.
> That was one of the two main reasons why we chose to put this stuff inside
> the kernel. The other reason was that the mmc request queue can be easily
> stopped from kernel space.

OK, but it's not really sufficient to notify drivers in that case, because
they may crash user space if they simply switch off devices. Now, if the
power loss actually happens, that probably doesn't matter, but if power is
restored afterwards, that won't be nice.

The mmc request queue is a special case that shouldn't affect user space
directly, but other devices may be accessed in many different ways and
not all of them are safe to intercept forcibly.

However, even for the mmc case you'd need a mechanism that will force
the filesystem on it to flush everything to the storage immediately.

> Since forcing a given policy seemed the wrong thing to do, we chose to adopt
> a modular policy approach.
>
> > > +The header file include/linux/pm_loss.h contains all the pm_loss function
> > > +prototypes, together with definitions of data structures.
> > > +For what concerns power loss policies, the framework already contains a couple
> > > +of predefined policies: "nop" and "default" (see later paragraphs for more
> > > +details).
> >
> > I think it would be cleaner to split the patch so that the actual functionality
> > is added first and the policy part goes in a separate patch on top of that.
> >
>
> agreed.
>
> > > +
> > > +1.1 Sysfs interface.
> > > +
> > > +It can be useful (for instance during tests), to be able to quickly switch
> > > +from one power loss policy to another, or to simulate power fail and resume
> > > +events. To this end, a sysfs interface of the following form has been devised:
> > > +
> > > +/sys/power/loss +
> > > + |
> > > + +-- active_policy
> > > + |
> > > + +-- policies -+
> > > + | |
> > > + | +-- nop
> > > + | |
> > > + | +-- default +
> > > + | | |
> > > + | | +-- bus_table
> > > + | |
> > > + | +-- ....
> > > + |
> > > + +-- pwrfail_sim
> > > +
> > > +2. Details
> > > +
> > > +2.1 Sending events to the power loss management core.
> > > +
> > > +The board specific code shall trigger a power failure event notification by
> > > +calling pm_loss_power_changed(SYS_PWR_FAILING).
> > > +In case of a temporary power loss, the same function shall be called with
> > > +SYS_PWR_GOOD argument on power resume. pm_loss_power_changed() can sleep, so
> > > +it shall not be called from interrupt context.
> > > +
> > > +2.3 Effects on bus and device drivers.
> > > +
> > > +One more entry has been added to the device PM callbacks:
> > > +
> > > + int (*power_changed)(struct device *, enum sys_power_state);
> >
> > Please don't use the second argument. Make it two (or as many as you need)
> > callbacks each with one struct device * argument instead (following the
> > convention we have in struct dev_pm_ops already).
> >
> > That said, I'm not quite sure we _need_ those additional callbacks at all.
> > Your example mmc implementation appears to use a simple "runtime suspend on
> > power fail, runtime resume on power good" approach, for which it's not
> > necessary to add any new callbacks.
> >
>
> maybe this is design choice comes from our little knowledge of pm_runtime,
> but the reason why we didn't use the existing callbacks was that we wanted
> to avoid any possible conflict.
> In our understanding, pm_runtime works more or less like this: "when
> you're done with a peripheral, turn it off" (well, maybe also wait a little
> while to avoid continuously toggling power on and off, thus wasting more
> power than just doing nothing). This is because its goal is to save power
> during normal operation.
> On the other hand, we needed something like this: "when some external event
> takes place, just turn off anything you can as fast as you can".

This is more similar to system suspend than to runtime PM.

> Actually, one of the attempts we made before adding pm_loss was to implement
> it via pm_runtime, by just immediately stopping devices on idle and refusing
> to turn them on again in case a power loss event took place in the meanwhile.
> This worked, and was ok for drivers with no implementation of the pm_runtime
> callbacks, but introduced a potential conflict as far as pm_runtime enabled
> drivers were concerned. That's why we (maybe wrongfully) thought a new
> callback was needed.

Having considered that for a while I think that additional callbacks may be
necessary, because they generally may need to do more than just suspending
devices.

> > > +This function can be optionally invoked by power loss policies in case of
> > > +system power changes (loss and/or resume). Of course every bus or device driver
> > > +can react to such events in some specific way. For instance the mmc block
> > > +driver will probably block its request queue during a temporary power loss.
> >
> > I think you'd have to deal with user space somehow before suspending devices.
> > Runtime PM suspends devices when they aren't in use (as indicated by the
> > usage counters), but in this power loss case we may really end up suspending
> > devices that _are_ in use, right?
> >
> yes. This is not a problem for us right now, because when a power loss happens,
> the "power bad" condition just lasts for about 100ms. At present, pm_loss
> callbacks are implemented for the mmc block device and a video capture device
> only.

While that may not be a problem to you, it is a problem in general and it
should be taken into account if we are to provide a generic framework for
such things.

> In practice what happens on our platform is that processes accessing the mmc
> can be put to sleep until either the board dies or a power resume event takes
> place. If video capture is running, some "black" frames are generated, but
> that is not considered a real fault because power losses should not occur
> during normal operation (unless you really turn the board off, of course).
>
> That said, I think you're right anyway, userspace should be notified somehow.
> SIGPWR to init ?. As far as I know, there's no single place in the kernel
> where SIGPWR is sent to init:
>
> find . -name \*.c | xargs grep SIGPWR
> ...
> ./drivers/char/snsc_event.c: kill_cad_pid(SIGPWR, 0);
> ...
> ./arch/s390/kernel/nmi.c: kill_cad_pid(SIGPWR, 1);
>
> Maybe pm_loss could become such a place ?

I think so. SIGPWR seems to be a good candidate, but that would change the
current (documented) behavior where SIGPWR is a synonym for SIGINFO.

Thanks,
Rafael

2011-05-18 03:12:10

by mark gross

[permalink] [raw]
Subject: Re: [linux-pm] pm loss development

On Wed, May 18, 2011 at 01:07:57AM +0200, Rafael J. Wysocki wrote:
> On Saturday, May 14, 2011, mark gross wrote:
> > On Fri, May 13, 2011 at 06:54:57PM +0200, Rafael J. Wysocki wrote:
> > > On Friday, May 13, 2011, Raffaele Recalcati wrote:
> > > > Hi Rafael,
> > > >
> > > > 2011/5/12 Rafael J. Wysocki <[email protected]>:
> > > > > On Thursday, May 12, 2011, Raffaele Recalcati wrote:
> > > > >> What happen normally in runtime pm implementation is that every devices
> > > > >> are switched off and are enabled only when needed.
> > > > >> In our case instead we have a completely functional embedded system and,
> > > > >> when an asyncrhonous event appear, we have only some tens milliseconds
> > > > >> before the actual power failure takes place.
> > > > >> This patchset add a support in order to switch off not vital part of the system,
> > > > >> in order to allow the board to survive longer.
> > > > >> This allow the possibility to save important data.
> > > > >
> > > > > OK, so first, who decides what parts of the system are vital and what aren't?
> > > >
> > > > Take a quick look at Documentation/power/loss.txt paragrpah "2.4
> > > > Power loss policies".
> > > > You can decide what can be powered off.
> > >
> > > I read the patches. My question was about the general idea of who should
> > > be responsible of making these decisions.
> >
> > I would expect the system integrator would based on the application the
> > device is getting deployed into.
> >
> > A generic opportunistic policy for peripherals that are stateless and can
> > be trivially power gated off/on from an ISR could be a default but, for
> > peripherals that need to do some processing (like waiting on an eMMC DMA
> > to complete) can take time to power down into a safe state.
>
> What do you mean by safe state?
>
I need to get more details on this but I assume its a state where the
meta data of the file system is committed to the emmc before lights go
off such that when power is reapplied that the damage isn't too big.

I've also heard some talk of sim card corruption risks but, I don't have
first hand info on that.

--mark

2011-05-18 19:42:35

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] pm loss development

On Wednesday, May 18, 2011, mark gross wrote:
> On Wed, May 18, 2011 at 01:07:57AM +0200, Rafael J. Wysocki wrote:
> > On Saturday, May 14, 2011, mark gross wrote:
> > > On Fri, May 13, 2011 at 06:54:57PM +0200, Rafael J. Wysocki wrote:
> > > > On Friday, May 13, 2011, Raffaele Recalcati wrote:
> > > > > Hi Rafael,
> > > > >
> > > > > 2011/5/12 Rafael J. Wysocki <[email protected]>:
> > > > > > On Thursday, May 12, 2011, Raffaele Recalcati wrote:
> > > > > >> What happen normally in runtime pm implementation is that every devices
> > > > > >> are switched off and are enabled only when needed.
> > > > > >> In our case instead we have a completely functional embedded system and,
> > > > > >> when an asyncrhonous event appear, we have only some tens milliseconds
> > > > > >> before the actual power failure takes place.
> > > > > >> This patchset add a support in order to switch off not vital part of the system,
> > > > > >> in order to allow the board to survive longer.
> > > > > >> This allow the possibility to save important data.
> > > > > >
> > > > > > OK, so first, who decides what parts of the system are vital and what aren't?
> > > > >
> > > > > Take a quick look at Documentation/power/loss.txt paragrpah "2.4
> > > > > Power loss policies".
> > > > > You can decide what can be powered off.
> > > >
> > > > I read the patches. My question was about the general idea of who should
> > > > be responsible of making these decisions.
> > >
> > > I would expect the system integrator would based on the application the
> > > device is getting deployed into.
> > >
> > > A generic opportunistic policy for peripherals that are stateless and can
> > > be trivially power gated off/on from an ISR could be a default but, for
> > > peripherals that need to do some processing (like waiting on an eMMC DMA
> > > to complete) can take time to power down into a safe state.
> >
> > What do you mean by safe state?
> >
> I need to get more details on this but I assume its a state where the
> meta data of the file system is committed to the emmc before lights go
> off such that when power is reapplied that the damage isn't too big.

I don't think you can guarantee that the metadata won't be damaged
without notifying the filesystem of the event (and making it react
appropriately).

> I've also heard some talk of sim card corruption risks but, I don't have
> first hand info on that.

Well, I guess that might be prevented by the driver alone.

Thanks,
Rafael

2011-05-18 22:17:34

by Mark Brown

[permalink] [raw]
Subject: Re: [linux-pm] pm loss development

On Wed, May 18, 2011 at 09:43:13PM +0200, Rafael J. Wysocki wrote:
> On Wednesday, May 18, 2011, mark gross wrote:

> > I need to get more details on this but I assume its a state where the
> > meta data of the file system is committed to the emmc before lights go
> > off such that when power is reapplied that the damage isn't too big.

> I don't think you can guarantee that the metadata won't be damaged
> without notifying the filesystem of the event (and making it react
> appropriately).

I guess with a journalling filesystem you'd hope that you can at any
point write out what's in flight and have it be safe; if it's a non
journalling filesystem of course all bets are off.

2011-05-19 09:27:11

by Davide Ciminaghi

[permalink] [raw]
Subject: Re: [PATCH 2/4] PM / Loss: power loss management

On Wed, May 18, 2011 at 01:32:30AM +0200, Rafael J. Wysocki wrote:
> Hi,
>
> Sorry for the delay.
>
hi,

no problem. I'm having some extremely busy days at the moment, so I'm always
late too (as you see).

...

> > That was one of the two main reasons why we chose to put this stuff inside
> > the kernel. The other reason was that the mmc request queue can be easily
> > stopped from kernel space.
>
> OK, but it's not really sufficient to notify drivers in that case, because
> they may crash user space if they simply switch off devices. Now, if the
> power loss actually happens, that probably doesn't matter, but if power is
> restored afterwards, that won't be nice.
>
> The mmc request queue is a special case that shouldn't affect user space
> directly, but other devices may be accessed in many different ways and
> not all of them are safe to intercept forcibly.
>

well, I don't know if this is completely correct, but my initial idea was to
find some way to just block user space processes when we're not sure about
power. Most devices can be accessed via some form of read()/write() or
enqueue/dequeue mechanism. What I would like to do is to temporarily blocking
such operations (or returning -EAGAIN in case of non-blocking open, for
instance). This should slightly reduce power consumption
anyway, because user space processes cannot run. Then, each single driver
should decide whether it is safe or not to actually turn its device off,
and then restore its state in case of power resume.

> However, even for the mmc case you'd need a mechanism that will force
> the filesystem on it to flush everything to the storage immediately.
>

I'm not completely sure about this. What we wanted to do was to avoid powering
down the mmc while it is physically writing data into its internal memory.
If we force a sync when the power loss warning event warning happens,
it is very difficult to be able to guarantee that all buffered data will be
written before power actually dies. So we preferred to follow another strategy:
let the mmc finish any running write operation, and then stop its request
queue. If power really goes down, then we hope that the file system journal
will fix things on next boot (yes, some data could get lost, but the fs should
still be mountable). On the other hand, if power resumes, nothing bad should
happen for user space processes.

> > Since forcing a given policy seemed the wrong thing to do, we chose to adopt
....
> >
> > maybe this is design choice comes from our little knowledge of pm_runtime,
> > but the reason why we didn't use the existing callbacks was that we wanted
> > to avoid any possible conflict.
> > In our understanding, pm_runtime works more or less like this: "when
> > you're done with a peripheral, turn it off" (well, maybe also wait a little
> > while to avoid continuously toggling power on and off, thus wasting more
> > power than just doing nothing). This is because its goal is to save power
> > during normal operation.
> > On the other hand, we needed something like this: "when some external event
> > takes place, just turn off anything you can as fast as you can".
>
> This is more similar to system suspend than to runtime PM.
>
> > Actually, one of the attempts we made before adding pm_loss was to implement
> > it via pm_runtime, by just immediately stopping devices on idle and refusing
> > to turn them on again in case a power loss event took place in the meanwhile.
> > This worked, and was ok for drivers with no implementation of the pm_runtime
> > callbacks, but introduced a potential conflict as far as pm_runtime enabled
> > drivers were concerned. That's why we (maybe wrongfully) thought a new
> > callback was needed.
>
> Having considered that for a while I think that additional callbacks may be
> necessary, because they generally may need to do more than just suspending
> devices.
>

ok.

> > > > +This function can be optionally invoked by power loss policies in case of
> > > > +system power changes (loss and/or resume). Of course every bus or device driver
> > > > +can react to such events in some specific way. For instance the mmc block
> > > > +driver will probably block its request queue during a temporary power loss.
> > >
> > > I think you'd have to deal with user space somehow before suspending devices.
> > > Runtime PM suspends devices when they aren't in use (as indicated by the
> > > usage counters), but in this power loss case we may really end up suspending
> > > devices that _are_ in use, right?
> > >
> > yes. This is not a problem for us right now, because when a power loss happens,
> > the "power bad" condition just lasts for about 100ms. At present, pm_loss
> > callbacks are implemented for the mmc block device and a video capture device
> > only.
>
> While that may not be a problem to you, it is a problem in general and it
> should be taken into account if we are to provide a generic framework for
> such things.
>
> > In practice what happens on our platform is that processes accessing the mmc
> > can be put to sleep until either the board dies or a power resume event takes
> > place. If video capture is running, some "black" frames are generated, but
> > that is not considered a real fault because power losses should not occur
> > during normal operation (unless you really turn the board off, of course).
> >
> > That said, I think you're right anyway, userspace should be notified somehow.
> > SIGPWR to init ?. As far as I know, there's no single place in the kernel
> > where SIGPWR is sent to init:
> >
> > find . -name \*.c | xargs grep SIGPWR
> > ...
> > ./drivers/char/snsc_event.c: kill_cad_pid(SIGPWR, 0);
> > ...
> > ./arch/s390/kernel/nmi.c: kill_cad_pid(SIGPWR, 1);
> >
> > Maybe pm_loss could become such a place ?
>
> I think so. SIGPWR seems to be a good candidate, but that would change the
> current (documented) behavior where SIGPWR is a synonym for SIGINFO.
>
ok, I'll think about this.

A new version of the patchset is underway, but as I told you I'm having quite
a busy time, so I need some more days to finish and do some test.

Thanks and regards
Davide

2011-05-19 12:45:46

by mark gross

[permalink] [raw]
Subject: Re: [linux-pm] pm loss development

On Wed, May 18, 2011 at 09:43:13PM +0200, Rafael J. Wysocki wrote:
> On Wednesday, May 18, 2011, mark gross wrote:
> > On Wed, May 18, 2011 at 01:07:57AM +0200, Rafael J. Wysocki wrote:
> > > On Saturday, May 14, 2011, mark gross wrote:
> > > > On Fri, May 13, 2011 at 06:54:57PM +0200, Rafael J. Wysocki wrote:
> > > > > On Friday, May 13, 2011, Raffaele Recalcati wrote:
> > > > > > Hi Rafael,
> > > > > >
> > > > > > 2011/5/12 Rafael J. Wysocki <[email protected]>:
> > > > > > > On Thursday, May 12, 2011, Raffaele Recalcati wrote:
> > > > > > >> What happen normally in runtime pm implementation is that every devices
> > > > > > >> are switched off and are enabled only when needed.
> > > > > > >> In our case instead we have a completely functional embedded system and,
> > > > > > >> when an asyncrhonous event appear, we have only some tens milliseconds
> > > > > > >> before the actual power failure takes place.
> > > > > > >> This patchset add a support in order to switch off not vital part of the system,
> > > > > > >> in order to allow the board to survive longer.
> > > > > > >> This allow the possibility to save important data.
> > > > > > >
> > > > > > > OK, so first, who decides what parts of the system are vital and what aren't?
> > > > > >
> > > > > > Take a quick look at Documentation/power/loss.txt paragrpah "2.4
> > > > > > Power loss policies".
> > > > > > You can decide what can be powered off.
> > > > >
> > > > > I read the patches. My question was about the general idea of who should
> > > > > be responsible of making these decisions.
> > > >
> > > > I would expect the system integrator would based on the application the
> > > > device is getting deployed into.
> > > >
> > > > A generic opportunistic policy for peripherals that are stateless and can
> > > > be trivially power gated off/on from an ISR could be a default but, for
> > > > peripherals that need to do some processing (like waiting on an eMMC DMA
> > > > to complete) can take time to power down into a safe state.
> > >
> > > What do you mean by safe state?
> > >
> > I need to get more details on this but I assume its a state where the
> > meta data of the file system is committed to the emmc before lights go
> > off such that when power is reapplied that the damage isn't too big.
>
> I don't think you can guarantee that the metadata won't be damaged
> without notifying the filesystem of the event (and making it react
> appropriately).
>
true. I'm just brain storming what infrastructure would be needed to
support such a feature.

> > I've also heard some talk of sim card corruption risks but, I don't have
> > first hand info on that.
>
> Well, I guess that might be prevented by the driver alone.
Only the radio has a direct interface to the sim. (pretty sure anyway)
so this would transfer into some AT commands pushed to the modem.

--mark

2011-05-19 14:25:14

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH 2/4] PM / Loss: power loss management

On Thu, 19 May 2011, Davide Ciminaghi wrote:

> I'm not completely sure about this. What we wanted to do was to avoid powering
> down the mmc while it is physically writing data into its internal memory.
> If we force a sync when the power loss warning event warning happens,
> it is very difficult to be able to guarantee that all buffered data will be
> written before power actually dies. So we preferred to follow another strategy:
> let the mmc finish any running write operation, and then stop its request
> queue. If power really goes down, then we hope that the file system journal
> will fix things on next boot (yes, some data could get lost, but the fs should
> still be mountable). On the other hand, if power resumes, nothing bad should
> happen for user space processes.

You could consider a totally different approach.

Each platform will have a different set of high-power devices it wants
to turn off when a power-loss warning occurs. So instead of changing
the core PM interface, you could add a new "power_loss" notifier list.
Only the most critical drivers would need to listen for notifications,
and this could be different drivers on different platforms.

Alan Stern

2011-05-19 20:52:21

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH 2/4] PM / Loss: power loss management

On Thursday, May 19, 2011, Alan Stern wrote:
> On Thu, 19 May 2011, Davide Ciminaghi wrote:
>
> > I'm not completely sure about this. What we wanted to do was to avoid powering
> > down the mmc while it is physically writing data into its internal memory.
> > If we force a sync when the power loss warning event warning happens,
> > it is very difficult to be able to guarantee that all buffered data will be
> > written before power actually dies. So we preferred to follow another strategy:
> > let the mmc finish any running write operation, and then stop its request
> > queue. If power really goes down, then we hope that the file system journal
> > will fix things on next boot (yes, some data could get lost, but the fs should
> > still be mountable). On the other hand, if power resumes, nothing bad should
> > happen for user space processes.
>
> You could consider a totally different approach.
>
> Each platform will have a different set of high-power devices it wants
> to turn off when a power-loss warning occurs. So instead of changing
> the core PM interface, you could add a new "power_loss" notifier list.
> Only the most critical drivers would need to listen for notifications,
> and this could be different drivers on different platforms.

Moreover, it would allow not only drivers, but also filesystems (for
one example) to get notifications.

Thanks,
Rafael

2011-06-14 06:48:10

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] pm loss development

Hi!

> > > What do you mean by safe state?
> > >
> > I need to get more details on this but I assume its a state where the
> > meta data of the file system is committed to the emmc before lights go
> > off such that when power is reapplied that the damage isn't too big.
>
> I don't think you can guarantee that the metadata won't be damaged
> without notifying the filesystem of the event (and making it react
> appropriately).

Journalling filesystem should survive power loss at random time, as
long as block device behaves like a disk (i.e. writes whole sectors).

And that can be guaranteed at driver level...

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2011-06-14 14:06:48

by mark gross

[permalink] [raw]
Subject: Re: [linux-pm] pm loss development

On Fri, Jun 03, 2011 at 12:21:11AM +0200, Pavel Machek wrote:
> Hi!
>
> > > > What do you mean by safe state?
> > > >
> > > I need to get more details on this but I assume its a state where the
> > > meta data of the file system is committed to the emmc before lights go
> > > off such that when power is reapplied that the damage isn't too big.
> >
> > I don't think you can guarantee that the metadata won't be damaged
> > without notifying the filesystem of the event (and making it react
> > appropriately).
>
> Journalling filesystem should survive power loss at random time, as
> long as block device behaves like a disk (i.e. writes whole sectors).
>
> And that can be guaranteed at driver level...
>

What if the battery falls out of the device at a bad time? The HW will
have micro to mili seconds of time to close out whatever transaction is
in flight.

I know nothing about journalling file systems or how well they limit the
critical sections of time where the file system is exposed to corruption
from sudden power failure. Its an interesting question though.

--mark

2011-06-14 14:36:09

by Alan

[permalink] [raw]
Subject: Re: [linux-pm] pm loss development

> I know nothing about journalling file systems or how well they limit the
> critical sections of time where the file system is exposed to corruption
> from sudden power failure. Its an interesting question though.

A properly written journalling file system has no critical sections. The
only things it relies upon are

- store ordering in the drive working properly
- a single disk block write being atomic

the former is well specified even for ATA devices, the latter is a pretty
safe property of rotating media, although in theory you have a finite
chance of getting a bad sector.

For flash it's a lot lot more complicated but for a flash device claiming
to be ATA compliant you ought to get ATA behaviour.

All that said there is still (as ever) a tiny chance your system may
malfunction. It's all down to probabilities and if your laptop explodes
you need a backup (trust me, I've tested this case).

Alan