2012-02-07 01:08:38

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks"

Hi all,

This series tests the theory that the easiest way to sell a once rejected
feature is to advertise it under a different name.

Well, there actually are two different features, although they are closely
related to each other. First, patch [6/8] introduces a feature that allows
the kernel to trigger system suspend (or more generally a transition into
a sleep state) whenever there are no active wakeup sources (no, they aren't
called wakelocks). It is called "autosleep" here, but it was called a few
different names in the past ("opportunistic suspend" was probably the most
popular one). Second, patch [8/8] introduces "wake locks" that are,
essentially, wakeup sources which may be created and manipulated by user
space. Using them user space may control the autosleep feature introduced
earlier.

This also is a kind of a proof of concept for the people who wanted me to
show a kernel-based implementation of automatic suspend, so there you go.
Please note, however, that it is done so that the user space "wake locks"
interface is compatible with Android in support of its user space. I don't
really like this interface, but since the Android's user space seems to rely
on it, I'm fine with using it as is. YMMV.

Let me say a few words about every patch in the series individually.

[1/8] - This really is a bug fix, so it's v3.4 material. Nobody has stepped
on this bug so far, but it should be fixed anyway.

[2/8] - This is a freezer cleanup, worth doing anyway IMO, so v3.4 material too.

[3/8] - This is something we can do no problem, although completely optional
without the autosleep feature. Rather necessary with it, though.

[4/8] - This kind of reintroduces my original idea of using a wait queue for
waiting until there are no wakeup events in progress. Alan convinced me that
it would be better to poll the counter to prevent wakeup_source_deactivate()
from having to call wake_up_all() occasionally (that may be costly in fast
paths), but then quite some people told me that the wait queue migh be
better. I think that the polling will make much less sense with autosleep
and user space "wake locks". Anyway, [4/8] is something we can do without
those things too.

The patches above were given Sign-off-by tags, because I think they make some
sense regardless of the features introcuded by the remaining patches that in
turn are total RFC.

[5/8] - This changes wakeup source statistics so that they are more similar to
the statistics collected for wakelocks on Android. The file those statistics
may be read from is still located in debugfs, though (I don't think it
belongs to proc and its name is different from the analogous Android's file
name anyway). It could be done without autosleep, but then it would be a bit
pointless. BTW, this changes interfaces that _in_ _theory_ may be used by
someone, but I'm not aware of anyone using them. If you are one, I'll be
pleased to learn about that, so please tell me who you are. :-)

[6/8] - Autosleep implementation. I think the changelog explains the idea
quite well and the code is really nothing special. It doesn't really add
anything new to the kernel in terms of infrastructure etc., it just uses
the existing stuff to implement an alternative method of triggering system
sleep transitions. Note, though, that the interface here is different
from the Android's one, because Android actually modifies /sys/power/state
to trigger something called "early suspend" (that is never going to be
implemented in the "stock" kernel as long as I have any influence on it) and
we simply can't do that in the mainline.

[7/8] - This adds a wakeup source statistics that only makes sense with
autosleep and (I believe) is analogous to the Android's prevent_suspend_time
statistics. Nothing really special, but I didn't want
wakeup_source_activate/deactivate() to take a common lock to avoid
congestion.

[8/8] - This adds a user space interface to create, activate and deactivate
wakeup sources. Since the files it consists of are called wake_lock and
wake_unlock, to follow Android, the objects the wakeup sources are wrapped
into are called "wakelocks" (for added confusion). Since the interface
doesn't provide any means to destroy those "wakelocks", I added a garbage
collection mechanism to get rid of the unused ones, if any. I also tought
it might be a good idea to put a limit on the number of those things that
user space can operate simultaneously, so I did that too.

All in all, it's not as much code as I thought it would be and it seems to be
relatively simple (which rises the question why the Android people didn't
even _try_ to do something like this instead of slapping the "real" wakelocks
onto the kernel FWIW). IMHO it doesn't add anything really new to the kernel,
except for the user space interfaces that should be maintainable. At least I
think I should be able to maintain them. :-)

All of the above has been tested very briefly on my test-bed Mackerel board
and it quite obviously requires more thorough testing, but first I need to know
if it makes sense to spend any more time on it.

IOW, I need to know your opinions!

Thanks,
Rafael


2012-02-07 01:06:04

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH 4/8] PM / Sleep: Use wait queue to signal "no wakeup events in progress"

From: Rafael J. Wysocki <[email protected]>

The current wakeup source deactivation code doesn't do anything when
the counter of wakeup events in progress goes down to zero, which
requires pm_get_wakeup_count() to poll that counter periodically.
Although this reduces the average time it takes to deactivate a
wakeup source, it also may lead to a substantial amount of unnecessary
polling if there are extended periods of wakeup activity. Thus it
seems reasonable to use a wait queue for signaling the "no wakeup
events in progress" condition and remove the polling.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/power/wakeup.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)

Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -17,8 +17,6 @@

#include "power.h"

-#define TIMEOUT 100
-
/*
* If set, the suspend/hibernate code will abort transitions to a sleep state
* if wakeup events are registered during or immediately before the transition.
@@ -52,6 +50,8 @@ static void pm_wakeup_timer_fn(unsigned

static LIST_HEAD(wakeup_sources);

+static DECLARE_WAIT_QUEUE_HEAD(wakeup_count_wait_queue);
+
/**
* wakeup_source_create - Create a struct wakeup_source object.
* @name: Name of the new wakeup source.
@@ -84,7 +84,7 @@ void wakeup_source_destroy(struct wakeup
while (ws->active) {
spin_unlock_irq(&ws->lock);

- schedule_timeout_interruptible(msecs_to_jiffies(TIMEOUT));
+ schedule_timeout_interruptible(msecs_to_jiffies(100));

spin_lock_irq(&ws->lock);
}
@@ -411,6 +411,7 @@ EXPORT_SYMBOL_GPL(pm_stay_awake);
*/
static void wakeup_source_deactivate(struct wakeup_source *ws)
{
+ unsigned int cnt, inpr;
ktime_t duration;
ktime_t now;

@@ -444,6 +445,10 @@ static void wakeup_source_deactivate(str
* couter of wakeup events in progress simultaneously.
*/
atomic_add(MAX_IN_PROGRESS, &combined_event_count);
+
+ split_counters(&cnt, &inpr);
+ if (!inpr)
+ wake_up_all(&wakeup_count_wait_queue);
}

/**
@@ -624,14 +629,19 @@ bool pm_wakeup_pending(void)
bool pm_get_wakeup_count(unsigned int *count)
{
unsigned int cnt, inpr;
+ DEFINE_WAIT(wait);

for (;;) {
+ prepare_to_wait(&wakeup_count_wait_queue, &wait,
+ TASK_INTERRUPTIBLE);
split_counters(&cnt, &inpr);
if (inpr == 0 || signal_pending(current))
break;
pm_wakeup_update_hit_counts();
- schedule_timeout_interruptible(msecs_to_jiffies(TIMEOUT));
+
+ schedule();
}
+ finish_wait(&wakeup_count_wait_queue, &wait);

split_counters(&cnt, &inpr);
*count = cnt;

2012-02-07 01:06:03

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH 3/8] PM / Sleep: Look for wakeup events in later stages of device suspend

From: Rafael J. Wysocki <[email protected]>

Currently, the device suspend code only checks if there have been
any wakeup events, and therefore the ongoing system transition to a
sleep state should be aborted, during the first (i.e. "suspend")
device suspend phase. However, wakeup events may be reported later
as well, so it's reasonable to look for them in the in the subsequent
(i.e. "late suspend" and "suspend noirq") phases.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/power/main.c | 10 ++++++++++
1 file changed, 10 insertions(+)

Index: linux/drivers/base/power/main.c
===================================================================
--- linux.orig/drivers/base/power/main.c
+++ linux/drivers/base/power/main.c
@@ -889,6 +889,11 @@ static int dpm_suspend_noirq(pm_message_
if (!list_empty(&dev->power.entry))
list_move(&dev->power.entry, &dpm_noirq_list);
put_device(dev);
+
+ if (pm_wakeup_pending()) {
+ error = -EBUSY;
+ break;
+ }
}
mutex_unlock(&dpm_list_mtx);
if (error)
@@ -962,6 +967,11 @@ static int dpm_suspend_late(pm_message_t
if (!list_empty(&dev->power.entry))
list_move(&dev->power.entry, &dpm_late_early_list);
put_device(dev);
+
+ if (pm_wakeup_pending()) {
+ error = -EBUSY;
+ break;
+ }
}
mutex_unlock(&dpm_list_mtx);
if (error)

2012-02-07 01:06:34

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 5/8] PM / Sleep: Change wakeup statistics

From: Rafael J. Wysocki <[email protected]>

Wakeup statistics used by Android are slightly different from what we
have at the moment, so modify them to follow Android more closely.

This removes the struct wakeup_source's hit_cout field, which is very
rough and therefore not very useful, and adds two new fields,
wakeup_count and expire_count. The first one tracks how many times
the wakeup source is activated with events_check_enabled set (which
roughly corresponds to the situations when a system power transition
to a sleep state is in progress and should be aborted by this wakeup
source if it is the only active one at that time) and the second one
is the number of times the wakeup source has been activated with a
timeout that expired.

Additionally, the last_time field is now updated when the wakeup
source is deactivated too (previously it was only updated during
the wakeup source's activation), which seems to be what Android does
with the analogous counter for wakelocks.

---
drivers/base/power/sysfs.c | 30 +++++++++++++++++++++++-----
drivers/base/power/wakeup.c | 47 +++++++++++++++++---------------------------
include/linux/pm_wakeup.h | 12 +++++++----
3 files changed, 52 insertions(+), 37 deletions(-)

Index: linux/include/linux/pm_wakeup.h
===================================================================
--- linux.orig/include/linux/pm_wakeup.h
+++ linux/include/linux/pm_wakeup.h
@@ -33,12 +33,14 @@
*
* @total_time: Total time this wakeup source has been active.
* @max_time: Maximum time this wakeup source has been continuously active.
- * @last_time: Monotonic clock when the wakeup source's was activated last time.
+ * @last_time: Monotonic clock when the wakeup source's was touched last time.
* @event_count: Number of signaled wakeup events.
* @active_count: Number of times the wakeup sorce was activated.
* @relax_count: Number of times the wakeup sorce was deactivated.
- * @hit_count: Number of times the wakeup sorce might abort system suspend.
+ * @expire_count: Number of times the wakeup source's timeout has expired.
+ * @wakeup_count: Number of times the wakeup source might abort suspend.
* @active: Status of the wakeup source.
+ * @has_timeout: The wakeup source has been activated with a timeout.
*/
struct wakeup_source {
char *name;
@@ -52,8 +54,10 @@ struct wakeup_source {
unsigned long event_count;
unsigned long active_count;
unsigned long relax_count;
- unsigned long hit_count;
- unsigned int active:1;
+ unsigned long expire_count;
+ unsigned long wakeup_count;
+ bool active:1;
+ bool has_timeout:1;
};

#ifdef CONFIG_PM_SLEEP
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -21,7 +21,7 @@
* If set, the suspend/hibernate code will abort transitions to a sleep state
* if wakeup events are registered during or immediately before the transition.
*/
-bool events_check_enabled;
+bool events_check_enabled __read_mostly;

/*
* Combined counters of registered wakeup events and wakeup events in progress.
@@ -370,9 +370,15 @@ void __pm_stay_awake(struct wakeup_sourc
return;

spin_lock_irqsave(&ws->lock, flags);
+
ws->event_count++;
if (!ws->active)
wakeup_source_activate(ws);
+
+ /* This is racy, but the counter is approximate anyway. */
+ if (events_check_enabled)
+ ws->wakeup_count++;
+
spin_unlock_irqrestore(&ws->lock, flags);
}
EXPORT_SYMBOL_GPL(__pm_stay_awake);
@@ -438,6 +444,11 @@ static void wakeup_source_deactivate(str
if (ktime_to_ns(duration) > ktime_to_ns(ws->max_time))
ws->max_time = duration;

+ ws->last_time = now;
+ if (ws->has_timeout && time_after(jiffies, ws->timer_expires))
+ ws->expire_count++;
+
+ ws->has_timeout = false;
del_timer(&ws->timer);

/*
@@ -542,6 +553,7 @@ void __pm_wakeup_event(struct wakeup_sou
if (time_after(expires, ws->timer_expires)) {
mod_timer(&ws->timer, expires);
ws->timer_expires = expires;
+ ws->has_timeout = true;
}

unlock:
@@ -571,24 +583,6 @@ void pm_wakeup_event(struct device *dev,
EXPORT_SYMBOL_GPL(pm_wakeup_event);

/**
- * pm_wakeup_update_hit_counts - Update hit counts of all active wakeup sources.
- */
-static void pm_wakeup_update_hit_counts(void)
-{
- unsigned long flags;
- struct wakeup_source *ws;
-
- rcu_read_lock();
- list_for_each_entry_rcu(ws, &wakeup_sources, entry) {
- spin_lock_irqsave(&ws->lock, flags);
- if (ws->active)
- ws->hit_count++;
- spin_unlock_irqrestore(&ws->lock, flags);
- }
- rcu_read_unlock();
-}
-
-/**
* pm_wakeup_pending - Check if power transition in progress should be aborted.
*
* Compare the current number of registered wakeup events with its preserved
@@ -610,8 +604,6 @@ bool pm_wakeup_pending(void)
events_check_enabled = !ret;
}
spin_unlock_irqrestore(&events_lock, flags);
- if (ret)
- pm_wakeup_update_hit_counts();
return ret;
}

@@ -637,7 +629,6 @@ bool pm_get_wakeup_count(unsigned int *c
split_counters(&cnt, &inpr);
if (inpr == 0 || signal_pending(current))
break;
- pm_wakeup_update_hit_counts();

schedule();
}
@@ -670,8 +661,6 @@ bool pm_save_wakeup_count(unsigned int c
events_check_enabled = true;
}
spin_unlock_irq(&events_lock);
- if (!events_check_enabled)
- pm_wakeup_update_hit_counts();
return events_check_enabled;
}

@@ -706,9 +695,10 @@ static int print_wakeup_source_stats(str
active_time = ktime_set(0, 0);
}

- ret = seq_printf(m, "%-12s\t%lu\t\t%lu\t\t%lu\t\t"
+ ret = seq_printf(m, "%-12s\t%lu\t\t%lu\t\t%lu\t\t%lu\t\t"
"%lld\t\t%lld\t\t%lld\t\t%lld\n",
- ws->name, active_count, ws->event_count, ws->hit_count,
+ ws->name, active_count, ws->event_count,
+ ws->wakeup_count, ws->expire_count,
ktime_to_ms(active_time), ktime_to_ms(total_time),
ktime_to_ms(max_time), ktime_to_ms(ws->last_time));

@@ -725,8 +715,9 @@ static int wakeup_sources_stats_show(str
{
struct wakeup_source *ws;

- seq_puts(m, "name\t\tactive_count\tevent_count\thit_count\t"
- "active_since\ttotal_time\tmax_time\tlast_change\n");
+ seq_puts(m, "name\t\tactive_count\tevent_count\twakeup_count\t"
+ "expire_count\tactive_since\ttotal_time\tmax_time\t"
+ "last_change\n");

rcu_read_lock();
list_for_each_entry_rcu(ws, &wakeup_sources, entry)
Index: linux/drivers/base/power/sysfs.c
===================================================================
--- linux.orig/drivers/base/power/sysfs.c
+++ linux/drivers/base/power/sysfs.c
@@ -288,22 +288,41 @@ static ssize_t wakeup_active_count_show(

static DEVICE_ATTR(wakeup_active_count, 0444, wakeup_active_count_show, NULL);

-static ssize_t wakeup_hit_count_show(struct device *dev,
- struct device_attribute *attr, char *buf)
+static ssize_t wakeup_wakeup_count_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ unsigned long count = 0;
+ bool enabled = false;
+
+ spin_lock_irq(&dev->power.lock);
+ if (dev->power.wakeup) {
+ count = dev->power.wakeup->wakeup_count;
+ enabled = true;
+ }
+ spin_unlock_irq(&dev->power.lock);
+ return enabled ? sprintf(buf, "%lu\n", count) : sprintf(buf, "\n");
+}
+
+static DEVICE_ATTR(wakeup_wakeup_count, 0444, wakeup_wakeup_count_show, NULL);
+
+static ssize_t wakeup_expire_count_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
{
unsigned long count = 0;
bool enabled = false;

spin_lock_irq(&dev->power.lock);
if (dev->power.wakeup) {
- count = dev->power.wakeup->hit_count;
+ count = dev->power.wakeup->expire_count;
enabled = true;
}
spin_unlock_irq(&dev->power.lock);
return enabled ? sprintf(buf, "%lu\n", count) : sprintf(buf, "\n");
}

-static DEVICE_ATTR(wakeup_hit_count, 0444, wakeup_hit_count_show, NULL);
+static DEVICE_ATTR(wakeup_expire_count, 0444, wakeup_expire_count_show, NULL);

static ssize_t wakeup_active_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -460,7 +479,8 @@ static struct attribute *wakeup_attrs[]
&dev_attr_wakeup.attr,
&dev_attr_wakeup_count.attr,
&dev_attr_wakeup_active_count.attr,
- &dev_attr_wakeup_hit_count.attr,
+ &dev_attr_wakeup_wakeup_count.attr,
+ &dev_attr_wakeup_expire_count.attr,
&dev_attr_wakeup_active.attr,
&dev_attr_wakeup_total_time_ms.attr,
&dev_attr_wakeup_max_time_ms.attr,

2012-02-07 01:06:40

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 8/8] PM / Sleep: Add user space interface for manipulating wakeup sources

From: Rafael J. Wysocki <[email protected]>

Android allows user space to manipulate wakelocks using two
sysfs file located in /sys/power/, wake_lock and wake_unlock.
Writing a wakelock name and optionally a timeout to the wake_lock
file causes the wakelock whose name was written to be acquired (it
is created before is necessary), optionally with the given timeout.
Writing the name of a wakelock to wake_unlock causes that wakelock
to be released.

Implement an analogous interface for user space using wakeup sources.
Add the /sys/power/wake_lock and /sys/power/wake_unlock files
allowing user space to create, activate and deactivate wakeup
sources, such that writing a name and optionally a timeout to
wake_lock causes the wakeup source of that name to be activated,
optionally with the given timeout. If that wakeup source doesn't
exist, it will be created and then activated. Writing a name to
wake_unlock causes the wakeup source of that name, if there is one,
to be deactivated. Wakeup sources created with the help of
wake_lock that haven't been used for more than 5 minutes are garbage
collected and destroyed. Moreover, there can be only WL_NUMBER_LIMIT
wakeup sources created with the help of wake_lock present at a time.

The data type used to track wakeup sources created by user space is
called "struct wakelock" to indicate the origins of this feature.

---
drivers/base/power/wakeup.c | 1
kernel/power/Kconfig | 8 +
kernel/power/Makefile | 1
kernel/power/main.c | 41 ++++++++
kernel/power/power.h | 9 +
kernel/power/wakelock.c | 218 ++++++++++++++++++++++++++++++++++++++++++++
6 files changed, 278 insertions(+)

Index: linux/kernel/power/main.c
===================================================================
--- linux.orig/kernel/power/main.c
+++ linux/kernel/power/main.c
@@ -415,6 +415,43 @@ static ssize_t autosleep_store(struct ko

power_attr(autosleep);
#endif /* CONFIG_PM_AUTOSLEEP */
+
+#ifdef CONFIG_PM_WAKELOCKS
+static ssize_t wake_lock_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ return pm_show_wakelocks(buf, true);
+}
+
+static ssize_t wake_lock_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ int error = pm_wake_lock(buf);
+ return error ? error : n;
+}
+
+power_attr(wake_lock);
+
+static ssize_t wake_unlock_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ return pm_show_wakelocks(buf, false);
+}
+
+static ssize_t wake_unlock_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ int error = pm_wake_unlock(buf);
+ return error ? error : n;
+}
+
+power_attr(wake_unlock);
+
+#endif /* CONFIG_PM_WAKELOCKS */
#endif /* CONFIG_PM_SLEEP */

#ifdef CONFIG_PM_TRACE
@@ -471,6 +508,10 @@ static struct attribute * g[] = {
#ifdef CONFIG_PM_AUTOSLEEP
&autosleep_attr.attr,
#endif
+#ifdef CONFIG_PM_WAKELOCKS
+ &wake_lock_attr.attr,
+ &wake_unlock_attr.attr,
+#endif
#ifdef CONFIG_PM_DEBUG
&pm_test_attr.attr,
#endif
Index: linux/kernel/power/power.h
===================================================================
--- linux.orig/kernel/power/power.h
+++ linux/kernel/power/power.h
@@ -287,3 +287,12 @@ static inline void pm_autosleep_unlock(v
static inline suspend_state_t pm_autosleep_state(void) { return PM_SUSPEND_ON; }

#endif /* !CONFIG_PM_AUTOSLEEP */
+
+#ifdef CONFIG_PM_WAKELOCKS
+
+/* kernel/power/wakelock.c */
+extern ssize_t pm_show_wakelocks(char *buf, bool show_active);
+extern int pm_wake_lock(const char *buf);
+extern int pm_wake_unlock(const char *buf);
+
+#endif /* !CONFIG_PM_WAKELOCKS */
Index: linux/kernel/power/Kconfig
===================================================================
--- linux.orig/kernel/power/Kconfig
+++ linux/kernel/power/Kconfig
@@ -111,6 +111,14 @@ config PM_AUTOSLEEP
Allow the kernel to trigger a system transition into a global sleep
state automatically whenever there are no active wakeup sources.

+config PM_WAKELOCKS
+ bool "User space wakeup sources interface"
+ depends on PM_SLEEP
+ default n
+ ---help---
+ Allow user space to create, activate and deactivate wakeup source
+ objects with the help of a sysfs-based interface.
+
config PM_RUNTIME
bool "Run-time PM core functionality"
depends on !IA64_HP_SIM
Index: linux/kernel/power/wakelock.c
===================================================================
--- /dev/null
+++ linux/kernel/power/wakelock.c
@@ -0,0 +1,218 @@
+/*
+ * kernel/power/wakelock.c
+ *
+ * User space wakeup sources support.
+ *
+ * Copyright (C) 2012 Rafael J. Wysocki <[email protected]>
+ *
+ * This code is based on the analogous interface allowing user space to
+ * manipulate wakelocks on Android.
+ */
+
+#include <linux/ctype.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/hrtimer.h>
+#include <linux/list.h>
+#include <linux/rbtree.h>
+#include <linux/slab.h>
+
+#define WL_NUMBER_LIMIT 100
+#define WL_GC_COUNT_MAX 100
+#define WL_GC_TIME_SEC 300
+
+static DEFINE_MUTEX(wakelocks_lock);
+
+struct wakelock {
+ char *name;
+ struct rb_node node;
+ struct wakeup_source ws;
+ struct list_head lru;
+};
+
+static struct rb_root wakelocks_tree = RB_ROOT;
+static LIST_HEAD(wakelocks_lru_list);
+static unsigned int number_of_wakelocks;
+static unsigned int wakelocks_gc_count;
+
+ssize_t pm_show_wakelocks(char *buf, bool show_active)
+{
+ struct rb_node *node;
+ struct wakelock *wl;
+ char *str = buf;
+ char *end = buf + PAGE_SIZE;
+
+ mutex_lock(&wakelocks_lock);
+
+ for (node = rb_first(&wakelocks_tree); node; node = rb_next(node)) {
+ bool active;
+
+ wl = rb_entry(node, struct wakelock, node);
+ spin_lock_irq(&wl->ws.lock);
+ active = wl->ws.active;
+ spin_unlock_irq(&wl->ws.lock);
+ if (active == show_active)
+ str += scnprintf(str, end - str, "%s ", wl->name);
+ }
+ str += scnprintf(str, end - str, "\n");
+
+ mutex_unlock(&wakelocks_lock);
+ return (str - buf);
+}
+
+static struct wakelock *wakelock_lookup_add(const char *name, size_t len,
+ bool add_if_not_found)
+{
+ struct rb_node **node = &wakelocks_tree.rb_node;
+ struct rb_node *parent = *node;
+ struct wakelock *wl;
+
+ while (*node) {
+ int diff;
+
+ wl = rb_entry(*node, struct wakelock, node);
+ diff = strncmp(name, wl->name, len);
+ if (diff == 0) {
+ if (wl->name[len])
+ diff = -1;
+ else
+ return wl;
+ }
+ if (diff < 0)
+ node = &(*node)->rb_left;
+ else
+ node = &(*node)->rb_right;
+
+ parent = *node;
+ }
+ if (!add_if_not_found)
+ return ERR_PTR(-EINVAL);
+
+ if (number_of_wakelocks > WL_NUMBER_LIMIT)
+ return ERR_PTR(-ENOSPC);
+
+ /* Not found, we have to add a new one. */
+ wl = kzalloc(sizeof(*wl), GFP_KERNEL);
+ if (!wl)
+ return ERR_PTR(-ENOMEM);
+
+ wl->name = kstrndup(name, len, GFP_KERNEL);
+ if (!wl->name) {
+ kfree(wl);
+ return ERR_PTR(-ENOMEM);
+ }
+ wl->ws.name = wl->name;
+ wakeup_source_add(&wl->ws);
+ rb_link_node(&wl->node, parent, node);
+ rb_insert_color(&wl->node, &wakelocks_tree);
+ list_add(&wl->lru, &wakelocks_lru_list);
+ number_of_wakelocks++;
+ return wl;
+}
+
+int pm_wake_lock(const char *buf)
+{
+ const char *str = buf;
+ struct wakelock *wl;
+ u64 timeout_ns = 0;
+ size_t len;
+ int ret = 0;
+
+ while (*str && !isspace(*str))
+ str++;
+
+ len = str - buf;
+ if (!len)
+ return -EINVAL;
+
+ if (*str && *str != '\n') {
+ /* Find out if there's a valid timeout string appended. */
+ ret = kstrtou64(skip_spaces(str), 10, &timeout_ns);
+ if (ret)
+ return -EINVAL;
+ }
+
+ mutex_lock(&wakelocks_lock);
+
+ wl = wakelock_lookup_add(buf, len, true);
+ if (IS_ERR(wl)) {
+ ret = PTR_ERR(wl);
+ goto out;
+ }
+ if (timeout_ns) {
+ u64 timeout_ms = timeout_ns + NSEC_PER_MSEC - 1;
+
+ do_div(timeout_ms, NSEC_PER_MSEC);
+ __pm_wakeup_event(&wl->ws, timeout_ms);
+ } else {
+ __pm_stay_awake(&wl->ws);
+ }
+
+ list_move(&wl->lru, &wakelocks_lru_list);
+
+ out:
+ mutex_unlock(&wakelocks_lock);
+ return ret;
+}
+
+static void wakelocks_gc(void)
+{
+ struct wakelock *wl, *aux;
+ ktime_t now = ktime_get();
+
+ list_for_each_entry_safe_reverse(wl, aux, &wakelocks_lru_list, lru) {
+ u64 idle_time_ns;
+ bool active;
+
+ spin_lock_irq(&wl->ws.lock);
+ idle_time_ns = ktime_to_ns(ktime_sub(now, wl->ws.last_time));
+ active = wl->ws.active;
+ spin_unlock_irq(&wl->ws.lock);
+
+ if (idle_time_ns < ((u64)WL_GC_TIME_SEC * NSEC_PER_SEC))
+ break;
+
+ if (!active) {
+ wakeup_source_remove(&wl->ws);
+ rb_erase(&wl->node, &wakelocks_tree);
+ list_del(&wl->lru);
+ kfree(wl->name);
+ kfree(wl);
+ number_of_wakelocks--;
+ }
+ }
+ wakelocks_gc_count = 0;
+}
+
+int pm_wake_unlock(const char *buf)
+{
+ struct wakelock *wl;
+ size_t len;
+ int ret = 0;
+
+ len = strlen(buf);
+ if (!len)
+ return -EINVAL;
+
+ if (buf[len-1] == '\n')
+ len--;
+
+ if (!len)
+ return -EINVAL;
+
+ mutex_lock(&wakelocks_lock);
+
+ wl = wakelock_lookup_add(buf, len, false);
+ if (IS_ERR(wl)) {
+ ret = PTR_ERR(wl);
+ goto out;
+ }
+ __pm_relax(&wl->ws);
+ list_move(&wl->lru, &wakelocks_lru_list);
+ if (++wakelocks_gc_count > WL_GC_COUNT_MAX)
+ wakelocks_gc();
+
+ out:
+ mutex_unlock(&wakelocks_lock);
+ return ret;
+}
Index: linux/kernel/power/Makefile
===================================================================
--- linux.orig/kernel/power/Makefile
+++ linux/kernel/power/Makefile
@@ -9,5 +9,6 @@ obj-$(CONFIG_PM_TEST_SUSPEND) += suspend
obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o \
block_io.o
obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o
+obj-$(CONFIG_PM_WAKELOCKS) += wakelock.o

obj-$(CONFIG_MAGIC_SYSRQ) += poweroff.o
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -107,6 +107,7 @@ void wakeup_source_add(struct wakeup_sou
spin_lock_init(&ws->lock);
setup_timer(&ws->timer, pm_wakeup_timer_fn, (unsigned long)ws);
ws->active = false;
+ ws->last_time = ktime_get();

spin_lock_irq(&events_lock);
list_add_rcu(&ws->entry, &wakeup_sources);

2012-02-07 01:06:39

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 6/8] PM / Sleep: Implement opportunistic sleep

From: Rafael J. Wysocki <[email protected]>

Introduce a mechanism by which the kernel can trigger global
transitions to a sleep state chosen by user space if there are no
active wakeup sources.

It consists of a new sysfs attribute, /sys/power/autosleep, that
can be written one of the strings returned by reads from
/sys/power/state, a freezable ordered workqueue and a work item
carrying out the "suspend" operations. If a string representing
the system's sleep state is written to /sys/power/autosleep, the
work item triggering transitions to that state is queued up and
it requeues it self after every execution until user space writes
"off" to /sys/power/autosleep. That work item enables the detection
of wakeup events using the functions already defined in
drivers/base/power/wakeup.c (with one small modification) and calls
either pm_suspend(), or hibernate() to put the system into a sleep
state. If a wakeup event is reported while the transition is in
progress, it will abort the transition and the "system suspend" work
item will be queued up again.

---
drivers/base/power/wakeup.c | 38 ++++++++------
include/linux/suspend.h | 13 ++++
kernel/power/Kconfig | 8 +++
kernel/power/Makefile | 1
kernel/power/autosleep.c | 115 ++++++++++++++++++++++++++++++++++++++++++++
kernel/power/main.c | 93 +++++++++++++++++++++++++++++------
kernel/power/power.h | 18 ++++++
7 files changed, 254 insertions(+), 32 deletions(-)

Index: linux/kernel/power/Makefile
===================================================================
--- linux.orig/kernel/power/Makefile
+++ linux/kernel/power/Makefile
@@ -8,5 +8,6 @@ obj-$(CONFIG_SUSPEND) += suspend.o
obj-$(CONFIG_PM_TEST_SUSPEND) += suspend_test.o
obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o \
block_io.o
+obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o

obj-$(CONFIG_MAGIC_SYSRQ) += poweroff.o
Index: linux/kernel/power/Kconfig
===================================================================
--- linux.orig/kernel/power/Kconfig
+++ linux/kernel/power/Kconfig
@@ -103,6 +103,14 @@ config PM_SLEEP_SMP
select HOTPLUG
select HOTPLUG_CPU

+config PM_AUTOSLEEP
+ bool "Opportunistic sleep"
+ depends on PM_SLEEP
+ default n
+ ---help---
+ Allow the kernel to trigger a system transition into a global sleep
+ state automatically whenever there are no active wakeup sources.
+
config PM_RUNTIME
bool "Run-time PM core functionality"
depends on !IA64_HP_SIM
Index: linux/kernel/power/power.h
===================================================================
--- linux.orig/kernel/power/power.h
+++ linux/kernel/power/power.h
@@ -269,3 +269,21 @@ static inline void suspend_thaw_processe
{
}
#endif
+
+#ifdef CONFIG_PM_AUTOSLEEP
+
+/* kernel/power/autosleep.c */
+extern int pm_autosleep_init(void);
+extern void pm_autosleep_lock(void);
+extern void pm_autosleep_unlock(void);
+extern suspend_state_t pm_autosleep_state(void);
+extern int pm_autosleep_set_state(suspend_state_t state);
+
+#else /* !CONFIG_PM_AUTOSLEEP */
+
+static inline int pm_autosleep_init(void) { return 0; }
+static inline void pm_autosleep_lock(void) {}
+static inline void pm_autosleep_unlock(void) {}
+static inline suspend_state_t pm_autosleep_state(void) { return PM_SUSPEND_ON; }
+
+#endif /* !CONFIG_PM_AUTOSLEEP */
Index: linux/include/linux/suspend.h
===================================================================
--- linux.orig/include/linux/suspend.h
+++ linux/include/linux/suspend.h
@@ -372,7 +372,7 @@ extern int unregister_pm_notifier(struct
extern bool events_check_enabled;

extern bool pm_wakeup_pending(void);
-extern bool pm_get_wakeup_count(unsigned int *count);
+extern bool pm_get_wakeup_count(unsigned int *count, bool block);
extern bool pm_save_wakeup_count(unsigned int count);

static inline void lock_system_sleep(void)
@@ -423,6 +423,17 @@ static inline void unlock_system_sleep(v

#endif /* !CONFIG_PM_SLEEP */

+#ifdef CONFIG_PM_AUTOSLEEP
+
+/* kernel/power/autosleep.c */
+void queue_up_suspend_work(void);
+
+#else /* !CONFIG_PM_AUTOSLEEP */
+
+static inline void queue_up_suspend_work(void) {}
+
+#endif /* !CONFIG_PM_AUTOSLEEP */
+
#ifdef CONFIG_ARCH_SAVE_PAGE_KEYS
/*
* The ARCH_SAVE_PAGE_KEYS functions can be used by an architecture
Index: linux/kernel/power/autosleep.c
===================================================================
--- /dev/null
+++ linux/kernel/power/autosleep.c
@@ -0,0 +1,115 @@
+/*
+ * kernel/power/autosleep.c
+ *
+ * Opportunistic sleep support.
+ *
+ * Copyright (C) 2012 Rafael J. Wysocki <[email protected]>
+ */
+
+#include <linux/device.h>
+#include <linux/mutex.h>
+#include <linux/pm_wakeup.h>
+
+#include "power.h"
+
+static struct workqueue_struct *autosleep_wq;
+static struct wakeup_source *autosleep_ws;
+
+static DEFINE_MUTEX(autosleep_lock);
+static DECLARE_COMPLETION(suspend_completion);
+
+static suspend_state_t autosleep_state;
+
+static void try_to_suspend(struct work_struct *work)
+{
+ unsigned int initial_count, final_count;
+
+ if (!pm_get_wakeup_count(&initial_count, true))
+ goto out;
+
+ if (!pm_save_wakeup_count(initial_count))
+ goto out;
+
+ mutex_lock(&autosleep_lock);
+ if (autosleep_state == PM_SUSPEND_ON) {
+ mutex_unlock(&autosleep_lock);
+ return;
+ }
+ INIT_COMPLETION(suspend_completion);
+ if (autosleep_state >= PM_SUSPEND_MAX)
+ hibernate();
+ else
+ pm_suspend(autosleep_state);
+
+ complete_all(&suspend_completion);
+ mutex_unlock(&autosleep_lock);
+
+ if (!pm_get_wakeup_count(&final_count, false))
+ goto out;
+
+ if (final_count == initial_count)
+ schedule_timeout(HZ / 2);
+
+ out:
+ queue_up_suspend_work();
+}
+
+static DECLARE_WORK(suspend_work, try_to_suspend);
+
+void queue_up_suspend_work(void)
+{
+ if (!work_pending(&suspend_work) && autosleep_state > PM_SUSPEND_ON)
+ queue_work(autosleep_wq, &suspend_work);
+}
+
+suspend_state_t pm_autosleep_state(void)
+{
+ return autosleep_state;
+}
+
+int pm_autosleep_set_state(suspend_state_t state)
+{
+#ifndef CONFIG_HIBERNATION
+ if (state >= PM_SUSPEND_MAX)
+ return -EINVAL;
+#endif
+ mutex_lock(&autosleep_lock);
+ __pm_stay_awake(autosleep_ws);
+ if (state == PM_SUSPEND_ON && autosleep_state != PM_SUSPEND_ON) {
+ autosleep_state = PM_SUSPEND_ON;
+ __pm_relax(autosleep_ws);
+ mutex_unlock(&autosleep_lock);
+ wait_for_completion(&suspend_completion);
+ } else if (state > PM_SUSPEND_ON) {
+ autosleep_state = state;
+ __pm_relax(autosleep_ws);
+ queue_up_suspend_work();
+ mutex_unlock(&autosleep_lock);
+ }
+ return 0;
+}
+
+void pm_autosleep_lock(void)
+{
+ mutex_lock(&autosleep_lock);
+}
+
+void pm_autosleep_unlock(void)
+{
+ mutex_unlock(&autosleep_lock);
+}
+
+int __init pm_autosleep_init(void)
+{
+ complete_all(&suspend_completion);
+ autosleep_ws = wakeup_source_register("main");
+ if (!autosleep_ws)
+ return -ENOMEM;
+
+ autosleep_wq = alloc_ordered_workqueue("autosleep", 0);
+ if (autosleep_wq)
+ return 0;
+
+ wakeup_source_unregister(autosleep_ws);
+ return -ENOMEM;
+}
Index: linux/kernel/power/main.c
===================================================================
--- linux.orig/kernel/power/main.c
+++ linux/kernel/power/main.c
@@ -269,8 +269,7 @@ static ssize_t state_show(struct kobject
return (s - buf);
}

-static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
- const char *buf, size_t n)
+static suspend_state_t decode_state(const char *buf, size_t n)
{
#ifdef CONFIG_SUSPEND
suspend_state_t state = PM_SUSPEND_STANDBY;
@@ -278,29 +277,46 @@ static ssize_t state_store(struct kobjec
#endif
char *p;
int len;
- int error = -EINVAL;

p = memchr(buf, '\n', n);
len = p ? p - buf : n;

- /* First, check if we are requested to hibernate */
- if (len == 4 && !strncmp(buf, "disk", len)) {
- error = hibernate();
- goto Exit;
- }
+ /* Check hibernation first. */
+ if (len == 4 && !strncmp(buf, "disk", len))
+ return PM_SUSPEND_MAX;

#ifdef CONFIG_SUSPEND
for (s = &pm_states[state]; state < PM_SUSPEND_MAX; s++, state++) {
if (*s && len == strlen(*s) && !strncmp(buf, *s, len))
break;
}
- if (state < PM_SUSPEND_MAX && *s) {
- error = enter_state(state);
- suspend_stats_update(error);
- }
+ if (state < PM_SUSPEND_MAX && *s)
+ return state;
#endif

- Exit:
+ return PM_SUSPEND_ON;
+}
+
+static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ suspend_state_t state;
+ int error = -EINVAL;
+
+ pm_autosleep_lock();
+ if (pm_autosleep_state() > PM_SUSPEND_ON) {
+ error = -EBUSY;
+ goto out;
+ }
+
+ state = decode_state(buf, n);
+ if (state < PM_SUSPEND_MAX)
+ error = pm_suspend(state);
+ else if (state > PM_SUSPEND_ON)
+ error = hibernate();
+
+ out:
+ pm_autosleep_unlock();
return error ? error : n;
}

@@ -341,7 +357,8 @@ static ssize_t wakeup_count_show(struct
{
unsigned int val;

- return pm_get_wakeup_count(&val) ? sprintf(buf, "%u\n", val) : -EINTR;
+ return pm_get_wakeup_count(&val, true) ?
+ sprintf(buf, "%u\n", val) : -EINTR;
}

static ssize_t wakeup_count_store(struct kobject *kobj,
@@ -358,6 +375,46 @@ static ssize_t wakeup_count_store(struct
}

power_attr(wakeup_count);
+
+#ifdef CONFIG_PM_AUTOSLEEP
+static ssize_t autosleep_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ suspend_state_t state = pm_autosleep_state();
+
+ if (state == PM_SUSPEND_ON)
+ return sprintf(buf, "off\n");
+
+#ifdef CONFIG_SUSPEND
+ if (state < PM_SUSPEND_MAX)
+ return sprintf(buf, "%s\n", valid_state(state) ?
+ pm_states[state] : "error");
+#endif
+#ifdef CONFIG_HIBERNATION
+ return sprintf(buf, "disk\n");
+#else
+ return sprintf(buf, "error");
+#endif
+}
+
+static ssize_t autosleep_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ suspend_state_t state = decode_state(buf, n);
+ int error;
+
+ if (state == PM_SUSPEND_ON && strncmp(buf, "off", 3)
+ && strncmp(buf, "off\n", 4))
+ return -EINVAL;
+
+ error = pm_autosleep_set_state(state);
+ return error ? error : n;
+}
+
+power_attr(autosleep);
+#endif /* CONFIG_PM_AUTOSLEEP */
#endif /* CONFIG_PM_SLEEP */

#ifdef CONFIG_PM_TRACE
@@ -411,6 +468,9 @@ static struct attribute * g[] = {
#ifdef CONFIG_PM_SLEEP
&pm_async_attr.attr,
&wakeup_count_attr.attr,
+#ifdef CONFIG_PM_AUTOSLEEP
+ &autosleep_attr.attr,
+#endif
#ifdef CONFIG_PM_DEBUG
&pm_test_attr.attr,
#endif
@@ -446,7 +506,10 @@ static int __init pm_init(void)
power_kobj = kobject_create_and_add("power", NULL);
if (!power_kobj)
return -ENOMEM;
- return sysfs_create_group(power_kobj, &attr_group);
+ error = sysfs_create_group(power_kobj, &attr_group);
+ if (error)
+ return error;
+ return pm_autosleep_init();
}

core_initcall(pm_init);
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -458,8 +458,10 @@ static void wakeup_source_deactivate(str
atomic_add(MAX_IN_PROGRESS, &combined_event_count);

split_counters(&cnt, &inpr);
- if (!inpr)
+ if (!inpr) {
wake_up_all(&wakeup_count_wait_queue);
+ queue_up_suspend_work();
+ }
}

/**
@@ -610,29 +612,33 @@ bool pm_wakeup_pending(void)
/**
* pm_get_wakeup_count - Read the number of registered wakeup events.
* @count: Address to store the value at.
+ * @block: Whether or not to block.
*
- * Store the number of registered wakeup events at the address in @count. Block
- * if the current number of wakeup events being processed is nonzero.
+ * Store the number of registered wakeup events at the address in @count. If
+ * @block is set, block until the current number of wakeup events being
+ * processed is zero.
*
- * Return 'false' if the wait for the number of wakeup events being processed to
- * drop down to zero has been interrupted by a signal (and the current number
- * of wakeup events being processed is still nonzero). Otherwise return 'true'.
+ * Return 'false' if the current number of wakeup events being processed is
+ * nonzero. Otherwise return 'true'.
*/
-bool pm_get_wakeup_count(unsigned int *count)
+bool pm_get_wakeup_count(unsigned int *count, bool block)
{
unsigned int cnt, inpr;
- DEFINE_WAIT(wait);

- for (;;) {
- prepare_to_wait(&wakeup_count_wait_queue, &wait,
- TASK_INTERRUPTIBLE);
- split_counters(&cnt, &inpr);
- if (inpr == 0 || signal_pending(current))
- break;
+ if (block) {
+ DEFINE_WAIT(wait);

- schedule();
+ for (;;) {
+ prepare_to_wait(&wakeup_count_wait_queue, &wait,
+ TASK_INTERRUPTIBLE);
+ split_counters(&cnt, &inpr);
+ if (inpr == 0 || signal_pending(current))
+ break;
+
+ schedule();
+ }
+ finish_wait(&wakeup_count_wait_queue, &wait);
}
- finish_wait(&wakeup_count_wait_queue, &wait);

split_counters(&cnt, &inpr);
*count = cnt;

2012-02-07 01:07:21

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 7/8] PM / Sleep: Add "prevent autosleep time" statistics to wakeup sources

From: Rafael J. Wysocki <[email protected]>

Android uses one wakelock statistics that is only necessary for
opportunistic sleep. Namely, the prevent_suspend_time field
accumulates the total time the given wakelock has been locked
while "automatic suspend" was enabled. Add an analogous field,
prevent_sleep_time, to wakeup sources and make it behave in a similar
way.

---
drivers/base/power/wakeup.c | 61 +++++++++++++++++++++++++++++++++++++++++---
include/linux/pm_wakeup.h | 4 ++
include/linux/suspend.h | 1
kernel/power/autosleep.c | 2 +
4 files changed, 64 insertions(+), 4 deletions(-)

Index: linux/include/linux/pm_wakeup.h
===================================================================
--- linux.orig/include/linux/pm_wakeup.h
+++ linux/include/linux/pm_wakeup.h
@@ -34,6 +34,7 @@
* @total_time: Total time this wakeup source has been active.
* @max_time: Maximum time this wakeup source has been continuously active.
* @last_time: Monotonic clock when the wakeup source's was touched last time.
+ * @prevent_sleep_time: Total time this source has been preventing autosleep.
* @event_count: Number of signaled wakeup events.
* @active_count: Number of times the wakeup sorce was activated.
* @relax_count: Number of times the wakeup sorce was deactivated.
@@ -51,6 +52,8 @@ struct wakeup_source {
ktime_t total_time;
ktime_t max_time;
ktime_t last_time;
+ ktime_t start_prevent_time;
+ ktime_t prevent_sleep_time;
unsigned long event_count;
unsigned long active_count;
unsigned long relax_count;
@@ -58,6 +61,7 @@ struct wakeup_source {
unsigned long wakeup_count;
bool active:1;
bool has_timeout:1;
+ bool autosleep_enabled:1;
};

#ifdef CONFIG_PM_SLEEP
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -351,6 +351,8 @@ static void wakeup_source_activate(struc
ws->active_count++;
ws->timer_expires = jiffies;
ws->last_time = ktime_get();
+ if (ws->autosleep_enabled)
+ ws->start_prevent_time = ws->last_time;

/* Increment the counter of events in progress. */
atomic_inc(&combined_event_count);
@@ -407,6 +409,17 @@ void pm_stay_awake(struct device *dev)
}
EXPORT_SYMBOL_GPL(pm_stay_awake);

+#ifdef CONFIG_PM_AUTOSLEEP
+static void update_prevent_sleep_time(struct wakeup_source *ws, ktime_t now)
+{
+ ktime_t delta = ktime_sub(now, ws->start_prevent_time);
+ ws->prevent_sleep_time = ktime_add(ws->prevent_sleep_time, delta);
+}
+#else
+static inline void update_prevent_sleep_time(struct wakeup_source *ws,
+ ktime_t now) {}
+#endif
+
/**
* wakup_source_deactivate - Mark given wakeup source as inactive.
* @ws: Wakeup source to handle.
@@ -451,6 +464,9 @@ static void wakeup_source_deactivate(str
ws->has_timeout = false;
del_timer(&ws->timer);

+ if (ws->autosleep_enabled)
+ update_prevent_sleep_time(ws, now);
+
/*
* Increment the counter of registered wakeup events and decrement the
* couter of wakeup events in progress simultaneously.
@@ -670,6 +686,34 @@ bool pm_save_wakeup_count(unsigned int c
return events_check_enabled;
}

+#ifdef CONFIG_PM_AUTOSLEEP
+/**
+ * pm_wakep_autosleep_enabled - Modify autosleep_enabled for all wakeup sources.
+ * @enabled: Whether to set or to clear the autosleep_enabled flags.
+ */
+void pm_wakep_autosleep_enabled(bool set)
+{
+ struct wakeup_source *ws;
+ ktime_t now = ktime_get();
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(ws, &wakeup_sources, entry) {
+ spin_lock_irq(&ws->lock);
+ if (ws->autosleep_enabled != set) {
+ ws->autosleep_enabled = set;
+ if (ws->active) {
+ if (set)
+ ws->start_prevent_time = now;
+ else
+ update_prevent_sleep_time(ws, now);
+ }
+ }
+ spin_unlock_irq(&ws->lock);
+ }
+ rcu_read_unlock();
+}
+#endif /* CONFIG_PM_AUTOSLEEP */
+
static struct dentry *wakeup_sources_stats_dentry;

/**
@@ -685,28 +729,37 @@ static int print_wakeup_source_stats(str
ktime_t max_time;
unsigned long active_count;
ktime_t active_time;
+ ktime_t prevent_sleep_time;
int ret;

spin_lock_irqsave(&ws->lock, flags);

total_time = ws->total_time;
max_time = ws->max_time;
+ prevent_sleep_time = ws->prevent_sleep_time;
active_count = ws->active_count;
if (ws->active) {
- active_time = ktime_sub(ktime_get(), ws->last_time);
+ ktime_t now = ktime_get();
+
+ active_time = ktime_sub(now, ws->last_time);
total_time = ktime_add(total_time, active_time);
if (active_time.tv64 > max_time.tv64)
max_time = active_time;
+
+ if (ws->autosleep_enabled)
+ prevent_sleep_time = ktime_add(prevent_sleep_time,
+ ktime_sub(now, ws->start_prevent_time));
} else {
active_time = ktime_set(0, 0);
}

ret = seq_printf(m, "%-12s\t%lu\t\t%lu\t\t%lu\t\t%lu\t\t"
- "%lld\t\t%lld\t\t%lld\t\t%lld\n",
+ "%lld\t\t%lld\t\t%lld\t\t%lld\t\t%lld\n",
ws->name, active_count, ws->event_count,
ws->wakeup_count, ws->expire_count,
ktime_to_ms(active_time), ktime_to_ms(total_time),
- ktime_to_ms(max_time), ktime_to_ms(ws->last_time));
+ ktime_to_ms(max_time), ktime_to_ms(ws->last_time),
+ ktime_to_ms(prevent_sleep_time));

spin_unlock_irqrestore(&ws->lock, flags);

@@ -723,7 +776,7 @@ static int wakeup_sources_stats_show(str

seq_puts(m, "name\t\tactive_count\tevent_count\twakeup_count\t"
"expire_count\tactive_since\ttotal_time\tmax_time\t"
- "last_change\n");
+ "last_change\tprevent_suspend_time\n");

rcu_read_lock();
list_for_each_entry_rcu(ws, &wakeup_sources, entry)
Index: linux/include/linux/suspend.h
===================================================================
--- linux.orig/include/linux/suspend.h
+++ linux/include/linux/suspend.h
@@ -374,6 +374,7 @@ extern bool events_check_enabled;
extern bool pm_wakeup_pending(void);
extern bool pm_get_wakeup_count(unsigned int *count, bool block);
extern bool pm_save_wakeup_count(unsigned int count);
+extern void pm_wakep_autosleep_enabled(bool set);

static inline void lock_system_sleep(void)
{
Index: linux/kernel/power/autosleep.c
===================================================================
--- linux.orig/kernel/power/autosleep.c
+++ linux/kernel/power/autosleep.c
@@ -78,11 +78,13 @@ int pm_autosleep_set_state(suspend_state
if (state == PM_SUSPEND_ON && autosleep_state != PM_SUSPEND_ON) {
autosleep_state = PM_SUSPEND_ON;
__pm_relax(autosleep_ws);
+ pm_wakep_autosleep_enabled(false);
mutex_unlock(&autosleep_lock);
wait_for_completion(&suspend_completion);
} else if (state > PM_SUSPEND_ON) {
autosleep_state = state;
__pm_relax(autosleep_ws);
+ pm_wakep_autosleep_enabled(true);
queue_up_suspend_work();
mutex_unlock(&autosleep_lock);
}

2012-02-07 01:05:59

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH 1/8] PM / Sleep: Initialize wakeup source locks in wakeup_source_add()

From: Rafael J. Wysocki <[email protected]>

Initialize wakeup source locks in wakeup_source_add() instead of
wakeup_source_create(), because otherwise the locks of the wakeup
sources that haven't been allocated with wakeup_source_create()
aren't initialized and handled properly.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/power/wakeup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -64,7 +64,6 @@ struct wakeup_source *wakeup_source_crea
if (!ws)
return NULL;

- spin_lock_init(&ws->lock);
if (name)
ws->name = kstrdup(name, GFP_KERNEL);

@@ -105,6 +104,7 @@ void wakeup_source_add(struct wakeup_sou
if (WARN_ON(!ws))
return;

+ spin_lock_init(&ws->lock);
setup_timer(&ws->timer, pm_wakeup_timer_fn, (unsigned long)ws);
ws->active = false;

2012-02-07 01:07:58

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH 2/8] PM / Sleep: Do not check wakeup too often in try_to_freeze_tasks()

From: Rafael J. Wysocki <[email protected]>

Use the observation that it is more efficient to check the wakeup
variable once before the loop reporting tasks that were not
frozen in try_to_freeze_tasks() than to do that in every step of that
loop.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
kernel/power/process.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)

Index: linux/kernel/power/process.c
===================================================================
--- linux.orig/kernel/power/process.c
+++ linux/kernel/power/process.c
@@ -98,13 +98,15 @@ static int try_to_freeze_tasks(bool user
elapsed_csecs / 100, elapsed_csecs % 100,
todo - wq_busy, wq_busy);

- read_lock(&tasklist_lock);
- do_each_thread(g, p) {
- if (!wakeup && !freezer_should_skip(p) &&
- p != current && freezing(p) && !frozen(p))
- sched_show_task(p);
- } while_each_thread(g, p);
- read_unlock(&tasklist_lock);
+ if (!wakeup) {
+ read_lock(&tasklist_lock);
+ do_each_thread(g, p) {
+ if (p != current && !freezer_should_skip(p)
+ && freezing(p) && !frozen(p))
+ sched_show_task(p);
+ } while_each_thread(g, p);
+ read_unlock(&tasklist_lock);
+ }
} else {
printk("(elapsed %d.%02d seconds) ", elapsed_csecs / 100,
elapsed_csecs % 100);

2012-02-07 01:09:57

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks"

On Tuesday, February 07, 2012, Rafael J. Wysocki wrote:
> Hi all,
>
> This series tests the theory that the easiest way to sell a once rejected
> feature is to advertise it under a different name.
>
> Well, there actually are two different features, although they are closely
> related to each other. First, patch [6/8] introduces a feature that allows
> the kernel to trigger system suspend (or more generally a transition into
> a sleep state) whenever there are no active wakeup sources (no, they aren't
> called wakelocks). It is called "autosleep" here, but it was called a few
> different names in the past ("opportunistic suspend" was probably the most
> popular one). Second, patch [8/8] introduces "wake locks" that are,
> essentially, wakeup sources which may be created and manipulated by user
> space. Using them user space may control the autosleep feature introduced
> earlier.
>
> This also is a kind of a proof of concept for the people who wanted me to
> show a kernel-based implementation of automatic suspend, so there you go.
> Please note, however, that it is done so that the user space "wake locks"
> interface is compatible with Android in support of its user space. I don't
> really like this interface, but since the Android's user space seems to rely
> on it, I'm fine with using it as is. YMMV.
>
> Let me say a few words about every patch in the series individually.
>
> [1/8] - This really is a bug fix, so it's v3.4 material. Nobody has stepped
> on this bug so far, but it should be fixed anyway.
>
> [2/8] - This is a freezer cleanup, worth doing anyway IMO, so v3.4 material too.
>
> [3/8] - This is something we can do no problem, although completely optional
> without the autosleep feature. Rather necessary with it, though.
>
> [4/8] - This kind of reintroduces my original idea of using a wait queue for
> waiting until there are no wakeup events in progress. Alan convinced me that
> it would be better to poll the counter to prevent wakeup_source_deactivate()
> from having to call wake_up_all() occasionally (that may be costly in fast
> paths), but then quite some people told me that the wait queue migh be
> better. I think that the polling will make much less sense with autosleep
> and user space "wake locks". Anyway, [4/8] is something we can do without
> those things too.
>
> The patches above were given Sign-off-by tags, because I think they make some
> sense regardless of the features introcuded by the remaining patches that in
> turn are total RFC.
>
> [5/8] - This changes wakeup source statistics so that they are more similar to
> the statistics collected for wakelocks on Android. The file those statistics
> may be read from is still located in debugfs, though (I don't think it
> belongs to proc and its name is different from the analogous Android's file
> name anyway). It could be done without autosleep, but then it would be a bit
> pointless. BTW, this changes interfaces that _in_ _theory_ may be used by
> someone, but I'm not aware of anyone using them. If you are one, I'll be
> pleased to learn about that, so please tell me who you are. :-)
>
> [6/8] - Autosleep implementation. I think the changelog explains the idea
> quite well and the code is really nothing special. It doesn't really add
> anything new to the kernel in terms of infrastructure etc., it just uses
> the existing stuff to implement an alternative method of triggering system
> sleep transitions. Note, though, that the interface here is different
> from the Android's one, because Android actually modifies /sys/power/state
> to trigger something called "early suspend" (that is never going to be
> implemented in the "stock" kernel as long as I have any influence on it) and
> we simply can't do that in the mainline.
>
> [7/8] - This adds a wakeup source statistics that only makes sense with
> autosleep and (I believe) is analogous to the Android's prevent_suspend_time
> statistics. Nothing really special, but I didn't want
> wakeup_source_activate/deactivate() to take a common lock to avoid
> congestion.
>
> [8/8] - This adds a user space interface to create, activate and deactivate
> wakeup sources. Since the files it consists of are called wake_lock and
> wake_unlock, to follow Android, the objects the wakeup sources are wrapped
> into are called "wakelocks" (for added confusion). Since the interface
> doesn't provide any means to destroy those "wakelocks", I added a garbage
> collection mechanism to get rid of the unused ones, if any. I also tought
> it might be a good idea to put a limit on the number of those things that
> user space can operate simultaneously, so I did that too.
>
> All in all, it's not as much code as I thought it would be and it seems to be
> relatively simple (which rises the question why the Android people didn't
> even _try_ to do something like this instead of slapping the "real" wakelocks
> onto the kernel FWIW). IMHO it doesn't add anything really new to the kernel,
> except for the user space interfaces that should be maintainable. At least I
> think I should be able to maintain them. :-)
>
> All of the above has been tested very briefly on my test-bed Mackerel board
> and it quite obviously requires more thorough testing, but first I need to know
> if it makes sense to spend any more time on it.
>
> IOW, I need to know your opinions!

Ouch. Sorry for breaking the Greg's address. Please replace it with the
correct one when you reply.

Thanks,
Rafael

2012-02-07 22:30:37

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH 1/8] PM / Sleep: Initialize wakeup source locks in wakeup_source_add()

On Tue, 2012-02-07 at 02:01 +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Initialize wakeup source locks in wakeup_source_add() instead of
> wakeup_source_create(), because otherwise the locks of the wakeup
> sources that haven't been allocated with wakeup_source_create()
> aren't initialized and handled properly.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>

Ah, I've shot myself in the foot before, forgetting to init the wakeup
source, so this should be good. Although, would a WARN_ON be better then
just initializing the lock in add? That way bad behavior is more likely
to be corrected, rather then just ignored.

thanks
-john

2012-02-07 22:37:53

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 1/8] PM / Sleep: Initialize wakeup source locks in wakeup_source_add()

On Tuesday, February 07, 2012, John Stultz wrote:
> On Tue, 2012-02-07 at 02:01 +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > Initialize wakeup source locks in wakeup_source_add() instead of
> > wakeup_source_create(), because otherwise the locks of the wakeup
> > sources that haven't been allocated with wakeup_source_create()
> > aren't initialized and handled properly.
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
>
> Ah, I've shot myself in the foot before, forgetting to init the wakeup
> source, so this should be good. Although, would a WARN_ON be better then
> just initializing the lock in add? That way bad behavior is more likely
> to be corrected, rather then just ignored.

Well, that's not bad behavior, since users are not supposed to open code
wakeup source initialization. _add() is supposed to do the job (that's
why I regard this one as a fix).

Thanks,
Rafael

2012-02-07 22:45:54

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Update][RFC][PATCH 6/8] PM / Sleep: Implement opportunistic sleep

On Tuesday, February 07, 2012, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Introduce a mechanism by which the kernel can trigger global
> transitions to a sleep state chosen by user space if there are no
> active wakeup sources.
>
> It consists of a new sysfs attribute, /sys/power/autosleep, that
> can be written one of the strings returned by reads from
> /sys/power/state, a freezable ordered workqueue and a work item
> carrying out the "suspend" operations. If a string representing
> the system's sleep state is written to /sys/power/autosleep, the
> work item triggering transitions to that state is queued up and
> it requeues it self after every execution until user space writes
> "off" to /sys/power/autosleep. That work item enables the detection
> of wakeup events using the functions already defined in
> drivers/base/power/wakeup.c (with one small modification) and calls
> either pm_suspend(), or hibernate() to put the system into a sleep
> state. If a wakeup event is reported while the transition is in
> progress, it will abort the transition and the "system suspend" work
> item will be queued up again.

OK, so before somebody points that out to me, the completion was redundant
(it was a leftover from one of the previous versions of the patch, sorry
about that).

Moreover, try_to_suspend() is racy with respect to wakeup_count_store()
(in theory, an automatic suspend without checking wakeup sources may happen
if the latter is used carelessly when autosleep is enabled).

Thus below is an updated patch (it requires [8/8] to be updated too because
of the changes in pm_autosleep_set_state(), but that's rather trivial).

Thanks,
Rafael

---
From: Rafael J. Wysocki <[email protected]>
Subject: PM / Sleep: Implement opportunistic sleep

Introduce a mechanism by which the kernel can trigger global
transitions to a sleep state chosen by user space if there are no
active wakeup sources.

It consists of a new sysfs attribute, /sys/power/autosleep, that
can be written one of the strings returned by reads from
/sys/power/state, an ordered workqueue and a work item carrying out
the "suspend" operations. If a string representing the system's
sleep state is written to /sys/power/autosleep, the work item
triggering transitions to that state is queued up and it requeues
itself after every execution until user space writes "off" to
/sys/power/autosleep.

That work item enables the detection of wakeup events using the
functions already defined in drivers/base/power/wakeup.c (with one
small modification) and calls either pm_suspend(), or hibernate() to
put the system into a sleep state. If a wakeup event is reported
while the transition is in progress, it will abort the transition and
the "system suspend" work item will be queued up again.

---
drivers/base/power/wakeup.c | 38 ++++++++------
include/linux/suspend.h | 13 ++++-
kernel/power/Kconfig | 8 +++
kernel/power/Makefile | 1
kernel/power/autosleep.c | 112 ++++++++++++++++++++++++++++++++++++++++++++
kernel/power/main.c | 105 ++++++++++++++++++++++++++++++++++-------
kernel/power/power.h | 18 +++++++
7 files changed, 262 insertions(+), 33 deletions(-)

Index: linux/kernel/power/Makefile
===================================================================
--- linux.orig/kernel/power/Makefile
+++ linux/kernel/power/Makefile
@@ -8,5 +8,6 @@ obj-$(CONFIG_SUSPEND) += suspend.o
obj-$(CONFIG_PM_TEST_SUSPEND) += suspend_test.o
obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o \
block_io.o
+obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o

obj-$(CONFIG_MAGIC_SYSRQ) += poweroff.o
Index: linux/kernel/power/Kconfig
===================================================================
--- linux.orig/kernel/power/Kconfig
+++ linux/kernel/power/Kconfig
@@ -103,6 +103,14 @@ config PM_SLEEP_SMP
select HOTPLUG
select HOTPLUG_CPU

+config PM_AUTOSLEEP
+ bool "Opportunistic sleep"
+ depends on PM_SLEEP
+ default n
+ ---help---
+ Allow the kernel to trigger a system transition into a global sleep
+ state automatically whenever there are no active wakeup sources.
+
config PM_RUNTIME
bool "Run-time PM core functionality"
depends on !IA64_HP_SIM
Index: linux/kernel/power/power.h
===================================================================
--- linux.orig/kernel/power/power.h
+++ linux/kernel/power/power.h
@@ -269,3 +269,21 @@ static inline void suspend_thaw_processe
{
}
#endif
+
+#ifdef CONFIG_PM_AUTOSLEEP
+
+/* kernel/power/autosleep.c */
+extern int pm_autosleep_init(void);
+extern void pm_autosleep_lock(void);
+extern void pm_autosleep_unlock(void);
+extern suspend_state_t pm_autosleep_state(void);
+extern int pm_autosleep_set_state(suspend_state_t state);
+
+#else /* !CONFIG_PM_AUTOSLEEP */
+
+static inline int pm_autosleep_init(void) { return 0; }
+static inline void pm_autosleep_lock(void) {}
+static inline void pm_autosleep_unlock(void) {}
+static inline suspend_state_t pm_autosleep_state(void) { return PM_SUSPEND_ON; }
+
+#endif /* !CONFIG_PM_AUTOSLEEP */
Index: linux/include/linux/suspend.h
===================================================================
--- linux.orig/include/linux/suspend.h
+++ linux/include/linux/suspend.h
@@ -372,7 +372,7 @@ extern int unregister_pm_notifier(struct
extern bool events_check_enabled;

extern bool pm_wakeup_pending(void);
-extern bool pm_get_wakeup_count(unsigned int *count);
+extern bool pm_get_wakeup_count(unsigned int *count, bool block);
extern bool pm_save_wakeup_count(unsigned int count);

static inline void lock_system_sleep(void)
@@ -423,6 +423,17 @@ static inline void unlock_system_sleep(v

#endif /* !CONFIG_PM_SLEEP */

+#ifdef CONFIG_PM_AUTOSLEEP
+
+/* kernel/power/autosleep.c */
+void queue_up_suspend_work(void);
+
+#else /* !CONFIG_PM_AUTOSLEEP */
+
+static inline void queue_up_suspend_work(void) {}
+
+#endif /* !CONFIG_PM_AUTOSLEEP */
+
#ifdef CONFIG_ARCH_SAVE_PAGE_KEYS
/*
* The ARCH_SAVE_PAGE_KEYS functions can be used by an architecture
Index: linux/kernel/power/autosleep.c
===================================================================
--- /dev/null
+++ linux/kernel/power/autosleep.c
@@ -0,0 +1,112 @@
+/*
+ * kernel/power/autosleep.c
+ *
+ * Opportunistic sleep support.
+ *
+ * Copyright (C) 2012 Rafael J. Wysocki <[email protected]>
+ */
+
+#include <linux/device.h>
+#include <linux/mutex.h>
+#include <linux/pm_wakeup.h>
+
+#include "power.h"
+
+static struct workqueue_struct *autosleep_wq;
+static struct wakeup_source *autosleep_ws;
+
+static DEFINE_MUTEX(autosleep_lock);
+
+static suspend_state_t autosleep_state;
+
+static void try_to_suspend(struct work_struct *work)
+{
+ unsigned int initial_count, final_count;
+
+ if (!pm_get_wakeup_count(&initial_count, true))
+ goto out;
+
+ mutex_lock(&autosleep_lock);
+
+ if (!pm_save_wakeup_count(initial_count)) {
+ mutex_unlock(&autosleep_lock);
+ goto out;
+ }
+
+ if (autosleep_state == PM_SUSPEND_ON) {
+ mutex_unlock(&autosleep_lock);
+ return;
+ }
+ if (autosleep_state >= PM_SUSPEND_MAX)
+ hibernate();
+ else
+ pm_suspend(autosleep_state);
+
+ mutex_unlock(&autosleep_lock);
+
+ if (!pm_get_wakeup_count(&final_count, false))
+ goto out;
+
+ if (final_count == initial_count)
+ schedule_timeout(HZ / 2);
+
+ out:
+ queue_up_suspend_work();
+}
+
+static DECLARE_WORK(suspend_work, try_to_suspend);
+
+void queue_up_suspend_work(void)
+{
+ if (!work_pending(&suspend_work) && autosleep_state > PM_SUSPEND_ON)
+ queue_work(autosleep_wq, &suspend_work);
+}
+
+suspend_state_t pm_autosleep_state(void)
+{
+ return autosleep_state;
+}
+
+int pm_autosleep_set_state(suspend_state_t state)
+{
+#ifndef CONFIG_HIBERNATION
+ if (state >= PM_SUSPEND_MAX)
+ return -EINVAL;
+#endif
+ mutex_lock(&autosleep_lock);
+ __pm_stay_awake(autosleep_ws);
+ if (state == PM_SUSPEND_ON && autosleep_state != PM_SUSPEND_ON) {
+ autosleep_state = PM_SUSPEND_ON;
+ __pm_relax(autosleep_ws);
+ } else if (state > PM_SUSPEND_ON) {
+ autosleep_state = state;
+ __pm_relax(autosleep_ws);
+ queue_up_suspend_work();
+ }
+ mutex_unlock(&autosleep_lock);
+ return 0;
+}
+
+void pm_autosleep_lock(void)
+{
+ mutex_lock(&autosleep_lock);
+}
+
+void pm_autosleep_unlock(void)
+{
+ mutex_unlock(&autosleep_lock);
+}
+
+int __init pm_autosleep_init(void)
+{
+ autosleep_ws = wakeup_source_register("main");
+ if (!autosleep_ws)
+ return -ENOMEM;
+
+ autosleep_wq = alloc_ordered_workqueue("autosleep", 0);
+ if (autosleep_wq)
+ return 0;
+
+ wakeup_source_unregister(autosleep_ws);
+ return -ENOMEM;
+}
Index: linux/kernel/power/main.c
===================================================================
--- linux.orig/kernel/power/main.c
+++ linux/kernel/power/main.c
@@ -269,8 +269,7 @@ static ssize_t state_show(struct kobject
return (s - buf);
}

-static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
- const char *buf, size_t n)
+static suspend_state_t decode_state(const char *buf, size_t n)
{
#ifdef CONFIG_SUSPEND
suspend_state_t state = PM_SUSPEND_STANDBY;
@@ -278,29 +277,46 @@ static ssize_t state_store(struct kobjec
#endif
char *p;
int len;
- int error = -EINVAL;

p = memchr(buf, '\n', n);
len = p ? p - buf : n;

- /* First, check if we are requested to hibernate */
- if (len == 4 && !strncmp(buf, "disk", len)) {
- error = hibernate();
- goto Exit;
- }
+ /* Check hibernation first. */
+ if (len == 4 && !strncmp(buf, "disk", len))
+ return PM_SUSPEND_MAX;

#ifdef CONFIG_SUSPEND
for (s = &pm_states[state]; state < PM_SUSPEND_MAX; s++, state++) {
if (*s && len == strlen(*s) && !strncmp(buf, *s, len))
break;
}
- if (state < PM_SUSPEND_MAX && *s) {
- error = enter_state(state);
- suspend_stats_update(error);
- }
+ if (state < PM_SUSPEND_MAX && *s)
+ return state;
#endif

- Exit:
+ return PM_SUSPEND_ON;
+}
+
+static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ suspend_state_t state;
+ int error = -EINVAL;
+
+ pm_autosleep_lock();
+ if (pm_autosleep_state() > PM_SUSPEND_ON) {
+ error = -EBUSY;
+ goto out;
+ }
+
+ state = decode_state(buf, n);
+ if (state < PM_SUSPEND_MAX)
+ error = pm_suspend(state);
+ else if (state > PM_SUSPEND_ON)
+ error = hibernate();
+
+ out:
+ pm_autosleep_unlock();
return error ? error : n;
}

@@ -341,7 +357,8 @@ static ssize_t wakeup_count_show(struct
{
unsigned int val;

- return pm_get_wakeup_count(&val) ? sprintf(buf, "%u\n", val) : -EINTR;
+ return pm_get_wakeup_count(&val, true) ?
+ sprintf(buf, "%u\n", val) : -EINTR;
}

static ssize_t wakeup_count_store(struct kobject *kobj,
@@ -349,15 +366,65 @@ static ssize_t wakeup_count_store(struct
const char *buf, size_t n)
{
unsigned int val;
+ int error = -EINVAL;
+
+ pm_autosleep_lock();
+ if (pm_autosleep_state() > PM_SUSPEND_ON) {
+ error = -EBUSY;
+ goto out;
+ }

if (sscanf(buf, "%u", &val) == 1) {
if (pm_save_wakeup_count(val))
return n;
}
- return -EINVAL;
+
+ out:
+ pm_autosleep_unlock();
+ return error;
}

power_attr(wakeup_count);
+
+#ifdef CONFIG_PM_AUTOSLEEP
+static ssize_t autosleep_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ suspend_state_t state = pm_autosleep_state();
+
+ if (state == PM_SUSPEND_ON)
+ return sprintf(buf, "off\n");
+
+#ifdef CONFIG_SUSPEND
+ if (state < PM_SUSPEND_MAX)
+ return sprintf(buf, "%s\n", valid_state(state) ?
+ pm_states[state] : "error");
+#endif
+#ifdef CONFIG_HIBERNATION
+ return sprintf(buf, "disk\n");
+#else
+ return sprintf(buf, "error");
+#endif
+}
+
+static ssize_t autosleep_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ suspend_state_t state = decode_state(buf, n);
+ int error;
+
+ if (state == PM_SUSPEND_ON && strncmp(buf, "off", 3)
+ && strncmp(buf, "off\n", 4))
+ return -EINVAL;
+
+ error = pm_autosleep_set_state(state);
+ return error ? error : n;
+}
+
+power_attr(autosleep);
+#endif /* CONFIG_PM_AUTOSLEEP */
#endif /* CONFIG_PM_SLEEP */

#ifdef CONFIG_PM_TRACE
@@ -411,6 +478,9 @@ static struct attribute * g[] = {
#ifdef CONFIG_PM_SLEEP
&pm_async_attr.attr,
&wakeup_count_attr.attr,
+#ifdef CONFIG_PM_AUTOSLEEP
+ &autosleep_attr.attr,
+#endif
#ifdef CONFIG_PM_DEBUG
&pm_test_attr.attr,
#endif
@@ -446,7 +516,10 @@ static int __init pm_init(void)
power_kobj = kobject_create_and_add("power", NULL);
if (!power_kobj)
return -ENOMEM;
- return sysfs_create_group(power_kobj, &attr_group);
+ error = sysfs_create_group(power_kobj, &attr_group);
+ if (error)
+ return error;
+ return pm_autosleep_init();
}

core_initcall(pm_init);
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -458,8 +458,10 @@ static void wakeup_source_deactivate(str
atomic_add(MAX_IN_PROGRESS, &combined_event_count);

split_counters(&cnt, &inpr);
- if (!inpr)
+ if (!inpr) {
wake_up_all(&wakeup_count_wait_queue);
+ queue_up_suspend_work();
+ }
}

/**
@@ -610,29 +612,33 @@ bool pm_wakeup_pending(void)
/**
* pm_get_wakeup_count - Read the number of registered wakeup events.
* @count: Address to store the value at.
+ * @block: Whether or not to block.
*
- * Store the number of registered wakeup events at the address in @count. Block
- * if the current number of wakeup events being processed is nonzero.
+ * Store the number of registered wakeup events at the address in @count. If
+ * @block is set, block until the current number of wakeup events being
+ * processed is zero.
*
- * Return 'false' if the wait for the number of wakeup events being processed to
- * drop down to zero has been interrupted by a signal (and the current number
- * of wakeup events being processed is still nonzero). Otherwise return 'true'.
+ * Return 'false' if the current number of wakeup events being processed is
+ * nonzero. Otherwise return 'true'.
*/
-bool pm_get_wakeup_count(unsigned int *count)
+bool pm_get_wakeup_count(unsigned int *count, bool block)
{
unsigned int cnt, inpr;
- DEFINE_WAIT(wait);

- for (;;) {
- prepare_to_wait(&wakeup_count_wait_queue, &wait,
- TASK_INTERRUPTIBLE);
- split_counters(&cnt, &inpr);
- if (inpr == 0 || signal_pending(current))
- break;
+ if (block) {
+ DEFINE_WAIT(wait);

- schedule();
+ for (;;) {
+ prepare_to_wait(&wakeup_count_wait_queue, &wait,
+ TASK_INTERRUPTIBLE);
+ split_counters(&cnt, &inpr);
+ if (inpr == 0 || signal_pending(current))
+ break;
+
+ schedule();
+ }
+ finish_wait(&wakeup_count_wait_queue, &wait);
}
- finish_wait(&wakeup_count_wait_queue, &wait);

split_counters(&cnt, &inpr);
*count = cnt;

2012-02-08 23:11:15

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 4/8] PM / Sleep: Use wait queue to signal "no wakeup events in progress"

On Tue, 7 Feb 2012 02:04:19 +0100 "Rafael J. Wysocki" <[email protected]> wrote:

> From: Rafael J. Wysocki <[email protected]>
>
> The current wakeup source deactivation code doesn't do anything when
> the counter of wakeup events in progress goes down to zero, which
> requires pm_get_wakeup_count() to poll that counter periodically.
> Although this reduces the average time it takes to deactivate a
> wakeup source, it also may lead to a substantial amount of unnecessary
> polling if there are extended periods of wakeup activity. Thus it
> seems reasonable to use a wait queue for signaling the "no wakeup
> events in progress" condition and remove the polling.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> drivers/base/power/wakeup.c | 18 ++++++++++++++----
> 1 file changed, 14 insertions(+), 4 deletions(-)
>
> Index: linux/drivers/base/power/wakeup.c
> ===================================================================
> --- linux.orig/drivers/base/power/wakeup.c
> +++ linux/drivers/base/power/wakeup.c
> @@ -17,8 +17,6 @@
>
> #include "power.h"
>
> -#define TIMEOUT 100
> -
> /*
> * If set, the suspend/hibernate code will abort transitions to a sleep state
> * if wakeup events are registered during or immediately before the transition.
> @@ -52,6 +50,8 @@ static void pm_wakeup_timer_fn(unsigned
>
> static LIST_HEAD(wakeup_sources);
>
> +static DECLARE_WAIT_QUEUE_HEAD(wakeup_count_wait_queue);
> +
> /**
> * wakeup_source_create - Create a struct wakeup_source object.
> * @name: Name of the new wakeup source.
> @@ -84,7 +84,7 @@ void wakeup_source_destroy(struct wakeup
> while (ws->active) {
> spin_unlock_irq(&ws->lock);
>
> - schedule_timeout_interruptible(msecs_to_jiffies(TIMEOUT));
> + schedule_timeout_interruptible(msecs_to_jiffies(100));
>
> spin_lock_irq(&ws->lock);
> }
> @@ -411,6 +411,7 @@ EXPORT_SYMBOL_GPL(pm_stay_awake);
> */
> static void wakeup_source_deactivate(struct wakeup_source *ws)
> {
> + unsigned int cnt, inpr;
> ktime_t duration;
> ktime_t now;
>
> @@ -444,6 +445,10 @@ static void wakeup_source_deactivate(str
> * couter of wakeup events in progress simultaneously.
> */
> atomic_add(MAX_IN_PROGRESS, &combined_event_count);
> +
> + split_counters(&cnt, &inpr);
> + if (!inpr)
> + wake_up_all(&wakeup_count_wait_queue);
> }

Would it be worth making this:

if (!inpr && waitqueue_active(&wakeup_count_wait_queue))
wake_up_all(&wakeup_count_wait_queue);

??
It would often save a spinlock.

Also was there a reason you used wake_up_all(). That is only really needed
were EXCLUSIVE waits are happening, and there aren't any of those.

Thanks,
NeilBrown


>
> /**
> @@ -624,14 +629,19 @@ bool pm_wakeup_pending(void)
> bool pm_get_wakeup_count(unsigned int *count)
> {
> unsigned int cnt, inpr;
> + DEFINE_WAIT(wait);
>
> for (;;) {
> + prepare_to_wait(&wakeup_count_wait_queue, &wait,
> + TASK_INTERRUPTIBLE);
> split_counters(&cnt, &inpr);
> if (inpr == 0 || signal_pending(current))
> break;
> pm_wakeup_update_hit_counts();
> - schedule_timeout_interruptible(msecs_to_jiffies(TIMEOUT));
> +
> + schedule();
> }
> + finish_wait(&wakeup_count_wait_queue, &wait);
>
> split_counters(&cnt, &inpr);
> *count = cnt;


Attachments:
signature.asc (828.00 B)

2012-02-08 23:57:53

by NeilBrown

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks"

On Tue, 7 Feb 2012 02:00:55 +0100 "Rafael J. Wysocki" <[email protected]> wrote:


> All in all, it's not as much code as I thought it would be and it seems to be
> relatively simple (which rises the question why the Android people didn't
> even _try_ to do something like this instead of slapping the "real" wakelocks
> onto the kernel FWIW). IMHO it doesn't add anything really new to the kernel,
> except for the user space interfaces that should be maintainable. At least I
> think I should be able to maintain them. :-)
>
> All of the above has been tested very briefly on my test-bed Mackerel board
> and it quite obviously requires more thorough testing, but first I need to know
> if it makes sense to spend any more time on it.
>
> IOW, I need to know your opinions!

I've got opinions!!!

I'll try to avoid the obvious bike-shedding about interface design...

The key point I want to make is that doing this in the kernel has one very
import difference to doing it in userspace (which, as you know, I prefer)
which may not be obvious to everyone at first sight. So I will try to make it
apparent.

In the user-space solution that we have previously discussed, it is only
necessary for the kernel to hold a wakeup_source active until the event is
*visible* to user-space. So a low level driver can queue e.g. an input event
and then deactivate their wakeup_source. The event can remain in the input
queue without any wakeup_source being active and there is no risk of going to
sleep inappropriately.
This is because - in the user-space approach - user-space must effectively
poll every source of interesting wakeup events between the last wakeup_source
being deactivate and the next attempt to suspend. This poll will notice the
event sitting in a queue so that a well-written user-space will not go to
sleep but will read the event.
(Note that this 'poll-of-every-device' need not be expensive. It can be a
single 'poll' or 'select' or even 'read' on a pollfd).

In the kernel based approach that you have presented this is not the case.
As the kernel will initiate suspend the moment the last wakeup_source is
released (with no polling of other queues), there must be an unbroken chain of
wakeup_sources from the initial interrupt all the way up to the user.
In particular, any subsystem (such as 'input') must hold a wakeup_source
active as long as any designated 'wakeup event' is in any of its queues.
This means that the subsystem must be able to differentiate wakeup events
from non-wakeup events.
This might be easy (maybe "all events are wakeup events" or "all events on
this queue are wakeup events") but it is not obvious to me that that is the
case.

To summarise: for this solution to be effective it also requires that
1/ every subsystem that carries wakeup events must know about wakeup_sources
and must activate/deactivate them as events are queued/dequeued.
2/ these subsystems must be able to differentiate between wakeup events and
non-wakeup events, and this must be a configurable decision.

Currently, understanding wakeup events is restricted to:
- drivers that are capable of configuring wakeup
- user-space which cares about wakeup
The proposed solution adds:
- intermediate subsystems which might queue wakeup events

I think that is a significant addition to make and not one to be made
lightly. It might end up adding more code than you thought it would be :-)

Thanks for the opportunity to comment,
NeilBrown


Attachments:
signature.asc (828.00 B)

2012-02-09 00:02:05

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 4/8] PM / Sleep: Use wait queue to signal "no wakeup events in progress"

On Thursday, February 09, 2012, NeilBrown wrote:
> On Tue, 7 Feb 2012 02:04:19 +0100 "Rafael J. Wysocki" <[email protected]> wrote:
>
> > From: Rafael J. Wysocki <[email protected]>
> >
> > The current wakeup source deactivation code doesn't do anything when
> > the counter of wakeup events in progress goes down to zero, which
> > requires pm_get_wakeup_count() to poll that counter periodically.
> > Although this reduces the average time it takes to deactivate a
> > wakeup source, it also may lead to a substantial amount of unnecessary
> > polling if there are extended periods of wakeup activity. Thus it
> > seems reasonable to use a wait queue for signaling the "no wakeup
> > events in progress" condition and remove the polling.
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---
> > drivers/base/power/wakeup.c | 18 ++++++++++++++----
> > 1 file changed, 14 insertions(+), 4 deletions(-)
> >
> > Index: linux/drivers/base/power/wakeup.c
> > ===================================================================
> > --- linux.orig/drivers/base/power/wakeup.c
> > +++ linux/drivers/base/power/wakeup.c
> > @@ -17,8 +17,6 @@
> >
> > #include "power.h"
> >
> > -#define TIMEOUT 100
> > -
> > /*
> > * If set, the suspend/hibernate code will abort transitions to a sleep state
> > * if wakeup events are registered during or immediately before the transition.
> > @@ -52,6 +50,8 @@ static void pm_wakeup_timer_fn(unsigned
> >
> > static LIST_HEAD(wakeup_sources);
> >
> > +static DECLARE_WAIT_QUEUE_HEAD(wakeup_count_wait_queue);
> > +
> > /**
> > * wakeup_source_create - Create a struct wakeup_source object.
> > * @name: Name of the new wakeup source.
> > @@ -84,7 +84,7 @@ void wakeup_source_destroy(struct wakeup
> > while (ws->active) {
> > spin_unlock_irq(&ws->lock);
> >
> > - schedule_timeout_interruptible(msecs_to_jiffies(TIMEOUT));
> > + schedule_timeout_interruptible(msecs_to_jiffies(100));
> >
> > spin_lock_irq(&ws->lock);
> > }
> > @@ -411,6 +411,7 @@ EXPORT_SYMBOL_GPL(pm_stay_awake);
> > */
> > static void wakeup_source_deactivate(struct wakeup_source *ws)
> > {
> > + unsigned int cnt, inpr;
> > ktime_t duration;
> > ktime_t now;
> >
> > @@ -444,6 +445,10 @@ static void wakeup_source_deactivate(str
> > * couter of wakeup events in progress simultaneously.
> > */
> > atomic_add(MAX_IN_PROGRESS, &combined_event_count);
> > +
> > + split_counters(&cnt, &inpr);
> > + if (!inpr)
> > + wake_up_all(&wakeup_count_wait_queue);
> > }
>
> Would it be worth making this:
>
> if (!inpr && waitqueue_active(&wakeup_count_wait_queue))
> wake_up_all(&wakeup_count_wait_queue);
>
> ??
> It would often save a spinlock.

Yes, good point. :-)

> Also was there a reason you used wake_up_all(). That is only really needed
> were EXCLUSIVE waits are happening, and there aren't any of those.

Right, I think wake_up() should be fine too.

Thanks,
Rafael

2012-02-10 00:40:23

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks"

Hi,

On Thursday, February 09, 2012, NeilBrown wrote:
> On Tue, 7 Feb 2012 02:00:55 +0100 "Rafael J. Wysocki" <[email protected]> wrote:
>
>
> > All in all, it's not as much code as I thought it would be and it seems to be
> > relatively simple (which rises the question why the Android people didn't
> > even _try_ to do something like this instead of slapping the "real" wakelocks
> > onto the kernel FWIW). IMHO it doesn't add anything really new to the kernel,
> > except for the user space interfaces that should be maintainable. At least I
> > think I should be able to maintain them. :-)
> >
> > All of the above has been tested very briefly on my test-bed Mackerel board
> > and it quite obviously requires more thorough testing, but first I need to know
> > if it makes sense to spend any more time on it.
> >
> > IOW, I need to know your opinions!
>
> I've got opinions!!!

Good! :-)

It seems that no one else has.

> I'll try to avoid the obvious bike-shedding about interface design...
>
> The key point I want to make is that doing this in the kernel has one very
> import difference to doing it in userspace (which, as you know, I prefer)
> which may not be obvious to everyone at first sight. So I will try to make it
> apparent.
>
> In the user-space solution that we have previously discussed, it is only
> necessary for the kernel to hold a wakeup_source active until the event is
> *visible* to user-space. So a low level driver can queue e.g. an input event
> and then deactivate their wakeup_source. The event can remain in the input
> queue without any wakeup_source being active and there is no risk of going to
> sleep inappropriately.
> This is because - in the user-space approach - user-space must effectively
> poll every source of interesting wakeup events between the last wakeup_source
> being deactivate and the next attempt to suspend. This poll will notice the
> event sitting in a queue so that a well-written user-space will not go to
> sleep but will read the event.
> (Note that this 'poll-of-every-device' need not be expensive. It can be a
> single 'poll' or 'select' or even 'read' on a pollfd).

So I see one little problem with that, which is that you'd need to teach user
space developers what to do an how to do that correctly.

Also, when you say "user space", it isn't exactly clear whether you mean a
power manager (that would carry out the attmepts to suspend) or applications
(that would need to communicate with the power manager to let it know what
they are doing). This is important, because in general, before deactivating
a wakeup source the kernel subsystem should know that the associated event
has become visible not only to the "polling" application, but also (perhaps
indirectly) to the power manager, so that it doesn't trigger suspend too
early.

> In the kernel based approach that you have presented this is not the case.
> As the kernel will initiate suspend the moment the last wakeup_source is
> released (with no polling of other queues), there must be an unbroken chain of
> wakeup_sources from the initial interrupt all the way up to the user.
> In particular, any subsystem (such as 'input') must hold a wakeup_source
> active as long as any designated 'wakeup event' is in any of its queues.
> This means that the subsystem must be able to differentiate wakeup events
> from non-wakeup events.
> This might be easy (maybe "all events are wakeup events" or "all events on
> this queue are wakeup events") but it is not obvious to me that that is the
> case.
>
> To summarise: for this solution to be effective it also requires that
> 1/ every subsystem that carries wakeup events must know about wakeup_sources
> and must activate/deactivate them as events are queued/dequeued.
> 2/ these subsystems must be able to differentiate between wakeup events and
> non-wakeup events, and this must be a configurable decision.
>
> Currently, understanding wakeup events is restricted to:
> - drivers that are capable of configuring wakeup
> - user-space which cares about wakeup
> The proposed solution adds:
> - intermediate subsystems which might queue wakeup events
>
> I think that is a significant addition to make and not one to be made
> lightly. It might end up adding more code than you thought it would be :-)

I'm aware of that and I expect people to come up with patches adding the
handling of wakeup events to a number of subsystems (this is kind of needed
regardless of autosleep if we want to be sure that user space has actually
consumed events we want it to take from us before suspending). However,
I'm not expecting that to be a lot of code (I think we both can only speculate
about that at this point) and those subsystems have maintainers and the
decision whether or not to take that code is theirs.

That may be a long process, but at least we can see from Android what's
needed and where.

Still, the point here is to give people something to start with so that they
can take the Android user space, test it against the mainline and see what
doesn't work and why and come up with fixes. Perhaps they will have better
ideas than we think right now, but surely nothing more is going to happen
without this starting point.

I'd like us and Android to use the same low-level data structures for power
management and the same API eventually, at least for drivers. This is not
the case at the moment and it's actively hurting us as a project quite a bit.
If Android needs to add patches on top of whatever we have to get the desired
functionality, I'm fine with that, as long as they don't require drivers to use
APIs that are incompatible with the mainline. Insisting that Android should
use a user-space-based autosleep implementation wouldn't help at all, because
realistically this isn't going to happen.

> Thanks for the opportunity to comment,

No need to thank for that, it's Open Source after all ...

Thanks,
Rafael

2012-02-12 01:20:05

by mark gross

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks"

On Tue, Feb 07, 2012 at 02:00:55AM +0100, Rafael J. Wysocki wrote:
> Hi all,
>
> This series tests the theory that the easiest way to sell a once rejected
> feature is to advertise it under a different name.
>
> Well, there actually are two different features, although they are closely
> related to each other. First, patch [6/8] introduces a feature that allows
> the kernel to trigger system suspend (or more generally a transition into
> a sleep state) whenever there are no active wakeup sources (no, they aren't
> called wakelocks). It is called "autosleep" here, but it was called a few
> different names in the past ("opportunistic suspend" was probably the most
> popular one). Second, patch [8/8] introduces "wake locks" that are,
> essentially, wakeup sources which may be created and manipulated by user
> space. Using them user space may control the autosleep feature introduced
> earlier.
>
> This also is a kind of a proof of concept for the people who wanted me to
> show a kernel-based implementation of automatic suspend, so there you go.
> Please note, however, that it is done so that the user space "wake locks"
> interface is compatible with Android in support of its user space. I don't
> really like this interface, but since the Android's user space seems to rely
> on it, I'm fine with using it as is. YMMV.
>
> Let me say a few words about every patch in the series individually.
>
> [1/8] - This really is a bug fix, so it's v3.4 material. Nobody has stepped
> on this bug so far, but it should be fixed anyway.
>
> [2/8] - This is a freezer cleanup, worth doing anyway IMO, so v3.4 material too.
>
> [3/8] - This is something we can do no problem, although completely optional
> without the autosleep feature. Rather necessary with it, though.
>
> [4/8] - This kind of reintroduces my original idea of using a wait queue for
> waiting until there are no wakeup events in progress. Alan convinced me that
> it would be better to poll the counter to prevent wakeup_source_deactivate()
> from having to call wake_up_all() occasionally (that may be costly in fast
> paths), but then quite some people told me that the wait queue migh be
> better. I think that the polling will make much less sense with autosleep
> and user space "wake locks". Anyway, [4/8] is something we can do without
> those things too.
>
> The patches above were given Sign-off-by tags, because I think they make some
> sense regardless of the features introcuded by the remaining patches that in
> turn are total RFC.
>
> [5/8] - This changes wakeup source statistics so that they are more similar to
> the statistics collected for wakelocks on Android. The file those statistics
> may be read from is still located in debugfs, though (I don't think it
> belongs to proc and its name is different from the analogous Android's file
> name anyway). It could be done without autosleep, but then it would be a bit
> pointless. BTW, this changes interfaces that _in_ _theory_ may be used by
> someone, but I'm not aware of anyone using them. If you are one, I'll be
> pleased to learn about that, so please tell me who you are. :-)
>
> [6/8] - Autosleep implementation. I think the changelog explains the idea
> quite well and the code is really nothing special. It doesn't really add
> anything new to the kernel in terms of infrastructure etc., it just uses
> the existing stuff to implement an alternative method of triggering system
> sleep transitions. Note, though, that the interface here is different
> from the Android's one, because Android actually modifies /sys/power/state
> to trigger something called "early suspend" (that is never going to be
> implemented in the "stock" kernel as long as I have any influence on it) and
> we simply can't do that in the mainline.
dude early suspend is the hallmark of enlightend coding for implementing
a kernel / user mode handshake to user mode when the display is turned
off. How can you not like that shit?

>
> [7/8] - This adds a wakeup source statistics that only makes sense with
> autosleep and (I believe) is analogous to the Android's prevent_suspend_time
> statistics. Nothing really special, but I didn't want
> wakeup_source_activate/deactivate() to take a common lock to avoid
> congestion.
>
> [8/8] - This adds a user space interface to create, activate and deactivate
> wakeup sources. Since the files it consists of are called wake_lock and
> wake_unlock, to follow Android, the objects the wakeup sources are wrapped
> into are called "wakelocks" (for added confusion). Since the interface
> doesn't provide any means to destroy those "wakelocks", I added a garbage
> collection mechanism to get rid of the unused ones, if any. I also tought
> it might be a good idea to put a limit on the number of those things that
> user space can operate simultaneously, so I did that too.
>
> All in all, it's not as much code as I thought it would be and it seems to be
> relatively simple (which rises the question why the Android people didn't
> even _try_ to do something like this instead of slapping the "real" wakelocks
> onto the kernel FWIW). IMHO it doesn't add anything really new to the kernel,
> except for the user space interfaces that should be maintainable. At least I
> think I should be able to maintain them. :-)
>
> All of the above has been tested very briefly on my test-bed Mackerel board
> and it quite obviously requires more thorough testing, but first I need to know
> if it makes sense to spend any more time on it.
>
> IOW, I need to know your opinions!
my opinion is "sigh".

FWIW we need to bring Android wakelocks into the main line so we can fix
them WRT wake event notification handling. But, I'll have to take a
look at the patches to see if I still have heart burn over the race
between wake sources and wake lock dropping in kernel mode.

/me goes and looks now....

--mark

>
> Thanks,
> Rafael
>

2012-02-12 01:27:34

by mark gross

[permalink] [raw]
Subject: Re: [PATCH 4/8] PM / Sleep: Use wait queue to signal "no wakeup events in progress"

On Tue, Feb 07, 2012 at 02:04:19AM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> The current wakeup source deactivation code doesn't do anything when
> the counter of wakeup events in progress goes down to zero, which
> requires pm_get_wakeup_count() to poll that counter periodically.
> Although this reduces the average time it takes to deactivate a
> wakeup source, it also may lead to a substantial amount of unnecessary
> polling if there are extended periods of wakeup activity. Thus it
> seems reasonable to use a wait queue for signaling the "no wakeup
> events in progress" condition and remove the polling.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> drivers/base/power/wakeup.c | 18 ++++++++++++++----
> 1 file changed, 14 insertions(+), 4 deletions(-)
>
> Index: linux/drivers/base/power/wakeup.c
> ===================================================================
> --- linux.orig/drivers/base/power/wakeup.c
> +++ linux/drivers/base/power/wakeup.c
> @@ -17,8 +17,6 @@
>
> #include "power.h"
>
> -#define TIMEOUT 100
> -
> /*
> * If set, the suspend/hibernate code will abort transitions to a sleep state
> * if wakeup events are registered during or immediately before the transition.
> @@ -52,6 +50,8 @@ static void pm_wakeup_timer_fn(unsigned
>
> static LIST_HEAD(wakeup_sources);
>
> +static DECLARE_WAIT_QUEUE_HEAD(wakeup_count_wait_queue);
> +
> /**
> * wakeup_source_create - Create a struct wakeup_source object.
> * @name: Name of the new wakeup source.
> @@ -84,7 +84,7 @@ void wakeup_source_destroy(struct wakeup
> while (ws->active) {
> spin_unlock_irq(&ws->lock);
>
> - schedule_timeout_interruptible(msecs_to_jiffies(TIMEOUT));
> + schedule_timeout_interruptible(msecs_to_jiffies(100));
Nit/ style comment: how is replacing a TIMEOUT macro with a magic number
an improvement. (maybe timeout is a un-helpful name but 100 isn't any
better. )
>
> spin_lock_irq(&ws->lock);
> }
> @@ -411,6 +411,7 @@ EXPORT_SYMBOL_GPL(pm_stay_awake);
> */
> static void wakeup_source_deactivate(struct wakeup_source *ws)
> {
> + unsigned int cnt, inpr;
> ktime_t duration;
> ktime_t now;
>
> @@ -444,6 +445,10 @@ static void wakeup_source_deactivate(str
> * couter of wakeup events in progress simultaneously.
> */
> atomic_add(MAX_IN_PROGRESS, &combined_event_count);
> +
> + split_counters(&cnt, &inpr);
> + if (!inpr)
> + wake_up_all(&wakeup_count_wait_queue);
> }
>
> /**
> @@ -624,14 +629,19 @@ bool pm_wakeup_pending(void)
> bool pm_get_wakeup_count(unsigned int *count)
> {
> unsigned int cnt, inpr;
> + DEFINE_WAIT(wait);
>
> for (;;) {
> + prepare_to_wait(&wakeup_count_wait_queue, &wait,
> + TASK_INTERRUPTIBLE);
> split_counters(&cnt, &inpr);
> if (inpr == 0 || signal_pending(current))
> break;
> pm_wakeup_update_hit_counts();
> - schedule_timeout_interruptible(msecs_to_jiffies(TIMEOUT));
> +
> + schedule();
> }
> + finish_wait(&wakeup_count_wait_queue, &wait);
>
> split_counters(&cnt, &inpr);
> *count = cnt;
>

2012-02-12 01:54:36

by mark gross

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks"

On Thu, Feb 09, 2012 at 10:57:36AM +1100, NeilBrown wrote:
> On Tue, 7 Feb 2012 02:00:55 +0100 "Rafael J. Wysocki" <[email protected]> wrote:
>
>
> > All in all, it's not as much code as I thought it would be and it seems to be
> > relatively simple (which rises the question why the Android people didn't
> > even _try_ to do something like this instead of slapping the "real" wakelocks
> > onto the kernel FWIW). IMHO it doesn't add anything really new to the kernel,
> > except for the user space interfaces that should be maintainable. At least I
> > think I should be able to maintain them. :-)
> >
> > All of the above has been tested very briefly on my test-bed Mackerel board
> > and it quite obviously requires more thorough testing, but first I need to know
> > if it makes sense to spend any more time on it.
> >
> > IOW, I need to know your opinions!
>
> I've got opinions!!!
>
> I'll try to avoid the obvious bike-shedding about interface design...
>
> The key point I want to make is that doing this in the kernel has one very
> import difference to doing it in userspace (which, as you know, I prefer)
> which may not be obvious to everyone at first sight. So I will try to make it
> apparent.
>
> In the user-space solution that we have previously discussed, it is only
> necessary for the kernel to hold a wakeup_source active until the event is
> *visible* to user-space. So a low level driver can queue e.g. an input event
> and then deactivate their wakeup_source. The event can remain in the input
> queue without any wakeup_source being active and there is no risk of going to
> sleep inappropriately.
> This is because - in the user-space approach - user-space must effectively
> poll every source of interesting wakeup events between the last wakeup_source
> being deactivate and the next attempt to suspend. This poll will notice the
> event sitting in a queue so that a well-written user-space will not go to
> sleep but will read the event.
<sarcasm>
its on running on 100's of millions of devices today... It must be well
written. Right?
</sarcasm>

> single 'poll' or 'select' or even 'read' on a pollfd).
>
> In the kernel based approach that you have presented this is not the case.
> As the kernel will initiate suspend the moment the last wakeup_source is
> released (with no polling of other queues), there must be an unbroken chain of
> wakeup_sources from the initial interrupt all the way up to the user.
> In particular, any subsystem (such as 'input') must hold a wakeup_source
> active as long as any designated 'wakeup event' is in any of its queues.
> This means that the subsystem must be able to differentiate wakeup events
> from non-wakeup events.
> This might be easy (maybe "all events are wakeup events" or "all events on
> this queue are wakeup events") but it is not obvious to me that that is the
> case.
>
And this brings us to a wake acknowledgement of wake events from user
mode before re-suspending type of design.


> To summarise: for this solution to be effective it also requires that
> 1/ every subsystem that carries wakeup events must know about wakeup_sources
> and must activate/deactivate them as events are queued/dequeued.
> 2/ these subsystems must be able to differentiate between wakeup events and
> non-wakeup events, and this must be a configurable decision.
>
> Currently, understanding wakeup events is restricted to:
> - drivers that are capable of configuring wakeup
> - user-space which cares about wakeup
> The proposed solution adds:
> - intermediate subsystems which might queue wakeup events
>
> I think that is a significant addition to make and not one to be made
> lightly. It might end up adding more code than you thought it would be :-)
you mean wake lock-itis sprinkling time out wake locks all over the
place?

--mark

> Thanks for the opportunity to comment,
> NeilBrown

2012-02-12 02:05:13

by mark gross

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks"

On Fri, Feb 10, 2012 at 01:44:10AM +0100, Rafael J. Wysocki wrote:
> Hi,
>
> On Thursday, February 09, 2012, NeilBrown wrote:
> > On Tue, 7 Feb 2012 02:00:55 +0100 "Rafael J. Wysocki" <[email protected]> wrote:
> >
> >
> > > All in all, it's not as much code as I thought it would be and it seems to be
> > > relatively simple (which rises the question why the Android people didn't
> > > even _try_ to do something like this instead of slapping the "real" wakelocks
> > > onto the kernel FWIW). IMHO it doesn't add anything really new to the kernel,
> > > except for the user space interfaces that should be maintainable. At least I
> > > think I should be able to maintain them. :-)
> > >
> > > All of the above has been tested very briefly on my test-bed Mackerel board
> > > and it quite obviously requires more thorough testing, but first I need to know
> > > if it makes sense to spend any more time on it.
> > >
> > > IOW, I need to know your opinions!
> >
> > I've got opinions!!!
>
> Good! :-)
>
> It seems that no one else has.
I'm sorry I've been really bad this last year about my email latency.

> > I'll try to avoid the obvious bike-shedding about interface design...
> >
> > The key point I want to make is that doing this in the kernel has one very
> > import difference to doing it in userspace (which, as you know, I prefer)
> > which may not be obvious to everyone at first sight. So I will try to make it
> > apparent.
> >
> > In the user-space solution that we have previously discussed, it is only
> > necessary for the kernel to hold a wakeup_source active until the event is
> > *visible* to user-space. So a low level driver can queue e.g. an input event
> > and then deactivate their wakeup_source. The event can remain in the input
> > queue without any wakeup_source being active and there is no risk of going to
> > sleep inappropriately.
> > This is because - in the user-space approach - user-space must effectively
> > poll every source of interesting wakeup events between the last wakeup_source
> > being deactivate and the next attempt to suspend. This poll will notice the
> > event sitting in a queue so that a well-written user-space will not go to
> > sleep but will read the event.
> > (Note that this 'poll-of-every-device' need not be expensive. It can be a
> > single 'poll' or 'select' or even 'read' on a pollfd).
>
> So I see one little problem with that, which is that you'd need to teach user
> space developers what to do an how to do that correctly.
>
> Also, when you say "user space", it isn't exactly clear whether you mean a
> power manager (that would carry out the attmepts to suspend) or applications
> (that would need to communicate with the power manager to let it know what
> they are doing). This is important, because in general, before deactivating
> a wakeup source the kernel subsystem should know that the associated event
> has become visible not only to the "polling" application, but also (perhaps
> indirectly) to the power manager, so that it doesn't trigger suspend too
> early.

yup, an explicit user mode acknowledgment of the wake event would be
appropriate.

> > In the kernel based approach that you have presented this is not the case.
> > As the kernel will initiate suspend the moment the last wakeup_source is
> > released (with no polling of other queues), there must be an unbroken chain of
> > wakeup_sources from the initial interrupt all the way up to the user.
> > In particular, any subsystem (such as 'input') must hold a wakeup_source
> > active as long as any designated 'wakeup event' is in any of its queues.
> > This means that the subsystem must be able to differentiate wakeup events
> > from non-wakeup events.
> > This might be easy (maybe "all events are wakeup events" or "all events on
> > this queue are wakeup events") but it is not obvious to me that that is the
> > case.
> >
> > To summarise: for this solution to be effective it also requires that
> > 1/ every subsystem that carries wakeup events must know about wakeup_sources
> > and must activate/deactivate them as events are queued/dequeued.
> > 2/ these subsystems must be able to differentiate between wakeup events and
> > non-wakeup events, and this must be a configurable decision.
> >
> > Currently, understanding wakeup events is restricted to:
> > - drivers that are capable of configuring wakeup
> > - user-space which cares about wakeup
> > The proposed solution adds:
> > - intermediate subsystems which might queue wakeup events
> >
> > I think that is a significant addition to make and not one to be made
> > lightly. It might end up adding more code than you thought it would be :-)
>
> I'm aware of that and I expect people to come up with patches adding the
> handling of wakeup events to a number of subsystems (this is kind of needed
> regardless of autosleep if we want to be sure that user space has actually
> consumed events we want it to take from us before suspending). However,
> I'm not expecting that to be a lot of code (I think we both can only speculate
> about that at this point) and those subsystems have maintainers and the
> decision whether or not to take that code is theirs.
>
> That may be a long process, but at least we can see from Android what's
> needed and where.
>
> Still, the point here is to give people something to start with so that they
> can take the Android user space, test it against the mainline and see what
> doesn't work and why and come up with fixes. Perhaps they will have better
> ideas than we think right now, but surely nothing more is going to happen
> without this starting point.
>
> I'd like us and Android to use the same low-level data structures for power
> management and the same API eventually, at least for drivers. This is not
> the case at the moment and it's actively hurting us as a project quite a bit.
> If Android needs to add patches on top of whatever we have to get the desired
> functionality, I'm fine with that, as long as they don't require drivers to use
> APIs that are incompatible with the mainline. Insisting that Android should
> use a user-space-based autosleep implementation wouldn't help at all, because
> realistically this isn't going to happen.

why not? I don't think having the PMS explicitly acknowledge a wake
event is a big ask at all.

--mark

> > Thanks for the opportunity to comment,
>
> No need to thank for that, it's Open Source after all ...
>
> Thanks,
> Rafael

2012-02-12 21:28:27

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks"

On Sunday, February 12, 2012, mark gross wrote:
> On Fri, Feb 10, 2012 at 01:44:10AM +0100, Rafael J. Wysocki wrote:
[...]
> > I'd like us and Android to use the same low-level data structures for power
> > management and the same API eventually, at least for drivers. This is not
> > the case at the moment and it's actively hurting us as a project quite a bit.
> > If Android needs to add patches on top of whatever we have to get the desired
> > functionality, I'm fine with that, as long as they don't require drivers to use
> > APIs that are incompatible with the mainline. Insisting that Android should
> > use a user-space-based autosleep implementation wouldn't help at all, because
> > realistically this isn't going to happen.
>
> why not? I don't think having the PMS explicitly acknowledge a wake
> event is a big ask at all.

I'd like to hear what the Android people think about that, but somehow it seems
to me they won't like it. :-)

Thanks,
Rafael

2012-02-14 00:11:30

by Arve Hjønnevåg

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks"

On Sun, Feb 12, 2012 at 1:32 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Sunday, February 12, 2012, mark gross wrote:
>> On Fri, Feb 10, 2012 at 01:44:10AM +0100, Rafael J. Wysocki wrote:
> [...]
>> > I'd like us and Android to use the same low-level data structures for power
>> > management and the same API eventually, at least for drivers. ?This is not
>> > the case at the moment and it's actively hurting us as a project quite a bit.
>> > If Android needs to add patches on top of whatever we have to get the desired
>> > functionality, I'm fine with that, as long as they don't require drivers to use
>> > APIs that are incompatible with the mainline. ?Insisting that Android should
>> > use a user-space-based autosleep implementation wouldn't help at all, because
>> > realistically this isn't going to happen.
>>
>> why not? ?I don't think having the PMS explicitly acknowledge a wake
>> event is a big ask at all.
>
> I'd like to hear what the Android people think about that, but somehow it seems
> to me they won't like it. :-)
>

Correct.

The android power manager service does not handle wake events and
therefore does not know when it is safe to acknowledge a wake event
(assuming this acknowledgement re-triggers suspend). Other components
handle the event and only notify the power manager if the event should
change a state (e.g. turn the screen on). Some wake events, like the
alarm used for battery monitoring, don't signal user space at all if
the user visible state did not change. Other wake events are processed
by lower level user-space services than the system-server where the
power manager runs.

--
Arve Hj?nnev?g

2012-02-14 02:07:30

by Arve Hjønnevåg

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks"

On Mon, Feb 6, 2012 at 5:00 PM, Rafael J. Wysocki <[email protected]> wrote:
...
> All in all, it's not as much code as I thought it would be and it seems to be
> relatively simple (which rises the question why the Android people didn't
> even _try_ to do something like this instead of slapping the "real" wakelocks
> onto the kernel FWIW). ?IMHO it doesn't add anything really new to the kernel,
> except for the user space interfaces that should be maintainable. ?At least I
> think I should be able to maintain them. :-)
>

Replacing a working solution with an untested one takes time. That
said, I have recently tried replacing all our kernel wake-locks with a
thin wrapper around wake-sources. This appears to mostly work, but the
wake-source timeout feature has some bugs or incompatible apis. An
init api would also be useful for embedding wake-sources in other data
structures without adding another memory allocation. Your patch to
move the spinlock init to wakeup_source_add still require the struct
to be zero initialized and the name set manually.

I needed to use two wake-sources per wake-lock since calling
__pm_stay_awake after __pm_wakeup_event on a wake-source does not
cancel the timeout. Unless there is a reason to keep this behavior I
would like __pm_stay_awake to cancel any active timeout.

Destroying a wake-source also has some problems. If you call
wakeup_source_destroy it will spin forever if the wake-source is
active without a timeout. And, if you call __pm_relax then
wakeup_source_destroy it could free the wake-source memory while the
timer function is still running. It also looks as if the wake_source
can be immediately deactivated if you call __pm_wakeup_event at the
same time as the previous timeout expired.

--
Arve Hj?nnev?g

2012-02-14 23:18:35

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks"

On Tuesday, February 14, 2012, Arve Hj?nnev?g wrote:
> On Mon, Feb 6, 2012 at 5:00 PM, Rafael J. Wysocki <[email protected]> wrote:
> ...
> > All in all, it's not as much code as I thought it would be and it seems to be
> > relatively simple (which rises the question why the Android people didn't
> > even _try_ to do something like this instead of slapping the "real" wakelocks
> > onto the kernel FWIW). IMHO it doesn't add anything really new to the kernel,
> > except for the user space interfaces that should be maintainable. At least I
> > think I should be able to maintain them. :-)
> >
>
> Replacing a working solution with an untested one takes time.

Sure, that's pretty obvious. :-)

> That said, I have recently tried replacing all our kernel wake-locks with a
> thin wrapper around wake-sources. This appears to mostly work,

Good!

> but the wake-source timeout feature has some bugs or incompatible apis. An
> init api would also be useful for embedding wake-sources in other data
> structures without adding another memory allocation. Your patch to
> move the spinlock init to wakeup_source_add still require the struct
> to be zero initialized and the name set manually.

That should be easy to fix. What about the appended patch?

> I needed to use two wake-sources per wake-lock since calling
> __pm_stay_awake after __pm_wakeup_event on a wake-source does not
> cancel the timeout. Unless there is a reason to keep this behavior I
> would like __pm_stay_awake to cancel any active timeout.

That actually is a bug. At least it's not consistent with
__pm_wakeup_event() that will replace the existing timeout with a new
one.

I'll post a patch to fix that in the next couple of days, stay tuned. :-)

> Destroying a wake-source also has some problems. If you call
> wakeup_source_destroy it will spin forever if the wake-source is
> active without a timeout. And, if you call __pm_relax then
> wakeup_source_destroy it could free the wake-source memory while the
> timer function is still running.

This also is a bug that needs fixing anyway.

> It also looks as if the wake_source can be immediately deactivated if
> you call __pm_wakeup_event at the same time as the previous timeout expired.

Yes, there is a race window if the timer function has already started.
It looks like I wanted to make it too simple. :-) Will fix.

Thanks,
Rafael


Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/power/wakeup.c | 44 +++++++++++++++++++++++++++++++++++++-------
include/linux/pm_wakeup.h | 9 +++++++++
2 files changed, 46 insertions(+), 7 deletions(-)

Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -53,6 +53,28 @@ static void pm_wakeup_timer_fn(unsigned
static LIST_HEAD(wakeup_sources);

/**
+ * wakeup_source_init - Initialize a struct wakeup_source object.
+ * @ws: Wakeup source to initialize.
+ * @name: Name of the new wakeup source.
+ */
+int wakeup_source_init(struct wakeup_source *ws, const char *name)
+{
+ int ret = 0;
+
+ if (!ws)
+ return -EINVAL;
+
+ memset(ws, 0, sizeof(*ws));
+ if (name) {
+ ws->name = kstrdup(name, GFP_KERNEL);
+ if (!ws->name)
+ ret = -ENOMEM;
+ }
+ return ret;
+}
+EXPORT_SYMBOL_GPL(wakeup_source_init);
+
+/**
* wakeup_source_create - Create a struct wakeup_source object.
* @name: Name of the new wakeup source.
*/
@@ -60,22 +82,20 @@ struct wakeup_source *wakeup_source_crea
{
struct wakeup_source *ws;

- ws = kzalloc(sizeof(*ws), GFP_KERNEL);
+ ws = kmalloc(sizeof(*ws), GFP_KERNEL);
if (!ws)
return NULL;

- if (name)
- ws->name = kstrdup(name, GFP_KERNEL);
-
+ wakeup_source_init(ws, name);
return ws;
}
EXPORT_SYMBOL_GPL(wakeup_source_create);

/**
- * wakeup_source_destroy - Destroy a struct wakeup_source object.
- * @ws: Wakeup source to destroy.
+ * wakeup_source_drop - Prepare a struct wakeup_source object for destruction.
+ * @ws: Wakeup source to prepare for destruction.
*/
-void wakeup_source_destroy(struct wakeup_source *ws)
+void wakeup_source_drop(struct wakeup_source *ws)
{
if (!ws)
return;
@@ -91,6 +111,16 @@ void wakeup_source_destroy(struct wakeup
spin_unlock_irq(&ws->lock);

kfree(ws->name);
+}
+EXPORT_SYMBOL_GPL(wakeup_source_drop);
+
+/**
+ * wakeup_source_destroy - Destroy a struct wakeup_source object.
+ * @ws: Wakeup source to destroy.
+ */
+void wakeup_source_destroy(struct wakeup_source *ws)
+{
+ wakeup_source_drop(ws);
kfree(ws);
}
EXPORT_SYMBOL_GPL(wakeup_source_destroy);
Index: linux/include/linux/pm_wakeup.h
===================================================================
--- linux.orig/include/linux/pm_wakeup.h
+++ linux/include/linux/pm_wakeup.h
@@ -73,7 +73,9 @@ static inline bool device_may_wakeup(str
}

/* drivers/base/power/wakeup.c */
+extern int wakeup_source_init(struct wakeup_source *ws, const char *name);
extern struct wakeup_source *wakeup_source_create(const char *name);
+extern void wakeup_source_drop(struct wakeup_source *ws);
extern void wakeup_source_destroy(struct wakeup_source *ws);
extern void wakeup_source_add(struct wakeup_source *ws);
extern void wakeup_source_remove(struct wakeup_source *ws);
@@ -103,11 +105,18 @@ static inline bool device_can_wakeup(str
return dev->power.can_wakeup;
}

+static inline int wakeup_source_init(struct wakeup_source *ws, const char *name)
+{
+ return -ENOSYS;
+}
+
static inline struct wakeup_source *wakeup_source_create(const char *name)
{
return NULL;
}

+static inline void wakeup_source_drop(struct wakeup_source *ws) {}
+
static inline void wakeup_source_destroy(struct wakeup_source *ws) {}

static inline void wakeup_source_add(struct wakeup_source *ws) {}

2012-02-15 05:57:47

by Arve Hjønnevåg

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks"

2012/2/14 Rafael J. Wysocki <[email protected]>:
> On Tuesday, February 14, 2012, Arve Hj?nnev?g wrote:
>> On Mon, Feb 6, 2012 at 5:00 PM, Rafael J. Wysocki <[email protected]> wrote:
>> ...
>> but the wake-source timeout feature has some bugs or incompatible apis. An
>> init api would also be useful for embedding wake-sources in other data
>> structures without adding another memory allocation. Your patch to
>> move the spinlock init to wakeup_source_add still require the struct
>> to be zero initialized and the name set manually.
>
> That should be easy to fix. ?What about the appended patch?
>

That works, but I still have to call more than one function before I
can use the wakeup-source (wakeup_source_init and wakeup_source_add)
and more than one function before I can free it (__pm_relax,
wakeup_source_remove and wakeup_source_drop). Is there any reason to
keep these separate?

Also, not copying the name when the caller provides the memory for the
wakeup-source would be a closer match to the wakelock api. Most of our
wakelocks pass a string constant as the name, and making a copy of
that string is not useful. wake_lock_init is also safe to call from
atomic context, but I don't know if anyone relies on this.

--
Arve Hj?nnev?g

2012-02-15 06:15:22

by Arve Hjønnevåg

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/8] PM / Sleep: Change wakeup statistics

On Mon, Feb 6, 2012 at 5:05 PM, Rafael J. Wysocki <[email protected]> wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Wakeup statistics used by Android are slightly different from what we
> have at the moment, so modify them to follow Android more closely.
...
> @@ -438,6 +444,11 @@ static void wakeup_source_deactivate(str
> ? ? ? ?if (ktime_to_ns(duration) > ktime_to_ns(ws->max_time))
> ? ? ? ? ? ? ? ?ws->max_time = duration;
>
> + ? ? ? ws->last_time = now;
> + ? ? ? if (ws->has_timeout && time_after(jiffies, ws->timer_expires))

time_after_eq may work better (or increment the count from the timer).
I applied this patch and the expire counts I see for wakeup-sources
that always time-out do not match the active count.

--
Arve Hj?nnev?g

2012-02-15 15:28:59

by mark gross

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks"

On Mon, Feb 13, 2012 at 04:11:24PM -0800, Arve Hj?nnev?g wrote:
> On Sun, Feb 12, 2012 at 1:32 PM, Rafael J. Wysocki <[email protected]> wrote:
> > On Sunday, February 12, 2012, mark gross wrote:
> >> On Fri, Feb 10, 2012 at 01:44:10AM +0100, Rafael J. Wysocki wrote:
> > [...]
> >> > I'd like us and Android to use the same low-level data structures for power
> >> > management and the same API eventually, at least for drivers. ?This is not
> >> > the case at the moment and it's actively hurting us as a project quite a bit.
> >> > If Android needs to add patches on top of whatever we have to get the desired
> >> > functionality, I'm fine with that, as long as they don't require drivers to use
> >> > APIs that are incompatible with the mainline. ?Insisting that Android should
> >> > use a user-space-based autosleep implementation wouldn't help at all, because
> >> > realistically this isn't going to happen.
> >>
> >> why not? ?I don't think having the PMS explicitly acknowledge a wake
> >> event is a big ask at all.
> >
> > I'd like to hear what the Android people think about that, but somehow it seems
> > to me they won't like it. :-)
> >
>
> Correct.
>
> The android power manager service does not handle wake events and
> therefore does not know when it is safe to acknowledge a wake event
> (assuming this acknowledgement re-triggers suspend). Other components
> handle the event and only notify the power manager if the event should
> change a state (e.g. turn the screen on). Some wake events, like the
> alarm used for battery monitoring, don't signal user space at all if
> the user visible state did not change. Other wake events are processed
> by lower level user-space services than the system-server where the
> power manager runs.

So you are all good with the wake event suspend race condition never ever
getting corrected or the fact that we have to sprinkle overlapping
kernel wake locks up and down the stack if we want to attempt to
implement correct code or that there is *no* way to deal with the hand
off of a wake lock critical section between kernel and user mode on wake
events without having a somewhat arbitrary time out wake lock dropping in
kernel mode?

Fine, if you don't like having the PMS ack wake events how about having
the services that handle them do it?

The basic problem with wake locks is that there is no explicit wake
event acknowledgment required before re-suspending. How about helping
us come up with a solution to that.

--mark

> --
> Arve Hj?nnev?g

2012-02-15 22:33:30

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/8] PM / Sleep: Change wakeup statistics

On Wednesday, February 15, 2012, Arve Hj?nnev?g wrote:
> On Mon, Feb 6, 2012 at 5:05 PM, Rafael J. Wysocki <[email protected]> wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > Wakeup statistics used by Android are slightly different from what we
> > have at the moment, so modify them to follow Android more closely.
> ...
> > @@ -438,6 +444,11 @@ static void wakeup_source_deactivate(str
> > if (ktime_to_ns(duration) > ktime_to_ns(ws->max_time))
> > ws->max_time = duration;
> >
> > + ws->last_time = now;
> > + if (ws->has_timeout && time_after(jiffies, ws->timer_expires))
>
> time_after_eq may work better (or increment the count from the timer).

I think incrementing the count from the timer is a better approach.

> I applied this patch and the expire counts I see for wakeup-sources
> that always time-out do not match the active count.

I see. The reason may also be that __pm_wakeup_event() increments
ws->event_count even if the wakeup source is already active.

Thanks,
Rafael

2012-02-15 23:03:41

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks"

On Wednesday, February 15, 2012, Arve Hj?nnev?g wrote:
> 2012/2/14 Rafael J. Wysocki <[email protected]>:
> > On Tuesday, February 14, 2012, Arve Hj?nnev?g wrote:
> >> On Mon, Feb 6, 2012 at 5:00 PM, Rafael J. Wysocki <[email protected]> wrote:
> >> ...
> >> but the wake-source timeout feature has some bugs or incompatible apis. An
> >> init api would also be useful for embedding wake-sources in other data
> >> structures without adding another memory allocation. Your patch to
> >> move the spinlock init to wakeup_source_add still require the struct
> >> to be zero initialized and the name set manually.
> >
> > That should be easy to fix. What about the appended patch?
> >
>
> That works, but I still have to call more than one function before I
> can use the wakeup-source (wakeup_source_init and wakeup_source_add)
> and more than one function before I can free it (__pm_relax,
> wakeup_source_remove and wakeup_source_drop). Is there any reason to
> keep these separate?

Yes, there is. I think that wakeup_source_create/_destroy() should
use the same initialization functions internally that will be used for
externally allocated wakeup sources (to make sure that all wakeup source
objects are initialized in exactly the same way).

> Also, not copying the name when the caller provides the memory for the
> wakeup-source would be a closer match to the wakelock api. Most of our
> wakelocks pass a string constant as the name, and making a copy of
> that string is not useful. wake_lock_init is also safe to call from
> atomic context, but I don't know if anyone relies on this.

OK, below is another go. It doesn't copy the name if wakeup_source_init() is
used (which also does the _add this time). I think, though, that copying
the name is generally safer, because someone might use wakeup_source_init()
with the name string allocated on the stack or otherwise temporary, which would
be a bug with the new version.

Thanks,
Rafael


Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/power/wakeup.c | 41 ++++++++++++++++++++++++++++++++++-------
include/linux/pm_wakeup.h | 20 ++++++++++++++++++++
2 files changed, 54 insertions(+), 7 deletions(-)

Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -53,6 +53,23 @@ static void pm_wakeup_timer_fn(unsigned
static LIST_HEAD(wakeup_sources);

/**
+ * wakeup_source_prepare - Prepare a new wakeup source for initialization.
+ * @ws: Wakeup source to prepare.
+ * @name: Pointer to the name of the new wakeup source.
+ *
+ * Callers must ensure that the @name string won't be freed when @ws is still in
+ * use.
+ */
+void wakeup_source_prepare(struct wakeup_source *ws, const char *name)
+{
+ if (ws) {
+ memset(ws, 0, sizeof(*ws));
+ ws->name = name;
+ }
+}
+EXPORT_SYMBOL_GPL(wakeup_source_prepare);
+
+/**
* wakeup_source_create - Create a struct wakeup_source object.
* @name: Name of the new wakeup source.
*/
@@ -60,31 +77,41 @@ struct wakeup_source *wakeup_source_crea
{
struct wakeup_source *ws;

- ws = kzalloc(sizeof(*ws), GFP_KERNEL);
+ ws = kmalloc(sizeof(*ws), GFP_KERNEL);
if (!ws)
return NULL;

- if (name)
- ws->name = kstrdup(name, GFP_KERNEL);
-
+ wakeup_source_prepare(ws, name ? kstrdup(name, GFP_KERNEL) : NULL);
return ws;
}
EXPORT_SYMBOL_GPL(wakeup_source_create);

/**
- * wakeup_source_destroy - Destroy a struct wakeup_source object.
- * @ws: Wakeup source to destroy.
+ * wakeup_source_drop - Prepare a struct wakeup_source object for destruction.
+ * @ws: Wakeup source to prepare for destruction.
*
* Callers must ensure that __pm_stay_awake() or __pm_wakeup_event() will never
* be run in parallel with this function for the same wakeup source object.
*/
-void wakeup_source_destroy(struct wakeup_source *ws)
+void wakeup_source_drop(struct wakeup_source *ws)
{
if (!ws)
return;

del_timer_sync(&ws->timer);
__pm_relax(ws);
+}
+EXPORT_SYMBOL_GPL(wakeup_source_drop);
+
+/**
+ * wakeup_source_destroy - Destroy a struct wakeup_source object.
+ * @ws: Wakeup source to destroy.
+ *
+ * Use only for wakeup source objects created with wakeup_source_create().
+ */
+void wakeup_source_destroy(struct wakeup_source *ws)
+{
+ wakeup_source_drop(ws);
kfree(ws->name);
kfree(ws);
}
Index: linux/include/linux/pm_wakeup.h
===================================================================
--- linux.orig/include/linux/pm_wakeup.h
+++ linux/include/linux/pm_wakeup.h
@@ -73,7 +73,9 @@ static inline bool device_may_wakeup(str
}

/* drivers/base/power/wakeup.c */
+extern void wakeup_source_prepare(struct wakeup_source *ws, const char *name);
extern struct wakeup_source *wakeup_source_create(const char *name);
+extern void wakeup_source_drop(struct wakeup_source *ws);
extern void wakeup_source_destroy(struct wakeup_source *ws);
extern void wakeup_source_add(struct wakeup_source *ws);
extern void wakeup_source_remove(struct wakeup_source *ws);
@@ -103,11 +105,16 @@ static inline bool device_can_wakeup(str
return dev->power.can_wakeup;
}

+static inline void wakeup_source_prepare(struct wakeup_source *ws,
+ const char *name) {}
+
static inline struct wakeup_source *wakeup_source_create(const char *name)
{
return NULL;
}

+static inline void wakeup_source_drop(struct wakeup_source *ws) {}
+
static inline void wakeup_source_destroy(struct wakeup_source *ws) {}

static inline void wakeup_source_add(struct wakeup_source *ws) {}
@@ -165,4 +172,17 @@ static inline void pm_wakeup_event(struc

#endif /* !CONFIG_PM_SLEEP */

+static inline void wakeup_source_init(struct wakeup_source *ws,
+ const char *name)
+{
+ wakeup_source_prepare(ws, name);
+ wakeup_source_add(ws);
+}
+
+static inline void wakeup_source_trash(struct wakeup_source *ws)
+{
+ wakeup_source_remove(ws);
+ wakeup_source_drop(ws);
+}
+
#endif /* _LINUX_PM_WAKEUP_H */

2012-02-16 22:18:30

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks"

On Thursday, February 16, 2012, Rafael J. Wysocki wrote:
> On Wednesday, February 15, 2012, Arve Hj?nnev?g wrote:
> > 2012/2/14 Rafael J. Wysocki <[email protected]>:
> > > On Tuesday, February 14, 2012, Arve Hj?nnev?g wrote:
> > >> On Mon, Feb 6, 2012 at 5:00 PM, Rafael J. Wysocki <[email protected]> wrote:
> > >> ...
> > >> but the wake-source timeout feature has some bugs or incompatible apis. An
> > >> init api would also be useful for embedding wake-sources in other data
> > >> structures without adding another memory allocation. Your patch to
> > >> move the spinlock init to wakeup_source_add still require the struct
> > >> to be zero initialized and the name set manually.
> > >
> > > That should be easy to fix. What about the appended patch?
> > >
> >
> > That works, but I still have to call more than one function before I
> > can use the wakeup-source (wakeup_source_init and wakeup_source_add)
> > and more than one function before I can free it (__pm_relax,
> > wakeup_source_remove and wakeup_source_drop). Is there any reason to
> > keep these separate?
>
> Yes, there is. I think that wakeup_source_create/_destroy() should
> use the same initialization functions internally that will be used for
> externally allocated wakeup sources (to make sure that all wakeup source
> objects are initialized in exactly the same way).
>
> > Also, not copying the name when the caller provides the memory for the
> > wakeup-source would be a closer match to the wakelock api. Most of our
> > wakelocks pass a string constant as the name, and making a copy of
> > that string is not useful. wake_lock_init is also safe to call from
> > atomic context, but I don't know if anyone relies on this.
>
> OK, below is another go. It doesn't copy the name if wakeup_source_init() is
> used (which also does the _add this time). I think, though, that copying
> the name is generally safer, because someone might use wakeup_source_init()
> with the name string allocated on the stack or otherwise temporary, which would
> be a bug with the new version.

So, is the new version more suitable than the previous one?

Rafael


> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> drivers/base/power/wakeup.c | 41 ++++++++++++++++++++++++++++++++++-------
> include/linux/pm_wakeup.h | 20 ++++++++++++++++++++
> 2 files changed, 54 insertions(+), 7 deletions(-)
>
> Index: linux/drivers/base/power/wakeup.c
> ===================================================================
> --- linux.orig/drivers/base/power/wakeup.c
> +++ linux/drivers/base/power/wakeup.c
> @@ -53,6 +53,23 @@ static void pm_wakeup_timer_fn(unsigned
> static LIST_HEAD(wakeup_sources);
>
> /**
> + * wakeup_source_prepare - Prepare a new wakeup source for initialization.
> + * @ws: Wakeup source to prepare.
> + * @name: Pointer to the name of the new wakeup source.
> + *
> + * Callers must ensure that the @name string won't be freed when @ws is still in
> + * use.
> + */
> +void wakeup_source_prepare(struct wakeup_source *ws, const char *name)
> +{
> + if (ws) {
> + memset(ws, 0, sizeof(*ws));
> + ws->name = name;
> + }
> +}
> +EXPORT_SYMBOL_GPL(wakeup_source_prepare);
> +
> +/**
> * wakeup_source_create - Create a struct wakeup_source object.
> * @name: Name of the new wakeup source.
> */
> @@ -60,31 +77,41 @@ struct wakeup_source *wakeup_source_crea
> {
> struct wakeup_source *ws;
>
> - ws = kzalloc(sizeof(*ws), GFP_KERNEL);
> + ws = kmalloc(sizeof(*ws), GFP_KERNEL);
> if (!ws)
> return NULL;
>
> - if (name)
> - ws->name = kstrdup(name, GFP_KERNEL);
> -
> + wakeup_source_prepare(ws, name ? kstrdup(name, GFP_KERNEL) : NULL);
> return ws;
> }
> EXPORT_SYMBOL_GPL(wakeup_source_create);
>
> /**
> - * wakeup_source_destroy - Destroy a struct wakeup_source object.
> - * @ws: Wakeup source to destroy.
> + * wakeup_source_drop - Prepare a struct wakeup_source object for destruction.
> + * @ws: Wakeup source to prepare for destruction.
> *
> * Callers must ensure that __pm_stay_awake() or __pm_wakeup_event() will never
> * be run in parallel with this function for the same wakeup source object.
> */
> -void wakeup_source_destroy(struct wakeup_source *ws)
> +void wakeup_source_drop(struct wakeup_source *ws)
> {
> if (!ws)
> return;
>
> del_timer_sync(&ws->timer);
> __pm_relax(ws);
> +}
> +EXPORT_SYMBOL_GPL(wakeup_source_drop);
> +
> +/**
> + * wakeup_source_destroy - Destroy a struct wakeup_source object.
> + * @ws: Wakeup source to destroy.
> + *
> + * Use only for wakeup source objects created with wakeup_source_create().
> + */
> +void wakeup_source_destroy(struct wakeup_source *ws)
> +{
> + wakeup_source_drop(ws);
> kfree(ws->name);
> kfree(ws);
> }
> Index: linux/include/linux/pm_wakeup.h
> ===================================================================
> --- linux.orig/include/linux/pm_wakeup.h
> +++ linux/include/linux/pm_wakeup.h
> @@ -73,7 +73,9 @@ static inline bool device_may_wakeup(str
> }
>
> /* drivers/base/power/wakeup.c */
> +extern void wakeup_source_prepare(struct wakeup_source *ws, const char *name);
> extern struct wakeup_source *wakeup_source_create(const char *name);
> +extern void wakeup_source_drop(struct wakeup_source *ws);
> extern void wakeup_source_destroy(struct wakeup_source *ws);
> extern void wakeup_source_add(struct wakeup_source *ws);
> extern void wakeup_source_remove(struct wakeup_source *ws);
> @@ -103,11 +105,16 @@ static inline bool device_can_wakeup(str
> return dev->power.can_wakeup;
> }
>
> +static inline void wakeup_source_prepare(struct wakeup_source *ws,
> + const char *name) {}
> +
> static inline struct wakeup_source *wakeup_source_create(const char *name)
> {
> return NULL;
> }
>
> +static inline void wakeup_source_drop(struct wakeup_source *ws) {}
> +
> static inline void wakeup_source_destroy(struct wakeup_source *ws) {}
>
> static inline void wakeup_source_add(struct wakeup_source *ws) {}
> @@ -165,4 +172,17 @@ static inline void pm_wakeup_event(struc
>
> #endif /* !CONFIG_PM_SLEEP */
>
> +static inline void wakeup_source_init(struct wakeup_source *ws,
> + const char *name)
> +{
> + wakeup_source_prepare(ws, name);
> + wakeup_source_add(ws);
> +}
> +
> +static inline void wakeup_source_trash(struct wakeup_source *ws)
> +{
> + wakeup_source_remove(ws);
> + wakeup_source_drop(ws);
> +}
> +
> #endif /* _LINUX_PM_WAKEUP_H */
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>

2012-02-17 02:12:00

by Arve Hjønnevåg

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/8] PM / Sleep: Change wakeup statistics

2012/2/15 Rafael J. Wysocki <[email protected]>:
> On Wednesday, February 15, 2012, Arve Hj?nnev?g wrote:
>> On Mon, Feb 6, 2012 at 5:05 PM, Rafael J. Wysocki <[email protected]> wrote:
>> > From: Rafael J. Wysocki <[email protected]>
>> >
>> > Wakeup statistics used by Android are slightly different from what we
>> > have at the moment, so modify them to follow Android more closely.
>> ...
>> > @@ -438,6 +444,11 @@ static void wakeup_source_deactivate(str
>> > ? ? ? ?if (ktime_to_ns(duration) > ktime_to_ns(ws->max_time))
>> > ? ? ? ? ? ? ? ?ws->max_time = duration;
>> >
>> > + ? ? ? ws->last_time = now;
>> > + ? ? ? if (ws->has_timeout && time_after(jiffies, ws->timer_expires))
>>
>> time_after_eq may work better (or increment the count from the timer).
>
> I think incrementing the count from the timer is a better approach.
>

OK.

>> I applied this patch and the expire counts I see for wakeup-sources
>> that always time-out do not match the active count.
>
> I see. ?The reason may also be that __pm_wakeup_event() increments
> ws->event_count even if the wakeup source is already active.
>

The active count, which is what I was looking at, only changes if it
was not already active though.

--
Arve Hj?nnev?g

2012-02-17 03:55:46

by Arve Hjønnevåg

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks"

2012/2/15 Rafael J. Wysocki <[email protected]>:
> On Wednesday, February 15, 2012, Arve Hj?nnev?g wrote:
>> 2012/2/14 Rafael J. Wysocki <[email protected]>:
>> > On Tuesday, February 14, 2012, Arve Hj?nnev?g wrote:
>> >> On Mon, Feb 6, 2012 at 5:00 PM, Rafael J. Wysocki <[email protected]> wrote:
>> >> ...
>> >> but the wake-source timeout feature has some bugs or incompatible apis. An
>> >> init api would also be useful for embedding wake-sources in other data
>> >> structures without adding another memory allocation. Your patch to
>> >> move the spinlock init to wakeup_source_add still require the struct
>> >> to be zero initialized and the name set manually.
>> >
>> > That should be easy to fix. ?What about the appended patch?
>> >
>>
>> That works, but I still have to call more than one function before I
>> can use the wakeup-source (wakeup_source_init and wakeup_source_add)
>> and more than one function before I can free it (__pm_relax,
>> wakeup_source_remove and wakeup_source_drop). Is there any reason to
>> keep these separate?
>
> Yes, there is. ?I think that wakeup_source_create/_destroy() should
> use the same initialization functions internally that will be used for
> externally allocated wakeup sources (to make sure that all wakeup source
> objects are initialized in exactly the same way).
>

I agree with that, but is it useful to export these helper functions?

>> Also, not copying the name when the caller provides the memory for the
>> wakeup-source would be a closer match to the wakelock api. Most of our
>> wakelocks pass a string constant as the name, and making a copy of
>> that string is not useful. wake_lock_init is also safe to call from
>> atomic context, but I don't know if anyone relies on this.
>
> OK, below is another go. ?It doesn't copy the name if wakeup_source_init() is
> used (which also does the _add this time). ?I think, though, that copying
> the name is generally safer, because someone might use wakeup_source_init()
> with the name string allocated on the stack or otherwise temporary, which would
> be a bug with the new version.
>

I prefer this version. I have not seen a bug where someone passed a
temporary as the wakelock name, I assume since this will show up
immediately in the stats file.

--
Arve Hj?nnev?g

2012-02-17 03:56:28

by Arve Hjønnevåg

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks"

2012/2/16 Rafael J. Wysocki <[email protected]>:
...
>
> So, is the new version more suitable than the previous one?
>

Yes, I think it is.

--
Arve Hj?nnev?g

2012-02-17 20:53:23

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks"

On Friday, February 17, 2012, Arve Hj?nnev?g wrote:
> 2012/2/15 Rafael J. Wysocki <[email protected]>:
> > On Wednesday, February 15, 2012, Arve Hj?nnev?g wrote:
> >> 2012/2/14 Rafael J. Wysocki <[email protected]>:
> >> > On Tuesday, February 14, 2012, Arve Hj?nnev?g wrote:
> >> >> On Mon, Feb 6, 2012 at 5:00 PM, Rafael J. Wysocki <[email protected]> wrote:
> >> >> ...
> >> >> but the wake-source timeout feature has some bugs or incompatible apis. An
> >> >> init api would also be useful for embedding wake-sources in other data
> >> >> structures without adding another memory allocation. Your patch to
> >> >> move the spinlock init to wakeup_source_add still require the struct
> >> >> to be zero initialized and the name set manually.
> >> >
> >> > That should be easy to fix. What about the appended patch?
> >> >
> >>
> >> That works, but I still have to call more than one function before I
> >> can use the wakeup-source (wakeup_source_init and wakeup_source_add)
> >> and more than one function before I can free it (__pm_relax,
> >> wakeup_source_remove and wakeup_source_drop). Is there any reason to
> >> keep these separate?
> >
> > Yes, there is. I think that wakeup_source_create/_destroy() should
> > use the same initialization functions internally that will be used for
> > externally allocated wakeup sources (to make sure that all wakeup source
> > objects are initialized in exactly the same way).
> >
>
> I agree with that, but is it useful to export these helper functions?

Well, we need to export either them or the ones that will call them internally
and in principle someone may want to do something between _prepare() and _add()
sometimes ...

> >> Also, not copying the name when the caller provides the memory for the
> >> wakeup-source would be a closer match to the wakelock api. Most of our
> >> wakelocks pass a string constant as the name, and making a copy of
> >> that string is not useful. wake_lock_init is also safe to call from
> >> atomic context, but I don't know if anyone relies on this.
> >
> > OK, below is another go. It doesn't copy the name if wakeup_source_init() is
> > used (which also does the _add this time). I think, though, that copying
> > the name is generally safer, because someone might use wakeup_source_init()
> > with the name string allocated on the stack or otherwise temporary, which would
> > be a bug with the new version.
> >
>
> I prefer this version. I have not seen a bug where someone passed a
> temporary as the wakelock name, I assume since this will show up
> immediately in the stats file.

OK

Thanks,
Rafael

2012-02-17 22:58:17

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH] PM / Sleep: Add more wakeup source initialization routines

From: Rafael J. Wysocki <[email protected]>

The existing wakeup source initialization routines are not
particularly useful for wakeup sources that aren't created by
wakeup_source_create(), because their users have to open code
filling the objects with zeros and setting their names. For this
reason, introduce routines that can be used for initializing, for
example, static wakeup source objects.

Requested-by: Arve Hj?nnev?g <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---

This patch is on top of the linux-next branch of the linux-pm tree.

Thanks,
Rafael

---
drivers/base/power/wakeup.c | 41 ++++++++++++++++++++++++++++++++++-------
include/linux/pm_wakeup.h | 20 ++++++++++++++++++++
2 files changed, 54 insertions(+), 7 deletions(-)

Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -53,6 +53,23 @@ static void pm_wakeup_timer_fn(unsigned
static LIST_HEAD(wakeup_sources);

/**
+ * wakeup_source_prepare - Prepare a new wakeup source for initialization.
+ * @ws: Wakeup source to prepare.
+ * @name: Pointer to the name of the new wakeup source.
+ *
+ * Callers must ensure that the @name string won't be freed when @ws is still in
+ * use.
+ */
+void wakeup_source_prepare(struct wakeup_source *ws, const char *name)
+{
+ if (ws) {
+ memset(ws, 0, sizeof(*ws));
+ ws->name = name;
+ }
+}
+EXPORT_SYMBOL_GPL(wakeup_source_prepare);
+
+/**
* wakeup_source_create - Create a struct wakeup_source object.
* @name: Name of the new wakeup source.
*/
@@ -60,31 +77,41 @@ struct wakeup_source *wakeup_source_crea
{
struct wakeup_source *ws;

- ws = kzalloc(sizeof(*ws), GFP_KERNEL);
+ ws = kmalloc(sizeof(*ws), GFP_KERNEL);
if (!ws)
return NULL;

- if (name)
- ws->name = kstrdup(name, GFP_KERNEL);
-
+ wakeup_source_prepare(ws, name ? kstrdup(name, GFP_KERNEL) : NULL);
return ws;
}
EXPORT_SYMBOL_GPL(wakeup_source_create);

/**
- * wakeup_source_destroy - Destroy a struct wakeup_source object.
- * @ws: Wakeup source to destroy.
+ * wakeup_source_drop - Prepare a struct wakeup_source object for destruction.
+ * @ws: Wakeup source to prepare for destruction.
*
* Callers must ensure that __pm_stay_awake() or __pm_wakeup_event() will never
* be run in parallel with this function for the same wakeup source object.
*/
-void wakeup_source_destroy(struct wakeup_source *ws)
+void wakeup_source_drop(struct wakeup_source *ws)
{
if (!ws)
return;

del_timer_sync(&ws->timer);
__pm_relax(ws);
+}
+EXPORT_SYMBOL_GPL(wakeup_source_drop);
+
+/**
+ * wakeup_source_destroy - Destroy a struct wakeup_source object.
+ * @ws: Wakeup source to destroy.
+ *
+ * Use only for wakeup source objects created with wakeup_source_create().
+ */
+void wakeup_source_destroy(struct wakeup_source *ws)
+{
+ wakeup_source_drop(ws);
kfree(ws->name);
kfree(ws);
}
Index: linux/include/linux/pm_wakeup.h
===================================================================
--- linux.orig/include/linux/pm_wakeup.h
+++ linux/include/linux/pm_wakeup.h
@@ -73,7 +73,9 @@ static inline bool device_may_wakeup(str
}

/* drivers/base/power/wakeup.c */
+extern void wakeup_source_prepare(struct wakeup_source *ws, const char *name);
extern struct wakeup_source *wakeup_source_create(const char *name);
+extern void wakeup_source_drop(struct wakeup_source *ws);
extern void wakeup_source_destroy(struct wakeup_source *ws);
extern void wakeup_source_add(struct wakeup_source *ws);
extern void wakeup_source_remove(struct wakeup_source *ws);
@@ -103,11 +105,16 @@ static inline bool device_can_wakeup(str
return dev->power.can_wakeup;
}

+static inline void wakeup_source_prepare(struct wakeup_source *ws,
+ const char *name) {}
+
static inline struct wakeup_source *wakeup_source_create(const char *name)
{
return NULL;
}

+static inline void wakeup_source_drop(struct wakeup_source *ws) {}
+
static inline void wakeup_source_destroy(struct wakeup_source *ws) {}

static inline void wakeup_source_add(struct wakeup_source *ws) {}
@@ -165,4 +172,17 @@ static inline void pm_wakeup_event(struc

#endif /* !CONFIG_PM_SLEEP */

+static inline void wakeup_source_init(struct wakeup_source *ws,
+ const char *name)
+{
+ wakeup_source_prepare(ws, name);
+ wakeup_source_add(ws);
+}
+
+static inline void wakeup_source_trash(struct wakeup_source *ws)
+{
+ wakeup_source_remove(ws);
+ wakeup_source_drop(ws);
+}
+
#endif /* _LINUX_PM_WAKEUP_H */

2012-02-18 23:46:48

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Update][PATCH] PM / Sleep: Add more wakeup source initialization routines

From: Rafael J. Wysocki <[email protected]>
Subject: PM / Sleep: Add more wakeup source initialization routines

The existing wakeup source initialization routines are not
particularly useful for wakeup sources that aren't created by
wakeup_source_create(), because their users have to open code
filling the objects with zeros and setting their names. For this
reason, introduce routines that can be used for initializing, for
example, static wakeup source objects.

Requested-by: Arve Hj?nnev?g <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---

The name member of struct wakeup_source has to be of type (const char *)
due to the new dependencies between the arguments of the new initializers.
That also reflects the fact that that string is not supposed to be modified.

Thanks,
Rafael

---
drivers/base/power/wakeup.c | 41 ++++++++++++++++++++++++++++++++++-------
include/linux/pm_wakeup.h | 22 +++++++++++++++++++++-
2 files changed, 55 insertions(+), 8 deletions(-)

Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -53,6 +53,23 @@ static void pm_wakeup_timer_fn(unsigned
static LIST_HEAD(wakeup_sources);

/**
+ * wakeup_source_prepare - Prepare a new wakeup source for initialization.
+ * @ws: Wakeup source to prepare.
+ * @name: Pointer to the name of the new wakeup source.
+ *
+ * Callers must ensure that the @name string won't be freed when @ws is still in
+ * use.
+ */
+void wakeup_source_prepare(struct wakeup_source *ws, const char *name)
+{
+ if (ws) {
+ memset(ws, 0, sizeof(*ws));
+ ws->name = name;
+ }
+}
+EXPORT_SYMBOL_GPL(wakeup_source_prepare);
+
+/**
* wakeup_source_create - Create a struct wakeup_source object.
* @name: Name of the new wakeup source.
*/
@@ -60,31 +77,41 @@ struct wakeup_source *wakeup_source_crea
{
struct wakeup_source *ws;

- ws = kzalloc(sizeof(*ws), GFP_KERNEL);
+ ws = kmalloc(sizeof(*ws), GFP_KERNEL);
if (!ws)
return NULL;

- if (name)
- ws->name = kstrdup(name, GFP_KERNEL);
-
+ wakeup_source_prepare(ws, name ? kstrdup(name, GFP_KERNEL) : NULL);
return ws;
}
EXPORT_SYMBOL_GPL(wakeup_source_create);

/**
- * wakeup_source_destroy - Destroy a struct wakeup_source object.
- * @ws: Wakeup source to destroy.
+ * wakeup_source_drop - Prepare a struct wakeup_source object for destruction.
+ * @ws: Wakeup source to prepare for destruction.
*
* Callers must ensure that __pm_stay_awake() or __pm_wakeup_event() will never
* be run in parallel with this function for the same wakeup source object.
*/
-void wakeup_source_destroy(struct wakeup_source *ws)
+void wakeup_source_drop(struct wakeup_source *ws)
{
if (!ws)
return;

del_timer_sync(&ws->timer);
__pm_relax(ws);
+}
+EXPORT_SYMBOL_GPL(wakeup_source_drop);
+
+/**
+ * wakeup_source_destroy - Destroy a struct wakeup_source object.
+ * @ws: Wakeup source to destroy.
+ *
+ * Use only for wakeup source objects created with wakeup_source_create().
+ */
+void wakeup_source_destroy(struct wakeup_source *ws)
+{
+ wakeup_source_drop(ws);
kfree(ws->name);
kfree(ws);
}
Index: linux/include/linux/pm_wakeup.h
===================================================================
--- linux.orig/include/linux/pm_wakeup.h
+++ linux/include/linux/pm_wakeup.h
@@ -41,7 +41,7 @@
* @active: Status of the wakeup source.
*/
struct wakeup_source {
- char *name;
+ const char *name;
struct list_head entry;
spinlock_t lock;
struct timer_list timer;
@@ -73,7 +73,9 @@ static inline bool device_may_wakeup(str
}

/* drivers/base/power/wakeup.c */
+extern void wakeup_source_prepare(struct wakeup_source *ws, const char *name);
extern struct wakeup_source *wakeup_source_create(const char *name);
+extern void wakeup_source_drop(struct wakeup_source *ws);
extern void wakeup_source_destroy(struct wakeup_source *ws);
extern void wakeup_source_add(struct wakeup_source *ws);
extern void wakeup_source_remove(struct wakeup_source *ws);
@@ -103,11 +105,16 @@ static inline bool device_can_wakeup(str
return dev->power.can_wakeup;
}

+static inline void wakeup_source_prepare(struct wakeup_source *ws,
+ const char *name) {}
+
static inline struct wakeup_source *wakeup_source_create(const char *name)
{
return NULL;
}

+static inline void wakeup_source_drop(struct wakeup_source *ws) {}
+
static inline void wakeup_source_destroy(struct wakeup_source *ws) {}

static inline void wakeup_source_add(struct wakeup_source *ws) {}
@@ -165,4 +172,17 @@ static inline void pm_wakeup_event(struc

#endif /* !CONFIG_PM_SLEEP */

+static inline void wakeup_source_init(struct wakeup_source *ws,
+ const char *name)
+{
+ wakeup_source_prepare(ws, name);
+ wakeup_source_add(ws);
+}
+
+static inline void wakeup_source_trash(struct wakeup_source *ws)
+{
+ wakeup_source_remove(ws);
+ wakeup_source_drop(ws);
+}
+
#endif /* _LINUX_PM_WAKEUP_H */

2012-02-20 23:00:15

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Update 2x][PATCH] PM / Sleep: Add more wakeup source initialization routines

From: Rafael J. Wysocki <[email protected]>
Subject: PM / Sleep: Add more wakeup source initialization routines

The existing wakeup source initialization routines are not
particularly useful for wakeup sources that aren't created by
wakeup_source_create(), because their users have to open code
filling the objects with zeros and setting their names. For this
reason, introduce routines that can be used for initializing, for
example, static wakeup source objects.

Requested-by: Arve Hj?nnev?g <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---

Make sure that wakeup_source_unregister() won't crash or trigger the
WARN_ON() in wakeup_source_remove() if a NULL pointer is passed to it.

Thanks,
Rafael

---
drivers/base/power/wakeup.c | 50 ++++++++++++++++++++++++++++++++++++--------
include/linux/pm_wakeup.h | 22 ++++++++++++++++++-
2 files changed, 62 insertions(+), 10 deletions(-)

Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -53,6 +53,23 @@ static void pm_wakeup_timer_fn(unsigned
static LIST_HEAD(wakeup_sources);

/**
+ * wakeup_source_prepare - Prepare a new wakeup source for initialization.
+ * @ws: Wakeup source to prepare.
+ * @name: Pointer to the name of the new wakeup source.
+ *
+ * Callers must ensure that the @name string won't be freed when @ws is still in
+ * use.
+ */
+void wakeup_source_prepare(struct wakeup_source *ws, const char *name)
+{
+ if (ws) {
+ memset(ws, 0, sizeof(*ws));
+ ws->name = name;
+ }
+}
+EXPORT_SYMBOL_GPL(wakeup_source_prepare);
+
+/**
* wakeup_source_create - Create a struct wakeup_source object.
* @name: Name of the new wakeup source.
*/
@@ -60,31 +77,44 @@ struct wakeup_source *wakeup_source_crea
{
struct wakeup_source *ws;

- ws = kzalloc(sizeof(*ws), GFP_KERNEL);
+ ws = kmalloc(sizeof(*ws), GFP_KERNEL);
if (!ws)
return NULL;

- if (name)
- ws->name = kstrdup(name, GFP_KERNEL);
-
+ wakeup_source_prepare(ws, name ? kstrdup(name, GFP_KERNEL) : NULL);
return ws;
}
EXPORT_SYMBOL_GPL(wakeup_source_create);

/**
- * wakeup_source_destroy - Destroy a struct wakeup_source object.
- * @ws: Wakeup source to destroy.
+ * wakeup_source_drop - Prepare a struct wakeup_source object for destruction.
+ * @ws: Wakeup source to prepare for destruction.
*
* Callers must ensure that __pm_stay_awake() or __pm_wakeup_event() will never
* be run in parallel with this function for the same wakeup source object.
*/
-void wakeup_source_destroy(struct wakeup_source *ws)
+void wakeup_source_drop(struct wakeup_source *ws)
{
if (!ws)
return;

del_timer_sync(&ws->timer);
__pm_relax(ws);
+}
+EXPORT_SYMBOL_GPL(wakeup_source_drop);
+
+/**
+ * wakeup_source_destroy - Destroy a struct wakeup_source object.
+ * @ws: Wakeup source to destroy.
+ *
+ * Use only for wakeup source objects created with wakeup_source_create().
+ */
+void wakeup_source_destroy(struct wakeup_source *ws)
+{
+ if (!ws)
+ return;
+
+ wakeup_source_drop(ws);
kfree(ws->name);
kfree(ws);
}
@@ -147,8 +177,10 @@ EXPORT_SYMBOL_GPL(wakeup_source_register
*/
void wakeup_source_unregister(struct wakeup_source *ws)
{
- wakeup_source_remove(ws);
- wakeup_source_destroy(ws);
+ if (ws) {
+ wakeup_source_remove(ws);
+ wakeup_source_destroy(ws);
+ }
}
EXPORT_SYMBOL_GPL(wakeup_source_unregister);

Index: linux/include/linux/pm_wakeup.h
===================================================================
--- linux.orig/include/linux/pm_wakeup.h
+++ linux/include/linux/pm_wakeup.h
@@ -41,7 +41,7 @@
* @active: Status of the wakeup source.
*/
struct wakeup_source {
- char *name;
+ const char *name;
struct list_head entry;
spinlock_t lock;
struct timer_list timer;
@@ -73,7 +73,9 @@ static inline bool device_may_wakeup(str
}

/* drivers/base/power/wakeup.c */
+extern void wakeup_source_prepare(struct wakeup_source *ws, const char *name);
extern struct wakeup_source *wakeup_source_create(const char *name);
+extern void wakeup_source_drop(struct wakeup_source *ws);
extern void wakeup_source_destroy(struct wakeup_source *ws);
extern void wakeup_source_add(struct wakeup_source *ws);
extern void wakeup_source_remove(struct wakeup_source *ws);
@@ -103,11 +105,16 @@ static inline bool device_can_wakeup(str
return dev->power.can_wakeup;
}

+static inline void wakeup_source_prepare(struct wakeup_source *ws,
+ const char *name) {}
+
static inline struct wakeup_source *wakeup_source_create(const char *name)
{
return NULL;
}

+static inline void wakeup_source_drop(struct wakeup_source *ws) {}
+
static inline void wakeup_source_destroy(struct wakeup_source *ws) {}

static inline void wakeup_source_add(struct wakeup_source *ws) {}
@@ -165,4 +172,17 @@ static inline void pm_wakeup_event(struc

#endif /* !CONFIG_PM_SLEEP */

+static inline void wakeup_source_init(struct wakeup_source *ws,
+ const char *name)
+{
+ wakeup_source_prepare(ws, name);
+ wakeup_source_add(ws);
+}
+
+static inline void wakeup_source_trash(struct wakeup_source *ws)
+{
+ wakeup_source_remove(ws);
+ wakeup_source_drop(ws);
+}
+
#endif /* _LINUX_PM_WAKEUP_H */

2012-02-21 23:35:23

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 1/7] PM / Sleep: Look for wakeup events in later stages of device suspend

From: Rafael J. Wysocki <[email protected]>

Currently, the device suspend code in drivers/base/power/main.c
only checks if there have been any wakeup events, and therefore the
ongoing system transition to a sleep state should be aborted, during
the first (i.e. "suspend") device suspend phase. However, wakeup
events may be reported later as well, so it's reasonable to look for
them in the in the subsequent (i.e. "late suspend" and "suspend
noirq") phases.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/power/main.c | 10 ++++++++++
1 file changed, 10 insertions(+)

Index: linux/drivers/base/power/main.c
===================================================================
--- linux.orig/drivers/base/power/main.c
+++ linux/drivers/base/power/main.c
@@ -889,6 +889,11 @@ static int dpm_suspend_noirq(pm_message_
if (!list_empty(&dev->power.entry))
list_move(&dev->power.entry, &dpm_noirq_list);
put_device(dev);
+
+ if (pm_wakeup_pending()) {
+ error = -EBUSY;
+ break;
+ }
}
mutex_unlock(&dpm_list_mtx);
if (error)
@@ -962,6 +967,11 @@ static int dpm_suspend_late(pm_message_t
if (!list_empty(&dev->power.entry))
list_move(&dev->power.entry, &dpm_late_early_list);
put_device(dev);
+
+ if (pm_wakeup_pending()) {
+ error = -EBUSY;
+ break;
+ }
}
mutex_unlock(&dpm_list_mtx);
if (error)

2012-02-21 23:35:26

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 3/7] PM / Sleep: Change wakeup source statistics to follow Android

From: Rafael J. Wysocki <[email protected]>

Wakeup statistics used by Android are slightly different from what we
have in wakeup sources at the moment and there aren't any known
users of those statistics other than Android, so modify them to make
it easier for Android to switch to wakeup sources.

This removes the struct wakeup_source's hit_cout field, which is very
rough and therefore not very useful, and adds two new fields,
wakeup_count and expire_count. The first one tracks how many times
the wakeup source is activated with events_check_enabled set (which
roughly corresponds to the situations when a system power transition
to a sleep state is in progress and would be aborted by this wakeup
source if it were the only active one at that time) and the second
one is the number of times the wakeup source has been activated with
a timeout that expired.

Additionally, the last_time field is now updated when the wakeup
source is deactivated too (previously it was only updated during
the wakeup source's activation), which seems to be what Android does
with the analogous counter for wakelocks.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
Documentation/ABI/testing/sysfs-devices-power | 24 ++++++---
drivers/base/power/sysfs.c | 30 ++++++++++--
drivers/base/power/wakeup.c | 64 +++++++++++---------------
include/linux/pm_wakeup.h | 11 ++--
4 files changed, 77 insertions(+), 52 deletions(-)

Index: linux/include/linux/pm_wakeup.h
===================================================================
--- linux.orig/include/linux/pm_wakeup.h
+++ linux/include/linux/pm_wakeup.h
@@ -33,12 +33,14 @@
*
* @total_time: Total time this wakeup source has been active.
* @max_time: Maximum time this wakeup source has been continuously active.
- * @last_time: Monotonic clock when the wakeup source's was activated last time.
+ * @last_time: Monotonic clock when the wakeup source's was touched last time.
* @event_count: Number of signaled wakeup events.
* @active_count: Number of times the wakeup sorce was activated.
* @relax_count: Number of times the wakeup sorce was deactivated.
- * @hit_count: Number of times the wakeup sorce might abort system suspend.
+ * @expire_count: Number of times the wakeup source's timeout has expired.
+ * @wakeup_count: Number of times the wakeup source might abort suspend.
* @active: Status of the wakeup source.
+ * @has_timeout: The wakeup source has been activated with a timeout.
*/
struct wakeup_source {
const char *name;
@@ -52,8 +54,9 @@ struct wakeup_source {
unsigned long event_count;
unsigned long active_count;
unsigned long relax_count;
- unsigned long hit_count;
- unsigned int active:1;
+ unsigned long expire_count;
+ unsigned long wakeup_count;
+ bool active:1;
};

#ifdef CONFIG_PM_SLEEP
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -21,7 +21,7 @@
* If set, the suspend/hibernate code will abort transitions to a sleep state
* if wakeup events are registered during or immediately before the transition.
*/
-bool events_check_enabled;
+bool events_check_enabled __read_mostly;

/*
* Combined counters of registered wakeup events and wakeup events in progress.
@@ -383,6 +383,21 @@ static void wakeup_source_activate(struc
}

/**
+ * wakeup_source_report_event - Report wakeup event using the given source.
+ * @ws: Wakeup source to report the event for.
+ */
+static void wakeup_source_report_event(struct wakeup_source *ws)
+{
+ ws->event_count++;
+ /* This is racy, but the counter is approximate anyway. */
+ if (events_check_enabled)
+ ws->wakeup_count++;
+
+ if (!ws->active)
+ wakeup_source_activate(ws);
+}
+
+/**
* __pm_stay_awake - Notify the PM core of a wakeup event.
* @ws: Wakeup source object associated with the source of the event.
*
@@ -397,10 +412,7 @@ void __pm_stay_awake(struct wakeup_sourc

spin_lock_irqsave(&ws->lock, flags);

- ws->event_count++;
- if (!ws->active)
- wakeup_source_activate(ws);
-
+ wakeup_source_report_event(ws);
del_timer(&ws->timer);
ws->timer_expires = 0;

@@ -469,6 +481,7 @@ static void wakeup_source_deactivate(str
if (ktime_to_ns(duration) > ktime_to_ns(ws->max_time))
ws->max_time = duration;

+ ws->last_time = now;
del_timer(&ws->timer);
ws->timer_expires = 0;

@@ -541,8 +554,10 @@ static void pm_wakeup_timer_fn(unsigned
spin_lock_irqsave(&ws->lock, flags);

if (ws->active && ws->timer_expires
- && time_after_eq(jiffies, ws->timer_expires))
+ && time_after_eq(jiffies, ws->timer_expires)) {
wakeup_source_deactivate(ws);
+ ws->expire_count++;
+ }

spin_unlock_irqrestore(&ws->lock, flags);
}
@@ -569,9 +584,7 @@ void __pm_wakeup_event(struct wakeup_sou

spin_lock_irqsave(&ws->lock, flags);

- ws->event_count++;
- if (!ws->active)
- wakeup_source_activate(ws);
+ wakeup_source_report_event(ws);

if (!msec) {
wakeup_source_deactivate(ws);
@@ -614,24 +627,6 @@ void pm_wakeup_event(struct device *dev,
EXPORT_SYMBOL_GPL(pm_wakeup_event);

/**
- * pm_wakeup_update_hit_counts - Update hit counts of all active wakeup sources.
- */
-static void pm_wakeup_update_hit_counts(void)
-{
- unsigned long flags;
- struct wakeup_source *ws;
-
- rcu_read_lock();
- list_for_each_entry_rcu(ws, &wakeup_sources, entry) {
- spin_lock_irqsave(&ws->lock, flags);
- if (ws->active)
- ws->hit_count++;
- spin_unlock_irqrestore(&ws->lock, flags);
- }
- rcu_read_unlock();
-}
-
-/**
* pm_wakeup_pending - Check if power transition in progress should be aborted.
*
* Compare the current number of registered wakeup events with its preserved
@@ -653,8 +648,6 @@ bool pm_wakeup_pending(void)
events_check_enabled = !ret;
}
spin_unlock_irqrestore(&events_lock, flags);
- if (ret)
- pm_wakeup_update_hit_counts();
return ret;
}

@@ -680,7 +673,6 @@ bool pm_get_wakeup_count(unsigned int *c
split_counters(&cnt, &inpr);
if (inpr == 0 || signal_pending(current))
break;
- pm_wakeup_update_hit_counts();

schedule();
}
@@ -713,8 +705,6 @@ bool pm_save_wakeup_count(unsigned int c
events_check_enabled = true;
}
spin_unlock_irq(&events_lock);
- if (!events_check_enabled)
- pm_wakeup_update_hit_counts();
return events_check_enabled;
}

@@ -749,9 +739,10 @@ static int print_wakeup_source_stats(str
active_time = ktime_set(0, 0);
}

- ret = seq_printf(m, "%-12s\t%lu\t\t%lu\t\t%lu\t\t"
+ ret = seq_printf(m, "%-12s\t%lu\t\t%lu\t\t%lu\t\t%lu\t\t"
"%lld\t\t%lld\t\t%lld\t\t%lld\n",
- ws->name, active_count, ws->event_count, ws->hit_count,
+ ws->name, active_count, ws->event_count,
+ ws->wakeup_count, ws->expire_count,
ktime_to_ms(active_time), ktime_to_ms(total_time),
ktime_to_ms(max_time), ktime_to_ms(ws->last_time));

@@ -768,8 +759,9 @@ static int wakeup_sources_stats_show(str
{
struct wakeup_source *ws;

- seq_puts(m, "name\t\tactive_count\tevent_count\thit_count\t"
- "active_since\ttotal_time\tmax_time\tlast_change\n");
+ seq_puts(m, "name\t\tactive_count\tevent_count\twakeup_count\t"
+ "expire_count\tactive_since\ttotal_time\tmax_time\t"
+ "last_change\n");

rcu_read_lock();
list_for_each_entry_rcu(ws, &wakeup_sources, entry)
Index: linux/drivers/base/power/sysfs.c
===================================================================
--- linux.orig/drivers/base/power/sysfs.c
+++ linux/drivers/base/power/sysfs.c
@@ -288,22 +288,41 @@ static ssize_t wakeup_active_count_show(

static DEVICE_ATTR(wakeup_active_count, 0444, wakeup_active_count_show, NULL);

-static ssize_t wakeup_hit_count_show(struct device *dev,
- struct device_attribute *attr, char *buf)
+static ssize_t wakeup_abort_count_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ unsigned long count = 0;
+ bool enabled = false;
+
+ spin_lock_irq(&dev->power.lock);
+ if (dev->power.wakeup) {
+ count = dev->power.wakeup->wakeup_count;
+ enabled = true;
+ }
+ spin_unlock_irq(&dev->power.lock);
+ return enabled ? sprintf(buf, "%lu\n", count) : sprintf(buf, "\n");
+}
+
+static DEVICE_ATTR(wakeup_abort_count, 0444, wakeup_abort_count_show, NULL);
+
+static ssize_t wakeup_expire_count_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
{
unsigned long count = 0;
bool enabled = false;

spin_lock_irq(&dev->power.lock);
if (dev->power.wakeup) {
- count = dev->power.wakeup->hit_count;
+ count = dev->power.wakeup->expire_count;
enabled = true;
}
spin_unlock_irq(&dev->power.lock);
return enabled ? sprintf(buf, "%lu\n", count) : sprintf(buf, "\n");
}

-static DEVICE_ATTR(wakeup_hit_count, 0444, wakeup_hit_count_show, NULL);
+static DEVICE_ATTR(wakeup_expire_count, 0444, wakeup_expire_count_show, NULL);

static ssize_t wakeup_active_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -460,7 +479,8 @@ static struct attribute *wakeup_attrs[]
&dev_attr_wakeup.attr,
&dev_attr_wakeup_count.attr,
&dev_attr_wakeup_active_count.attr,
- &dev_attr_wakeup_hit_count.attr,
+ &dev_attr_wakeup_abort_count.attr,
+ &dev_attr_wakeup_expire_count.attr,
&dev_attr_wakeup_active.attr,
&dev_attr_wakeup_total_time_ms.attr,
&dev_attr_wakeup_max_time_ms.attr,
Index: linux/Documentation/ABI/testing/sysfs-devices-power
===================================================================
--- linux.orig/Documentation/ABI/testing/sysfs-devices-power
+++ linux/Documentation/ABI/testing/sysfs-devices-power
@@ -96,16 +96,26 @@ Description:
is read-only. If the device is not enabled to wake up the
system from sleep states, this attribute is not present.

-What: /sys/devices/.../power/wakeup_hit_count
-Date: September 2010
+What: /sys/devices/.../power/wakeup_abort_count
+Date: February 2012
Contact: Rafael J. Wysocki <[email protected]>
Description:
- The /sys/devices/.../wakeup_hit_count attribute contains the
+ The /sys/devices/.../wakeup_abort_count attribute contains the
number of times the processing of a wakeup event associated with
- the device might prevent the system from entering a sleep state.
- This attribute is read-only. If the device is not enabled to
- wake up the system from sleep states, this attribute is not
- present.
+ the device might have aborted system transition into a sleep
+ state in progress. This attribute is read-only. If the device
+ is not enabled to wake up the system from sleep states, this
+ attribute is not present.
+
+What: /sys/devices/.../power/wakeup_expire_count
+Date: February 2012
+Contact: Rafael J. Wysocki <[email protected]>
+Description:
+ The /sys/devices/.../wakeup_expire_count attribute contains the
+ number of times a wakeup event associated with the device has
+ been reported with a timeout that expired. This attribute is
+ read-only. If the device is not enabled to wake up the system
+ from sleep states, this attribute is not present.

What: /sys/devices/.../power/wakeup_active
Date: September 2010

2012-02-21 23:35:40

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 6/7] PM / Sleep: Add "prevent autosleep time" statistics to wakeup sources

From: Rafael J. Wysocki <[email protected]>

Android uses one wakelock statistics that is only necessary for
opportunistic sleep. Namely, the prevent_suspend_time field
accumulates the total time the given wakelock has been locked
while "automatic suspend" was enabled. Add an analogous field,
prevent_sleep_time, to wakeup sources and make it behave in a similar
way.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
Documentation/ABI/testing/sysfs-devices-power | 11 ++++
drivers/base/power/sysfs.c | 24 ++++++++++
drivers/base/power/wakeup.c | 61 ++++++++++++++++++++++++--
include/linux/pm_wakeup.h | 4 +
include/linux/suspend.h | 1
kernel/power/autosleep.c | 2
6 files changed, 99 insertions(+), 4 deletions(-)

Index: linux/include/linux/pm_wakeup.h
===================================================================
--- linux.orig/include/linux/pm_wakeup.h
+++ linux/include/linux/pm_wakeup.h
@@ -34,6 +34,7 @@
* @total_time: Total time this wakeup source has been active.
* @max_time: Maximum time this wakeup source has been continuously active.
* @last_time: Monotonic clock when the wakeup source's was touched last time.
+ * @prevent_sleep_time: Total time this source has been preventing autosleep.
* @event_count: Number of signaled wakeup events.
* @active_count: Number of times the wakeup sorce was activated.
* @relax_count: Number of times the wakeup sorce was deactivated.
@@ -51,12 +52,15 @@ struct wakeup_source {
ktime_t total_time;
ktime_t max_time;
ktime_t last_time;
+ ktime_t start_prevent_time;
+ ktime_t prevent_sleep_time;
unsigned long event_count;
unsigned long active_count;
unsigned long relax_count;
unsigned long expire_count;
unsigned long wakeup_count;
bool active:1;
+ bool autosleep_enabled:1;
};

#ifdef CONFIG_PM_SLEEP
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -377,6 +377,8 @@ static void wakeup_source_activate(struc
ws->active = true;
ws->active_count++;
ws->last_time = ktime_get();
+ if (ws->autosleep_enabled)
+ ws->start_prevent_time = ws->last_time;

/* Increment the counter of events in progress. */
atomic_inc(&combined_event_count);
@@ -444,6 +446,17 @@ void pm_stay_awake(struct device *dev)
}
EXPORT_SYMBOL_GPL(pm_stay_awake);

+#ifdef CONFIG_PM_AUTOSLEEP
+static void update_prevent_sleep_time(struct wakeup_source *ws, ktime_t now)
+{
+ ktime_t delta = ktime_sub(now, ws->start_prevent_time);
+ ws->prevent_sleep_time = ktime_add(ws->prevent_sleep_time, delta);
+}
+#else
+static inline void update_prevent_sleep_time(struct wakeup_source *ws,
+ ktime_t now) {}
+#endif
+
/**
* wakup_source_deactivate - Mark given wakeup source as inactive.
* @ws: Wakeup source to handle.
@@ -485,6 +498,9 @@ static void wakeup_source_deactivate(str
del_timer(&ws->timer);
ws->timer_expires = 0;

+ if (ws->autosleep_enabled)
+ update_prevent_sleep_time(ws, now);
+
/*
* Increment the counter of registered wakeup events and decrement the
* couter of wakeup events in progress simultaneously.
@@ -714,6 +730,34 @@ bool pm_save_wakeup_count(unsigned int c
return events_check_enabled;
}

+#ifdef CONFIG_PM_AUTOSLEEP
+/**
+ * pm_wakep_autosleep_enabled - Modify autosleep_enabled for all wakeup sources.
+ * @enabled: Whether to set or to clear the autosleep_enabled flags.
+ */
+void pm_wakep_autosleep_enabled(bool set)
+{
+ struct wakeup_source *ws;
+ ktime_t now = ktime_get();
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(ws, &wakeup_sources, entry) {
+ spin_lock_irq(&ws->lock);
+ if (ws->autosleep_enabled != set) {
+ ws->autosleep_enabled = set;
+ if (ws->active) {
+ if (set)
+ ws->start_prevent_time = now;
+ else
+ update_prevent_sleep_time(ws, now);
+ }
+ }
+ spin_unlock_irq(&ws->lock);
+ }
+ rcu_read_unlock();
+}
+#endif /* CONFIG_PM_AUTOSLEEP */
+
static struct dentry *wakeup_sources_stats_dentry;

/**
@@ -729,28 +773,37 @@ static int print_wakeup_source_stats(str
ktime_t max_time;
unsigned long active_count;
ktime_t active_time;
+ ktime_t prevent_sleep_time;
int ret;

spin_lock_irqsave(&ws->lock, flags);

total_time = ws->total_time;
max_time = ws->max_time;
+ prevent_sleep_time = ws->prevent_sleep_time;
active_count = ws->active_count;
if (ws->active) {
- active_time = ktime_sub(ktime_get(), ws->last_time);
+ ktime_t now = ktime_get();
+
+ active_time = ktime_sub(now, ws->last_time);
total_time = ktime_add(total_time, active_time);
if (active_time.tv64 > max_time.tv64)
max_time = active_time;
+
+ if (ws->autosleep_enabled)
+ prevent_sleep_time = ktime_add(prevent_sleep_time,
+ ktime_sub(now, ws->start_prevent_time));
} else {
active_time = ktime_set(0, 0);
}

ret = seq_printf(m, "%-12s\t%lu\t\t%lu\t\t%lu\t\t%lu\t\t"
- "%lld\t\t%lld\t\t%lld\t\t%lld\n",
+ "%lld\t\t%lld\t\t%lld\t\t%lld\t\t%lld\n",
ws->name, active_count, ws->event_count,
ws->wakeup_count, ws->expire_count,
ktime_to_ms(active_time), ktime_to_ms(total_time),
- ktime_to_ms(max_time), ktime_to_ms(ws->last_time));
+ ktime_to_ms(max_time), ktime_to_ms(ws->last_time),
+ ktime_to_ms(prevent_sleep_time));

spin_unlock_irqrestore(&ws->lock, flags);

@@ -767,7 +820,7 @@ static int wakeup_sources_stats_show(str

seq_puts(m, "name\t\tactive_count\tevent_count\twakeup_count\t"
"expire_count\tactive_since\ttotal_time\tmax_time\t"
- "last_change\n");
+ "last_change\tprevent_suspend_time\n");

rcu_read_lock();
list_for_each_entry_rcu(ws, &wakeup_sources, entry)
Index: linux/include/linux/suspend.h
===================================================================
--- linux.orig/include/linux/suspend.h
+++ linux/include/linux/suspend.h
@@ -358,6 +358,7 @@ extern bool events_check_enabled;
extern bool pm_wakeup_pending(void);
extern bool pm_get_wakeup_count(unsigned int *count, bool block);
extern bool pm_save_wakeup_count(unsigned int count);
+extern void pm_wakep_autosleep_enabled(bool set);

static inline void lock_system_sleep(void)
{
Index: linux/kernel/power/autosleep.c
===================================================================
--- linux.orig/kernel/power/autosleep.c
+++ linux/kernel/power/autosleep.c
@@ -73,8 +73,10 @@ int pm_autosleep_set_state(suspend_state
mutex_lock(&autosleep_lock);
if (state == PM_SUSPEND_ON && autosleep_state != PM_SUSPEND_ON) {
autosleep_state = PM_SUSPEND_ON;
+ pm_wakep_autosleep_enabled(false);
} else if (state > PM_SUSPEND_ON) {
autosleep_state = state;
+ pm_wakep_autosleep_enabled(true);
queue_up_suspend_work();
}
mutex_unlock(&autosleep_lock);
Index: linux/drivers/base/power/sysfs.c
===================================================================
--- linux.orig/drivers/base/power/sysfs.c
+++ linux/drivers/base/power/sysfs.c
@@ -391,6 +391,27 @@ static ssize_t wakeup_last_time_show(str
}

static DEVICE_ATTR(wakeup_last_time_ms, 0444, wakeup_last_time_show, NULL);
+
+#ifdef CONFIG_PM_AUTOSLEEP
+static ssize_t wakeup_prevent_sleep_time_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ s64 msec = 0;
+ bool enabled = false;
+
+ spin_lock_irq(&dev->power.lock);
+ if (dev->power.wakeup) {
+ msec = ktime_to_ms(dev->power.wakeup->prevent_sleep_time);
+ enabled = true;
+ }
+ spin_unlock_irq(&dev->power.lock);
+ return enabled ? sprintf(buf, "%lld\n", msec) : sprintf(buf, "\n");
+}
+
+static DEVICE_ATTR(wakeup_prevent_sleep_time_ms, 0444,
+ wakeup_prevent_sleep_time_show, NULL);
+#endif /* CONFIG_PM_AUTOSLEEP */
#endif /* CONFIG_PM_SLEEP */

#ifdef CONFIG_PM_ADVANCED_DEBUG
@@ -485,6 +506,9 @@ static struct attribute *wakeup_attrs[]
&dev_attr_wakeup_total_time_ms.attr,
&dev_attr_wakeup_max_time_ms.attr,
&dev_attr_wakeup_last_time_ms.attr,
+#ifdef CONFIG_PM_AUTOSLEEP
+ &dev_attr_wakeup_prevent_sleep_time_ms.attr,
+#endif
#endif
NULL,
};
Index: linux/Documentation/ABI/testing/sysfs-devices-power
===================================================================
--- linux.orig/Documentation/ABI/testing/sysfs-devices-power
+++ linux/Documentation/ABI/testing/sysfs-devices-power
@@ -158,6 +158,17 @@ Description:
not enabled to wake up the system from sleep states, this
attribute is not present.

+What: /sys/devices/.../power/wakeup_prevent_sleep_time_ms
+Date: February 2012
+Contact: Rafael J. Wysocki <[email protected]>
+Description:
+ The /sys/devices/.../wakeup_prevent_sleep_time_ms attribute
+ contains the total time the device has been preventing
+ opportunistic transitions to sleep states from occuring.
+ This attribute is read-only. If the device is not enabled to
+ wake up the system from sleep states, this attribute is not
+ present.
+
What: /sys/devices/.../power/autosuspend_delay_ms
Date: September 2010
Contact: Alan Stern <[email protected]>

2012-02-21 23:35:51

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 5/7] PM / Sleep: Implement opportunistic sleep

From: Rafael J. Wysocki <[email protected]>

Introduce a mechanism by which the kernel can trigger global
transitions to a sleep state chosen by user space if there are no
active wakeup sources.

It consists of a new sysfs attribute, /sys/power/autosleep, that
can be written one of the strings returned by reads from
/sys/power/state, an ordered workqueue and a work item carrying out
the "suspend" operations. If a string representing the system's
sleep state is written to /sys/power/autosleep, the work item
triggering transitions to that state is queued up and it requeues
itself after every execution until user space writes "off" to
/sys/power/autosleep.

That work item enables the detection of wakeup events using the
functions already defined in drivers/base/power/wakeup.c (with one
small modification) and calls either pm_suspend(), or hibernate() to
put the system into a sleep state. If a wakeup event is reported
while the transition is in progress, it will abort the transition and
the "system suspend" work item will be queued up again.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
Documentation/ABI/testing/sysfs-power | 17 +++++
drivers/base/power/wakeup.c | 38 ++++++-----
include/linux/suspend.h | 13 +++-
kernel/power/Kconfig | 8 ++
kernel/power/Makefile | 1
kernel/power/autosleep.c | 98 ++++++++++++++++++++++++++++++
kernel/power/main.c | 108 ++++++++++++++++++++++++++++------
kernel/power/power.h | 18 +++++
8 files changed, 266 insertions(+), 35 deletions(-)

Index: linux/kernel/power/Makefile
===================================================================
--- linux.orig/kernel/power/Makefile
+++ linux/kernel/power/Makefile
@@ -9,5 +9,6 @@ obj-$(CONFIG_SUSPEND) += suspend.o
obj-$(CONFIG_PM_TEST_SUSPEND) += suspend_test.o
obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o \
block_io.o
+obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o

obj-$(CONFIG_MAGIC_SYSRQ) += poweroff.o
Index: linux/kernel/power/Kconfig
===================================================================
--- linux.orig/kernel/power/Kconfig
+++ linux/kernel/power/Kconfig
@@ -103,6 +103,14 @@ config PM_SLEEP_SMP
select HOTPLUG
select HOTPLUG_CPU

+config PM_AUTOSLEEP
+ bool "Opportunistic sleep"
+ depends on PM_SLEEP
+ default n
+ ---help---
+ Allow the kernel to trigger a system transition into a global sleep
+ state automatically whenever there are no active wakeup sources.
+
config PM_RUNTIME
bool "Run-time PM core functionality"
depends on !IA64_HP_SIM
Index: linux/kernel/power/power.h
===================================================================
--- linux.orig/kernel/power/power.h
+++ linux/kernel/power/power.h
@@ -264,3 +264,21 @@ static inline void suspend_thaw_processe
{
}
#endif
+
+#ifdef CONFIG_PM_AUTOSLEEP
+
+/* kernel/power/autosleep.c */
+extern int pm_autosleep_init(void);
+extern void pm_autosleep_lock(void);
+extern void pm_autosleep_unlock(void);
+extern suspend_state_t pm_autosleep_state(void);
+extern int pm_autosleep_set_state(suspend_state_t state);
+
+#else /* !CONFIG_PM_AUTOSLEEP */
+
+static inline int pm_autosleep_init(void) { return 0; }
+static inline void pm_autosleep_lock(void) {}
+static inline void pm_autosleep_unlock(void) {}
+static inline suspend_state_t pm_autosleep_state(void) { return PM_SUSPEND_ON; }
+
+#endif /* !CONFIG_PM_AUTOSLEEP */
Index: linux/include/linux/suspend.h
===================================================================
--- linux.orig/include/linux/suspend.h
+++ linux/include/linux/suspend.h
@@ -356,7 +356,7 @@ extern int unregister_pm_notifier(struct
extern bool events_check_enabled;

extern bool pm_wakeup_pending(void);
-extern bool pm_get_wakeup_count(unsigned int *count);
+extern bool pm_get_wakeup_count(unsigned int *count, bool block);
extern bool pm_save_wakeup_count(unsigned int count);

static inline void lock_system_sleep(void)
@@ -407,6 +407,17 @@ static inline void unlock_system_sleep(v

#endif /* !CONFIG_PM_SLEEP */

+#ifdef CONFIG_PM_AUTOSLEEP
+
+/* kernel/power/autosleep.c */
+void queue_up_suspend_work(void);
+
+#else /* !CONFIG_PM_AUTOSLEEP */
+
+static inline void queue_up_suspend_work(void) {}
+
+#endif /* !CONFIG_PM_AUTOSLEEP */
+
#ifdef CONFIG_ARCH_SAVE_PAGE_KEYS
/*
* The ARCH_SAVE_PAGE_KEYS functions can be used by an architecture
Index: linux/kernel/power/autosleep.c
===================================================================
--- /dev/null
+++ linux/kernel/power/autosleep.c
@@ -0,0 +1,98 @@
+/*
+ * kernel/power/autosleep.c
+ *
+ * Opportunistic sleep support.
+ *
+ * Copyright (C) 2012 Rafael J. Wysocki <[email protected]>
+ */
+
+#include <linux/device.h>
+#include <linux/mutex.h>
+#include <linux/pm_wakeup.h>
+
+#include "power.h"
+
+static suspend_state_t autosleep_state;
+static struct workqueue_struct *autosleep_wq;
+static DEFINE_MUTEX(autosleep_lock);
+
+static void try_to_suspend(struct work_struct *work)
+{
+ unsigned int initial_count, final_count;
+
+ if (!pm_get_wakeup_count(&initial_count, true))
+ goto out;
+
+ mutex_lock(&autosleep_lock);
+
+ if (!pm_save_wakeup_count(initial_count)) {
+ mutex_unlock(&autosleep_lock);
+ goto out;
+ }
+
+ if (autosleep_state == PM_SUSPEND_ON) {
+ mutex_unlock(&autosleep_lock);
+ return;
+ }
+ if (autosleep_state >= PM_SUSPEND_MAX)
+ hibernate();
+ else
+ pm_suspend(autosleep_state);
+
+ mutex_unlock(&autosleep_lock);
+
+ if (!pm_get_wakeup_count(&final_count, false))
+ goto out;
+
+ if (final_count == initial_count)
+ schedule_timeout(HZ / 2);
+
+ out:
+ queue_up_suspend_work();
+}
+
+static DECLARE_WORK(suspend_work, try_to_suspend);
+
+void queue_up_suspend_work(void)
+{
+ if (!work_pending(&suspend_work) && autosleep_state > PM_SUSPEND_ON)
+ queue_work(autosleep_wq, &suspend_work);
+}
+
+suspend_state_t pm_autosleep_state(void)
+{
+ return autosleep_state;
+}
+
+int pm_autosleep_set_state(suspend_state_t state)
+{
+#ifndef CONFIG_HIBERNATION
+ if (state >= PM_SUSPEND_MAX)
+ return -EINVAL;
+#endif
+ mutex_lock(&autosleep_lock);
+ if (state == PM_SUSPEND_ON && autosleep_state != PM_SUSPEND_ON) {
+ autosleep_state = PM_SUSPEND_ON;
+ } else if (state > PM_SUSPEND_ON) {
+ autosleep_state = state;
+ queue_up_suspend_work();
+ }
+ mutex_unlock(&autosleep_lock);
+ return 0;
+}
+
+void pm_autosleep_lock(void)
+{
+ mutex_lock(&autosleep_lock);
+}
+
+void pm_autosleep_unlock(void)
+{
+ mutex_unlock(&autosleep_lock);
+}
+
+int __init pm_autosleep_init(void)
+{
+ autosleep_wq = alloc_ordered_workqueue("autosleep", 0);
+ return autosleep_wq ? 0 : -ENOMEM;
+}
Index: linux/kernel/power/main.c
===================================================================
--- linux.orig/kernel/power/main.c
+++ linux/kernel/power/main.c
@@ -269,8 +269,7 @@ static ssize_t state_show(struct kobject
return (s - buf);
}

-static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
- const char *buf, size_t n)
+static suspend_state_t decode_state(const char *buf, size_t n)
{
#ifdef CONFIG_SUSPEND
suspend_state_t state = PM_SUSPEND_STANDBY;
@@ -278,27 +277,43 @@ static ssize_t state_store(struct kobjec
#endif
char *p;
int len;
- int error = -EINVAL;

p = memchr(buf, '\n', n);
len = p ? p - buf : n;

- /* First, check if we are requested to hibernate */
- if (len == 4 && !strncmp(buf, "disk", len)) {
- error = hibernate();
- goto Exit;
- }
+ /* Check hibernation first. */
+ if (len == 4 && !strncmp(buf, "disk", len))
+ return PM_SUSPEND_MAX;

#ifdef CONFIG_SUSPEND
- for (s = &pm_states[state]; state < PM_SUSPEND_MAX; s++, state++) {
- if (*s && len == strlen(*s) && !strncmp(buf, *s, len)) {
- error = pm_suspend(state);
- break;
- }
- }
+ for (s = &pm_states[state]; state < PM_SUSPEND_MAX; s++, state++)
+ if (*s && len == strlen(*s) && !strncmp(buf, *s, len))
+ return state;
#endif

- Exit:
+ return PM_SUSPEND_ON;
+}
+
+static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ suspend_state_t state;
+ int error = -EINVAL;
+
+ pm_autosleep_lock();
+ if (pm_autosleep_state() > PM_SUSPEND_ON) {
+ error = -EBUSY;
+ goto out;
+ }
+
+ state = decode_state(buf, n);
+ if (state < PM_SUSPEND_MAX)
+ error = pm_suspend(state);
+ else if (state > PM_SUSPEND_ON)
+ error = hibernate();
+
+ out:
+ pm_autosleep_unlock();
return error ? error : n;
}

@@ -339,7 +354,8 @@ static ssize_t wakeup_count_show(struct
{
unsigned int val;

- return pm_get_wakeup_count(&val) ? sprintf(buf, "%u\n", val) : -EINTR;
+ return pm_get_wakeup_count(&val, true) ?
+ sprintf(buf, "%u\n", val) : -EINTR;
}

static ssize_t wakeup_count_store(struct kobject *kobj,
@@ -347,15 +363,65 @@ static ssize_t wakeup_count_store(struct
const char *buf, size_t n)
{
unsigned int val;
+ int error = -EINVAL;
+
+ pm_autosleep_lock();
+ if (pm_autosleep_state() > PM_SUSPEND_ON) {
+ error = -EBUSY;
+ goto out;
+ }

if (sscanf(buf, "%u", &val) == 1) {
if (pm_save_wakeup_count(val))
return n;
}
- return -EINVAL;
+
+ out:
+ pm_autosleep_unlock();
+ return error;
}

power_attr(wakeup_count);
+
+#ifdef CONFIG_PM_AUTOSLEEP
+static ssize_t autosleep_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ suspend_state_t state = pm_autosleep_state();
+
+ if (state == PM_SUSPEND_ON)
+ return sprintf(buf, "off\n");
+
+#ifdef CONFIG_SUSPEND
+ if (state < PM_SUSPEND_MAX)
+ return sprintf(buf, "%s\n", valid_state(state) ?
+ pm_states[state] : "error");
+#endif
+#ifdef CONFIG_HIBERNATION
+ return sprintf(buf, "disk\n");
+#else
+ return sprintf(buf, "error");
+#endif
+}
+
+static ssize_t autosleep_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ suspend_state_t state = decode_state(buf, n);
+ int error;
+
+ if (state == PM_SUSPEND_ON && strncmp(buf, "off", 3)
+ && strncmp(buf, "off\n", 4))
+ return -EINVAL;
+
+ error = pm_autosleep_set_state(state);
+ return error ? error : n;
+}
+
+power_attr(autosleep);
+#endif /* CONFIG_PM_AUTOSLEEP */
#endif /* CONFIG_PM_SLEEP */

#ifdef CONFIG_PM_TRACE
@@ -409,6 +475,9 @@ static struct attribute * g[] = {
#ifdef CONFIG_PM_SLEEP
&pm_async_attr.attr,
&wakeup_count_attr.attr,
+#ifdef CONFIG_PM_AUTOSLEEP
+ &autosleep_attr.attr,
+#endif
#ifdef CONFIG_PM_DEBUG
&pm_test_attr.attr,
#endif
@@ -444,7 +513,10 @@ static int __init pm_init(void)
power_kobj = kobject_create_and_add("power", NULL);
if (!power_kobj)
return -ENOMEM;
- return sysfs_create_group(power_kobj, &attr_group);
+ error = sysfs_create_group(power_kobj, &attr_group);
+ if (error)
+ return error;
+ return pm_autosleep_init();
}

core_initcall(pm_init);
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -492,8 +492,10 @@ static void wakeup_source_deactivate(str
atomic_add(MAX_IN_PROGRESS, &combined_event_count);

split_counters(&cnt, &inpr);
- if (!inpr && waitqueue_active(&wakeup_count_wait_queue))
+ if (!inpr && waitqueue_active(&wakeup_count_wait_queue)) {
wake_up(&wakeup_count_wait_queue);
+ queue_up_suspend_work();
+ }
}

/**
@@ -654,29 +656,33 @@ bool pm_wakeup_pending(void)
/**
* pm_get_wakeup_count - Read the number of registered wakeup events.
* @count: Address to store the value at.
+ * @block: Whether or not to block.
*
- * Store the number of registered wakeup events at the address in @count. Block
- * if the current number of wakeup events being processed is nonzero.
+ * Store the number of registered wakeup events at the address in @count. If
+ * @block is set, block until the current number of wakeup events being
+ * processed is zero.
*
- * Return 'false' if the wait for the number of wakeup events being processed to
- * drop down to zero has been interrupted by a signal (and the current number
- * of wakeup events being processed is still nonzero). Otherwise return 'true'.
+ * Return 'false' if the current number of wakeup events being processed is
+ * nonzero. Otherwise return 'true'.
*/
-bool pm_get_wakeup_count(unsigned int *count)
+bool pm_get_wakeup_count(unsigned int *count, bool block)
{
unsigned int cnt, inpr;
- DEFINE_WAIT(wait);

- for (;;) {
- prepare_to_wait(&wakeup_count_wait_queue, &wait,
- TASK_INTERRUPTIBLE);
- split_counters(&cnt, &inpr);
- if (inpr == 0 || signal_pending(current))
- break;
+ if (block) {
+ DEFINE_WAIT(wait);

- schedule();
+ for (;;) {
+ prepare_to_wait(&wakeup_count_wait_queue, &wait,
+ TASK_INTERRUPTIBLE);
+ split_counters(&cnt, &inpr);
+ if (inpr == 0 || signal_pending(current))
+ break;
+
+ schedule();
+ }
+ finish_wait(&wakeup_count_wait_queue, &wait);
}
- finish_wait(&wakeup_count_wait_queue, &wait);

split_counters(&cnt, &inpr);
*count = cnt;
Index: linux/Documentation/ABI/testing/sysfs-power
===================================================================
--- linux.orig/Documentation/ABI/testing/sysfs-power
+++ linux/Documentation/ABI/testing/sysfs-power
@@ -172,3 +172,20 @@ Description:

Reading from this file will display the current value, which is
set to 1 MB by default.
+
+What: /sys/power/autosleep
+Date: February 2012
+Contact: Rafael J. Wysocki <[email protected]>
+Description:
+ The /sys/power/autosleep file can be written one of the strings
+ returned by reads from /sys/power/state. If that happens, a
+ work item attempting to trigger a transition of the system to
+ the sleep state represented by that string is queued up. This
+ attempt will only succeed if there are no active wakeup sources
+ in the system at that time. After evey execution, regardless
+ of whether or not the attempt to put the system to sleep has
+ succeeded, the work item requeues itself until user space
+ writes "off" to /sys/power/autosleep.
+
+ Reading from this file causes the last string successfully
+ written to it to be displayed.

2012-02-21 23:36:06

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 7/7] PM / Sleep: Add user space interface for manipulating wakeup sources

From: Rafael J. Wysocki <[email protected]>

Android allows user space to manipulate wakelocks using two
sysfs file located in /sys/power/, wake_lock and wake_unlock.
Writing a wakelock name and optionally a timeout to the wake_lock
file causes the wakelock whose name was written to be acquired (it
is created before is necessary), optionally with the given timeout.
Writing the name of a wakelock to wake_unlock causes that wakelock
to be released.

Implement an analogous interface for user space using wakeup sources.
Add the /sys/power/wake_lock and /sys/power/wake_unlock files
allowing user space to create, activate and deactivate wakeup
sources, such that writing a name and optionally a timeout to
wake_lock causes the wakeup source of that name to be activated,
optionally with the given timeout. If that wakeup source doesn't
exist, it will be created and then activated. Writing a name to
wake_unlock causes the wakeup source of that name, if there is one,
to be deactivated. Wakeup sources created with the help of
wake_lock that haven't been used for more than 5 minutes are garbage
collected and destroyed. Moreover, there can be only WL_NUMBER_LIMIT
wakeup sources created with the help of wake_lock present at a time.

The data type used to track wakeup sources created by user space is
called "struct wakelock" to indicate the origins of this feature.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
Documentation/ABI/testing/sysfs-power | 42 ++++++
drivers/base/power/wakeup.c | 1
kernel/power/Kconfig | 8 +
kernel/power/Makefile | 1
kernel/power/main.c | 41 ++++++
kernel/power/power.h | 9 +
kernel/power/wakelock.c | 218 ++++++++++++++++++++++++++++++++++
7 files changed, 320 insertions(+)

Index: linux/kernel/power/main.c
===================================================================
--- linux.orig/kernel/power/main.c
+++ linux/kernel/power/main.c
@@ -422,6 +422,43 @@ static ssize_t autosleep_store(struct ko

power_attr(autosleep);
#endif /* CONFIG_PM_AUTOSLEEP */
+
+#ifdef CONFIG_PM_WAKELOCKS
+static ssize_t wake_lock_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ return pm_show_wakelocks(buf, true);
+}
+
+static ssize_t wake_lock_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ int error = pm_wake_lock(buf);
+ return error ? error : n;
+}
+
+power_attr(wake_lock);
+
+static ssize_t wake_unlock_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ return pm_show_wakelocks(buf, false);
+}
+
+static ssize_t wake_unlock_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ int error = pm_wake_unlock(buf);
+ return error ? error : n;
+}
+
+power_attr(wake_unlock);
+
+#endif /* CONFIG_PM_WAKELOCKS */
#endif /* CONFIG_PM_SLEEP */

#ifdef CONFIG_PM_TRACE
@@ -478,6 +515,10 @@ static struct attribute * g[] = {
#ifdef CONFIG_PM_AUTOSLEEP
&autosleep_attr.attr,
#endif
+#ifdef CONFIG_PM_WAKELOCKS
+ &wake_lock_attr.attr,
+ &wake_unlock_attr.attr,
+#endif
#ifdef CONFIG_PM_DEBUG
&pm_test_attr.attr,
#endif
Index: linux/kernel/power/power.h
===================================================================
--- linux.orig/kernel/power/power.h
+++ linux/kernel/power/power.h
@@ -282,3 +282,12 @@ static inline void pm_autosleep_unlock(v
static inline suspend_state_t pm_autosleep_state(void) { return PM_SUSPEND_ON; }

#endif /* !CONFIG_PM_AUTOSLEEP */
+
+#ifdef CONFIG_PM_WAKELOCKS
+
+/* kernel/power/wakelock.c */
+extern ssize_t pm_show_wakelocks(char *buf, bool show_active);
+extern int pm_wake_lock(const char *buf);
+extern int pm_wake_unlock(const char *buf);
+
+#endif /* !CONFIG_PM_WAKELOCKS */
Index: linux/kernel/power/Kconfig
===================================================================
--- linux.orig/kernel/power/Kconfig
+++ linux/kernel/power/Kconfig
@@ -111,6 +111,14 @@ config PM_AUTOSLEEP
Allow the kernel to trigger a system transition into a global sleep
state automatically whenever there are no active wakeup sources.

+config PM_WAKELOCKS
+ bool "User space wakeup sources interface"
+ depends on PM_SLEEP
+ default n
+ ---help---
+ Allow user space to create, activate and deactivate wakeup source
+ objects with the help of a sysfs-based interface.
+
config PM_RUNTIME
bool "Run-time PM core functionality"
depends on !IA64_HP_SIM
Index: linux/kernel/power/wakelock.c
===================================================================
--- /dev/null
+++ linux/kernel/power/wakelock.c
@@ -0,0 +1,218 @@
+/*
+ * kernel/power/wakelock.c
+ *
+ * User space wakeup sources support.
+ *
+ * Copyright (C) 2012 Rafael J. Wysocki <[email protected]>
+ *
+ * This code is based on the analogous interface allowing user space to
+ * manipulate wakelocks on Android.
+ */
+
+#include <linux/ctype.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/hrtimer.h>
+#include <linux/list.h>
+#include <linux/rbtree.h>
+#include <linux/slab.h>
+
+#define WL_NUMBER_LIMIT 100
+#define WL_GC_COUNT_MAX 100
+#define WL_GC_TIME_SEC 300
+
+static DEFINE_MUTEX(wakelocks_lock);
+
+struct wakelock {
+ char *name;
+ struct rb_node node;
+ struct wakeup_source ws;
+ struct list_head lru;
+};
+
+static struct rb_root wakelocks_tree = RB_ROOT;
+static LIST_HEAD(wakelocks_lru_list);
+static unsigned int number_of_wakelocks;
+static unsigned int wakelocks_gc_count;
+
+ssize_t pm_show_wakelocks(char *buf, bool show_active)
+{
+ struct rb_node *node;
+ struct wakelock *wl;
+ char *str = buf;
+ char *end = buf + PAGE_SIZE;
+
+ mutex_lock(&wakelocks_lock);
+
+ for (node = rb_first(&wakelocks_tree); node; node = rb_next(node)) {
+ bool active;
+
+ wl = rb_entry(node, struct wakelock, node);
+ spin_lock_irq(&wl->ws.lock);
+ active = wl->ws.active;
+ spin_unlock_irq(&wl->ws.lock);
+ if (active == show_active)
+ str += scnprintf(str, end - str, "%s ", wl->name);
+ }
+ str += scnprintf(str, end - str, "\n");
+
+ mutex_unlock(&wakelocks_lock);
+ return (str - buf);
+}
+
+static struct wakelock *wakelock_lookup_add(const char *name, size_t len,
+ bool add_if_not_found)
+{
+ struct rb_node **node = &wakelocks_tree.rb_node;
+ struct rb_node *parent = *node;
+ struct wakelock *wl;
+
+ while (*node) {
+ int diff;
+
+ wl = rb_entry(*node, struct wakelock, node);
+ diff = strncmp(name, wl->name, len);
+ if (diff == 0) {
+ if (wl->name[len])
+ diff = -1;
+ else
+ return wl;
+ }
+ if (diff < 0)
+ node = &(*node)->rb_left;
+ else
+ node = &(*node)->rb_right;
+
+ parent = *node;
+ }
+ if (!add_if_not_found)
+ return ERR_PTR(-EINVAL);
+
+ if (number_of_wakelocks > WL_NUMBER_LIMIT)
+ return ERR_PTR(-ENOSPC);
+
+ /* Not found, we have to add a new one. */
+ wl = kzalloc(sizeof(*wl), GFP_KERNEL);
+ if (!wl)
+ return ERR_PTR(-ENOMEM);
+
+ wl->name = kstrndup(name, len, GFP_KERNEL);
+ if (!wl->name) {
+ kfree(wl);
+ return ERR_PTR(-ENOMEM);
+ }
+ wl->ws.name = wl->name;
+ wakeup_source_add(&wl->ws);
+ rb_link_node(&wl->node, parent, node);
+ rb_insert_color(&wl->node, &wakelocks_tree);
+ list_add(&wl->lru, &wakelocks_lru_list);
+ number_of_wakelocks++;
+ return wl;
+}
+
+int pm_wake_lock(const char *buf)
+{
+ const char *str = buf;
+ struct wakelock *wl;
+ u64 timeout_ns = 0;
+ size_t len;
+ int ret = 0;
+
+ while (*str && !isspace(*str))
+ str++;
+
+ len = str - buf;
+ if (!len)
+ return -EINVAL;
+
+ if (*str && *str != '\n') {
+ /* Find out if there's a valid timeout string appended. */
+ ret = kstrtou64(skip_spaces(str), 10, &timeout_ns);
+ if (ret)
+ return -EINVAL;
+ }
+
+ mutex_lock(&wakelocks_lock);
+
+ wl = wakelock_lookup_add(buf, len, true);
+ if (IS_ERR(wl)) {
+ ret = PTR_ERR(wl);
+ goto out;
+ }
+ if (timeout_ns) {
+ u64 timeout_ms = timeout_ns + NSEC_PER_MSEC - 1;
+
+ do_div(timeout_ms, NSEC_PER_MSEC);
+ __pm_wakeup_event(&wl->ws, timeout_ms);
+ } else {
+ __pm_stay_awake(&wl->ws);
+ }
+
+ list_move(&wl->lru, &wakelocks_lru_list);
+
+ out:
+ mutex_unlock(&wakelocks_lock);
+ return ret;
+}
+
+static void wakelocks_gc(void)
+{
+ struct wakelock *wl, *aux;
+ ktime_t now = ktime_get();
+
+ list_for_each_entry_safe_reverse(wl, aux, &wakelocks_lru_list, lru) {
+ u64 idle_time_ns;
+ bool active;
+
+ spin_lock_irq(&wl->ws.lock);
+ idle_time_ns = ktime_to_ns(ktime_sub(now, wl->ws.last_time));
+ active = wl->ws.active;
+ spin_unlock_irq(&wl->ws.lock);
+
+ if (idle_time_ns < ((u64)WL_GC_TIME_SEC * NSEC_PER_SEC))
+ break;
+
+ if (!active) {
+ wakeup_source_remove(&wl->ws);
+ rb_erase(&wl->node, &wakelocks_tree);
+ list_del(&wl->lru);
+ kfree(wl->name);
+ kfree(wl);
+ number_of_wakelocks--;
+ }
+ }
+ wakelocks_gc_count = 0;
+}
+
+int pm_wake_unlock(const char *buf)
+{
+ struct wakelock *wl;
+ size_t len;
+ int ret = 0;
+
+ len = strlen(buf);
+ if (!len)
+ return -EINVAL;
+
+ if (buf[len-1] == '\n')
+ len--;
+
+ if (!len)
+ return -EINVAL;
+
+ mutex_lock(&wakelocks_lock);
+
+ wl = wakelock_lookup_add(buf, len, false);
+ if (IS_ERR(wl)) {
+ ret = PTR_ERR(wl);
+ goto out;
+ }
+ __pm_relax(&wl->ws);
+ list_move(&wl->lru, &wakelocks_lru_list);
+ if (++wakelocks_gc_count > WL_GC_COUNT_MAX)
+ wakelocks_gc();
+
+ out:
+ mutex_unlock(&wakelocks_lock);
+ return ret;
+}
Index: linux/kernel/power/Makefile
===================================================================
--- linux.orig/kernel/power/Makefile
+++ linux/kernel/power/Makefile
@@ -10,5 +10,6 @@ obj-$(CONFIG_PM_TEST_SUSPEND) += suspend
obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o \
block_io.o
obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o
+obj-$(CONFIG_PM_WAKELOCKS) += wakelock.o

obj-$(CONFIG_MAGIC_SYSRQ) += poweroff.o
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -132,6 +132,7 @@ void wakeup_source_add(struct wakeup_sou
spin_lock_init(&ws->lock);
setup_timer(&ws->timer, pm_wakeup_timer_fn, (unsigned long)ws);
ws->active = false;
+ ws->last_time = ktime_get();

spin_lock_irq(&events_lock);
list_add_rcu(&ws->entry, &wakeup_sources);
Index: linux/Documentation/ABI/testing/sysfs-power
===================================================================
--- linux.orig/Documentation/ABI/testing/sysfs-power
+++ linux/Documentation/ABI/testing/sysfs-power
@@ -189,3 +189,45 @@ Description:

Reading from this file causes the last string successfully
written to it to be displayed.
+
+What: /sys/power/wake_lock
+Date: February 2012
+Contact: Rafael J. Wysocki <[email protected]>
+Description:
+ The /sys/power/wake_lock file allows user space to create
+ wakeup source objects and activate them on demand (if one of
+ those wakeup sources is active, reads from the
+ /sys/power/wakeup_count file block or return false). When a
+ string without white space is written to /sys/power/wake_lock,
+ it will be assumed to represent a wakeup source name. If there
+ is a wakeup source object with that name, it will be activated
+ (unless active already). Otherwise, a new wakeup source object
+ will be registered, assigned the given name and activated.
+ If a string written to /sys/power/wake_lock contains white
+ space, the part of the string preceding the white space will be
+ regarded as a wakeup source name and handled as descrived above.
+ The other part of the string will be regarded as a timeout (in
+ nanoseconds) such that the wakeup source will be automatically
+ deactivated after it has expired. The timeout, if present, is
+ set regardless of the current state of the wakeup source object
+ in question.
+
+ Reads from this file return a string consisting of the names of
+ wakeup sources created with the help of it that are active at
+ the moment, separated with spaces.
+
+
+What: /sys/power/wake_unlock
+Date: February 2012
+Contact: Rafael J. Wysocki <[email protected]>
+Description:
+ The /sys/power/wake_unlock file allows user space to deactivate
+ wakeup sources created with the help of /sys/power/wake_lock.
+ When a string is written to /sys/power/wake_unlock, it will be
+ assumed to represent the name of a wakeup source to deactivate.
+ If a wakeup source object of that name exists and is active at
+ the moment, it will be deactivated.
+
+ Reads from this file return a string consisting of the names of
+ wakeup sources created with the help of /sys/power/wake_lock
+ that are inactive at the moment, separated with spaces.

2012-02-21 23:36:30

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 0/7] PM: Implement autosleep and "wake locks", take 2

Hi all,

After the feedback so far I've decided to follow up with a refreshed patchset.
The first two patches from the previous one went to linux-pm/linux-next
and I included the recent evdev patch from Arve (with some modifications)
to this patchset for completness.

On Tuesday, February 07, 2012, Rafael J. Wysocki wrote:
> Hi all,
>
> This series tests the theory that the easiest way to sell a once rejected
> feature is to advertise it under a different name.
>
> Well, there actually are two different features, although they are closely
> related to each other. First, patch [6/8] introduces a feature that allows
> the kernel to trigger system suspend (or more generally a transition into
> a sleep state) whenever there are no active wakeup sources (no, they aren't
> called wakelocks). It is called "autosleep" here, but it was called a few
> different names in the past ("opportunistic suspend" was probably the most
> popular one). Second, patch [8/8] introduces "wake locks" that are,
> essentially, wakeup sources which may be created and manipulated by user
> space. Using them user space may control the autosleep feature introduced
> earlier.
>
> This also is a kind of a proof of concept for the people who wanted me to
> show a kernel-based implementation of automatic suspend, so there you go.
> Please note, however, that it is done so that the user space "wake locks"
> interface is compatible with Android in support of its user space. I don't
> really like this interface, but since the Android's user space seems to rely
> on it, I'm fine with using it as is. YMMV.
>
> Let me say a few words about every patch in the series individually.
>
> [1/8] - This really is a bug fix, so it's v3.4 material. Nobody has stepped
> on this bug so far, but it should be fixed anyway.
>
> [2/8] - This is a freezer cleanup, worth doing anyway IMO, so v3.4 material too.

The above two are in linux-pm/linux-next now. There are a few more fixes
related to wakeup sources in there and the patches below are based on that
branch.

> [3/8] - This is something we can do no problem, although completely optional
> without the autosleep feature. Rather necessary with it, though.

Now [1/7] - Look for wakeup events in later stages of device suspend.

> [4/8] - This kind of reintroduces my original idea of using a wait queue for
> waiting until there are no wakeup events in progress. Alan convinced me that
> it would be better to poll the counter to prevent wakeup_source_deactivate()
> from having to call wake_up_all() occasionally (that may be costly in fast
> paths), but then quite some people told me that the wait queue migh be
> better. I think that the polling will make much less sense with autosleep
> and user space "wake locks". Anyway, [4/8] is something we can do without
> those things too.

Now [2/7] - Use wait queue to signal "no wakeup events in progress"

With a couple of improvements suggested by Neil.

> The patches above were given Sign-off-by tags, because I think they make some
> sense regardless of the features introcuded by the remaining patches that in
> turn are total RFC.

This time all of the patches are signed-off and include the requisite
documentation changes (hopefully, I haven't forgotten about anything).

> [5/8] - This changes wakeup source statistics so that they are more similar to
> the statistics collected for wakelocks on Android. The file those statistics
> may be read from is still located in debugfs, though (I don't think it
> belongs to proc and its name is different from the analogous Android's file
> name anyway). It could be done without autosleep, but then it would be a bit
> pointless. BTW, this changes interfaces that _in_ _theory_ may be used by
> someone, but I'm not aware of anyone using them. If you are one, I'll be
> pleased to learn about that, so please tell me who you are. :-)

Now [3/7] - Change wakeup source statistics to follow Android.

Rebased and reworked in accordance with the Arve's feedback.

[4/7] - Add ioctl to block suspend while event queue is not empty.

Originally posted by Arve as http://marc.info/?l=linux-pm&m=132711288825973&w=4
Reworked and with modified changelog (I wonder what Dmity thinks about this).

It has some minor problems (for example, in some situations the queue wakeup
source may be activated for events that are not coming from a wakeup device),
but I think it's simple enough, at least for illustration. The ioctls
introduced here will be used by Android user space anyway, although perhaps
under different names, AFAICS.

> [6/8] - Autosleep implementation. I think the changelog explains the idea
> quite well and the code is really nothing special. It doesn't really add
> anything new to the kernel in terms of infrastructure etc., it just uses
> the existing stuff to implement an alternative method of triggering system
> sleep transitions. Note, though, that the interface here is different
> from the Android's one, because Android actually modifies /sys/power/state
> to trigger something called "early suspend" (that is never going to be
> implemented in the "stock" kernel as long as I have any influence on it) and
> we simply can't do that in the mainline.

Now [5/7] - Implement opportunistic sleep

Rebased and simplified (most notably, I've dropped the "main" wakeup source,
since it wasn't really necessary).

> [7/8] - This adds a wakeup source statistics that only makes sense with
> autosleep and (I believe) is analogous to the Android's prevent_suspend_time
> statistics. Nothing really special, but I didn't want
> wakeup_source_activate/deactivate() to take a common lock to avoid
> congestion.

Now [6/7] - Add "prevent autosleep time" statistics to wakeup sources.

Rebased.

> [8/8] - This adds a user space interface to create, activate and deactivate
> wakeup sources. Since the files it consists of are called wake_lock and
> wake_unlock, to follow Android, the objects the wakeup sources are wrapped
> into are called "wakelocks" (for added confusion). Since the interface
> doesn't provide any means to destroy those "wakelocks", I added a garbage
> collection mechanism to get rid of the unused ones, if any. I also tought
> it might be a good idea to put a limit on the number of those things that
> user space can operate simultaneously, so I did that too.

Now [7/7] - Add user space interface for manipulating wakeup sources.

> All of the above has been tested very briefly on my test-bed Mackerel board
> and it quite obviously requires more thorough testing, but first I need to know
> if it makes sense to spend any more time on it.

The above is still accurate, but I also verified that the patches don't break
my PC test boxes (at least as long as the new features aren't used ;-)).

Thanks,
Rafael

2012-02-21 23:36:28

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 4/7] Input / PM: Add ioctl to block suspend while event queue is not empty

From: Arve Hjønnevåg <[email protected]>

Add a new ioctl, EVIOCSWAKEUPSRC, to attach a wakeup source object to
an evdev client event queue, such that it will be active whenever the
queue is not empty. Then, all events in the queue will be regarded
as wakeup events in progress and pm_get_wakeup_count() will block (or
return false if woken up by a signal) until they are removed from the
queue. In consequence, if the checking of wakeup events is enabled
(e.g. throught the /sys/power/wakeup_count interface), the system
won't be able to go into a sleep state until the queue is empty.

This allows user space processes to handle situations in which they
want to do a select() on an evdev descriptor, so they go to sleep
until there are some events to read from the device's queue, and then
they don't want the system to go into a sleep state until all the
events are read (presumably for further processing). Of course, if
they don't want the system to go into a sleep state _after_ all the
events have been read from the queue, they have to use a separate
mechanism that will prevent the system from doing that and it has
to be activated before reading the first event (that also may be the
last one).

[rjw: Removed unnecessary checks, changed the names of the new ioctls
and the names of the functions that add/remove wakeup source objects
to/from evdev clients, modified the changelog.]

Signed-off-by: Arve Hjønnevåg <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/input/evdev.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/input.h | 3 ++
2 files changed, 58 insertions(+)

Index: linux/drivers/input/evdev.c
===================================================================
--- linux.orig/drivers/input/evdev.c
+++ linux/drivers/input/evdev.c
@@ -43,6 +43,7 @@ struct evdev_client {
unsigned int tail;
unsigned int packet_head; /* [future] position of the first element of next packet */
spinlock_t buffer_lock; /* protects access to buffer, head and tail */
+ struct wakeup_source *wakeup_source;
struct fasync_struct *fasync;
struct evdev *evdev;
struct list_head node;
@@ -75,10 +76,12 @@ static void evdev_pass_event(struct evde
client->buffer[client->tail].value = 0;

client->packet_head = client->tail;
+ __pm_relax(client->wakeup_source);
}

if (event->type == EV_SYN && event->code == SYN_REPORT) {
client->packet_head = client->head;
+ __pm_stay_awake(client->wakeup_source);
kill_fasync(&client->fasync, SIGIO, POLL_IN);
}

@@ -255,6 +258,8 @@ static int evdev_release(struct inode *i
mutex_unlock(&evdev->mutex);

evdev_detach_client(evdev, client);
+ wakeup_source_unregister(client->wakeup_source);
+
kfree(client);

evdev_close_device(evdev);
@@ -373,6 +378,8 @@ static int evdev_fetch_next_event(struct
if (have_event) {
*event = client->buffer[client->tail++];
client->tail &= client->bufsize - 1;
+ if (client->packet_head == client->tail)
+ __pm_relax(client->wakeup_source);
}

spin_unlock_irq(&client->buffer_lock);
@@ -623,6 +630,45 @@ static int evdev_handle_set_keycode_v2(s
return input_set_keycode(dev, &ke);
}

+static int evdev_attach_wakeup_source(struct evdev *evdev,
+ struct evdev_client *client)
+{
+ struct wakeup_source *ws;
+ char name[28];
+
+ if (client->wakeup_source)
+ return 0;
+
+ snprintf(name, sizeof(name), "%s-%d",
+ dev_name(&evdev->dev), task_tgid_vnr(current));
+
+ ws = wakeup_source_register(name);
+ if (!ws)
+ return -ENOMEM;
+
+ spin_lock_irq(&client->buffer_lock);
+ client->wakeup_source = ws;
+ if (client->packet_head != client->tail)
+ __pm_stay_awake(client->wakeup_source);
+ spin_unlock_irq(&client->buffer_lock);
+ return 0;
+}
+
+static int evdev_detach_wakeup_source(struct evdev *evdev,
+ struct evdev_client *client)
+{
+ struct wakeup_source *ws;
+
+ spin_lock_irq(&client->buffer_lock);
+ ws = client->wakeup_source;
+ client->wakeup_source = NULL;
+ spin_unlock_irq(&client->buffer_lock);
+
+ wakeup_source_unregister(ws);
+
+ return 0;
+}
+
static long evdev_do_ioctl(struct file *file, unsigned int cmd,
void __user *p, int compat_mode)
{
@@ -696,6 +742,15 @@ static long evdev_do_ioctl(struct file *

case EVIOCSKEYCODE_V2:
return evdev_handle_set_keycode_v2(dev, p);
+
+ case EVIOCGWAKEUPSRC:
+ return put_user(!!client->wakeup_source, ip);
+
+ case EVIOCSWAKEUPSRC:
+ if (p)
+ return evdev_attach_wakeup_source(evdev, client);
+ else
+ return evdev_detach_wakeup_source(evdev, client);
}

size = _IOC_SIZE(cmd);
Index: linux/include/linux/input.h
===================================================================
--- linux.orig/include/linux/input.h
+++ linux/include/linux/input.h
@@ -129,6 +129,9 @@ struct input_keymap_entry {

#define EVIOCGRAB _IOW('E', 0x90, int) /* Grab/Release device */

+#define EVIOCGWAKEUPSRC _IOR('E', 0x91, int) /* Check if wakeup handling is enabled */
+#define EVIOCSWAKEUPSRC _IOW('E', 0x91, int) /* Enable/disable wakeup handling */
+
/*
* Device properties and quirks
*/

2012-02-21 23:37:01

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 2/7] PM / Sleep: Use wait queue to signal "no wakeup events in progress"

From: Rafael J. Wysocki <[email protected]>

The current wakeup source deactivation code doesn't do anything when
the counter of wakeup events in progress goes down to zero, which
requires pm_get_wakeup_count() to poll that counter periodically.
Although this reduces the average time it takes to deactivate a
wakeup source, it also may lead to a substantial amount of unnecessary
polling if there are extended periods of wakeup activity. Thus it
seems reasonable to use a wait queue for signaling the "no wakeup
events in progress" condition and remove the polling.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/power/wakeup.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -17,8 +17,6 @@

#include "power.h"

-#define TIMEOUT 100
-
/*
* If set, the suspend/hibernate code will abort transitions to a sleep state
* if wakeup events are registered during or immediately before the transition.
@@ -52,6 +50,8 @@ static void pm_wakeup_timer_fn(unsigned

static LIST_HEAD(wakeup_sources);

+static DECLARE_WAIT_QUEUE_HEAD(wakeup_count_wait_queue);
+
/**
* wakeup_source_prepare - Prepare a new wakeup source for initialization.
* @ws: Wakeup source to prepare.
@@ -442,6 +442,7 @@ EXPORT_SYMBOL_GPL(pm_stay_awake);
*/
static void wakeup_source_deactivate(struct wakeup_source *ws)
{
+ unsigned int cnt, inpr;
ktime_t duration;
ktime_t now;

@@ -476,6 +477,10 @@ static void wakeup_source_deactivate(str
* couter of wakeup events in progress simultaneously.
*/
atomic_add(MAX_IN_PROGRESS, &combined_event_count);
+
+ split_counters(&cnt, &inpr);
+ if (!inpr && waitqueue_active(&wakeup_count_wait_queue))
+ wake_up(&wakeup_count_wait_queue);
}

/**
@@ -667,14 +672,19 @@ bool pm_wakeup_pending(void)
bool pm_get_wakeup_count(unsigned int *count)
{
unsigned int cnt, inpr;
+ DEFINE_WAIT(wait);

for (;;) {
+ prepare_to_wait(&wakeup_count_wait_queue, &wait,
+ TASK_INTERRUPTIBLE);
split_counters(&cnt, &inpr);
if (inpr == 0 || signal_pending(current))
break;
pm_wakeup_update_hit_counts();
- schedule_timeout_interruptible(msecs_to_jiffies(TIMEOUT));
+
+ schedule();
}
+ finish_wait(&wakeup_count_wait_queue, &wait);

split_counters(&cnt, &inpr);
*count = cnt;

2012-02-22 04:50:30

by John Stultz

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/7] PM: Implement autosleep and "wake locks", take 2

On Wed, 2012-02-22 at 00:31 +0100, Rafael J. Wysocki wrote:
> Hi all,
>
> After the feedback so far I've decided to follow up with a refreshed patchset.
> The first two patches from the previous one went to linux-pm/linux-next
> and I included the recent evdev patch from Arve (with some modifications)
> to this patchset for completness.

Hey Rafael,
Thanks again for posting this! I've started playing around with it in a
kvm environment, and got the following warning after echoing off >
autosleep:
...
PM: resume of devices complete after 185.615 msecs
PM: Finishing wakeup.
Restarting tasks ... done.
PM: Syncing filesystems ... done.
PM: Preparing system for mem sleep
Freezing user space processes ...
Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
bash D ffff880015714010
===============================
[ INFO: suspicious RCU usage. ]
3.3.0-rc3john+ #131 Not tainted
-------------------------------
kernel/sched/core.c:4784 suspicious rcu_dereference_check() usage!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 0
5 locks held by kworker/u:1/10:
#0: (autosleep){.+.+.+}, at: [<ffffffff81066db8>] process_one_work+0x2d8/0x8c0
#1: (suspend_work){+.+.+.}, at: [<ffffffff81066db8>] process_one_work+0x2d8/0x8c0
#2: (autosleep_lock){+.+.+.}, at: [<ffffffff810a2d3d>] try_to_suspend+0x2d/0xe0
#3: (pm_mutex){+.+.+.}, at: [<ffffffff8109b9fc>] pm_suspend+0x8c/0x210
#4: (tasklist_lock){.+.+..}, at: [<ffffffff8109b0f1>] try_to_freeze_tasks+0x2d1/0x400

stack backtrace:
Pid: 10, comm: kworker/u:1 Not tainted 3.3.0-rc3john+ #131
Call Trace:
[<ffffffff81040d82>] ? vprintk+0x242/0x530
[<ffffffff810b0fdb>] lockdep_rcu_suspicious+0xeb/0x100
[<ffffffff81083371>] sched_show_task+0x121/0x180
[<ffffffff8109b1e5>] try_to_freeze_tasks+0x3c5/0x400
[<ffffffff810a2d10>] ? pm_autosleep_set_state+0x80/0x80
[<ffffffff8109b2eb>] freeze_processes+0x3b/0xb0
[<ffffffff8109baad>] pm_suspend+0x13d/0x210
[<ffffffff810a2d5d>] try_to_suspend+0x4d/0xe0
[<ffffffff81066f02>] process_one_work+0x422/0x8c0
[<ffffffff81066db8>] ? process_one_work+0x2d8/0x8c0
[<ffffffff810b063e>] ? put_lock_stats+0xe/0x40
[<ffffffff81067a16>] worker_thread+0x476/0x550
[<ffffffff810675a0>] ? rescuer_thread+0x200/0x200
[<ffffffff810706fe>] kthread+0xae/0xc0
[<ffffffff81af4cb4>] kernel_thread_helper+0x4/0x10
[<ffffffff81af3078>] ? retint_restore_args+0x13/0x13
[<ffffffff81070650>] ? __init_kthread_worker+0x70/0x70
[<ffffffff81af4cb0>] ? gs_change+0x13/0x13
0 1981 1980 0x00020004
ffff880015715d88 0000000000000046 ffff880015715c88 ffffffff8102c22b
ffff880015714010 ffff880015715fd8 ffff880015714010 ffff880015714000
ffff880015715fd8 ffff880015714000 ffff880015c4e3c0 ffff88001342e540
Call Trace:
[<ffffffff8102c22b>] ? kvm_clock_read+0x6b/0x90
[<ffffffff810b1f2d>] ? mark_held_locks+0xad/0x150
[<ffffffff81af10bf>] schedule+0x3f/0x60
[<ffffffff81aef33b>] mutex_lock_nested+0x1cb/0x4c0
[<ffffffff810a2cae>] ? pm_autosleep_set_state+0x1e/0x80
[<ffffffff810a2cae>] ? pm_autosleep_set_state+0x1e/0x80
[<ffffffff810a2cae>] pm_autosleep_set_state+0x1e/0x80
[<ffffffff8109a74b>] autosleep_store+0x3b/0x80
[<ffffffff813856e7>] kobj_attr_store+0x17/0x20
[<ffffffff81200dcc>] sysfs_write_file+0xec/0x170
[<ffffffff8118085f>] vfs_write+0x11f/0x1b0
[<ffffffff811809f4>] sys_write+0x54/0xa0
[<ffffffff81af4e66>] sysenter_dispatch+0x7/0x26
[<ffffffff8139238e>] ? trace_hardirqs_on_thunk+0x3a/0x3f

Restarting tasks ... done.


2012-02-22 08:45:39

by Srivatsa S. Bhat

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/7] PM: Implement autosleep and "wake locks", take 2

On 02/22/2012 10:19 AM, John Stultz wrote:

> On Wed, 2012-02-22 at 00:31 +0100, Rafael J. Wysocki wrote:
>> Hi all,
>>
>> After the feedback so far I've decided to follow up with a refreshed patchset.
>> The first two patches from the previous one went to linux-pm/linux-next
>> and I included the recent evdev patch from Arve (with some modifications)
>> to this patchset for completness.
>
> Hey Rafael,
> Thanks again for posting this! I've started playing around with it in a
> kvm environment, and got the following warning after echoing off >
> autosleep:
> ...
> PM: resume of devices complete after 185.615 msecs
> PM: Finishing wakeup.
> Restarting tasks ... done.
> PM: Syncing filesystems ... done.
> PM: Preparing system for mem sleep
> Freezing user space processes ...
> Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
> bash D ffff880015714010


Ah.. I think I know what is the problem here..

The kernel was freezing userspace processes and meanwhile, you wrote "off"
to autosleep. So, as a result, this userspace process (bash) just now
entered kernel mode. Unfortunately, the autosleep_lock is held for too long,
that is, something like:

acquire autosleep_lock
modify autosleep_state
<============== "A"
pm_suspend or hibernate()

release autosleep_lock

At point marked "A", we should have released the autosleep lock and only then
entered pm_suspend or hibernate(). Since the current code holds the lock and
enters suspend/hibernate, the userspace process that wrote "off" to autosleep
(or even userspace process that writes to /sys/power/state will end up waiting
on autosleep_lock, thus failing the freezing operation.)

So the solution is to always release the autosleep lock before entering
suspend/hibernation.


Regards,
Srivatsa S. Bhat

> ===============================
> [ INFO: suspicious RCU usage. ]
> 3.3.0-rc3john+ #131 Not tainted
> -------------------------------
> kernel/sched/core.c:4784 suspicious rcu_dereference_check() usage!
>
> other info that might help us debug this:
>
>
> rcu_scheduler_active = 1, debug_locks = 0
> 5 locks held by kworker/u:1/10:
> #0: (autosleep){.+.+.+}, at: [<ffffffff81066db8>] process_one_work+0x2d8/0x8c0
> #1: (suspend_work){+.+.+.}, at: [<ffffffff81066db8>] process_one_work+0x2d8/0x8c0
> #2: (autosleep_lock){+.+.+.}, at: [<ffffffff810a2d3d>] try_to_suspend+0x2d/0xe0
> #3: (pm_mutex){+.+.+.}, at: [<ffffffff8109b9fc>] pm_suspend+0x8c/0x210
> #4: (tasklist_lock){.+.+..}, at: [<ffffffff8109b0f1>] try_to_freeze_tasks+0x2d1/0x400
>
> stack backtrace:
> Pid: 10, comm: kworker/u:1 Not tainted 3.3.0-rc3john+ #131
> Call Trace:
> [<ffffffff81040d82>] ? vprintk+0x242/0x530
> [<ffffffff810b0fdb>] lockdep_rcu_suspicious+0xeb/0x100
> [<ffffffff81083371>] sched_show_task+0x121/0x180
> [<ffffffff8109b1e5>] try_to_freeze_tasks+0x3c5/0x400
> [<ffffffff810a2d10>] ? pm_autosleep_set_state+0x80/0x80
> [<ffffffff8109b2eb>] freeze_processes+0x3b/0xb0
> [<ffffffff8109baad>] pm_suspend+0x13d/0x210
> [<ffffffff810a2d5d>] try_to_suspend+0x4d/0xe0
> [<ffffffff81066f02>] process_one_work+0x422/0x8c0
> [<ffffffff81066db8>] ? process_one_work+0x2d8/0x8c0
> [<ffffffff810b063e>] ? put_lock_stats+0xe/0x40
> [<ffffffff81067a16>] worker_thread+0x476/0x550
> [<ffffffff810675a0>] ? rescuer_thread+0x200/0x200
> [<ffffffff810706fe>] kthread+0xae/0xc0
> [<ffffffff81af4cb4>] kernel_thread_helper+0x4/0x10
> [<ffffffff81af3078>] ? retint_restore_args+0x13/0x13
> [<ffffffff81070650>] ? __init_kthread_worker+0x70/0x70
> [<ffffffff81af4cb0>] ? gs_change+0x13/0x13
> 0 1981 1980 0x00020004
> ffff880015715d88 0000000000000046 ffff880015715c88 ffffffff8102c22b
> ffff880015714010 ffff880015715fd8 ffff880015714010 ffff880015714000
> ffff880015715fd8 ffff880015714000 ffff880015c4e3c0 ffff88001342e540
> Call Trace:
> [<ffffffff8102c22b>] ? kvm_clock_read+0x6b/0x90
> [<ffffffff810b1f2d>] ? mark_held_locks+0xad/0x150
> [<ffffffff81af10bf>] schedule+0x3f/0x60
> [<ffffffff81aef33b>] mutex_lock_nested+0x1cb/0x4c0
> [<ffffffff810a2cae>] ? pm_autosleep_set_state+0x1e/0x80
> [<ffffffff810a2cae>] ? pm_autosleep_set_state+0x1e/0x80
> [<ffffffff810a2cae>] pm_autosleep_set_state+0x1e/0x80
> [<ffffffff8109a74b>] autosleep_store+0x3b/0x80
> [<ffffffff813856e7>] kobj_attr_store+0x17/0x20
> [<ffffffff81200dcc>] sysfs_write_file+0xec/0x170
> [<ffffffff8118085f>] vfs_write+0x11f/0x1b0
> [<ffffffff811809f4>] sys_write+0x54/0xa0
> [<ffffffff81af4e66>] sysenter_dispatch+0x7/0x26
> [<ffffffff8139238e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>
> Restarting tasks ... done.
>
>
>

2012-02-22 08:45:52

by Srivatsa S. Bhat

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/7] PM / Sleep: Implement opportunistic sleep

On 02/22/2012 05:05 AM, Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki <[email protected]>
>
> Introduce a mechanism by which the kernel can trigger global
> transitions to a sleep state chosen by user space if there are no
> active wakeup sources.
>
> It consists of a new sysfs attribute, /sys/power/autosleep, that
> can be written one of the strings returned by reads from
> /sys/power/state, an ordered workqueue and a work item carrying out
> the "suspend" operations. If a string representing the system's
> sleep state is written to /sys/power/autosleep, the work item
> triggering transitions to that state is queued up and it requeues
> itself after every execution until user space writes "off" to
> /sys/power/autosleep.
>
> That work item enables the detection of wakeup events using the
> functions already defined in drivers/base/power/wakeup.c (with one
> small modification) and calls either pm_suspend(), or hibernate() to
> put the system into a sleep state. If a wakeup event is reported
> while the transition is in progress, it will abort the transition and
> the "system suspend" work item will be queued up again.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> Documentation/ABI/testing/sysfs-power | 17 +++++
> drivers/base/power/wakeup.c | 38 ++++++-----
> include/linux/suspend.h | 13 +++-
> kernel/power/Kconfig | 8 ++
> kernel/power/Makefile | 1
> kernel/power/autosleep.c | 98 ++++++++++++++++++++++++++++++
> kernel/power/main.c | 108 ++++++++++++++++++++++++++++------
> kernel/power/power.h | 18 +++++
> 8 files changed, 266 insertions(+), 35 deletions(-)
>
> Index: linux/kernel/power/Makefile
> ===================================================================
> --- linux.orig/kernel/power/Makefile
> +++ linux/kernel/power/Makefile
> @@ -9,5 +9,6 @@ obj-$(CONFIG_SUSPEND) += suspend.o
> obj-$(CONFIG_PM_TEST_SUSPEND) += suspend_test.o
> obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o \
> block_io.o
> +obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o
>
> obj-$(CONFIG_MAGIC_SYSRQ) += poweroff.o
> Index: linux/kernel/power/Kconfig
> ===================================================================
> --- linux.orig/kernel/power/Kconfig
> +++ linux/kernel/power/Kconfig
> @@ -103,6 +103,14 @@ config PM_SLEEP_SMP
> select HOTPLUG
> select HOTPLUG_CPU
>
> +config PM_AUTOSLEEP
> + bool "Opportunistic sleep"
> + depends on PM_SLEEP
> + default n
> + ---help---
> + Allow the kernel to trigger a system transition into a global sleep
> + state automatically whenever there are no active wakeup sources.
> +
> config PM_RUNTIME
> bool "Run-time PM core functionality"
> depends on !IA64_HP_SIM
> Index: linux/kernel/power/power.h
> ===================================================================
> --- linux.orig/kernel/power/power.h
> +++ linux/kernel/power/power.h
> @@ -264,3 +264,21 @@ static inline void suspend_thaw_processe
> {
> }
> #endif
> +
> +#ifdef CONFIG_PM_AUTOSLEEP
> +
> +/* kernel/power/autosleep.c */
> +extern int pm_autosleep_init(void);
> +extern void pm_autosleep_lock(void);
> +extern void pm_autosleep_unlock(void);
> +extern suspend_state_t pm_autosleep_state(void);
> +extern int pm_autosleep_set_state(suspend_state_t state);
> +
> +#else /* !CONFIG_PM_AUTOSLEEP */
> +
> +static inline int pm_autosleep_init(void) { return 0; }
> +static inline void pm_autosleep_lock(void) {}
> +static inline void pm_autosleep_unlock(void) {}
> +static inline suspend_state_t pm_autosleep_state(void) { return PM_SUSPEND_ON; }
> +
> +#endif /* !CONFIG_PM_AUTOSLEEP */
> Index: linux/include/linux/suspend.h
> ===================================================================
> --- linux.orig/include/linux/suspend.h
> +++ linux/include/linux/suspend.h
> @@ -356,7 +356,7 @@ extern int unregister_pm_notifier(struct
> extern bool events_check_enabled;
>
> extern bool pm_wakeup_pending(void);
> -extern bool pm_get_wakeup_count(unsigned int *count);
> +extern bool pm_get_wakeup_count(unsigned int *count, bool block);
> extern bool pm_save_wakeup_count(unsigned int count);
>
> static inline void lock_system_sleep(void)
> @@ -407,6 +407,17 @@ static inline void unlock_system_sleep(v
>
> #endif /* !CONFIG_PM_SLEEP */
>
> +#ifdef CONFIG_PM_AUTOSLEEP
> +
> +/* kernel/power/autosleep.c */
> +void queue_up_suspend_work(void);
> +
> +#else /* !CONFIG_PM_AUTOSLEEP */
> +
> +static inline void queue_up_suspend_work(void) {}
> +
> +#endif /* !CONFIG_PM_AUTOSLEEP */
> +
> #ifdef CONFIG_ARCH_SAVE_PAGE_KEYS
> /*
> * The ARCH_SAVE_PAGE_KEYS functions can be used by an architecture
> Index: linux/kernel/power/autosleep.c
> ===================================================================
> --- /dev/null
> +++ linux/kernel/power/autosleep.c
> @@ -0,0 +1,98 @@
> +/*
> + * kernel/power/autosleep.c
> + *
> + * Opportunistic sleep support.
> + *
> + * Copyright (C) 2012 Rafael J. Wysocki <[email protected]>
> + */
> +
> +#include <linux/device.h>
> +#include <linux/mutex.h>
> +#include <linux/pm_wakeup.h>
> +
> +#include "power.h"
> +
> +static suspend_state_t autosleep_state;
> +static struct workqueue_struct *autosleep_wq;
> +static DEFINE_MUTEX(autosleep_lock);
> +
> +static void try_to_suspend(struct work_struct *work)
> +{
> + unsigned int initial_count, final_count;
> +
> + if (!pm_get_wakeup_count(&initial_count, true))
> + goto out;
> +
> + mutex_lock(&autosleep_lock);
> +
> + if (!pm_save_wakeup_count(initial_count)) {
> + mutex_unlock(&autosleep_lock);
> + goto out;
> + }
> +
> + if (autosleep_state == PM_SUSPEND_ON) {
> + mutex_unlock(&autosleep_lock);
> + return;
> + }
> + if (autosleep_state >= PM_SUSPEND_MAX)
> + hibernate();
> + else
> + pm_suspend(autosleep_state);


We are calling pm_suspend() or hibernate() directly here.
Won't this break build when CONFIG_SUSPEND or CONFIG_HIBERNATION is not set?
CONFIG_PM_AUTOSLEEP depends only on PM_SLEEP which means we could enable
either one of suspend or hibernation and yet come to this point, breaking
the option which was not enabled.

Regards,
Srivatsa S. Bhat

> +
> + mutex_unlock(&autosleep_lock);
> +
> + if (!pm_get_wakeup_count(&final_count, false))
> + goto out;
> +
> + if (final_count == initial_count)
> + schedule_timeout(HZ / 2);
> +
> + out:
> + queue_up_suspend_work();
> +}
> +
> +static DECLARE_WORK(suspend_work, try_to_suspend);
> +
> +void queue_up_suspend_work(void)
> +{
> + if (!work_pending(&suspend_work) && autosleep_state > PM_SUSPEND_ON)
> + queue_work(autosleep_wq, &suspend_work);
> +}
> +
> +suspend_state_t pm_autosleep_state(void)
> +{
> + return autosleep_state;
> +}
> +
> +int pm_autosleep_set_state(suspend_state_t state)
> +{
> +#ifndef CONFIG_HIBERNATION
> + if (state >= PM_SUSPEND_MAX)
> + return -EINVAL;
> +#endif
> + mutex_lock(&autosleep_lock);
> + if (state == PM_SUSPEND_ON && autosleep_state != PM_SUSPEND_ON) {
> + autosleep_state = PM_SUSPEND_ON;
> + } else if (state > PM_SUSPEND_ON) {
> + autosleep_state = state;
> + queue_up_suspend_work();
> + }
> + mutex_unlock(&autosleep_lock);
> + return 0;
> +}
> +
> +void pm_autosleep_lock(void)
> +{
> + mutex_lock(&autosleep_lock);
> +}
> +
> +void pm_autosleep_unlock(void)
> +{
> + mutex_unlock(&autosleep_lock);
> +}
> +
> +int __init pm_autosleep_init(void)
> +{
> + autosleep_wq = alloc_ordered_workqueue("autosleep", 0);
> + return autosleep_wq ? 0 : -ENOMEM;
> +}
> Index: linux/kernel/power/main.c
> ===================================================================
> --- linux.orig/kernel/power/main.c
> +++ linux/kernel/power/main.c
> @@ -269,8 +269,7 @@ static ssize_t state_show(struct kobject
> return (s - buf);
> }
>
> -static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
> - const char *buf, size_t n)
> +static suspend_state_t decode_state(const char *buf, size_t n)
> {
> #ifdef CONFIG_SUSPEND
> suspend_state_t state = PM_SUSPEND_STANDBY;
> @@ -278,27 +277,43 @@ static ssize_t state_store(struct kobjec
> #endif
> char *p;
> int len;
> - int error = -EINVAL;
>
> p = memchr(buf, '\n', n);
> len = p ? p - buf : n;
>
> - /* First, check if we are requested to hibernate */
> - if (len == 4 && !strncmp(buf, "disk", len)) {
> - error = hibernate();
> - goto Exit;
> - }
> + /* Check hibernation first. */
> + if (len == 4 && !strncmp(buf, "disk", len))
> + return PM_SUSPEND_MAX;
>
> #ifdef CONFIG_SUSPEND
> - for (s = &pm_states[state]; state < PM_SUSPEND_MAX; s++, state++) {
> - if (*s && len == strlen(*s) && !strncmp(buf, *s, len)) {
> - error = pm_suspend(state);
> - break;
> - }
> - }
> + for (s = &pm_states[state]; state < PM_SUSPEND_MAX; s++, state++)
> + if (*s && len == strlen(*s) && !strncmp(buf, *s, len))
> + return state;
> #endif
>
> - Exit:
> + return PM_SUSPEND_ON;
> +}
> +
> +static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
> + const char *buf, size_t n)
> +{
> + suspend_state_t state;
> + int error = -EINVAL;
> +
> + pm_autosleep_lock();
> + if (pm_autosleep_state() > PM_SUSPEND_ON) {
> + error = -EBUSY;
> + goto out;
> + }
> +
> + state = decode_state(buf, n);
> + if (state < PM_SUSPEND_MAX)
> + error = pm_suspend(state);
> + else if (state > PM_SUSPEND_ON)
> + error = hibernate();
> +
> + out:
> + pm_autosleep_unlock();
> return error ? error : n;
> }
>
> @@ -339,7 +354,8 @@ static ssize_t wakeup_count_show(struct
> {
> unsigned int val;
>
> - return pm_get_wakeup_count(&val) ? sprintf(buf, "%u\n", val) : -EINTR;
> + return pm_get_wakeup_count(&val, true) ?
> + sprintf(buf, "%u\n", val) : -EINTR;
> }
>
> static ssize_t wakeup_count_store(struct kobject *kobj,
> @@ -347,15 +363,65 @@ static ssize_t wakeup_count_store(struct
> const char *buf, size_t n)
> {
> unsigned int val;
> + int error = -EINVAL;
> +
> + pm_autosleep_lock();
> + if (pm_autosleep_state() > PM_SUSPEND_ON) {
> + error = -EBUSY;
> + goto out;
> + }
>
> if (sscanf(buf, "%u", &val) == 1) {
> if (pm_save_wakeup_count(val))
> return n;
> }
> - return -EINVAL;
> +
> + out:
> + pm_autosleep_unlock();
> + return error;
> }
>
> power_attr(wakeup_count);
> +
> +#ifdef CONFIG_PM_AUTOSLEEP
> +static ssize_t autosleep_show(struct kobject *kobj,
> + struct kobj_attribute *attr,
> + char *buf)
> +{
> + suspend_state_t state = pm_autosleep_state();
> +
> + if (state == PM_SUSPEND_ON)
> + return sprintf(buf, "off\n");
> +
> +#ifdef CONFIG_SUSPEND
> + if (state < PM_SUSPEND_MAX)
> + return sprintf(buf, "%s\n", valid_state(state) ?
> + pm_states[state] : "error");
> +#endif
> +#ifdef CONFIG_HIBERNATION
> + return sprintf(buf, "disk\n");
> +#else
> + return sprintf(buf, "error");
> +#endif
> +}
> +
> +static ssize_t autosleep_store(struct kobject *kobj,
> + struct kobj_attribute *attr,
> + const char *buf, size_t n)
> +{
> + suspend_state_t state = decode_state(buf, n);
> + int error;
> +
> + if (state == PM_SUSPEND_ON && strncmp(buf, "off", 3)
> + && strncmp(buf, "off\n", 4))
> + return -EINVAL;
> +
> + error = pm_autosleep_set_state(state);
> + return error ? error : n;
> +}
> +
> +power_attr(autosleep);
> +#endif /* CONFIG_PM_AUTOSLEEP */
> #endif /* CONFIG_PM_SLEEP */
>
> #ifdef CONFIG_PM_TRACE
> @@ -409,6 +475,9 @@ static struct attribute * g[] = {
> #ifdef CONFIG_PM_SLEEP
> &pm_async_attr.attr,
> &wakeup_count_attr.attr,
> +#ifdef CONFIG_PM_AUTOSLEEP
> + &autosleep_attr.attr,
> +#endif
> #ifdef CONFIG_PM_DEBUG
> &pm_test_attr.attr,
> #endif
> @@ -444,7 +513,10 @@ static int __init pm_init(void)
> power_kobj = kobject_create_and_add("power", NULL);
> if (!power_kobj)
> return -ENOMEM;
> - return sysfs_create_group(power_kobj, &attr_group);
> + error = sysfs_create_group(power_kobj, &attr_group);
> + if (error)
> + return error;
> + return pm_autosleep_init();
> }
>
> core_initcall(pm_init);
> Index: linux/drivers/base/power/wakeup.c
> ===================================================================
> --- linux.orig/drivers/base/power/wakeup.c
> +++ linux/drivers/base/power/wakeup.c
> @@ -492,8 +492,10 @@ static void wakeup_source_deactivate(str
> atomic_add(MAX_IN_PROGRESS, &combined_event_count);
>
> split_counters(&cnt, &inpr);
> - if (!inpr && waitqueue_active(&wakeup_count_wait_queue))
> + if (!inpr && waitqueue_active(&wakeup_count_wait_queue)) {
> wake_up(&wakeup_count_wait_queue);
> + queue_up_suspend_work();
> + }
> }
>
> /**
> @@ -654,29 +656,33 @@ bool pm_wakeup_pending(void)
> /**
> * pm_get_wakeup_count - Read the number of registered wakeup events.
> * @count: Address to store the value at.
> + * @block: Whether or not to block.
> *
> - * Store the number of registered wakeup events at the address in @count. Block
> - * if the current number of wakeup events being processed is nonzero.
> + * Store the number of registered wakeup events at the address in @count. If
> + * @block is set, block until the current number of wakeup events being
> + * processed is zero.
> *
> - * Return 'false' if the wait for the number of wakeup events being processed to
> - * drop down to zero has been interrupted by a signal (and the current number
> - * of wakeup events being processed is still nonzero). Otherwise return 'true'.
> + * Return 'false' if the current number of wakeup events being processed is
> + * nonzero. Otherwise return 'true'.
> */
> -bool pm_get_wakeup_count(unsigned int *count)
> +bool pm_get_wakeup_count(unsigned int *count, bool block)
> {
> unsigned int cnt, inpr;
> - DEFINE_WAIT(wait);
>
> - for (;;) {
> - prepare_to_wait(&wakeup_count_wait_queue, &wait,
> - TASK_INTERRUPTIBLE);
> - split_counters(&cnt, &inpr);
> - if (inpr == 0 || signal_pending(current))
> - break;
> + if (block) {
> + DEFINE_WAIT(wait);
>
> - schedule();
> + for (;;) {
> + prepare_to_wait(&wakeup_count_wait_queue, &wait,
> + TASK_INTERRUPTIBLE);
> + split_counters(&cnt, &inpr);
> + if (inpr == 0 || signal_pending(current))
> + break;
> +
> + schedule();
> + }
> + finish_wait(&wakeup_count_wait_queue, &wait);
> }
> - finish_wait(&wakeup_count_wait_queue, &wait);
>
> split_counters(&cnt, &inpr);
> *count = cnt;
> Index: linux/Documentation/ABI/testing/sysfs-power
> ===================================================================
> --- linux.orig/Documentation/ABI/testing/sysfs-power
> +++ linux/Documentation/ABI/testing/sysfs-power
> @@ -172,3 +172,20 @@ Description:
>
> Reading from this file will display the current value, which is
> set to 1 MB by default.
> +
> +What: /sys/power/autosleep
> +Date: February 2012
> +Contact: Rafael J. Wysocki <[email protected]>
> +Description:
> + The /sys/power/autosleep file can be written one of the strings
> + returned by reads from /sys/power/state. If that happens, a
> + work item attempting to trigger a transition of the system to
> + the sleep state represented by that string is queued up. This
> + attempt will only succeed if there are no active wakeup sources
> + in the system at that time. After evey execution, regardless
> + of whether or not the attempt to put the system to sleep has
> + succeeded, the work item requeues itself until user space
> + writes "off" to /sys/power/autosleep.
> +
> + Reading from this file causes the last string successfully
> + written to it to be displayed.
>


2012-02-22 22:06:36

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/7] PM: Implement autosleep and "wake locks", take2

On Wednesday, February 22, 2012, Srivatsa S. Bhat wrote:
> On 02/22/2012 10:19 AM, John Stultz wrote:
>
> > On Wed, 2012-02-22 at 00:31 +0100, Rafael J. Wysocki wrote:
> >> Hi all,
> >>
> >> After the feedback so far I've decided to follow up with a refreshed patchset.
> >> The first two patches from the previous one went to linux-pm/linux-next
> >> and I included the recent evdev patch from Arve (with some modifications)
> >> to this patchset for completness.
> >
> > Hey Rafael,
> > Thanks again for posting this! I've started playing around with it in a
> > kvm environment, and got the following warning after echoing off >
> > autosleep:
> > ...
> > PM: resume of devices complete after 185.615 msecs
> > PM: Finishing wakeup.
> > Restarting tasks ... done.
> > PM: Syncing filesystems ... done.
> > PM: Preparing system for mem sleep
> > Freezing user space processes ...
> > Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
> > bash D ffff880015714010
>
>
> Ah.. I think I know what is the problem here..
>
> The kernel was freezing userspace processes and meanwhile, you wrote "off"
> to autosleep. So, as a result, this userspace process (bash) just now
> entered kernel mode. Unfortunately, the autosleep_lock is held for too long,
> that is, something like:
>
> acquire autosleep_lock
> modify autosleep_state
> <============== "A"
> pm_suspend or hibernate()
>
> release autosleep_lock
>
> At point marked "A", we should have released the autosleep lock and only then
> entered pm_suspend or hibernate(). Since the current code holds the lock and
> enters suspend/hibernate, the userspace process that wrote "off" to autosleep
> (or even userspace process that writes to /sys/power/state will end up waiting
> on autosleep_lock, thus failing the freezing operation.)
>
> So the solution is to always release the autosleep lock before entering
> suspend/hibernation.

Well, the autosleep lock is intentionally held around suspend/hibernation in
try_to_suspend(), because otherwise it would be possible to trigger automatic
suspend right after user space has disabled it.

I think the solution is to make pm_autosleep_lock() do a _trylock() and
return error code if already locked.

Thanks,
Rafael

2012-02-22 22:06:55

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/7] PM / Sleep: Implement opportunistic sleep

On Wednesday, February 22, 2012, Srivatsa S. Bhat wrote:
> On 02/22/2012 05:05 AM, Rafael J. Wysocki wrote:
>
> > From: Rafael J. Wysocki <[email protected]>
> >
> > Introduce a mechanism by which the kernel can trigger global
> > transitions to a sleep state chosen by user space if there are no
> > active wakeup sources.
> >
> > It consists of a new sysfs attribute, /sys/power/autosleep, that
> > can be written one of the strings returned by reads from
> > /sys/power/state, an ordered workqueue and a work item carrying out
> > the "suspend" operations. If a string representing the system's
> > sleep state is written to /sys/power/autosleep, the work item
> > triggering transitions to that state is queued up and it requeues
> > itself after every execution until user space writes "off" to
> > /sys/power/autosleep.
> >
> > That work item enables the detection of wakeup events using the
> > functions already defined in drivers/base/power/wakeup.c (with one
> > small modification) and calls either pm_suspend(), or hibernate() to
> > put the system into a sleep state. If a wakeup event is reported
> > while the transition is in progress, it will abort the transition and
> > the "system suspend" work item will be queued up again.
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---
> > Documentation/ABI/testing/sysfs-power | 17 +++++
> > drivers/base/power/wakeup.c | 38 ++++++-----
> > include/linux/suspend.h | 13 +++-
> > kernel/power/Kconfig | 8 ++
> > kernel/power/Makefile | 1
> > kernel/power/autosleep.c | 98 ++++++++++++++++++++++++++++++
> > kernel/power/main.c | 108 ++++++++++++++++++++++++++++------
> > kernel/power/power.h | 18 +++++
> > 8 files changed, 266 insertions(+), 35 deletions(-)
> >
> > Index: linux/kernel/power/Makefile
> > ===================================================================
> > --- linux.orig/kernel/power/Makefile
> > +++ linux/kernel/power/Makefile
> > @@ -9,5 +9,6 @@ obj-$(CONFIG_SUSPEND) += suspend.o
> > obj-$(CONFIG_PM_TEST_SUSPEND) += suspend_test.o
> > obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o \
> > block_io.o
> > +obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o
> >
> > obj-$(CONFIG_MAGIC_SYSRQ) += poweroff.o
> > Index: linux/kernel/power/Kconfig
> > ===================================================================
> > --- linux.orig/kernel/power/Kconfig
> > +++ linux/kernel/power/Kconfig
> > @@ -103,6 +103,14 @@ config PM_SLEEP_SMP
> > select HOTPLUG
> > select HOTPLUG_CPU
> >
> > +config PM_AUTOSLEEP
> > + bool "Opportunistic sleep"
> > + depends on PM_SLEEP
> > + default n
> > + ---help---
> > + Allow the kernel to trigger a system transition into a global sleep
> > + state automatically whenever there are no active wakeup sources.
> > +
> > config PM_RUNTIME
> > bool "Run-time PM core functionality"
> > depends on !IA64_HP_SIM
> > Index: linux/kernel/power/power.h
> > ===================================================================
> > --- linux.orig/kernel/power/power.h
> > +++ linux/kernel/power/power.h
> > @@ -264,3 +264,21 @@ static inline void suspend_thaw_processe
> > {
> > }
> > #endif
> > +
> > +#ifdef CONFIG_PM_AUTOSLEEP
> > +
> > +/* kernel/power/autosleep.c */
> > +extern int pm_autosleep_init(void);
> > +extern void pm_autosleep_lock(void);
> > +extern void pm_autosleep_unlock(void);
> > +extern suspend_state_t pm_autosleep_state(void);
> > +extern int pm_autosleep_set_state(suspend_state_t state);
> > +
> > +#else /* !CONFIG_PM_AUTOSLEEP */
> > +
> > +static inline int pm_autosleep_init(void) { return 0; }
> > +static inline void pm_autosleep_lock(void) {}
> > +static inline void pm_autosleep_unlock(void) {}
> > +static inline suspend_state_t pm_autosleep_state(void) { return PM_SUSPEND_ON; }
> > +
> > +#endif /* !CONFIG_PM_AUTOSLEEP */
> > Index: linux/include/linux/suspend.h
> > ===================================================================
> > --- linux.orig/include/linux/suspend.h
> > +++ linux/include/linux/suspend.h
> > @@ -356,7 +356,7 @@ extern int unregister_pm_notifier(struct
> > extern bool events_check_enabled;
> >
> > extern bool pm_wakeup_pending(void);
> > -extern bool pm_get_wakeup_count(unsigned int *count);
> > +extern bool pm_get_wakeup_count(unsigned int *count, bool block);
> > extern bool pm_save_wakeup_count(unsigned int count);
> >
> > static inline void lock_system_sleep(void)
> > @@ -407,6 +407,17 @@ static inline void unlock_system_sleep(v
> >
> > #endif /* !CONFIG_PM_SLEEP */
> >
> > +#ifdef CONFIG_PM_AUTOSLEEP
> > +
> > +/* kernel/power/autosleep.c */
> > +void queue_up_suspend_work(void);
> > +
> > +#else /* !CONFIG_PM_AUTOSLEEP */
> > +
> > +static inline void queue_up_suspend_work(void) {}
> > +
> > +#endif /* !CONFIG_PM_AUTOSLEEP */
> > +
> > #ifdef CONFIG_ARCH_SAVE_PAGE_KEYS
> > /*
> > * The ARCH_SAVE_PAGE_KEYS functions can be used by an architecture
> > Index: linux/kernel/power/autosleep.c
> > ===================================================================
> > --- /dev/null
> > +++ linux/kernel/power/autosleep.c
> > @@ -0,0 +1,98 @@
> > +/*
> > + * kernel/power/autosleep.c
> > + *
> > + * Opportunistic sleep support.
> > + *
> > + * Copyright (C) 2012 Rafael J. Wysocki <[email protected]>
> > + */
> > +
> > +#include <linux/device.h>
> > +#include <linux/mutex.h>
> > +#include <linux/pm_wakeup.h>
> > +
> > +#include "power.h"
> > +
> > +static suspend_state_t autosleep_state;
> > +static struct workqueue_struct *autosleep_wq;
> > +static DEFINE_MUTEX(autosleep_lock);
> > +
> > +static void try_to_suspend(struct work_struct *work)
> > +{
> > + unsigned int initial_count, final_count;
> > +
> > + if (!pm_get_wakeup_count(&initial_count, true))
> > + goto out;
> > +
> > + mutex_lock(&autosleep_lock);
> > +
> > + if (!pm_save_wakeup_count(initial_count)) {
> > + mutex_unlock(&autosleep_lock);
> > + goto out;
> > + }
> > +
> > + if (autosleep_state == PM_SUSPEND_ON) {
> > + mutex_unlock(&autosleep_lock);
> > + return;
> > + }
> > + if (autosleep_state >= PM_SUSPEND_MAX)
> > + hibernate();
> > + else
> > + pm_suspend(autosleep_state);
>
>
> We are calling pm_suspend() or hibernate() directly here.
> Won't this break build when CONFIG_SUSPEND or CONFIG_HIBERNATION is not set?
> CONFIG_PM_AUTOSLEEP depends only on PM_SLEEP which means we could enable
> either one of suspend or hibernation and yet come to this point, breaking
> the option which was not enabled.

Both pm_suspend() and hibernate() have appropriate static inline definitions
for !CONFIG_SUSPEND and !CONFIG_HIBERNATION (in suspend.h), as far as I can say.

Thanks,
Rafael

2012-02-23 05:36:17

by Srivatsa S. Bhat

[permalink] [raw]
Subject: Re: [RFC][PATCH 5/7] PM / Sleep: Implement opportunistic sleep

On 02/23/2012 03:40 AM, Rafael J. Wysocki wrote:

> On Wednesday, February 22, 2012, Srivatsa S. Bhat wrote:
>> On 02/22/2012 05:05 AM, Rafael J. Wysocki wrote:
>>
>>> From: Rafael J. Wysocki <[email protected]>
>>>
>>> Introduce a mechanism by which the kernel can trigger global
>>> transitions to a sleep state chosen by user space if there are no
>>> active wakeup sources.
>>>
>>> It consists of a new sysfs attribute, /sys/power/autosleep, that
>>> can be written one of the strings returned by reads from
>>> /sys/power/state, an ordered workqueue and a work item carrying out
>>> the "suspend" operations. If a string representing the system's
>>> sleep state is written to /sys/power/autosleep, the work item
>>> triggering transitions to that state is queued up and it requeues
>>> itself after every execution until user space writes "off" to
>>> /sys/power/autosleep.
>>>
>>> That work item enables the detection of wakeup events using the
>>> functions already defined in drivers/base/power/wakeup.c (with one
>>> small modification) and calls either pm_suspend(), or hibernate() to
>>> put the system into a sleep state. If a wakeup event is reported
>>> while the transition is in progress, it will abort the transition and
>>> the "system suspend" work item will be queued up again.
>>>
>>> Signed-off-by: Rafael J. Wysocki <[email protected]>
>>> ---
>>> Documentation/ABI/testing/sysfs-power | 17 +++++
>>> drivers/base/power/wakeup.c | 38 ++++++-----
>>> include/linux/suspend.h | 13 +++-
>>> kernel/power/Kconfig | 8 ++
>>> kernel/power/Makefile | 1
>>> kernel/power/autosleep.c | 98 ++++++++++++++++++++++++++++++
>>> kernel/power/main.c | 108 ++++++++++++++++++++++++++++------
>>> kernel/power/power.h | 18 +++++
>>> 8 files changed, 266 insertions(+), 35 deletions(-)
>>>
>>> Index: linux/kernel/power/Makefile
>>> ===================================================================
>>> --- linux.orig/kernel/power/Makefile
>>> +++ linux/kernel/power/Makefile
>>> @@ -9,5 +9,6 @@ obj-$(CONFIG_SUSPEND) += suspend.o
>>> obj-$(CONFIG_PM_TEST_SUSPEND) += suspend_test.o
>>> obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o \
>>> block_io.o
>>> +obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o
>>>
>>> obj-$(CONFIG_MAGIC_SYSRQ) += poweroff.o
>>> Index: linux/kernel/power/Kconfig
>>> ===================================================================
>>> --- linux.orig/kernel/power/Kconfig
>>> +++ linux/kernel/power/Kconfig
>>> @@ -103,6 +103,14 @@ config PM_SLEEP_SMP
>>> select HOTPLUG
>>> select HOTPLUG_CPU
>>>
>>> +config PM_AUTOSLEEP
>>> + bool "Opportunistic sleep"
>>> + depends on PM_SLEEP
>>> + default n
>>> + ---help---
>>> + Allow the kernel to trigger a system transition into a global sleep
>>> + state automatically whenever there are no active wakeup sources.
>>> +
>>> config PM_RUNTIME
>>> bool "Run-time PM core functionality"
>>> depends on !IA64_HP_SIM
>>> Index: linux/kernel/power/power.h
>>> ===================================================================
>>> --- linux.orig/kernel/power/power.h
>>> +++ linux/kernel/power/power.h
>>> @@ -264,3 +264,21 @@ static inline void suspend_thaw_processe
>>> {
>>> }
>>> #endif
>>> +
>>> +#ifdef CONFIG_PM_AUTOSLEEP
>>> +
>>> +/* kernel/power/autosleep.c */
>>> +extern int pm_autosleep_init(void);
>>> +extern void pm_autosleep_lock(void);
>>> +extern void pm_autosleep_unlock(void);
>>> +extern suspend_state_t pm_autosleep_state(void);
>>> +extern int pm_autosleep_set_state(suspend_state_t state);
>>> +
>>> +#else /* !CONFIG_PM_AUTOSLEEP */
>>> +
>>> +static inline int pm_autosleep_init(void) { return 0; }
>>> +static inline void pm_autosleep_lock(void) {}
>>> +static inline void pm_autosleep_unlock(void) {}
>>> +static inline suspend_state_t pm_autosleep_state(void) { return PM_SUSPEND_ON; }
>>> +
>>> +#endif /* !CONFIG_PM_AUTOSLEEP */
>>> Index: linux/include/linux/suspend.h
>>> ===================================================================
>>> --- linux.orig/include/linux/suspend.h
>>> +++ linux/include/linux/suspend.h
>>> @@ -356,7 +356,7 @@ extern int unregister_pm_notifier(struct
>>> extern bool events_check_enabled;
>>>
>>> extern bool pm_wakeup_pending(void);
>>> -extern bool pm_get_wakeup_count(unsigned int *count);
>>> +extern bool pm_get_wakeup_count(unsigned int *count, bool block);
>>> extern bool pm_save_wakeup_count(unsigned int count);
>>>
>>> static inline void lock_system_sleep(void)
>>> @@ -407,6 +407,17 @@ static inline void unlock_system_sleep(v
>>>
>>> #endif /* !CONFIG_PM_SLEEP */
>>>
>>> +#ifdef CONFIG_PM_AUTOSLEEP
>>> +
>>> +/* kernel/power/autosleep.c */
>>> +void queue_up_suspend_work(void);
>>> +
>>> +#else /* !CONFIG_PM_AUTOSLEEP */
>>> +
>>> +static inline void queue_up_suspend_work(void) {}
>>> +
>>> +#endif /* !CONFIG_PM_AUTOSLEEP */
>>> +
>>> #ifdef CONFIG_ARCH_SAVE_PAGE_KEYS
>>> /*
>>> * The ARCH_SAVE_PAGE_KEYS functions can be used by an architecture
>>> Index: linux/kernel/power/autosleep.c
>>> ===================================================================
>>> --- /dev/null
>>> +++ linux/kernel/power/autosleep.c
>>> @@ -0,0 +1,98 @@
>>> +/*
>>> + * kernel/power/autosleep.c
>>> + *
>>> + * Opportunistic sleep support.
>>> + *
>>> + * Copyright (C) 2012 Rafael J. Wysocki <[email protected]>
>>> + */
>>> +
>>> +#include <linux/device.h>
>>> +#include <linux/mutex.h>
>>> +#include <linux/pm_wakeup.h>
>>> +
>>> +#include "power.h"
>>> +
>>> +static suspend_state_t autosleep_state;
>>> +static struct workqueue_struct *autosleep_wq;
>>> +static DEFINE_MUTEX(autosleep_lock);
>>> +
>>> +static void try_to_suspend(struct work_struct *work)
>>> +{
>>> + unsigned int initial_count, final_count;
>>> +
>>> + if (!pm_get_wakeup_count(&initial_count, true))
>>> + goto out;
>>> +
>>> + mutex_lock(&autosleep_lock);
>>> +
>>> + if (!pm_save_wakeup_count(initial_count)) {
>>> + mutex_unlock(&autosleep_lock);
>>> + goto out;
>>> + }
>>> +
>>> + if (autosleep_state == PM_SUSPEND_ON) {
>>> + mutex_unlock(&autosleep_lock);
>>> + return;
>>> + }
>>> + if (autosleep_state >= PM_SUSPEND_MAX)
>>> + hibernate();
>>> + else
>>> + pm_suspend(autosleep_state);
>>
>>
>> We are calling pm_suspend() or hibernate() directly here.
>> Won't this break build when CONFIG_SUSPEND or CONFIG_HIBERNATION is not set?
>> CONFIG_PM_AUTOSLEEP depends only on PM_SLEEP which means we could enable
>> either one of suspend or hibernation and yet come to this point, breaking
>> the option which was not enabled.
>
> Both pm_suspend() and hibernate() have appropriate static inline definitions
> for !CONFIG_SUSPEND and !CONFIG_HIBERNATION (in suspend.h), as far as I can say.
>


Oh, you are right.. I overlooked that, sorry!

Regards,
Srivatsa S. Bhat

2012-02-23 06:25:48

by Srivatsa S. Bhat

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/7] PM: Implement autosleep and "wake locks", take2

On 02/23/2012 03:40 AM, Rafael J. Wysocki wrote:

> On Wednesday, February 22, 2012, Srivatsa S. Bhat wrote:
>> On 02/22/2012 10:19 AM, John Stultz wrote:
>>
>>> On Wed, 2012-02-22 at 00:31 +0100, Rafael J. Wysocki wrote:
>>>> Hi all,
>>>>
>>>> After the feedback so far I've decided to follow up with a refreshed patchset.
>>>> The first two patches from the previous one went to linux-pm/linux-next
>>>> and I included the recent evdev patch from Arve (with some modifications)
>>>> to this patchset for completness.
>>>
>>> Hey Rafael,
>>> Thanks again for posting this! I've started playing around with it in a
>>> kvm environment, and got the following warning after echoing off >
>>> autosleep:
>>> ...
>>> PM: resume of devices complete after 185.615 msecs
>>> PM: Finishing wakeup.
>>> Restarting tasks ... done.
>>> PM: Syncing filesystems ... done.
>>> PM: Preparing system for mem sleep
>>> Freezing user space processes ...
>>> Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
>>> bash D ffff880015714010
>>
>>
>> Ah.. I think I know what is the problem here..
>>
>> The kernel was freezing userspace processes and meanwhile, you wrote "off"
>> to autosleep. So, as a result, this userspace process (bash) just now
>> entered kernel mode. Unfortunately, the autosleep_lock is held for too long,
>> that is, something like:
>>
>> acquire autosleep_lock
>> modify autosleep_state
>> <============== "A"
>> pm_suspend or hibernate()
>>
>> release autosleep_lock
>>
>> At point marked "A", we should have released the autosleep lock and only then
>> entered pm_suspend or hibernate(). Since the current code holds the lock and
>> enters suspend/hibernate, the userspace process that wrote "off" to autosleep
>> (or even userspace process that writes to /sys/power/state will end up waiting
>> on autosleep_lock, thus failing the freezing operation.)
>>
>> So the solution is to always release the autosleep lock before entering
>> suspend/hibernation.
>
> Well, the autosleep lock is intentionally held around suspend/hibernation in
> try_to_suspend(), because otherwise it would be possible to trigger automatic
> suspend right after user space has disabled it.
>


Hmm.. I was just wondering if we could avoid holding yet another lock in the
suspend/hibernate path, if possible..


> I think the solution is to make pm_autosleep_lock() do a _trylock() and
> return error code if already locked.
>

... and also do a trylock() in pm_autosleep_set_state() right?.... that is
where John hit the problem..

By the way, I am just curious.. how difficult will this make it for userspace
to disable autosleep? I mean, would a trylock mean that the user has to keep
fighting until he finally gets a chance to disable autosleep?

Regards,
Srivatsa S. Bhat

2012-02-23 21:22:47

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/7] PM: Implement autosleep and "wake locks", take2

On Thursday, February 23, 2012, Srivatsa S. Bhat wrote:
> On 02/23/2012 03:40 AM, Rafael J. Wysocki wrote:
>
> > On Wednesday, February 22, 2012, Srivatsa S. Bhat wrote:
> >> On 02/22/2012 10:19 AM, John Stultz wrote:
> >>
> >>> On Wed, 2012-02-22 at 00:31 +0100, Rafael J. Wysocki wrote:
> >>>> Hi all,
> >>>>
> >>>> After the feedback so far I've decided to follow up with a refreshed patchset.
> >>>> The first two patches from the previous one went to linux-pm/linux-next
> >>>> and I included the recent evdev patch from Arve (with some modifications)
> >>>> to this patchset for completness.
> >>>
> >>> Hey Rafael,
> >>> Thanks again for posting this! I've started playing around with it in a
> >>> kvm environment, and got the following warning after echoing off >
> >>> autosleep:
> >>> ...
> >>> PM: resume of devices complete after 185.615 msecs
> >>> PM: Finishing wakeup.
> >>> Restarting tasks ... done.
> >>> PM: Syncing filesystems ... done.
> >>> PM: Preparing system for mem sleep
> >>> Freezing user space processes ...
> >>> Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
> >>> bash D ffff880015714010
> >>
> >>
> >> Ah.. I think I know what is the problem here..
> >>
> >> The kernel was freezing userspace processes and meanwhile, you wrote "off"
> >> to autosleep. So, as a result, this userspace process (bash) just now
> >> entered kernel mode. Unfortunately, the autosleep_lock is held for too long,
> >> that is, something like:
> >>
> >> acquire autosleep_lock
> >> modify autosleep_state
> >> <============== "A"
> >> pm_suspend or hibernate()
> >>
> >> release autosleep_lock
> >>
> >> At point marked "A", we should have released the autosleep lock and only then
> >> entered pm_suspend or hibernate(). Since the current code holds the lock and
> >> enters suspend/hibernate, the userspace process that wrote "off" to autosleep
> >> (or even userspace process that writes to /sys/power/state will end up waiting
> >> on autosleep_lock, thus failing the freezing operation.)
> >>
> >> So the solution is to always release the autosleep lock before entering
> >> suspend/hibernation.
> >
> > Well, the autosleep lock is intentionally held around suspend/hibernation in
> > try_to_suspend(), because otherwise it would be possible to trigger automatic
> > suspend right after user space has disabled it.
> >
>
>
> Hmm.. I was just wondering if we could avoid holding yet another lock in the
> suspend/hibernate path, if possible..
>
>
> > I think the solution is to make pm_autosleep_lock() do a _trylock() and
> > return error code if already locked.
> >
>
> ... and also do a trylock() in pm_autosleep_set_state() right?.... that is
> where John hit the problem..
>
> By the way, I am just curious.. how difficult will this make it for userspace
> to disable autosleep? I mean, would a trylock mean that the user has to keep
> fighting until he finally gets a chance to disable autosleep?

That's a good point, so I think it may be a good idea to do
mutex_lock_interruptible() in pm_autosleep_set_state() instead.

Thanks,
Rafael

2012-02-23 21:29:06

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/7] PM: Implement autosleep and "wake locks", take2

On Thursday, February 23, 2012, Rafael J. Wysocki wrote:
> On Thursday, February 23, 2012, Srivatsa S. Bhat wrote:
> > On 02/23/2012 03:40 AM, Rafael J. Wysocki wrote:
> >
> > > On Wednesday, February 22, 2012, Srivatsa S. Bhat wrote:
> > >> On 02/22/2012 10:19 AM, John Stultz wrote:
> > >>
> > >>> On Wed, 2012-02-22 at 00:31 +0100, Rafael J. Wysocki wrote:
> > >>>> Hi all,
> > >>>>
> > >>>> After the feedback so far I've decided to follow up with a refreshed patchset.
> > >>>> The first two patches from the previous one went to linux-pm/linux-next
> > >>>> and I included the recent evdev patch from Arve (with some modifications)
> > >>>> to this patchset for completness.
> > >>>
> > >>> Hey Rafael,
> > >>> Thanks again for posting this! I've started playing around with it in a
> > >>> kvm environment, and got the following warning after echoing off >
> > >>> autosleep:
> > >>> ...
> > >>> PM: resume of devices complete after 185.615 msecs
> > >>> PM: Finishing wakeup.
> > >>> Restarting tasks ... done.
> > >>> PM: Syncing filesystems ... done.
> > >>> PM: Preparing system for mem sleep
> > >>> Freezing user space processes ...
> > >>> Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
> > >>> bash D ffff880015714010
> > >>
> > >>
> > >> Ah.. I think I know what is the problem here..
> > >>
> > >> The kernel was freezing userspace processes and meanwhile, you wrote "off"
> > >> to autosleep. So, as a result, this userspace process (bash) just now
> > >> entered kernel mode. Unfortunately, the autosleep_lock is held for too long,
> > >> that is, something like:
> > >>
> > >> acquire autosleep_lock
> > >> modify autosleep_state
> > >> <============== "A"
> > >> pm_suspend or hibernate()
> > >>
> > >> release autosleep_lock
> > >>
> > >> At point marked "A", we should have released the autosleep lock and only then
> > >> entered pm_suspend or hibernate(). Since the current code holds the lock and
> > >> enters suspend/hibernate, the userspace process that wrote "off" to autosleep
> > >> (or even userspace process that writes to /sys/power/state will end up waiting
> > >> on autosleep_lock, thus failing the freezing operation.)
> > >>
> > >> So the solution is to always release the autosleep lock before entering
> > >> suspend/hibernation.
> > >
> > > Well, the autosleep lock is intentionally held around suspend/hibernation in
> > > try_to_suspend(), because otherwise it would be possible to trigger automatic
> > > suspend right after user space has disabled it.
> > >
> >
> >
> > Hmm.. I was just wondering if we could avoid holding yet another lock in the
> > suspend/hibernate path, if possible..
> >
> >
> > > I think the solution is to make pm_autosleep_lock() do a _trylock() and
> > > return error code if already locked.
> > >
> >
> > ... and also do a trylock() in pm_autosleep_set_state() right?.... that is
> > where John hit the problem..
> >
> > By the way, I am just curious.. how difficult will this make it for userspace
> > to disable autosleep? I mean, would a trylock mean that the user has to keep
> > fighting until he finally gets a chance to disable autosleep?
>
> That's a good point, so I think it may be a good idea to do
> mutex_lock_interruptible() in pm_autosleep_set_state() instead.

Now that I think of it, perhaps it's a good idea to just make
pm_autosleep_lock() do mutex_lock_interruptible() _and_ make
pm_autosleep_set_state() use pm_autosleep_lock().

What do you think?

Rafael

2012-02-24 04:44:38

by Srivatsa S. Bhat

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/7] PM: Implement autosleep and "wake locks", take2

On 02/24/2012 03:02 AM, Rafael J. Wysocki wrote:

> On Thursday, February 23, 2012, Rafael J. Wysocki wrote:
>> On Thursday, February 23, 2012, Srivatsa S. Bhat wrote:
>>> On 02/23/2012 03:40 AM, Rafael J. Wysocki wrote:
>>>
>>>> On Wednesday, February 22, 2012, Srivatsa S. Bhat wrote:
>>>>> On 02/22/2012 10:19 AM, John Stultz wrote:
>>>>>
>>>>>> On Wed, 2012-02-22 at 00:31 +0100, Rafael J. Wysocki wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> After the feedback so far I've decided to follow up with a refreshed patchset.
>>>>>>> The first two patches from the previous one went to linux-pm/linux-next
>>>>>>> and I included the recent evdev patch from Arve (with some modifications)
>>>>>>> to this patchset for completness.
>>>>>>
>>>>>> Hey Rafael,
>>>>>> Thanks again for posting this! I've started playing around with it in a
>>>>>> kvm environment, and got the following warning after echoing off >
>>>>>> autosleep:
>>>>>> ...
>>>>>> PM: resume of devices complete after 185.615 msecs
>>>>>> PM: Finishing wakeup.
>>>>>> Restarting tasks ... done.
>>>>>> PM: Syncing filesystems ... done.
>>>>>> PM: Preparing system for mem sleep
>>>>>> Freezing user space processes ...
>>>>>> Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
>>>>>> bash D ffff880015714010
>>>>>
>>>>>
>>>>> Ah.. I think I know what is the problem here..
>>>>>
>>>>> The kernel was freezing userspace processes and meanwhile, you wrote "off"
>>>>> to autosleep. So, as a result, this userspace process (bash) just now
>>>>> entered kernel mode. Unfortunately, the autosleep_lock is held for too long,
>>>>> that is, something like:
>>>>>
>>>>> acquire autosleep_lock
>>>>> modify autosleep_state
>>>>> <============== "A"
>>>>> pm_suspend or hibernate()
>>>>>
>>>>> release autosleep_lock
>>>>>
>>>>> At point marked "A", we should have released the autosleep lock and only then
>>>>> entered pm_suspend or hibernate(). Since the current code holds the lock and
>>>>> enters suspend/hibernate, the userspace process that wrote "off" to autosleep
>>>>> (or even userspace process that writes to /sys/power/state will end up waiting
>>>>> on autosleep_lock, thus failing the freezing operation.)
>>>>>
>>>>> So the solution is to always release the autosleep lock before entering
>>>>> suspend/hibernation.
>>>>
>>>> Well, the autosleep lock is intentionally held around suspend/hibernation in
>>>> try_to_suspend(), because otherwise it would be possible to trigger automatic
>>>> suspend right after user space has disabled it.
>>>>
>>>
>>>
>>> Hmm.. I was just wondering if we could avoid holding yet another lock in the
>>> suspend/hibernate path, if possible..
>>>
>>>
>>>> I think the solution is to make pm_autosleep_lock() do a _trylock() and
>>>> return error code if already locked.
>>>>
>>>
>>> ... and also do a trylock() in pm_autosleep_set_state() right?.... that is
>>> where John hit the problem..
>>>
>>> By the way, I am just curious.. how difficult will this make it for userspace
>>> to disable autosleep? I mean, would a trylock mean that the user has to keep
>>> fighting until he finally gets a chance to disable autosleep?
>>
>> That's a good point, so I think it may be a good idea to do
>> mutex_lock_interruptible() in pm_autosleep_set_state() instead.
>
> Now that I think of it, perhaps it's a good idea to just make
> pm_autosleep_lock() do mutex_lock_interruptible() _and_ make
> pm_autosleep_set_state() use pm_autosleep_lock().
>
> What do you think?
>


Well, I don't think mutex_lock_interruptible() would help us much..
Consider what would happen, if we use it:

* pm-suspend got initiated as part of autosleep. Acquired autosleep lock.
* Userspace is about to get frozen.
* Now, the user tries to write "off" to autosleep. And hence, he is waiting
for autosleep lock, interruptibly.
* The freezer sent a fake signal to all userspace processes and hence
this process also got interrupted.. it is no longer waiting on autosleep
lock - it got the signal and returned, and got frozen.
(And when the userspace gets thawed later, this process won't have the
autosleep lock - which is a different (but yet another) problem).

So ultimately the only thing we achieved is to ensure that freezing of
userspace goes smoothly. But the user process could not succeed in
disabling autosleep. Of course we can work around that by having the
mutex_lock_interruptible() in a loop and so on, but that gets very
ugly pretty soon.

So, I would suggest the following solution:

We want to achieve 2 things here:
a. A user process trying to write to /sys/power/state or
/sys/power/autosleep should not cause freezing failures.
b. When a user process writes "off" to autosleep, the suspend/hibernate
attempt that is on-going, if any, must be immediately aborted, to give
the user the feeling that his preference has been noticed and respected.

And to achieve this, we note that a user process can write "off" to autosleep
only until the userspace gets frozen. No chance after that.

So, let's do this:
1. Drop the autosleep lock before entering pm-suspend/hibernate.
2. This means, a user process can get hold of this lock and successfully
disable autosleep a moment after we initiated suspend, but before userspace
got frozen fully.
3. So, to respect the user's wish, we add a check immediately after the
freezing of userspace is complete - we check if the user disabled autosleep
and bail out, if he did. Otherwise, we continue and suspend the machine.

IOW, this is like hitting 2 birds with one stone ;-)
We don't hold autosleep lock throughout suspend/hibernate, but still react
instantly when the user disables autosleep. And of course, freezing of tasks
won't fail, ever! :-)


Regards,
Srivatsa S. Bhat

2012-02-24 05:17:04

by Matt Helsley

[permalink] [raw]
Subject: Re: [RFC][PATCH 4/7] Input / PM: Add ioctl to block suspend while event queue is not empty

On Wed, Feb 22, 2012 at 12:34:58AM +0100, Rafael J. Wysocki wrote:
> From: Arve Hjønnevåg <[email protected]>
>
> Add a new ioctl, EVIOCSWAKEUPSRC, to attach a wakeup source object to
> an evdev client event queue, such that it will be active whenever the
> queue is not empty. Then, all events in the queue will be regarded
> as wakeup events in progress and pm_get_wakeup_count() will block (or
> return false if woken up by a signal) until they are removed from the
> queue. In consequence, if the checking of wakeup events is enabled
> (e.g. throught the /sys/power/wakeup_count interface), the system
> won't be able to go into a sleep state until the queue is empty.
>
> This allows user space processes to handle situations in which they
> want to do a select() on an evdev descriptor, so they go to sleep
> until there are some events to read from the device's queue, and then
> they don't want the system to go into a sleep state until all the
> events are read (presumably for further processing). Of course, if
> they don't want the system to go into a sleep state _after_ all the
> events have been read from the queue, they have to use a separate
> mechanism that will prevent the system from doing that and it has
> to be activated before reading the first event (that also may be the
> last one).

I haven't seen this idea mentioned before but I must admit I haven't
been following this thread too closely so apologies (and don't bother
rehashing) if it has:

Could you just add this to epoll so that any fd userspace chooses would be
capable of doing this without introducing potentially ecclectic ioctl
interfaces?

struct epoll_event ev;

epfd = epoll_create1(EPOLL_STAY_AWAKE_SET);
ev.data.ptr = foo;
epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);

Which could be useful because you can put one epollfd in another's epoll
set. Or maybe as an EPOLLKEEPAWAKE flag in the event struct sort of like
EPOLLET:

epfd = epoll_create1(0);
ev.events = EPOLLIN|EPOLLKEEPAWAKE;
epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);

>
> [rjw: Removed unnecessary checks, changed the names of the new ioctls
> and the names of the functions that add/remove wakeup source objects
> to/from evdev clients, modified the changelog.
> Signed-off-by: Arve Hjønnevåg <[email protected]>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> drivers/input/evdev.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/input.h | 3 ++
> 2 files changed, 58 insertions(+)
>
> Index: linux/drivers/input/evdev.c
> ===================================================================
> --- linux.orig/drivers/input/evdev.c
> +++ linux/drivers/input/evdev.c
> @@ -43,6 +43,7 @@ struct evdev_client {
> unsigned int tail;
> unsigned int packet_head; /* [future] position of the first element of next packet */
> spinlock_t buffer_lock; /* protects access to buffer, head and tail */
> + struct wakeup_source *wakeup_source;
> struct fasync_struct *fasync;
> struct evdev *evdev;
> struct list_head node;
> @@ -75,10 +76,12 @@ static void evdev_pass_event(struct evde
> client->buffer[client->tail].value = 0;
>
> client->packet_head = client->tail;
> + __pm_relax(client->wakeup_source);
> }
>
> if (event->type == EV_SYN && event->code == SYN_REPORT) {
> client->packet_head = client->head;
> + __pm_stay_awake(client->wakeup_source);
> kill_fasync(&client->fasync, SIGIO, POLL_IN);
> }
>
> @@ -255,6 +258,8 @@ static int evdev_release(struct inode *i
> mutex_unlock(&evdev->mutex);
>
> evdev_detach_client(evdev, client);
> + wakeup_source_unregister(client->wakeup_source);
> +
> kfree(client);
>
> evdev_close_device(evdev);
> @@ -373,6 +378,8 @@ static int evdev_fetch_next_event(struct
> if (have_event) {
> *event = client->buffer[client->tail++];
> client->tail &= client->bufsize - 1;
> + if (client->packet_head == client->tail)
> + __pm_relax(client->wakeup_source);
> }
>
> spin_unlock_irq(&client->buffer_lock);
> @@ -623,6 +630,45 @@ static int evdev_handle_set_keycode_v2(s
> return input_set_keycode(dev, &ke);
> }
>
> +static int evdev_attach_wakeup_source(struct evdev *evdev,
> + struct evdev_client *client)
> +{
> + struct wakeup_source *ws;
> + char name[28];
> +
> + if (client->wakeup_source)
> + return 0;
> +
> + snprintf(name, sizeof(name), "%s-%d",
> + dev_name(&evdev->dev), task_tgid_vnr(current));

This does not look like it will work well with tasks in different pid
namespaces. What should happen, I think, is the wakeup_source should hold a
reference to either the struct pid of current or current itself. Then
when someone reads the file you should get the pid vnr in the reader's
pid namespace. That way instead of a bogus pid vnr 0 would show up if
"current" here is not in the reader's pid namepsace.

Cheers,
-Matt Helsley

2012-02-24 23:17:38

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/7] PM: Implement autosleep and "wake locks", take2

On Friday, February 24, 2012, Srivatsa S. Bhat wrote:
> On 02/24/2012 03:02 AM, Rafael J. Wysocki wrote:
>
> > On Thursday, February 23, 2012, Rafael J. Wysocki wrote:
> >> On Thursday, February 23, 2012, Srivatsa S. Bhat wrote:
> >>> On 02/23/2012 03:40 AM, Rafael J. Wysocki wrote:
[...]
> >>>
> >>> By the way, I am just curious.. how difficult will this make it for userspace
> >>> to disable autosleep? I mean, would a trylock mean that the user has to keep
> >>> fighting until he finally gets a chance to disable autosleep?
> >>
> >> That's a good point, so I think it may be a good idea to do
> >> mutex_lock_interruptible() in pm_autosleep_set_state() instead.
> >
> > Now that I think of it, perhaps it's a good idea to just make
> > pm_autosleep_lock() do mutex_lock_interruptible() _and_ make
> > pm_autosleep_set_state() use pm_autosleep_lock().
> >
> > What do you think?
> >
>
>
> Well, I don't think mutex_lock_interruptible() would help us much..
> Consider what would happen, if we use it:
>
> * pm-suspend got initiated as part of autosleep. Acquired autosleep lock.
> * Userspace is about to get frozen.
> * Now, the user tries to write "off" to autosleep. And hence, he is waiting
> for autosleep lock, interruptibly.
> * The freezer sent a fake signal to all userspace processes and hence
> this process also got interrupted.. it is no longer waiting on autosleep
> lock - it got the signal and returned, and got frozen.
> (And when the userspace gets thawed later, this process won't have the
> autosleep lock - which is a different (but yet another) problem).
>
> So ultimately the only thing we achieved is to ensure that freezing of
> userspace goes smoothly. But the user process could not succeed in
> disabling autosleep. Of course we can work around that by having the
> mutex_lock_interruptible() in a loop and so on, but that gets very
> ugly pretty soon.
>
> So, I would suggest the following solution:
>
> We want to achieve 2 things here:
> a. A user process trying to write to /sys/power/state or
> /sys/power/autosleep should not cause freezing failures.
> b. When a user process writes "off" to autosleep, the suspend/hibernate
> attempt that is on-going, if any, must be immediately aborted, to give
> the user the feeling that his preference has been noticed and respected.
>
> And to achieve this, we note that a user process can write "off" to autosleep
> only until the userspace gets frozen. No chance after that.
>
> So, let's do this:
> 1. Drop the autosleep lock before entering pm-suspend/hibernate.
> 2. This means, a user process can get hold of this lock and successfully
> disable autosleep a moment after we initiated suspend, but before userspace
> got frozen fully.
> 3. So, to respect the user's wish, we add a check immediately after the
> freezing of userspace is complete - we check if the user disabled autosleep
> and bail out, if he did. Otherwise, we continue and suspend the machine.
>
> IOW, this is like hitting 2 birds with one stone ;-)
> We don't hold autosleep lock throughout suspend/hibernate, but still react
> instantly when the user disables autosleep. And of course, freezing of tasks
> won't fail, ever! :-)

Well, you essentially are postulating to restore the "interface" wakeup source
that was present in the previous version of this patch and that I dropped in
order to simplify the code.

I guess I can do that ...

Thanks,
Rafael

2012-02-25 04:25:33

by Arve Hjønnevåg

[permalink] [raw]
Subject: Re: [RFC][PATCH 4/7] Input / PM: Add ioctl to block suspend while event queue is not empty

On Thu, Feb 23, 2012 at 9:16 PM, Matt Helsley <[email protected]> wrote:
> On Wed, Feb 22, 2012 at 12:34:58AM +0100, Rafael J. Wysocki wrote:
>> From: Arve Hj?nnev?g <[email protected]>
>>
>> Add a new ioctl, EVIOCSWAKEUPSRC, to attach a wakeup source object to
>> an evdev client event queue, such that it will be active whenever the
>> queue is not empty. ?Then, all events in the queue will be regarded
>> as wakeup events in progress and pm_get_wakeup_count() will block (or
>> return false if woken up by a signal) until they are removed from the
>> queue. ?In consequence, if the checking of wakeup events is enabled
>> (e.g. throught the /sys/power/wakeup_count interface), the system
>> won't be able to go into a sleep state until the queue is empty.
>>
>> This allows user space processes to handle situations in which they
>> want to do a select() on an evdev descriptor, so they go to sleep
>> until there are some events to read from the device's queue, and then
>> they don't want the system to go into a sleep state until all the
>> events are read (presumably for further processing). ?Of course, if
>> they don't want the system to go into a sleep state _after_ all the
>> events have been read from the queue, they have to use a separate
>> mechanism that will prevent the system from doing that and it has
>> to be activated before reading the first event (that also may be the
>> last one).
>
> I haven't seen this idea mentioned before but I must admit I haven't
> been following this thread too closely so apologies (and don't bother
> rehashing) if it has:
>
> Could you just add this to epoll so that any fd userspace chooses would be
> capable of doing this without introducing potentially ecclectic ioctl
> interfaces?
>

This is an interesting idea, but I'm not sure how well it would work.

I looked at the epoll code and it looks like it is possible to
activate the wakeup-source from the wait queue function it uses. The
epoll callback will happen without holding evdev client buffer_lock,
so the wakeup-source and buffer state will not always be in sync (this
may be OK, but require more thought). This callback is also called if
no data was added to the queue we are polling on because another
client has grabbed the input device (is this a bug or intended?).

There is no call into the epoll code when input queue is emptied, so
we can't deactivate the wakeup-source until epoll_wait is called
again. This also should be workable, but result in different stats.

It does not look like the normal poll and select interfaces can be
extended the same way (since they remove themselves from the
wait-queue before returning to user-space), so user-space has to be
changed to use epoll even if select or poll would be a better fit.

I don't know how many other drivers this would work for. The input
driver will wake up user-space from the same thread or interrupt
handler that queued the event, but other drivers may defer this to
another thread which makes an epoll wakeup-source insufficient.

...
>> + ? ? snprintf(name, sizeof(name), "%s-%d",
>> + ? ? ? ? ? ? ?dev_name(&evdev->dev), task_tgid_vnr(current));
>
> This does not look like it will work well with tasks in different pid
> namespaces. What should happen, I think, is the wakeup_source should hold a
> reference to either the struct pid of current or current itself. Then
> when someone reads the file you should get the pid vnr in the reader's
> pid namespace. That way instead of a bogus pid vnr 0 would show up if
> "current" here is not in the reader's pid namepsace.
>

The pid here is only used for debugging purposes, and used less than
the dev_name. I don't think tracking pid namespaces is worth the
trouble here, so if this is a real problem we can just drop the pid
from the name for now.

--
Arve Hj?nnev?g

2012-02-25 04:43:32

by Arve Hjønnevåg

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/7] PM: Implement autosleep and "wake locks", take2

On Fri, Feb 24, 2012 at 3:21 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Friday, February 24, 2012, Srivatsa S. Bhat wrote:
>> On 02/24/2012 03:02 AM, Rafael J. Wysocki wrote:
>>
>> > On Thursday, February 23, 2012, Rafael J. Wysocki wrote:
>> >> On Thursday, February 23, 2012, Srivatsa S. Bhat wrote:
>> >>> On 02/23/2012 03:40 AM, Rafael J. Wysocki wrote:
> [...]
>> >>>
>> >>> By the way, I am just curious.. how difficult will this make it for userspace
>> >>> to disable autosleep? I mean, would a trylock mean that the user has to keep
>> >>> fighting until he finally gets a chance to disable autosleep?
>> >>
>> >> That's a good point, so I think it may be a good idea to do
>> >> mutex_lock_interruptible() in pm_autosleep_set_state() instead.
>> >
>> > Now that I think of it, perhaps it's a good idea to just make
>> > pm_autosleep_lock() do mutex_lock_interruptible() _and_ make
>> > pm_autosleep_set_state() use pm_autosleep_lock().
>> >
>> > What do you think?
>> >
>>
>>
>> Well, I don't think mutex_lock_interruptible() would help us much..
>> Consider what would happen, if we use it:
>>
>> * pm-suspend got initiated as part of autosleep. Acquired autosleep lock.
>> * Userspace is about to get frozen.
>> * Now, the user tries to write "off" to autosleep. And hence, he is waiting
>> ? for autosleep lock, interruptibly.
>> * The freezer sent a fake signal to all userspace processes and hence
>> ? this process also got interrupted.. it is no longer waiting on autosleep
>> ? lock - it got the signal and returned, and got frozen.
>> ? (And when the userspace gets thawed later, this process won't have the
>> ? ?autosleep lock - which is a different (but yet another) problem).
>>
>> So ultimately the only thing we achieved is to ensure that freezing of
>> userspace goes smoothly. But the user process could not succeed in
>> disabling autosleep. Of course we can work around that by having the
>> mutex_lock_interruptible() in a loop and so on, but that gets very
>> ugly pretty soon.
>>
>> So, I would suggest the following solution:
>>
>> We want to achieve 2 things here:
>> ?a. A user process trying to write to /sys/power/state or
>> ? ? /sys/power/autosleep should not cause freezing failures.
>> ?b. When a user process writes "off" to autosleep, the suspend/hibernate
>> ? ? attempt that is on-going, if any, must be immediately aborted, to give
>> ? ? the user the feeling that his preference has been noticed and respected.
>>
>> And to achieve this, we note that a user process can write "off" to autosleep
>> only until the userspace gets frozen. No chance after that.
>>
>> So, let's do this:
>> 1. Drop the autosleep lock before entering pm-suspend/hibernate.
>> 2. This means, a user process can get hold of this lock and successfully
>> ? ?disable autosleep a moment after we initiated suspend, but before userspace
>> ? ?got frozen fully.
>> 3. So, to respect the user's wish, we add a check immediately after the
>> ? ?freezing of userspace is complete - we check if the user disabled autosleep
>> ? ?and bail out, if he did. Otherwise, we continue and suspend the machine.
>>
>> IOW, this is like hitting 2 birds with one stone ;-)
>> We don't hold autosleep lock throughout suspend/hibernate, but still react
>> instantly when the user disables autosleep. And of course, freezing of tasks
>> won't fail, ever! :-)
>
> Well, you essentially are postulating to restore the "interface" wakeup source
> that was present in the previous version of this patch and that I dropped in
> order to simplify the code.
>
> I guess I can do that ...
>

If this wakeup source is reported as active whenever user-space has
not requested suspend that would be useful in the stats. It does not
look like your original patch did this however, but you could have a
main wakeup-source that you release when any form of suspend is
requested and activate when turning off auto suspend or returning from
a one-shot suspend operation.

--
Arve Hj?nnev?g

2012-02-25 19:20:55

by Srivatsa S. Bhat

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/7] PM: Implement autosleep and "wake locks", take2

On 02/25/2012 04:51 AM, Rafael J. Wysocki wrote:

> On Friday, February 24, 2012, Srivatsa S. Bhat wrote:
>> On 02/24/2012 03:02 AM, Rafael J. Wysocki wrote:
>>
>>> On Thursday, February 23, 2012, Rafael J. Wysocki wrote:
>>>> On Thursday, February 23, 2012, Srivatsa S. Bhat wrote:
>>>>> On 02/23/2012 03:40 AM, Rafael J. Wysocki wrote:
> [...]
>>>>>
>>>>> By the way, I am just curious.. how difficult will this make it for userspace
>>>>> to disable autosleep? I mean, would a trylock mean that the user has to keep
>>>>> fighting until he finally gets a chance to disable autosleep?
>>>>
>>>> That's a good point, so I think it may be a good idea to do
>>>> mutex_lock_interruptible() in pm_autosleep_set_state() instead.
>>>
>>> Now that I think of it, perhaps it's a good idea to just make
>>> pm_autosleep_lock() do mutex_lock_interruptible() _and_ make
>>> pm_autosleep_set_state() use pm_autosleep_lock().
>>>
>>> What do you think?
>>>
>>
>>
>> Well, I don't think mutex_lock_interruptible() would help us much..
>> Consider what would happen, if we use it:
>>
>> * pm-suspend got initiated as part of autosleep. Acquired autosleep lock.
>> * Userspace is about to get frozen.
>> * Now, the user tries to write "off" to autosleep. And hence, he is waiting
>> for autosleep lock, interruptibly.
>> * The freezer sent a fake signal to all userspace processes and hence
>> this process also got interrupted.. it is no longer waiting on autosleep
>> lock - it got the signal and returned, and got frozen.
>> (And when the userspace gets thawed later, this process won't have the
>> autosleep lock - which is a different (but yet another) problem).
>>
>> So ultimately the only thing we achieved is to ensure that freezing of
>> userspace goes smoothly. But the user process could not succeed in
>> disabling autosleep. Of course we can work around that by having the
>> mutex_lock_interruptible() in a loop and so on, but that gets very
>> ugly pretty soon.
>>
>> So, I would suggest the following solution:
>>
>> We want to achieve 2 things here:
>> a. A user process trying to write to /sys/power/state or
>> /sys/power/autosleep should not cause freezing failures.
>> b. When a user process writes "off" to autosleep, the suspend/hibernate
>> attempt that is on-going, if any, must be immediately aborted, to give
>> the user the feeling that his preference has been noticed and respected.
>>
>> And to achieve this, we note that a user process can write "off" to autosleep
>> only until the userspace gets frozen. No chance after that.
>>
>> So, let's do this:
>> 1. Drop the autosleep lock before entering pm-suspend/hibernate.
>> 2. This means, a user process can get hold of this lock and successfully
>> disable autosleep a moment after we initiated suspend, but before userspace
>> got frozen fully.
>> 3. So, to respect the user's wish, we add a check immediately after the
>> freezing of userspace is complete - we check if the user disabled autosleep
>> and bail out, if he did. Otherwise, we continue and suspend the machine.
>>
>> IOW, this is like hitting 2 birds with one stone ;-)
>> We don't hold autosleep lock throughout suspend/hibernate, but still react
>> instantly when the user disables autosleep. And of course, freezing of tasks
>> won't fail, ever! :-)
>
> Well, you essentially are postulating to restore the "interface" wakeup source
> that was present in the previous version of this patch and that I dropped in
> order to simplify the code.
>


Oh is it? I guess I haven't followed this thread very closely...

> I guess I can do that ...
>


Oh by the way, this scheme doesn't solve all problems. It might be effective
in reacting "instantly" to a request by the user to *switch off* autosleep.
But say, when the user wants to switch to suspend instead of hibernate as the
autosleep preference, for example, I don't think it would be as quick in
responding... (I mean, it might do the old operation one more time before
switching to the new one..)

But I guess at this point it might be wiser to say "sigh.. we can do only so
much..." instead of complicating the code too much in an attempt to meet
everybody's expectations :-)

Regards,
Srivatsa S. Bhat

2012-02-25 20:39:55

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/7] PM: Implement autosleep and "wake locks", take2

On Saturday, February 25, 2012, Arve Hj?nnev?g wrote:
> On Fri, Feb 24, 2012 at 3:21 PM, Rafael J. Wysocki <[email protected]> wrote:
> > On Friday, February 24, 2012, Srivatsa S. Bhat wrote:
> >> On 02/24/2012 03:02 AM, Rafael J. Wysocki wrote:
> >>
> >> > On Thursday, February 23, 2012, Rafael J. Wysocki wrote:
> >> >> On Thursday, February 23, 2012, Srivatsa S. Bhat wrote:
> >> >>> On 02/23/2012 03:40 AM, Rafael J. Wysocki wrote:
> > [...]
> >> >>>
> >> >>> By the way, I am just curious.. how difficult will this make it for userspace
> >> >>> to disable autosleep? I mean, would a trylock mean that the user has to keep
> >> >>> fighting until he finally gets a chance to disable autosleep?
> >> >>
> >> >> That's a good point, so I think it may be a good idea to do
> >> >> mutex_lock_interruptible() in pm_autosleep_set_state() instead.
> >> >
> >> > Now that I think of it, perhaps it's a good idea to just make
> >> > pm_autosleep_lock() do mutex_lock_interruptible() _and_ make
> >> > pm_autosleep_set_state() use pm_autosleep_lock().
> >> >
> >> > What do you think?
> >> >
> >>
> >>
> >> Well, I don't think mutex_lock_interruptible() would help us much..
> >> Consider what would happen, if we use it:
> >>
> >> * pm-suspend got initiated as part of autosleep. Acquired autosleep lock.
> >> * Userspace is about to get frozen.
> >> * Now, the user tries to write "off" to autosleep. And hence, he is waiting
> >> for autosleep lock, interruptibly.
> >> * The freezer sent a fake signal to all userspace processes and hence
> >> this process also got interrupted.. it is no longer waiting on autosleep
> >> lock - it got the signal and returned, and got frozen.
> >> (And when the userspace gets thawed later, this process won't have the
> >> autosleep lock - which is a different (but yet another) problem).
> >>
> >> So ultimately the only thing we achieved is to ensure that freezing of
> >> userspace goes smoothly. But the user process could not succeed in
> >> disabling autosleep. Of course we can work around that by having the
> >> mutex_lock_interruptible() in a loop and so on, but that gets very
> >> ugly pretty soon.
> >>
> >> So, I would suggest the following solution:
> >>
> >> We want to achieve 2 things here:
> >> a. A user process trying to write to /sys/power/state or
> >> /sys/power/autosleep should not cause freezing failures.
> >> b. When a user process writes "off" to autosleep, the suspend/hibernate
> >> attempt that is on-going, if any, must be immediately aborted, to give
> >> the user the feeling that his preference has been noticed and respected.
> >>
> >> And to achieve this, we note that a user process can write "off" to autosleep
> >> only until the userspace gets frozen. No chance after that.
> >>
> >> So, let's do this:
> >> 1. Drop the autosleep lock before entering pm-suspend/hibernate.
> >> 2. This means, a user process can get hold of this lock and successfully
> >> disable autosleep a moment after we initiated suspend, but before userspace
> >> got frozen fully.
> >> 3. So, to respect the user's wish, we add a check immediately after the
> >> freezing of userspace is complete - we check if the user disabled autosleep
> >> and bail out, if he did. Otherwise, we continue and suspend the machine.
> >>
> >> IOW, this is like hitting 2 birds with one stone ;-)
> >> We don't hold autosleep lock throughout suspend/hibernate, but still react
> >> instantly when the user disables autosleep. And of course, freezing of tasks
> >> won't fail, ever! :-)
> >
> > Well, you essentially are postulating to restore the "interface" wakeup source
> > that was present in the previous version of this patch and that I dropped in
> > order to simplify the code.
> >
> > I guess I can do that ...
> >
>
> If this wakeup source is reported as active whenever user-space has
> not requested suspend that would be useful in the stats. It does not
> look like your original patch did this however,

No, it didn't.

> but you could have a
> main wakeup-source that you release when any form of suspend is
> requested and activate when turning off auto suspend or returning from
> a one-shot suspend operation.

I honestly don't think I can do that and handle the /sys/power/wakeup_count
-> /sys/power/state handoff (which is used by OLPC, as we've learnt recently)
sanely at the same time. OTOH, I don't want CONFIG_AUTOSLEEP to disable that
interface entirely, because things like that basically prevent people from
trying alternative features, which is essential to us for "interesting
feedback" reasons.

So, my "main" wakeup source is only going to register the number of times user
space has (successfully) written to /sysp/power/autosleep (please have a look
at the updated patch I'm going to send in a reply to Srivatsa in a little
while).

Thanks,
Rafael

2012-02-25 20:57:16

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/7] PM: Implement autosleep and "wake locks", take2

On Saturday, February 25, 2012, Srivatsa S. Bhat wrote:
> On 02/25/2012 04:51 AM, Rafael J. Wysocki wrote:
>
> > On Friday, February 24, 2012, Srivatsa S. Bhat wrote:
> >> On 02/24/2012 03:02 AM, Rafael J. Wysocki wrote:
> >>
> >>> On Thursday, February 23, 2012, Rafael J. Wysocki wrote:
> >>>> On Thursday, February 23, 2012, Srivatsa S. Bhat wrote:
> >>>>> On 02/23/2012 03:40 AM, Rafael J. Wysocki wrote:
> > [...]
> >>>>>
> >>>>> By the way, I am just curious.. how difficult will this make it for userspace
> >>>>> to disable autosleep? I mean, would a trylock mean that the user has to keep
> >>>>> fighting until he finally gets a chance to disable autosleep?
> >>>>
> >>>> That's a good point, so I think it may be a good idea to do
> >>>> mutex_lock_interruptible() in pm_autosleep_set_state() instead.
> >>>
> >>> Now that I think of it, perhaps it's a good idea to just make
> >>> pm_autosleep_lock() do mutex_lock_interruptible() _and_ make
> >>> pm_autosleep_set_state() use pm_autosleep_lock().
> >>>
> >>> What do you think?
> >>>
> >>
> >>
> >> Well, I don't think mutex_lock_interruptible() would help us much..
> >> Consider what would happen, if we use it:
> >>
> >> * pm-suspend got initiated as part of autosleep. Acquired autosleep lock.
> >> * Userspace is about to get frozen.
> >> * Now, the user tries to write "off" to autosleep. And hence, he is waiting
> >> for autosleep lock, interruptibly.
> >> * The freezer sent a fake signal to all userspace processes and hence
> >> this process also got interrupted.. it is no longer waiting on autosleep
> >> lock - it got the signal and returned, and got frozen.
> >> (And when the userspace gets thawed later, this process won't have the
> >> autosleep lock - which is a different (but yet another) problem).
> >>
> >> So ultimately the only thing we achieved is to ensure that freezing of
> >> userspace goes smoothly. But the user process could not succeed in
> >> disabling autosleep. Of course we can work around that by having the
> >> mutex_lock_interruptible() in a loop and so on, but that gets very
> >> ugly pretty soon.
> >>
> >> So, I would suggest the following solution:
> >>
> >> We want to achieve 2 things here:
> >> a. A user process trying to write to /sys/power/state or
> >> /sys/power/autosleep should not cause freezing failures.
> >> b. When a user process writes "off" to autosleep, the suspend/hibernate
> >> attempt that is on-going, if any, must be immediately aborted, to give
> >> the user the feeling that his preference has been noticed and respected.
> >>
> >> And to achieve this, we note that a user process can write "off" to autosleep
> >> only until the userspace gets frozen. No chance after that.
> >>
> >> So, let's do this:
> >> 1. Drop the autosleep lock before entering pm-suspend/hibernate.
> >> 2. This means, a user process can get hold of this lock and successfully
> >> disable autosleep a moment after we initiated suspend, but before userspace
> >> got frozen fully.
> >> 3. So, to respect the user's wish, we add a check immediately after the
> >> freezing of userspace is complete - we check if the user disabled autosleep
> >> and bail out, if he did. Otherwise, we continue and suspend the machine.
> >>
> >> IOW, this is like hitting 2 birds with one stone ;-)
> >> We don't hold autosleep lock throughout suspend/hibernate, but still react
> >> instantly when the user disables autosleep. And of course, freezing of tasks
> >> won't fail, ever! :-)
> >
> > Well, you essentially are postulating to restore the "interface" wakeup source
> > that was present in the previous version of this patch and that I dropped in
> > order to simplify the code.
> >
>
>
> Oh is it? I guess I haven't followed this thread very closely...
>
> > I guess I can do that ...
> >
>
>
> Oh by the way, this scheme doesn't solve all problems. It might be effective
> in reacting "instantly" to a request by the user to *switch off* autosleep.
> But say, when the user wants to switch to suspend instead of hibernate as the
> autosleep preference, for example, I don't think it would be as quick in
> responding... (I mean, it might do the old operation one more time before
> switching to the new one..)
>
> But I guess at this point it might be wiser to say "sigh.. we can do only so
> much..." instead of complicating the code too much in an attempt to meet
> everybody's expectations :-)

I think we can do something like in the updated patch [5/7] below.

It uses a special wakeup source object called "autosleep" to bump up the
number of wakeup events in progress before acquiring autosleep_lock in
pm_autosleep_set_state(). This way, either pm_autosleep_set_state() will
acquire autosleep_lock before try_to_suspend(), in which case the latter
will see the change of autosleep_state immediately (after autosleep_lock has
been passed to it), or try_to_suspend() will get it first, but then
pm_save_wakeup_count() or pm_suspend()/hibernate() will see the nonzero counter
of wakeup events in progress and return error code (sooner or later).

The drawback is that writes to /sys/power/autosleep may interfere with
the /sys/power/wakeup_count + /sys/power/state interface by interrupting
transitions started by writing to /sys/power/state, for example (although
I think that's highly unlikely).

Additionally, I made pm_autosleep_lock() use mutex_trylock_interruptible()
to prevent operations on /sys/power/wakeup_count and/or /sys/power/state
from failing the freezing of tasks started by try_to_suspend().

Thanks,
Rafael

---
From: Rafael J. Wysocki <[email protected]>
Subject: PM / Sleep: Implement opportunistic sleep

Introduce a mechanism by which the kernel can trigger global
transitions to a sleep state chosen by user space if there are no
active wakeup sources.

It consists of a new sysfs attribute, /sys/power/autosleep, that
can be written one of the strings returned by reads from
/sys/power/state, an ordered workqueue and a work item carrying out
the "suspend" operations. If a string representing the system's
sleep state is written to /sys/power/autosleep, the work item
triggering transitions to that state is queued up and it requeues
itself after every execution until user space writes "off" to
/sys/power/autosleep.

That work item enables the detection of wakeup events using the
functions already defined in drivers/base/power/wakeup.c (with one
small modification) and calls either pm_suspend(), or hibernate() to
put the system into a sleep state. If a wakeup event is reported
while the transition is in progress, it will abort the transition and
the "system suspend" work item will be queued up again.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
Documentation/ABI/testing/sysfs-power | 17 ++++
drivers/base/power/wakeup.c | 38 ++++++-----
include/linux/suspend.h | 13 +++
kernel/power/Kconfig | 8 ++
kernel/power/Makefile | 1
kernel/power/autosleep.c | 113 ++++++++++++++++++++++++++++++++
kernel/power/main.c | 117 ++++++++++++++++++++++++++++------
kernel/power/power.h | 18 +++++
8 files changed, 290 insertions(+), 35 deletions(-)

Index: linux/kernel/power/Makefile
===================================================================
--- linux.orig/kernel/power/Makefile
+++ linux/kernel/power/Makefile
@@ -9,5 +9,6 @@ obj-$(CONFIG_SUSPEND) += suspend.o
obj-$(CONFIG_PM_TEST_SUSPEND) += suspend_test.o
obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o \
block_io.o
+obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o

obj-$(CONFIG_MAGIC_SYSRQ) += poweroff.o
Index: linux/kernel/power/Kconfig
===================================================================
--- linux.orig/kernel/power/Kconfig
+++ linux/kernel/power/Kconfig
@@ -103,6 +103,14 @@ config PM_SLEEP_SMP
select HOTPLUG
select HOTPLUG_CPU

+config PM_AUTOSLEEP
+ bool "Opportunistic sleep"
+ depends on PM_SLEEP
+ default n
+ ---help---
+ Allow the kernel to trigger a system transition into a global sleep
+ state automatically whenever there are no active wakeup sources.
+
config PM_RUNTIME
bool "Run-time PM core functionality"
depends on !IA64_HP_SIM
Index: linux/kernel/power/power.h
===================================================================
--- linux.orig/kernel/power/power.h
+++ linux/kernel/power/power.h
@@ -264,3 +264,21 @@ static inline void suspend_thaw_processe
{
}
#endif
+
+#ifdef CONFIG_PM_AUTOSLEEP
+
+/* kernel/power/autosleep.c */
+extern int pm_autosleep_init(void);
+extern int pm_autosleep_lock(void);
+extern void pm_autosleep_unlock(void);
+extern suspend_state_t pm_autosleep_state(void);
+extern int pm_autosleep_set_state(suspend_state_t state);
+
+#else /* !CONFIG_PM_AUTOSLEEP */
+
+static inline int pm_autosleep_init(void) { return 0; }
+static inline int pm_autosleep_lock(void) { return 0; }
+static inline void pm_autosleep_unlock(void) {}
+static inline suspend_state_t pm_autosleep_state(void) { return PM_SUSPEND_ON; }
+
+#endif /* !CONFIG_PM_AUTOSLEEP */
Index: linux/include/linux/suspend.h
===================================================================
--- linux.orig/include/linux/suspend.h
+++ linux/include/linux/suspend.h
@@ -356,7 +356,7 @@ extern int unregister_pm_notifier(struct
extern bool events_check_enabled;

extern bool pm_wakeup_pending(void);
-extern bool pm_get_wakeup_count(unsigned int *count);
+extern bool pm_get_wakeup_count(unsigned int *count, bool block);
extern bool pm_save_wakeup_count(unsigned int count);

static inline void lock_system_sleep(void)
@@ -407,6 +407,17 @@ static inline void unlock_system_sleep(v

#endif /* !CONFIG_PM_SLEEP */

+#ifdef CONFIG_PM_AUTOSLEEP
+
+/* kernel/power/autosleep.c */
+void queue_up_suspend_work(void);
+
+#else /* !CONFIG_PM_AUTOSLEEP */
+
+static inline void queue_up_suspend_work(void) {}
+
+#endif /* !CONFIG_PM_AUTOSLEEP */
+
#ifdef CONFIG_ARCH_SAVE_PAGE_KEYS
/*
* The ARCH_SAVE_PAGE_KEYS functions can be used by an architecture
Index: linux/kernel/power/autosleep.c
===================================================================
--- /dev/null
+++ linux/kernel/power/autosleep.c
@@ -0,0 +1,113 @@
+/*
+ * kernel/power/autosleep.c
+ *
+ * Opportunistic sleep support.
+ *
+ * Copyright (C) 2012 Rafael J. Wysocki <[email protected]>
+ */
+
+#include <linux/device.h>
+#include <linux/mutex.h>
+#include <linux/pm_wakeup.h>
+
+#include "power.h"
+
+static suspend_state_t autosleep_state;
+static struct workqueue_struct *autosleep_wq;
+static DEFINE_MUTEX(autosleep_lock);
+static struct wakeup_source *autosleep_ws;
+
+static void try_to_suspend(struct work_struct *work)
+{
+ unsigned int initial_count, final_count;
+
+ if (!pm_get_wakeup_count(&initial_count, true))
+ goto out;
+
+ mutex_lock(&autosleep_lock);
+
+ if (!pm_save_wakeup_count(initial_count)) {
+ mutex_unlock(&autosleep_lock);
+ goto out;
+ }
+
+ if (autosleep_state == PM_SUSPEND_ON) {
+ mutex_unlock(&autosleep_lock);
+ return;
+ }
+ if (autosleep_state >= PM_SUSPEND_MAX)
+ hibernate();
+ else
+ pm_suspend(autosleep_state);
+
+ mutex_unlock(&autosleep_lock);
+
+ if (!pm_get_wakeup_count(&final_count, false))
+ goto out;
+
+ if (final_count == initial_count)
+ schedule_timeout(HZ / 2);
+
+ out:
+ queue_up_suspend_work();
+}
+
+static DECLARE_WORK(suspend_work, try_to_suspend);
+
+void queue_up_suspend_work(void)
+{
+ if (!work_pending(&suspend_work) && autosleep_state > PM_SUSPEND_ON)
+ queue_work(autosleep_wq, &suspend_work);
+}
+
+suspend_state_t pm_autosleep_state(void)
+{
+ return autosleep_state;
+}
+
+int pm_autosleep_lock(void)
+{
+ return mutex_lock_interruptible(&autosleep_lock);
+}
+
+void pm_autosleep_unlock(void)
+{
+ mutex_unlock(&autosleep_lock);
+}
+
+int pm_autosleep_set_state(suspend_state_t state)
+{
+
+#ifndef CONFIG_HIBERNATION
+ if (state >= PM_SUSPEND_MAX)
+ return -EINVAL;
+#endif
+
+ __pm_stay_awake(autosleep_ws);
+
+ mutex_lock(&autosleep_lock);
+
+ autosleep_state = state;
+
+ __pm_relax(autosleep_ws);
+
+ if (state > PM_SUSPEND_ON)
+ queue_up_suspend_work();
+
+ mutex_unlock(&autosleep_lock);
+ return 0;
+}
+
+int __init pm_autosleep_init(void)
+{
+ autosleep_ws = wakeup_source_register("autosleep");
+ if (!autosleep_ws)
+ return -ENOMEM;
+
+ autosleep_wq = alloc_ordered_workqueue("autosleep", 0);
+ if (autosleep_wq)
+ return 0;
+
+ wakeup_source_unregister(autosleep_ws);
+ return -ENOMEM;
+}
Index: linux/kernel/power/main.c
===================================================================
--- linux.orig/kernel/power/main.c
+++ linux/kernel/power/main.c
@@ -269,8 +269,7 @@ static ssize_t state_show(struct kobject
return (s - buf);
}

-static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
- const char *buf, size_t n)
+static suspend_state_t decode_state(const char *buf, size_t n)
{
#ifdef CONFIG_SUSPEND
suspend_state_t state = PM_SUSPEND_STANDBY;
@@ -278,27 +277,48 @@ static ssize_t state_store(struct kobjec
#endif
char *p;
int len;
- int error = -EINVAL;

p = memchr(buf, '\n', n);
len = p ? p - buf : n;

- /* First, check if we are requested to hibernate */
- if (len == 4 && !strncmp(buf, "disk", len)) {
- error = hibernate();
- goto Exit;
- }
+ /* Check hibernation first. */
+ if (len == 4 && !strncmp(buf, "disk", len))
+ return PM_SUSPEND_MAX;

#ifdef CONFIG_SUSPEND
- for (s = &pm_states[state]; state < PM_SUSPEND_MAX; s++, state++) {
- if (*s && len == strlen(*s) && !strncmp(buf, *s, len)) {
- error = pm_suspend(state);
- break;
- }
- }
+ for (s = &pm_states[state]; state < PM_SUSPEND_MAX; s++, state++)
+ if (*s && len == strlen(*s) && !strncmp(buf, *s, len))
+ return state;
#endif

- Exit:
+ return PM_SUSPEND_ON;
+}
+
+static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ suspend_state_t state;
+ int error;
+
+ error = pm_autosleep_lock();
+ if (error)
+ return error;
+
+ if (pm_autosleep_state() > PM_SUSPEND_ON) {
+ error = -EBUSY;
+ goto out;
+ }
+
+ state = decode_state(buf, n);
+ if (state < PM_SUSPEND_MAX)
+ error = pm_suspend(state);
+ else if (state > PM_SUSPEND_ON)
+ error = hibernate();
+ else
+ error = -EINVAL;
+
+ out:
+ pm_autosleep_unlock();
return error ? error : n;
}

@@ -339,7 +359,8 @@ static ssize_t wakeup_count_show(struct
{
unsigned int val;

- return pm_get_wakeup_count(&val) ? sprintf(buf, "%u\n", val) : -EINTR;
+ return pm_get_wakeup_count(&val, true) ?
+ sprintf(buf, "%u\n", val) : -EINTR;
}

static ssize_t wakeup_count_store(struct kobject *kobj,
@@ -347,15 +368,69 @@ static ssize_t wakeup_count_store(struct
const char *buf, size_t n)
{
unsigned int val;
+ int error;
+
+ error = pm_autosleep_lock();
+ if (error)
+ return error;
+
+ if (pm_autosleep_state() > PM_SUSPEND_ON) {
+ error = -EBUSY;
+ goto out;
+ }

if (sscanf(buf, "%u", &val) == 1) {
if (pm_save_wakeup_count(val))
return n;
}
- return -EINVAL;
+ error = -EINVAL;
+
+ out:
+ pm_autosleep_unlock();
+ return error;
}

power_attr(wakeup_count);
+
+#ifdef CONFIG_PM_AUTOSLEEP
+static ssize_t autosleep_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ suspend_state_t state = pm_autosleep_state();
+
+ if (state == PM_SUSPEND_ON)
+ return sprintf(buf, "off\n");
+
+#ifdef CONFIG_SUSPEND
+ if (state < PM_SUSPEND_MAX)
+ return sprintf(buf, "%s\n", valid_state(state) ?
+ pm_states[state] : "error");
+#endif
+#ifdef CONFIG_HIBERNATION
+ return sprintf(buf, "disk\n");
+#else
+ return sprintf(buf, "error");
+#endif
+}
+
+static ssize_t autosleep_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ suspend_state_t state = decode_state(buf, n);
+ int error;
+
+ if (state == PM_SUSPEND_ON && strncmp(buf, "off", 3)
+ && strncmp(buf, "off\n", 4))
+ return -EINVAL;
+
+ error = pm_autosleep_set_state(state);
+ return error ? error : n;
+}
+
+power_attr(autosleep);
+#endif /* CONFIG_PM_AUTOSLEEP */
#endif /* CONFIG_PM_SLEEP */

#ifdef CONFIG_PM_TRACE
@@ -409,6 +484,9 @@ static struct attribute * g[] = {
#ifdef CONFIG_PM_SLEEP
&pm_async_attr.attr,
&wakeup_count_attr.attr,
+#ifdef CONFIG_PM_AUTOSLEEP
+ &autosleep_attr.attr,
+#endif
#ifdef CONFIG_PM_DEBUG
&pm_test_attr.attr,
#endif
@@ -444,7 +522,10 @@ static int __init pm_init(void)
power_kobj = kobject_create_and_add("power", NULL);
if (!power_kobj)
return -ENOMEM;
- return sysfs_create_group(power_kobj, &attr_group);
+ error = sysfs_create_group(power_kobj, &attr_group);
+ if (error)
+ return error;
+ return pm_autosleep_init();
}

core_initcall(pm_init);
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -492,8 +492,10 @@ static void wakeup_source_deactivate(str
atomic_add(MAX_IN_PROGRESS, &combined_event_count);

split_counters(&cnt, &inpr);
- if (!inpr && waitqueue_active(&wakeup_count_wait_queue))
+ if (!inpr && waitqueue_active(&wakeup_count_wait_queue)) {
wake_up(&wakeup_count_wait_queue);
+ queue_up_suspend_work();
+ }
}

/**
@@ -654,29 +656,33 @@ bool pm_wakeup_pending(void)
/**
* pm_get_wakeup_count - Read the number of registered wakeup events.
* @count: Address to store the value at.
+ * @block: Whether or not to block.
*
- * Store the number of registered wakeup events at the address in @count. Block
- * if the current number of wakeup events being processed is nonzero.
+ * Store the number of registered wakeup events at the address in @count. If
+ * @block is set, block until the current number of wakeup events being
+ * processed is zero.
*
- * Return 'false' if the wait for the number of wakeup events being processed to
- * drop down to zero has been interrupted by a signal (and the current number
- * of wakeup events being processed is still nonzero). Otherwise return 'true'.
+ * Return 'false' if the current number of wakeup events being processed is
+ * nonzero. Otherwise return 'true'.
*/
-bool pm_get_wakeup_count(unsigned int *count)
+bool pm_get_wakeup_count(unsigned int *count, bool block)
{
unsigned int cnt, inpr;
- DEFINE_WAIT(wait);

- for (;;) {
- prepare_to_wait(&wakeup_count_wait_queue, &wait,
- TASK_INTERRUPTIBLE);
- split_counters(&cnt, &inpr);
- if (inpr == 0 || signal_pending(current))
- break;
+ if (block) {
+ DEFINE_WAIT(wait);

- schedule();
+ for (;;) {
+ prepare_to_wait(&wakeup_count_wait_queue, &wait,
+ TASK_INTERRUPTIBLE);
+ split_counters(&cnt, &inpr);
+ if (inpr == 0 || signal_pending(current))
+ break;
+
+ schedule();
+ }
+ finish_wait(&wakeup_count_wait_queue, &wait);
}
- finish_wait(&wakeup_count_wait_queue, &wait);

split_counters(&cnt, &inpr);
*count = cnt;
Index: linux/Documentation/ABI/testing/sysfs-power
===================================================================
--- linux.orig/Documentation/ABI/testing/sysfs-power
+++ linux/Documentation/ABI/testing/sysfs-power
@@ -172,3 +172,20 @@ Description:

Reading from this file will display the current value, which is
set to 1 MB by default.
+
+What: /sys/power/autosleep
+Date: February 2012
+Contact: Rafael J. Wysocki <[email protected]>
+Description:
+ The /sys/power/autosleep file can be written one of the strings
+ returned by reads from /sys/power/state. If that happens, a
+ work item attempting to trigger a transition of the system to
+ the sleep state represented by that string is queued up. This
+ attempt will only succeed if there are no active wakeup sources
+ in the system at that time. After evey execution, regardless
+ of whether or not the attempt to put the system to sleep has
+ succeeded, the work item requeues itself until user space
+ writes "off" to /sys/power/autosleep.
+
+ Reading from this file causes the last string successfully
+ written to it to be displayed.

2012-02-25 23:29:58

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 4/7] Input / PM: Add ioctl to block suspend while event queue is not empty

On Saturday, February 25, 2012, Arve Hj?nnev?g wrote:
> On Thu, Feb 23, 2012 at 9:16 PM, Matt Helsley <[email protected]> wrote:
> > On Wed, Feb 22, 2012 at 12:34:58AM +0100, Rafael J. Wysocki wrote:
> >> From: Arve Hj?nnev?g <[email protected]>
> >>
> >> Add a new ioctl, EVIOCSWAKEUPSRC, to attach a wakeup source object to
> >> an evdev client event queue, such that it will be active whenever the
> >> queue is not empty. Then, all events in the queue will be regarded
> >> as wakeup events in progress and pm_get_wakeup_count() will block (or
> >> return false if woken up by a signal) until they are removed from the
> >> queue. In consequence, if the checking of wakeup events is enabled
> >> (e.g. throught the /sys/power/wakeup_count interface), the system
> >> won't be able to go into a sleep state until the queue is empty.
> >>
> >> This allows user space processes to handle situations in which they
> >> want to do a select() on an evdev descriptor, so they go to sleep
> >> until there are some events to read from the device's queue, and then
> >> they don't want the system to go into a sleep state until all the
> >> events are read (presumably for further processing). Of course, if
> >> they don't want the system to go into a sleep state _after_ all the
> >> events have been read from the queue, they have to use a separate
> >> mechanism that will prevent the system from doing that and it has
> >> to be activated before reading the first event (that also may be the
> >> last one).
> >
> > I haven't seen this idea mentioned before but I must admit I haven't
> > been following this thread too closely so apologies (and don't bother
> > rehashing) if it has:
> >
> > Could you just add this to epoll so that any fd userspace chooses would be
> > capable of doing this without introducing potentially ecclectic ioctl
> > interfaces?
> >
>
> This is an interesting idea, but I'm not sure how well it would work.
>
> I looked at the epoll code and it looks like it is possible to
> activate the wakeup-source from the wait queue function it uses.

I'm not sure I'm following you here. How exactly would you like to do that?

In particular, what data structure would the wakeup source object be
associated with?

> The epoll callback will happen without holding evdev client buffer_lock,
> so the wakeup-source and buffer state will not always be in sync (this
> may be OK, but require more thought). This callback is also called if
> no data was added to the queue we are polling on because another
> client has grabbed the input device (is this a bug or intended?).
>
> There is no call into the epoll code when input queue is emptied, so
> we can't deactivate the wakeup-source until epoll_wait is called
> again. This also should be workable, but result in different stats.
>
> It does not look like the normal poll and select interfaces can be
> extended the same way (since they remove themselves from the
> wait-queue before returning to user-space), so user-space has to be
> changed to use epoll even if select or poll would be a better fit.

Well, epoll without EPOLLET is equivalent to poll, so the only potential
issue is select. How serious may the problem with that be?

> I don't know how many other drivers this would work for. The input
> driver will wake up user-space from the same thread or interrupt
> handler that queued the event, but other drivers may defer this to
> another thread which makes an epoll wakeup-source insufficient.

If we go for new ioctls insread, we'll have to add them to all of those
drivers, so I would prefer the epoll-based approach if that's viable at
least for a subset of the relevant drivers.

> ...
> >> + snprintf(name, sizeof(name), "%s-%d",
> >> + dev_name(&evdev->dev), task_tgid_vnr(current));
> >
> > This does not look like it will work well with tasks in different pid
> > namespaces. What should happen, I think, is the wakeup_source should hold a
> > reference to either the struct pid of current or current itself. Then
> > when someone reads the file you should get the pid vnr in the reader's
> > pid namespace. That way instead of a bogus pid vnr 0 would show up if
> > "current" here is not in the reader's pid namepsace.
> >
>
> The pid here is only used for debugging purposes, and used less than
> the dev_name. I don't think tracking pid namespaces is worth the
> trouble here, so if this is a real problem we can just drop the pid
> from the name for now.

OK

Thanks,
Rafael

2012-02-26 20:53:19

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 4/7] Input / PM: Add ioctl to block suspend while event queue is not empty

On Friday, February 24, 2012, Matt Helsley wrote:
> On Wed, Feb 22, 2012 at 12:34:58AM +0100, Rafael J. Wysocki wrote:
> > From: Arve Hjønnevåg <[email protected]>
> >
> > Add a new ioctl, EVIOCSWAKEUPSRC, to attach a wakeup source object to
> > an evdev client event queue, such that it will be active whenever the
> > queue is not empty. Then, all events in the queue will be regarded
> > as wakeup events in progress and pm_get_wakeup_count() will block (or
> > return false if woken up by a signal) until they are removed from the
> > queue. In consequence, if the checking of wakeup events is enabled
> > (e.g. throught the /sys/power/wakeup_count interface), the system
> > won't be able to go into a sleep state until the queue is empty.
> >
> > This allows user space processes to handle situations in which they
> > want to do a select() on an evdev descriptor, so they go to sleep
> > until there are some events to read from the device's queue, and then
> > they don't want the system to go into a sleep state until all the
> > events are read (presumably for further processing). Of course, if
> > they don't want the system to go into a sleep state _after_ all the
> > events have been read from the queue, they have to use a separate
> > mechanism that will prevent the system from doing that and it has
> > to be activated before reading the first event (that also may be the
> > last one).
>
> I haven't seen this idea mentioned before but I must admit I haven't
> been following this thread too closely so apologies (and don't bother
> rehashing) if it has:
>
> Could you just add this to epoll so that any fd userspace chooses would be
> capable of doing this without introducing potentially ecclectic ioctl
> interfaces?
>
> struct epoll_event ev;
>
> epfd = epoll_create1(EPOLL_STAY_AWAKE_SET);
> ev.data.ptr = foo;
> epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
>
> Which could be useful because you can put one epollfd in another's epoll
> set. Or maybe as an EPOLLKEEPAWAKE flag in the event struct sort of like
> EPOLLET:
>
> epfd = epoll_create1(0);
> ev.events = EPOLLIN|EPOLLKEEPAWAKE;
> epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);

Do you mean something like the patch below, or something different?

Rafael

---
drivers/input/evdev.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++
fs/eventpoll.c | 15 +++++++++++-
include/linux/eventpoll.h | 6 +++++
include/linux/fs.h | 1
4 files changed, 76 insertions(+), 1 deletion(-)

Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h
+++ linux/include/linux/fs.h
@@ -1604,6 +1604,7 @@ struct file_operations {
ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
int (*readdir) (struct file *, void *, filldir_t);
unsigned int (*poll) (struct file *, struct poll_table_struct *);
+ void (*epoll_ctl) (struct file *, int, unsigned int);
long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *);
Index: linux/fs/eventpoll.c
===================================================================
--- linux.orig/fs/eventpoll.c
+++ linux/fs/eventpoll.c
@@ -609,6 +609,10 @@ static int ep_remove(struct eventpoll *e
unsigned long flags;
struct file *file = epi->ffd.file;

+ /* Notify the underlying driver that the polling has completed */
+ if (file->f_op->epoll_ctl)
+ file->f_op->epoll_ctl(file, EPOLL_CTL_DEL, epi->event.events);
+
/*
* Removes poll wait queue hooks. We _have_ to do this without holding
* the "ep->lock" otherwise a deadlock might occur. This because of the
@@ -1094,6 +1098,10 @@ static int ep_insert(struct eventpoll *e
epq.epi = epi;
init_poll_funcptr(&epq.pt, ep_ptable_queue_proc);

+ /* Notify the underlying driver that we want to poll it */
+ if (tfile->f_op->epoll_ctl)
+ tfile->f_op->epoll_ctl(tfile, EPOLL_CTL_ADD, event->events);
+
/*
* Attach the item to the poll hooks and get current event bits.
* We can safely use the file* here because its usage count has
@@ -1185,6 +1193,7 @@ error_unregister:
*/
static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_event *event)
{
+ struct file *file = epi->ffd.file;
int pwake = 0;
unsigned int revents;

@@ -1196,11 +1205,15 @@ static int ep_modify(struct eventpoll *e
epi->event.events = event->events;
epi->event.data = event->data; /* protected by mtx */

+ /* Notify the underlying driver of the change */
+ if (file->f_op->epoll_ctl)
+ file->f_op->epoll_ctl(file, EPOLL_CTL_MOD, event->events);
+
/*
* Get current event bits. We can safely use the file* here because
* its usage count has been increased by the caller of this function.
*/
- revents = epi->ffd.file->f_op->poll(epi->ffd.file, NULL);
+ revents = file->f_op->poll(file, NULL);

/*
* If the item is "hot" and it is not registered inside the ready
Index: linux/drivers/input/evdev.c
===================================================================
--- linux.orig/drivers/input/evdev.c
+++ linux/drivers/input/evdev.c
@@ -16,6 +16,7 @@
#define EVDEV_BUF_PACKETS 8

#include <linux/poll.h>
+#include <linux/eventpoll.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/module.h>
@@ -43,6 +44,7 @@ struct evdev_client {
unsigned int tail;
unsigned int packet_head; /* [future] position of the first element of next packet */
spinlock_t buffer_lock; /* protects access to buffer, head and tail */
+ struct wakeup_source *wakeup_source;
struct fasync_struct *fasync;
struct evdev *evdev;
struct list_head node;
@@ -75,10 +77,12 @@ static void evdev_pass_event(struct evde
client->buffer[client->tail].value = 0;

client->packet_head = client->tail;
+ __pm_relax(client->wakeup_source);
}

if (event->type == EV_SYN && event->code == SYN_REPORT) {
client->packet_head = client->head;
+ __pm_stay_awake(client->wakeup_source);
kill_fasync(&client->fasync, SIGIO, POLL_IN);
}

@@ -255,6 +259,8 @@ static int evdev_release(struct inode *i
mutex_unlock(&evdev->mutex);

evdev_detach_client(evdev, client);
+ wakeup_source_unregister(client->wakeup_source);
+
kfree(client);

evdev_close_device(evdev);
@@ -373,6 +379,8 @@ static int evdev_fetch_next_event(struct
if (have_event) {
*event = client->buffer[client->tail++];
client->tail &= client->bufsize - 1;
+ if (client->packet_head == client->tail)
+ __pm_relax(client->wakeup_source);
}

spin_unlock_irq(&client->buffer_lock);
@@ -433,6 +441,52 @@ static unsigned int evdev_poll(struct fi
return mask;
}

+static void evdev_client_attach_wakeup_source(struct evdev_client *client)
+{
+ struct wakeup_source *ws;
+
+ ws = wakeup_source_register(dev_name(&client->evdev->dev));
+ spin_lock_irq(&client->buffer_lock);
+ client->wakeup_source = ws;
+ if (client->packet_head != client->tail)
+ __pm_stay_awake(client->wakeup_source);
+ spin_unlock_irq(&client->buffer_lock);
+}
+
+static void evdev_client_detach_wakeup_source(struct evdev_client *client)
+{
+ struct wakeup_source *ws;
+
+ spin_lock_irq(&client->buffer_lock);
+ ws = client->wakeup_source;
+ client->wakeup_source = NULL;
+ spin_unlock_irq(&client->buffer_lock);
+ wakeup_source_unregister(ws);
+}
+
+static void evdev_epoll_ctl(struct file *file, int op,
+ unsigned int events)
+{
+ struct evdev_client *client = file->private_data;
+
+ switch (op) {
+ case EPOLL_CTL_ADD:
+ if ((events & EPOLLWAKEUP) && !client->wakeup_source)
+ evdev_client_attach_wakeup_source(client);
+ break;
+ case EPOLL_CTL_DEL:
+ if (events & EPOLLWAKEUP)
+ evdev_client_detach_wakeup_source(client);
+ break;
+ case EPOLL_CTL_MOD:
+ /* 'events' is the new events mask (after the change) */
+ if ((events & EPOLLWAKEUP) && !client->wakeup_source)
+ evdev_client_attach_wakeup_source(client);
+ else if (!(events & EPOLLWAKEUP))
+ evdev_client_detach_wakeup_source(client);
+ }
+}
+
#ifdef CONFIG_COMPAT

#define BITS_PER_LONG_COMPAT (sizeof(compat_long_t) * 8)
@@ -845,6 +899,7 @@ static const struct file_operations evde
.read = evdev_read,
.write = evdev_write,
.poll = evdev_poll,
+ .epoll_ctl = evdev_epoll_ctl,
.open = evdev_open,
.release = evdev_release,
.unlocked_ioctl = evdev_ioctl,
Index: linux/include/linux/eventpoll.h
===================================================================
--- linux.orig/include/linux/eventpoll.h
+++ linux/include/linux/eventpoll.h
@@ -26,6 +26,12 @@
#define EPOLL_CTL_DEL 2
#define EPOLL_CTL_MOD 3

+/*
+ * Request the handling of system wakeup events so as to prevent automatic
+ * system suspends from happening while those events are being processed.
+ */
+#define EPOLLWAKEUP (1 << 29)
+
/* Set the One Shot behaviour for the target file descriptor */
#define EPOLLONESHOT (1 << 30)

2012-02-28 00:56:23

by Matt Helsley

[permalink] [raw]
Subject: Re: [RFC][PATCH 4/7] Input / PM: Add ioctl to block suspend while event queue is not empty

On Sun, Feb 26, 2012 at 09:57:18PM +0100, Rafael J. Wysocki wrote:
> On Friday, February 24, 2012, Matt Helsley wrote:
> > On Wed, Feb 22, 2012 at 12:34:58AM +0100, Rafael J. Wysocki wrote:
> > > From: Arve Hjønnevåg <[email protected]>
> > >
> > > Add a new ioctl, EVIOCSWAKEUPSRC, to attach a wakeup source object to
> > > an evdev client event queue, such that it will be active whenever the
> > > queue is not empty. Then, all events in the queue will be regarded
> > > as wakeup events in progress and pm_get_wakeup_count() will block (or
> > > return false if woken up by a signal) until they are removed from the
> > > queue. In consequence, if the checking of wakeup events is enabled
> > > (e.g. throught the /sys/power/wakeup_count interface), the system
> > > won't be able to go into a sleep state until the queue is empty.
> > >
> > > This allows user space processes to handle situations in which they
> > > want to do a select() on an evdev descriptor, so they go to sleep
> > > until there are some events to read from the device's queue, and then
> > > they don't want the system to go into a sleep state until all the
> > > events are read (presumably for further processing). Of course, if
> > > they don't want the system to go into a sleep state _after_ all the
> > > events have been read from the queue, they have to use a separate
> > > mechanism that will prevent the system from doing that and it has
> > > to be activated before reading the first event (that also may be the
> > > last one).
> >
> > I haven't seen this idea mentioned before but I must admit I haven't
> > been following this thread too closely so apologies (and don't bother
> > rehashing) if it has:
> >
> > Could you just add this to epoll so that any fd userspace chooses would be
> > capable of doing this without introducing potentially ecclectic ioctl
> > interfaces?
> >
> > struct epoll_event ev;
> >
> > epfd = epoll_create1(EPOLL_STAY_AWAKE_SET);
> > ev.data.ptr = foo;
> > epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
> >
> > Which could be useful because you can put one epollfd in another's epoll
> > set. Or maybe as an EPOLLKEEPAWAKE flag in the event struct sort of like
> > EPOLLET:
> >
> > epfd = epoll_create1(0);
> > ev.events = EPOLLIN|EPOLLKEEPAWAKE;
> > epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
>
> Do you mean something like the patch below, or something different?

Yeah, this was sort of what I was thinking of. It nicely avoids the
ioctl() bits. I guess my only issue is the fop mimics the epoll
interface -- should it just be an fop to manage the file as a wakeup
source rather than a generic hook into epoll?

Cheers,
-Matt Helsley

>
> Rafael
>
> ---
> drivers/input/evdev.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++
> fs/eventpoll.c | 15 +++++++++++-
> include/linux/eventpoll.h | 6 +++++
> include/linux/fs.h | 1
> 4 files changed, 76 insertions(+), 1 deletion(-)
>
> Index: linux/include/linux/fs.h
> ===================================================================
> --- linux.orig/include/linux/fs.h
> +++ linux/include/linux/fs.h
> @@ -1604,6 +1604,7 @@ struct file_operations {
> ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
> int (*readdir) (struct file *, void *, filldir_t);
> unsigned int (*poll) (struct file *, struct poll_table_struct *);
> + void (*epoll_ctl) (struct file *, int, unsigned int);
> long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
> long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
> int (*mmap) (struct file *, struct vm_area_struct *);
> Index: linux/fs/eventpoll.c
> ===================================================================
> --- linux.orig/fs/eventpoll.c
> +++ linux/fs/eventpoll.c
> @@ -609,6 +609,10 @@ static int ep_remove(struct eventpoll *e
> unsigned long flags;
> struct file *file = epi->ffd.file;
>
> + /* Notify the underlying driver that the polling has completed */
> + if (file->f_op->epoll_ctl)
> + file->f_op->epoll_ctl(file, EPOLL_CTL_DEL, epi->event.events);
> +
> /*
> * Removes poll wait queue hooks. We _have_ to do this without holding
> * the "ep->lock" otherwise a deadlock might occur. This because of the
> @@ -1094,6 +1098,10 @@ static int ep_insert(struct eventpoll *e
> epq.epi = epi;
> init_poll_funcptr(&epq.pt, ep_ptable_queue_proc);
>
> + /* Notify the underlying driver that we want to poll it */
> + if (tfile->f_op->epoll_ctl)
> + tfile->f_op->epoll_ctl(tfile, EPOLL_CTL_ADD, event->events);
> +
> /*
> * Attach the item to the poll hooks and get current event bits.
> * We can safely use the file* here because its usage count has
> @@ -1185,6 +1193,7 @@ error_unregister:
> */
> static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_event *event)
> {
> + struct file *file = epi->ffd.file;
> int pwake = 0;
> unsigned int revents;
>
> @@ -1196,11 +1205,15 @@ static int ep_modify(struct eventpoll *e
> epi->event.events = event->events;
> epi->event.data = event->data; /* protected by mtx */
>
> + /* Notify the underlying driver of the change */
> + if (file->f_op->epoll_ctl)
> + file->f_op->epoll_ctl(file, EPOLL_CTL_MOD, event->events);
> +
> /*
> * Get current event bits. We can safely use the file* here because
> * its usage count has been increased by the caller of this function.
> */
> - revents = epi->ffd.file->f_op->poll(epi->ffd.file, NULL);
> + revents = file->f_op->poll(file, NULL);
>
> /*
> * If the item is "hot" and it is not registered inside the ready
> Index: linux/drivers/input/evdev.c
> ===================================================================
> --- linux.orig/drivers/input/evdev.c
> +++ linux/drivers/input/evdev.c
> @@ -16,6 +16,7 @@
> #define EVDEV_BUF_PACKETS 8
>
> #include <linux/poll.h>
> +#include <linux/eventpoll.h>
> #include <linux/sched.h>
> #include <linux/slab.h>
> #include <linux/module.h>
> @@ -43,6 +44,7 @@ struct evdev_client {
> unsigned int tail;
> unsigned int packet_head; /* [future] position of the first element of next packet */
> spinlock_t buffer_lock; /* protects access to buffer, head and tail */
> + struct wakeup_source *wakeup_source;
> struct fasync_struct *fasync;
> struct evdev *evdev;
> struct list_head node;
> @@ -75,10 +77,12 @@ static void evdev_pass_event(struct evde
> client->buffer[client->tail].value = 0;
>
> client->packet_head = client->tail;
> + __pm_relax(client->wakeup_source);
> }
>
> if (event->type == EV_SYN && event->code == SYN_REPORT) {
> client->packet_head = client->head;
> + __pm_stay_awake(client->wakeup_source);
> kill_fasync(&client->fasync, SIGIO, POLL_IN);
> }
>
> @@ -255,6 +259,8 @@ static int evdev_release(struct inode *i
> mutex_unlock(&evdev->mutex);
>
> evdev_detach_client(evdev, client);
> + wakeup_source_unregister(client->wakeup_source);
> +
> kfree(client);
>
> evdev_close_device(evdev);
> @@ -373,6 +379,8 @@ static int evdev_fetch_next_event(struct
> if (have_event) {
> *event = client->buffer[client->tail++];
> client->tail &= client->bufsize - 1;
> + if (client->packet_head == client->tail)
> + __pm_relax(client->wakeup_source);
> }
>
> spin_unlock_irq(&client->buffer_lock);
> @@ -433,6 +441,52 @@ static unsigned int evdev_poll(struct fi
> return mask;
> }
>
> +static void evdev_client_attach_wakeup_source(struct evdev_client *client)
> +{
> + struct wakeup_source *ws;
> +
> + ws = wakeup_source_register(dev_name(&client->evdev->dev));
> + spin_lock_irq(&client->buffer_lock);
> + client->wakeup_source = ws;
> + if (client->packet_head != client->tail)
> + __pm_stay_awake(client->wakeup_source);
> + spin_unlock_irq(&client->buffer_lock);
> +}
> +
> +static void evdev_client_detach_wakeup_source(struct evdev_client *client)
> +{
> + struct wakeup_source *ws;
> +
> + spin_lock_irq(&client->buffer_lock);
> + ws = client->wakeup_source;
> + client->wakeup_source = NULL;
> + spin_unlock_irq(&client->buffer_lock);
> + wakeup_source_unregister(ws);
> +}
> +
> +static void evdev_epoll_ctl(struct file *file, int op,
> + unsigned int events)
> +{
> + struct evdev_client *client = file->private_data;
> +
> + switch (op) {
> + case EPOLL_CTL_ADD:
> + if ((events & EPOLLWAKEUP) && !client->wakeup_source)
> + evdev_client_attach_wakeup_source(client);
> + break;
> + case EPOLL_CTL_DEL:
> + if (events & EPOLLWAKEUP)
> + evdev_client_detach_wakeup_source(client);
> + break;
> + case EPOLL_CTL_MOD:
> + /* 'events' is the new events mask (after the change) */
> + if ((events & EPOLLWAKEUP) && !client->wakeup_source)
> + evdev_client_attach_wakeup_source(client);
> + else if (!(events & EPOLLWAKEUP))
> + evdev_client_detach_wakeup_source(client);
> + }
> +}
> +
> #ifdef CONFIG_COMPAT
>
> #define BITS_PER_LONG_COMPAT (sizeof(compat_long_t) * 8)
> @@ -845,6 +899,7 @@ static const struct file_operations evde
> .read = evdev_read,
> .write = evdev_write,
> .poll = evdev_poll,
> + .epoll_ctl = evdev_epoll_ctl,
> .open = evdev_open,
> .release = evdev_release,
> .unlocked_ioctl = evdev_ioctl,
> Index: linux/include/linux/eventpoll.h
> ===================================================================
> --- linux.orig/include/linux/eventpoll.h
> +++ linux/include/linux/eventpoll.h
> @@ -26,6 +26,12 @@
> #define EPOLL_CTL_DEL 2
> #define EPOLL_CTL_MOD 3
>
> +/*
> + * Request the handling of system wakeup events so as to prevent automatic
> + * system suspends from happening while those events are being processed.
> + */
> +#define EPOLLWAKEUP (1 << 29)
> +
> /* Set the One Shot behaviour for the target file descriptor */
> #define EPOLLONESHOT (1 << 30)
>
>

2012-02-28 00:57:08

by Matt Helsley

[permalink] [raw]
Subject: Re: [RFC][PATCH 4/7] Input / PM: Add ioctl to block suspend while event queue is not empty

On Fri, Feb 24, 2012 at 08:25:30PM -0800, Arve Hjønnevåg wrote:
> On Thu, Feb 23, 2012 at 9:16 PM, Matt Helsley <[email protected]> wrote:
> > On Wed, Feb 22, 2012 at 12:34:58AM +0100, Rafael J. Wysocki wrote:
> >> From: Arve Hjønnevåg <[email protected]>
> >>
> >> Add a new ioctl, EVIOCSWAKEUPSRC, to attach a wakeup source object to
> >> an evdev client event queue, such that it will be active whenever the
> >> queue is not empty.  Then, all events in the queue will be regarded
> >> as wakeup events in progress and pm_get_wakeup_count() will block (or
> >> return false if woken up by a signal) until they are removed from the
> >> queue.  In consequence, if the checking of wakeup events is enabled
> >> (e.g. throught the /sys/power/wakeup_count interface), the system
> >> won't be able to go into a sleep state until the queue is empty.
> >>
> >> This allows user space processes to handle situations in which they
> >> want to do a select() on an evdev descriptor, so they go to sleep
> >> until there are some events to read from the device's queue, and then
> >> they don't want the system to go into a sleep state until all the
> >> events are read (presumably for further processing).  Of course, if
> >> they don't want the system to go into a sleep state _after_ all the
> >> events have been read from the queue, they have to use a separate
> >> mechanism that will prevent the system from doing that and it has
> >> to be activated before reading the first event (that also may be the
> >> last one).
> >
> > I haven't seen this idea mentioned before but I must admit I haven't
> > been following this thread too closely so apologies (and don't bother
> > rehashing) if it has:
> >
> > Could you just add this to epoll so that any fd userspace chooses would be
> > capable of doing this without introducing potentially ecclectic ioctl
> > interfaces?
> >
>
> This is an interesting idea, but I'm not sure how well it would work.
>
> I looked at the epoll code and it looks like it is possible to
> activate the wakeup-source from the wait queue function it uses. The
> epoll callback will happen without holding evdev client buffer_lock,
> so the wakeup-source and buffer state will not always be in sync (this
> may be OK, but require more thought). This callback is also called if
> no data was added to the queue we are polling on because another
> client has grabbed the input device (is this a bug or intended?).
>
> There is no call into the epoll code when input queue is emptied, so
> we can't deactivate the wakeup-source until epoll_wait is called
> again. This also should be workable, but result in different stats.
>
> It does not look like the normal poll and select interfaces can be
> extended the same way (since they remove themselves from the
> wait-queue before returning to user-space), so user-space has to be

Yup, that is exactly why epoll is so well suited to this.

> changed to use epoll even if select or poll would be a better fit.

Either way, modification of application code is necessary, right?

> I don't know how many other drivers this would work for. The input
> driver will wake up user-space from the same thread or interrupt
> handler that queued the event, but other drivers may defer this to
> another thread which makes an epoll wakeup-source insufficient.

I don't understand how this would be insufficient. So long as the
interrupt causes the wakeup source to prevent the machine from suspending
before finishing interrupt handling does it matter whether the event
handling itself is deferred?

In case there's some confusion: I'm not saying that this idea will solve
all of the problems, especially:

> >> Of course, if
> >> they don't want the system to go into a sleep state _after_ all the
> >> events have been read from the queue, they have to use a separate
> >> mechanism that will prevent the system from doing that and it has
> >> to be activated before reading the first event (that also may be
> >> the
> >> last one).

(endquote)

>
> ...
> >> +     snprintf(name, sizeof(name), "%s-%d",
> >> +              dev_name(&evdev->dev), task_tgid_vnr(current));
> >
> > This does not look like it will work well with tasks in different pid
> > namespaces. What should happen, I think, is the wakeup_source should hold a
> > reference to either the struct pid of current or current itself. Then
> > when someone reads the file you should get the pid vnr in the reader's
> > pid namespace. That way instead of a bogus pid vnr 0 would show up if
> > "current" here is not in the reader's pid namepsace.
> >
>
> The pid here is only used for debugging purposes, and used less than
> the dev_name. I don't think tracking pid namespaces is worth the
> trouble here, so if this is a real problem we can just drop the pid
> from the name for now.

I think dropping the pid would be the best choice. If it's absolutely
necessary in the output then it should be made to work with pid namespaces
because the interface will be maintained forever.

Cheers,
-Matt

2012-02-28 01:13:41

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 4/7] Input / PM: Add ioctl to block suspend while event queue is not empty

On Monday, February 27, 2012, Matt Helsley wrote:
> On Sun, Feb 26, 2012 at 09:57:18PM +0100, Rafael J. Wysocki wrote:
> > On Friday, February 24, 2012, Matt Helsley wrote:
> > > On Wed, Feb 22, 2012 at 12:34:58AM +0100, Rafael J. Wysocki wrote:
> > > > From: Arve Hjønnevåg <[email protected]>
> > > >
> > > > Add a new ioctl, EVIOCSWAKEUPSRC, to attach a wakeup source object to
> > > > an evdev client event queue, such that it will be active whenever the
> > > > queue is not empty. Then, all events in the queue will be regarded
> > > > as wakeup events in progress and pm_get_wakeup_count() will block (or
> > > > return false if woken up by a signal) until they are removed from the
> > > > queue. In consequence, if the checking of wakeup events is enabled
> > > > (e.g. throught the /sys/power/wakeup_count interface), the system
> > > > won't be able to go into a sleep state until the queue is empty.
> > > >
> > > > This allows user space processes to handle situations in which they
> > > > want to do a select() on an evdev descriptor, so they go to sleep
> > > > until there are some events to read from the device's queue, and then
> > > > they don't want the system to go into a sleep state until all the
> > > > events are read (presumably for further processing). Of course, if
> > > > they don't want the system to go into a sleep state _after_ all the
> > > > events have been read from the queue, they have to use a separate
> > > > mechanism that will prevent the system from doing that and it has
> > > > to be activated before reading the first event (that also may be the
> > > > last one).
> > >
> > > I haven't seen this idea mentioned before but I must admit I haven't
> > > been following this thread too closely so apologies (and don't bother
> > > rehashing) if it has:
> > >
> > > Could you just add this to epoll so that any fd userspace chooses would be
> > > capable of doing this without introducing potentially ecclectic ioctl
> > > interfaces?
> > >
> > > struct epoll_event ev;
> > >
> > > epfd = epoll_create1(EPOLL_STAY_AWAKE_SET);
> > > ev.data.ptr = foo;
> > > epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
> > >
> > > Which could be useful because you can put one epollfd in another's epoll
> > > set. Or maybe as an EPOLLKEEPAWAKE flag in the event struct sort of like
> > > EPOLLET:
> > >
> > > epfd = epoll_create1(0);
> > > ev.events = EPOLLIN|EPOLLKEEPAWAKE;
> > > epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
> >
> > Do you mean something like the patch below, or something different?
>
> Yeah, this was sort of what I was thinking of. It nicely avoids the
> ioctl() bits. I guess my only issue is the fop mimics the epoll
> interface -- should it just be an fop to manage the file as a wakeup
> source rather than a generic hook into epoll?

I'm not exactly sure what you mean, could you be a bit more specific, please?

Rafael

2012-02-28 05:59:01

by Arve Hjønnevåg

[permalink] [raw]
Subject: Re: [RFC][PATCH 4/7] Input / PM: Add ioctl to block suspend while event queue is not empty

On Sun, Feb 26, 2012 at 12:57 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Friday, February 24, 2012, Matt Helsley wrote:
>> On Wed, Feb 22, 2012 at 12:34:58AM +0100, Rafael J. Wysocki wrote:
>> > From: Arve Hj?nnev?g <[email protected]>
>> >
>> > Add a new ioctl, EVIOCSWAKEUPSRC, to attach a wakeup source object to
>> > an evdev client event queue, such that it will be active whenever the
>> > queue is not empty. ?Then, all events in the queue will be regarded
>> > as wakeup events in progress and pm_get_wakeup_count() will block (or
>> > return false if woken up by a signal) until they are removed from the
>> > queue. ?In consequence, if the checking of wakeup events is enabled
>> > (e.g. throught the /sys/power/wakeup_count interface), the system
>> > won't be able to go into a sleep state until the queue is empty.
>> >
>> > This allows user space processes to handle situations in which they
>> > want to do a select() on an evdev descriptor, so they go to sleep
>> > until there are some events to read from the device's queue, and then
>> > they don't want the system to go into a sleep state until all the
>> > events are read (presumably for further processing). ?Of course, if
>> > they don't want the system to go into a sleep state _after_ all the
>> > events have been read from the queue, they have to use a separate
>> > mechanism that will prevent the system from doing that and it has
>> > to be activated before reading the first event (that also may be the
>> > last one).
>>
>> I haven't seen this idea mentioned before but I must admit I haven't
>> been following this thread too closely so apologies (and don't bother
>> rehashing) if it has:
>>
>> Could you just add this to epoll so that any fd userspace chooses would be
>> capable of doing this without introducing potentially ecclectic ioctl
>> interfaces?
>>
>> struct epoll_event ev;
>>
>> epfd = epoll_create1(EPOLL_STAY_AWAKE_SET);
>> ev.data.ptr = foo;
>> epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
>>
>> Which could be useful because you can put one epollfd in another's epoll
>> set. Or maybe as an EPOLLKEEPAWAKE flag in the event struct sort of like
>> EPOLLET:
>>
>> epfd = epoll_create1(0);
>> ev.events = EPOLLIN|EPOLLKEEPAWAKE;
>> epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
>
> Do you mean something like the patch below, or something different?
>
> Rafael
>
> ---

I don't think it is useful to tie an evdev implementation to epoll
that way. You just replaced the ioctl with a new control function.

The code below tries to implement the same flag without modifying
evdev at all. The behavior of this is different as it will keep the
device awake until user-space calls epoll_wait again. I also used an
extra wakeup source to handle the function that runs without the
spin_lock held which means that non-wakeup files in same epoll list
could abort suspend.

--
Arve Hj?nnev?g

----
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index f9cfd16..45af494 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -33,6 +33,7 @@
#include <linux/bitops.h>
#include <linux/mutex.h>
#include <linux/anon_inodes.h>
+#include <linux/device.h>
#include <asm/uaccess.h>
#include <asm/system.h>
#include <asm/io.h>
@@ -79,7 +80,7 @@
*/

/* Epoll private bits inside the event mask */
-#define EP_PRIVATE_BITS (EPOLLONESHOT | EPOLLET)
+#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET)

/* Maximum number of nesting allowed inside epoll sets */
#define EP_MAX_NESTS 4
@@ -146,6 +147,9 @@ struct epitem {
/* List header used to link this item to the "struct file" items list */
struct list_head fllink;

+ /* wakeup_source used when EPOLLWAKEUP is set */
+ struct wakeup_source *ws;
+
/* The structure that describe the interested events and the source fd */
struct epoll_event event;
};
@@ -186,6 +190,9 @@ struct eventpoll {
*/
struct epitem *ovflist;

+ /* wakeup_source used when ep_scan_ready_list is running */
+ struct wakeup_source *ws;
+
/* The user that created the eventpoll descriptor */
struct user_struct *user;
};
@@ -492,6 +499,7 @@ static int ep_scan_ready_list(struct eventpoll *ep,
* in a lockless way.
*/
spin_lock_irqsave(&ep->lock, flags);
+ __pm_stay_awake(ep->ws);
list_splice_init(&ep->rdllist, &txlist);
ep->ovflist = NULL;
spin_unlock_irqrestore(&ep->lock, flags);
@@ -515,9 +523,12 @@ static int ep_scan_ready_list(struct eventpoll *ep,
* queued into ->ovflist but the "txlist" might already
* contain them, and the list_splice() below takes care of them.
*/
- if (!ep_is_linked(&epi->rdllink))
+ if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);
+ }
}
+
/*
* We need to set back ep->ovflist to EP_UNACTIVE_PTR, so that after
* releasing the lock, events will be queued in the normal way inside
@@ -529,6 +540,7 @@ static int ep_scan_ready_list(struct eventpoll *ep,
* Quickly re-inject items left on "txlist".
*/
list_splice(&txlist, &ep->rdllist);
+ __pm_relax(ep->ws);

if (!list_empty(&ep->rdllist)) {
/*
@@ -583,6 +595,9 @@ static int ep_remove(struct eventpoll *ep, struct
epitem *epi)
list_del_init(&epi->rdllink);
spin_unlock_irqrestore(&ep->lock, flags);

+ if (epi->ws)
+ wakeup_source_unregister(epi->ws);
+
/* At this point it is safe to free the eventpoll item */
kmem_cache_free(epi_cache, epi);

@@ -633,6 +648,8 @@ static void ep_free(struct eventpoll *ep)
mutex_unlock(&epmutex);
mutex_destroy(&ep->mtx);
free_uid(ep->user);
+ if (ep->ws)
+ wakeup_source_unregister(ep->ws);
kfree(ep);
}

@@ -661,6 +678,7 @@ static int ep_read_events_proc(struct eventpoll
*ep, struct list_head *head,
* callback, but it's not actually ready, as far as
* caller requested events goes. We can remove it here.
*/
+ __pm_relax(epi->ws);
list_del_init(&epi->rdllink);
}
}
@@ -851,8 +869,10 @@ static int ep_poll_callback(wait_queue_t *wait,
unsigned mode, int sync, void *k
}

/* If this file is already in the ready list we exit soon */
- if (!ep_is_linked(&epi->rdllink))
+ if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);
+ }

/*
* Wake up ( if active ) both the eventpoll wait list and the ->poll()
@@ -915,6 +935,30 @@ static void ep_rbtree_insert(struct eventpoll
*ep, struct epitem *epi)
rb_insert_color(&epi->rbn, &ep->rbr);
}

+static int ep_create_wakeup_source(struct epitem *epi)
+{
+ const char *name;
+
+ if (!epi->ep->ws) {
+ epi->ep->ws = wakeup_source_register("eventpoll");
+ if (!epi->ep->ws)
+ return -ENOMEM;
+ }
+
+ name = epi->ffd.file->f_path.dentry->d_name.name;
+ epi->ws = wakeup_source_register(name);
+ if (!epi->ws)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static void ep_destroy_wakeup_source(struct epitem *epi)
+{
+ wakeup_source_unregister(epi->ws);
+ epi->ws = NULL;
+}
+
/*
* Must be called with "mtx" held.
*/
@@ -942,6 +986,13 @@ static int ep_insert(struct eventpoll *ep, struct
epoll_event *event,
epi->event = *event;
epi->nwait = 0;
epi->next = EP_UNACTIVE_PTR;
+ if (epi->event.events & EPOLLWAKEUP) {
+ error = ep_create_wakeup_source(epi);
+ if (error)
+ goto error_create_wakeup_source;
+ } else {
+ epi->ws = NULL;
+ }

/* Initialize the poll table using the queue callback */
epq.epi = epi;
@@ -982,6 +1033,7 @@ static int ep_insert(struct eventpoll *ep, struct
epoll_event *event,
/* If the file is already "ready" we drop it inside the ready list */
if ((revents & event->events) && !ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);

/* Notify waiting tasks that events are available */
if (waitqueue_active(&ep->wq))
@@ -1014,6 +1066,10 @@ error_unregister:
list_del_init(&epi->rdllink);
spin_unlock_irqrestore(&ep->lock, flags);

+ if (epi->ws)
+ wakeup_source_unregister(epi->ws);
+
+error_create_wakeup_source:
kmem_cache_free(epi_cache, epi);

return error;
@@ -1035,6 +1091,12 @@ static int ep_modify(struct eventpoll *ep,
struct epitem *epi, struct epoll_even
*/
epi->event.events = event->events;
epi->event.data = event->data; /* protected by mtx */
+ if (epi->event.events & EPOLLWAKEUP) {
+ if (!epi->ws)
+ ep_create_wakeup_source(epi);
+ } else if (epi->ws) {
+ ep_destroy_wakeup_source(epi);
+ }

/*
* Get current event bits. We can safely use the file* here because
@@ -1050,6 +1112,7 @@ static int ep_modify(struct eventpoll *ep,
struct epitem *epi, struct epoll_even
spin_lock_irq(&ep->lock);
if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);

/* Notify waiting tasks that events are available */
if (waitqueue_active(&ep->wq))
@@ -1085,6 +1148,7 @@ static int ep_send_events_proc(struct eventpoll
*ep, struct list_head *head,
!list_empty(head) && eventcnt < esed->maxevents;) {
epi = list_first_entry(head, struct epitem, rdllink);

+ __pm_relax(epi->ws);
list_del_init(&epi->rdllink);

revents = epi->ffd.file->f_op->poll(epi->ffd.file, NULL) &
@@ -1100,6 +1164,7 @@ static int ep_send_events_proc(struct eventpoll
*ep, struct list_head *head,
if (__put_user(revents, &uevent->events) ||
__put_user(epi->event.data, &uevent->data)) {
list_add(&epi->rdllink, head);
+ __pm_stay_awake(epi->ws);
return eventcnt ? eventcnt : -EFAULT;
}
eventcnt++;
@@ -1119,6 +1184,7 @@ static int ep_send_events_proc(struct eventpoll
*ep, struct list_head *head,
* poll callback will queue them in ep->ovflist.
*/
list_add_tail(&epi->rdllink, &ep->rdllist);
+ __pm_stay_awake(epi->ws);
}
}
}
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index f362733..cd156ff 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -26,6 +26,12 @@
#define EPOLL_CTL_DEL 2
#define EPOLL_CTL_MOD 3

+/*
+ * Request the handling of system wakeup events so as to prevent automatic
+ * system suspends from happening while those events are being processed.
+ */
+#define EPOLLWAKEUP (1 << 29)
+
/* Set the One Shot behaviour for the target file descriptor */
#define EPOLLONESHOT (1 << 30)

2012-02-28 10:24:18

by Srivatsa S. Bhat

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/7] PM: Implement autosleep and "wake locks", take2

On 02/26/2012 02:31 AM, Rafael J. Wysocki wrote:

>
> I think we can do something like in the updated patch [5/7] below.
>
> It uses a special wakeup source object called "autosleep" to bump up the
> number of wakeup events in progress before acquiring autosleep_lock in
> pm_autosleep_set_state(). This way, either pm_autosleep_set_state() will
> acquire autosleep_lock before try_to_suspend(), in which case the latter
> will see the change of autosleep_state immediately (after autosleep_lock has
> been passed to it), or try_to_suspend() will get it first, but then
> pm_save_wakeup_count() or pm_suspend()/hibernate() will see the nonzero counter
> of wakeup events in progress and return error code (sooner or later).
>
> The drawback is that writes to /sys/power/autosleep may interfere with
> the /sys/power/wakeup_count + /sys/power/state interface by interrupting
> transitions started by writing to /sys/power/state, for example (although
> I think that's highly unlikely).


Yes, but I think we can live with that.. It doesn't look like a big issue.

>
> Additionally, I made pm_autosleep_lock() use mutex_trylock_interruptible()


You have used mutex_lock_interruptible() in the code below.. It wouldn't matter
as long as you have used some form of "interruptible" but I think
mutex_trylock_interruptible would be even better..

> to prevent operations on /sys/power/wakeup_count and/or /sys/power/state
> from failing the freezing of tasks started by try_to_suspend().
>
> Thanks,
> Rafael
>


The approach taken by the patch below looks good to me. I don't see any obvious
problems, except for the minor ones listed below.

> ---
> From: Rafael J. Wysocki <[email protected]>
> Subject: PM / Sleep: Implement opportunistic sleep
>
> Introduce a mechanism by which the kernel can trigger global
> transitions to a sleep state chosen by user space if there are no
> active wakeup sources.
>
> It consists of a new sysfs attribute, /sys/power/autosleep, that
> can be written one of the strings returned by reads from
> /sys/power/state, an ordered workqueue and a work item carrying out
> the "suspend" operations. If a string representing the system's
> sleep state is written to /sys/power/autosleep, the work item
> triggering transitions to that state is queued up and it requeues
> itself after every execution until user space writes "off" to
> /sys/power/autosleep.
>
> That work item enables the detection of wakeup events using the
> functions already defined in drivers/base/power/wakeup.c (with one
> small modification) and calls either pm_suspend(), or hibernate() to
> put the system into a sleep state. If a wakeup event is reported
> while the transition is in progress, it will abort the transition and
> the "system suspend" work item will be queued up again.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> Index: linux/kernel/power/main.c
> ===================================================================
> --- linux.orig/kernel/power/main.c
> +++ linux/kernel/power/main.c
> @@ -269,8 +269,7 @@ static ssize_t state_show(struct kobject
> return (s - buf);
> }
>
> -static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
> - const char *buf, size_t n)
> +static suspend_state_t decode_state(const char *buf, size_t n)
> {
> #ifdef CONFIG_SUSPEND
> suspend_state_t state = PM_SUSPEND_STANDBY;
> @@ -278,27 +277,48 @@ static ssize_t state_store(struct kobjec
> #endif
> char *p;
> int len;
> - int error = -EINVAL;
>
> p = memchr(buf, '\n', n);
> len = p ? p - buf : n;
>
> - /* First, check if we are requested to hibernate */
> - if (len == 4 && !strncmp(buf, "disk", len)) {
> - error = hibernate();
> - goto Exit;
> - }
> + /* Check hibernation first. */
> + if (len == 4 && !strncmp(buf, "disk", len))
> + return PM_SUSPEND_MAX;
>
> #ifdef CONFIG_SUSPEND
> - for (s = &pm_states[state]; state < PM_SUSPEND_MAX; s++, state++) {
> - if (*s && len == strlen(*s) && !strncmp(buf, *s, len)) {
> - error = pm_suspend(state);
> - break;
> - }
> - }
> + for (s = &pm_states[state]; state < PM_SUSPEND_MAX; s++, state++)
> + if (*s && len == strlen(*s) && !strncmp(buf, *s, len))
> + return state;
> #endif
>
> - Exit:
> + return PM_SUSPEND_ON;
> +}
> +
> +static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
> + const char *buf, size_t n)
> +{
> + suspend_state_t state;
> + int error;
> +
> + error = pm_autosleep_lock();
> + if (error)
> + return error;
> +
> + if (pm_autosleep_state() > PM_SUSPEND_ON) {
> + error = -EBUSY;
> + goto out;
> + }
> +
> + state = decode_state(buf, n);
> + if (state < PM_SUSPEND_MAX)
> + error = pm_suspend(state);
> + else if (state > PM_SUSPEND_ON)
> + error = hibernate();
> + else
> + error = -EINVAL;


By the way, the condition checks in the above if-else block look kinda
odd, considering what is done in other similar places, which are more
readable. It would be great if you could make them consistent.

> +
> + out:
> + pm_autosleep_unlock();
> return error ? error : n;
> }
>
> @@ -339,7 +359,8 @@ static ssize_t wakeup_count_show(struct
> {
> unsigned int val;
>
> - return pm_get_wakeup_count(&val) ? sprintf(buf, "%u\n", val) : -EINTR;
> + return pm_get_wakeup_count(&val, true) ?
> + sprintf(buf, "%u\n", val) : -EINTR;
> }
>
> +
> +static ssize_t autosleep_store(struct kobject *kobj,
> + struct kobj_attribute *attr,
> + const char *buf, size_t n)
> +{
> + suspend_state_t state = decode_state(buf, n);
> + int error;
> +
> + if (state == PM_SUSPEND_ON && strncmp(buf, "off", 3)
> + && strncmp(buf, "off\n", 4))
> + return -EINVAL;
> +


I am pretty sure you meant "if autosleep is already off, and the user
wrote "off" to /sys/power/autosleep, then return -EINVAL"

But strncmp() returns 0 if the strings match, and hence the code above
doesn't seem to do what you intended.

> + error = pm_autosleep_set_state(state);
> + return error ? error : n;
> +}
> +
> +power_attr(autosleep);
> +#endif /* CONFIG_PM_AUTOSLEEP */
> #endif /* CONFIG_PM_SLEEP */
>
> #ifdef CONFIG_PM_TRACE


Regards,
Srivatsa S. Bhat

2012-05-03 00:23:21

by Arve Hjønnevåg

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/8] PM / Sleep: Implement opportunistic sleep

On Thu, Apr 26, 2012 at 2:52 PM, Rafael J. Wysocki <[email protected]> wrote:
...
> From: Rafael J. Wysocki <[email protected]>
> Subject: PM / Sleep: Implement opportunistic sleep, v2
>
> Introduce a mechanism by which the kernel can trigger global
> transitions to a sleep state chosen by user space if there are no
> active wakeup sources.
>
> It consists of a new sysfs attribute, /sys/power/autosleep, that
> can be written one of the strings returned by reads from
> /sys/power/state, an ordered workqueue and a work item carrying out
> the "suspend" operations. ?If a string representing the system's
> sleep state is written to /sys/power/autosleep, the work item
> triggering transitions to that state is queued up and it requeues
> itself after every execution until user space writes "off" to
> /sys/power/autosleep.
>

This does not work. Writing something other than "off" disabled auto
suspend for me.

...
> +static ssize_t autosleep_store(struct kobject *kobj,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?struct kobj_attribute *attr,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?const char *buf, size_t n)
> +{
> + ? ? ? suspend_state_t state = decode_state(buf, n);
> + ? ? ? int error;
> +
> + ? ? ? if (state == PM_SUSPEND_ON
> + ? ? ? ? ? && !(strncmp(buf, "off", 3) && strncmp(buf, "off\n", 4)))
> + ? ? ? ? ? ? ? return -EINVAL;

Did you mean:
if (state == PM_SUSPEND_ON
&& strcmp(buf, "off") && strcmp(buf, "off\n"))
return -EINVAL;

--
Arve Hj?nnev?g

2012-05-03 13:23:27

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/8] PM / Sleep: Implement opportunistic sleep

On Thursday, May 03, 2012, Arve Hj?nnev?g wrote:
> On Thu, Apr 26, 2012 at 2:52 PM, Rafael J. Wysocki <[email protected]> wrote:
> ...
> > From: Rafael J. Wysocki <[email protected]>
> > Subject: PM / Sleep: Implement opportunistic sleep, v2
> >
> > Introduce a mechanism by which the kernel can trigger global
> > transitions to a sleep state chosen by user space if there are no
> > active wakeup sources.
> >
> > It consists of a new sysfs attribute, /sys/power/autosleep, that
> > can be written one of the strings returned by reads from
> > /sys/power/state, an ordered workqueue and a work item carrying out
> > the "suspend" operations. If a string representing the system's
> > sleep state is written to /sys/power/autosleep, the work item
> > triggering transitions to that state is queued up and it requeues
> > itself after every execution until user space writes "off" to
> > /sys/power/autosleep.
> >
>
> This does not work. Writing something other than "off" disabled auto
> suspend for me.

My bad, sorry about that.

> ...
> > +static ssize_t autosleep_store(struct kobject *kobj,
> > + struct kobj_attribute *attr,
> > + const char *buf, size_t n)
> > +{
> > + suspend_state_t state = decode_state(buf, n);
> > + int error;
> > +
> > + if (state == PM_SUSPEND_ON
> > + && !(strncmp(buf, "off", 3) && strncmp(buf, "off\n", 4)))
> > + return -EINVAL;
>
> Did you mean:
> if (state == PM_SUSPEND_ON
> && strcmp(buf, "off") && strcmp(buf, "off\n"))
> return -EINVAL;


Yes, I did.

I'll add the following as an incremental patch on top of the series.

Thanks,
Rafael

---
kernel/power/main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/kernel/power/main.c
===================================================================
--- linux.orig/kernel/power/main.c
+++ linux/kernel/power/main.c
@@ -422,7 +422,7 @@ static ssize_t autosleep_store(struct ko
int error;

if (state == PM_SUSPEND_ON
- && !(strncmp(buf, "off", 3) && strncmp(buf, "off\n", 4)))
+ && strncmp(buf, "off", 3) && strncmp(buf, "off\n", 4))
return -EINVAL;

error = pm_autosleep_set_state(state);

2012-05-03 21:27:21

by Arve Hjønnevåg

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/8] PM / Sleep: Implement opportunistic sleep

On Thu, May 3, 2012 at 6:28 AM, Rafael J. Wysocki <[email protected]> wrote:
> On Thursday, May 03, 2012, Arve Hj?nnev?g wrote:
>> On Thu, Apr 26, 2012 at 2:52 PM, Rafael J. Wysocki <[email protected]> wrote:
>> ...
>> > From: Rafael J. Wysocki <[email protected]>
>> > Subject: PM / Sleep: Implement opportunistic sleep, v2
>> >
>> > Introduce a mechanism by which the kernel can trigger global
>> > transitions to a sleep state chosen by user space if there are no
>> > active wakeup sources.
>> >
>> > It consists of a new sysfs attribute, /sys/power/autosleep, that
>> > can be written one of the strings returned by reads from
>> > /sys/power/state, an ordered workqueue and a work item carrying out
>> > the "suspend" operations. ?If a string representing the system's
>> > sleep state is written to /sys/power/autosleep, the work item
>> > triggering transitions to that state is queued up and it requeues
>> > itself after every execution until user space writes "off" to
>> > /sys/power/autosleep.
>> >
>>
>> This does not work. Writing something other than "off" disabled auto
>> suspend for me.
>
> My bad, sorry about that.
>
>> ...
>> > +static ssize_t autosleep_store(struct kobject *kobj,
>> > + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?struct kobj_attribute *attr,
>> > + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?const char *buf, size_t n)
>> > +{
>> > + ? ? ? suspend_state_t state = decode_state(buf, n);
>> > + ? ? ? int error;
>> > +
>> > + ? ? ? if (state == PM_SUSPEND_ON
>> > + ? ? ? ? ? && !(strncmp(buf, "off", 3) && strncmp(buf, "off\n", 4)))
>> > + ? ? ? ? ? ? ? return -EINVAL;
>>
>> Did you mean:
>> ? ? ? if (state == PM_SUSPEND_ON
>> ? ? ? ? ? && strcmp(buf, "off") && strcmp(buf, "off\n"))
>> ? ? ? ? ? ? ? return -EINVAL;
>
>
> Yes, I did.
>
> I'll add the following as an incremental patch on top of the series.
>
> Thanks,
> Rafael
>
> ---
> ?kernel/power/main.c | ? ?2 +-
> ?1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux/kernel/power/main.c
> ===================================================================
> --- linux.orig/kernel/power/main.c
> +++ linux/kernel/power/main.c
> @@ -422,7 +422,7 @@ static ssize_t autosleep_store(struct ko
> ? ? ? ?int error;
>
> ? ? ? ?if (state == PM_SUSPEND_ON
> - ? ? ? ? ? && !(strncmp(buf, "off", 3) && strncmp(buf, "off\n", 4)))
> + ? ? ? ? ? && strncmp(buf, "off", 3) && strncmp(buf, "off\n", 4))
> ? ? ? ? ? ? ? ?return -EINVAL;
>
> ? ? ? ?error = pm_autosleep_set_state(state);

You still use strncmp here, so anything that starts with "off" is
allowed (and the second strncmp is redundant).

--
Arve Hj?nnev?g

2012-05-03 22:15:37

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/8] PM / Sleep: Implement opportunistic sleep

On Thursday, May 03, 2012, Arve Hj?nnev?g wrote:
> On Thu, May 3, 2012 at 6:28 AM, Rafael J. Wysocki <[email protected]> wrote:
> > On Thursday, May 03, 2012, Arve Hj?nnev?g wrote:
> >> On Thu, Apr 26, 2012 at 2:52 PM, Rafael J. Wysocki <[email protected]> wrote:
> >> ...
> >> > From: Rafael J. Wysocki <[email protected]>
> >> > Subject: PM / Sleep: Implement opportunistic sleep, v2
> >> >
> >> > Introduce a mechanism by which the kernel can trigger global
> >> > transitions to a sleep state chosen by user space if there are no
> >> > active wakeup sources.
> >> >
> >> > It consists of a new sysfs attribute, /sys/power/autosleep, that
> >> > can be written one of the strings returned by reads from
> >> > /sys/power/state, an ordered workqueue and a work item carrying out
> >> > the "suspend" operations. If a string representing the system's
> >> > sleep state is written to /sys/power/autosleep, the work item
> >> > triggering transitions to that state is queued up and it requeues
> >> > itself after every execution until user space writes "off" to
> >> > /sys/power/autosleep.
> >> >
> >>
> >> This does not work. Writing something other than "off" disabled auto
> >> suspend for me.
> >
> > My bad, sorry about that.
> >
> >> ...
> >> > +static ssize_t autosleep_store(struct kobject *kobj,
> >> > + struct kobj_attribute *attr,
> >> > + const char *buf, size_t n)
> >> > +{
> >> > + suspend_state_t state = decode_state(buf, n);
> >> > + int error;
> >> > +
> >> > + if (state == PM_SUSPEND_ON
> >> > + && !(strncmp(buf, "off", 3) && strncmp(buf, "off\n", 4)))
> >> > + return -EINVAL;
> >>
> >> Did you mean:
> >> if (state == PM_SUSPEND_ON
> >> && strcmp(buf, "off") && strcmp(buf, "off\n"))
> >> return -EINVAL;
> >
> >
> > Yes, I did.
> >
> > I'll add the following as an incremental patch on top of the series.
> >
> > Thanks,
> > Rafael
> >
> > ---
> > kernel/power/main.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > Index: linux/kernel/power/main.c
> > ===================================================================
> > --- linux.orig/kernel/power/main.c
> > +++ linux/kernel/power/main.c
> > @@ -422,7 +422,7 @@ static ssize_t autosleep_store(struct ko
> > int error;
> >
> > if (state == PM_SUSPEND_ON
> > - && !(strncmp(buf, "off", 3) && strncmp(buf, "off\n", 4)))
> > + && strncmp(buf, "off", 3) && strncmp(buf, "off\n", 4))
> > return -EINVAL;
> >
> > error = pm_autosleep_set_state(state);
>
> You still use strncmp here, so anything that starts with "off" is
> allowed (and the second strncmp is redundant).

Good point. So I'm going to add the patch below after all.
OK to add your sign-off to it?

Rafael


---
kernel/power/main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/kernel/power/main.c
===================================================================
--- linux.orig/kernel/power/main.c
+++ linux/kernel/power/main.c
@@ -422,7 +422,7 @@ static ssize_t autosleep_store(struct ko
int error;

if (state == PM_SUSPEND_ON
- && !(strncmp(buf, "off", 3) && strncmp(buf, "off\n", 4)))
+ && strcmp(buf, "off") && strcmp(buf, "off\n"))
return -EINVAL;

error = pm_autosleep_set_state(state);

2012-05-03 22:16:57

by Arve Hjønnevåg

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/8] PM / Sleep: Implement opportunistic sleep

On Thu, May 3, 2012 at 3:20 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Thursday, May 03, 2012, Arve Hj?nnev?g wrote:
>> On Thu, May 3, 2012 at 6:28 AM, Rafael J. Wysocki <[email protected]> wrote:
>> > On Thursday, May 03, 2012, Arve Hj?nnev?g wrote:
>> >> On Thu, Apr 26, 2012 at 2:52 PM, Rafael J. Wysocki <[email protected]> wrote:
>> >> ...
>> >> > From: Rafael J. Wysocki <[email protected]>
>> >> > Subject: PM / Sleep: Implement opportunistic sleep, v2
>> >> >
>> >> > Introduce a mechanism by which the kernel can trigger global
>> >> > transitions to a sleep state chosen by user space if there are no
>> >> > active wakeup sources.
>> >> >
>> >> > It consists of a new sysfs attribute, /sys/power/autosleep, that
>> >> > can be written one of the strings returned by reads from
>> >> > /sys/power/state, an ordered workqueue and a work item carrying out
>> >> > the "suspend" operations. ?If a string representing the system's
>> >> > sleep state is written to /sys/power/autosleep, the work item
>> >> > triggering transitions to that state is queued up and it requeues
>> >> > itself after every execution until user space writes "off" to
>> >> > /sys/power/autosleep.
>> >> >
>> >>
>> >> This does not work. Writing something other than "off" disabled auto
>> >> suspend for me.
>> >
>> > My bad, sorry about that.
>> >
>> >> ...
>> >> > +static ssize_t autosleep_store(struct kobject *kobj,
>> >> > + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?struct kobj_attribute *attr,
>> >> > + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?const char *buf, size_t n)
>> >> > +{
>> >> > + ? ? ? suspend_state_t state = decode_state(buf, n);
>> >> > + ? ? ? int error;
>> >> > +
>> >> > + ? ? ? if (state == PM_SUSPEND_ON
>> >> > + ? ? ? ? ? && !(strncmp(buf, "off", 3) && strncmp(buf, "off\n", 4)))
>> >> > + ? ? ? ? ? ? ? return -EINVAL;
>> >>
>> >> Did you mean:
>> >> ? ? ? if (state == PM_SUSPEND_ON
>> >> ? ? ? ? ? && strcmp(buf, "off") && strcmp(buf, "off\n"))
>> >> ? ? ? ? ? ? ? return -EINVAL;
>> >
>> >
>> > Yes, I did.
>> >
>> > I'll add the following as an incremental patch on top of the series.
>> >
>> > Thanks,
>> > Rafael
>> >
>> > ---
>> > ?kernel/power/main.c | ? ?2 +-
>> > ?1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > Index: linux/kernel/power/main.c
>> > ===================================================================
>> > --- linux.orig/kernel/power/main.c
>> > +++ linux/kernel/power/main.c
>> > @@ -422,7 +422,7 @@ static ssize_t autosleep_store(struct ko
>> > ? ? ? ?int error;
>> >
>> > ? ? ? ?if (state == PM_SUSPEND_ON
>> > - ? ? ? ? ? && !(strncmp(buf, "off", 3) && strncmp(buf, "off\n", 4)))
>> > + ? ? ? ? ? && strncmp(buf, "off", 3) && strncmp(buf, "off\n", 4))
>> > ? ? ? ? ? ? ? ?return -EINVAL;
>> >
>> > ? ? ? ?error = pm_autosleep_set_state(state);
>>
>> You still use strncmp here, so anything that starts with "off" is
>> allowed (and the second strncmp is redundant).
>
> Good point. ?So I'm going to add the patch below after all.
> OK to add your sign-off to it?
>

Yes.

--
Arve Hj?nnev?g

2012-05-03 22:19:51

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/8] PM / Sleep: Implement opportunistic sleep

On Friday, May 04, 2012, Arve Hj?nnev?g wrote:
> On Thu, May 3, 2012 at 3:20 PM, Rafael J. Wysocki <[email protected]> wrote:
> > On Thursday, May 03, 2012, Arve Hj?nnev?g wrote:
> >> On Thu, May 3, 2012 at 6:28 AM, Rafael J. Wysocki <[email protected]> wrote:
> >> > On Thursday, May 03, 2012, Arve Hj?nnev?g wrote:
> >> >> On Thu, Apr 26, 2012 at 2:52 PM, Rafael J. Wysocki <[email protected]> wrote:
> >> >> ...
> >> >> > From: Rafael J. Wysocki <[email protected]>
> >> >> > Subject: PM / Sleep: Implement opportunistic sleep, v2
> >> >> >
> >> >> > Introduce a mechanism by which the kernel can trigger global
> >> >> > transitions to a sleep state chosen by user space if there are no
> >> >> > active wakeup sources.
> >> >> >
> >> >> > It consists of a new sysfs attribute, /sys/power/autosleep, that
> >> >> > can be written one of the strings returned by reads from
> >> >> > /sys/power/state, an ordered workqueue and a work item carrying out
> >> >> > the "suspend" operations. If a string representing the system's
> >> >> > sleep state is written to /sys/power/autosleep, the work item
> >> >> > triggering transitions to that state is queued up and it requeues
> >> >> > itself after every execution until user space writes "off" to
> >> >> > /sys/power/autosleep.
> >> >> >
> >> >>
> >> >> This does not work. Writing something other than "off" disabled auto
> >> >> suspend for me.
> >> >
> >> > My bad, sorry about that.
> >> >
> >> >> ...
> >> >> > +static ssize_t autosleep_store(struct kobject *kobj,
> >> >> > + struct kobj_attribute *attr,
> >> >> > + const char *buf, size_t n)
> >> >> > +{
> >> >> > + suspend_state_t state = decode_state(buf, n);
> >> >> > + int error;
> >> >> > +
> >> >> > + if (state == PM_SUSPEND_ON
> >> >> > + && !(strncmp(buf, "off", 3) && strncmp(buf, "off\n", 4)))
> >> >> > + return -EINVAL;
> >> >>
> >> >> Did you mean:
> >> >> if (state == PM_SUSPEND_ON
> >> >> && strcmp(buf, "off") && strcmp(buf, "off\n"))
> >> >> return -EINVAL;
> >> >
> >> >
> >> > Yes, I did.
> >> >
> >> > I'll add the following as an incremental patch on top of the series.
> >> >
> >> > Thanks,
> >> > Rafael
> >> >
> >> > ---
> >> > kernel/power/main.c | 2 +-
> >> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >> >
> >> > Index: linux/kernel/power/main.c
> >> > ===================================================================
> >> > --- linux.orig/kernel/power/main.c
> >> > +++ linux/kernel/power/main.c
> >> > @@ -422,7 +422,7 @@ static ssize_t autosleep_store(struct ko
> >> > int error;
> >> >
> >> > if (state == PM_SUSPEND_ON
> >> > - && !(strncmp(buf, "off", 3) && strncmp(buf, "off\n", 4)))
> >> > + && strncmp(buf, "off", 3) && strncmp(buf, "off\n", 4))
> >> > return -EINVAL;
> >> >
> >> > error = pm_autosleep_set_state(state);
> >>
> >> You still use strncmp here, so anything that starts with "off" is
> >> allowed (and the second strncmp is redundant).
> >
> > Good point. So I'm going to add the patch below after all.
> > OK to add your sign-off to it?
> >
>
> Yes.

OK, done.