2013-04-30 22:29:39

by Zoran Markovic

[permalink] [raw]
Subject: [RFC PATCH] drivers: power: Add watchdog timer to catch drivers which lockup during suspend.

From: Benoit Goby <[email protected]>

Below is a patch from android kernel that detects a driver suspend
lockup and captures dump in the kernel log. Please review and provide
comments.

Rather than hard-lock the kernel, dump the suspend thread stack and
BUG() when a driver takes too long to suspend. The timeout is set to
12 seconds to be longer than the usbhid 10 second timeout.

Exclude from the watchdog the time spent waiting for children that
are resumed asynchronously and time every device, whether or not they
resumed synchronously.

Cc: Android Kernel Team <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Todd Poynor <[email protected]>
Cc: San Mehat <[email protected]>
Cc: Benoit Goby <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Original-author: San Mehat <[email protected]>
Signed-off-by: Benoit Goby <[email protected]>
[[email protected]: Changed printk(KERN_EMERG,...) to pr_emerg(...),
tweaked commit message.]
Signed-off-by: Zoran Markovic <[email protected]>
---
drivers/base/power/main.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 15beb50..eb70c0e 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -29,6 +29,8 @@
#include <linux/async.h>
#include <linux/suspend.h>
#include <linux/cpuidle.h>
+#include <linux/timer.h>
+
#include "../base.h"
#include "power.h"

@@ -54,6 +56,12 @@ struct suspend_stats suspend_stats;
static DEFINE_MUTEX(dpm_list_mtx);
static pm_message_t pm_transition;

+static void dpm_drv_timeout(unsigned long data);
+struct dpm_drv_wd_data {
+ struct device *dev;
+ struct task_struct *tsk;
+};
+
static int async_error;

/**
@@ -663,6 +671,30 @@ static bool is_async(struct device *dev)
}

/**
+ * dpm_drv_timeout - Driver suspend / resume watchdog handler
+ * @data: struct device which timed out
+ *
+ * Called when a driver has timed out suspending or resuming.
+ * There's not much we can do here to recover so
+ * BUG() out for a crash-dump
+ *
+ */
+static void dpm_drv_timeout(unsigned long data)
+{
+ struct dpm_drv_wd_data *wd_data = (void *)data;
+ struct device *dev = wd_data->dev;
+ struct task_struct *tsk = wd_data->tsk;
+
+ pr_emerg("**** DPM device timeout: %s (%s)\n", dev_name(dev),
+ (dev->driver ? dev->driver->name : "no driver"));
+
+ pr_emerg("dpm suspend stack:\n");
+ show_stack(tsk, NULL);
+
+ BUG();
+}
+
+/**
* dpm_resume - Execute "resume" callbacks for non-sysdev devices.
* @state: PM transition of the system being carried out.
*
@@ -1053,6 +1085,8 @@ static int __device_suspend(struct device *dev, pm_message_t state, bool async)
pm_callback_t callback = NULL;
char *info = NULL;
int error = 0;
+ struct timer_list timer;
+ struct dpm_drv_wd_data data;

dpm_wait_for_children(dev, async);

@@ -1076,6 +1110,14 @@ static int __device_suspend(struct device *dev, pm_message_t state, bool async)
if (dev->power.syscore)
goto Complete;

+ data.dev = dev;
+ data.tsk = get_current();
+ init_timer_on_stack(&timer);
+ timer.expires = jiffies + HZ * 12;
+ timer.function = dpm_drv_timeout;
+ timer.data = (unsigned long)&data;
+ add_timer(&timer);
+
device_lock(dev);

if (dev->pm_domain) {
@@ -1131,6 +1173,9 @@ static int __device_suspend(struct device *dev, pm_message_t state, bool async)

device_unlock(dev);

+ del_timer_sync(&timer);
+ destroy_timer_on_stack(&timer);
+
Complete:
complete_all(&dev->power.completion);
if (error)
--
1.7.9.5


2013-04-30 22:29:57

by Zoran Markovic

[permalink] [raw]
Subject: [RFC PATCH] power: Add option to log time spent in suspend

From: Colin Cross <[email protected]>

Below is a patch from android kernel that maintains a histogram of
suspend times. Please review and provide feedback.

Prints the time spent in suspend in the kernel log, and keeps statistics
on the time spent in suspend in /sys/kernel/debug/suspend_time

Cc: Android Kernel Team <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Todd Poynor <[email protected]>
Cc: San Mehat <[email protected]>
Cc: Benoit Goby <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Len Brown <[email protected]>
Signed-off-by: Colin Cross <[email protected]>
Signed-off-by: Todd Poynor <[email protected]>
[[email protected]: Re-formatted suspend time table to better fit
expected values, tweaked commit message]
Signed-off-by: Zoran Markovic <[email protected]>
---
kernel/power/Kconfig | 7 +++
kernel/power/Makefile | 1 +
kernel/power/suspend_time.c | 111 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 119 insertions(+)
create mode 100644 kernel/power/suspend_time.c

diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
index 5dfdc9e..edc8bdd 100644
--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -274,3 +274,10 @@ config PM_GENERIC_DOMAINS_RUNTIME
config CPU_PM
bool
depends on SUSPEND || CPU_IDLE
+
+config SUSPEND_TIME
+ bool "Log time spent in suspend"
+ ---help---
+ Prints the time spent in suspend in the kernel log, and
+ keeps statistics on the time spent in suspend in
+ /sys/kernel/debug/suspend_time
diff --git a/kernel/power/Makefile b/kernel/power/Makefile
index 29472bf..578e20e 100644
--- a/kernel/power/Makefile
+++ b/kernel/power/Makefile
@@ -11,5 +11,6 @@ obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o \
block_io.o
obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o
obj-$(CONFIG_PM_WAKELOCKS) += wakelock.o
+obj-$(CONFIG_SUSPEND_TIME) += suspend_time.o

obj-$(CONFIG_MAGIC_SYSRQ) += poweroff.o
diff --git a/kernel/power/suspend_time.c b/kernel/power/suspend_time.c
new file mode 100644
index 0000000..a613ede
--- /dev/null
+++ b/kernel/power/suspend_time.c
@@ -0,0 +1,111 @@
+/*
+ * debugfs file to track time spent in suspend
+ *
+ * Copyright (c) 2011, Google, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/debugfs.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/seq_file.h>
+#include <linux/syscore_ops.h>
+#include <linux/time.h>
+
+static struct timespec suspend_time_before;
+static unsigned int time_in_suspend_bins[32];
+
+#ifdef CONFIG_DEBUG_FS
+static int suspend_time_debug_show(struct seq_file *s, void *data)
+{
+ unsigned int bin;
+ seq_printf(s, " time (secs) count\n");
+ seq_printf(s, "------------------------------\n");
+ for (bin = 0; bin < 32; bin++) {
+ if (time_in_suspend_bins[bin] == 0)
+ continue;
+ seq_printf(s, "%10u - %-10u %4u\n",
+ bin ? 1 << (bin - 1) : 0, 1 << bin,
+ time_in_suspend_bins[bin]);
+ }
+ return 0;
+}
+
+static int suspend_time_debug_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, suspend_time_debug_show, NULL);
+}
+
+static const struct file_operations suspend_time_debug_fops = {
+ .open = suspend_time_debug_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static int __init suspend_time_debug_init(void)
+{
+ struct dentry *d;
+
+ d = debugfs_create_file("suspend_time", 0444, NULL, NULL,
+ &suspend_time_debug_fops);
+ if (!d) {
+ pr_err("Failed to create suspend_time debug file\n");
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+late_initcall(suspend_time_debug_init);
+#endif
+
+static int suspend_time_syscore_suspend(void)
+{
+ read_persistent_clock(&suspend_time_before);
+
+ return 0;
+}
+
+static void suspend_time_syscore_resume(void)
+{
+ struct timespec after;
+
+ read_persistent_clock(&after);
+
+ after = timespec_sub(after, suspend_time_before);
+
+ time_in_suspend_bins[fls(after.tv_sec)]++;
+
+ pr_info("Suspended for %lu.%03lu seconds\n", after.tv_sec,
+ after.tv_nsec / NSEC_PER_MSEC);
+}
+
+static struct syscore_ops suspend_time_syscore_ops = {
+ .suspend = suspend_time_syscore_suspend,
+ .resume = suspend_time_syscore_resume,
+};
+
+static int suspend_time_syscore_init(void)
+{
+ register_syscore_ops(&suspend_time_syscore_ops);
+
+ return 0;
+}
+
+static void suspend_time_syscore_exit(void)
+{
+ unregister_syscore_ops(&suspend_time_syscore_ops);
+}
+module_init(suspend_time_syscore_init);
+module_exit(suspend_time_syscore_exit);
--
1.7.9.5

2013-04-30 23:30:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC PATCH] drivers: power: Add watchdog timer to catch drivers which lockup during suspend.

On Tue, Apr 30, 2013 at 03:28:33PM -0700, Zoran Markovic wrote:
> From: Benoit Goby <[email protected]>
>
> Below is a patch from android kernel that detects a driver suspend
> lockup and captures dump in the kernel log. Please review and provide
> comments.

There's this really cool thing called a watchdog driver that does stuff
like this :)

> Rather than hard-lock the kernel, dump the suspend thread stack and
> BUG() when a driver takes too long to suspend. The timeout is set to
> 12 seconds to be longer than the usbhid 10 second timeout.
>
> Exclude from the watchdog the time spent waiting for children that
> are resumed asynchronously and time every device, whether or not they
> resumed synchronously.

No, don't add a driver-core-only timer, use the existing watchdog timers
if you are worried about the kernel locking up.

thanks,

greg k-h

2013-05-01 00:29:34

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC PATCH] power: Add option to log time spent in suspend

Hi!

> From: Colin Cross <[email protected]>
>
> Below is a patch from android kernel that maintains a histogram of
> suspend times. Please review and provide feedback.
>
> Prints the time spent in suspend in the kernel log, and keeps statistics
> on the time spent in suspend in /sys/kernel/debug/suspend_time
>
> Cc: Android Kernel Team <[email protected]>
> Cc: Colin Cross <[email protected]>
> Cc: Todd Poynor <[email protected]>
> Cc: San Mehat <[email protected]>
> Cc: Benoit Goby <[email protected]>
> Cc: John Stultz <[email protected]>
> Cc: Pavel Machek <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>
> Cc: Len Brown <[email protected]>
> Signed-off-by: Colin Cross <[email protected]>
> Signed-off-by: Todd Poynor <[email protected]>
> [[email protected]: Re-formatted suspend time table to better fit
> expected values, tweaked commit message]
> Signed-off-by: Zoran Markovic <[email protected]>

> +++ b/kernel/power/suspend_time.c
> @@ -0,0 +1,111 @@
> +/*
> + * debugfs file to track time spent in suspend
> + *
> + * Copyright (c) 2011, Google, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + */
> +
> +#include <linux/debugfs.h>
> +#include <linux/err.h>
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/seq_file.h>
> +#include <linux/syscore_ops.h>
> +#include <linux/time.h>
> +
> +static struct timespec suspend_time_before;
> +static unsigned int time_in_suspend_bins[32];
> +
> +#ifdef CONFIG_DEBUG_FS
> +static int suspend_time_debug_show(struct seq_file *s, void *data)
> +{
> + unsigned int bin;
> + seq_printf(s, " time (secs) count\n");
> + seq_printf(s, "------------------------------\n");
> + for (bin = 0; bin < 32; bin++) {
> + if (time_in_suspend_bins[bin] == 0)
> + continue;
> + seq_printf(s, "%10u - %-10u %4u\n",
> + bin ? 1 << (bin - 1) : 0, 1 << bin,
> + time_in_suspend_bins[bin]);
> + }
> + return 0;
> +}
> +
> +static int suspend_time_debug_open(struct inode *inode, struct file *file)
> +{
> + return single_open(file, suspend_time_debug_show, NULL);
> +}

Ok, whole new driver to get timing info... As we already have times in
printk, can't existing printks() get the same info? Maybe
Documentation/ file describing how to get that info from dmesg would
be enough?
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-05-01 00:31:03

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC PATCH] drivers: power: Add watchdog timer to catch drivers which lockup during suspend.

Hi!

> Below is a patch from android kernel that detects a driver suspend
> lockup and captures dump in the kernel log. Please review and provide
> comments.
>
> Rather than hard-lock the kernel, dump the suspend thread stack and
> BUG() when a driver takes too long to suspend. The timeout is set to
> 12 seconds to be longer than the usbhid 10 second timeout.
>
> Exclude from the watchdog the time spent waiting for children that
> are resumed asynchronously and time every device, whether or not they
> resumed synchronously.
>
> Cc: Android Kernel Team <[email protected]>
> Cc: Colin Cross <[email protected]>
> Cc: Todd Poynor <[email protected]>
> Cc: San Mehat <[email protected]>
> Cc: Benoit Goby <[email protected]>
> Cc: John Stultz <[email protected]>
> Cc: Pavel Machek <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>
> Cc: Len Brown <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Original-author: San Mehat <[email protected]>
> Signed-off-by: Benoit Goby <[email protected]>
> [[email protected]: Changed printk(KERN_EMERG,...) to pr_emerg(...),
> tweaked commit message.]
> Signed-off-by: Zoran Markovic <[email protected]>
> ---
> drivers/base/power/main.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 45 insertions(+)
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 15beb50..eb70c0e 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -29,6 +29,8 @@
> #include <linux/async.h>
> #include <linux/suspend.h>
> #include <linux/cpuidle.h>
> +#include <linux/timer.h>
> +
> #include "../base.h"
> #include "power.h"
>
> @@ -54,6 +56,12 @@ struct suspend_stats suspend_stats;
> static DEFINE_MUTEX(dpm_list_mtx);
> static pm_message_t pm_transition;
>
> +static void dpm_drv_timeout(unsigned long data);
> +struct dpm_drv_wd_data {
> + struct device *dev;
> + struct task_struct *tsk;
> +};
> +
> static int async_error;
>
> /**
> @@ -663,6 +671,30 @@ static bool is_async(struct device *dev)
> }
>
> /**
> + * dpm_drv_timeout - Driver suspend / resume watchdog handler
> + * @data: struct device which timed out
> + *
> + * Called when a driver has timed out suspending or resuming.
> + * There's not much we can do here to recover so
> + * BUG() out for a crash-dump
> + *
> + */
> +static void dpm_drv_timeout(unsigned long data)
> +{
> + struct dpm_drv_wd_data *wd_data = (void *)data;
> + struct device *dev = wd_data->dev;
> + struct task_struct *tsk = wd_data->tsk;
> +
> + pr_emerg("**** DPM device timeout: %s (%s)\n", dev_name(dev),
> + (dev->driver ? dev->driver->name : "no driver"));
> +
> + pr_emerg("dpm suspend stack:\n");
> + show_stack(tsk, NULL);
> +
> + BUG();
> +}

So you:

dump stack of the suspend task

do BUG which
dumps stack of current task
kills current task

Current task may very well be idle task; in such case you kill the
machine. Sounds like you should be doing something else, like kill -9
instead of BUG()?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-05-01 03:29:28

by Colin Cross

[permalink] [raw]
Subject: Re: [RFC PATCH] power: Add option to log time spent in suspend

On Tue, Apr 30, 2013 at 5:29 PM, Pavel Machek <[email protected]> wrote:
> Hi!
>
>> From: Colin Cross <[email protected]>
>>
>> Below is a patch from android kernel that maintains a histogram of
>> suspend times. Please review and provide feedback.
>>
>> Prints the time spent in suspend in the kernel log, and keeps statistics
>> on the time spent in suspend in /sys/kernel/debug/suspend_time
>>
>> Cc: Android Kernel Team <[email protected]>
>> Cc: Colin Cross <[email protected]>
>> Cc: Todd Poynor <[email protected]>
>> Cc: San Mehat <[email protected]>
>> Cc: Benoit Goby <[email protected]>
>> Cc: John Stultz <[email protected]>
>> Cc: Pavel Machek <[email protected]>
>> Cc: Rafael J. Wysocki <[email protected]>
>> Cc: Len Brown <[email protected]>
>> Signed-off-by: Colin Cross <[email protected]>
>> Signed-off-by: Todd Poynor <[email protected]>
>> [[email protected]: Re-formatted suspend time table to better fit
>> expected values, tweaked commit message]
>> Signed-off-by: Zoran Markovic <[email protected]>
>
>> +++ b/kernel/power/suspend_time.c
>> @@ -0,0 +1,111 @@
>> +/*
>> + * debugfs file to track time spent in suspend
>> + *
>> + * Copyright (c) 2011, Google, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
>> + * more details.
>> + */
>> +
>> +#include <linux/debugfs.h>
>> +#include <linux/err.h>
>> +#include <linux/init.h>
>> +#include <linux/kernel.h>
>> +#include <linux/seq_file.h>
>> +#include <linux/syscore_ops.h>
>> +#include <linux/time.h>
>> +
>> +static struct timespec suspend_time_before;
>> +static unsigned int time_in_suspend_bins[32];
>> +
>> +#ifdef CONFIG_DEBUG_FS
>> +static int suspend_time_debug_show(struct seq_file *s, void *data)
>> +{
>> + unsigned int bin;
>> + seq_printf(s, " time (secs) count\n");
>> + seq_printf(s, "------------------------------\n");
>> + for (bin = 0; bin < 32; bin++) {
>> + if (time_in_suspend_bins[bin] == 0)
>> + continue;
>> + seq_printf(s, "%10u - %-10u %4u\n",
>> + bin ? 1 << (bin - 1) : 0, 1 << bin,
>> + time_in_suspend_bins[bin]);
>> + }
>> + return 0;
>> +}
>> +
>> +static int suspend_time_debug_open(struct inode *inode, struct file *file)
>> +{
>> + return single_open(file, suspend_time_debug_show, NULL);
>> +}
>
> Ok, whole new driver to get timing info... As we already have times in
> printk, can't existing printks() get the same info? Maybe
> Documentation/ file describing how to get that info from dmesg would
> be enough?

If this were to be merged I assume it would be rewritten directly into
kernel/power/suspend.c instead of using syscore ops. That was just
done to ease future merges while it was out of tree.

dmesg is not the solution, it's not a stable api, it would require
constant parsing to make sure you didn't miss data before it fell out
of the ringbuffer, and sched_clock often does not (and should not)
tick during suspend, so printk times don't show the time spent
suspended.

2013-05-01 03:36:28

by Colin Cross

[permalink] [raw]
Subject: Re: [RFC PATCH] drivers: power: Add watchdog timer to catch drivers which lockup during suspend.

On Tue, Apr 30, 2013 at 4:30 PM, Greg Kroah-Hartman
<[email protected]> wrote:
> On Tue, Apr 30, 2013 at 03:28:33PM -0700, Zoran Markovic wrote:
>> From: Benoit Goby <[email protected]>
>>
>> Below is a patch from android kernel that detects a driver suspend
>> lockup and captures dump in the kernel log. Please review and provide
>> comments.
>
> There's this really cool thing called a watchdog driver that does stuff
> like this :)

If the watchdog driver worked in this case this patch wouldn't exist.

>> Rather than hard-lock the kernel, dump the suspend thread stack and
>> BUG() when a driver takes too long to suspend. The timeout is set to
>> 12 seconds to be longer than the usbhid 10 second timeout.
>>
>> Exclude from the watchdog the time spent waiting for children that
>> are resumed asynchronously and time every device, whether or not they
>> resumed synchronously.
>
> No, don't add a driver-core-only timer, use the existing watchdog timers
> if you are worried about the kernel locking up.

The watchdog timers are useless here. For one, they generally stop
when their driver suspend op is called, so you may not even have one
running when you lock up. More importantly, the purpose of this patch
is to tell you which driver locked up and hopefully why, and the
watchdog driver will usually result in a silent reset. This patch
will cause a stack trace of the driver suspend op that is blocking
suspend progress, even if that call does not happen in the suspend
thread.

2013-05-01 03:39:55

by Colin Cross

[permalink] [raw]
Subject: Re: [RFC PATCH] drivers: power: Add watchdog timer to catch drivers which lockup during suspend.

On Tue, Apr 30, 2013 at 5:30 PM, Pavel Machek <[email protected]> wrote:
> Hi!
>
>> Below is a patch from android kernel that detects a driver suspend
>> lockup and captures dump in the kernel log. Please review and provide
>> comments.
>>
>> Rather than hard-lock the kernel, dump the suspend thread stack and
>> BUG() when a driver takes too long to suspend. The timeout is set to
>> 12 seconds to be longer than the usbhid 10 second timeout.
>>
>> Exclude from the watchdog the time spent waiting for children that
>> are resumed asynchronously and time every device, whether or not they
>> resumed synchronously.
>>
>> Cc: Android Kernel Team <[email protected]>
>> Cc: Colin Cross <[email protected]>
>> Cc: Todd Poynor <[email protected]>
>> Cc: San Mehat <[email protected]>
>> Cc: Benoit Goby <[email protected]>
>> Cc: John Stultz <[email protected]>
>> Cc: Pavel Machek <[email protected]>
>> Cc: Rafael J. Wysocki <[email protected]>
>> Cc: Len Brown <[email protected]>
>> Cc: Greg Kroah-Hartman <[email protected]>
>> Original-author: San Mehat <[email protected]>
>> Signed-off-by: Benoit Goby <[email protected]>
>> [[email protected]: Changed printk(KERN_EMERG,...) to pr_emerg(...),
>> tweaked commit message.]
>> Signed-off-by: Zoran Markovic <[email protected]>
>> ---
>> drivers/base/power/main.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 45 insertions(+)
>>
>> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
>> index 15beb50..eb70c0e 100644
>> --- a/drivers/base/power/main.c
>> +++ b/drivers/base/power/main.c
>> @@ -29,6 +29,8 @@
>> #include <linux/async.h>
>> #include <linux/suspend.h>
>> #include <linux/cpuidle.h>
>> +#include <linux/timer.h>
>> +
>> #include "../base.h"
>> #include "power.h"
>>
>> @@ -54,6 +56,12 @@ struct suspend_stats suspend_stats;
>> static DEFINE_MUTEX(dpm_list_mtx);
>> static pm_message_t pm_transition;
>>
>> +static void dpm_drv_timeout(unsigned long data);
>> +struct dpm_drv_wd_data {
>> + struct device *dev;
>> + struct task_struct *tsk;
>> +};
>> +
>> static int async_error;
>>
>> /**
>> @@ -663,6 +671,30 @@ static bool is_async(struct device *dev)
>> }
>>
>> /**
>> + * dpm_drv_timeout - Driver suspend / resume watchdog handler
>> + * @data: struct device which timed out
>> + *
>> + * Called when a driver has timed out suspending or resuming.
>> + * There's not much we can do here to recover so
>> + * BUG() out for a crash-dump
>> + *
>> + */
>> +static void dpm_drv_timeout(unsigned long data)
>> +{
>> + struct dpm_drv_wd_data *wd_data = (void *)data;
>> + struct device *dev = wd_data->dev;
>> + struct task_struct *tsk = wd_data->tsk;
>> +
>> + pr_emerg("**** DPM device timeout: %s (%s)\n", dev_name(dev),
>> + (dev->driver ? dev->driver->name : "no driver"));
>> +
>> + pr_emerg("dpm suspend stack:\n");
>> + show_stack(tsk, NULL);
>> +
>> + BUG();
>> +}
>
> So you:
>
> dump stack of the suspend task
It dumps the stack of the suspend task if the suspend callback is run
synchronously, or the async task if the suspend op is run
asynchronously.

> do BUG which
> dumps stack of current task
> kills current task
>
> Current task may very well be idle task; in such case you kill the
> machine. Sounds like you should be doing something else, like kill -9
> instead of BUG()?

Not much else you can do, you are stuck part way into suspend with a
driver's suspend callback half executed. All userspace tasks are
frozen, and the suspend task is blocked indefinitely.

2013-05-01 04:17:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC PATCH] drivers: power: Add watchdog timer to catch drivers which lockup during suspend.

On Tue, Apr 30, 2013 at 08:36:21PM -0700, Colin Cross wrote:
> On Tue, Apr 30, 2013 at 4:30 PM, Greg Kroah-Hartman
> <[email protected]> wrote:
> > On Tue, Apr 30, 2013 at 03:28:33PM -0700, Zoran Markovic wrote:
> >> From: Benoit Goby <[email protected]>
> >>
> >> Below is a patch from android kernel that detects a driver suspend
> >> lockup and captures dump in the kernel log. Please review and provide
> >> comments.
> >
> > There's this really cool thing called a watchdog driver that does stuff
> > like this :)
>
> If the watchdog driver worked in this case this patch wouldn't exist.

Great, let's fix the watchdog timer then :)

What's wrong with it?

> >> Rather than hard-lock the kernel, dump the suspend thread stack and
> >> BUG() when a driver takes too long to suspend. The timeout is set to
> >> 12 seconds to be longer than the usbhid 10 second timeout.
> >>
> >> Exclude from the watchdog the time spent waiting for children that
> >> are resumed asynchronously and time every device, whether or not they
> >> resumed synchronously.
> >
> > No, don't add a driver-core-only timer, use the existing watchdog timers
> > if you are worried about the kernel locking up.
>
> The watchdog timers are useless here. For one, they generally stop
> when their driver suspend op is called, so you may not even have one
> running when you lock up.

But you can fix that, right?

> More importantly, the purpose of this patch is to tell you which
> driver locked up and hopefully why, and the watchdog driver will
> usually result in a silent reset.

I thought it was an option as to what the watchdog does when it
triggers.

> This patch will cause a stack trace of the driver suspend op that is
> blocking suspend progress, even if that call does not happen in the
> suspend thread.

But who can see this, the machine is now dead.

greg k-h

2013-05-01 04:39:48

by Colin Cross

[permalink] [raw]
Subject: Re: [RFC PATCH] drivers: power: Add watchdog timer to catch drivers which lockup during suspend.

On Tue, Apr 30, 2013 at 9:17 PM, Greg Kroah-Hartman
<[email protected]> wrote:
> On Tue, Apr 30, 2013 at 08:36:21PM -0700, Colin Cross wrote:
>> On Tue, Apr 30, 2013 at 4:30 PM, Greg Kroah-Hartman
>> <[email protected]> wrote:
>> > On Tue, Apr 30, 2013 at 03:28:33PM -0700, Zoran Markovic wrote:
>> >> From: Benoit Goby <[email protected]>
>> >>
>> >> Below is a patch from android kernel that detects a driver suspend
>> >> lockup and captures dump in the kernel log. Please review and provide
>> >> comments.
>> >
>> > There's this really cool thing called a watchdog driver that does stuff
>> > like this :)
>>
>> If the watchdog driver worked in this case this patch wouldn't exist.
>
> Great, let's fix the watchdog timer then :)
>
> What's wrong with it?
>
>> >> Rather than hard-lock the kernel, dump the suspend thread stack and
>> >> BUG() when a driver takes too long to suspend. The timeout is set to
>> >> 12 seconds to be longer than the usbhid 10 second timeout.
>> >>
>> >> Exclude from the watchdog the time spent waiting for children that
>> >> are resumed asynchronously and time every device, whether or not they
>> >> resumed synchronously.
>> >
>> > No, don't add a driver-core-only timer, use the existing watchdog timers
>> > if you are worried about the kernel locking up.
>>
>> The watchdog timers are useless here. For one, they generally stop
>> when their driver suspend op is called, so you may not even have one
>> running when you lock up.
>
> But you can fix that, right?

Ah, you're talking about the lockup detectors, and not drivers/watchdog.

The hardlockup detector can tell you if timer interrupts are not
firing, which is unaffected by this patch since the timer wouldn't
fire any way. The softlockup detector could eventually tell you that
tasks were not being scheduled, but not why. Even panic on softlockup
will only get you the stack trace of the current task, which will be
the locked up task if it is spinning, but is likely to be the idle
task if the suspend task is blocked on a wait_event. This patch will
give the stack trace of the suspend operation that is blocked, even if
it is an asynchronous suspend callback.

>> More importantly, the purpose of this patch is to tell you which
>> driver locked up and hopefully why, and the watchdog driver will
>> usually result in a silent reset.
>
> I thought it was an option as to what the watchdog does when it
> triggers.
>
>> This patch will cause a stack trace of the driver suspend op that is
>> blocking suspend progress, even if that call does not happen in the
>> suspend thread.
>
> But who can see this, the machine is now dead.

I'm not sure what might still be working in this situation on x86, but
on ARM the machine is dead anyways. Some random subset of drivers are
suspended, so you probably have no hardware watchdog, no console, no
video. kexec on panic, kgdb on panic, console messages saved in
pstore, or jtag are the only options I know of. This patch is very
useful in conjunction with pstore console.

2013-05-01 05:15:06

by Colin Cross

[permalink] [raw]
Subject: Re: [RFC PATCH] drivers: power: Add watchdog timer to catch drivers which lockup during suspend.

On Tue, Apr 30, 2013 at 9:57 PM, anish singh
<[email protected]> wrote:
>
>
>
> On Wed, May 1, 2013 at 10:09 AM, Colin Cross <[email protected]> wrote:
>>
>> On Tue, Apr 30, 2013 at 9:17 PM, Greg Kroah-Hartman
>> <[email protected]> wrote:
>> > On Tue, Apr 30, 2013 at 08:36:21PM -0700, Colin Cross wrote:
>> >> On Tue, Apr 30, 2013 at 4:30 PM, Greg Kroah-Hartman
>> >> <[email protected]> wrote:
>> >> > On Tue, Apr 30, 2013 at 03:28:33PM -0700, Zoran Markovic wrote:
>> >> >> From: Benoit Goby <[email protected]>
>> >> >>
>> >> >> Below is a patch from android kernel that detects a driver suspend
>> >> >> lockup and captures dump in the kernel log. Please review and
>> >> >> provide
>> >> >> comments.
>> >> >
>> >> > There's this really cool thing called a watchdog driver that does
>> >> > stuff
>> >> > like this :)
>> >>
>> >> If the watchdog driver worked in this case this patch wouldn't exist.
>> >
>> > Great, let's fix the watchdog timer then :)
>> >
>> > What's wrong with it?
>> >
>> >> >> Rather than hard-lock the kernel, dump the suspend thread stack and
>> >> >> BUG() when a driver takes too long to suspend. The timeout is set
>> >> >> to
>> >> >> 12 seconds to be longer than the usbhid 10 second timeout.
>> >> >>
>> >> >> Exclude from the watchdog the time spent waiting for children that
>> >> >> are resumed asynchronously and time every device, whether or not
>> >> >> they
>> >> >> resumed synchronously.
>> >> >
>> >> > No, don't add a driver-core-only timer, use the existing watchdog
>> >> > timers
>> >> > if you are worried about the kernel locking up.
>> >>
>> >> The watchdog timers are useless here. For one, they generally stop
>> >> when their driver suspend op is called, so you may not even have one
>> >> running when you lock up.
>> >
>> > But you can fix that, right?
>>
>> Ah, you're talking about the lockup detectors, and not drivers/watchdog.
>>
>> The hardlockup detector can tell you if timer interrupts are not
>> firing, which is unaffected by this patch since the timer wouldn't
>> fire any way. The softlockup detector could eventually tell you that
>
> I was wondering if timers don't fire then how is your dpm_drv_timeout
> function gets called?

That's what I meant - this patch doesn't do anything if timers are not firing.

>>
>> tasks were not being scheduled, but not why. Even panic on softlockup
>> will only get you the stack trace of the current task, which will be
>> the locked up task if it is spinning, but is likely to be the idle
>> task if the suspend task is blocked on a wait_event. This patch will
>> give the stack trace of the suspend operation that is blocked, even if
>> it is an asynchronous suspend callback.
>
> ......but not when timers itself is not firing right?
>>
>> ...but not when ti
>> >> More importantly, the purpose of this patch is to tell you which
>> >> driver locked up and hopefully why, and the watchdog driver will
>> >> usually result in a silent reset.
>> >
>> > I thought it was an option as to what the watchdog does when it
>> > triggers.
>> >
>> >> This patch will cause a stack trace of the driver suspend op that is
>> >> blocking suspend progress, even if that call does not happen in the
>> >> suspend thread.
>> >
>> > But who can see this, the machine is now dead.
>>
>> I'm not sure what might still be working in this situation on x86, but
>> on ARM the machine is dead anyways. Some random subset of drivers are
>> suspended, so you probably have no hardware watchdog, no console, no
>> video. kexec on panic, kgdb on panic, console messages saved in
>> pstore, or jtag are the only options I know of. This patch is very
>> useful in conjunction with pstore console.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>
>

2013-05-01 10:56:39

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC PATCH] drivers: power: Add watchdog timer to catch drivers which lockup during suspend.

Hi!

> >> @@ -663,6 +671,30 @@ static bool is_async(struct device *dev)
> >> }
> >>
> >> /**
> >> + * dpm_drv_timeout - Driver suspend / resume watchdog handler
> >> + * @data: struct device which timed out
> >> + *
> >> + * Called when a driver has timed out suspending or resuming.
> >> + * There's not much we can do here to recover so
> >> + * BUG() out for a crash-dump
> >> + *
> >> + */
> >> +static void dpm_drv_timeout(unsigned long data)
> >> +{
> >> + struct dpm_drv_wd_data *wd_data = (void *)data;
> >> + struct device *dev = wd_data->dev;
> >> + struct task_struct *tsk = wd_data->tsk;
> >> +
> >> + pr_emerg("**** DPM device timeout: %s (%s)\n", dev_name(dev),
> >> + (dev->driver ? dev->driver->name : "no driver"));
> >> +
> >> + pr_emerg("dpm suspend stack:\n");
> >> + show_stack(tsk, NULL);
> >> +
> >> + BUG();
> >> +}
> >
> > So you:
> >
> > dump stack of the suspend task
> It dumps the stack of the suspend task if the suspend callback is run
> synchronously, or the async task if the suspend op is run
> asynchronously.

Lets call that [a]suspend task.

> > do BUG which
> > dumps stack of current task
> > kills current task
> >
> > Current task may very well be idle task; in such case you kill the
> > machine. Sounds like you should be doing something else, like kill -9
> > instead of BUG()?
>
> Not much else you can do, you are stuck part way into suspend with a
> driver's suspend callback half executed. All userspace tasks are
> frozen, and the suspend task is blocked indefinitely.

Yes, there's better option. Attempt killing the [a]suspend task,
instead of killing the current task.

Try putting mdelay(100000) into suspend path. Your patch will do the
wrong thing in that case (actually turning debuggable problem into
undebuggable one).
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-05-01 16:10:58

by Colin Cross

[permalink] [raw]
Subject: Re: [RFC PATCH] drivers: power: Add watchdog timer to catch drivers which lockup during suspend.

On Wed, May 1, 2013 at 3:56 AM, Pavel Machek <[email protected]> wrote:
> Hi!
>
>> >> @@ -663,6 +671,30 @@ static bool is_async(struct device *dev)
>> >> }
>> >>
>> >> /**
>> >> + * dpm_drv_timeout - Driver suspend / resume watchdog handler
>> >> + * @data: struct device which timed out
>> >> + *
>> >> + * Called when a driver has timed out suspending or resuming.
>> >> + * There's not much we can do here to recover so
>> >> + * BUG() out for a crash-dump
>> >> + *
>> >> + */
>> >> +static void dpm_drv_timeout(unsigned long data)
>> >> +{
>> >> + struct dpm_drv_wd_data *wd_data = (void *)data;
>> >> + struct device *dev = wd_data->dev;
>> >> + struct task_struct *tsk = wd_data->tsk;
>> >> +
>> >> + pr_emerg("**** DPM device timeout: %s (%s)\n", dev_name(dev),
>> >> + (dev->driver ? dev->driver->name : "no driver"));
>> >> +
>> >> + pr_emerg("dpm suspend stack:\n");
>> >> + show_stack(tsk, NULL);
>> >> +
>> >> + BUG();
>> >> +}
>> >
>> > So you:
>> >
>> > dump stack of the suspend task
>> It dumps the stack of the suspend task if the suspend callback is run
>> synchronously, or the async task if the suspend op is run
>> asynchronously.
>
> Lets call that [a]suspend task.
>
>> > do BUG which
>> > dumps stack of current task
>> > kills current task
>> >
>> > Current task may very well be idle task; in such case you kill the
>> > machine. Sounds like you should be doing something else, like kill -9
>> > instead of BUG()?
>>
>> Not much else you can do, you are stuck part way into suspend with a
>> driver's suspend callback half executed. All userspace tasks are
>> frozen, and the suspend task is blocked indefinitely.
>
> Yes, there's better option. Attempt killing the [a]suspend task,
> instead of killing the current task.

That will leave you in a completely undefined state. If you just kill
the task, you are likely to kill the synchronous suspend task, which
is the task that would resume your drivers and unfreeze tasks. That
will leave you with no userspace tasks running, and much of your
hardware suspended. How is that a useful result? If you somehow
respawn a resume thread to resume whatever hardware you can and
unfreeze tasks, you still have the hardware that was suspending when
it was killed in a bad state, and probably has locks held, so you're
just going to deadlock or crash soon after.

> Try putting mdelay(100000) into suspend path. Your patch will do the
> wrong thing in that case (actually turning debuggable problem into
> undebuggable one).

I'm not saying this patch as is is right for everyone (it probably at
least needs to be configurable to be turned off, change the delay, and
change the panic to just a stack trace), but from a mobile perspective
this patch is far more debuggable than without this patch. We work
very hard to make sure that panic's are highly debuggable, in fact we
often prefer panics over any other behavior when the device is in a
bad state, because it immediately gets the user's device working again
while still giving us useful information in our automatic log
collection.

With an mdelay(100000) in the suspend path, users in our debug device
pool are likely to just pull the battery because their screen won't
turn on, in which case I get no debugging information. With this
patch, the device will automatically reboot due to the panic, and I
will get captured logs after reboot that show a stack trace ending
with
mdelay, which tells me exactly where to look for this mdelay(100000).

2013-05-01 16:24:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC PATCH] drivers: power: Add watchdog timer to catch drivers which lockup during suspend.

On Wed, May 01, 2013 at 09:10:49AM -0700, Colin Cross wrote:
> I'm not saying this patch as is is right for everyone (it probably at
> least needs to be configurable to be turned off, change the delay, and
> change the panic to just a stack trace),

Those changes would be nice.

> but from a mobile perspective
> this patch is far more debuggable than without this patch. We work
> very hard to make sure that panic's are highly debuggable, in fact we
> often prefer panics over any other behavior when the device is in a
> bad state, because it immediately gets the user's device working again
> while still giving us useful information in our automatic log
> collection.
>
> With an mdelay(100000) in the suspend path, users in our debug device
> pool are likely to just pull the battery because their screen won't
> turn on, in which case I get no debugging information. With this
> patch, the device will automatically reboot due to the panic, and I
> will get captured logs after reboot that show a stack trace ending
> with mdelay, which tells me exactly where to look for this
> mdelay(100000).

All of that information would be _great_ to have in the changelog for
the patch, as it explains exactly why you need this. {hint}

thanks,

greg k-h

2013-05-02 12:27:51

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC PATCH] power: Add option to log time spent in suspend

Hi!

> >> +#include <linux/debugfs.h>
> >> +#include <linux/err.h>
> >> +#include <linux/init.h>
> >> +#include <linux/kernel.h>
> >> +#include <linux/seq_file.h>
> >> +#include <linux/syscore_ops.h>
> >> +#include <linux/time.h>
> >> +
> >> +static struct timespec suspend_time_before;
> >> +static unsigned int time_in_suspend_bins[32];
> >> +
> >> +#ifdef CONFIG_DEBUG_FS
> >> +static int suspend_time_debug_show(struct seq_file *s, void *data)
> >> +{
> >> + unsigned int bin;
> >> + seq_printf(s, " time (secs) count\n");
> >> + seq_printf(s, "------------------------------\n");
> >> + for (bin = 0; bin < 32; bin++) {
> >> + if (time_in_suspend_bins[bin] == 0)
> >> + continue;
> >> + seq_printf(s, "%10u - %-10u %4u\n",
> >> + bin ? 1 << (bin - 1) : 0, 1 << bin,
> >> + time_in_suspend_bins[bin]);
> >> + }
> >> + return 0;
> >> +}
> >> +
> >> +static int suspend_time_debug_open(struct inode *inode, struct file *file)
> >> +{
> >> + return single_open(file, suspend_time_debug_show, NULL);
> >> +}
> >
> > Ok, whole new driver to get timing info... As we already have times in
> > printk, can't existing printks() get the same info? Maybe
> > Documentation/ file describing how to get that info from dmesg would
> > be enough?
>
> If this were to be merged I assume it would be rewritten directly into
> kernel/power/suspend.c instead of using syscore ops. That was just
> done to ease future merges while it was out of tree.
>
> dmesg is not the solution, it's not a stable api, it would require
> constant parsing to make sure you didn't miss data before it fell out
> of the ringbuffer, and sched_clock often does not (and should not)
> tick during suspend, so printk times don't show the time spent
> suspended.

If timing info printed is invalid, that should be fixed.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-05-02 12:30:38

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC PATCH] drivers: power: Add watchdog timer to catch drivers which lockup during suspend.

Hi!

> >> > do BUG which
> >> > dumps stack of current task
> >> > kills current task
> >> >
> >> > Current task may very well be idle task; in such case you kill the
> >> > machine. Sounds like you should be doing something else, like kill -9
> >> > instead of BUG()?
> >>
> >> Not much else you can do, you are stuck part way into suspend with a
> >> driver's suspend callback half executed. All userspace tasks are
> >> frozen, and the suspend task is blocked indefinitely.
> >
> > Yes, there's better option. Attempt killing the [a]suspend task,
> > instead of killing the current task.
>
> That will leave you in a completely undefined state. If you just kill
> the task, you are likely to kill the synchronous suspend task, which
> is the task that would resume your drivers and unfreeze tasks. That
> will leave you with no userspace tasks running, and much of your
> hardware suspended. How is that a useful result? If you somehow

So instead you kill random task? (BUG() from timer kills pretty much
random task, right?)

If you want to do panic(), do panic().
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-05-02 18:25:16

by Colin Cross

[permalink] [raw]
Subject: Re: [RFC PATCH] drivers: power: Add watchdog timer to catch drivers which lockup during suspend.

On Thu, May 2, 2013 at 5:30 AM, Pavel Machek <[email protected]> wrote:
> Hi!
>
>> >> > do BUG which
>> >> > dumps stack of current task
>> >> > kills current task
>> >> >
>> >> > Current task may very well be idle task; in such case you kill the
>> >> > machine. Sounds like you should be doing something else, like kill -9
>> >> > instead of BUG()?
>> >>
>> >> Not much else you can do, you are stuck part way into suspend with a
>> >> driver's suspend callback half executed. All userspace tasks are
>> >> frozen, and the suspend task is blocked indefinitely.
>> >
>> > Yes, there's better option. Attempt killing the [a]suspend task,
>> > instead of killing the current task.
>>
>> That will leave you in a completely undefined state. If you just kill
>> the task, you are likely to kill the synchronous suspend task, which
>> is the task that would resume your drivers and unfreeze tasks. That
>> will leave you with no userspace tasks running, and much of your
>> hardware suspended. How is that a useful result? If you somehow
>
> So instead you kill random task? (BUG() from timer kills pretty much
> random task, right?)
>
> If you want to do panic(), do panic().

At least on ARM a BUG() in an interrupt or softirq always results in a
panic, but this can be switched to directly call panic.

2013-05-02 18:29:07

by Colin Cross

[permalink] [raw]
Subject: Re: [RFC PATCH] power: Add option to log time spent in suspend

On Thu, May 2, 2013 at 5:27 AM, Pavel Machek <[email protected]> wrote:
> Hi!
>
>> >> +#include <linux/debugfs.h>
>> >> +#include <linux/err.h>
>> >> +#include <linux/init.h>
>> >> +#include <linux/kernel.h>
>> >> +#include <linux/seq_file.h>
>> >> +#include <linux/syscore_ops.h>
>> >> +#include <linux/time.h>
>> >> +
>> >> +static struct timespec suspend_time_before;
>> >> +static unsigned int time_in_suspend_bins[32];
>> >> +
>> >> +#ifdef CONFIG_DEBUG_FS
>> >> +static int suspend_time_debug_show(struct seq_file *s, void *data)
>> >> +{
>> >> + unsigned int bin;
>> >> + seq_printf(s, " time (secs) count\n");
>> >> + seq_printf(s, "------------------------------\n");
>> >> + for (bin = 0; bin < 32; bin++) {
>> >> + if (time_in_suspend_bins[bin] == 0)
>> >> + continue;
>> >> + seq_printf(s, "%10u - %-10u %4u\n",
>> >> + bin ? 1 << (bin - 1) : 0, 1 << bin,
>> >> + time_in_suspend_bins[bin]);
>> >> + }
>> >> + return 0;
>> >> +}
>> >> +
>> >> +static int suspend_time_debug_open(struct inode *inode, struct file *file)
>> >> +{
>> >> + return single_open(file, suspend_time_debug_show, NULL);
>> >> +}
>> >
>> > Ok, whole new driver to get timing info... As we already have times in
>> > printk, can't existing printks() get the same info? Maybe
>> > Documentation/ file describing how to get that info from dmesg would
>> > be enough?
>>
>> If this were to be merged I assume it would be rewritten directly into
>> kernel/power/suspend.c instead of using syscore ops. That was just
>> done to ease future merges while it was out of tree.
>>
>> dmesg is not the solution, it's not a stable api, it would require
>> constant parsing to make sure you didn't miss data before it fell out
>> of the ringbuffer, and sched_clock often does not (and should not)
>> tick during suspend, so printk times don't show the time spent
>> suspended.
>
> If timing info printed is invalid, that should be fixed.

It's not invalid, its just not measuring the time that is useful here.
printk is measuring something similar to (but not exactly the same
as) CLOCK_MONOTONIC, but this is measuring something similar to
CLOCK_BOOTTIME.

2013-05-02 18:58:27

by John Stultz

[permalink] [raw]
Subject: Re: [RFC PATCH] power: Add option to log time spent in suspend

On 05/02/2013 11:29 AM, Colin Cross wrote:
> On Thu, May 2, 2013 at 5:27 AM, Pavel Machek <[email protected]> wrote:
>>
>>> dmesg is not the solution, it's not a stable api, it would require
>>> constant parsing to make sure you didn't miss data before it fell out
>>> of the ringbuffer, and sched_clock often does not (and should not)
>>> tick during suspend, so printk times don't show the time spent
>>> suspended.
>> If timing info printed is invalid, that should be fixed.
> It's not invalid, its just not measuring the time that is useful here.
> printk is measuring something similar to (but not exactly the same
> as) CLOCK_MONOTONIC, but this is measuring something similar to
> CLOCK_BOOTTIME.

One thing I'm curious about with this patch, since we measure and keep
track of suspend time in the timekeeping core, there may be enough
justification to adding this sort of debugging info the timekeeping core
itself.

Although what is being tracked with this driver isn't just total time in
suspend, but looks like more of a logarithmic histogram of suspend
times. Is there any context you can provide as to why that particular
format was chosen (which would could be added to help improve the commit
message)?

thanks
-john

2013-05-02 19:11:53

by Colin Cross

[permalink] [raw]
Subject: Re: [RFC PATCH] power: Add option to log time spent in suspend

On Thu, May 2, 2013 at 11:58 AM, John Stultz <[email protected]> wrote:
> On 05/02/2013 11:29 AM, Colin Cross wrote:
>>
>> On Thu, May 2, 2013 at 5:27 AM, Pavel Machek <[email protected]> wrote:
>>>
>>>
>>>> dmesg is not the solution, it's not a stable api, it would require
>>>> constant parsing to make sure you didn't miss data before it fell out
>>>> of the ringbuffer, and sched_clock often does not (and should not)
>>>> tick during suspend, so printk times don't show the time spent
>>>> suspended.
>>>
>>> If timing info printed is invalid, that should be fixed.
>>
>> It's not invalid, its just not measuring the time that is useful here.
>> printk is measuring something similar to (but not exactly the same
>> as) CLOCK_MONOTONIC, but this is measuring something similar to
>> CLOCK_BOOTTIME.
>
>
> One thing I'm curious about with this patch, since we measure and keep track
> of suspend time in the timekeeping core, there may be enough justification
> to adding this sort of debugging info the timekeeping core itself.
>
> Although what is being tracked with this driver isn't just total time in
> suspend, but looks like more of a logarithmic histogram of suspend times. Is
> there any context you can provide as to why that particular format was
> chosen (which would could be added to help improve the commit message)?

The idea was to keep a minimal amount of data while still being able
to see how often the device is waking up.