Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754578Ab3JJL4K (ORCPT ); Thu, 10 Oct 2013 07:56:10 -0400 Received: from mail-vb0-f42.google.com ([209.85.212.42]:36327 "EHLO mail-vb0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753035Ab3JJL4I (ORCPT ); Thu, 10 Oct 2013 07:56:08 -0400 MIME-Version: 1.0 In-Reply-To: <1572166.uWxBjPOJOn@vostro.rjw.lan> References: <1380148313-763-1-git-send-email-zoran.markovic@linaro.org> <1572166.uWxBjPOJOn@vostro.rjw.lan> Date: Thu, 10 Oct 2013 13:56:06 +0200 Message-ID: Subject: Re: [RFC PATCHv4] drivers: power: Detect device suspend/resume lockup and log event in pstore. From: Ulf Hansson To: "Rafael J. Wysocki" Cc: Zoran Markovic , "linux-kernel@vger.kernel.org" , "linux-pm@vger.kernel.org" , Benoit Goby , Android Kernel Team , Colin Cross , Todd Poynor , San Mehat , John Stultz , Pavel Machek , Len Brown , Greg Kroah-Hartman Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 8450 Lines: 252 On 10 October 2013 02:08, Rafael J. Wysocki wrote: > On Wednesday, September 25, 2013 03:31:53 PM Zoran Markovic wrote: >> From: Benoit Goby >> >> Rather than hard-lock the kernel, dump the suspend/resume thread stack and >> panic() to capture a message in pstore when a driver takes too long to >> suspend/resume. Default suspend/resume watchdog timeout is set to 12 >> seconds to be longer than the usbhid 10 second timeout, but could be >> changed at compile time. >> >> Exclude from the watchdog the time spent waiting for children that >> are resumed asynchronously and time every device, whether or not they >> resumed synchronously. >> >> This patch is targeted for mobile devices where a suspend/resume lockup >> could cause a system reboot. Information about failing device can be >> retrieved in subsequent boot session by mounting pstore and inspecting >> the log. Laptops with EFI-enabled pstore could also benefit from >> this feature. >> >> The hardware watchdog timer is likely suspended during this time and >> couldn't be relied upon. The soft-lockup detector would eventually tell >> that tasks are not scheduled, but would provide little context as to why. >> The patch hence uses system timer and assumes it is still active while the >> devices are suspended/resumed. >> >> This feature can be enabled/disabled during kernel configuration. >> >> Cc: Android Kernel Team >> Cc: Colin Cross >> Cc: Todd Poynor >> Cc: San Mehat >> Cc: Benoit Goby >> Cc: John Stultz >> Cc: Pavel Machek >> Cc: Rafael J. Wysocki >> Cc: Len Brown >> Cc: Greg Kroah-Hartman >> Original-author: San Mehat >> Signed-off-by: Benoit Goby >> [zoran.markovic@linaro.org: Changed printk(KERN_EMERG,...) to pr_emerg(...), >> tweaked commit message. Moved call to dpm_wd_set() before device_lock() in >> device_resume(). Minor changes to add compile-time inclusion of the feature.] >> Signed-off-by: Zoran Markovic > > I wonder if anyone who thinks that this is useful can ACK it? Hi Rafael, For whatever you think it is worth. :-) Acked-by: Ulf Hansson Also note, this feature has been most valuable when we were debugging suspend/resume hangs for the ux500 Android platforms. Kind regards Ulf Hansson > >> --- >> drivers/base/power/main.c | 68 +++++++++++++++++++++++++++++++++++++++++++++ >> kernel/power/Kconfig | 16 +++++++++++ >> 2 files changed, 84 insertions(+) >> >> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c >> index 9f098a8..9b7e6b6 100644 >> --- a/drivers/base/power/main.c >> +++ b/drivers/base/power/main.c >> @@ -30,6 +30,8 @@ >> #include >> #include >> #include >> +#include >> + >> #include "../base.h" >> #include "power.h" >> >> @@ -55,6 +57,12 @@ struct suspend_stats suspend_stats; >> static DEFINE_MUTEX(dpm_list_mtx); >> static pm_message_t pm_transition; >> >> +struct dpm_watchdog { >> + struct device *dev; >> + struct task_struct *tsk; >> + struct timer_list timer; >> +}; >> + >> static int async_error; >> >> static char *pm_verb(int event) >> @@ -390,6 +398,60 @@ static int dpm_run_callback(pm_callback_t cb, struct device *dev, >> return error; >> } >> >> +#ifdef CONFIG_DPM_WD >> +/** >> + * dpm_wd_handler - Driver suspend / resume watchdog handler. >> + * >> + * Called when a driver has timed out suspending or resuming. >> + * There's not much we can do here to recover so panic() to >> + * capture a crash-dump in pstore. >> + */ >> +static void dpm_wd_handler(unsigned long data) > > I'd prefer this to be called dpm_watchdog_handler() and similarly everywhere > below. > >> +{ >> + struct dpm_watchdog *wd = (void *)data; >> + >> + dev_emerg(wd->dev, "**** DPM device timeout ****\n"); >> + show_stack(wd->tsk, NULL); >> + panic("%s %s: unrecoverable failure\n", >> + dev_driver_string(wd->dev), dev_name(wd->dev)); >> +} >> + >> +/** >> + * dpm_wd_set - Enable pm watchdog for given device. >> + * @wd: Watchdog. Must be allocated on the stack. >> + * @dev: Device to handle. >> + */ >> +static void dpm_wd_set(struct dpm_watchdog *wd, struct device *dev) >> +{ >> + struct timer_list *timer = &wd->timer; >> + >> + wd->dev = dev; >> + wd->tsk = get_current(); >> + >> + init_timer_on_stack(timer); >> + /* use same timeout value for both suspend and resume */ >> + timer->expires = jiffies + HZ * CONFIG_DPM_WD_TIMEOUT; >> + timer->function = dpm_wd_handler; >> + timer->data = (unsigned long)wd; >> + add_timer(timer); >> +} >> + >> +/** >> + * dpm_wd_clear - Disable suspend/resume watchdog. >> + * @wd: Watchdog to disable. >> + */ >> +static void dpm_wd_clear(struct dpm_watchdog *wd) >> +{ >> + struct timer_list *timer = &wd->timer; >> + >> + del_timer_sync(timer); >> + destroy_timer_on_stack(timer); >> +} >> +#else >> +#define dpm_wd_set(x, y) >> +#define dpm_wd_clear(x) >> +#endif >> + >> /*------------------------- Resume routines -------------------------*/ >> >> /** >> @@ -576,6 +638,7 @@ static int device_resume(struct device *dev, pm_message_t state, bool async) >> pm_callback_t callback = NULL; >> char *info = NULL; >> int error = 0; >> + struct dpm_watchdog wd; >> >> TRACE_DEVICE(dev); >> TRACE_RESUME(0); >> @@ -584,6 +647,7 @@ static int device_resume(struct device *dev, pm_message_t state, bool async) >> goto Complete; >> >> dpm_wait(dev->parent, async); >> + dpm_wd_set(&wd, dev); >> device_lock(dev); >> >> /* >> @@ -642,6 +706,7 @@ static int device_resume(struct device *dev, pm_message_t state, bool async) >> >> Unlock: >> device_unlock(dev); >> + dpm_wd_clear(&wd); >> >> Complete: >> complete_all(&dev->power.completion); >> @@ -1060,6 +1125,7 @@ static int __device_suspend(struct device *dev, pm_message_t state, bool async) >> pm_callback_t callback = NULL; >> char *info = NULL; >> int error = 0; >> + struct dpm_watchdog wd; >> >> dpm_wait_for_children(dev, async); >> >> @@ -1083,6 +1149,7 @@ static int __device_suspend(struct device *dev, pm_message_t state, bool async) >> if (dev->power.syscore) >> goto Complete; >> >> + dpm_wd_set(&wd, dev); >> device_lock(dev); >> >> if (dev->pm_domain) { >> @@ -1139,6 +1206,7 @@ static int __device_suspend(struct device *dev, pm_message_t state, bool async) >> } >> >> device_unlock(dev); >> + dpm_wd_clear(&wd); >> >> Complete: >> complete_all(&dev->power.completion); >> diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig >> index d444c4e..6a6b763 100644 >> --- a/kernel/power/Kconfig >> +++ b/kernel/power/Kconfig >> @@ -178,6 +178,22 @@ config PM_SLEEP_DEBUG >> def_bool y >> depends on PM_DEBUG && PM_SLEEP >> >> +config DPM_WD > > Please expand this to DPM_WATCHDOG -> > >> + bool "Device suspend/resume watchdog" >> + depends on PM_DEBUG && PSTORE >> + ---help--- >> + Sets up a watchdog timer to capture drivers that are >> + locked up attempting to suspend/resume a device. >> + A detected lockup causes system panic with message >> + captured in pstore device for inspection in subsequent >> + boot session. >> + >> +config DPM_WD_TIMEOUT > > -> and this to DPM_WATCHDOG_TIMEOUT > >> + int "Watchdog timeout in seconds" >> + range 1 120 >> + default 12 >> + depends on DPM_WD >> + >> config PM_TRACE >> bool >> help > > Thanks! > > -- > I speak only for myself. > Rafael J. Wysocki, Intel Open Source Technology Center. > -- > To unsubscribe from this list: send the line "unsubscribe linux-pm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/