2024-06-06 08:52:51

by Khasnis Soumya

[permalink] [raw]
Subject: [PATCH v3] driver core: Add timeout for device shutdown

The device shutdown callbacks invoked during shutdown/reboot
are prone to errors depending on the device state or mishandling
by one or more driver. In order to prevent a device hang in such
scenarios, we bail out after a timeout while dumping a meaningful
call trace of the shutdown callback to kernel logs, which blocks
the shutdown or reboot process.

Signed-off-by: Soumya Khasnis <[email protected]>
Signed-off-by: Srinavasa Nagaraju <[email protected]>
---
Changes in v3:
-fix review comments
-updated commit message

drivers/base/Kconfig | 18 ++++++++++++++++++
drivers/base/base.h | 8 ++++++++
drivers/base/core.c | 40 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 66 insertions(+)

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 2b8fd6bb7da0..342d3f87a404 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -243,3 +243,21 @@ config FW_DEVLINK_SYNC_STATE_TIMEOUT
work on.

endmenu
+
+config DEVICE_SHUTDOWN_TIMEOUT
+ bool "device shutdown timeout"
+ default y
+ help
+ Enable timeout for device shutdown. In case of device shutdown is
+ broken or device is not responding, system shutdown or restart may hang.
+ This timeout handles such situation and triggers emergency_restart or
+ machine_power_off. Also dumps call trace of shutdown process.
+
+
+config DEVICE_SHUTDOWN_TIMEOUT_SEC
+ int "device shutdown timeout in seconds"
+ range 10 60
+ default 10
+ depends on DEVICE_SHUTDOWN_TIMEOUT
+ help
+ sets time for device shutdown timeout in seconds
diff --git a/drivers/base/base.h b/drivers/base/base.h
index 0738ccad08b2..97eea57a8868 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -243,3 +243,11 @@ static inline int devtmpfs_delete_node(struct device *dev) { return 0; }

void software_node_notify(struct device *dev);
void software_node_notify_remove(struct device *dev);
+
+#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
+struct device_shutdown_timeout {
+ struct timer_list timer;
+ struct task_struct *task;
+};
+#define SHUTDOWN_TIMEOUT CONFIG_DEVICE_SHUTDOWN_TIMEOUT_SEC
+#endif
diff --git a/drivers/base/core.c b/drivers/base/core.c
index b93f3c5716ae..dab455054a80 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -35,6 +35,12 @@
#include "base.h"
#include "physical_location.h"
#include "power/power.h"
+#include <linux/sched/debug.h>
+#include <linux/reboot.h>
+
+#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
+struct device_shutdown_timeout devs_shutdown;
+#endif

/* Device links support. */
static LIST_HEAD(deferred_sync);
@@ -4799,6 +4805,38 @@ int device_change_owner(struct device *dev, kuid_t kuid, kgid_t kgid)
}
EXPORT_SYMBOL_GPL(device_change_owner);

+#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
+static void device_shutdown_timeout_handler(struct timer_list *t)
+{
+ pr_emerg("**** device shutdown timeout ****\n");
+ show_stack(devs_shutdown.task, NULL, KERN_EMERG);
+ if (system_state == SYSTEM_RESTART)
+ emergency_restart();
+ else
+ machine_power_off();
+}
+
+static void device_shutdown_timer_set(void)
+{
+ devs_shutdown.task = current;
+ timer_setup(&devs_shutdown.timer, device_shutdown_timeout_handler, 0);
+ devs_shutdown.timer.expires = jiffies + SHUTDOWN_TIMEOUT * HZ;
+ add_timer(&devs_shutdown.timer);
+}
+
+static void device_shutdown_timer_clr(void)
+{
+ del_timer(&devs_shutdown.timer);
+}
+#else
+static inline void device_shutdown_timer_set(void)
+{
+}
+static inline void device_shutdown_timer_clr(void)
+{
+}
+#endif
+
/**
* device_shutdown - call ->shutdown() on each device to shutdown.
*/
@@ -4810,6 +4848,7 @@ void device_shutdown(void)
device_block_probing();

cpufreq_suspend();
+ device_shutdown_timer_set();

spin_lock(&devices_kset->list_lock);
/*
@@ -4869,6 +4908,7 @@ void device_shutdown(void)
spin_lock(&devices_kset->list_lock);
}
spin_unlock(&devices_kset->list_lock);
+ device_shutdown_timer_clr();
}

/*
--
2.40.0



2024-06-06 15:53:48

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v3] driver core: Add timeout for device shutdown

On 06/06/2024 10:50, Soumya Khasnis wrote:
> The device shutdown callbacks invoked during shutdown/reboot
> are prone to errors depending on the device state or mishandling
> by one or more driver. In order to prevent a device hang in such
> scenarios, we bail out after a timeout while dumping a meaningful
> call trace of the shutdown callback to kernel logs, which blocks
> the shutdown or reboot process.

Is that not somehow already achieved by the watchdog mechanism ?

> Signed-off-by: Soumya Khasnis <[email protected]>
> Signed-off-by: Srinavasa Nagaraju <[email protected]>
> ---
> Changes in v3:
> -fix review comments
> -updated commit message
>
> drivers/base/Kconfig | 18 ++++++++++++++++++
> drivers/base/base.h | 8 ++++++++
> drivers/base/core.c | 40 ++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 66 insertions(+)
>
> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index 2b8fd6bb7da0..342d3f87a404 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -243,3 +243,21 @@ config FW_DEVLINK_SYNC_STATE_TIMEOUT
> work on.
>
> endmenu
> +
> +config DEVICE_SHUTDOWN_TIMEOUT
> + bool "device shutdown timeout"
> + default y
> + help
> + Enable timeout for device shutdown. In case of device shutdown is
> + broken or device is not responding, system shutdown or restart may hang.
> + This timeout handles such situation and triggers emergency_restart or
> + machine_power_off. Also dumps call trace of shutdown process.
> +
> +
> +config DEVICE_SHUTDOWN_TIMEOUT_SEC
> + int "device shutdown timeout in seconds"
> + range 10 60
> + default 10

How do you know the shutdown time is between this range?

What about large systems ?

> + depends on DEVICE_SHUTDOWN_TIMEOUT
> + help
> + sets time for device shutdown timeout in seconds
> diff --git a/drivers/base/base.h b/drivers/base/base.h
> index 0738ccad08b2..97eea57a8868 100644
> --- a/drivers/base/base.h
> +++ b/drivers/base/base.h
> @@ -243,3 +243,11 @@ static inline int devtmpfs_delete_node(struct device *dev) { return 0; }
>
> void software_node_notify(struct device *dev);
> void software_node_notify_remove(struct device *dev);
> +
> +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> +struct device_shutdown_timeout {
> + struct timer_list timer;
> + struct task_struct *task;
> +};
> +#define SHUTDOWN_TIMEOUT CONFIG_DEVICE_SHUTDOWN_TIMEOUT_SEC
> +#endif
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index b93f3c5716ae..dab455054a80 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -35,6 +35,12 @@
> #include "base.h"
> #include "physical_location.h"
> #include "power/power.h"
> +#include <linux/sched/debug.h>
> +#include <linux/reboot.h>
> +
> +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> +struct device_shutdown_timeout devs_shutdown;
> +#endif
>
> /* Device links support. */
> static LIST_HEAD(deferred_sync);
> @@ -4799,6 +4805,38 @@ int device_change_owner(struct device *dev, kuid_t kuid, kgid_t kgid)
> }
> EXPORT_SYMBOL_GPL(device_change_owner);
>
> +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> +static void device_shutdown_timeout_handler(struct timer_list *t)
> +{
> + pr_emerg("**** device shutdown timeout ****\n");
> + show_stack(devs_shutdown.task, NULL, KERN_EMERG);
> + if (system_state == SYSTEM_RESTART)
> + emergency_restart();
> + else
> + machine_power_off();
> +}

So if one device is misbehaving, all the others shutdown callbacks are
skipped with emergency halt/reboot ? That is prone to break the system, no?

> +static void device_shutdown_timer_set(void)
> +{
> + devs_shutdown.task = current;
> + timer_setup(&devs_shutdown.timer, device_shutdown_timeout_handler, 0);
> + devs_shutdown.timer.expires = jiffies + SHUTDOWN_TIMEOUT * HZ;
> + add_timer(&devs_shutdown.timer);
> +}
> +
> +static void device_shutdown_timer_clr(void)
> +{
> + del_timer(&devs_shutdown.timer);
> +}
> +#else
> +static inline void device_shutdown_timer_set(void)
> +{
> +}
> +static inline void device_shutdown_timer_clr(void)
> +{
> +}
> +#endif
> +
> /**
> * device_shutdown - call ->shutdown() on each device to shutdown.
> */
> @@ -4810,6 +4848,7 @@ void device_shutdown(void)
> device_block_probing();
>
> cpufreq_suspend();
> + device_shutdown_timer_set();
>
> spin_lock(&devices_kset->list_lock);
> /*
> @@ -4869,6 +4908,7 @@ void device_shutdown(void)
> spin_lock(&devices_kset->list_lock);
> }
> spin_unlock(&devices_kset->list_lock);
> + device_shutdown_timer_clr();
> }
>
> /*

--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


2024-06-06 20:52:22

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3] driver core: Add timeout for device shutdown

On Thu, Jun 06, 2024 at 08:50:03AM +0000, Soumya Khasnis wrote:
> The device shutdown callbacks invoked during shutdown/reboot
> are prone to errors depending on the device state or mishandling
> by one or more driver. In order to prevent a device hang in such
> scenarios, we bail out after a timeout while dumping a meaningful
> call trace of the shutdown callback to kernel logs, which blocks
> the shutdown or reboot process.
>
> Signed-off-by: Soumya Khasnis <[email protected]>
> Signed-off-by: Srinavasa Nagaraju <[email protected]>
> ---
> Changes in v3:
> -fix review comments

What ones? In what way? This really doesn't help, sorry. Would you
like to see this type of response if you were to review other people's
code?

> -updated commit message

In what way?

confused,

greg k-h

2024-06-07 00:04:37

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v3] driver core: Add timeout for device shutdown

Hi Soumya,

kernel test robot noticed the following build warnings:

[auto build test WARNING on driver-core/driver-core-testing]
[also build test WARNING on driver-core/driver-core-next driver-core/driver-core-linus linus/master v6.10-rc2 next-20240606]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Soumya-Khasnis/driver-core-Add-timeout-for-device-shutdown/20240606-165454
base: driver-core/driver-core-testing
patch link: https://lore.kernel.org/r/20240606085003.GA118950%40sony.com
patch subject: [PATCH v3] driver core: Add timeout for device shutdown
config: x86_64-randconfig-121-20240607 (https://download.01.org/0day-ci/archive/20240607/[email protected]/config)
compiler: gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240607/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

sparse warnings: (new ones prefixed by >>)
>> drivers/base/core.c:42:32: sparse: sparse: symbol 'devs_shutdown' was not declared. Should it be static?
drivers/base/core.c: note: in included file (through include/linux/resource_ext.h, include/linux/acpi.h):
include/linux/list.h:83:21: sparse: sparse: self-comparison always evaluates to true
include/linux/list.h:83:21: sparse: sparse: self-comparison always evaluates to true
include/linux/list.h:83:21: sparse: sparse: self-comparison always evaluates to true
include/linux/list.h:83:21: sparse: sparse: self-comparison always evaluates to true
include/linux/list.h:83:21: sparse: sparse: self-comparison always evaluates to true
include/linux/list.h:83:21: sparse: sparse: self-comparison always evaluates to true
include/linux/list.h:83:21: sparse: sparse: self-comparison always evaluates to true
include/linux/list.h:83:21: sparse: sparse: self-comparison always evaluates to true
include/linux/list.h:83:21: sparse: sparse: self-comparison always evaluates to true
include/linux/list.h:83:21: sparse: sparse: self-comparison always evaluates to true

vim +/devs_shutdown +42 drivers/base/core.c

40
41 #ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> 42 struct device_shutdown_timeout devs_shutdown;
43 #endif
44

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2024-06-07 11:38:50

by Khasnis Soumya

[permalink] [raw]
Subject: Re: [PATCH v3] driver core: Add timeout for device shutdown

On Thu, Jun 06, 2024 at 05:23:19PM +0200, Daniel Lezcano wrote:
> On 06/06/2024 10:50, Soumya Khasnis wrote:
> > The device shutdown callbacks invoked during shutdown/reboot
> > are prone to errors depending on the device state or mishandling
> > by one or more driver. In order to prevent a device hang in such
> > scenarios, we bail out after a timeout while dumping a meaningful
> > call trace of the shutdown callback to kernel logs, which blocks
> > the shutdown or reboot process.
>
> Is that not somehow already achieved by the watchdog mechanism ?
The hard or software watchdog enabled by config_lockup_detector couldn’t
detect the cases when stalled on IO wait (wait_for_completion/io)

>
> > Signed-off-by: Soumya Khasnis <[email protected]>
> > Signed-off-by: Srinavasa Nagaraju <[email protected]>
> > ---
> > Changes in v3:
> > -fix review comments
> > -updated commit message
> >
> > drivers/base/Kconfig | 18 ++++++++++++++++++
> > drivers/base/base.h | 8 ++++++++
> > drivers/base/core.c | 40 ++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 66 insertions(+)
> >
> > diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> > index 2b8fd6bb7da0..342d3f87a404 100644
> > --- a/drivers/base/Kconfig
> > +++ b/drivers/base/Kconfig
> > @@ -243,3 +243,21 @@ config FW_DEVLINK_SYNC_STATE_TIMEOUT
> > work on.
> >
> > endmenu
> > +
> > +config DEVICE_SHUTDOWN_TIMEOUT
> > + bool "device shutdown timeout"
> > + default y
> > + help
> > + Enable timeout for device shutdown. In case of device shutdown is
> > + broken or device is not responding, system shutdown or restart may hang.
> > + This timeout handles such situation and triggers emergency_restart or
> > + machine_power_off. Also dumps call trace of shutdown process.
> > +
> > +
> > +config DEVICE_SHUTDOWN_TIMEOUT_SEC
> > + int "device shutdown timeout in seconds"
> > + range 10 60
> > + default 10
>
> How do you know the shutdown time is between this range?
>
> What about large systems ?
Agree it is difficult to set single timeout for all device.
This range I have based on consumer device where response time cannot be more.
But still as you mentioned we can not make this configuration by default "true/y"
with some fixed range. I will change patch to set this configuration default to
"false/n" as before, and will also remove range.

>
> > + depends on DEVICE_SHUTDOWN_TIMEOUT
> > + help
> > + sets time for device shutdown timeout in seconds
> > diff --git a/drivers/base/base.h b/drivers/base/base.h
> > index 0738ccad08b2..97eea57a8868 100644
> > --- a/drivers/base/base.h
> > +++ b/drivers/base/base.h
> > @@ -243,3 +243,11 @@ static inline int devtmpfs_delete_node(struct device *dev) { return 0; }
> >
> > void software_node_notify(struct device *dev);
> > void software_node_notify_remove(struct device *dev);
> > +
> > +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> > +struct device_shutdown_timeout {
> > + struct timer_list timer;
> > + struct task_struct *task;
> > +};
> > +#define SHUTDOWN_TIMEOUT CONFIG_DEVICE_SHUTDOWN_TIMEOUT_SEC
> > +#endif
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index b93f3c5716ae..dab455054a80 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -35,6 +35,12 @@
> > #include "base.h"
> > #include "physical_location.h"
> > #include "power/power.h"
> > +#include <linux/sched/debug.h>
> > +#include <linux/reboot.h>
> > +
> > +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> > +struct device_shutdown_timeout devs_shutdown;
> > +#endif
> >
> > /* Device links support. */
> > static LIST_HEAD(deferred_sync);
> > @@ -4799,6 +4805,38 @@ int device_change_owner(struct device *dev, kuid_t kuid, kgid_t kgid)
> > }
> > EXPORT_SYMBOL_GPL(device_change_owner);
> >
> > +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> > +static void device_shutdown_timeout_handler(struct timer_list *t)
> > +{
> > + pr_emerg("**** device shutdown timeout ****\n");
> > + show_stack(devs_shutdown.task, NULL, KERN_EMERG);
> > + if (system_state == SYSTEM_RESTART)
> > + emergency_restart();
> > + else
> > + machine_power_off();
> > +}
>
> So if one device is misbehaving, all the others shutdown callbacks are
> skipped with emergency halt/reboot ? That is prone to break the system, no?
Skipping other callback may not cause system break, but emergency shutdown or
reboot is better then leave system in hung state. That is the main functionality
of this patch.
>
> > +static void device_shutdown_timer_set(void)
> > +{
> > + devs_shutdown.task = current;
> > + timer_setup(&devs_shutdown.timer, device_shutdown_timeout_handler, 0);
> > + devs_shutdown.timer.expires = jiffies + SHUTDOWN_TIMEOUT * HZ;
> > + add_timer(&devs_shutdown.timer);
> > +}
> > +
> > +static void device_shutdown_timer_clr(void)
> > +{
> > + del_timer(&devs_shutdown.timer);
> > +}
> > +#else
> > +static inline void device_shutdown_timer_set(void)
> > +{
> > +}
> > +static inline void device_shutdown_timer_clr(void)
> > +{
> > +}
> > +#endif
> > +
> > /**
> > * device_shutdown - call ->shutdown() on each device to shutdown.
> > */
> > @@ -4810,6 +4848,7 @@ void device_shutdown(void)
> > device_block_probing();
> >
> > cpufreq_suspend();
> > + device_shutdown_timer_set();
> >
> > spin_lock(&devices_kset->list_lock);
> > /*
> > @@ -4869,6 +4908,7 @@ void device_shutdown(void)
> > spin_lock(&devices_kset->list_lock);
> > }
> > spin_unlock(&devices_kset->list_lock);
> > + device_shutdown_timer_clr();
> > }
> >
> > /*
>
> --
> <https://urldefense.com/v3/__http://www.linaro.org/__;!!JmoZiZGBv3RvKRSx!6XWB4gl8L3rRMPtMmiqJdKcGhAMKhZ9UVvLyqOiGr3vHiQzlgwInwY3OVNNzXZsLONbeCLZZ-CY-APJdHGYO7DpNrCqk$ [linaro[.]org]> Linaro.org │ Open source software for ARM SoCs
>
> Follow Linaro: <https://urldefense.com/v3/__http://www.facebook.com/pages/Linaro__;!!JmoZiZGBv3RvKRSx!6XWB4gl8L3rRMPtMmiqJdKcGhAMKhZ9UVvLyqOiGr3vHiQzlgwInwY3OVNNzXZsLONbeCLZZ-CY-APJdHGYO7AtMvPiK$ [facebook[.]com]> Facebook |
> <https://urldefense.com/v3/__http://twitter.com/*!/linaroorg__;Iw!!JmoZiZGBv3RvKRSx!6XWB4gl8L3rRMPtMmiqJdKcGhAMKhZ9UVvLyqOiGr3vHiQzlgwInwY3OVNNzXZsLONbeCLZZ-CY-APJdHGYO7Imo3W2M$ [twitter[.]com]> Twitter |
> <https://urldefense.com/v3/__http://www.linaro.org/linaro-blog/__;!!JmoZiZGBv3RvKRSx!6XWB4gl8L3rRMPtMmiqJdKcGhAMKhZ9UVvLyqOiGr3vHiQzlgwInwY3OVNNzXZsLONbeCLZZ-CY-APJdHGYO7DxWnKe3$ [linaro[.]org]> Blog
>

2024-06-07 11:50:53

by Khasnis Soumya

[permalink] [raw]
Subject: Re: [PATCH v3] driver core: Add timeout for device shutdown

On Thu, Jun 06, 2024 at 10:52:14PM +0200, Greg KH wrote:
> On Thu, Jun 06, 2024 at 08:50:03AM +0000, Soumya Khasnis wrote:
> > The device shutdown callbacks invoked during shutdown/reboot
> > are prone to errors depending on the device state or mishandling
> > by one or more driver. In order to prevent a device hang in such
> > scenarios, we bail out after a timeout while dumping a meaningful
> > call trace of the shutdown callback to kernel logs, which blocks
> > the shutdown or reboot process.
> >
> > Signed-off-by: Soumya Khasnis <[email protected]>
> > Signed-off-by: Srinavasa Nagaraju <[email protected]>
> > ---
> > Changes in v3:
> > -fix review comments
Sorry for inconvenience caused Greg, i will update details properly.
here are changes i did in patchset-3(v3)
1. added help text
2. set configuration by default "y"
3. added range for timeout value(DEVICE_SHUTDOWN_TIMEOUT_SEC)
4. moved #define's to base.h file
5. moved timeout functionality to device_shutdown() driver/base/core.c from reboot.c

>
> What ones? In what way? This really doesn't help, sorry. Would you
> like to see this type of response if you were to review other people's
> code?
>
> > -updated commit message
>
1. added information of where call trace is logged.
2. changed patch subject from "reboot:" to "driver core:"

> In what way?
>
> confused,
>
Sorry again for inconvenience caused.
> greg k-h
Thanks and Regards,
soumya.