2023-11-24 14:54:03

by Oleksij Rempel

[permalink] [raw]
Subject: [PATCH v1 0/3] introduce priority-based shutdown support

Hi,

This patch series introduces support for prioritized device shutdown.
The main goal is to enable prioritization for shutting down specific
devices, particularly crucial in scenarios like power loss where
hardware damage can occur if not handled properly.

Oleksij Rempel (3):
driver core: move core part of device_shutdown() to a separate
function
driver core: introduce prioritized device shutdown sequence
mmc: core: increase shutdown priority for MMC devices

drivers/base/core.c | 157 +++++++++++++++++++++++++++--------------
drivers/mmc/core/bus.c | 2 +
include/linux/device.h | 51 ++++++++++++-
kernel/reboot.c | 4 +-
4 files changed, 157 insertions(+), 57 deletions(-)

--
2.39.2


2023-11-24 14:54:03

by Oleksij Rempel

[permalink] [raw]
Subject: [PATCH v1 2/3] driver core: introduce prioritized device shutdown sequence

This commit revises the device shutdown mechanism to implement a
prioritized shutdown sequence. The new function,
prioritized_device_shutdown, ensures devices are shut down in reverse
order, mirroring the system construction order. Within this process,
devices are shut down based on their assigned priority levels.
Additionally, this patch ensures that a device inherits its shutdown
priority from its parent, maintaining hierarchy integrity. This is
crucial to prevent child nodes of high-priority parents from being
orphaned in the shutdown sequence.

This change is vital in scenarios like power drops with limited backup
energy, where shutdown time is constrained. By prioritizing critical
devices, particularly storage, the focus is on maintaining device
integrity by ensuring they are properly shut down. This approach reduces
the risk of hardware damage and enhances system resilience during
emergency shutdowns.

Signed-off-by: Oleksij Rempel <[email protected]>
---
drivers/base/core.c | 53 ++++++++++++++++++++++++++++++++++--------
include/linux/device.h | 51 +++++++++++++++++++++++++++++++++++++++-
kernel/reboot.c | 4 ++--
3 files changed, 95 insertions(+), 13 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 0f5646a097d3..5b6989e9ae4d 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3554,9 +3554,13 @@ int device_add(struct device *dev)
if (kobj)
dev->kobj.parent = kobj;

- /* use parent numa_node */
- if (parent && (dev_to_node(dev) == NUMA_NO_NODE))
- set_dev_node(dev, dev_to_node(parent));
+ if (parent) {
+ /* use parent numa_node */
+ if (dev_to_node(dev) == NUMA_NO_NODE)
+ set_dev_node(dev, dev_to_node(parent));
+
+ dev_inherit_shutdown_priority(dev, parent);
+ }

/* first, register with generic layer. */
/* we require the name to be set before, and pass NULL */
@@ -4553,6 +4557,8 @@ int device_move(struct device *dev, struct device *new_parent,
klist_add_tail(&dev->p->knode_parent,
&new_parent->p->klist_children);
set_dev_node(dev, dev_to_node(new_parent));
+
+ dev_inherit_shutdown_priority(dev, new_parent);
}

if (dev->class) {
@@ -4568,6 +4574,8 @@ int device_move(struct device *dev, struct device *new_parent,
klist_add_tail(&dev->p->knode_parent,
&old_parent->p->klist_children);
set_dev_node(dev, dev_to_node(old_parent));
+
+ dev_inherit_shutdown_priority(dev, old_parent);
}
}
cleanup_glue_dir(dev, new_parent_kobj);
@@ -4781,28 +4789,53 @@ static void device_shutdown_one_locked(struct device *dev)
}

/**
- * device_shutdown - call ->shutdown() on each device to shutdown.
+ * prioritized_device_shutdown - shut down devices in reverse and priority order
+ *
+ * This function is designed to shut down devices in a manner that mirrors the
+ * reverse order of system construction. It iterates over the devices in
+ * reverse, ensuring that the system is torn down in a similar order to how it
+ * was set up. Importantly, within this reverse order, the function also employs
+ * a device shutdown priority mechanism. This prioritization ensures that
+ * critical devices are shut down in an orderly and safe manner before less
+ * critical devices.
+ *
+ * This prioritized and reverse order shutdown is particularly crucial in
+ * emergency scenarios where there is a limited time window for shutdown, such
+ * as in the event of a power drop backed by limited energy source like
+ * capacitors. It ensures that essential systems and data are secured first,
+ * reducing the risk of data loss and system instability.
*/
-void device_shutdown(void)
+void prioritized_device_shutdown(void)
{
- struct device *dev;
+ enum device_shutdown_priority current_prio = DEVICE_SHUTDOWN_PRIO_MAX;

wait_for_device_probe();
device_block_probing();

cpufreq_suspend();

- spin_lock(&devices_kset->list_lock);
/*
* Walk the devices list backward, shutting down each in turn.
* Beware that device unplug events may also start pulling
* devices offline, even as the system is shutting down.
*/
+ spin_lock(&devices_kset->list_lock);
while (!list_empty(&devices_kset->list)) {
- dev = list_entry(devices_kset->list.prev, struct device,
- kobj.entry);
+ struct device *dev, *n;
+ enum device_shutdown_priority next_prio = 0;
+
+ list_for_each_entry_safe_reverse(dev, n, &devices_kset->list,
+ kobj.entry) {
+ enum device_shutdown_priority dev_prio;
+
+ dev_prio = dev_get_shutdown_priority(dev);
+ if (dev_prio >= current_prio)
+ device_shutdown_one_locked(dev);
+ else if (dev_prio > next_prio)
+ next_prio = dev_prio;
+ }

- device_shutdown_one_locked(dev);
+ current_prio = next_prio;
}
spin_unlock(&devices_kset->list_lock);
}
diff --git a/include/linux/device.h b/include/linux/device.h
index d7a72a8749ea..1c43a6326417 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -580,6 +580,33 @@ enum device_physical_location_horizontal_position {
DEVICE_HORI_POS_RIGHT,
};

+/**
+ * enum device_shutdown_priority - Defines device shutdown priorities
+ *
+ * This enum defines different priority levels for device shutdown
+ * during a system power-off sequence. The priorities ensure that critical
+ * devices are shut down in an orderly and safe manner before less critical
+ * devices. Each device in the system is assigned a priority level, which
+ * determines the order in which it is shut down.
+ *
+ * @DEVICE_SHUTDOWN_PRIO_DEFAULT: The default shutdown priority for devices
+ * that do not require special handling or have no specific shutdown order.
+ * This is the lowest priority level.
+ *
+ * @DEVICE_SHUTDOWN_PRIO_STORAGE: Priority level for storage devices such as
+ * hard drives, SSDs, and SD cards. These devices often need to be shut down
+ * early to ensure data integrity and prevent corruption.
+ *
+ * @DEVICE_SHUTDOWN_PRIO_MAX: Represents the highest possible priority level
+ * for device shutdown. This is used as an upper bound for the priority range
+ * and typically not assigned to actual devices.
+ */
+enum device_shutdown_priority {
+ DEVICE_SHUTDOWN_PRIO_DEFAULT = 0,
+ DEVICE_SHUTDOWN_PRIO_STORAGE,
+ DEVICE_SHUTDOWN_PRIO_MAX,
+};
+
/**
* struct device_physical_location - Device data related to physical location
* of the device connection point.
@@ -693,6 +720,8 @@ struct device_physical_location {
* and optionall (if the coherent mask is large enough) also
* for dma allocations. This flag is managed by the dma ops
* instance from ->dma_supported.
+ * @shutdown_priority: Shutdown ordering priority for the device.
+ * @inher_shutdown_priority: Inherited shutdown ordering priority from parent.
*
* At the lowest level, every device in a Linux system is represented by an
* instance of struct device. The device structure contains the information
@@ -805,6 +834,8 @@ struct device {
#ifdef CONFIG_DMA_OPS_BYPASS
bool dma_ops_bypass : 1;
#endif
+ enum device_shutdown_priority shutdown_priority;
+ enum device_shutdown_priority inher_shutdown_priority;
};

/**
@@ -1046,6 +1077,24 @@ static inline bool dev_removable_is_valid(struct device *dev)
return dev->removable != DEVICE_REMOVABLE_NOT_SUPPORTED;
}

+static inline void dev_set_shutdown_priority(struct device *dev,
+ enum device_shutdown_priority priority)
+{
+ dev->shutdown_priority = priority;
+}
+
+static inline enum device_shutdown_priority
+dev_get_shutdown_priority(struct device *dev)
+{
+ return max(dev->shutdown_priority, dev->inher_shutdown_priority);
+}
+
+static inline void dev_inherit_shutdown_priority(struct device *dev,
+ struct device *parent)
+{
+ dev->inher_shutdown_priority = dev_get_shutdown_priority(parent);
+}
+
/*
* High level routines for use by the bus drivers
*/
@@ -1236,7 +1285,7 @@ static inline int devtmpfs_mount(void) { return 0; }
#endif

/* drivers/base/power/shutdown.c */
-void device_shutdown(void);
+void prioritized_device_shutdown(void);

/* debugging and troubleshooting/diagnostic helpers. */
const char *dev_driver_string(const struct device *dev);
diff --git a/kernel/reboot.c b/kernel/reboot.c
index 395a0ea3c7a8..ac5820020c6a 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -85,7 +85,7 @@ void kernel_restart_prepare(char *cmd)
blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
system_state = SYSTEM_RESTART;
usermodehelper_disable();
- device_shutdown();
+ prioritized_device_shutdown();
}

/**
@@ -285,7 +285,7 @@ static void kernel_shutdown_prepare(enum system_states state)
(state == SYSTEM_HALT) ? SYS_HALT : SYS_POWER_OFF, NULL);
system_state = state;
usermodehelper_disable();
- device_shutdown();
+ prioritized_device_shutdown();
}
/**
* kernel_halt - halt the system
--
2.39.2

2023-11-24 14:54:21

by Oleksij Rempel

[permalink] [raw]
Subject: [PATCH v1 1/3] driver core: move core part of device_shutdown() to a separate function

Split the device_shutdown() as a preparation for the prioritization
support.

Signed-off-by: Oleksij Rempel <[email protected]>
---
drivers/base/core.c | 110 +++++++++++++++++++++++++-------------------
1 file changed, 63 insertions(+), 47 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 67ba592afc77..0f5646a097d3 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -4719,12 +4719,73 @@ int device_change_owner(struct device *dev, kuid_t kuid, kgid_t kgid)
}
EXPORT_SYMBOL_GPL(device_change_owner);

+/**
+ * device_shutdown_one - shut down a device
+ * @dev: device to shut down
+ *
+ * It is called with the device lock held.
+ *
+ * The device must be on the devices_kset list.
+ */
+static void device_shutdown_one_locked(struct device *dev)
+{
+ struct device *parent;
+
+ lockdep_assert_held(&devices_kset->list_lock);
+ /*
+ * hold reference count of device's parent to
+ * prevent it from being freed because parent's
+ * lock is to be held
+ */
+ parent = get_device(dev->parent);
+ get_device(dev);
+ /*
+ * Make sure the device is off the kset list, in the
+ * event that dev->*->shutdown() doesn't remove it.
+ */
+ list_del_init(&dev->kobj.entry);
+ spin_unlock(&devices_kset->list_lock);
+
+ /* hold lock to avoid race with probe/release */
+ if (parent)
+ device_lock(parent);
+ device_lock(dev);
+
+ /* Don't allow any more runtime suspends */
+ pm_runtime_get_noresume(dev);
+ pm_runtime_barrier(dev);
+
+ if (dev->class && dev->class->shutdown_pre) {
+ if (initcall_debug)
+ dev_info(dev, "shutdown_pre\n");
+ dev->class->shutdown_pre(dev);
+ }
+ if (dev->bus && dev->bus->shutdown) {
+ if (initcall_debug)
+ dev_info(dev, "shutdown\n");
+ dev->bus->shutdown(dev);
+ } else if (dev->driver && dev->driver->shutdown) {
+ if (initcall_debug)
+ dev_info(dev, "shutdown\n");
+ dev->driver->shutdown(dev);
+ }
+
+ device_unlock(dev);
+ if (parent)
+ device_unlock(parent);
+
+ put_device(dev);
+ put_device(parent);
+
+ spin_lock(&devices_kset->list_lock);
+}
+
/**
* device_shutdown - call ->shutdown() on each device to shutdown.
*/
void device_shutdown(void)
{
- struct device *dev, *parent;
+ struct device *dev;

wait_for_device_probe();
device_block_probing();
@@ -4741,52 +4802,7 @@ void device_shutdown(void)
dev = list_entry(devices_kset->list.prev, struct device,
kobj.entry);

- /*
- * hold reference count of device's parent to
- * prevent it from being freed because parent's
- * lock is to be held
- */
- parent = get_device(dev->parent);
- get_device(dev);
- /*
- * Make sure the device is off the kset list, in the
- * event that dev->*->shutdown() doesn't remove it.
- */
- list_del_init(&dev->kobj.entry);
- spin_unlock(&devices_kset->list_lock);
-
- /* hold lock to avoid race with probe/release */
- if (parent)
- device_lock(parent);
- device_lock(dev);
-
- /* Don't allow any more runtime suspends */
- pm_runtime_get_noresume(dev);
- pm_runtime_barrier(dev);
-
- if (dev->class && dev->class->shutdown_pre) {
- if (initcall_debug)
- dev_info(dev, "shutdown_pre\n");
- dev->class->shutdown_pre(dev);
- }
- if (dev->bus && dev->bus->shutdown) {
- if (initcall_debug)
- dev_info(dev, "shutdown\n");
- dev->bus->shutdown(dev);
- } else if (dev->driver && dev->driver->shutdown) {
- if (initcall_debug)
- dev_info(dev, "shutdown\n");
- dev->driver->shutdown(dev);
- }
-
- device_unlock(dev);
- if (parent)
- device_unlock(parent);
-
- put_device(dev);
- put_device(parent);
-
- spin_lock(&devices_kset->list_lock);
+ device_shutdown_one_locked(dev);
}
spin_unlock(&devices_kset->list_lock);
}
--
2.39.2

2023-11-24 15:07:32

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v1 1/3] driver core: move core part of device_shutdown() to a separate function

On Fri, Nov 24, 2023 at 03:53:36PM +0100, Oleksij Rempel wrote:
> Split the device_shutdown() as a preparation for the prioritization
> support.

Nit, this is going to need a lot more description, as at this point in
time, we do not know what "prioritization support" is.

>
> Signed-off-by: Oleksij Rempel <[email protected]>
> ---
> drivers/base/core.c | 110 +++++++++++++++++++++++++-------------------
> 1 file changed, 63 insertions(+), 47 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 67ba592afc77..0f5646a097d3 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -4719,12 +4719,73 @@ int device_change_owner(struct device *dev, kuid_t kuid, kgid_t kgid)
> }
> EXPORT_SYMBOL_GPL(device_change_owner);
>
> +/**

This doesn't need kernel-doc for a static function, right?

> + * device_shutdown_one - shut down a device
> + * @dev: device to shut down
> + *
> + * It is called with the device lock held.
> + *
> + * The device must be on the devices_kset list.
> + */
> +static void device_shutdown_one_locked(struct device *dev)
> +{
> + struct device *parent;
> +
> + lockdep_assert_held(&devices_kset->list_lock);
> + /*
> + * hold reference count of device's parent to
> + * prevent it from being freed because parent's
> + * lock is to be held
> + */
> + parent = get_device(dev->parent);
> + get_device(dev);
> + /*

As you are moving the code, might as well make it a bit prettier and add
proper line breaks before the comments please.

thanks,

greg k-h

2023-11-24 15:09:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Fri, Nov 24, 2023 at 03:53:35PM +0100, Oleksij Rempel wrote:
> Hi,
>
> This patch series introduces support for prioritized device shutdown.
> The main goal is to enable prioritization for shutting down specific
> devices, particularly crucial in scenarios like power loss where
> hardware damage can occur if not handled properly.

Oh fun, now we will have drivers and subsystems fighting over their
priority, with each one insisting that they are the most important!

/s

Anyway, this is ripe for problems and issues in the long-run, what is so
special about this hardware that it can not just shutdown in the
existing order that it has to be "first" over everyone else? What
exactly does this prevent and what devices are requiring this?

And most importantly, what has changed in the past 20+ years to
suddenly require this new functionality and how does any other operating
system handle it?

thanks,

greg k-h

2023-11-24 15:10:22

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v1 2/3] driver core: introduce prioritized device shutdown sequence

On Fri, Nov 24, 2023 at 03:53:37PM +0100, Oleksij Rempel wrote:
> This commit revises the device shutdown mechanism to implement a
> prioritized shutdown sequence. The new function,
> prioritized_device_shutdown, ensures devices are shut down in reverse
> order, mirroring the system construction order. Within this process,
> devices are shut down based on their assigned priority levels.
> Additionally, this patch ensures that a device inherits its shutdown
> priority from its parent, maintaining hierarchy integrity. This is
> crucial to prevent child nodes of high-priority parents from being
> orphaned in the shutdown sequence.
>
> This change is vital in scenarios like power drops with limited backup
> energy, where shutdown time is constrained. By prioritizing critical
> devices, particularly storage, the focus is on maintaining device
> integrity by ensuring they are properly shut down. This approach reduces
> the risk of hardware damage and enhances system resilience during
> emergency shutdowns.

So you are going to race the power drain and just hope and pray that the
kernel gets to shut down the hardware before the capacitors discharge?

That seems ripe for loads of problems, as you are trying to achive
something that software just can not do, as the hardware isn't
supporting it at all.

You are making a promise here that the kernel can never achive, sorry.
I understand your wish to try to fix broken hardware with software, but
please go back and tell those hardware engineers that they need to fix
this properly if they don't want broken devices as this is just not
going to work at all.

One naming nit:

> -void device_shutdown(void)
> +void prioritized_device_shutdown(void)

It's the driver core, please prefix stuff correctly, so this would be
device_shutdown_prioritized() if we were to take this.

Anyway, good luck with the hardware engineers!

greg k-h

2023-11-24 15:22:01

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Fri, Nov 24, 2023 at 03:05:47PM +0000, Greg Kroah-Hartman wrote:

> Anyway, this is ripe for problems and issues in the long-run, what is so
> special about this hardware that it can not just shutdown in the
> existing order that it has to be "first" over everyone else? What
> exactly does this prevent and what devices are requiring this?

> And most importantly, what has changed in the past 20+ years to
> suddenly require this new functionality and how does any other operating
> system handle it?

This came out of some discussions about trying to handle emergency power
failure notifications.


Attachments:
(No filename) (617.00 B)
signature.asc (499.00 B)
Download all attachments

2023-11-24 15:28:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote:
> On Fri, Nov 24, 2023 at 03:05:47PM +0000, Greg Kroah-Hartman wrote:
>
> > Anyway, this is ripe for problems and issues in the long-run, what is so
> > special about this hardware that it can not just shutdown in the
> > existing order that it has to be "first" over everyone else? What
> > exactly does this prevent and what devices are requiring this?
>
> > And most importantly, what has changed in the past 20+ years to
> > suddenly require this new functionality and how does any other operating
> > system handle it?
>
> This came out of some discussions about trying to handle emergency power
> failure notifications.

I'm sorry, but I don't know what that means. Are you saying that the
kernel is now going to try to provide a hard guarantee that some devices
are going to be shut down in X number of seconds when asked? If so, why
not do this in userspace?

thanks,

greg k-h

2023-11-24 15:50:01

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote:
> On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote:

> > This came out of some discussions about trying to handle emergency power
> > failure notifications.

> I'm sorry, but I don't know what that means. Are you saying that the
> kernel is now going to try to provide a hard guarantee that some devices
> are going to be shut down in X number of seconds when asked? If so, why
> not do this in userspace?

No, it was initially (or when I initially saw it anyway) handling of
notifications from regulators that they're in trouble and we have some
small amount of time to do anything we might want to do about it before
we expire.


Attachments:
(No filename) (725.00 B)
signature.asc (499.00 B)
Download all attachments

2023-11-24 15:56:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote:
> On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote:
> > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote:
>
> > > This came out of some discussions about trying to handle emergency power
> > > failure notifications.
>
> > I'm sorry, but I don't know what that means. Are you saying that the
> > kernel is now going to try to provide a hard guarantee that some devices
> > are going to be shut down in X number of seconds when asked? If so, why
> > not do this in userspace?
>
> No, it was initially (or when I initially saw it anyway) handling of
> notifications from regulators that they're in trouble and we have some
> small amount of time to do anything we might want to do about it before
> we expire.

So we are going to guarantee a "time" in which we are going to do
something? Again, if that's required, why not do it in userspace using
a RT kernel?

thanks,

greg k-h

2023-11-24 16:33:04

by Oleksij Rempel

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote:
> On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote:
> > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote:
> > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote:
> >
> > > > This came out of some discussions about trying to handle emergency power
> > > > failure notifications.
> >
> > > I'm sorry, but I don't know what that means. Are you saying that the
> > > kernel is now going to try to provide a hard guarantee that some devices
> > > are going to be shut down in X number of seconds when asked? If so, why
> > > not do this in userspace?
> >
> > No, it was initially (or when I initially saw it anyway) handling of
> > notifications from regulators that they're in trouble and we have some
> > small amount of time to do anything we might want to do about it before
> > we expire.
>
> So we are going to guarantee a "time" in which we are going to do
> something? Again, if that's required, why not do it in userspace using
> a RT kernel?

For the HW in question I have only 100ms time before power loss. By
doing it over use space some we will have even less time to react.

In fact, this is not a new requirement. It exist on different flavors of
automotive Linux for about 10 years. Linux in cars should be able to
handle voltage drops for example on ignition and so on. The only new thing is
the attempt to mainline it.

Regards,
Oleksij
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2023-11-24 17:26:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote:
> On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote:
> > On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote:
> > > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote:
> > > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote:
> > >
> > > > > This came out of some discussions about trying to handle emergency power
> > > > > failure notifications.
> > >
> > > > I'm sorry, but I don't know what that means. Are you saying that the
> > > > kernel is now going to try to provide a hard guarantee that some devices
> > > > are going to be shut down in X number of seconds when asked? If so, why
> > > > not do this in userspace?
> > >
> > > No, it was initially (or when I initially saw it anyway) handling of
> > > notifications from regulators that they're in trouble and we have some
> > > small amount of time to do anything we might want to do about it before
> > > we expire.
> >
> > So we are going to guarantee a "time" in which we are going to do
> > something? Again, if that's required, why not do it in userspace using
> > a RT kernel?
>
> For the HW in question I have only 100ms time before power loss. By
> doing it over use space some we will have even less time to react.

Why can't userspace react that fast? Why will the kernel be somehow
faster? Speed should be the same, just get the "power is cut" signal
and have userspace flush and unmount the disk before power is gone. Why
can the kernel do this any differently?

> In fact, this is not a new requirement. It exist on different flavors of
> automotive Linux for about 10 years. Linux in cars should be able to
> handle voltage drops for example on ignition and so on. The only new thing is
> the attempt to mainline it.

But your patch is not guaranteeing anything, it's just doing a "I want
this done before the other devices are handled", that's it. There is no
chance that 100ms is going to be a requirement, or that some other
device type is not going to come along and demand to be ahead of your
device in the list.

So you are going to have a constant fight among device types over the
years, and people complaining that the kernel is now somehow going to
guarantee that a device is shutdown in a set amount of time, which
again, the kernel can not guarantee here.

This might work as a one-off for a specific hardware platform, which is
odd, but not anything you really should be adding for anyone else to use
here as your reasoning for it does not reflect what the code does.

thanks,

greg k-h

2023-11-24 18:58:16

by Oleksij Rempel

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Fri, Nov 24, 2023 at 05:26:30PM +0000, Greg Kroah-Hartman wrote:
> On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote:
> > On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote:
> > > On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote:
> > > > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote:
> > > > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote:
> > > >
> > > > > > This came out of some discussions about trying to handle emergency power
> > > > > > failure notifications.
> > > >
> > > > > I'm sorry, but I don't know what that means. Are you saying that the
> > > > > kernel is now going to try to provide a hard guarantee that some devices
> > > > > are going to be shut down in X number of seconds when asked? If so, why
> > > > > not do this in userspace?
> > > >
> > > > No, it was initially (or when I initially saw it anyway) handling of
> > > > notifications from regulators that they're in trouble and we have some
> > > > small amount of time to do anything we might want to do about it before
> > > > we expire.
> > >
> > > So we are going to guarantee a "time" in which we are going to do
> > > something? Again, if that's required, why not do it in userspace using
> > > a RT kernel?
> >
> > For the HW in question I have only 100ms time before power loss. By
> > doing it over use space some we will have even less time to react.
>
> Why can't userspace react that fast? Why will the kernel be somehow
> faster? Speed should be the same, just get the "power is cut" signal
> and have userspace flush and unmount the disk before power is gone. Why
> can the kernel do this any differently?
>
> > In fact, this is not a new requirement. It exist on different flavors of
> > automotive Linux for about 10 years. Linux in cars should be able to
> > handle voltage drops for example on ignition and so on. The only new thing is
> > the attempt to mainline it.
>
> But your patch is not guaranteeing anything, it's just doing a "I want
> this done before the other devices are handled", that's it. There is no
> chance that 100ms is going to be a requirement, or that some other
> device type is not going to come along and demand to be ahead of your
> device in the list.
>
> So you are going to have a constant fight among device types over the
> years, and people complaining that the kernel is now somehow going to
> guarantee that a device is shutdown in a set amount of time, which
> again, the kernel can not guarantee here.
>
> This might work as a one-off for a specific hardware platform, which is
> odd, but not anything you really should be adding for anyone else to use
> here as your reasoning for it does not reflect what the code does.

I see. Good point.

In my case umount is not needed, there is not enough time to write down
the data. We should send a shutdown command to the eMMC ASAP.

@Ulf, are there a way request mmc shutdown from user space?
If I see it correctly, sysfs-devices-power-control support only "auto" and
"on". Unbinding the module will not execute MMC shutdown notification.
If user space is the way to go, do sysfs-devices-power-control "off"
command will be acceptable?

The other option I have is to add a regulator event handler to the MMC
framework and do shutdown notification on under-voltage event.

Are there other options?

Regards,
Oleksij
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2023-11-24 20:05:10

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v1 1/3] driver core: move core part of device_shutdown() to a separate function

Hi Oleksij,

kernel test robot noticed the following build warnings:

[auto build test WARNING on driver-core/driver-core-testing]
[also build test WARNING on driver-core/driver-core-next driver-core/driver-core-linus linus/master v6.7-rc2 next-20231124]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Oleksij-Rempel/driver-core-move-core-part-of-device_shutdown-to-a-separate-function/20231124-225602
base: driver-core/driver-core-testing
patch link: https://lore.kernel.org/r/20231124145338.3112416-2-o.rempel%40pengutronix.de
patch subject: [PATCH v1 1/3] driver core: move core part of device_shutdown() to a separate function
config: x86_64-allnoconfig (https://download.01.org/0day-ci/archive/20231125/[email protected]/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231125/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

>> drivers/base/core.c:4731: warning: expecting prototype for device_shutdown_one(). Prototype was for device_shutdown_one_locked() instead


vim +4731 drivers/base/core.c

b8f33e5d76a7a1 Christian Brauner 2020-02-27 4721
37b0c020343080 Greg Kroah-Hartman 2007-11-26 4722 /**
52950bfb3bedfc Oleksij Rempel 2023-11-24 4723 * device_shutdown_one - shut down a device
52950bfb3bedfc Oleksij Rempel 2023-11-24 4724 * @dev: device to shut down
52950bfb3bedfc Oleksij Rempel 2023-11-24 4725 *
52950bfb3bedfc Oleksij Rempel 2023-11-24 4726 * It is called with the device lock held.
52950bfb3bedfc Oleksij Rempel 2023-11-24 4727 *
52950bfb3bedfc Oleksij Rempel 2023-11-24 4728 * The device must be on the devices_kset list.
37b0c020343080 Greg Kroah-Hartman 2007-11-26 4729 */
52950bfb3bedfc Oleksij Rempel 2023-11-24 4730 static void device_shutdown_one_locked(struct device *dev)
37b0c020343080 Greg Kroah-Hartman 2007-11-26 @4731 {
52950bfb3bedfc Oleksij Rempel 2023-11-24 4732 struct device *parent;
d1c6c030fcec6f Ming Lei 2012-06-22 4733
52950bfb3bedfc Oleksij Rempel 2023-11-24 4734 lockdep_assert_held(&devices_kset->list_lock);
d1c6c030fcec6f Ming Lei 2012-06-22 4735 /*
d1c6c030fcec6f Ming Lei 2012-06-22 4736 * hold reference count of device's parent to
d1c6c030fcec6f Ming Lei 2012-06-22 4737 * prevent it from being freed because parent's
d1c6c030fcec6f Ming Lei 2012-06-22 4738 * lock is to be held
d1c6c030fcec6f Ming Lei 2012-06-22 4739 */
f123db8e9d6c84 Benson Leung 2013-09-24 4740 parent = get_device(dev->parent);
6245838fe4d2ce Hugh Daschbach 2010-03-22 4741 get_device(dev);
6245838fe4d2ce Hugh Daschbach 2010-03-22 4742 /*
6245838fe4d2ce Hugh Daschbach 2010-03-22 4743 * Make sure the device is off the kset list, in the
6245838fe4d2ce Hugh Daschbach 2010-03-22 4744 * event that dev->*->shutdown() doesn't remove it.
6245838fe4d2ce Hugh Daschbach 2010-03-22 4745 */
6245838fe4d2ce Hugh Daschbach 2010-03-22 4746 list_del_init(&dev->kobj.entry);
6245838fe4d2ce Hugh Daschbach 2010-03-22 4747 spin_unlock(&devices_kset->list_lock);
fe6b91f47080eb Alan Stern 2011-12-06 4748
d1c6c030fcec6f Ming Lei 2012-06-22 4749 /* hold lock to avoid race with probe/release */
f123db8e9d6c84 Benson Leung 2013-09-24 4750 if (parent)
f123db8e9d6c84 Benson Leung 2013-09-24 4751 device_lock(parent);
d1c6c030fcec6f Ming Lei 2012-06-22 4752 device_lock(dev);
d1c6c030fcec6f Ming Lei 2012-06-22 4753
fe6b91f47080eb Alan Stern 2011-12-06 4754 /* Don't allow any more runtime suspends */
fe6b91f47080eb Alan Stern 2011-12-06 4755 pm_runtime_get_noresume(dev);
fe6b91f47080eb Alan Stern 2011-12-06 4756 pm_runtime_barrier(dev);
37b0c020343080 Greg Kroah-Hartman 2007-11-26 4757
7521621e600aee Michal Suchanek 2017-08-11 4758 if (dev->class && dev->class->shutdown_pre) {
f77af151658474 Josh Zimmerman 2017-06-25 4759 if (initcall_debug)
7521621e600aee Michal Suchanek 2017-08-11 4760 dev_info(dev, "shutdown_pre\n");
7521621e600aee Michal Suchanek 2017-08-11 4761 dev->class->shutdown_pre(dev);
7521621e600aee Michal Suchanek 2017-08-11 4762 }
7521621e600aee Michal Suchanek 2017-08-11 4763 if (dev->bus && dev->bus->shutdown) {
0246c4fafccd6c ShuoX Liu 2012-11-23 4764 if (initcall_debug)
0246c4fafccd6c ShuoX Liu 2012-11-23 4765 dev_info(dev, "shutdown\n");
37b0c020343080 Greg Kroah-Hartman 2007-11-26 4766 dev->bus->shutdown(dev);
37b0c020343080 Greg Kroah-Hartman 2007-11-26 4767 } else if (dev->driver && dev->driver->shutdown) {
0246c4fafccd6c ShuoX Liu 2012-11-23 4768 if (initcall_debug)
0246c4fafccd6c ShuoX Liu 2012-11-23 4769 dev_info(dev, "shutdown\n");
37b0c020343080 Greg Kroah-Hartman 2007-11-26 4770 dev->driver->shutdown(dev);
37b0c020343080 Greg Kroah-Hartman 2007-11-26 4771 }
d1c6c030fcec6f Ming Lei 2012-06-22 4772
d1c6c030fcec6f Ming Lei 2012-06-22 4773 device_unlock(dev);
f123db8e9d6c84 Benson Leung 2013-09-24 4774 if (parent)
f123db8e9d6c84 Benson Leung 2013-09-24 4775 device_unlock(parent);
d1c6c030fcec6f Ming Lei 2012-06-22 4776
6245838fe4d2ce Hugh Daschbach 2010-03-22 4777 put_device(dev);
f123db8e9d6c84 Benson Leung 2013-09-24 4778 put_device(parent);
6245838fe4d2ce Hugh Daschbach 2010-03-22 4779
6245838fe4d2ce Hugh Daschbach 2010-03-22 4780 spin_lock(&devices_kset->list_lock);
37b0c020343080 Greg Kroah-Hartman 2007-11-26 4781 }
52950bfb3bedfc Oleksij Rempel 2023-11-24 4782

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2023-11-25 07:32:53

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Fri, Nov 24, 2023 at 07:57:25PM +0100, Oleksij Rempel wrote:
> On Fri, Nov 24, 2023 at 05:26:30PM +0000, Greg Kroah-Hartman wrote:
> > On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote:
> > > On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote:
> > > > On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote:
> > > > > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote:
> > > > > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote:
> > > > >
> > > > > > > This came out of some discussions about trying to handle emergency power
> > > > > > > failure notifications.
> > > > >
> > > > > > I'm sorry, but I don't know what that means. Are you saying that the
> > > > > > kernel is now going to try to provide a hard guarantee that some devices
> > > > > > are going to be shut down in X number of seconds when asked? If so, why
> > > > > > not do this in userspace?
> > > > >
> > > > > No, it was initially (or when I initially saw it anyway) handling of
> > > > > notifications from regulators that they're in trouble and we have some
> > > > > small amount of time to do anything we might want to do about it before
> > > > > we expire.
> > > >
> > > > So we are going to guarantee a "time" in which we are going to do
> > > > something? Again, if that's required, why not do it in userspace using
> > > > a RT kernel?
> > >
> > > For the HW in question I have only 100ms time before power loss. By
> > > doing it over use space some we will have even less time to react.
> >
> > Why can't userspace react that fast? Why will the kernel be somehow
> > faster? Speed should be the same, just get the "power is cut" signal
> > and have userspace flush and unmount the disk before power is gone. Why
> > can the kernel do this any differently?
> >
> > > In fact, this is not a new requirement. It exist on different flavors of
> > > automotive Linux for about 10 years. Linux in cars should be able to
> > > handle voltage drops for example on ignition and so on. The only new thing is
> > > the attempt to mainline it.
> >
> > But your patch is not guaranteeing anything, it's just doing a "I want
> > this done before the other devices are handled", that's it. There is no
> > chance that 100ms is going to be a requirement, or that some other
> > device type is not going to come along and demand to be ahead of your
> > device in the list.
> >
> > So you are going to have a constant fight among device types over the
> > years, and people complaining that the kernel is now somehow going to
> > guarantee that a device is shutdown in a set amount of time, which
> > again, the kernel can not guarantee here.
> >
> > This might work as a one-off for a specific hardware platform, which is
> > odd, but not anything you really should be adding for anyone else to use
> > here as your reasoning for it does not reflect what the code does.
>
> I see. Good point.
>
> In my case umount is not needed, there is not enough time to write down
> the data. We should send a shutdown command to the eMMC ASAP.

If you don't care about the data, why is a shutdown command to the
hardware needed? What does that do that makes anything "safe" if your
data is lost.

thanks,

greg k-h

2023-11-25 08:51:13

by Oleksij Rempel

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Sat, Nov 25, 2023 at 06:51:55AM +0000, Greg Kroah-Hartman wrote:
> On Fri, Nov 24, 2023 at 07:57:25PM +0100, Oleksij Rempel wrote:
> > On Fri, Nov 24, 2023 at 05:26:30PM +0000, Greg Kroah-Hartman wrote:
> > > On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote:
> > > > On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote:
> > > > > On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote:
> > > > > > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote:
> > > > > > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote:
> > > > > >
> > > > > > > > This came out of some discussions about trying to handle emergency power
> > > > > > > > failure notifications.
> > > > > >
> > > > > > > I'm sorry, but I don't know what that means. Are you saying that the
> > > > > > > kernel is now going to try to provide a hard guarantee that some devices
> > > > > > > are going to be shut down in X number of seconds when asked? If so, why
> > > > > > > not do this in userspace?
> > > > > >
> > > > > > No, it was initially (or when I initially saw it anyway) handling of
> > > > > > notifications from regulators that they're in trouble and we have some
> > > > > > small amount of time to do anything we might want to do about it before
> > > > > > we expire.
> > > > >
> > > > > So we are going to guarantee a "time" in which we are going to do
> > > > > something? Again, if that's required, why not do it in userspace using
> > > > > a RT kernel?
> > > >
> > > > For the HW in question I have only 100ms time before power loss. By
> > > > doing it over use space some we will have even less time to react.
> > >
> > > Why can't userspace react that fast? Why will the kernel be somehow
> > > faster? Speed should be the same, just get the "power is cut" signal
> > > and have userspace flush and unmount the disk before power is gone. Why
> > > can the kernel do this any differently?
> > >
> > > > In fact, this is not a new requirement. It exist on different flavors of
> > > > automotive Linux for about 10 years. Linux in cars should be able to
> > > > handle voltage drops for example on ignition and so on. The only new thing is
> > > > the attempt to mainline it.
> > >
> > > But your patch is not guaranteeing anything, it's just doing a "I want
> > > this done before the other devices are handled", that's it. There is no
> > > chance that 100ms is going to be a requirement, or that some other
> > > device type is not going to come along and demand to be ahead of your
> > > device in the list.
> > >
> > > So you are going to have a constant fight among device types over the
> > > years, and people complaining that the kernel is now somehow going to
> > > guarantee that a device is shutdown in a set amount of time, which
> > > again, the kernel can not guarantee here.
> > >
> > > This might work as a one-off for a specific hardware platform, which is
> > > odd, but not anything you really should be adding for anyone else to use
> > > here as your reasoning for it does not reflect what the code does.
> >
> > I see. Good point.
> >
> > In my case umount is not needed, there is not enough time to write down
> > the data. We should send a shutdown command to the eMMC ASAP.
>
> If you don't care about the data, why is a shutdown command to the
> hardware needed? What does that do that makes anything "safe" if your
> data is lost.

It prevents HW damage. In a typical automotive under-voltage labor it is
usually possible to reproduce X amount of bricked eMMCs or NANDs on Y
amount of under-voltage cycles (I do not have exact numbers right now).
Even if the numbers not so high in the labor tests (sometimes something
like one bricked device in a month of tests), the field returns are
significant enough to care about software solution for this problem.

Same problem was seen not only in automotive devices, but also in
industrial or agricultural. With other words, it is important enough to bring
some kind of solution mainline.

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2023-11-25 09:10:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Sat, Nov 25, 2023 at 09:50:38AM +0100, Oleksij Rempel wrote:
> On Sat, Nov 25, 2023 at 06:51:55AM +0000, Greg Kroah-Hartman wrote:
> > On Fri, Nov 24, 2023 at 07:57:25PM +0100, Oleksij Rempel wrote:
> > > On Fri, Nov 24, 2023 at 05:26:30PM +0000, Greg Kroah-Hartman wrote:
> > > > On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote:
> > > > > On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote:
> > > > > > On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote:
> > > > > > > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote:
> > > > > > > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote:
> > > > > > >
> > > > > > > > > This came out of some discussions about trying to handle emergency power
> > > > > > > > > failure notifications.
> > > > > > >
> > > > > > > > I'm sorry, but I don't know what that means. Are you saying that the
> > > > > > > > kernel is now going to try to provide a hard guarantee that some devices
> > > > > > > > are going to be shut down in X number of seconds when asked? If so, why
> > > > > > > > not do this in userspace?
> > > > > > >
> > > > > > > No, it was initially (or when I initially saw it anyway) handling of
> > > > > > > notifications from regulators that they're in trouble and we have some
> > > > > > > small amount of time to do anything we might want to do about it before
> > > > > > > we expire.
> > > > > >
> > > > > > So we are going to guarantee a "time" in which we are going to do
> > > > > > something? Again, if that's required, why not do it in userspace using
> > > > > > a RT kernel?
> > > > >
> > > > > For the HW in question I have only 100ms time before power loss. By
> > > > > doing it over use space some we will have even less time to react.
> > > >
> > > > Why can't userspace react that fast? Why will the kernel be somehow
> > > > faster? Speed should be the same, just get the "power is cut" signal
> > > > and have userspace flush and unmount the disk before power is gone. Why
> > > > can the kernel do this any differently?
> > > >
> > > > > In fact, this is not a new requirement. It exist on different flavors of
> > > > > automotive Linux for about 10 years. Linux in cars should be able to
> > > > > handle voltage drops for example on ignition and so on. The only new thing is
> > > > > the attempt to mainline it.
> > > >
> > > > But your patch is not guaranteeing anything, it's just doing a "I want
> > > > this done before the other devices are handled", that's it. There is no
> > > > chance that 100ms is going to be a requirement, or that some other
> > > > device type is not going to come along and demand to be ahead of your
> > > > device in the list.
> > > >
> > > > So you are going to have a constant fight among device types over the
> > > > years, and people complaining that the kernel is now somehow going to
> > > > guarantee that a device is shutdown in a set amount of time, which
> > > > again, the kernel can not guarantee here.
> > > >
> > > > This might work as a one-off for a specific hardware platform, which is
> > > > odd, but not anything you really should be adding for anyone else to use
> > > > here as your reasoning for it does not reflect what the code does.
> > >
> > > I see. Good point.
> > >
> > > In my case umount is not needed, there is not enough time to write down
> > > the data. We should send a shutdown command to the eMMC ASAP.
> >
> > If you don't care about the data, why is a shutdown command to the
> > hardware needed? What does that do that makes anything "safe" if your
> > data is lost.
>
> It prevents HW damage. In a typical automotive under-voltage labor it is
> usually possible to reproduce X amount of bricked eMMCs or NANDs on Y
> amount of under-voltage cycles (I do not have exact numbers right now).
> Even if the numbers not so high in the labor tests (sometimes something
> like one bricked device in a month of tests), the field returns are
> significant enough to care about software solution for this problem.

So hardware is attempting to rely on software in order to prevent the
destruction of that same hardware? Surely hardware designers aren't
that crazy, right? (rhetorical question, I know...)

> Same problem was seen not only in automotive devices, but also in
> industrial or agricultural. With other words, it is important enough to bring
> some kind of solution mainline.

But you are not providing a real solution here, only a "I am going to
attempt to shut down a specific type of device before the others, there
are no time or ordering guarantees here, so good luck!" solution.

And again, how are you going to prevent the in-fighting of all device
types to be "first" in the list?

thanks,

greg k-h

2023-11-25 10:31:01

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Sat, Nov 25, 2023 at 09:09:01AM +0000, Greg Kroah-Hartman wrote:
> On Sat, Nov 25, 2023 at 09:50:38AM +0100, Oleksij Rempel wrote:

> > It prevents HW damage. In a typical automotive under-voltage labor it is
> > usually possible to reproduce X amount of bricked eMMCs or NANDs on Y
> > amount of under-voltage cycles (I do not have exact numbers right now).
> > Even if the numbers not so high in the labor tests (sometimes something
> > like one bricked device in a month of tests), the field returns are
> > significant enough to care about software solution for this problem.

> So hardware is attempting to rely on software in order to prevent the
> destruction of that same hardware? Surely hardware designers aren't
> that crazy, right? (rhetorical question, I know...)

Surely software people aren't going to make no effort to integrate with
the notification features that the hardware engineers have so helpfully
provided us with?

> > Same problem was seen not only in automotive devices, but also in
> > industrial or agricultural. With other words, it is important enough to bring
> > some kind of solution mainline.

> But you are not providing a real solution here, only a "I am going to
> attempt to shut down a specific type of device before the others, there
> are no time or ordering guarantees here, so good luck!" solution.

I'm not sure there are great solutions here, the system integrators are
constrained by the what the application appropriate silicon that's on
the market is capable of, the siicon is constrained by the area costs of
dealing with corner cases for system robustness and how much of the
market cares about fixing these issues and software is constrained by
what hardware ends up being built. Everyone's just got to try their
best with the reality they're confronted with, hopefully what's possible
will improve with time.

> And again, how are you going to prevent the in-fighting of all device
> types to be "first" in the list?

It doesn't seem like the most complex integration challenge we've ever
had to deal with TBH.


Attachments:
(No filename) (2.06 kB)
signature.asc (499.00 B)
Download all attachments

2023-11-25 14:35:55

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Sat, Nov 25, 2023 at 10:30:42AM +0000, Mark Brown wrote:
> On Sat, Nov 25, 2023 at 09:09:01AM +0000, Greg Kroah-Hartman wrote:
> > On Sat, Nov 25, 2023 at 09:50:38AM +0100, Oleksij Rempel wrote:
>
> > > It prevents HW damage. In a typical automotive under-voltage labor it is
> > > usually possible to reproduce X amount of bricked eMMCs or NANDs on Y
> > > amount of under-voltage cycles (I do not have exact numbers right now).
> > > Even if the numbers not so high in the labor tests (sometimes something
> > > like one bricked device in a month of tests), the field returns are
> > > significant enough to care about software solution for this problem.
>
> > So hardware is attempting to rely on software in order to prevent the
> > destruction of that same hardware? Surely hardware designers aren't
> > that crazy, right? (rhetorical question, I know...)
>
> Surely software people aren't going to make no effort to integrate with
> the notification features that the hardware engineers have so helpfully
> provided us with?

That would be great, but I don't see that here, do you? All I see is
the shutdown sequence changing because someone wants it to go "faster"
with the threat of hardware breaking if we don't meet that "faster"
number, yet no knowledge or guarantee that this number can ever be known
or happen.

> > > Same problem was seen not only in automotive devices, but also in
> > > industrial or agricultural. With other words, it is important enough to bring
> > > some kind of solution mainline.
>
> > But you are not providing a real solution here, only a "I am going to
> > attempt to shut down a specific type of device before the others, there
> > are no time or ordering guarantees here, so good luck!" solution.
>
> I'm not sure there are great solutions here, the system integrators are
> constrained by the what the application appropriate silicon that's on
> the market is capable of, the siicon is constrained by the area costs of
> dealing with corner cases for system robustness and how much of the
> market cares about fixing these issues and software is constrained by
> what hardware ends up being built. Everyone's just got to try their
> best with the reality they're confronted with, hopefully what's possible
> will improve with time.

Agreed, but I don't think this patch is going to actually work properly
over time as there is no time values involved :)

> > And again, how are you going to prevent the in-fighting of all device
> > types to be "first" in the list?
>
> It doesn't seem like the most complex integration challenge we've ever
> had to deal with TBH.

True, but we all know how this grows and thinking about how to handle it
now is key for this to be acceptable.

thanks,

greg k-h

2023-11-25 15:43:19

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Sat, Nov 25, 2023 at 02:35:41PM +0000, Greg Kroah-Hartman wrote:
> On Sat, Nov 25, 2023 at 10:30:42AM +0000, Mark Brown wrote:
> > On Sat, Nov 25, 2023 at 09:09:01AM +0000, Greg Kroah-Hartman wrote:

> > > So hardware is attempting to rely on software in order to prevent the
> > > destruction of that same hardware? Surely hardware designers aren't
> > > that crazy, right? (rhetorical question, I know...)

> > Surely software people aren't going to make no effort to integrate with
> > the notification features that the hardware engineers have so helpfully
> > provided us with?

> That would be great, but I don't see that here, do you? All I see is
> the shutdown sequence changing because someone wants it to go "faster"
> with the threat of hardware breaking if we don't meet that "faster"
> number, yet no knowledge or guarantee that this number can ever be known
> or happen.

The idea was to have somewhere to send notifications when the hardware
starts reporting things like power supplies starting to fail. We do
have those from hardware, we just don't do anything terribly useful
with them yet.

TBH it does seem reasonable that there will be systems that can usefully
detect these issues but hasn't got a detailed characterisation of
exactly how long you've got before things expire, it's also likely that
the actual bound is going to be highly variable depending on what the
system is up to at the point of detection. It's quite likely that we'd
only get a worst case bound so it's also likely that we'd have more time
in practice than in spec. I'd expect characterisation that does happen
to be very system specific at this point, I don't think we can rely on
getting that information. I'd certainly expect that we have vastly more
systems can usefully detect issues than systems where we have firm
numbers.

> > > > Same problem was seen not only in automotive devices, but also in
> > > > industrial or agricultural. With other words, it is important enough to bring
> > > > some kind of solution mainline.

> > > But you are not providing a real solution here, only a "I am going to
> > > attempt to shut down a specific type of device before the others, there
> > > are no time or ordering guarantees here, so good luck!" solution.

> > I'm not sure there are great solutions here, the system integrators are
> > constrained by the what the application appropriate silicon that's on
> > the market is capable of, the siicon is constrained by the area costs of
> > dealing with corner cases for system robustness and how much of the
> > market cares about fixing these issues and software is constrained by
> > what hardware ends up being built. Everyone's just got to try their
> > best with the reality they're confronted with, hopefully what's possible
> > will improve with time.

> Agreed, but I don't think this patch is going to actually work properly
> over time as there is no time values involved :)

This seems to be more into the area of mitigation than firm solution, I
suspect users will be pleased if they can make a noticable dent in the
number of failures they're seeing.

> > > And again, how are you going to prevent the in-fighting of all device
> > > types to be "first" in the list?

> > It doesn't seem like the most complex integration challenge we've ever
> > had to deal with TBH.

> True, but we all know how this grows and thinking about how to handle it
> now is key for this to be acceptable.

It feels like if we're concerned about mitigating physical damage during
the process of power failure that's a very limited set of devices - the
storage case where we're in the middle of writing to flash or whatever
is the most obvious case.


Attachments:
(No filename) (3.68 kB)
signature.asc (499.00 B)
Download all attachments

2023-11-25 19:58:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Sat, Nov 25, 2023 at 03:43:02PM +0000, Mark Brown wrote:
> On Sat, Nov 25, 2023 at 02:35:41PM +0000, Greg Kroah-Hartman wrote:
> > On Sat, Nov 25, 2023 at 10:30:42AM +0000, Mark Brown wrote:
> > > On Sat, Nov 25, 2023 at 09:09:01AM +0000, Greg Kroah-Hartman wrote:
>
> > > > So hardware is attempting to rely on software in order to prevent the
> > > > destruction of that same hardware? Surely hardware designers aren't
> > > > that crazy, right? (rhetorical question, I know...)
>
> > > Surely software people aren't going to make no effort to integrate with
> > > the notification features that the hardware engineers have so helpfully
> > > provided us with?
>
> > That would be great, but I don't see that here, do you? All I see is
> > the shutdown sequence changing because someone wants it to go "faster"
> > with the threat of hardware breaking if we don't meet that "faster"
> > number, yet no knowledge or guarantee that this number can ever be known
> > or happen.
>
> The idea was to have somewhere to send notifications when the hardware
> starts reporting things like power supplies starting to fail. We do
> have those from hardware, we just don't do anything terribly useful
> with them yet.

Ok, but that's not what I recall this patchset doing, or did I missing
something? All I saw was a "reorder the shutdown sequence" set of
changes. Or at least that's all I remember at this point in time,
sorry, it's been a few days, but at least that lines up with what the
Subject line says above :)

> TBH it does seem reasonable that there will be systems that can usefully
> detect these issues but hasn't got a detailed characterisation of
> exactly how long you've got before things expire, it's also likely that
> the actual bound is going to be highly variable depending on what the
> system is up to at the point of detection. It's quite likely that we'd
> only get a worst case bound so it's also likely that we'd have more time
> in practice than in spec. I'd expect characterisation that does happen
> to be very system specific at this point, I don't think we can rely on
> getting that information. I'd certainly expect that we have vastly more
> systems can usefully detect issues than systems where we have firm
> numbers.

Sure, that all sounds good, but again, I don't think that's what is
happening here.

> > > > > Same problem was seen not only in automotive devices, but also in
> > > > > industrial or agricultural. With other words, it is important enough to bring
> > > > > some kind of solution mainline.
>
> > > > But you are not providing a real solution here, only a "I am going to
> > > > attempt to shut down a specific type of device before the others, there
> > > > are no time or ordering guarantees here, so good luck!" solution.
>
> > > I'm not sure there are great solutions here, the system integrators are
> > > constrained by the what the application appropriate silicon that's on
> > > the market is capable of, the siicon is constrained by the area costs of
> > > dealing with corner cases for system robustness and how much of the
> > > market cares about fixing these issues and software is constrained by
> > > what hardware ends up being built. Everyone's just got to try their
> > > best with the reality they're confronted with, hopefully what's possible
> > > will improve with time.

Note, if you attempt to mitigate broken hardware with software fixes,
hardware will never get unbroken as it never needs to change. Push back
on this, it's the only real way forward here. I know it's not always
possible, but the number of times I have heard hardware engineers say
"but no one ever told us that was broken/impossible/whatever, we just
assumed software could handle it" is uncountable.

> > Agreed, but I don't think this patch is going to actually work properly
> > over time as there is no time values involved :)
>
> This seems to be more into the area of mitigation than firm solution, I
> suspect users will be pleased if they can make a noticable dent in the
> number of failures they're seeing.

Mitigation is good, but this patch series is just a hack by doing "throw
this device type at the front of the shutdown list because we have
hardware that crashes a lot" :)

> > > > And again, how are you going to prevent the in-fighting of all device
> > > > types to be "first" in the list?
>
> > > It doesn't seem like the most complex integration challenge we've ever
> > > had to deal with TBH.
>
> > True, but we all know how this grows and thinking about how to handle it
> > now is key for this to be acceptable.
>
> It feels like if we're concerned about mitigating physical damage during
> the process of power failure that's a very limited set of devices - the
> storage case where we're in the middle of writing to flash or whatever
> is the most obvious case.

Then why isn't userspace handling this? This is a policy decision that
it needs to take to properly know what hardware needs to be shut down,
and what needs to happen in order to do that (i.e. flush, unmount,
etc.?) And userspace today should be able to say, "power down this
device now!" for any device in the system based on the sysfs device
tree, or at the very least, force it to a specific power state. So why
not handle this policy there?

thanks,

greg k-h

2023-11-26 10:15:51

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Sat, Nov 25, 2023 at 07:58:12PM +0000, Greg Kroah-Hartman wrote:
> On Sat, Nov 25, 2023 at 03:43:02PM +0000, Mark Brown wrote:
> > On Sat, Nov 25, 2023 at 02:35:41PM +0000, Greg Kroah-Hartman wrote:

> > > That would be great, but I don't see that here, do you? All I see is
> > > the shutdown sequence changing because someone wants it to go "faster"
> > > with the threat of hardware breaking if we don't meet that "faster"
> > > number, yet no knowledge or guarantee that this number can ever be known
> > > or happen.

> > The idea was to have somewhere to send notifications when the hardware
> > starts reporting things like power supplies starting to fail. We do
> > have those from hardware, we just don't do anything terribly useful
> > with them yet.

> Ok, but that's not what I recall this patchset doing, or did I missing
> something? All I saw was a "reorder the shutdown sequence" set of
> changes. Or at least that's all I remember at this point in time,
> sorry, it's been a few days, but at least that lines up with what the
> Subject line says above :)

That's not in the series, a bunch of it is merged in some form (eg, see
hw_protection_shutdown()) and more of it would need to be built on top
if this were merged.

> > > Agreed, but I don't think this patch is going to actually work properly
> > > over time as there is no time values involved :)

> > This seems to be more into the area of mitigation than firm solution, I
> > suspect users will be pleased if they can make a noticable dent in the
> > number of failures they're seeing.

> Mitigation is good, but this patch series is just a hack by doing "throw
> this device type at the front of the shutdown list because we have
> hardware that crashes a lot" :)

Sounds like a mitigation to me.

> > It feels like if we're concerned about mitigating physical damage during
> > the process of power failure that's a very limited set of devices - the
> > storage case where we're in the middle of writing to flash or whatever
> > is the most obvious case.

> Then why isn't userspace handling this? This is a policy decision that
> it needs to take to properly know what hardware needs to be shut down,
> and what needs to happen in order to do that (i.e. flush, unmount,
> etc.?) And userspace today should be able to say, "power down this
> device now!" for any device in the system based on the sysfs device
> tree, or at the very least, force it to a specific power state. So why
> not handle this policy there?

Given the tight timelines it does seem reasonable to have some of this
in the kernel - the specific decisions about how to handle these events
can always be controlled from userspace (eg, with a sysfs file like we
do for autosuspend delay times which seem to be in a similar ballpark).


Attachments:
(No filename) (2.78 kB)
signature.asc (499.00 B)
Download all attachments

2023-11-26 19:31:54

by Oleksij Rempel

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Sun, Nov 26, 2023 at 10:14:45AM +0000, Mark Brown wrote:
> On Sat, Nov 25, 2023 at 07:58:12PM +0000, Greg Kroah-Hartman wrote:
> > On Sat, Nov 25, 2023 at 03:43:02PM +0000, Mark Brown wrote:
> > > On Sat, Nov 25, 2023 at 02:35:41PM +0000, Greg Kroah-Hartman wrote:
>
> > > > That would be great, but I don't see that here, do you? All I see is
> > > > the shutdown sequence changing because someone wants it to go "faster"
> > > > with the threat of hardware breaking if we don't meet that "faster"
> > > > number, yet no knowledge or guarantee that this number can ever be known
> > > > or happen.
>
> > > The idea was to have somewhere to send notifications when the hardware
> > > starts reporting things like power supplies starting to fail. We do
> > > have those from hardware, we just don't do anything terribly useful
> > > with them yet.
>
> > Ok, but that's not what I recall this patchset doing, or did I missing
> > something? All I saw was a "reorder the shutdown sequence" set of
> > changes. Or at least that's all I remember at this point in time,
> > sorry, it's been a few days, but at least that lines up with what the
> > Subject line says above :)
>
> That's not in the series, a bunch of it is merged in some form (eg, see
> hw_protection_shutdown()) and more of it would need to be built on top
> if this were merged.

The current kernel has enough infrastructure to manage essential functions
related to hardware protection:
- The Device Tree specifies the source of interrupts for detecting
under-voltage events. It also details critical system regulators and some
of specification of backup power supplied by the board.
- Various frameworks within the kernel can identify critical hardware
conditions like over-temperature and under-voltage. Upon detection, these
frameworks invoke the hw_protection_shutdown() function.

> > > > Agreed, but I don't think this patch is going to actually work properly
> > > > over time as there is no time values involved :)

If we're to implement a deadline for each shutdown call (as the requirement for
"time values" suggests?), then prioritization becomes essential. Without
establishing a shutdown order, the inclusion of time values might not be
effectively utilized. Am I overlooking anything in this regard?

> > > This seems to be more into the area of mitigation than firm solution, I
> > > suspect users will be pleased if they can make a noticable dent in the
> > > number of failures they're seeing.
>
> > Mitigation is good, but this patch series is just a hack by doing "throw
> > this device type at the front of the shutdown list because we have
> > hardware that crashes a lot" :)

The root of the issue seems to be the choice of primary storage device.

All storage technologies - HDD, SSD, eMMC, NAND - are vulnerable to power
loss. The only foolproof safeguard is a backup power source, but this
introduces its own set of challenges:

1. Batteries: While they provide a backup, they come with limitations like a
finite number of charge cycles, sensitivity to temperature (a significant
concern in industrial and automotive environments), higher costs, and
increased device size. For most embedded applications, a UPS isn't a viable
solution.

2. Capacitors: A potential alternative, but they cannot offer prolonged
backup time. Increasing the number of capacitors to extend backup time leads
to additional issues:
- Increased costs and space requirements on the PCB.
- The need to manage partially charged capacitors during power failures.
- The requirement for a power supply capable of rapid charging.
- The risk of not reaching a safe state before the backup energy
depletes.
- In specific environments, like explosive atmospheres, storing large
amounts of energy can be hazardous.

Given these considerations, it's crucial to understand that such design choices
aren't merely "hacks". They represent a balance between different types of
trade-offs.

> > > It feels like if we're concerned about mitigating physical damage during
> > > the process of power failure that's a very limited set of devices - the
> > > storage case where we're in the middle of writing to flash or whatever
> > > is the most obvious case.
>
> > Then why isn't userspace handling this? This is a policy decision that
> > it needs to take to properly know what hardware needs to be shut down,
> > and what needs to happen in order to do that (i.e. flush, unmount,
> > etc.?) And userspace today should be able to say, "power down this
> > device now!" for any device in the system based on the sysfs device
> > tree, or at the very least, force it to a specific power state. So why
> > not handle this policy there?
>
> Given the tight timelines it does seem reasonable to have some of this
> in the kernel - the specific decisions about how to handle these events
> can always be controlled from userspace (eg, with a sysfs file like we
> do for autosuspend delay times which seem to be in a similar ballpark).

Upon investigating the feasibility of a user space solution for eMMC
power control, I've concluded that it's likely not possible. The primary
issue is that most board designs don't include reset signaling for
eMMCs. Additionally, the eMMC power rail is usually linked to the
system's main power controller. While powering off is doable, cleanly
powering it back on isn’t feasible. This is especially problematic when
the rootfs is located on the eMMC, as power cycling the storage device
could lead to system instability.

Therefore, any user space method to power off eMMC wouldn't be reliable
or safe, as there's no way to ensure it can be turned back on without
risking the integrity of the system. The design rationale is clear:
avoiding the risks associated with powering off the primary storage
device.

Considering these constraints, the only practical implementation I see
is integrating this functionality into the system's shutdown sequence.
This approach ensures a controlled environment for powering off the
eMMC, avoiding potential issues.

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2023-11-26 19:42:19

by Ferry Toth

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

Ha ha,

Funny discussion. As a hardware engineer (with no experience in
automotive, but actual experience in industrial applications and
debugging issues arising from bad shutdowns) let me add my 5ct at the end.

Op 26-11-2023 om 11:14 schreef Mark Brown:
> On Sat, Nov 25, 2023 at 07:58:12PM +0000, Greg Kroah-Hartman wrote:
>> On Sat, Nov 25, 2023 at 03:43:02PM +0000, Mark Brown wrote:
>>> On Sat, Nov 25, 2023 at 02:35:41PM +0000, Greg Kroah-Hartman wrote:
>
>>>> That would be great, but I don't see that here, do you? All I see is
>>>> the shutdown sequence changing because someone wants it to go "faster"
>>>> with the threat of hardware breaking if we don't meet that "faster"
>>>> number, yet no knowledge or guarantee that this number can ever be known
>>>> or happen.
>
>>> The idea was to have somewhere to send notifications when the hardware
>>> starts reporting things like power supplies starting to fail. We do
>>> have those from hardware, we just don't do anything terribly useful
>>> with them yet.
>
>> Ok, but that's not what I recall this patchset doing, or did I missing
>> something? All I saw was a "reorder the shutdown sequence" set of
>> changes. Or at least that's all I remember at this point in time,
>> sorry, it's been a few days, but at least that lines up with what the
>> Subject line says above :)
>
> That's not in the series, a bunch of it is merged in some form (eg, see
> hw_protection_shutdown()) and more of it would need to be built on top
> if this were merged.
>
>>>> Agreed, but I don't think this patch is going to actually work properly
>>>> over time as there is no time values involved :)
>
>>> This seems to be more into the area of mitigation than firm solution, I
>>> suspect users will be pleased if they can make a noticable dent in the
>>> number of failures they're seeing.
>
>> Mitigation is good, but this patch series is just a hack by doing "throw
>> this device type at the front of the shutdown list because we have
>> hardware that crashes a lot" :)
>
> Sounds like a mitigation to me.
>
>>> It feels like if we're concerned about mitigating physical damage during
>>> the process of power failure that's a very limited set of devices - the
>>> storage case where we're in the middle of writing to flash or whatever
>>> is the most obvious case.
>
>> Then why isn't userspace handling this? This is a policy decision that
>> it needs to take to properly know what hardware needs to be shut down,
>> and what needs to happen in order to do that (i.e. flush, unmount,
>> etc.?) And userspace today should be able to say, "power down this
>> device now!" for any device in the system based on the sysfs device
>> tree, or at the very least, force it to a specific power state. So why
>> not handle this policy there?
>
> Given the tight timelines it does seem reasonable to have some of this
> in the kernel - the specific decisions about how to handle these events
> can always be controlled from userspace (eg, with a sysfs file like we
> do for autosuspend delay times which seem to be in a similar ballpark).

I'd prefer not to call the HW broken in this case. The life of hardware
(unlike software) continues during and after power down. That means
there may be requirements and specs for it to conform to during those
transitions and states. Unlike broken hardware, which does not conform
to its specs. Typically, a HDD that autoparks its heads to a safe
position on its last rotation energy, that's not broken, that's
carefully designed.

That said, I agree with Greg, if there is a hard requirement to shutdown
safely to prevent damage, the solution is not to shutdown fast. The
solution is to shutdown on time.

In fact, if the software needs more energy to shutdown safely, any
hardware engineer will consider that a requirement. And ask the
appropriate question: "how much energy do you need exactly?". There are
various reasons why that can not be answered in general. The most funny
answer I ever got (thanks Albert) being: "My software doesn't consume
energy".
Now, we do need to keep in mind that storing J in a supercap, executing
a CPU at GHz, storing GB data do not come free. So, after making sure
things shutdown in time, it often pays off to shorten that deadline, and
indeed make it faster.

Looking at the above discussion from the different angles:
1) The hardware mentioned does not need to shutdown (as said, it doesn't
need to be unmounted). It needs to be placed into a safe state on time.
And the only thing here that can know for the particular hardware what
is a safe state, is the driver itself.
2) To get a signal (Low Power Warning) to the driver on time, the
PREEMPT_RT kernel seems like a natural choice.
3) To me (but hey who am I) it makes sense to have a generic mechanism
from drivers to transition to their safe state if they require that.
4) I wouldn't worry about drivers fighting for priority, these systems
are normally "embedded" with fixed hardware. Otherwise there is no way
to calculate shutdown energy required and do proper hardware design.

2023-11-27 10:14:32

by Christian Loehle

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On 25/11/2023 08:50, Oleksij Rempel wrote:
> On Sat, Nov 25, 2023 at 06:51:55AM +0000, Greg Kroah-Hartman wrote:
>> On Fri, Nov 24, 2023 at 07:57:25PM +0100, Oleksij Rempel wrote:
>>> On Fri, Nov 24, 2023 at 05:26:30PM +0000, Greg Kroah-Hartman wrote:
>>>> On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote:
>>>>> On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote:
>>>>>> On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote:
>>>>>>> On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote:
>>>>>>>> On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote:
>>>>>>>
>>>>>>>>> This came out of some discussions about trying to handle emergency power
>>>>>>>>> failure notifications.
>>>>>>>
>>>>>>>> I'm sorry, but I don't know what that means. Are you saying that the
>>>>>>>> kernel is now going to try to provide a hard guarantee that some devices
>>>>>>>> are going to be shut down in X number of seconds when asked? If so, why
>>>>>>>> not do this in userspace?
>>>>>>>
>>>>>>> No, it was initially (or when I initially saw it anyway) handling of
>>>>>>> notifications from regulators that they're in trouble and we have some
>>>>>>> small amount of time to do anything we might want to do about it before
>>>>>>> we expire.
>>>>>>
>>>>>> So we are going to guarantee a "time" in which we are going to do
>>>>>> something? Again, if that's required, why not do it in userspace using
>>>>>> a RT kernel?
>>>>>
>>>>> For the HW in question I have only 100ms time before power loss. By
>>>>> doing it over use space some we will have even less time to react.
>>>>
>>>> Why can't userspace react that fast? Why will the kernel be somehow
>>>> faster? Speed should be the same, just get the "power is cut" signal
>>>> and have userspace flush and unmount the disk before power is gone. Why
>>>> can the kernel do this any differently?
>>>>
>>>>> In fact, this is not a new requirement. It exist on different flavors of
>>>>> automotive Linux for about 10 years. Linux in cars should be able to
>>>>> handle voltage drops for example on ignition and so on. The only new thing is
>>>>> the attempt to mainline it.
>>>>
>>>> But your patch is not guaranteeing anything, it's just doing a "I want
>>>> this done before the other devices are handled", that's it. There is no
>>>> chance that 100ms is going to be a requirement, or that some other
>>>> device type is not going to come along and demand to be ahead of your
>>>> device in the list.
>>>>
>>>> So you are going to have a constant fight among device types over the
>>>> years, and people complaining that the kernel is now somehow going to
>>>> guarantee that a device is shutdown in a set amount of time, which
>>>> again, the kernel can not guarantee here.
>>>>
>>>> This might work as a one-off for a specific hardware platform, which is
>>>> odd, but not anything you really should be adding for anyone else to use
>>>> here as your reasoning for it does not reflect what the code does.
>>>
>>> I see. Good point.
>>>
>>> In my case umount is not needed, there is not enough time to write down
>>> the data. We should send a shutdown command to the eMMC ASAP.
>>
>> If you don't care about the data, why is a shutdown command to the
>> hardware needed? What does that do that makes anything "safe" if your
>> data is lost.
>
> It prevents HW damage. In a typical automotive under-voltage labor it is
> usually possible to reproduce X amount of bricked eMMCs or NANDs on Y
> amount of under-voltage cycles (I do not have exact numbers right now).
> Even if the numbers not so high in the labor tests (sometimes something
> like one bricked device in a month of tests), the field returns are
> significant enough to care about software solution for this problem.
>
> Same problem was seen not only in automotive devices, but also in
> industrial or agricultural. With other words, it is important enough to bring
> some kind of solution mainline.
>

IMO that is a serious problem with the used storage / eMMC in that case and it
is not suitable for industrial/automotive uses?
Any industrial/automotive-suitable storage device should detect under-voltage and
just treat it as a power-down/loss, and while that isn't nice for the storage device,
it really shouldn't be able to brick a device (within <1M cycles anyway).
What does the storage module vendor say about this?

BR,
Christian

2023-11-27 11:27:47

by Christian Loehle

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On 26/11/2023 19:31, Oleksij Rempel wrote:
> On Sun, Nov 26, 2023 at 10:14:45AM +0000, Mark Brown wrote:
>> On Sat, Nov 25, 2023 at 07:58:12PM +0000, Greg Kroah-Hartman wrote:
>>> On Sat, Nov 25, 2023 at 03:43:02PM +0000, Mark Brown wrote:
>>>> On Sat, Nov 25, 2023 at 02:35:41PM +0000, Greg Kroah-Hartman wrote:
>>
>>>>> That would be great, but I don't see that here, do you? All I see is
>>>>> the shutdown sequence changing because someone wants it to go "faster"
>>>>> with the threat of hardware breaking if we don't meet that "faster"
>>>>> number, yet no knowledge or guarantee that this number can ever be known
>>>>> or happen.
>>
>>>> The idea was to have somewhere to send notifications when the hardware
>>>> starts reporting things like power supplies starting to fail. We do
>>>> have those from hardware, we just don't do anything terribly useful
>>>> with them yet.
>>
>>> Ok, but that's not what I recall this patchset doing, or did I missing
>>> something? All I saw was a "reorder the shutdown sequence" set of
>>> changes. Or at least that's all I remember at this point in time,
>>> sorry, it's been a few days, but at least that lines up with what the
>>> Subject line says above :)
>>
>> That's not in the series, a bunch of it is merged in some form (eg, see
>> hw_protection_shutdown()) and more of it would need to be built on top
>> if this were merged.
>
> The current kernel has enough infrastructure to manage essential functions
> related to hardware protection:
> - The Device Tree specifies the source of interrupts for detecting
> under-voltage events. It also details critical system regulators and some
> of specification of backup power supplied by the board.
> - Various frameworks within the kernel can identify critical hardware
> conditions like over-temperature and under-voltage. Upon detection, these
> frameworks invoke the hw_protection_shutdown() function.
>
>>>>> Agreed, but I don't think this patch is going to actually work properly
>>>>> over time as there is no time values involved :)
>
> If we're to implement a deadline for each shutdown call (as the requirement for
> "time values" suggests?), then prioritization becomes essential. Without
> establishing a shutdown order, the inclusion of time values might not be
> effectively utilized. Am I overlooking anything in this regard?
>
>>>> This seems to be more into the area of mitigation than firm solution, I
>>>> suspect users will be pleased if they can make a noticable dent in the
>>>> number of failures they're seeing.
>>
>>> Mitigation is good, but this patch series is just a hack by doing "throw
>>> this device type at the front of the shutdown list because we have
>>> hardware that crashes a lot" :)
>
> The root of the issue seems to be the choice of primary storage device.
>
> All storage technologies - HDD, SSD, eMMC, NAND - are vulnerable to power
> loss. The only foolproof safeguard is a backup power source, but this
> introduces its own set of challenges:

I disagree and would say that any storage device sold as "industrial" should
guarantee power-fail safety. Plus, you mentioned data loss isn't even your concern,
but the storage device fails/bricks.
>
> 1. Batteries: While they provide a backup, they come with limitations like a
> finite number of charge cycles, sensitivity to temperature (a significant
> concern in industrial and automotive environments), higher costs, and
> increased device size. For most embedded applications, a UPS isn't a viable
> solution.
>
> 2. Capacitors: A potential alternative, but they cannot offer prolonged
> backup time. Increasing the number of capacitors to extend backup time leads
> to additional issues:
> - Increased costs and space requirements on the PCB.
> - The need to manage partially charged capacitors during power failures.
> - The requirement for a power supply capable of rapid charging.
> - The risk of not reaching a safe state before the backup energy
> depletes.
> - In specific environments, like explosive atmospheres, storing large
> amounts of energy can be hazardous.

And also just practically, ensuring a safe power down could be in the order
of a second, so it would be quite a capacitor.

>
> Given these considerations, it's crucial to understand that such design choices
> aren't merely "hacks". They represent a balance between different types of
> trade-offs.
>
>>>> It feels like if we're concerned about mitigating physical damage during
>>>> the process of power failure that's a very limited set of devices - the
>>>> storage case where we're in the middle of writing to flash or whatever
>>>> is the most obvious case.
>>
>>> Then why isn't userspace handling this? This is a policy decision that
>>> it needs to take to properly know what hardware needs to be shut down,
>>> and what needs to happen in order to do that (i.e. flush, unmount,
>>> etc.?) And userspace today should be able to say, "power down this
>>> device now!" for any device in the system based on the sysfs device
>>> tree, or at the very least, force it to a specific power state. So why
>>> not handle this policy there?
>>
>> Given the tight timelines it does seem reasonable to have some of this
>> in the kernel - the specific decisions about how to handle these events
>> can always be controlled from userspace (eg, with a sysfs file like we
>> do for autosuspend delay times which seem to be in a similar ballpark).
>
> Upon investigating the feasibility of a user space solution for eMMC
> power control, I've concluded that it's likely not possible. The primary
> issue is that most board designs don't include reset signaling for
> eMMCs. Additionally, the eMMC power rail is usually linked to the
> system's main power controller. While powering off is doable, cleanly
> powering it back on isn’t feasible. This is especially problematic when
> the rootfs is located on the eMMC, as power cycling the storage device
> could lead to system instability.
>
> Therefore, any user space method to power off eMMC wouldn't be reliable
> or safe, as there's no way to ensure it can be turned back on without
> risking the integrity of the system. The design rationale is clear:
> avoiding the risks associated with powering off the primary storage
> device.
>
> Considering these constraints, the only practical implementation I see
> is integrating this functionality into the system's shutdown sequence.
> This approach ensures a controlled environment for powering off the
> eMMC, avoiding potential issues.

You don't need the RST signal, in fact even if you had it it would be
the wrong thing to do. (Implementation is vendor-specific but RST
assumes that eMMCs' VCC and VCCQ are left untouched.)
You can try turning off eMMC cache completely and/or sending power down
notification on 'emergency shutdown', but since power-loss/fail behavior
is vendor-specific asking the storage device vendor how to ensure a safe
power-down.
Anyway the proper eMMC power-down methods are up to a second in timeouts,
so infeasible for your requirements from what I can see.

BR,
Christian

2023-11-27 11:36:35

by Oleksij Rempel

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Mon, Nov 27, 2023 at 10:13:49AM +0000, Christian Loehle wrote:
> > Same problem was seen not only in automotive devices, but also in
> > industrial or agricultural. With other words, it is important enough to bring
> > some kind of solution mainline.
> >
>
> IMO that is a serious problem with the used storage / eMMC in that case and it
> is not suitable for industrial/automotive uses?
> Any industrial/automotive-suitable storage device should detect under-voltage and
> just treat it as a power-down/loss, and while that isn't nice for the storage device,
> it really shouldn't be able to brick a device (within <1M cycles anyway).
> What does the storage module vendor say about this?

Good question. I do not have insights ATM. I'll forward it.

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2023-11-27 11:44:38

by Oleksij Rempel

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Mon, Nov 27, 2023 at 11:27:31AM +0000, Christian Loehle wrote:
> On 26/11/2023 19:31, Oleksij Rempel wrote:
> > On Sun, Nov 26, 2023 at 10:14:45AM +0000, Mark Brown wrote:
> >> On Sat, Nov 25, 2023 at 07:58:12PM +0000, Greg Kroah-Hartman wrote:
> >>> On Sat, Nov 25, 2023 at 03:43:02PM +0000, Mark Brown wrote:
> >>>> On Sat, Nov 25, 2023 at 02:35:41PM +0000, Greg Kroah-Hartman wrote:
> >>
> >>>>> That would be great, but I don't see that here, do you? All I see is
> >>>>> the shutdown sequence changing because someone wants it to go "faster"
> >>>>> with the threat of hardware breaking if we don't meet that "faster"
> >>>>> number, yet no knowledge or guarantee that this number can ever be known
> >>>>> or happen.
> >>
> >>>> The idea was to have somewhere to send notifications when the hardware
> >>>> starts reporting things like power supplies starting to fail. We do
> >>>> have those from hardware, we just don't do anything terribly useful
> >>>> with them yet.
> >>
> >>> Ok, but that's not what I recall this patchset doing, or did I missing
> >>> something? All I saw was a "reorder the shutdown sequence" set of
> >>> changes. Or at least that's all I remember at this point in time,
> >>> sorry, it's been a few days, but at least that lines up with what the
> >>> Subject line says above :)
> >>
> >> That's not in the series, a bunch of it is merged in some form (eg, see
> >> hw_protection_shutdown()) and more of it would need to be built on top
> >> if this were merged.
> >
> > The current kernel has enough infrastructure to manage essential functions
> > related to hardware protection:
> > - The Device Tree specifies the source of interrupts for detecting
> > under-voltage events. It also details critical system regulators and some
> > of specification of backup power supplied by the board.
> > - Various frameworks within the kernel can identify critical hardware
> > conditions like over-temperature and under-voltage. Upon detection, these
> > frameworks invoke the hw_protection_shutdown() function.
> >
> >>>>> Agreed, but I don't think this patch is going to actually work properly
> >>>>> over time as there is no time values involved :)
> >
> > If we're to implement a deadline for each shutdown call (as the requirement for
> > "time values" suggests?), then prioritization becomes essential. Without
> > establishing a shutdown order, the inclusion of time values might not be
> > effectively utilized. Am I overlooking anything in this regard?
> >
> >>>> This seems to be more into the area of mitigation than firm solution, I
> >>>> suspect users will be pleased if they can make a noticable dent in the
> >>>> number of failures they're seeing.
> >>
> >>> Mitigation is good, but this patch series is just a hack by doing "throw
> >>> this device type at the front of the shutdown list because we have
> >>> hardware that crashes a lot" :)
> >
> > The root of the issue seems to be the choice of primary storage device.
> >
> > All storage technologies - HDD, SSD, eMMC, NAND - are vulnerable to power
> > loss. The only foolproof safeguard is a backup power source, but this
> > introduces its own set of challenges:
>
> I disagree and would say that any storage device sold as "industrial" should
> guarantee power-fail safety. Plus, you mentioned data loss isn't even your concern,
> but the storage device fails/bricks.
> >
> > 1. Batteries: While they provide a backup, they come with limitations like a
> > finite number of charge cycles, sensitivity to temperature (a significant
> > concern in industrial and automotive environments), higher costs, and
> > increased device size. For most embedded applications, a UPS isn't a viable
> > solution.
> >
> > 2. Capacitors: A potential alternative, but they cannot offer prolonged
> > backup time. Increasing the number of capacitors to extend backup time leads
> > to additional issues:
> > - Increased costs and space requirements on the PCB.
> > - The need to manage partially charged capacitors during power failures.
> > - The requirement for a power supply capable of rapid charging.
> > - The risk of not reaching a safe state before the backup energy
> > depletes.
> > - In specific environments, like explosive atmospheres, storing large
> > amounts of energy can be hazardous.
>
> And also just practically, ensuring a safe power down could be in the order
> of a second, so it would be quite a capacitor.
>
> >
> > Given these considerations, it's crucial to understand that such design choices
> > aren't merely "hacks". They represent a balance between different types of
> > trade-offs.
> >
> >>>> It feels like if we're concerned about mitigating physical damage during
> >>>> the process of power failure that's a very limited set of devices - the
> >>>> storage case where we're in the middle of writing to flash or whatever
> >>>> is the most obvious case.
> >>
> >>> Then why isn't userspace handling this? This is a policy decision that
> >>> it needs to take to properly know what hardware needs to be shut down,
> >>> and what needs to happen in order to do that (i.e. flush, unmount,
> >>> etc.?) And userspace today should be able to say, "power down this
> >>> device now!" for any device in the system based on the sysfs device
> >>> tree, or at the very least, force it to a specific power state. So why
> >>> not handle this policy there?
> >>
> >> Given the tight timelines it does seem reasonable to have some of this
> >> in the kernel - the specific decisions about how to handle these events
> >> can always be controlled from userspace (eg, with a sysfs file like we
> >> do for autosuspend delay times which seem to be in a similar ballpark).
> >
> > Upon investigating the feasibility of a user space solution for eMMC
> > power control, I've concluded that it's likely not possible. The primary
> > issue is that most board designs don't include reset signaling for
> > eMMCs. Additionally, the eMMC power rail is usually linked to the
> > system's main power controller. While powering off is doable, cleanly
> > powering it back on isn’t feasible. This is especially problematic when
> > the rootfs is located on the eMMC, as power cycling the storage device
> > could lead to system instability.
> >
> > Therefore, any user space method to power off eMMC wouldn't be reliable
> > or safe, as there's no way to ensure it can be turned back on without
> > risking the integrity of the system. The design rationale is clear:
> > avoiding the risks associated with powering off the primary storage
> > device.
> >
> > Considering these constraints, the only practical implementation I see
> > is integrating this functionality into the system's shutdown sequence.
> > This approach ensures a controlled environment for powering off the
> > eMMC, avoiding potential issues.
>
> You don't need the RST signal, in fact even if you had it it would be
> the wrong thing to do. (Implementation is vendor-specific but RST
> assumes that eMMCs' VCC and VCCQ are left untouched.)

It means, if VCC and VCCQ are off on reboot or watchdog reset, there is
potentially bigger problem?

> You can try turning off eMMC cache completely and/or sending power down
> notification on 'emergency shutdown', but since power-loss/fail behavior
> is vendor-specific asking the storage device vendor how to ensure a safe
> power-down.
> Anyway the proper eMMC power-down methods are up to a second in timeouts,
> so infeasible for your requirements from what I can see.

Ok. So, increasing capacity at least to one second should be main goal
for now? But even if capacity is increased, emergency shutdown should
notify eMMCs as early as possible?

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2023-11-27 11:58:05

by Christian Loehle

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On 27/11/2023 11:44, Oleksij Rempel wrote:
> On Mon, Nov 27, 2023 at 11:27:31AM +0000, Christian Loehle wrote:
>> On 26/11/2023 19:31, Oleksij Rempel wrote:
>>> On Sun, Nov 26, 2023 at 10:14:45AM +0000, Mark Brown wrote:
>>>> On Sat, Nov 25, 2023 at 07:58:12PM +0000, Greg Kroah-Hartman wrote:
>>>>> On Sat, Nov 25, 2023 at 03:43:02PM +0000, Mark Brown wrote:
>>>>>> On Sat, Nov 25, 2023 at 02:35:41PM +0000, Greg Kroah-Hartman wrote:
>>>>
>>>>>>> That would be great, but I don't see that here, do you? All I see is
>>>>>>> the shutdown sequence changing because someone wants it to go "faster"
>>>>>>> with the threat of hardware breaking if we don't meet that "faster"
>>>>>>> number, yet no knowledge or guarantee that this number can ever be known
>>>>>>> or happen.
>>>>
>>>>>> The idea was to have somewhere to send notifications when the hardware
>>>>>> starts reporting things like power supplies starting to fail. We do
>>>>>> have those from hardware, we just don't do anything terribly useful
>>>>>> with them yet.
>>>>
>>>>> Ok, but that's not what I recall this patchset doing, or did I missing
>>>>> something? All I saw was a "reorder the shutdown sequence" set of
>>>>> changes. Or at least that's all I remember at this point in time,
>>>>> sorry, it's been a few days, but at least that lines up with what the
>>>>> Subject line says above :)
>>>>
>>>> That's not in the series, a bunch of it is merged in some form (eg, see
>>>> hw_protection_shutdown()) and more of it would need to be built on top
>>>> if this were merged.
>>>
>>> The current kernel has enough infrastructure to manage essential functions
>>> related to hardware protection:
>>> - The Device Tree specifies the source of interrupts for detecting
>>> under-voltage events. It also details critical system regulators and some
>>> of specification of backup power supplied by the board.
>>> - Various frameworks within the kernel can identify critical hardware
>>> conditions like over-temperature and under-voltage. Upon detection, these
>>> frameworks invoke the hw_protection_shutdown() function.
>>>
>>>>>>> Agreed, but I don't think this patch is going to actually work properly
>>>>>>> over time as there is no time values involved :)
>>>
>>> If we're to implement a deadline for each shutdown call (as the requirement for
>>> "time values" suggests?), then prioritization becomes essential. Without
>>> establishing a shutdown order, the inclusion of time values might not be
>>> effectively utilized. Am I overlooking anything in this regard?
>>>
>>>>>> This seems to be more into the area of mitigation than firm solution, I
>>>>>> suspect users will be pleased if they can make a noticable dent in the
>>>>>> number of failures they're seeing.
>>>>
>>>>> Mitigation is good, but this patch series is just a hack by doing "throw
>>>>> this device type at the front of the shutdown list because we have
>>>>> hardware that crashes a lot" :)
>>>
>>> The root of the issue seems to be the choice of primary storage device.
>>>
>>> All storage technologies - HDD, SSD, eMMC, NAND - are vulnerable to power
>>> loss. The only foolproof safeguard is a backup power source, but this
>>> introduces its own set of challenges:
>>
>> I disagree and would say that any storage device sold as "industrial" should
>> guarantee power-fail safety. Plus, you mentioned data loss isn't even your concern,
>> but the storage device fails/bricks.
>>>
>>> 1. Batteries: While they provide a backup, they come with limitations like a
>>> finite number of charge cycles, sensitivity to temperature (a significant
>>> concern in industrial and automotive environments), higher costs, and
>>> increased device size. For most embedded applications, a UPS isn't a viable
>>> solution.
>>>
>>> 2. Capacitors: A potential alternative, but they cannot offer prolonged
>>> backup time. Increasing the number of capacitors to extend backup time leads
>>> to additional issues:
>>> - Increased costs and space requirements on the PCB.
>>> - The need to manage partially charged capacitors during power failures.
>>> - The requirement for a power supply capable of rapid charging.
>>> - The risk of not reaching a safe state before the backup energy
>>> depletes.
>>> - In specific environments, like explosive atmospheres, storing large
>>> amounts of energy can be hazardous.
>>
>> And also just practically, ensuring a safe power down could be in the order
>> of a second, so it would be quite a capacitor.
>>
>>>
>>> Given these considerations, it's crucial to understand that such design choices
>>> aren't merely "hacks". They represent a balance between different types of
>>> trade-offs.
>>>
>>>>>> It feels like if we're concerned about mitigating physical damage during
>>>>>> the process of power failure that's a very limited set of devices - the
>>>>>> storage case where we're in the middle of writing to flash or whatever
>>>>>> is the most obvious case.
>>>>
>>>>> Then why isn't userspace handling this? This is a policy decision that
>>>>> it needs to take to properly know what hardware needs to be shut down,
>>>>> and what needs to happen in order to do that (i.e. flush, unmount,
>>>>> etc.?) And userspace today should be able to say, "power down this
>>>>> device now!" for any device in the system based on the sysfs device
>>>>> tree, or at the very least, force it to a specific power state. So why
>>>>> not handle this policy there?
>>>>
>>>> Given the tight timelines it does seem reasonable to have some of this
>>>> in the kernel - the specific decisions about how to handle these events
>>>> can always be controlled from userspace (eg, with a sysfs file like we
>>>> do for autosuspend delay times which seem to be in a similar ballpark).
>>>
>>> Upon investigating the feasibility of a user space solution for eMMC
>>> power control, I've concluded that it's likely not possible. The primary
>>> issue is that most board designs don't include reset signaling for
>>> eMMCs. Additionally, the eMMC power rail is usually linked to the
>>> system's main power controller. While powering off is doable, cleanly
>>> powering it back on isn’t feasible. This is especially problematic when
>>> the rootfs is located on the eMMC, as power cycling the storage device
>>> could lead to system instability.
>>>
>>> Therefore, any user space method to power off eMMC wouldn't be reliable
>>> or safe, as there's no way to ensure it can be turned back on without
>>> risking the integrity of the system. The design rationale is clear:
>>> avoiding the risks associated with powering off the primary storage
>>> device.
>>>
>>> Considering these constraints, the only practical implementation I see
>>> is integrating this functionality into the system's shutdown sequence.
>>> This approach ensures a controlled environment for powering off the
>>> eMMC, avoiding potential issues.
>>
>> You don't need the RST signal, in fact even if you had it it would be
>> the wrong thing to do. (Implementation is vendor-specific but RST
>> assumes that eMMCs' VCC and VCCQ are left untouched.)
>
> It means, if VCC and VCCQ are off on reboot or watchdog reset, there is
> potentially bigger problem?

Just to confirm I was talking about EMMC_RST signal, which I understood you
where talking about, too?
Sending a EMMC_RST pulse does not have to trigger a safe shutdown for the eMMC,
(again it could), but both VCC and VCCQ should be left untouched and stable.
(Otherwise with the short timeout on EMMC_RST you might as well toggle EMMC_VCC.)
If you toggle EMMC_VCC and EMMC_VCCQ on system reboot/reset is a design choice,
but if your eMMC module has trouble with 'sudden' power-loss, leaving them
on could be beneficial?
Anyway definitely not required and not really related to your issue.

>
>> You can try turning off eMMC cache completely and/or sending power down
>> notification on 'emergency shutdown', but since power-loss/fail behavior
>> is vendor-specific asking the storage device vendor how to ensure a safe
>> power-down.
>> Anyway the proper eMMC power-down methods are up to a second in timeouts,
>> so infeasible for your requirements from what I can see.
>
> Ok. So, increasing capacity at least to one second should be main goal
> for now? But even if capacity is increased, emergency shutdown should
> notify eMMCs as early as possible?

Well if that is an option that is a path to explore for sure, that is,
if you don't want to switch the eMMC to something more fitting.
Yes you would need to notify the eMMC somehow, otherwise you just delayed
the power-fail for a second.
Sending a sleep, power-down notification or flushing the cache could all be
ways to trigger a safe shutdown for the eMMC, but again, you really would have
to confirm this with the eMMC vendor, as all of these can essentially be
implemented as a NOP (apart from the state-machine transition) and be
spec-compliant.

BR,
Christian

2023-11-27 12:54:51

by Matti Vaittinen

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

pe 24. marrask. 2023 klo 19.26 Greg Kroah-Hartman
([email protected]) kirjoitti:
>
> On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote:
> > On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote:
> > > On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote:
> > > > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote:
> > > > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote:
> > > >
> > > > > > This came out of some discussions about trying to handle emergency power
> > > > > > failure notifications.
> > > >
> > > > > I'm sorry, but I don't know what that means. Are you saying that the
> > > > > kernel is now going to try to provide a hard guarantee that some devices
> > > > > are going to be shut down in X number of seconds when asked? If so, why
> > > > > not do this in userspace?
> > > >
> > > > No, it was initially (or when I initially saw it anyway) handling of
> > > > notifications from regulators that they're in trouble and we have some
> > > > small amount of time to do anything we might want to do about it before
> > > > we expire.
> > >
> > > So we are going to guarantee a "time" in which we are going to do
> > > something? Again, if that's required, why not do it in userspace using
> > > a RT kernel?
> >
> > For the HW in question I have only 100ms time before power loss. By
> > doing it over use space some we will have even less time to react.
>
> Why can't userspace react that fast? Why will the kernel be somehow
> faster? Speed should be the same, just get the "power is cut" signal
> and have userspace flush and unmount the disk before power is gone. Why
> can the kernel do this any differently?
>
> > In fact, this is not a new requirement. It exist on different flavors of
> > automotive Linux for about 10 years. Linux in cars should be able to
> > handle voltage drops for example on ignition and so on. The only new thing is
> > the attempt to mainline it.
>
> But your patch is not guaranteeing anything, it's just doing a "I want
> this done before the other devices are handled", that's it. There is no
> chance that 100ms is going to be a requirement, or that some other
> device type is not going to come along and demand to be ahead of your
> device in the list.
>
> So you are going to have a constant fight among device types over the
> years, and people complaining that the kernel is now somehow going to
> guarantee that a device is shutdown in a set amount of time, which
> again, the kernel can not guarantee here.
>
> This might work as a one-off for a specific hardware platform, which is
> odd, but not anything you really should be adding for anyone else to use
> here as your reasoning for it does not reflect what the code does.

I was (am) interested in knowing how/where the regulator error
notifications are utilized - hence I asked this in ELCE last summer.
Replies indeed mostly pointed to automotive and handling the under
voltage events.

As to what has changed (I think this was asked in another mail on this
topic) - I understood from the discussions that the demand of running
systems with as low power as possible is even more
important/desirable. Hence, the under-voltage events are more usual
than they were when cars used to be working by burning flammable
liquids :)

Anyways, what I thought I'd comment on is that the severity of the
regulator error notifications can be given from device-tree. Rationale
behind this is that figuring out whether a certain detected problem is
fatal or not (in embedded systems) should be done by the board
designers, per board. Maybe the understanding which hardware should
react first is also a property of hardware and could come from the
device-tree? Eg, instead of having a "DEVICE_SHUTDOWN_PRIO_STORAGE"
set unconditionally for EMMC, systems could set shutdown priority per
board and per device explicitly using device-tree?

Yours,
-- Matti

--

Matti Vaittinen
Linux kernel developer at ROHM Semiconductors
Oulu Finland

~~ When things go utterly wrong vim users can always type :help! ~~

2023-11-27 13:54:27

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Mon, Nov 27, 2023 at 02:54:21PM +0200, Matti Vaittinen wrote:
> pe 24. marrask. 2023 klo 19.26 Greg Kroah-Hartman
> ([email protected]) kirjoitti:
> >
> > On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote:
> > > On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote:
> > > > On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote:
> > > > > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote:
> > > > > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote:
> > > > >
> > > > > > > This came out of some discussions about trying to handle emergency power
> > > > > > > failure notifications.
> > > > >
> > > > > > I'm sorry, but I don't know what that means. Are you saying that the
> > > > > > kernel is now going to try to provide a hard guarantee that some devices
> > > > > > are going to be shut down in X number of seconds when asked? If so, why
> > > > > > not do this in userspace?
> > > > >
> > > > > No, it was initially (or when I initially saw it anyway) handling of
> > > > > notifications from regulators that they're in trouble and we have some
> > > > > small amount of time to do anything we might want to do about it before
> > > > > we expire.
> > > >
> > > > So we are going to guarantee a "time" in which we are going to do
> > > > something? Again, if that's required, why not do it in userspace using
> > > > a RT kernel?
> > >
> > > For the HW in question I have only 100ms time before power loss. By
> > > doing it over use space some we will have even less time to react.
> >
> > Why can't userspace react that fast? Why will the kernel be somehow
> > faster? Speed should be the same, just get the "power is cut" signal
> > and have userspace flush and unmount the disk before power is gone. Why
> > can the kernel do this any differently?
> >
> > > In fact, this is not a new requirement. It exist on different flavors of
> > > automotive Linux for about 10 years. Linux in cars should be able to
> > > handle voltage drops for example on ignition and so on. The only new thing is
> > > the attempt to mainline it.
> >
> > But your patch is not guaranteeing anything, it's just doing a "I want
> > this done before the other devices are handled", that's it. There is no
> > chance that 100ms is going to be a requirement, or that some other
> > device type is not going to come along and demand to be ahead of your
> > device in the list.
> >
> > So you are going to have a constant fight among device types over the
> > years, and people complaining that the kernel is now somehow going to
> > guarantee that a device is shutdown in a set amount of time, which
> > again, the kernel can not guarantee here.
> >
> > This might work as a one-off for a specific hardware platform, which is
> > odd, but not anything you really should be adding for anyone else to use
> > here as your reasoning for it does not reflect what the code does.
>
> I was (am) interested in knowing how/where the regulator error
> notifications are utilized - hence I asked this in ELCE last summer.
> Replies indeed mostly pointed to automotive and handling the under
> voltage events.
>
> As to what has changed (I think this was asked in another mail on this
> topic) - I understood from the discussions that the demand of running
> systems with as low power as possible is even more
> important/desirable. Hence, the under-voltage events are more usual
> than they were when cars used to be working by burning flammable
> liquids :)
>
> Anyways, what I thought I'd comment on is that the severity of the
> regulator error notifications can be given from device-tree. Rationale
> behind this is that figuring out whether a certain detected problem is
> fatal or not (in embedded systems) should be done by the board
> designers, per board. Maybe the understanding which hardware should
> react first is also a property of hardware and could come from the
> device-tree? Eg, instead of having a "DEVICE_SHUTDOWN_PRIO_STORAGE"
> set unconditionally for EMMC, systems could set shutdown priority per
> board and per device explicitly using device-tree?

Yes, using device tree would be good, but now you have created something
that is device-tree-specific and not all the world is device tree :(

Also, many devices are finally moving out to non-device-tree busses,
like PCI and USB, so how would you handle them in this type of scheme?

thanks,

greg k-h

2023-11-27 14:18:30

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Sun, Nov 26, 2023 at 08:42:02PM +0100, Ferry Toth wrote:

> Funny discussion. As a hardware engineer (with no experience in automotive,
> but actual experience in industrial applications and debugging issues
> arising from bad shutdowns) let me add my 5ct at the end.

I suspect there's also a space here beyond systems that were designed
with these failure modes in mind where people run into issues once they
have the hardware and are trying to improve what they can after the fact.

> Now, we do need to keep in mind that storing J in a supercap, executing a
> CPU at GHz, storing GB data do not come free. So, after making sure things
> shutdown in time, it often pays off to shorten that deadline, and indeed
> make it faster.

Indeed.


Attachments:
(No filename) (760.00 B)
signature.asc (499.00 B)
Download all attachments

2023-11-27 14:25:27

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Mon, Nov 27, 2023 at 01:08:24PM +0000, Greg Kroah-Hartman wrote:

> Yes, using device tree would be good, but now you have created something
> that is device-tree-specific and not all the world is device tree :(

AFAICT the idiomatic thing for ACPI would be platform quirks based on
DMI information. Yay ACPI. If the system is more Linux targetted then
you can use _DSD properties to store DT properties, these can then be
parsed out in a firmware interface neutral way via the fwnode API. I'm
not sure there's any avoiding dealing with firmware interface specifics
at some point if we need platform description.

> Also, many devices are finally moving out to non-device-tree busses,
> like PCI and USB, so how would you handle them in this type of scheme?

DT does have bindings for devices on discoverable buses like PCI - I
think the original thing was for vendors cheaping out on EEPROMs though
it's also useful when things are soldered down in embedded systems.


Attachments:
(No filename) (992.00 B)
signature.asc (499.00 B)
Download all attachments

2023-11-27 14:50:01

by Matti Vaittinen

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On 11/27/23 15:08, Greg Kroah-Hartman wrote:
> On Mon, Nov 27, 2023 at 02:54:21PM +0200, Matti Vaittinen wrote:
>> pe 24. marrask. 2023 klo 19.26 Greg Kroah-Hartman
>> ([email protected]) kirjoitti:
>>>
>>> On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote:
>>>> On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote:
>>>>> On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote:
>>>>>> On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote:
>>>>>>> On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote:
>>>>>>
>>>>>>>> This came out of some discussions about trying to handle emergency power
>>>>>>>> failure notifications.
>>>>>>
>>>>>>> I'm sorry, but I don't know what that means. Are you saying that the
>>>>>>> kernel is now going to try to provide a hard guarantee that some devices
>>>>>>> are going to be shut down in X number of seconds when asked? If so, why
>>>>>>> not do this in userspace?
>>>>>>
>>>>>> No, it was initially (or when I initially saw it anyway) handling of
>>>>>> notifications from regulators that they're in trouble and we have some
>>>>>> small amount of time to do anything we might want to do about it before
>>>>>> we expire.
>>>>>
>>>>> So we are going to guarantee a "time" in which we are going to do
>>>>> something? Again, if that's required, why not do it in userspace using
>>>>> a RT kernel?
>>>>
>>>> For the HW in question I have only 100ms time before power loss. By
>>>> doing it over use space some we will have even less time to react.
>>>
>>> Why can't userspace react that fast? Why will the kernel be somehow
>>> faster? Speed should be the same, just get the "power is cut" signal
>>> and have userspace flush and unmount the disk before power is gone. Why
>>> can the kernel do this any differently?
>>>
>>>> In fact, this is not a new requirement. It exist on different flavors of
>>>> automotive Linux for about 10 years. Linux in cars should be able to
>>>> handle voltage drops for example on ignition and so on. The only new thing is
>>>> the attempt to mainline it.
>>>
>>> But your patch is not guaranteeing anything, it's just doing a "I want
>>> this done before the other devices are handled", that's it. There is no
>>> chance that 100ms is going to be a requirement, or that some other
>>> device type is not going to come along and demand to be ahead of your
>>> device in the list.
>>>
>>> So you are going to have a constant fight among device types over the
>>> years, and people complaining that the kernel is now somehow going to
>>> guarantee that a device is shutdown in a set amount of time, which
>>> again, the kernel can not guarantee here.
>>>
>>> This might work as a one-off for a specific hardware platform, which is
>>> odd, but not anything you really should be adding for anyone else to use
>>> here as your reasoning for it does not reflect what the code does.
>>
>> I was (am) interested in knowing how/where the regulator error
>> notifications are utilized - hence I asked this in ELCE last summer.
>> Replies indeed mostly pointed to automotive and handling the under
>> voltage events.
>>
>> As to what has changed (I think this was asked in another mail on this
>> topic) - I understood from the discussions that the demand of running
>> systems with as low power as possible is even more
>> important/desirable. Hence, the under-voltage events are more usual
>> than they were when cars used to be working by burning flammable
>> liquids :)
>>
>> Anyways, what I thought I'd comment on is that the severity of the
>> regulator error notifications can be given from device-tree. Rationale
>> behind this is that figuring out whether a certain detected problem is
>> fatal or not (in embedded systems) should be done by the board
>> designers, per board. Maybe the understanding which hardware should
>> react first is also a property of hardware and could come from the
>> device-tree? Eg, instead of having a "DEVICE_SHUTDOWN_PRIO_STORAGE"
>> set unconditionally for EMMC, systems could set shutdown priority per
>> board and per device explicitly using device-tree?
>
> Yes, using device tree would be good, but now you have created something
> that is device-tree-specific and not all the world is device tree :(

True. However, my understanding is that the regulator subsystem is
largely written to work with DT-based systems. Hence supporting the
DT-based solution would probably fit to this specific use-case as source
of problem notifications is the regulator subsystem.

> Also, many devices are finally moving out to non-device-tree busses,
> like PCI and USB, so how would you handle them in this type of scheme?

I do readily admit I don't have [all ;) ] the answers. I also think that
if we add support for prioritized shutdown on device-tree-based systems,
people may eventually want to use this on non device-tree setups too.
There may also be other use-cases for prioritized shutdown (Don't know
what they would be though).

For now I would leave that to be the problem of the folks who need non
device-tree systems when (if) this needs realizes. Assuming there was
the handling of priorities in place, the missing piece would then be to
find out the place to store this hardware specific priority information.
If this is solved for the non DT cases, then the DT-based and non
DT-based solutions can co-exist.

Just a suggestion though. I am not working on under-voltage "stuff"
right now.

Yours,
-- Matti

--
Matti Vaittinen
Linux kernel developer at ROHM Semiconductors
Oulu Finland

~~ When things go utterly wrong vim users can always type :help! ~~

2023-11-27 16:25:51

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Mon, Nov 27, 2023 at 04:49:49PM +0200, Matti Vaittinen wrote:
> On 11/27/23 15:08, Greg Kroah-Hartman wrote:

> > Yes, using device tree would be good, but now you have created something
> > that is device-tree-specific and not all the world is device tree :(

> True. However, my understanding is that the regulator subsystem is largely
> written to work with DT-based systems. Hence supporting the DT-based
> solution would probably fit to this specific use-case as source of problem
> notifications is the regulator subsystem.

Yes, ACPI has a strong model that things like regulators and clocks are
not visible to the OS.


Attachments:
(No filename) (642.00 B)
signature.asc (499.00 B)
Download all attachments

2023-11-30 09:58:00

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

On Fri, 24 Nov 2023 at 15:53, Oleksij Rempel <[email protected]> wrote:
>
> Hi,
>
> This patch series introduces support for prioritized device shutdown.
> The main goal is to enable prioritization for shutting down specific
> devices, particularly crucial in scenarios like power loss where
> hardware damage can occur if not handled properly.
>
> Oleksij Rempel (3):
> driver core: move core part of device_shutdown() to a separate
> function
> driver core: introduce prioritized device shutdown sequence
> mmc: core: increase shutdown priority for MMC devices
>
> drivers/base/core.c | 157 +++++++++++++++++++++++++++--------------
> drivers/mmc/core/bus.c | 2 +
> include/linux/device.h | 51 ++++++++++++-
> kernel/reboot.c | 4 +-
> 4 files changed, 157 insertions(+), 57 deletions(-)
>

Sorry for joining the discussions a bit late! Besides the valuable
feedback that you already received from others (which indicates that
we have quite some work to do in the commit messages to better explain
and justify these changes), I wanted to share my overall thoughts
around this.

So, I fully understand the reason behind the $subject series, as we
unfortunately can't rely on flash-based (NAND/NOR) storage devices
being 100% tolerant to sudden-power failures. Besides for the reasons
already discussed in the thread, the robustness simply depends on the
"quality" of the FTL (flash translation layer) and the NAND/NOR/etc
device it runs.

For example, back in the days when Android showed up, we were testing
YAFFS and UBIFS on rawNAND, which failed miserably after just a few
thousands of power-cycles. It was even worse with ext3/4 on the early
variants of eMMC devices, as those survived only a few hundreds of
power-cycles. Now, I assume this has improved a lot over the years,
but I haven't really verified this myself.

That said, for eMMC and other flash-based storage devices, industrial
or not, I think it would make sense to try to notify the device about
the power-failure, if possible. This would add another level of
mitigation, I think.

From an implementation point of view, it looks to me that the approach
in the $subject series has some potential. Although, rather than
diving into the details, I will defer to review the next version.

Kind regards
Uffe

2023-11-30 21:59:46

by Francesco Dolcini

[permalink] [raw]
Subject: Re: [PATCH v1 0/3] introduce priority-based shutdown support

Hello all,

On Mon, Nov 27, 2023 at 12:36:11PM +0100, Oleksij Rempel wrote:
> On Mon, Nov 27, 2023 at 10:13:49AM +0000, Christian Loehle wrote:
> > > Same problem was seen not only in automotive devices, but also in
> > > industrial or agricultural. With other words, it is important enough to bring
> > > some kind of solution mainline.
> > >
> >
> > IMO that is a serious problem with the used storage / eMMC in that case and it
> > is not suitable for industrial/automotive uses?
> > Any industrial/automotive-suitable storage device should detect under-voltage and
> > just treat it as a power-down/loss, and while that isn't nice for the storage device,
> > it really shouldn't be able to brick a device (within <1M cycles anyway).
> > What does the storage module vendor say about this?
>
> Good question. I do not have insights ATM. I'll forward it.

From personal experience I can tell that bricked eMMC devices happen because of
eMMC controller firmware bugs. You might find some recently committed quirk
on this regard.

Waiting for any additional details you might be able to find, given my past
experience I would agree with what Christian wrote.

Francesco