2022-05-18 04:46:28

by Tanjore Suresh

Subject: [PATCH v3 0/3] Asynchronous shutdown interface and example implementation

Problem:

Some of our machines are configured with many NVMe devices and
are validated against strict shutdown time requirements. Each NVMe
device plugged into the system typically takes about 4.5 secs
to shut down. A system with 16 such NVMe devices will take
approximately 80 secs to shut down and go through reboot.

The current shutdown APIs, as defined at the bus level, are
synchronous. Therefore, the more devices there are in the system,
the longer shutdown takes, and this shutdown time contributes
significantly to the machine reboot time.

Solution:

This patch set proposes an asynchronous shutdown interface at the bus
level and modifies the driver core's device shutdown routine to exploit
the new interface while remaining backward compatible with the existing
synchronous implementation (Patch 1 of 3). It then uses the new interface
to let all PCIe-based devices adopt asynchronous shutdown semantics if
necessary (Patch 2 of 3). The PCIe-level implementation is also backward
compatible, so existing device implementations keep working with the
current synchronous semantics. Finally, it showcases an example
implementation for NVMe devices that exploits this asynchronous shutdown
interface (Patch 3 of 3).

Changelog:

v2: - Replaced the shutdown_pre & shutdown_post entry point names with the
      recommended names (async_shutdown_start and async_shutdown_end).

    - Addressed the comment about ordering requirements between bridge
      shutdown and leaf/endpoint shutdown, which had been agreed to differ
      between calls to async_shutdown_start and async_shutdown_end. The
      start and end entry points are now both called in the same order.

v3: - These notes clarify why the power management framework was not
      considered for implementing this shutdown optimization. There is
      no code change; this version only clarifies the reasoning.

      This patch set is only for system shutdown. The shutdown entry
      points have traditionally served two different requirements: all
      devices are brought to a quiescent state so that system power may
      be removed (power-down scenarios), and the same entry points are
      used to shut down all devices so they can be re-initialized and
      restarted (soft shutdown/reboot scenarios).

      Device power management (dpm), by contrast, allows any device
      configured in the system that is idle to be selectively brought
      down to the various low-power states it supports, following the
      transitions the device implementation allows. Power state
      transitions initiated by the system can already be achieved using
      the existing 'dpm' interfaces.

      Therefore, using the 'dpm' interface to achieve this shutdown
      optimization is not the right approach: that interface solves an
      orthogonal requirement and has historically been kept separate
      from the shutdown entry points and their associated semantics.

Tanjore Suresh (3):
  driver core: Support asynchronous driver shutdown
  PCI: Support asynchronous shutdown
  nvme: Add async shutdown support

 drivers/base/core.c        | 38 +++++++++++++++++-
 drivers/nvme/host/core.c   | 28 +++++++++----
 drivers/nvme/host/nvme.h   |  8 ++++
 drivers/nvme/host/pci.c    | 80 ++++++++++++++++++++++++--------------
 drivers/pci/pci-driver.c   | 20 ++++++++--
 include/linux/device/bus.h | 12 ++++++
 include/linux/pci.h        |  4 ++
 7 files changed, 149 insertions(+), 41 deletions(-)

--
2.36.0.550.gb090851708-goog



2022-05-18 04:53:36

by Tanjore Suresh

Subject: [PATCH v3 1/3] driver core: Support asynchronous driver shutdown

This extends the bus driver interface with additional entry points
that enable devices to implement asynchronous shutdown. The existing
synchronous shutdown interface is unmodified and retained for
backward compatibility.

This also changes the common device shutdown code so that devices can
participate in asynchronous shutdown.

Signed-off-by: Tanjore Suresh <[email protected]>
---
 drivers/base/core.c        | 38 +++++++++++++++++++++++++++++++++++++-
 include/linux/device/bus.h | 12 ++++++++++++
 2 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 3d6430eb0c6a..ba267ae70a22 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -4479,6 +4479,7 @@ EXPORT_SYMBOL_GPL(device_change_owner);
 void device_shutdown(void)
 {
 	struct device *dev, *parent;
+	LIST_HEAD(async_shutdown_list);

 	wait_for_device_probe();
 	device_block_probing();
@@ -4523,7 +4524,13 @@ void device_shutdown(void)
 			dev_info(dev, "shutdown_pre\n");
 			dev->class->shutdown_pre(dev);
 		}
-		if (dev->bus && dev->bus->shutdown) {
+		if (dev->bus && dev->bus->async_shutdown_start) {
+			if (initcall_debug)
+				dev_info(dev, "async_shutdown_start\n");
+			dev->bus->async_shutdown_start(dev);
+			list_add_tail(&dev->kobj.entry,
+				&async_shutdown_list);
+		} else if (dev->bus && dev->bus->shutdown) {
 			if (initcall_debug)
 				dev_info(dev, "shutdown\n");
 			dev->bus->shutdown(dev);
@@ -4543,6 +4550,35 @@ void device_shutdown(void)
 		spin_lock(&devices_kset->list_lock);
 	}
 	spin_unlock(&devices_kset->list_lock);
+
+	/*
+	 * Second pass spin for only devices, that have configured
+	 * Asynchronous shutdown.
+	 */
+	while (!list_empty(&async_shutdown_list)) {
+		dev = list_entry(async_shutdown_list.next, struct device,
+				kobj.entry);
+		parent = get_device(dev->parent);
+		get_device(dev);
+		/*
+		 * Make sure the device is off the list
+		 */
+		list_del_init(&dev->kobj.entry);
+		if (parent)
+			device_lock(parent);
+		device_lock(dev);
+		if (dev->bus && dev->bus->async_shutdown_end) {
+			if (initcall_debug)
+				dev_info(dev,
+				"async_shutdown_end called\n");
+			dev->bus->async_shutdown_end(dev);
+		}
+		device_unlock(dev);
+		if (parent)
+			device_unlock(parent);
+		put_device(dev);
+		put_device(parent);
+	}
 }

 /*
diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
index a039ab809753..f582c9d21515 100644
--- a/include/linux/device/bus.h
+++ b/include/linux/device/bus.h
@@ -49,6 +49,16 @@ struct fwnode_handle;
  *		will never get called until they do.
  * @remove:	Called when a device removed from this bus.
  * @shutdown:	Called at shut-down time to quiesce the device.
+ * @async_shutdown_start:	Called at the shutdown-time to start
+ *			the shutdown process on the device.
+ *			This entry point will be called only
+ *			when the bus driver has indicated it would
+ *			like to participate in asynchronous shutdown
+ *			completion.
+ * @async_shutdown_end:	Called at shutdown-time to complete the shutdown
+ *			process of the device. This entry point will be called
+ *			only when the bus drive has indicated it would like to
+ *			participate in the asynchronous shutdown completion.
  *
  * @online:	Called to put the device back online (after offlining it).
  * @offline:	Called to put the device offline for hot-removal. May fail.
@@ -93,6 +103,8 @@ struct bus_type {
 	void (*sync_state)(struct device *dev);
 	void (*remove)(struct device *dev);
 	void (*shutdown)(struct device *dev);
+	void (*async_shutdown_start)(struct device *dev);
+	void (*async_shutdown_end)(struct device *dev);

 	int (*online)(struct device *dev);
 	int (*offline)(struct device *dev);
--
2.36.0.550.gb090851708-goog


2022-05-18 11:40:01

by Rafael J. Wysocki

Subject: Re: [PATCH v3 1/3] driver core: Support asynchronous driver shutdown

On Wed, May 18, 2022 at 12:08 AM Tanjore Suresh <[email protected]> wrote:
>
> This changes the bus driver interface with additional entry points
> to enable devices to implement asynchronous shutdown. The existing
> synchronous interface to shutdown is unmodified and retained for
> backward compatibility.
>
> This changes the common device shutdown code to enable devices to
> participate in asynchronous shutdown implementation.
>
> Signed-off-by: Tanjore Suresh <[email protected]>
> ---
>  drivers/base/core.c        | 38 +++++++++++++++++++++++++++++++++++++-
>  include/linux/device/bus.h | 12 ++++++++++++
>  2 files changed, 49 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 3d6430eb0c6a..ba267ae70a22 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -4479,6 +4479,7 @@ EXPORT_SYMBOL_GPL(device_change_owner);
>  void device_shutdown(void)
>  {
>  	struct device *dev, *parent;
> +	LIST_HEAD(async_shutdown_list);
>
>  	wait_for_device_probe();
>  	device_block_probing();
> @@ -4523,7 +4524,13 @@ void device_shutdown(void)
>  			dev_info(dev, "shutdown_pre\n");
>  			dev->class->shutdown_pre(dev);
>  		}
> -		if (dev->bus && dev->bus->shutdown) {
> +		if (dev->bus && dev->bus->async_shutdown_start) {
> +			if (initcall_debug)
> +				dev_info(dev, "async_shutdown_start\n");
> +			dev->bus->async_shutdown_start(dev);
> +			list_add_tail(&dev->kobj.entry,
> +				&async_shutdown_list);
> +		} else if (dev->bus && dev->bus->shutdown) {
>  			if (initcall_debug)
>  				dev_info(dev, "shutdown\n");
>  			dev->bus->shutdown(dev);
> @@ -4543,6 +4550,35 @@ void device_shutdown(void)
>  		spin_lock(&devices_kset->list_lock);
>  	}
>  	spin_unlock(&devices_kset->list_lock);
> +
> +	/*
> +	 * Second pass spin for only devices, that have configured
> +	 * Asynchronous shutdown.
> +	 */
> +	while (!list_empty(&async_shutdown_list)) {
> +		dev = list_entry(async_shutdown_list.next, struct device,
> +				kobj.entry);
> +		parent = get_device(dev->parent);
> +		get_device(dev);
> +		/*
> +		 * Make sure the device is off the list
> +		 */
> +		list_del_init(&dev->kobj.entry);
> +		if (parent)
> +			device_lock(parent);
> +		device_lock(dev);
> +		if (dev->bus && dev->bus->async_shutdown_end) {
> +			if (initcall_debug)
> +				dev_info(dev,
> +				"async_shutdown_end called\n");
> +			dev->bus->async_shutdown_end(dev);
> +		}
> +		device_unlock(dev);
> +		if (parent)
> +			device_unlock(parent);
> +		put_device(dev);
> +		put_device(parent);
> +	}
>  }
>
>  /*
> diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
> index a039ab809753..f582c9d21515 100644
> --- a/include/linux/device/bus.h
> +++ b/include/linux/device/bus.h
> @@ -49,6 +49,16 @@ struct fwnode_handle;
>   *		will never get called until they do.
>   * @remove:	Called when a device removed from this bus.
>   * @shutdown:	Called at shut-down time to quiesce the device.
> + * @async_shutdown_start:	Called at the shutdown-time to start
> + *			the shutdown process on the device.
> + *			This entry point will be called only
> + *			when the bus driver has indicated it would
> + *			like to participate in asynchronous shutdown
> + *			completion.
> + * @async_shutdown_end:	Called at shutdown-time to complete the shutdown
> + *			process of the device. This entry point will be called
> + *			only when the bus drive has indicated it would like to
> + *			participate in the asynchronous shutdown completion.

I'm going to repeat my point here, but only once.

I see no reason to do async shutdown this way, instead of adding a
flag for drivers to opt in for calling their existing shutdown
callbacks asynchronously, in analogy with the async suspend and resume
implementation.

Is there any reason why this is not viable?

>   *
>   * @online:	Called to put the device back online (after offlining it).
>   * @offline:	Called to put the device offline for hot-removal. May fail.
> @@ -93,6 +103,8 @@ struct bus_type {
>  	void (*sync_state)(struct device *dev);
>  	void (*remove)(struct device *dev);
>  	void (*shutdown)(struct device *dev);
> +	void (*async_shutdown_start)(struct device *dev);
> +	void (*async_shutdown_end)(struct device *dev);
>
>  	int (*online)(struct device *dev);
>  	int (*offline)(struct device *dev);
> --
> 2.36.0.550.gb090851708-goog
>

2022-05-18 17:50:19

by Bjorn Helgaas

Subject: Re: [PATCH v3 1/3] driver core: Support asynchronous driver shutdown

On Wed, May 18, 2022 at 01:38:49PM +0200, Rafael J. Wysocki wrote:
> On Wed, May 18, 2022 at 12:08 AM Tanjore Suresh <[email protected]> wrote:
> > [...]
>
> I'm going to repeat my point here, but only once.
>
> I see no reason to do async shutdown this way, instead of adding a
> flag for drivers to opt in for calling their existing shutdown
> callbacks asynchronously, in analogy with the async suspend and resume
> implementation.

There's a lot of code here that mere mortals like myself don't
understand very well, so here's my meager understanding of how
async suspend works and what you're suggesting to make this a
little more concrete.

Devices have this async_suspend bit:

  struct device {
    struct dev_pm_info {
      unsigned int async_suspend:1;

Drivers call device_enable_async_suspend() to set async_suspend if
they want it. The system suspend path is something like this:

  suspend_enter
    dpm_suspend_noirq(PMSG_SUSPEND)
      dpm_noirq_suspend_devices(PMSG_SUSPEND)
        pm_transition = PMSG_SUSPEND
        while (!list_empty(&dpm_late_early_list))
          device_suspend_noirq(dev)
            dpm_async_fn(dev, async_suspend_noirq)
              if (is_async(dev))
                async_schedule_dev(async_suspend_noirq)       # async path

                  async_suspend_noirq                         # called asynchronously
                    __device_suspend_noirq(dev, PMSG_SUSPEND, true)
                      callback = pm_noirq_op(PMSG_SUSPEND)    # .suspend_noirq()
                      dpm_run_callback(callback)              # async call

            __device_suspend_noirq(dev, pm_transition, false) # sync path
              callback = pm_noirq_op(PMSG_SUSPEND)            # .suspend_noirq()
              dpm_run_callback(callback)                      # sync call

        async_synchronize_full                                # wait

If a driver has called device_enable_async_suspend(), we'll use the
async_schedule_dev() path to schedule the appropriate .suspend_noirq()
method. After scheduling it via the async path or directly calling it
via the sync path, the async_synchronize_full() waits for completion
of all the async methods.

I assume your suggestion is to do something like this:

  struct device {
    struct dev_pm_info {
      unsigned int async_suspend:1;
  +   unsigned int async_shutdown:1;

  + void device_enable_async_shutdown(struct device *dev)
  +   dev->power.async_shutdown = true;

    device_shutdown
      while (!list_empty(&devices_kset->list))
  -     dev->...->shutdown()
  +     if (is_async_shutdown(dev))
  +       async_schedule_dev(async_shutdown)   # async path
  +
  +         async_shutdown                     # called asynchronously
  +           dev->...->shutdown()
  +
  +     else
  +       dev->...->shutdown()                 # sync path
  +
  +   async_synchronize_full                   # wait

2022-05-18 18:05:41

by Rafael J. Wysocki

Subject: Re: [PATCH v3 1/3] driver core: Support asynchronous driver shutdown

On Wed, May 18, 2022 at 7:50 PM Bjorn Helgaas <[email protected]> wrote:
>
> On Wed, May 18, 2022 at 01:38:49PM +0200, Rafael J. Wysocki wrote:
> > On Wed, May 18, 2022 at 12:08 AM Tanjore Suresh <[email protected]> wrote:
> > > [...]
> >
> > I'm going to repeat my point here, but only once.
> >
> > I see no reason to do async shutdown this way, instead of adding a
> > flag for drivers to opt in for calling their existing shutdown
> > callbacks asynchronously, in analogy with the async suspend and resume
> > implementation.
>
> There's a lot of code here that mere mortals like myself don't
> understand very well, so here's my meager understanding of how
> async suspend works and what you're suggesting to make this a
> little more concrete.
>
> Devices have this async_suspend bit:
>
>   struct device {
>     struct dev_pm_info {
>       unsigned int async_suspend:1;
>
> Drivers call device_enable_async_suspend() to set async_suspend if
> they want it. The system suspend path is something like this:
>
>   suspend_enter
>     dpm_suspend_noirq(PMSG_SUSPEND)
>       dpm_noirq_suspend_devices(PMSG_SUSPEND)
>         pm_transition = PMSG_SUSPEND
>         while (!list_empty(&dpm_late_early_list))
>           device_suspend_noirq(dev)
>             dpm_async_fn(dev, async_suspend_noirq)
>               if (is_async(dev))
>                 async_schedule_dev(async_suspend_noirq)       # async path
>
>                   async_suspend_noirq                         # called asynchronously
>                     __device_suspend_noirq(dev, PMSG_SUSPEND, true)
>                       callback = pm_noirq_op(PMSG_SUSPEND)    # .suspend_noirq()
>                       dpm_run_callback(callback)              # async call
>
>             __device_suspend_noirq(dev, pm_transition, false) # sync path
>               callback = pm_noirq_op(PMSG_SUSPEND)            # .suspend_noirq()
>               dpm_run_callback(callback)                      # sync call
>
>         async_synchronize_full                                # wait
>
> If a driver has called device_enable_async_suspend(), we'll use the
> async_schedule_dev() path to schedule the appropriate .suspend_noirq()
> method. After scheduling it via the async path or directly calling it
> via the sync path, the async_synchronize_full() waits for completion
> of all the async methods.
>
> I assume your suggestion is to do something like this:
>
>   struct device {
>     struct dev_pm_info {
>       unsigned int async_suspend:1;
>   +   unsigned int async_shutdown:1;
>
>   + void device_enable_async_shutdown(struct device *dev)
>   +   dev->power.async_shutdown = true;
>
>     device_shutdown
>       while (!list_empty(&devices_kset->list))
>   -     dev->...->shutdown()
>   +     if (is_async_shutdown(dev))
>   +       async_schedule_dev(async_shutdown)   # async path
>   +
>   +         async_shutdown                     # called asynchronously
>   +           dev->...->shutdown()
>   +
>   +     else
>   +       dev->...->shutdown()                 # sync path
>   +
>   +   async_synchronize_full                   # wait

Yes, that's the idea IIUC.

2022-05-18 20:45:24

by Tanjore Suresh

Subject: Re: [PATCH v3 1/3] driver core: Support asynchronous driver shutdown

Rafael,

On Wed, May 18, 2022 at 11:01 AM Rafael J. Wysocki <[email protected]> wrote:
>
> On Wed, May 18, 2022 at 7:50 PM Bjorn Helgaas <[email protected]> wrote:
> >
> > On Wed, May 18, 2022 at 01:38:49PM +0200, Rafael J. Wysocki wrote:
> > > On Wed, May 18, 2022 at 12:08 AM Tanjore Suresh <[email protected]> wrote:
> > > > [...]
> > >
> > > I'm going to repeat my point here, but only once.
> > >
> > > I see no reason to do async shutdown this way, instead of adding a
> > > flag for drivers to opt in for calling their existing shutdown
> > > callbacks asynchronously, in analogy with the async suspend and resume
> > > implementation.
> >
> > There's a lot of code here that mere mortals like myself don't
> > understand very well, so here's my meager understanding of how
> > async suspend works and what you're suggesting to make this a
> > little more concrete.
> >
> > Devices have this async_suspend bit:
> >
> >   struct device {
> >     struct dev_pm_info {
> >       unsigned int async_suspend:1;
> >
> > Drivers call device_enable_async_suspend() to set async_suspend if
> > they want it. The system suspend path is something like this:
> >
> >   suspend_enter
> >     dpm_suspend_noirq(PMSG_SUSPEND)
> >       dpm_noirq_suspend_devices(PMSG_SUSPEND)
> >         pm_transition = PMSG_SUSPEND
> >         while (!list_empty(&dpm_late_early_list))
> >           device_suspend_noirq(dev)
> >             dpm_async_fn(dev, async_suspend_noirq)
> >               if (is_async(dev))
> >                 async_schedule_dev(async_suspend_noirq)       # async path
> >
> >                   async_suspend_noirq                         # called asynchronously
> >                     __device_suspend_noirq(dev, PMSG_SUSPEND, true)
> >                       callback = pm_noirq_op(PMSG_SUSPEND)    # .suspend_noirq()
> >                       dpm_run_callback(callback)              # async call
> >
> >             __device_suspend_noirq(dev, pm_transition, false) # sync path
> >               callback = pm_noirq_op(PMSG_SUSPEND)            # .suspend_noirq()
> >               dpm_run_callback(callback)                      # sync call
> >
> >         async_synchronize_full                                # wait
> >
> > If a driver has called device_enable_async_suspend(), we'll use the
> > async_schedule_dev() path to schedule the appropriate .suspend_noirq()
> > method. After scheduling it via the async path or directly calling it
> > via the sync path, the async_synchronize_full() waits for completion
> > of all the async methods.
> >
> > I assume your suggestion is to do something like this:
> >
> >   struct device {
> >     struct dev_pm_info {
> >       unsigned int async_suspend:1;
> >   +   unsigned int async_shutdown:1;
> >
> >   + void device_enable_async_shutdown(struct device *dev)
> >   +   dev->power.async_shutdown = true;
> >
> >     device_shutdown
> >       while (!list_empty(&devices_kset->list))
> >   -     dev->...->shutdown()
> >   +     if (is_async_shutdown(dev))
> >   +       async_schedule_dev(async_shutdown)   # async path
> >   +
> >   +         async_shutdown                     # called asynchronously
> >   +           dev->...->shutdown()
> >   +
> >   +     else
> >   +       dev->...->shutdown()                 # sync path
> >   +
> >   +   async_synchronize_full                   # wait
>
> Yes, that's the idea IIUC.

Thanks for the clarification; I misunderstood your earlier comment.
Let me evaluate this and get back to you as soon as possible.

Thanks
sureshtk

2022-08-02 21:37:27

by Bjorn Helgaas

Subject: Re: [PATCH v3 1/3] driver core: Support asynchronous driver shutdown

[Beginning of thread: https://lore.kernel.org/r/[email protected]]
On Wed, May 18, 2022 at 12:50:02PM -0500, Bjorn Helgaas wrote:

> Devices have this async_suspend bit:
>
>   struct device {
>     struct dev_pm_info {
>       unsigned int async_suspend:1;
>
> Drivers call device_enable_async_suspend() to set async_suspend if
> they want it. The system suspend path is something like this:
>
>   suspend_enter
>     dpm_suspend_noirq(PMSG_SUSPEND)
>       dpm_noirq_suspend_devices(PMSG_SUSPEND)
>         pm_transition = PMSG_SUSPEND
>         while (!list_empty(&dpm_late_early_list))
>           device_suspend_noirq(dev)
>             dpm_async_fn(dev, async_suspend_noirq)
>               if (is_async(dev))
>                 async_schedule_dev(async_suspend_noirq)       # async path
>
>                   async_suspend_noirq                         # called asynchronously
>                     __device_suspend_noirq(dev, PMSG_SUSPEND, true)
>                       callback = pm_noirq_op(PMSG_SUSPEND)    # .suspend_noirq()
>                       dpm_run_callback(callback)              # async call
>
>             __device_suspend_noirq(dev, pm_transition, false) # sync path
>               callback = pm_noirq_op(PMSG_SUSPEND)            # .suspend_noirq()
>               dpm_run_callback(callback)                      # sync call
>
>         async_synchronize_full                                # wait
>
> If a driver has called device_enable_async_suspend(), we'll use the
> async_schedule_dev() path to schedule the appropriate .suspend_noirq()
> method. After scheduling it via the async path or directly calling it
> via the sync path, the async_synchronize_full() waits for completion
> of all the async methods.

Correct me if I'm wrong: in the suspend scenario, there are several
phases, and async_synchronize_full() ensures that one phase finishes
before the next phase starts. For example:

  dpm_suspend_end
    dpm_suspend_late                              # phase 1
      while (!list_empty(&dpm_suspended_list))
        device_suspend_late
      async_synchronize_full                      # finish phase 1
    dpm_suspend_noirq                             # phase 2
      dpm_noirq_suspend_devices
        while (!list_empty(&dpm_late_early_list))
          device_suspend_noirq
        async_synchronize_full

The device .suspend_late() and .suspend_noirq() methods may all be
started asynchronously. So far there's nothing to order them within
the phase, but async_synchronize_full() ensures that all the
.suspend_late() methods finish before the .suspend_noirq() methods
start.
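A minimal userspace sketch of the phase barrier described above, assuming POSIX threads: pthread_create() stands in for async_schedule_dev() and joining every thread stands in for async_synchronize_full(). The device count, callbacks, and event log are hypothetical; the sketch only shows that callbacks within a phase run unordered, while every phase-1 callback finishes before any phase-2 callback starts.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Hypothetical userspace model, not kernel code: pthread_create() plays the
 * role of async_schedule_dev() and joining every thread plays the role of
 * async_synchronize_full().
 */
#define NDEV 4

static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;
static int event_log[2 * NDEV];   /* phase number of each finished callback */
static int nevents;

static void record(int phase)
{
	pthread_mutex_lock(&log_lock);
	event_log[nevents++] = phase;
	pthread_mutex_unlock(&log_lock);
}

/* Stand-ins for the per-device .suspend_late() and .suspend_noirq() methods. */
static void *suspend_late_cb(void *arg)  { (void)arg; record(1); return NULL; }
static void *suspend_noirq_cb(void *arg) { (void)arg; record(2); return NULL; }

/* Start one callback per device asynchronously, then wait for all of them. */
static void run_phase(void *(*cb)(void *))
{
	pthread_t tid[NDEV];

	for (int i = 0; i < NDEV; i++) {
		int rc = pthread_create(&tid[i], NULL, cb, NULL);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < NDEV; i++)    /* barrier: "async_synchronize_full" */
		pthread_join(tid[i], NULL);
}

/* Returns 1 if every phase-1 callback ran before any phase-2 callback. */
static int suspend_end_model(void)
{
	nevents = 0;
	run_phase(suspend_late_cb);    /* phase 1 */
	run_phase(suspend_noirq_cb);   /* phase 2 */

	for (int i = 0; i < nevents; i++)
		if (event_log[i] != (i < NDEV ? 1 : 2))
			return 0;
	return nevents == 2 * NDEV;
}
```

The barrier says nothing about ordering inside a phase; that is why a separate parent/child mechanism is needed.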

Obviously we do want a child's method to complete before we run the
parent's method. If I understand correctly, that parent/child
synchronization is done by a different method: __device_suspend_late()
and __device_suspend_noirq() call dpm_wait_for_subordinate(), which
waits for &dev->power.completion for all children:

  __device_suspend_late
    dpm_wait_for_subordinate
      dpm_wait_for_children       # wait for children .suspend_late()
        device_for_each_child(dev, &async, dpm_wait_fn)
          dpm_wait_fn
            dpm_wait
              wait_for_completion(&dev->power.completion)
    dpm_run_callback              # run parent method, e.g., ops->suspend_late
    complete_all(&dev->power.completion)  # note completion of parent
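The child-before-parent wait can be sketched the same way. This is a hypothetical userspace model, not kernel code: struct completion, init_completion(), wait_for_completion(), and complete_all() are reimplemented with a mutex and condition variable to mimic the kernel primitives of the same names, and device_suspend_late_model() only loosely follows the call tree above. The parent thread is deliberately started first, so it would run before its children if it did not wait on their completions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct completion (hypothetical model). */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

static void complete_all(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

#define NCHILD 3

struct dev {
	int id;
	struct dev *children[NCHILD];
	int nchildren;
	struct completion completion;   /* plays the role of dev->power.completion */
};

static pthread_mutex_t order_lock = PTHREAD_MUTEX_INITIALIZER;
static int order[NCHILD + 1];       /* device ids, in the order callbacks ran */
static int norder;

/* Loosely follows __device_suspend_late(): wait for children, run, complete. */
static void *device_suspend_late_model(void *arg)
{
	struct dev *dev = arg;

	for (int i = 0; i < dev->nchildren; i++)    /* dpm_wait_for_children */
		wait_for_completion(&dev->children[i]->completion);

	pthread_mutex_lock(&order_lock);            /* dpm_run_callback */
	order[norder++] = dev->id;
	pthread_mutex_unlock(&order_lock);

	complete_all(&dev->completion);             /* note our own completion */
	return NULL;
}

/* Starts the parent (id 0) before its children (ids 1..NCHILD) and returns
 * the id of the device whose callback ran last. */
static int run_tree(void)
{
	struct dev parent = { .id = 0 }, child[NCHILD];
	pthread_t tid[NCHILD + 1];

	norder = 0;
	init_completion(&parent.completion);
	for (int i = 0; i < NCHILD; i++) {
		child[i] = (struct dev){ .id = i + 1 };
		init_completion(&child[i].completion);
		parent.children[parent.nchildren++] = &child[i];
	}

	for (int i = 0; i <= NCHILD; i++) {
		struct dev *d = i == 0 ? &parent : &child[i - 1];
		int rc = pthread_create(&tid[i], NULL, device_suspend_late_model, d);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i <= NCHILD; i++)
		pthread_join(tid[i], NULL);
	return order[NCHILD];
}
```

However the scheduler interleaves the four threads, run_tree() returns 0: the parent's callback always runs last.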

> I assume your suggestion is to do something like this:
>
>   struct device {
>     struct dev_pm_info {
>       unsigned int async_suspend:1;
> +     unsigned int async_shutdown:1;
>
> + void device_enable_async_shutdown(struct device *dev)
> +   dev->power.async_shutdown = true;
>
>   device_shutdown
>     while (!list_empty(&devices_kset->list))
> -     dev->...->shutdown()
> +     if (is_async_shutdown(dev))
> +       async_schedule_dev(async_shutdown)   # async path
> +
> +         async_shutdown                     # called asynchronously
> +           dev->...->shutdown()
> +
> +     else
> +       dev->...->shutdown()                 # sync path
> +
> +   async_synchronize_full                   # wait

In the shutdown case, I think we still probably need the
async_synchronize_full() to ensure that all the .shutdown() methods
complete before we turn the power off, reboot, or kexec.

But I think we also need a mechanism like dev->power.completion to
make sure all the child .shutdown() methods complete before we run a
parent's .shutdown().
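A sketch of a shutdown path that combines both mechanisms, under the same assumption that POSIX threads stand in for the kernel's async machinery. The async_shutdown flag, the shared mutex/condvar used as a per-device completion, and all function names are hypothetical. Devices are walked from the end of the list, as device_shutdown() walks devices_kset->list from the tail so that children come before parents. A synchronous parent still waits for its asynchronous children, and the final join corresponds to the async_synchronize_full() that must finish before power-off, reboot, or kexec.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the proposed async shutdown, not kernel
 * code. One mutex/condvar pair plus a per-device "done" flag stands in for a
 * per-device completion; pthread_create()/pthread_join() stand in for
 * async_schedule_dev()/async_synchronize_full().
 */
#define MAXDEV 8

struct dev {
	struct dev *parent;
	bool async_shutdown;   /* hypothetical dev->power.async_shutdown */
	bool done;             /* this device's .shutdown() has finished */
	int seq;               /* order in which .shutdown() finished */
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static struct dev *devs[MAXDEV];   /* registration order: parents first */
static int ndev, nfinished;

static void add_device_model(struct dev *dev)
{
	assert(ndev < MAXDEV);
	devs[ndev++] = dev;
}

/* Block until every child of @dev has finished its .shutdown(). */
static void wait_for_children(struct dev *dev)
{
	pthread_mutex_lock(&lock);
	for (int i = 0; i < ndev; i++)
		while (devs[i]->parent == dev && !devs[i]->done)
			pthread_cond_wait(&cond, &lock);
	pthread_mutex_unlock(&lock);
}

static void *shutdown_one(void *arg)
{
	struct dev *dev = arg;

	wait_for_children(dev);
	/* dev->...->shutdown() would run here */
	pthread_mutex_lock(&lock);
	dev->seq = nfinished++;
	dev->done = true;
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Walks devs[] backward so children are reached before their parents.
 * Returns how many devices finished .shutdown(). */
static int device_shutdown_model(void)
{
	pthread_t tid[MAXDEV];
	int nthreads = 0;

	nfinished = 0;
	for (int i = ndev - 1; i >= 0; i--) {
		if (devs[i]->async_shutdown) {          /* async path */
			int rc = pthread_create(&tid[nthreads++], NULL,
						shutdown_one, devs[i]);
			assert(rc == 0);
			(void)rc;
		} else {
			shutdown_one(devs[i]);          /* sync path */
		}
	}
	for (int i = 0; i < nthreads; i++)      /* "async_synchronize_full" */
		pthread_join(tid[i], NULL);
	return nfinished;   /* must equal ndev before power-off/reboot/kexec */
}
```

For example, registering a synchronous bridge followed by two asynchronous NVMe children and calling device_shutdown_model() returns 3, with the bridge finishing last (seq == 2). Without wait_for_children(), the synchronous bridge could shut down while its children were still running, which is the ordering problem raised above.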

There's not much overlap between the suspend path and the shutdown
path (probably none at all), so it's tempting to use the existing
dev->power.completion for shutdown as well.

But I don't think that's feasible because dev->power.completion is
tied up with dev->power.async_suspend, which is set by
device_enable_async_suspend(). That's a different concept than async
shutdown, and drivers will want one without the other.

Does this make sense?

Bjorn

2022-08-03 15:15:27

by Rafael J. Wysocki

Subject: Re: [PATCH v3 1/3] driver core: Support asynchronous driver shutdown

On Tue, Aug 2, 2022 at 11:35 PM Bjorn Helgaas <[email protected]> wrote:
>
> [Beginning of thread: https://lore.kernel.org/r/[email protected]]
> On Wed, May 18, 2022 at 12:50:02PM -0500, Bjorn Helgaas wrote:
>
> > Devices have this async_suspend bit:
> >
> >   struct device {
> >     struct dev_pm_info {
> >       unsigned int async_suspend:1;
> >
> > Drivers call device_enable_async_suspend() to set async_suspend if
> > they want it. The system suspend path is something like this:
> >
> >   suspend_enter
> >     dpm_suspend_noirq(PMSG_SUSPEND)
> >       dpm_noirq_suspend_devices(PMSG_SUSPEND)
> >         pm_transition = PMSG_SUSPEND
> >         while (!list_empty(&dpm_late_early_list))
> >           device_suspend_noirq(dev)
> >             dpm_async_fn(dev, async_suspend_noirq)
> >               if (is_async(dev))
> >                 async_schedule_dev(async_suspend_noirq)       # async path
> >
> >                   async_suspend_noirq               # called asynchronously
> >                     __device_suspend_noirq(dev, PMSG_SUSPEND, true)
> >                       callback = pm_noirq_op(PMSG_SUSPEND)  # .suspend_noirq()
> >                       dpm_run_callback(callback)            # async call
> >
> >             __device_suspend_noirq(dev, pm_transition, false) # sync path
> >               callback = pm_noirq_op(PMSG_SUSPEND)          # .suspend_noirq()
> >               dpm_run_callback(callback)                    # sync call
> >
> >         async_synchronize_full                              # wait
> >
> > If a driver has called device_enable_async_suspend(), we'll use the
> > async_schedule_dev() path to schedule the appropriate .suspend_noirq()
> > method. After scheduling it via the async path or directly calling it
> > via the sync path, the async_synchronize_full() waits for completion
> > of all the async methods.
>
> Correct me if I'm wrong: in the suspend scenario, there are several
> phases, and async_synchronize_full() ensures that one phase finishes
> before the next phase starts. For example:
>
>   dpm_suspend_end
>     dpm_suspend_late                              # phase 1
>       while (!list_empty(&dpm_suspended_list))
>         device_suspend_late
>       async_synchronize_full                      # finish phase 1
>     dpm_suspend_noirq                             # phase 2
>       dpm_noirq_suspend_devices
>         while (!list_empty(&dpm_late_early_list))
>           device_suspend_noirq
>         async_synchronize_full
>
> The device .suspend_late() and .suspend_noirq() methods may all be
> started asynchronously. So far there's nothing to order them within
> the phase, but async_synchronize_full() ensures that all the
> .suspend_late() methods finish before the .suspend_noirq() methods
> start.
>
> Obviously we do want a child's method to complete before we run the
> parent's method. If I understand correctly, that parent/child
> synchronization is done by a different method: __device_suspend_late()
> and __device_suspend_noirq() call dpm_wait_for_subordinate(), which
> waits for &dev->power.completion for all children:
>
>   __device_suspend_late
>     dpm_wait_for_subordinate
>       dpm_wait_for_children       # wait for children .suspend_late()
>         device_for_each_child(dev, &async, dpm_wait_fn)
>           dpm_wait_fn
>             dpm_wait
>               wait_for_completion(&dev->power.completion)
>     dpm_run_callback              # run parent method, e.g., ops->suspend_late
>     complete_all(&dev->power.completion)  # note completion of parent
>
> > I assume your suggestion is to do something like this:
> >
> >   struct device {
> >     struct dev_pm_info {
> >       unsigned int async_suspend:1;
> > +     unsigned int async_shutdown:1;
> >
> > + void device_enable_async_shutdown(struct device *dev)
> > +   dev->power.async_shutdown = true;
> >
> >   device_shutdown
> >     while (!list_empty(&devices_kset->list))
> > -     dev->...->shutdown()
> > +     if (is_async_shutdown(dev))
> > +       async_schedule_dev(async_shutdown)   # async path
> > +
> > +         async_shutdown                     # called asynchronously
> > +           dev->...->shutdown()
> > +
> > +     else
> > +       dev->...->shutdown()                 # sync path
> > +
> > +   async_synchronize_full                   # wait
>
> In the shutdown case, I think we still probably need the
> async_synchronize_full() to ensure that all the .shutdown() methods
> complete before we turn the power off, reboot, or kexec.
>
> But I think we also need a mechanism like dev->power.completion to
> make sure all the child .shutdown() methods complete before we run a
> parent's .shutdown().
>
> There's not much overlap between the suspend path and the shutdown
> path (probably none at all), so it's tempting to use the existing
> dev->power.completion for shutdown as well.
>
> But I don't think that's feasible because dev->power.completion is
> tied up with dev->power.async_suspend, which is set by
> device_enable_async_suspend(). That's a different concept than async
> shutdown, and drivers will want one without the other.
>
> Does this make sense?

Well, why don't we change the code so that dev->power.completion is
not tied to dev->power.async_suspend, but can be also used
(analogously) if something like dev->async_shutdown is set?

My point is that we know how to suspend devices asynchronously (the
fact that there are multiple phases is not that important IMV), so why
don't we design the async shutdown analogously?