2022-03-28 23:35:57

by Tanjore Suresh

Subject: [PATCH v1 0/3] Asynchronous shutdown interface and example implementation

Problem:

Some of our machines are configured with many NVMe devices and
are validated for strict shutdown time requirements. Each NVMe
device plugged into the system typically takes about 4.5 secs
to shut down. A system with 16 such NVMe devices will take
approximately 80 secs to shut down and go through a reboot.

The current shutdown API, as defined at the bus level, is
synchronous. Therefore, the more devices there are in the system,
the greater the time it takes to shut down. This shutdown time
contributes significantly to the machine reboot time.

Solution:

This patch set proposes an asynchronous shutdown interface at the bus
level, and modifies the core device shutdown routine to exploit the
new interface while maintaining backward compatibility with the
existing synchronous implementation (Patch 1 of 3). It extends the new
interface to enable all PCIe-based devices to use asynchronous
shutdown semantics if desired (Patch 2 of 3). The implementation at
the PCIe level is also backward compatible, allowing existing device
implementations to keep working with the current synchronous
semantics. Finally, it showcases an example implementation for NVMe
devices that exploits this asynchronous shutdown interface
(Patch 3 of 3).

Tanjore Suresh (3):
driver core: Support asynchronous driver shutdown
PCI: Support asynchronous shutdown
nvme: Add async shutdown support

drivers/base/core.c | 39 ++++++++++++++++++-
drivers/nvme/host/core.c | 28 +++++++++----
drivers/nvme/host/nvme.h | 8 ++++
drivers/nvme/host/pci.c | 80 ++++++++++++++++++++++++--------------
drivers/pci/pci-driver.c | 17 ++++++--
include/linux/device/bus.h | 10 +++++
include/linux/pci.h | 2 +
7 files changed, 144 insertions(+), 40 deletions(-)

--
2.35.1.1021.g381101b075-goog


2022-03-28 23:36:12

by Tanjore Suresh

Subject: [PATCH v1 1/3] driver core: Support asynchronous driver shutdown

This changes the bus driver interface with additional entry points
to enable devices to implement asynchronous shutdown. The existing
synchronous interface to shutdown is unmodified and retained for
backward compatibility.

This changes the common device shutdown code to enable devices to
participate in asynchronous shutdown implementation.

Signed-off-by: Tanjore Suresh <[email protected]>
---
drivers/base/core.c | 39 +++++++++++++++++++++++++++++++++++++-
include/linux/device/bus.h | 10 ++++++++++
2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 3d6430eb0c6a..359e7067e8b8 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -4479,6 +4479,7 @@ EXPORT_SYMBOL_GPL(device_change_owner);
void device_shutdown(void)
{
struct device *dev, *parent;
+ LIST_HEAD(async_shutdown_list);

wait_for_device_probe();
device_block_probing();
@@ -4523,7 +4524,14 @@ void device_shutdown(void)
dev_info(dev, "shutdown_pre\n");
dev->class->shutdown_pre(dev);
}
- if (dev->bus && dev->bus->shutdown) {
+
+ if (dev->bus && dev->bus->shutdown_pre) {
+ if (initcall_debug)
+ dev_info(dev, "shutdown_pre\n");
+ dev->bus->shutdown_pre(dev);
+ list_add(&dev->kobj.entry,
+ &async_shutdown_list);
+ } else if (dev->bus && dev->bus->shutdown) {
if (initcall_debug)
dev_info(dev, "shutdown\n");
dev->bus->shutdown(dev);
@@ -4543,6 +4551,35 @@ void device_shutdown(void)
spin_lock(&devices_kset->list_lock);
}
spin_unlock(&devices_kset->list_lock);
+
+ /*
+ * Second pass: iterate only over devices that have
+ * opted in to asynchronous shutdown.
+ */
+ while (!list_empty(&async_shutdown_list)) {
+ dev = list_entry(async_shutdown_list.next, struct device,
+ kobj.entry);
+ parent = get_device(dev->parent);
+ get_device(dev);
+ /*
+ * Make sure the device is off the list
+ */
+ list_del_init(&dev->kobj.entry);
+ if (parent)
+ device_lock(parent);
+ device_lock(dev);
+ if (dev->bus && dev->bus->shutdown_post) {
+ if (initcall_debug)
+ dev_info(dev,
+ "shutdown_post called\n");
+ dev->bus->shutdown_post(dev);
+ }
+ device_unlock(dev);
+ if (parent)
+ device_unlock(parent);
+ put_device(dev);
+ put_device(parent);
+ }
}

/*
diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
index a039ab809753..e261819601e9 100644
--- a/include/linux/device/bus.h
+++ b/include/linux/device/bus.h
@@ -49,6 +49,14 @@ struct fwnode_handle;
* will never get called until they do.
* @remove: Called when a device removed from this bus.
* @shutdown: Called at shut-down time to quiesce the device.
+ * @shutdown_pre: Called at shutdown time to start the shutdown
+ * process on the device. This entry point will be called
+ * only when the bus driver has indicated it would like
+ * to participate in asynchronous shutdown completion.
+ * @shutdown_post: Called at shutdown time to complete the shutdown
+ * process of the device. This entry point will be called
+ * only when the bus driver has indicated it would like to
+ * participate in the asynchronous shutdown completion.
*
* @online: Called to put the device back online (after offlining it).
* @offline: Called to put the device offline for hot-removal. May fail.
@@ -93,6 +101,8 @@ struct bus_type {
void (*sync_state)(struct device *dev);
void (*remove)(struct device *dev);
void (*shutdown)(struct device *dev);
+ void (*shutdown_pre)(struct device *dev);
+ void (*shutdown_post)(struct device *dev);

int (*online)(struct device *dev);
int (*offline)(struct device *dev);
--
2.35.1.1021.g381101b075-goog

2022-03-29 00:37:56

by Oliver O'Halloran

Subject: Re: [PATCH v1 1/3] driver core: Support asynchronous driver shutdown

On Tue, Mar 29, 2022 at 10:35 AM Tanjore Suresh <[email protected]> wrote:
>
> This changes the bus driver interface with additional entry points
> to enable devices to implement asynchronous shutdown. The existing
> synchronous interface to shutdown is unmodified and retained for
> backward compatibility.
>
> This changes the common device shutdown code to enable devices to
> participate in asynchronous shutdown implementation.

nice to see someone looking at improving the shutdown path

> Signed-off-by: Tanjore Suresh <[email protected]>
> ---
> drivers/base/core.c | 39 +++++++++++++++++++++++++++++++++++++-
> include/linux/device/bus.h | 10 ++++++++++
> 2 files changed, 48 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 3d6430eb0c6a..359e7067e8b8 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -4479,6 +4479,7 @@ EXPORT_SYMBOL_GPL(device_change_owner);
> *snip*

This all seems a bit dangerous and I'm wondering what systems you've
tested these changes with. I had a look at implementing something
similar a few years ago and one case that always concerned me was
embedded systems where the PCIe root complex also has a driver bound.
Say you've got the following PCIe topology:

00:00.0 - root port
01:00.0 - nvme drive

With the current implementation of device_shutdown() we can guarantee
that the child device (the nvme) is shut down before we start trying
to shut down the parent device (the root complex) so there's no
possibility of deadlocks and other dependency headaches. With this
implementation of async shutdown we lose that guarantee and I'm not
sure what the consequences are. Personally I was never able to
convince myself it was safe, but maybe you're braver than I am :)

That all said, there's probably only a few kinds of device that will
really want to implement async shutdown support so maybe you can
restrict it to leaf devices and flip the ordering around to something
like:

for_each_device(dev) {
if (can_async(dev) && has_no_children(dev))
start_async_shutdown(dev)
}
wait_for_all_async_shutdowns_to_finish()

// tear down the remaining system devices synchronously
for_each_device(dev)
do_sync_shutdown(dev)

> /*
> diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
> index a039ab809753..e261819601e9 100644
> --- a/include/linux/device/bus.h
> +++ b/include/linux/device/bus.h
> @@ -93,6 +101,8 @@ struct bus_type {
> void (*sync_state)(struct device *dev);
> void (*remove)(struct device *dev);
> void (*shutdown)(struct device *dev);
> + void (*shutdown_pre)(struct device *dev);
> + void (*shutdown_post)(struct device *dev);

Call them shutdown_async_start() / shutdown_async_end() or something
IMO. These names are not at all helpful and they're easy to mix up
their role with the class based shutdown_pre / _post

2022-03-29 07:14:46

by Greg Kroah-Hartman

Subject: Re: [PATCH v1 0/3] Asynchronous shutdown interface and example implementation

On Mon, Mar 28, 2022 at 04:00:05PM -0700, Tanjore Suresh wrote:
> Problem:
>
> Some of our machines are configured with many NVMe devices and
> are validated for strict shutdown time requirements. Each NVMe
> device plugged into the system typically takes about 4.5 secs
> to shut down. A system with 16 such NVMe devices will take
> approximately 80 secs to shut down and go through a reboot.
>
> The current shutdown API, as defined at the bus level, is
> synchronous. Therefore, the more devices there are in the system,
> the greater the time it takes to shut down. This shutdown time
> contributes significantly to the machine reboot time.
>
> Solution:
>
> This patch set proposes an asynchronous shutdown interface at the bus
> level, and modifies the core device shutdown routine to exploit the
> new interface while maintaining backward compatibility with the
> existing synchronous implementation (Patch 1 of 3). It extends the new
> interface to enable all PCIe-based devices to use asynchronous
> shutdown semantics if desired (Patch 2 of 3). The implementation at
> the PCIe level is also backward compatible, allowing existing device
> implementations to keep working with the current synchronous
> semantics. Finally, it showcases an example implementation for NVMe
> devices that exploits this asynchronous shutdown interface
> (Patch 3 of 3).
>
> Tanjore Suresh (3):
> driver core: Support asynchronous driver shutdown
> PCI: Support asynchronous shutdown
> nvme: Add async shutdown support
>
> drivers/base/core.c | 39 ++++++++++++++++++-
> drivers/nvme/host/core.c | 28 +++++++++----
> drivers/nvme/host/nvme.h | 8 ++++
> drivers/nvme/host/pci.c | 80 ++++++++++++++++++++++++--------------
> drivers/pci/pci-driver.c | 17 ++++++--
> include/linux/device/bus.h | 10 +++++
> include/linux/pci.h | 2 +
> 7 files changed, 144 insertions(+), 40 deletions(-)
>
> --
> 2.35.1.1021.g381101b075-goog
>

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman. You have sent him
a patch that has triggered this response. He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created. Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- This looks like a new version of a previously submitted patch, but you
did not list below the --- line any changes from the previous version.
Please read the section entitled "The canonical patch format" in the
kernel file, Documentation/SubmittingPatches for what needs to be done
here to properly describe this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot

2022-03-30 11:58:43

by Lukas Wunner

Subject: Re: [PATCH v1 0/3] Asynchronous shutdown interface and example implementation

On Tue, Mar 29, 2022 at 08:07:51PM -0600, Keith Busch wrote:
> Thanks, I agree we should improve shutdown times. I tried a while ago, but
> lost track to follow up at the time. Here's the reference, fwiw, though it
> may be out of date :):
>
> http://lists.infradead.org/pipermail/linux-nvme/2014-May/000826.html
>
> The above solution is similar to how probe waits on an async domain.
> Maybe pci can schedule the async shutdown instead of relying on low-level
> drivers so that everyone implicitly benefits instead of just nvme? I'll
> double-check if that's reasonable, but I'll look through this series too.

Using the async API seems much more reasonable than adding new callbacks.

However I'd argue that it shouldn't be necessary to amend any drivers,
this should all be doable in the driver core: Basically a device needs
to wait for its children and device links consumers to shutdown, apart
from that everything should be able to run asynchronously.

Thanks,

Lukas

2022-03-30 16:18:56

by Rafael J. Wysocki

Subject: Re: [PATCH v1 0/3] Asynchronous shutdown interface and example implementation

On Wed, Mar 30, 2022 at 8:25 AM Lukas Wunner <[email protected]> wrote:
>
> On Tue, Mar 29, 2022 at 08:07:51PM -0600, Keith Busch wrote:
> > Thanks, I agree we should improve shutdown times. I tried a while ago, but
> > lost track to follow up at the time. Here's the reference, fwiw, though it
> > may be out of date :):
> >
> > http://lists.infradead.org/pipermail/linux-nvme/2014-May/000826.html
> >
> > The above solution is similar to how probe waits on an async domain.
> > Maybe pci can schedule the async shutdown instead of relying on low-level
> > drivers so that everyone implicitly benefits instead of just nvme? I'll
> > double-check if that's reasonable, but I'll look through this series too.
>
> Using the async API seems much more reasonable than adding new callbacks.
>
> However I'd argue that it shouldn't be necessary to amend any drivers,
> this should all be doable in the driver core: Basically a device needs
> to wait for its children and device links consumers to shutdown, apart
> from that everything should be able to run asynchronously.

Well, this is done already in the system-wide and hibernation paths.
It should be possible to implement asynchronous shutdown analogously.

2022-03-30 23:48:16

by Keith Busch

Subject: Re: [PATCH v1 0/3] Asynchronous shutdown interface and example implementation

On Mon, Mar 28, 2022 at 04:00:05PM -0700, Tanjore Suresh wrote:
> Problem:
>
> Some of our machines are configured with many NVMe devices and
> are validated for strict shutdown time requirements. Each NVMe
> device plugged into the system typically takes about 4.5 secs
> to shut down. A system with 16 such NVMe devices will take
> approximately 80 secs to shut down and go through a reboot.
>
> The current shutdown API, as defined at the bus level, is
> synchronous. Therefore, the more devices there are in the system,
> the greater the time it takes to shut down. This shutdown time
> contributes significantly to the machine reboot time.
>
> Solution:
>
> This patch set proposes an asynchronous shutdown interface at the bus
> level, and modifies the core device shutdown routine to exploit the
> new interface while maintaining backward compatibility with the
> existing synchronous implementation (Patch 1 of 3). It extends the new
> interface to enable all PCIe-based devices to use asynchronous
> shutdown semantics if desired (Patch 2 of 3). The implementation at
> the PCIe level is also backward compatible, allowing existing device
> implementations to keep working with the current synchronous
> semantics. Finally, it showcases an example implementation for NVMe
> devices that exploits this asynchronous shutdown interface
> (Patch 3 of 3).

Thanks, I agree we should improve shutdown times. I tried a while ago, but
lost track to follow up at the time. Here's the reference, fwiw, though it
may be out of date :):

http://lists.infradead.org/pipermail/linux-nvme/2014-May/000826.html

The above solution is similar to how probe waits on an async domain.
Maybe pci can schedule the async shutdown instead of relying on low-level
drivers so that everyone implicitly benefits instead of just nvme? I'll
double-check if that's reasonable, but I'll look through this series too.

2022-03-31 04:06:06

by Belanger, Martin

Subject: RE: [PATCH v1 1/3] driver core: Support asynchronous driver shutdown

> From: Linux-nvme <[email protected]> On Behalf Of
> Oliver O'Halloran
> Sent: Monday, March 28, 2022 8:20 PM
> To: Tanjore Suresh
> Cc: Greg Kroah-Hartman; Rafael J . Wysocki; Christoph Hellwig; Sagi Grimberg;
> Bjorn Helgaas; Linux Kernel Mailing List; [email protected]; linux-
> pci
> Subject: Re: [PATCH v1 1/3] driver core: Support asynchronous driver shutdown
>
>
> On Tue, Mar 29, 2022 at 10:35 AM Tanjore Suresh <[email protected]>
> wrote:
> >
> > This changes the bus driver interface with additional entry points to
> > enable devices to implement asynchronous shutdown. The existing
> > synchronous interface to shutdown is unmodified and retained for
> > backward compatibility.
> >
> > This changes the common device shutdown code to enable devices to
> > participate in asynchronous shutdown implementation.
>
> nice to see someone looking at improving the shutdown path

Agreed!

I know this patch is mainly for PCI devices, however, NVMe over Fabrics
devices can suffer even longer shutdowns. Last September, I reported
that shutting down an NVMe-oF TCP connection while the network is down
will result in a 1-minute deadlock. That's because the driver tries to perform
a proper shutdown by sending commands to the remote target and the
timeout for unanswered commands is 1-minute. If one needs to shut down
several NVMe-oF connections, they will be shut down sequentially,
each taking 1 minute. Try running "nvme disconnect-all" while the network
is down and you'll see what I mean. Of course, the KATO is supposed to
detect when connectivity is lost, but if you have a long KATO (e.g. 2 minutes)
you will most likely hit this condition.

Here's the patch I proposed in September, which shortens the timeout to
5 sec on a disconnect.

http://lists.infradead.org/pipermail/linux-nvme/2021-September/027867.html

Regards,
Martin Belanger

>
> > Signed-off-by: Tanjore Suresh <[email protected]>
> > ---
> > drivers/base/core.c | 39 +++++++++++++++++++++++++++++++++++++-
> > include/linux/device/bus.h | 10 ++++++++++
> > 2 files changed, 48 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c index
> > 3d6430eb0c6a..359e7067e8b8 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -4479,6 +4479,7 @@ EXPORT_SYMBOL_GPL(device_change_owner);
> > *snip*
>
> This all seems a bit dangerous and I'm wondering what systems you've tested
> these changes with. I had a look at implementing something similar a few years
> ago and one case that always concerned me was embedded systems where the
> PCIe root complex also has a driver bound.
> Say you've got the following PCIe topology:
>
> 00:00.0 - root port
> 01:00.0 - nvme drive
>
> With the current implementation of device_shutdown() we can guarantee that
> the child device (the nvme) is shut down before we start trying to shut down the
> parent device (the root complex) so there's no possibility of deadlocks and
> other dependency headaches. With this implementation of async shutdown we
> lose that guarantee and I'm not sure what the consequences are. Personally I
> was never able to convince myself it was safe, but maybe you're braver than I
> am :)
>
> That all said, there's probably only a few kinds of device that will really want to
> implement async shutdown support so maybe you can restrict it to leaf devices
> and flip the ordering around to something
> like:
>
> for_each_device(dev) {
> if (can_async(dev) && has_no_children(dev))
> start_async_shutdown(dev)
> }
> wait_for_all_async_shutdowns_to_finish()
>
> // tear down the remaining system devices synchronously
> for_each_device(dev)
> do_sync_shutdown(dev)
>
> > /*
> > diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
> > index a039ab809753..e261819601e9 100644
> > --- a/include/linux/device/bus.h
> > +++ b/include/linux/device/bus.h
> > @@ -93,6 +101,8 @@ struct bus_type {
> > void (*sync_state)(struct device *dev);
> > void (*remove)(struct device *dev);
> > void (*shutdown)(struct device *dev);
> > + void (*shutdown_pre)(struct device *dev);
> > + void (*shutdown_post)(struct device *dev);
>
> Call them shutdown_async_start() / shutdown_async_end() or something IMO.
> These names are not at all helpful and they're easy to mix up their role with
> the class based shutdown_pre / _post

2022-03-31 15:47:35

by Daniel Wagner

Subject: Re: [PATCH v1 1/3] driver core: Support asynchronous driver shutdown

On Wed, Mar 30, 2022 at 02:12:18PM +0000, Belanger, Martin wrote:
> I know this patch is mainly for PCI devices, however, NVMe over Fabrics
> devices can suffer even longer shutdowns. Last September, I reported
> that shutting down an NVMe-oF TCP connection while the network is down
> will result in a 1-minute deadlock. That's because the driver tries to perform
> a proper shutdown by sending commands to the remote target and the
> timeout for unanswered commands is 1-minute. If one needs to shut down
> several NVMe-oF connections, they will be shut down sequentially,
> each taking 1 minute. Try running "nvme disconnect-all" while the network
> is down and you'll see what I mean. Of course, the KATO is supposed to
> detect when connectivity is lost, but if you have a long KATO (e.g. 2 minutes)
> you will most likely hit this condition.

I've been debugging something similar:

[44888.710527] nvme nvme0: Removing ctrl: NQN "xxx"
[44898.981684] nvme nvme0: failed to send request -32
[44960.982977] nvme nvme0: queue 0: timeout request 0x18 type 4
[44960.983099] nvme nvme0: Property Set error: 881, offset 0x14

Currently testing this patch:

+++ b/drivers/nvme/host/tcp.c
@@ -1103,9 +1103,12 @@ static int nvme_tcp_try_send(struct nvme_tcp_queue *queue)
if (ret == -EAGAIN) {
ret = 0;
} else if (ret < 0) {
+ struct request *rq = blk_mq_rq_from_pdu(queue->request);
+
dev_err(queue->ctrl->ctrl.device,
"failed to send request %d\n", ret);
- if (ret != -EPIPE && ret != -ECONNRESET)
+ if ((ret != -EPIPE && ret != -ECONNRESET) ||
+ rq->cmd_flags & REQ_FAILFAST_DRIVER)
nvme_tcp_fail_request(queue->request);
nvme_tcp_done_send_req(queue);
}

2022-04-01 14:55:10

by Jonathan Derrick

Subject: Re: [PATCH v1 1/3] driver core: Support asynchronous driver shutdown



On 3/28/2022 6:19 PM, Oliver O'Halloran wrote:
> On Tue, Mar 29, 2022 at 10:35 AM Tanjore Suresh <[email protected]> wrote:
>>
>> This changes the bus driver interface with additional entry points
>> to enable devices to implement asynchronous shutdown. The existing
>> synchronous interface to shutdown is unmodified and retained for
>> backward compatibility.
>>
>> This changes the common device shutdown code to enable devices to
>> participate in asynchronous shutdown implementation.
>
> nice to see someone looking at improving the shutdown path
>
>> Signed-off-by: Tanjore Suresh <[email protected]>
>> ---
>> drivers/base/core.c | 39 +++++++++++++++++++++++++++++++++++++-
>> include/linux/device/bus.h | 10 ++++++++++
>> 2 files changed, 48 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>> index 3d6430eb0c6a..359e7067e8b8 100644
>> --- a/drivers/base/core.c
>> +++ b/drivers/base/core.c
>> @@ -4479,6 +4479,7 @@ EXPORT_SYMBOL_GPL(device_change_owner);
>> *snip*
>
> This all seems a bit dangerous and I'm wondering what systems you've
> tested these changes with. I had a look at implementing something
> similar a few years ago and one case that always concerned me was
> embedded systems where the PCIe root complex also has a driver bound.
> Say you've got the following PCIe topology:
>
> 00:00.0 - root port
> 01:00.0 - nvme drive
>
> With the current implementation of device_shutdown() we can guarantee
> that the child device (the nvme) is shut down before we start trying
> to shut down the parent device (the root complex) so there's no
> possibility of deadlocks and other dependency headaches. With this
> implementation of async shutdown we lose that guarantee and I'm not
> sure what the consequences are. Personally I was never able to
> convince myself it was safe, but maybe you're braver than I am :)
>
> That all said, there's probably only a few kinds of device that will
> really want to implement async shutdown support so maybe you can
> restrict it to leaf devices and flip the ordering around to something
> like:

It seems like it might be helpful to split the async shutdowns into
refcounted hierarchies and proceed with the next level up when all the
refs are in.

Ex:
00:00.0 - RP
01:00.0 - NVMe A
02:00.0 - Bridge USP
03:00.0 - Bridge DSP
04:00.0 - NVMe B
03:00.1 - Bridge DSP
05:00.0 - NVMe C

NVMe A could start shutting down at the beginning of the hierarchy
traversal. Then async shutdown of bus 3 wouldn't start until all
children of bus 3 are shut down.

You could probably do this by having the async_shutdown_list in the pci_bus.

>
> for_each_device(dev) {
> if (can_async(dev) && has_no_children(dev))
> start_async_shutdown(dev)
> }
> wait_for_all_async_shutdowns_to_finish()
>
> // tear down the remaining system devices synchronously
> for_each_device(dev)
> do_sync_shutdown(dev)
>
>> /*
>> diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
>> index a039ab809753..e261819601e9 100644
>> --- a/include/linux/device/bus.h
>> +++ b/include/linux/device/bus.h
>> @@ -93,6 +101,8 @@ struct bus_type {
>> void (*sync_state)(struct device *dev);
>> void (*remove)(struct device *dev);
>> void (*shutdown)(struct device *dev);
>> + void (*shutdown_pre)(struct device *dev);
>> + void (*shutdown_post)(struct device *dev);
>
> Call them shutdown_async_start() / shutdown_async_end() or something
> IMO. These names are not at all helpful and they're easy to mix up
> their role with the class based shutdown_pre / _post
>