2018-02-12 16:51:33

by jianchao.wang

Subject: [PATCH] nvme-pci: drain the entered requests after ctrl is shutdown

Currently, we unquiesce the queues after the controller is
shutdown to prevent residual requests from getting stuck. In fact,
we can do this more cleanly: wait for the freeze and drain the
queues in nvme_dev_disable, and finally leave the queues quiesced.

Signed-off-by: Jianchao Wang <[email protected]>
---
drivers/nvme/host/pci.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 4a7c420..6e5d2ca 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2183,10 +2183,8 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
* Give the controller a chance to complete all entered requests if
* doing a safe shutdown.
*/
- if (!dead) {
- if (shutdown)
- nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
- }
+ if (!dead && shutdown)
+ nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);

nvme_stop_queues(&dev->ctrl);

@@ -2211,12 +2209,15 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);

/*
- * The driver will not be starting up queues again if shutting down so
- * must flush all entered requests to their failed completion to avoid
- * deadlocking blk-mq hot-cpu notifier.
+ * In the shutdown case, the controller will not be set up again
+ * soon. If any residual requests remain here, the controller must
+ * have gone wrong. Drain and fail all residual entered IO requests.
*/
- if (shutdown)
+ if (shutdown) {
nvme_start_queues(&dev->ctrl);
+ nvme_wait_freeze(&dev->ctrl);
+ nvme_stop_queues(&dev->ctrl);
+ }
mutex_unlock(&dev->shutdown_lock);
}

--
2.7.4



2018-02-12 18:45:29

by Sagi Grimberg

Subject: Re: [PATCH] nvme-pci: drain the entered requests after ctrl is shutdown


> Currently, we unquiesce the queues after the controller is
> shutdown to prevent residual requests from getting stuck. In fact,
> we can do this more cleanly: wait for the freeze and drain the
> queues in nvme_dev_disable, and finally leave the queues quiesced.

Does this fix a bug? What is the benefit of leaving the queues
quiesced in shutdown?

2018-02-12 19:12:20

by Keith Busch

Subject: Re: [PATCH] nvme-pci: drain the entered requests after ctrl is shutdown

On Mon, Feb 12, 2018 at 08:43:58PM +0200, Sagi Grimberg wrote:
>
> > Currently, we unquiesce the queues after the controller is
> > shutdown to prevent residual requests from getting stuck. In fact,
> > we can do this more cleanly: wait for the freeze and drain the
> > queues in nvme_dev_disable, and finally leave the queues quiesced.
>
> Does this fix a bug? What is the benefit of leaving the queues
> quiesced in shutdown?

This doesn't appear to fix anything. The things this patch does are
either unnecessary (quiesce) or already done elsewhere (wait freeze).

2018-02-13 01:53:17

by jianchao.wang

Subject: Re: [PATCH] nvme-pci: drain the entered requests after ctrl is shutdown

Hi Keith and Sagi,

Thanks for your kind responses and comments on this.

On 02/13/2018 03:15 AM, Keith Busch wrote:
> On Mon, Feb 12, 2018 at 08:43:58PM +0200, Sagi Grimberg wrote:
>>
>>> Currently, we unquiesce the queues after the controller is
>>> shutdown to prevent residual requests from getting stuck. In fact,
>>> we can do this more cleanly: wait for the freeze and drain the
>>> queues in nvme_dev_disable, and finally leave the queues quiesced.
>>
>> Does this fix a bug? What is the benefit of leaving the queues
>> quiesced in shutdown?
>
> This doesn't appear to fix anything. The things this patch does are
> either unnecessary (quiesce) or already done elsewhere (wait freeze).
>

Yes, this patch doesn't fix any bug. Since we already drain the
requests in the shutdown case to keep them from getting stuck, why
not do it inside nvme_dev_disable and then quiesce the queues again?
As it stands, nvme_dev_disable leaves the queues unquiesced at the
end, which looks odd and always gives me the feeling that something
is still ongoing and not completed... as if something were
leaking... ;)

Why not complete it in nvme_dev_disable?

Thanks
Jianchao