2021-08-23 11:24:54

by Daniel Wagner

[permalink] [raw]
Subject: [PATCH v6 0/3] Handle update hardware queues and queue freeze more carefully

Hi,

After our last discussion in v5, the nvme_freeze_start() is gone
(James provided a new patch). I also updated the commit message of the
first patch which adds the imported bit why we need to update the
number queues first.

Anyway, I think we figured out the details in this path and I am quiet
confident that we nailed it now (yeah, famous last words).

Daniel

v1:
- https://lore.kernel.org/linux-nvme/[email protected]/
v2:
- https://lore.kernel.org/linux-nvme/[email protected]/
- reviewed tags collected
- added 'update hardware queues' for all transport
- added fix for fc hanger in nvme_wait_freeze_timeout
v3:
- https://lore.kernel.org/linux-nvme/[email protected]/
- dropped 'nvme-fc: Freeze queues before destroying them'
- added James' two patches
v4:
- https://lore.kernel.org/linux-nvme/[email protected]/
- added 'nvme-*: Unfreeze queues on reconnect'
- added Hannes' reviewed tags
v5:
- https://lore.kernel.org/linux-nvme/[email protected]/
- dropped non nvme-fc patches
- updated 'nvme-fc: fix controller reset hang during traffic'
v6:
- updated commit message 'nvme-fc: Update hardware queues before using them'
- dropped 'nvme-fc: fix controller reset hang during traffic'
- added 'nvme-fc: Remove freeze/unfreeze around update_nr_hw_queues'

Daniel Wagner (1):
nvme-fc: Update hardware queues before using them

James Smart (2):
nvme-fc: avoid race between time out and tear down
nvme-fc: Remove freeze/unfreeze around update_nr_hw_queues

drivers/nvme/host/fc.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

--
2.29.2


2021-08-23 11:25:10

by Daniel Wagner

[permalink] [raw]
Subject: [PATCH v6 2/3] nvme-fc: avoid race between time out and tear down

From: James Smart <[email protected]>

To avoid race between time out and tear down, in tear down process,
first we quiesce the queue, and then delete the timer and cancel
the time out work for the queue.

This patch merges the admin and io sync ops into the queue teardown logic
as shown in the RDMA patch 3017013dcc "nvme-rdma: avoid race between time
out and tear down". There is no teardown_lock in nvme-fc.

Signed-off-by: James Smart <[email protected]>
CC: Chao Leng <[email protected]>
Tested-by: Daniel Wagner <[email protected]>
Reviewed-by: Daniel Wagner <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
---
drivers/nvme/host/fc.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 8a903769364f..48aaa753be44 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2487,6 +2487,7 @@ __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
*/
if (ctrl->ctrl.queue_count > 1) {
nvme_stop_queues(&ctrl->ctrl);
+ nvme_sync_io_queues(&ctrl->ctrl);
blk_mq_tagset_busy_iter(&ctrl->tag_set,
nvme_fc_terminate_exchange, &ctrl->ctrl);
blk_mq_tagset_wait_completed_request(&ctrl->tag_set);
@@ -2510,6 +2511,7 @@ __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
* clean up the admin queue. Same thing as above.
*/
blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+ blk_sync_queue(ctrl->ctrl.admin_q);
blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
nvme_fc_terminate_exchange, &ctrl->ctrl);
blk_mq_tagset_wait_completed_request(&ctrl->admin_tag_set);
--
2.29.2

2021-08-23 11:25:22

by Daniel Wagner

[permalink] [raw]
Subject: [PATCH v6 1/3] nvme-fc: Update hardware queues before using them

In case the number of hardware queues changes, we need to update the
tagset and the mapping of ctx to hctx first.

If we try to create and connect the I/O queues first, this operation
will fail (target will reject the connect call due to the wrong number
of queues) and hence we bail out of the recreate function. Then we
will to try the very same operation again, thus we don't make any
progress.

Reviewed-by: James Smart <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/nvme/host/fc.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 7f462af1b02a..8a903769364f 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2951,14 +2951,6 @@ nvme_fc_recreate_io_queues(struct nvme_fc_ctrl *ctrl)
if (ctrl->ctrl.queue_count == 1)
return 0;

- ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
- if (ret)
- goto out_free_io_queues;
-
- ret = nvme_fc_connect_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
- if (ret)
- goto out_delete_hw_queues;
-
if (prior_ioq_cnt != nr_io_queues) {
dev_info(ctrl->ctrl.device,
"reconnect: revising io queue count from %d to %d\n",
@@ -2968,6 +2960,14 @@ nvme_fc_recreate_io_queues(struct nvme_fc_ctrl *ctrl)
nvme_unfreeze(&ctrl->ctrl);
}

+ ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
+ if (ret)
+ goto out_free_io_queues;
+
+ ret = nvme_fc_connect_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
+ if (ret)
+ goto out_delete_hw_queues;
+
return 0;

out_delete_hw_queues:
--
2.29.2

2021-08-23 11:26:04

by Daniel Wagner

[permalink] [raw]
Subject: [PATCH v6 3/3] nvme-fc: Remove freeze/unfreeze around update_nr_hw_queues

From: James Smart <[email protected]>

Remove the freeze/unfreeze around changes to the number of hardware
queues. Study and retest has indicated there are no ios that can be
active at this point so there is nothing to freeze.

This patch primarily reverts 883837ed0f1f "nvme-fc: wait for queues to
freeze before calling update_hr_hw_queues". It's not an exact revert as
it leaves the adjusting of hw queues only if the count changes.

Signed-off-by: James Smart <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/nvme/host/fc.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 48aaa753be44..b71d0c2d4d31 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2957,9 +2957,7 @@ nvme_fc_recreate_io_queues(struct nvme_fc_ctrl *ctrl)
dev_info(ctrl->ctrl.device,
"reconnect: revising io queue count from %d to %d\n",
prior_ioq_cnt, nr_io_queues);
- nvme_wait_freeze(&ctrl->ctrl);
blk_mq_update_nr_hw_queues(&ctrl->tag_set, nr_io_queues);
- nvme_unfreeze(&ctrl->ctrl);
}

ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
--
2.29.2

2021-08-24 20:39:29

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v6 3/3] nvme-fc: Remove freeze/unfreeze around update_nr_hw_queues


> From: James Smart <[email protected]>
>
> Remove the freeze/unfreeze around changes to the number of hardware
> queues. Study and retest has indicated there are no ios that can be
> active at this point so there is nothing to freeze.
>
> This patch primarily reverts 883837ed0f1f

Bogus commit ID.

"nvme-fc: wait for queues to
> freeze before calling update_hr_hw_queues". It's not an exact revert as
> it leaves the adjusting of hw queues only if the count changes.

I see that fc doesn't freeze the queues, so it obviously wrong to
unfreeze them. But is it correct to not freeze the queues?

2021-08-25 13:07:28

by Daniel Wagner

[permalink] [raw]
Subject: Re: [PATCH v6 3/3] nvme-fc: Remove freeze/unfreeze around update_nr_hw_queues

On Tue, Aug 24, 2021 at 01:38:20PM -0700, Sagi Grimberg wrote:
> > freeze before calling update_hr_hw_queues". It's not an exact revert as
> > it leaves the adjusting of hw queues only if the count changes.
>
> I see that fc doesn't freeze the queues, so it obviously wrong to
> unfreeze them. But is it correct to not freeze the queues?

nvme-fc is draining the queues in the error recovery path
(__nvme_fc_abort_outstanding_ios). There are no request in the queues
hence we don't have to freeze.

The only reason to keep the freeze/unfreeze in this path would be to
make it look alike the other transport. But it would be a no-op.

2021-08-25 16:14:01

by James Smart

[permalink] [raw]
Subject: Re: [PATCH v6 3/3] nvme-fc: Remove freeze/unfreeze around update_nr_hw_queues



On 8/25/2021 6:04 AM, Daniel Wagner wrote:
> On Tue, Aug 24, 2021 at 01:38:20PM -0700, Sagi Grimberg wrote:
>>> freeze before calling update_hr_hw_queues". It's not an exact revert as
>>> it leaves the adjusting of hw queues only if the count changes.
>> I see that fc doesn't freeze the queues, so it obviously wrong to
>> unfreeze them. But is it correct to not freeze the queues?
> nvme-fc is draining the queues in the error recovery path
> (__nvme_fc_abort_outstanding_ios). There are no request in the queues
> hence we don't have to freeze.
>
> The only reason to keep the freeze/unfreeze in this path would be to
> make it look alike the other transport. But it would be a no-op.
Yep.

And updated commit is :  88e837ed0f1f

-- james

--
This electronic communication and the information and any files transmitted
with it, or attached to it, are confidential and are intended solely for
the use of the individual or entity to whom it is addressed and may contain
information that is confidential, legally privileged, protected by privacy
laws, or otherwise restricted from disclosure to anyone else. If you are
not the intended recipient or the person responsible for delivering the
e-mail to the intended recipient, you are hereby notified that any use,
copying, distributing, dissemination, forwarding, printing, or copying of
this e-mail is strictly prohibited. If you received this e-mail in error,
please return the e-mail to the sender, delete it from your computer, and
destroy any printed copy of it.


Attachments:
smime.p7s (4.11 kB)
S/MIME Cryptographic Signature