2021-09-14 09:21:56

by Daniel Wagner

[permalink] [raw]
Subject: [PATCH v7 0/3] Handle update hardware queues and queue freeze more carefully

Hi,

v7 is addressing the comments from Sagi (wrong commit id and add an
explenation why the queues are empty).

Daniel

v1:
- https://lore.kernel.org/linux-nvme/[email protected]/
v2:
- https://lore.kernel.org/linux-nvme/[email protected]/
- reviewed tags collected
- added 'update hardware queues' for all transport
- added fix for fc hanger in nvme_wait_freeze_timeout
v3:
- https://lore.kernel.org/linux-nvme/[email protected]/
- dropped 'nvme-fc: Freeze queues before destroying them'
- added James' two patches
v4:
- https://lore.kernel.org/linux-nvme/[email protected]/
- added 'nvme-*: Unfreeze queues on reconnect'
- added Hannes' reviewed tags
v5:
- https://lore.kernel.org/linux-nvme/[email protected]/
- dropped non nvme-fc patches
- updated 'nvme-fc: fix controller reset hang during traffic'
v6:
- https://lore.kernel.org/linux-nvme/[email protected]/
- updated commit message 'nvme-fc: Update hardware queues before using them'
- dropped 'nvme-fc: fix controller reset hang during traffic'
- added 'nvme-fc: Remove freeze/unfreeze around update_nr_hw_queues'
v7:
- updated commit id in last patch
- added explenation why the queues are empty in last patch

Daniel Wagner (1):
nvme-fc: Update hardware queues before using them

James Smart (2):
nvme-fc: avoid race between time out and tear down
nvme-fc: Remove freeze/unfreeze around update_nr_hw_queues

drivers/nvme/host/fc.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

--
2.29.2


2021-09-14 09:22:11

by Daniel Wagner

[permalink] [raw]
Subject: [PATCH v7 3/3] nvme-fc: Remove freeze/unfreeze around update_nr_hw_queues

From: James Smart <[email protected]>

Remove the freeze/unfreeze around changes to the number of hardware
queues. Study and retest has indicated there are no ios that can be
active at this point so there is nothing to freeze.

nvme-fc is draining the queues in the shutdown and error recovery path
in __nvme_fc_abort_outstanding_ios.

This patch primarily reverts 88e837ed0f1f "nvme-fc: wait for queues to
freeze before calling update_hr_hw_queues". It's not an exact revert as
it leaves the adjusting of hw queues only if the count changes.

Signed-off-by: James Smart <[email protected]>
[dwagner: added explanation why no IO is pending]
Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/nvme/host/fc.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 6ebe68396712..aa14ad963d91 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2957,9 +2957,7 @@ nvme_fc_recreate_io_queues(struct nvme_fc_ctrl *ctrl)
dev_info(ctrl->ctrl.device,
"reconnect: revising io queue count from %d to %d\n",
prior_ioq_cnt, nr_io_queues);
- nvme_wait_freeze(&ctrl->ctrl);
blk_mq_update_nr_hw_queues(&ctrl->tag_set, nr_io_queues);
- nvme_unfreeze(&ctrl->ctrl);
}

ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
--
2.29.2

2021-09-14 09:22:33

by Daniel Wagner

[permalink] [raw]
Subject: [PATCH v7 1/3] nvme-fc: Update hardware queues before using them

In case the number of hardware queues changes, we need to update the
tagset and the mapping of ctx to hctx first.

If we try to create and connect the I/O queues first, this operation
will fail (target will reject the connect call due to the wrong number
of queues) and hence we bail out of the recreate function. Then we
will to try the very same operation again, thus we don't make any
progress.

Reviewed-by: James Smart <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/nvme/host/fc.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index b08a61ca283f..b5d9a5507de5 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2951,14 +2951,6 @@ nvme_fc_recreate_io_queues(struct nvme_fc_ctrl *ctrl)
if (ctrl->ctrl.queue_count == 1)
return 0;

- ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
- if (ret)
- goto out_free_io_queues;
-
- ret = nvme_fc_connect_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
- if (ret)
- goto out_delete_hw_queues;
-
if (prior_ioq_cnt != nr_io_queues) {
dev_info(ctrl->ctrl.device,
"reconnect: revising io queue count from %d to %d\n",
@@ -2968,6 +2960,14 @@ nvme_fc_recreate_io_queues(struct nvme_fc_ctrl *ctrl)
nvme_unfreeze(&ctrl->ctrl);
}

+ ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
+ if (ret)
+ goto out_free_io_queues;
+
+ ret = nvme_fc_connect_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
+ if (ret)
+ goto out_delete_hw_queues;
+
return 0;

out_delete_hw_queues:
--
2.29.2

2021-09-14 09:24:45

by Daniel Wagner

[permalink] [raw]
Subject: [PATCH v7 2/3] nvme-fc: avoid race between time out and tear down

From: James Smart <[email protected]>

To avoid race between time out and tear down, in tear down process,
first we quiesce the queue, and then delete the timer and cancel
the time out work for the queue.

This patch merges the admin and io sync ops into the queue teardown logic
as shown in the RDMA patch 3017013dcc "nvme-rdma: avoid race between time
out and tear down". There is no teardown_lock in nvme-fc.

Signed-off-by: James Smart <[email protected]>
CC: Chao Leng <[email protected]>
Tested-by: Daniel Wagner <[email protected]>
Reviewed-by: Daniel Wagner <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
---
drivers/nvme/host/fc.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index b5d9a5507de5..6ebe68396712 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2487,6 +2487,7 @@ __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
*/
if (ctrl->ctrl.queue_count > 1) {
nvme_stop_queues(&ctrl->ctrl);
+ nvme_sync_io_queues(&ctrl->ctrl);
blk_mq_tagset_busy_iter(&ctrl->tag_set,
nvme_fc_terminate_exchange, &ctrl->ctrl);
blk_mq_tagset_wait_completed_request(&ctrl->tag_set);
@@ -2510,6 +2511,7 @@ __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
* clean up the admin queue. Same thing as above.
*/
blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+ blk_sync_queue(ctrl->ctrl.admin_q);
blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
nvme_fc_terminate_exchange, &ctrl->ctrl);
blk_mq_tagset_wait_completed_request(&ctrl->admin_tag_set);
--
2.29.2

2021-09-14 11:00:01

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] nvme-fc: Remove freeze/unfreeze around update_nr_hw_queues

On 9/14/21 11:20 AM, Daniel Wagner wrote:
> From: James Smart <[email protected]>
>
> Remove the freeze/unfreeze around changes to the number of hardware
> queues. Study and retest has indicated there are no ios that can be
> active at this point so there is nothing to freeze.
>
> nvme-fc is draining the queues in the shutdown and error recovery path
> in __nvme_fc_abort_outstanding_ios.
>
> This patch primarily reverts 88e837ed0f1f "nvme-fc: wait for queues to
> freeze before calling update_hr_hw_queues". It's not an exact revert as
> it leaves the adjusting of hw queues only if the count changes.
>
> Signed-off-by: James Smart <[email protected]>
> [dwagner: added explanation why no IO is pending]
> Signed-off-by: Daniel Wagner <[email protected]>
> ---
> drivers/nvme/host/fc.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index 6ebe68396712..aa14ad963d91 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -2957,9 +2957,7 @@ nvme_fc_recreate_io_queues(struct nvme_fc_ctrl *ctrl)
> dev_info(ctrl->ctrl.device,
> "reconnect: revising io queue count from %d to %d\n",
> prior_ioq_cnt, nr_io_queues);
> - nvme_wait_freeze(&ctrl->ctrl);
> blk_mq_update_nr_hw_queues(&ctrl->tag_set, nr_io_queues);
> - nvme_unfreeze(&ctrl->ctrl);
> }
>
> ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
>
Reviewed-by: Hannes Reinecke <[email protected]>

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

2021-09-14 13:55:56

by Himanshu Madhani

[permalink] [raw]
Subject: Re: [PATCH v7 2/3] nvme-fc: avoid race between time out and tear down



> On Sep 14, 2021, at 4:20 AM, Daniel Wagner <[email protected]> wrote:
>
> From: James Smart <[email protected]>
>
> To avoid race between time out and tear down, in tear down process,
> first we quiesce the queue, and then delete the timer and cancel
> the time out work for the queue.
>
> This patch merges the admin and io sync ops into the queue teardown logic
> as shown in the RDMA patch 3017013dcc "nvme-rdma: avoid race between time
> out and tear down". There is no teardown_lock in nvme-fc.
>
> Signed-off-by: James Smart <[email protected]>
> CC: Chao Leng <[email protected]>
> Tested-by: Daniel Wagner <[email protected]>
> Reviewed-by: Daniel Wagner <[email protected]>
> Reviewed-by: Hannes Reinecke <[email protected]>
> ---
> drivers/nvme/host/fc.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index b5d9a5507de5..6ebe68396712 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -2487,6 +2487,7 @@ __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
> */
> if (ctrl->ctrl.queue_count > 1) {
> nvme_stop_queues(&ctrl->ctrl);
> + nvme_sync_io_queues(&ctrl->ctrl);
> blk_mq_tagset_busy_iter(&ctrl->tag_set,
> nvme_fc_terminate_exchange, &ctrl->ctrl);
> blk_mq_tagset_wait_completed_request(&ctrl->tag_set);
> @@ -2510,6 +2511,7 @@ __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
> * clean up the admin queue. Same thing as above.
> */
> blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
> + blk_sync_queue(ctrl->ctrl.admin_q);
> blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
> nvme_fc_terminate_exchange, &ctrl->ctrl);
> blk_mq_tagset_wait_completed_request(&ctrl->admin_tag_set);
> --
> 2.29.2
>

Reviewed-by: Himanshu Madhani <[email protected]>

--
Himanshu Madhani Oracle Linux Engineering

2021-09-14 13:56:47

by Himanshu Madhani

[permalink] [raw]
Subject: Re: [PATCH v7 1/3] nvme-fc: Update hardware queues before using them



> On Sep 14, 2021, at 4:20 AM, Daniel Wagner <[email protected]> wrote:
>
> In case the number of hardware queues changes, we need to update the
> tagset and the mapping of ctx to hctx first.
>
> If we try to create and connect the I/O queues first, this operation
> will fail (target will reject the connect call due to the wrong number
> of queues) and hence we bail out of the recreate function. Then we
> will to try the very same operation again, thus we don't make any
> progress.
>
> Reviewed-by: James Smart <[email protected]>
> Reviewed-by: Ming Lei <[email protected]>
> Reviewed-by: Hannes Reinecke <[email protected]>
> Signed-off-by: Daniel Wagner <[email protected]>
> ---
> drivers/nvme/host/fc.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index b08a61ca283f..b5d9a5507de5 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -2951,14 +2951,6 @@ nvme_fc_recreate_io_queues(struct nvme_fc_ctrl *ctrl)
> if (ctrl->ctrl.queue_count == 1)
> return 0;
>
> - ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
> - if (ret)
> - goto out_free_io_queues;
> -
> - ret = nvme_fc_connect_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
> - if (ret)
> - goto out_delete_hw_queues;
> -
> if (prior_ioq_cnt != nr_io_queues) {
> dev_info(ctrl->ctrl.device,
> "reconnect: revising io queue count from %d to %d\n",
> @@ -2968,6 +2960,14 @@ nvme_fc_recreate_io_queues(struct nvme_fc_ctrl *ctrl)
> nvme_unfreeze(&ctrl->ctrl);
> }
>
> + ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
> + if (ret)
> + goto out_free_io_queues;
> +
> + ret = nvme_fc_connect_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
> + if (ret)
> + goto out_delete_hw_queues;
> +
> return 0;
>
> out_delete_hw_queues:
> --
> 2.29.2
>

Reviewed-by: Himanshu Madhani <[email protected]>

--
Himanshu Madhani Oracle Linux Engineering

2021-09-14 13:57:43

by Himanshu Madhani

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] nvme-fc: Remove freeze/unfreeze around update_nr_hw_queues



> On Sep 14, 2021, at 4:20 AM, Daniel Wagner <[email protected]> wrote:
>
> From: James Smart <[email protected]>
>
> Remove the freeze/unfreeze around changes to the number of hardware
> queues. Study and retest has indicated there are no ios that can be
> active at this point so there is nothing to freeze.
>
> nvme-fc is draining the queues in the shutdown and error recovery path
> in __nvme_fc_abort_outstanding_ios.
>
> This patch primarily reverts 88e837ed0f1f "nvme-fc: wait for queues to
> freeze before calling update_hr_hw_queues". It's not an exact revert as
> it leaves the adjusting of hw queues only if the count changes.
>
> Signed-off-by: James Smart <[email protected]>
> [dwagner: added explanation why no IO is pending]
> Signed-off-by: Daniel Wagner <[email protected]>
> ---
> drivers/nvme/host/fc.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index 6ebe68396712..aa14ad963d91 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -2957,9 +2957,7 @@ nvme_fc_recreate_io_queues(struct nvme_fc_ctrl *ctrl)
> dev_info(ctrl->ctrl.device,
> "reconnect: revising io queue count from %d to %d\n",
> prior_ioq_cnt, nr_io_queues);
> - nvme_wait_freeze(&ctrl->ctrl);
> blk_mq_update_nr_hw_queues(&ctrl->tag_set, nr_io_queues);
> - nvme_unfreeze(&ctrl->ctrl);
> }
>
> ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
> --
> 2.29.2
>

Looks Good.

Reviewed-by: Himanshu Madhani <[email protected]>

--
Himanshu Madhani Oracle Linux Engineering

2021-09-15 00:14:18

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] nvme-fc: Remove freeze/unfreeze around update_nr_hw_queues

On Tue, Sep 14, 2021 at 11:20:08AM +0200, Daniel Wagner wrote:
> From: James Smart <[email protected]>
>
> Remove the freeze/unfreeze around changes to the number of hardware
> queues. Study and retest has indicated there are no ios that can be
> active at this point so there is nothing to freeze.
>
> nvme-fc is draining the queues in the shutdown and error recovery path
> in __nvme_fc_abort_outstanding_ios.
>
> This patch primarily reverts 88e837ed0f1f "nvme-fc: wait for queues to
> freeze before calling update_hr_hw_queues". It's not an exact revert as
> it leaves the adjusting of hw queues only if the count changes.
>
> Signed-off-by: James Smart <[email protected]>
> [dwagner: added explanation why no IO is pending]
> Signed-off-by: Daniel Wagner <[email protected]>
> ---
> drivers/nvme/host/fc.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index 6ebe68396712..aa14ad963d91 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -2957,9 +2957,7 @@ nvme_fc_recreate_io_queues(struct nvme_fc_ctrl *ctrl)
> dev_info(ctrl->ctrl.device,
> "reconnect: revising io queue count from %d to %d\n",
> prior_ioq_cnt, nr_io_queues);
> - nvme_wait_freeze(&ctrl->ctrl);
> blk_mq_update_nr_hw_queues(&ctrl->tag_set, nr_io_queues);
> - nvme_unfreeze(&ctrl->ctrl);
> }

Both connect queue and ns queues are added to this tag_set, so blk_mq_update_nr_hw_queues()
will freeze them all before updating nr_hw_queues.

Reviewed-by: Ming Lei <[email protected]>

--
Ming