2024-03-08 01:54:45

by brookxu.cn

[permalink] [raw]
Subject: [PATCH v3] nvme: fix reconnection fail due to reserved tag allocation

From: Chunguang Xu <[email protected]>

We found a issue on production environment while using NVMe over RDMA,
admin_q reconnect failed forever while remote target and network is ok.
After dig into it, we found it may caused by a ABBA deadlock due to tag
allocation. In my case, the tag was hold by a keep alive request
waiting inside admin_q, as we quiesced admin_q while reset ctrl, so the
request maked as idle and will not process before reset success. As
fabric_q shares tagset with admin_q, while reconnect remote target, we
need a tag for connect command, but the only one reserved tag was held
by keep alive command which waiting inside admin_q. As a result, we
failed to reconnect admin_q forever. In order to fix this issue, I
think we should keep two reserved tags for admin queue.

Fixes: ed01fee283a0 ("nvme-fabrics: only reserve a single tag")
Signed-off-by: Chunguang Xu <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
---
v3: rearrange commit log, no code change
v2: keep two reserved tags for admin_q instead of drop keep alive request

drivers/nvme/host/core.c | 4 ++--
drivers/nvme/host/fabrics.h | 10 ++++------
2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0a96362912ce..3d394a075d20 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4359,7 +4359,7 @@ int nvme_alloc_admin_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
set->ops = ops;
set->queue_depth = NVME_AQ_MQ_TAG_DEPTH;
if (ctrl->ops->flags & NVME_F_FABRICS)
- set->reserved_tags = NVMF_RESERVED_TAGS;
+ set->reserved_tags = NVMF_ADMIN_RESERVED_TAGS;
set->numa_node = ctrl->numa_node;
set->flags = BLK_MQ_F_NO_SCHED;
if (ctrl->ops->flags & NVME_F_BLOCKING)
@@ -4428,7 +4428,7 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
if (ctrl->quirks & NVME_QUIRK_SHARED_TAGS)
set->reserved_tags = NVME_AQ_DEPTH;
else if (ctrl->ops->flags & NVME_F_FABRICS)
- set->reserved_tags = NVMF_RESERVED_TAGS;
+ set->reserved_tags = NVMF_IO_RESERVED_TAGS;
set->numa_node = ctrl->numa_node;
set->flags = BLK_MQ_F_SHOULD_MERGE;
if (ctrl->ops->flags & NVME_F_BLOCKING)
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index 06cc54851b1b..a4def76a182d 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -18,12 +18,10 @@
/* default is -1: the fail fast mechanism is disabled */
#define NVMF_DEF_FAIL_FAST_TMO -1

-/*
- * Reserved one command for internal usage. This command is used for sending
- * the connect command, as well as for the keep alive command on the admin
- * queue once live.
- */
-#define NVMF_RESERVED_TAGS 1
+/* Reserved for connect */
+#define NVMF_IO_RESERVED_TAGS 1
+/* Reserved for connect and keep alive */
+#define NVMF_ADMIN_RESERVED_TAGS 2

/*
* Define a host as seen by the target. We allocate one at boot, but also
--
2.25.1



2024-03-08 14:43:54

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3] nvme: fix reconnection fail due to reserved tag allocation

The fix looks good:

Reviewed-by: Christoph Hellwig <[email protected]>

But given that's we consolidate to a single place each for setting
up the tagsets for admin and I/O queues, what about just killing
the symbolic name and moving the assignments and comments directly
into the only users?


2024-03-08 14:44:29

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v3] nvme: fix reconnection fail due to reserved tag allocation



On 08/03/2024 16:42, Christoph Hellwig wrote:
> The fix looks good:
>
> Reviewed-by: Christoph Hellwig <[email protected]>
>
> But given that's we consolidate to a single place each for setting
> up the tagsets for admin and I/O queues, what about just killing
> the symbolic name and moving the assignments and comments directly
> into the only users?
>

That works too

2024-03-08 16:30:34

by brookxu.cn

[permalink] [raw]
Subject: Re: [PATCH v3] nvme: fix reconnection fail due to reserved tag allocation

Christoph Hellwig <[email protected]> 于2024年3月8日周五 22:42写道:
>
> The fix looks good:
>
> Reviewed-by: Christoph Hellwig <[email protected]>
>
> But given that's we consolidate to a single place each for setting
> up the tagsets for admin and I/O queues, what about just killing
> the symbolic name and moving the assignments and comments directly
> into the only users?

This works now, but I donot know whether
nvme_alloc_admin_tag_set()/nvme_alloc_io_tag_set()
will be suitable for all driver in future, such as driver for apple
device not use these two funcs
to init tagset (anyway it not use these two macros too), so maybe new
driver would use these
value in other position.

>

2024-03-08 16:31:15

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3] nvme: fix reconnection fail due to reserved tag allocation

On Sat, Mar 09, 2024 at 12:29:27AM +0800, 许春光 wrote:
> This works now, but I donot know whether
> nvme_alloc_admin_tag_set()/nvme_alloc_io_tag_set()
> will be suitable for all driver in future, such as driver for apple
> device not use these two funcs
> to init tagset (anyway it not use these two macros too), so maybe new
> driver would use these
> value in other position.

nvme-apply should realy be converted to use the generic helpers,
I just need some help from the maintainers. I'll ping them.

But I'm fine with just taking this bug fix as-is and clean this up
later.

2024-03-08 16:33:25

by brookxu.cn

[permalink] [raw]
Subject: Re: [PATCH v3] nvme: fix reconnection fail due to reserved tag allocation

noted, sounds great.

Christoph Hellwig <[email protected]> 于2024年3月9日周六 00:31写道:
>
> On Sat, Mar 09, 2024 at 12:29:27AM +0800, 许春光 wrote:
> > This works now, but I donot know whether
> > nvme_alloc_admin_tag_set()/nvme_alloc_io_tag_set()
> > will be suitable for all driver in future, such as driver for apple
> > device not use these two funcs
> > to init tagset (anyway it not use these two macros too), so maybe new
> > driver would use these
> > value in other position.
>
> nvme-apply should realy be converted to use the generic helpers,
> I just need some help from the maintainers. I'll ping them.
>
> But I'm fine with just taking this bug fix as-is and clean this up
> later.

2024-03-08 16:43:30

by brookxu.cn

[permalink] [raw]
Subject: Re: [PATCH v3] nvme: fix reconnection fail due to reserved tag allocation

Christoph Hellwig <[email protected]> 于2024年3月9日周六 00:31写道:
>
> On Sat, Mar 09, 2024 at 12:29:27AM +0800, 许春光 wrote:
> > This works now, but I donot know whether
> > nvme_alloc_admin_tag_set()/nvme_alloc_io_tag_set()
> > will be suitable for all driver in future, such as driver for apple
> > device not use these two funcs
> > to init tagset (anyway it not use these two macros too), so maybe new
> > driver would use these
> > value in other position.
>
> nvme-apply should realy be converted to use the generic helpers,
> I just need some help from the maintainers. I'll ping them.
noted, that sounds great.

>
> But I'm fine with just taking this bug fix as-is and clean this up
> later.

Sorry for delay to reply, I found the patch have applied just about 10
minutes ago.
According what you plan to do, I think as-is maybe fine, But anyway if
need, I will
send another patch to cleanup, thanks.

2024-03-08 16:48:55

by Keith Busch

[permalink] [raw]
Subject: Re: [PATCH v3] nvme: fix reconnection fail due to reserved tag allocation

On Sat, Mar 09, 2024 at 12:43:12AM +0800, 许春光 wrote:
> Sorry for delay to reply, I found the patch have applied just about 10
> minutes ago.
> According what you plan to do, I think as-is maybe fine, But anyway if
> need, I will
> send another patch to cleanup, thanks.

The next pull request will be late next week, so I'll back out the
commit if you want to submit something else before then. Or you can just
submit an incremental improvement against the current tree, and that's
also fine.

2024-03-08 16:55:10

by brookxu.cn

[permalink] [raw]
Subject: Re: [PATCH v3] nvme: fix reconnection fail due to reserved tag allocation

Keith Busch <[email protected]> 于2024年3月9日周六 00:48写道:
>
> On Sat, Mar 09, 2024 at 12:43:12AM +0800, 许春光 wrote:
> > Sorry for delay to reply, I found the patch have applied just about 10
> > minutes ago.
> > According what you plan to do, I think as-is maybe fine, But anyway if
> > need, I will
> > send another patch to cleanup, thanks.
>
> The next pull request will be late next week, so I'll back out the
> commit if you want to submit something else before then. Or you can just
> submit an incremental improvement against the current tree, and that's
> also fine.

Got, I will send V4 according to the suggestion of Christoph, thanks.