2024-04-09 09:35:28

by Daniel Wagner

Subject: [PATCH v5 0/6] nvme-fabrics: short-circuit connect retries

The first patch returns only kernel error codes now and avoids overwriting error
codes later. The newly introduced helper for deciding if a reconnect should be
attempted is the only place where we have the logic (and documentation).

On the target side I've separated the nvme status from the dhchap status handling
which made it a bit clearer. I was tempted to refactor the code in
nvmet_execute_auth_send to avoid hitting the 80 chars limit but didn't come up
with something nice yet. So let's keep this change to a minimum before any
refactoring attempts.

I've tested with blktests and also on real hardware for nvme-fc.
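
To make the flow concrete, the transports end up calling the helper like
this (condensed from the tcp hunks in patch 3; rdma and fc follow roughly
the same shape):

	ret = nvme_tcp_setup_ctrl(ctrl, false);
	if (ret)
		goto requeue;
	...
requeue:
	/*
	 * nvme_tcp_reconnect_or_remove() feeds the status into
	 * nvme_ctrl_reconnect() to decide between scheduling another
	 * attempt and deleting the controller.
	 */
	nvme_tcp_reconnect_or_remove(ctrl, ret);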

changes:
v5:
- nvme: do not mix kernel error code with nvme status
- nvmet: separate nvme status from dhchap status
- https://lore.kernel.org/all/[email protected]/

v4:
- rebased
- added 'nvme: fixes for authentication errors' series
https://lore.kernel.org/linux-nvme/[email protected]/

v3:
- added my SOB tag
- fixed indentation
- https://lore.kernel.org/linux-nvme/[email protected]/

v2:
- refresh/rebase on current head
- extended blktests (nvme/045) to cover this case
(see separate post)
- https://lore.kernel.org/linux-nvme/[email protected]/

v1:
- initial version
- https://lore.kernel.org/linux-nvme/[email protected]/


Daniel Wagner (1):
nvme-fc: use nvme_ctrl_reconnect to decide reconnect attempts

Hannes Reinecke (5):
nvme: authentication error are always non-retryable
nvmet: lock config semaphore when accessing DH-HMAC-CHAP key
nvme-tcp: short-circuit reconnect retries
nvme-rdma: short-circuit reconnect retries
nvmet: return DHCHAP status codes from nvmet_setup_auth()

drivers/nvme/host/core.c | 6 ++--
drivers/nvme/host/fabrics.c | 25 ++++++-------
drivers/nvme/host/fc.c | 4 +--
drivers/nvme/host/nvme.h | 26 +++++++++++++-
drivers/nvme/host/rdma.c | 22 ++++++++----
drivers/nvme/host/tcp.c | 23 +++++++-----
drivers/nvme/target/auth.c | 22 ++++++------
drivers/nvme/target/configfs.c | 22 +++++++++---
drivers/nvme/target/fabrics-cmd-auth.c | 49 +++++++++++++-------------
drivers/nvme/target/fabrics-cmd.c | 11 +++---
drivers/nvme/target/nvmet.h | 8 ++---
11 files changed, 134 insertions(+), 84 deletions(-)

--
2.44.0



2024-04-09 09:35:51

by Daniel Wagner

Subject: [PATCH v5 3/6] nvme-tcp: short-circuit reconnect retries

From: Hannes Reinecke <[email protected]>

Returning an nvme status from nvme_tcp_setup_ctrl() indicates that the
association was established and we have received a status from the
controller; consequently we should honour the DNR bit. If not any future
reconnect attempts will just return the same error, so we can
short-circuit the reconnect attempts and fail the connection directly.

Signed-off-by: Hannes Reinecke <[email protected]>
[dwagner: add helper to decide to reconnect]
Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/nvme/host/nvme.h | 24 ++++++++++++++++++++++++
drivers/nvme/host/tcp.c | 23 +++++++++++++++--------
2 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 9b8904a476b8..dfe103283a3d 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -701,6 +701,30 @@ static inline bool nvme_is_path_error(u16 status)
return (status & 0x700) == 0x300;
}

+/*
+ * Evaluate the status information returned by the LLDD in order to
+ * decided if a reconnect attempt should be scheduled.
+ *
+ * There are two cases where no reconnect attempt should be attempted:
+ *
+ * 1) The LLDD reports an negative status. There was an error (e.g. no
+ * memory) on the host side and thus abort the operation.
+ * Note, there are exception such as ENOTCONN which is
+ * not an internal driver error, thus we filter these errors
+ * out and retry later.
+ * 2) The DNR bit is set and the specification states no further
+ * connect attempts with the same set of paramenters should be
+ * attempted.
+ */
+static inline bool nvme_ctrl_reconnect(int status)
+{
+ if (status < 0 && status != -ENOTCONN)
+ return false;
+ else if (status > 0 && (status & NVME_SC_DNR))
+ return false;
+ return true;
+}
+
/*
* Fill in the status and result information from the CQE, and then figure out
* if blk-mq will need to use IPI magic to complete the request, and if yes do
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index fdbcdcedcee9..7e25a96e9870 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2155,9 +2155,11 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
nvme_tcp_destroy_io_queues(ctrl, remove);
}

-static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
+static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl,
+ int status)
{
enum nvme_ctrl_state state = nvme_ctrl_state(ctrl);
+ bool recon = nvme_ctrl_reconnect(status);

/* If we are resetting/deleting then do nothing */
if (state != NVME_CTRL_CONNECTING) {
@@ -2165,13 +2167,14 @@ static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
return;
}

- if (nvmf_should_reconnect(ctrl)) {
+ if (recon && nvmf_should_reconnect(ctrl)) {
dev_info(ctrl->device, "Reconnecting in %d seconds...\n",
ctrl->opts->reconnect_delay);
queue_delayed_work(nvme_wq, &to_tcp_ctrl(ctrl)->connect_work,
ctrl->opts->reconnect_delay * HZ);
} else {
- dev_info(ctrl->device, "Removing controller...\n");
+ dev_info(ctrl->device, "Removing controller (%d)...\n",
+ status);
nvme_delete_ctrl(ctrl);
}
}
@@ -2252,10 +2255,12 @@ static void nvme_tcp_reconnect_ctrl_work(struct work_struct *work)
struct nvme_tcp_ctrl *tcp_ctrl = container_of(to_delayed_work(work),
struct nvme_tcp_ctrl, connect_work);
struct nvme_ctrl *ctrl = &tcp_ctrl->ctrl;
+ int ret;

++ctrl->nr_reconnects;

- if (nvme_tcp_setup_ctrl(ctrl, false))
+ ret = nvme_tcp_setup_ctrl(ctrl, false);
+ if (ret)
goto requeue;

dev_info(ctrl->device, "Successfully reconnected (%d attempt)\n",
@@ -2268,7 +2273,7 @@ static void nvme_tcp_reconnect_ctrl_work(struct work_struct *work)
requeue:
dev_info(ctrl->device, "Failed reconnect attempt %d\n",
ctrl->nr_reconnects);
- nvme_tcp_reconnect_or_remove(ctrl);
+ nvme_tcp_reconnect_or_remove(ctrl, ret);
}

static void nvme_tcp_error_recovery_work(struct work_struct *work)
@@ -2295,7 +2300,7 @@ static void nvme_tcp_error_recovery_work(struct work_struct *work)
return;
}

- nvme_tcp_reconnect_or_remove(ctrl);
+ nvme_tcp_reconnect_or_remove(ctrl, -ENOTCONN);
}

static void nvme_tcp_teardown_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
@@ -2315,6 +2320,7 @@ static void nvme_reset_ctrl_work(struct work_struct *work)
{
struct nvme_ctrl *ctrl =
container_of(work, struct nvme_ctrl, reset_work);
+ int ret;

nvme_stop_ctrl(ctrl);
nvme_tcp_teardown_ctrl(ctrl, false);
@@ -2328,14 +2334,15 @@ static void nvme_reset_ctrl_work(struct work_struct *work)
return;
}

- if (nvme_tcp_setup_ctrl(ctrl, false))
+ ret = nvme_tcp_setup_ctrl(ctrl, false);
+ if (ret)
goto out_fail;

return;

out_fail:
++ctrl->nr_reconnects;
- nvme_tcp_reconnect_or_remove(ctrl);
+ nvme_tcp_reconnect_or_remove(ctrl, ret);
}

static void nvme_tcp_stop_ctrl(struct nvme_ctrl *ctrl)
--
2.44.0


2024-04-09 09:36:09

by Daniel Wagner

Subject: [PATCH v5 5/6] nvme-fc: use nvme_ctrl_reconnect to decide reconnect attempts

The fc transport handles the DNR bit for reconnects correctly, though it
ignores all negative error codes returned by the LLDD. Thus, use the
nvme_ctrl_reconnect helper to ensure consistent behavior across all
transports.

Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/nvme/host/fc.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index a5b29e9ad342..a58f08304459 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3301,7 +3301,7 @@ nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl, int status)
struct nvme_fc_rport *rport = ctrl->rport;
struct nvme_fc_remote_port *portptr = &rport->remoteport;
unsigned long recon_delay = ctrl->ctrl.opts->reconnect_delay * HZ;
- bool recon = true;
+ bool recon = nvme_ctrl_reconnect(status);

if (nvme_ctrl_state(&ctrl->ctrl) != NVME_CTRL_CONNECTING)
return;
@@ -3310,8 +3310,6 @@ nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl, int status)
dev_info(ctrl->ctrl.device,
"NVME-FC{%d}: reset: Reconnect attempt failed (%d)\n",
ctrl->cnum, status);
- if (status > 0 && (status & NVME_SC_DNR))
- recon = false;
} else if (time_after_eq(jiffies, rport->dev_loss_end))
recon = false;

--
2.44.0


2024-04-09 09:36:12

by Daniel Wagner

Subject: [PATCH v5 4/6] nvme-rdma: short-circuit reconnect retries

From: Hannes Reinecke <[email protected]>

Returning an nvme status from nvme_rdma_setup_ctrl() indicates that the
association was established and we have received a status from the
controller; consequently we should honour the DNR bit. If not any future
reconnect attempts will just return the same error, so we can
short-circuit the reconnect attempts and fail the connection directly.

Reviewed-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Hannes Reinecke <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/nvme/host/rdma.c | 22 +++++++++++++++-------
1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 366f0bb4ebfc..53d51df26ae1 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -982,9 +982,11 @@ static void nvme_rdma_free_ctrl(struct nvme_ctrl *nctrl)
kfree(ctrl);
}

-static void nvme_rdma_reconnect_or_remove(struct nvme_rdma_ctrl *ctrl)
+static void nvme_rdma_reconnect_or_remove(struct nvme_rdma_ctrl *ctrl,
+ int status)
{
enum nvme_ctrl_state state = nvme_ctrl_state(&ctrl->ctrl);
+ bool recon = nvme_ctrl_reconnect(status);

/* If we are resetting/deleting then do nothing */
if (state != NVME_CTRL_CONNECTING) {
@@ -992,12 +994,14 @@ static void nvme_rdma_reconnect_or_remove(struct nvme_rdma_ctrl *ctrl)
return;
}

- if (nvmf_should_reconnect(&ctrl->ctrl)) {
+ if (recon && nvmf_should_reconnect(&ctrl->ctrl)) {
dev_info(ctrl->ctrl.device, "Reconnecting in %d seconds...\n",
ctrl->ctrl.opts->reconnect_delay);
queue_delayed_work(nvme_wq, &ctrl->reconnect_work,
ctrl->ctrl.opts->reconnect_delay * HZ);
} else {
+ dev_info(ctrl->ctrl.device, "Removing controller (%d)...\n",
+ status);
nvme_delete_ctrl(&ctrl->ctrl);
}
}
@@ -1104,10 +1108,12 @@ static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
{
struct nvme_rdma_ctrl *ctrl = container_of(to_delayed_work(work),
struct nvme_rdma_ctrl, reconnect_work);
+ int ret;

++ctrl->ctrl.nr_reconnects;

- if (nvme_rdma_setup_ctrl(ctrl, false))
+ ret = nvme_rdma_setup_ctrl(ctrl, false);
+ if (ret)
goto requeue;

dev_info(ctrl->ctrl.device, "Successfully reconnected (%d attempts)\n",
@@ -1120,7 +1126,7 @@ static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
requeue:
dev_info(ctrl->ctrl.device, "Failed reconnect attempt %d\n",
ctrl->ctrl.nr_reconnects);
- nvme_rdma_reconnect_or_remove(ctrl);
+ nvme_rdma_reconnect_or_remove(ctrl, ret);
}

static void nvme_rdma_error_recovery_work(struct work_struct *work)
@@ -1145,7 +1151,7 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
return;
}

- nvme_rdma_reconnect_or_remove(ctrl);
+ nvme_rdma_reconnect_or_remove(ctrl, -ENOTCONN);
}

static void nvme_rdma_error_recovery(struct nvme_rdma_ctrl *ctrl)
@@ -2169,6 +2175,7 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
{
struct nvme_rdma_ctrl *ctrl =
container_of(work, struct nvme_rdma_ctrl, ctrl.reset_work);
+ int ret;

nvme_stop_ctrl(&ctrl->ctrl);
nvme_rdma_shutdown_ctrl(ctrl, false);
@@ -2179,14 +2186,15 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
return;
}

- if (nvme_rdma_setup_ctrl(ctrl, false))
+ ret = nvme_rdma_setup_ctrl(ctrl, false);
+ if (ret)
goto out_fail;

return;

out_fail:
++ctrl->ctrl.nr_reconnects;
- nvme_rdma_reconnect_or_remove(ctrl);
+ nvme_rdma_reconnect_or_remove(ctrl, ret);
}

static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = {
--
2.44.0


2024-04-09 09:36:20

by Daniel Wagner

Subject: [PATCH v5 6/6] nvmet: return DHCHAP status codes from nvmet_setup_auth()

From: Hannes Reinecke <[email protected]>

A failure in nvmet_setup_auth() does not mean that the NVMe
authentication command failed, so we should rather return a protocol
error with a 'failure1' response than an NVMe status.

Also update the type used for dhchap_step and dhchap_status to u8 to
avoid confusion with nvme status. Furthermore, split dhchap_status and
nvme status so we don't accidentally mix these return values.

Signed-off-by: Hannes Reinecke <[email protected]>
[dwagner: - use u8 as type for dhchap_{step|status}
- separate nvme status from dhchap_status]
Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/nvme/target/auth.c | 20 +++++------
drivers/nvme/target/fabrics-cmd-auth.c | 49 +++++++++++++-------------
drivers/nvme/target/fabrics-cmd.c | 11 +++---
drivers/nvme/target/nvmet.h | 8 ++---
4 files changed, 43 insertions(+), 45 deletions(-)

diff --git a/drivers/nvme/target/auth.c b/drivers/nvme/target/auth.c
index 9afc28f1ffac..53bf1a084469 100644
--- a/drivers/nvme/target/auth.c
+++ b/drivers/nvme/target/auth.c
@@ -126,12 +126,11 @@ int nvmet_setup_dhgroup(struct nvmet_ctrl *ctrl, u8 dhgroup_id)
return ret;
}

-int nvmet_setup_auth(struct nvmet_ctrl *ctrl)
+u8 nvmet_setup_auth(struct nvmet_ctrl *ctrl)
{
int ret = 0;
struct nvmet_host_link *p;
struct nvmet_host *host = NULL;
- const char *hash_name;

down_read(&nvmet_config_sem);
if (nvmet_is_disc_subsys(ctrl->subsys))
@@ -149,13 +148,16 @@ int nvmet_setup_auth(struct nvmet_ctrl *ctrl)
}
if (!host) {
pr_debug("host %s not found\n", ctrl->hostnqn);
- ret = -EPERM;
+ ret = NVME_AUTH_DHCHAP_FAILURE_FAILED;
goto out_unlock;
}

ret = nvmet_setup_dhgroup(ctrl, host->dhchap_dhgroup_id);
- if (ret < 0)
+ if (ret < 0) {
pr_warn("Failed to setup DH group");
+ ret = NVME_AUTH_DHCHAP_FAILURE_DHGROUP_UNUSABLE;
+ goto out_unlock;
+ }

if (!host->dhchap_secret) {
pr_debug("No authentication provided\n");
@@ -166,12 +168,6 @@ int nvmet_setup_auth(struct nvmet_ctrl *ctrl)
pr_debug("Re-use existing hash ID %d\n",
ctrl->shash_id);
} else {
- hash_name = nvme_auth_hmac_name(host->dhchap_hash_id);
- if (!hash_name) {
- pr_warn("Hash ID %d invalid\n", host->dhchap_hash_id);
- ret = -EINVAL;
- goto out_unlock;
- }
ctrl->shash_id = host->dhchap_hash_id;
}

@@ -180,7 +176,7 @@ int nvmet_setup_auth(struct nvmet_ctrl *ctrl)
ctrl->host_key = nvme_auth_extract_key(host->dhchap_secret + 10,
host->dhchap_key_hash);
if (IS_ERR(ctrl->host_key)) {
- ret = PTR_ERR(ctrl->host_key);
+ ret = NVME_AUTH_DHCHAP_FAILURE_NOT_USABLE;
ctrl->host_key = NULL;
goto out_free_hash;
}
@@ -198,7 +194,7 @@ int nvmet_setup_auth(struct nvmet_ctrl *ctrl)
ctrl->ctrl_key = nvme_auth_extract_key(host->dhchap_ctrl_secret + 10,
host->dhchap_ctrl_key_hash);
if (IS_ERR(ctrl->ctrl_key)) {
- ret = PTR_ERR(ctrl->ctrl_key);
+ ret = NVME_AUTH_DHCHAP_FAILURE_NOT_USABLE;
ctrl->ctrl_key = NULL;
goto out_free_hash;
}
diff --git a/drivers/nvme/target/fabrics-cmd-auth.c b/drivers/nvme/target/fabrics-cmd-auth.c
index eb7785be0ca7..d61b8c6ff3b2 100644
--- a/drivers/nvme/target/fabrics-cmd-auth.c
+++ b/drivers/nvme/target/fabrics-cmd-auth.c
@@ -31,7 +31,7 @@ void nvmet_auth_sq_init(struct nvmet_sq *sq)
sq->dhchap_step = NVME_AUTH_DHCHAP_MESSAGE_NEGOTIATE;
}

-static u16 nvmet_auth_negotiate(struct nvmet_req *req, void *d)
+static u8 nvmet_auth_negotiate(struct nvmet_req *req, void *d)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
struct nvmf_auth_dhchap_negotiate_data *data = d;
@@ -109,7 +109,7 @@ static u16 nvmet_auth_negotiate(struct nvmet_req *req, void *d)
return 0;
}

-static u16 nvmet_auth_reply(struct nvmet_req *req, void *d)
+static u8 nvmet_auth_reply(struct nvmet_req *req, void *d)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
struct nvmf_auth_dhchap_reply_data *data = d;
@@ -172,7 +172,7 @@ static u16 nvmet_auth_reply(struct nvmet_req *req, void *d)
return 0;
}

-static u16 nvmet_auth_failure2(void *d)
+static u8 nvmet_auth_failure2(void *d)
{
struct nvmf_auth_dhchap_failure_data *data = d;

@@ -186,6 +186,7 @@ void nvmet_execute_auth_send(struct nvmet_req *req)
void *d;
u32 tl;
u16 status = 0;
+ u8 dhchap_status;

if (req->cmd->auth_send.secp != NVME_AUTH_DHCHAP_PROTOCOL_IDENTIFIER) {
status = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
@@ -237,30 +238,32 @@ void nvmet_execute_auth_send(struct nvmet_req *req)
if (data->auth_type == NVME_AUTH_COMMON_MESSAGES) {
if (data->auth_id == NVME_AUTH_DHCHAP_MESSAGE_NEGOTIATE) {
/* Restart negotiation */
- pr_debug("%s: ctrl %d qid %d reset negotiation\n", __func__,
- ctrl->cntlid, req->sq->qid);
+ pr_debug("%s: ctrl %d qid %d reset negotiation\n",
+ __func__, ctrl->cntlid, req->sq->qid);
if (!req->sq->qid) {
- if (nvmet_setup_auth(ctrl) < 0) {
- status = NVME_SC_INTERNAL;
- pr_err("ctrl %d qid 0 failed to setup"
- "re-authentication",
+ dhchap_status = nvmet_setup_auth(ctrl);
+ if (dhchap_status) {
+ pr_err("ctrl %d qid 0 failed to setup re-authentication\n",
ctrl->cntlid);
- goto done_failure1;
+ req->sq->dhchap_status = dhchap_status;
+ req->sq->dhchap_step =
+ NVME_AUTH_DHCHAP_MESSAGE_FAILURE1;
+ goto done_kfree;
}
}
- req->sq->dhchap_step = NVME_AUTH_DHCHAP_MESSAGE_NEGOTIATE;
+ req->sq->dhchap_step =
+ NVME_AUTH_DHCHAP_MESSAGE_NEGOTIATE;
} else if (data->auth_id != req->sq->dhchap_step)
goto done_failure1;
/* Validate negotiation parameters */
- status = nvmet_auth_negotiate(req, d);
- if (status == 0)
+ dhchap_status = nvmet_auth_negotiate(req, d);
+ if (dhchap_status == 0)
req->sq->dhchap_step =
NVME_AUTH_DHCHAP_MESSAGE_CHALLENGE;
else {
req->sq->dhchap_step =
NVME_AUTH_DHCHAP_MESSAGE_FAILURE1;
- req->sq->dhchap_status = status;
- status = 0;
+ req->sq->dhchap_status = dhchap_status;
}
goto done_kfree;
}
@@ -284,15 +287,14 @@ void nvmet_execute_auth_send(struct nvmet_req *req)

switch (data->auth_id) {
case NVME_AUTH_DHCHAP_MESSAGE_REPLY:
- status = nvmet_auth_reply(req, d);
- if (status == 0)
+ dhchap_status = nvmet_auth_reply(req, d);
+ if (dhchap_status == 0)
req->sq->dhchap_step =
NVME_AUTH_DHCHAP_MESSAGE_SUCCESS1;
else {
req->sq->dhchap_step =
NVME_AUTH_DHCHAP_MESSAGE_FAILURE1;
- req->sq->dhchap_status = status;
- status = 0;
+ req->sq->dhchap_status = dhchap_status;
}
goto done_kfree;
case NVME_AUTH_DHCHAP_MESSAGE_SUCCESS2:
@@ -301,13 +303,12 @@ void nvmet_execute_auth_send(struct nvmet_req *req)
__func__, ctrl->cntlid, req->sq->qid);
goto done_kfree;
case NVME_AUTH_DHCHAP_MESSAGE_FAILURE2:
- status = nvmet_auth_failure2(d);
- if (status) {
+ dhchap_status = nvmet_auth_failure2(d);
+ if (dhchap_status) {
pr_warn("ctrl %d qid %d: authentication failed (%d)\n",
- ctrl->cntlid, req->sq->qid, status);
- req->sq->dhchap_status = status;
+ ctrl->cntlid, req->sq->qid, dhchap_status);
+ req->sq->dhchap_status = dhchap_status;
req->sq->authenticated = false;
- status = 0;
}
goto done_kfree;
default:
diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
index b23f4cf840bd..042b379cbb36 100644
--- a/drivers/nvme/target/fabrics-cmd.c
+++ b/drivers/nvme/target/fabrics-cmd.c
@@ -211,7 +211,7 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)
struct nvmf_connect_data *d;
struct nvmet_ctrl *ctrl = NULL;
u16 status;
- int ret;
+ u8 dhchap_status;

if (!nvmet_check_transfer_len(req, sizeof(struct nvmf_connect_data)))
return;
@@ -254,11 +254,12 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)

uuid_copy(&ctrl->hostid, &d->hostid);

- ret = nvmet_setup_auth(ctrl);
- if (ret < 0) {
- pr_err("Failed to setup authentication, error %d\n", ret);
+ dhchap_status = nvmet_setup_auth(ctrl);
+ if (dhchap_status) {
+ pr_err("Failed to setup authentication, dhchap status %u\n",
+ dhchap_status);
nvmet_ctrl_put(ctrl);
- if (ret == -EPERM)
+ if (dhchap_status == NVME_AUTH_DHCHAP_FAILURE_FAILED)
status = (NVME_SC_CONNECT_INVALID_HOST | NVME_SC_DNR);
else
status = NVME_SC_INTERNAL;
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 257cace53924..5d6adb5c2fc0 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -113,8 +113,8 @@ struct nvmet_sq {
bool authenticated;
struct delayed_work auth_expired_work;
u16 dhchap_tid;
- u16 dhchap_status;
- int dhchap_step;
+ u8 dhchap_status;
+ u8 dhchap_step;
u8 *dhchap_c1;
u8 *dhchap_c2;
u32 dhchap_s1;
@@ -721,7 +721,7 @@ void nvmet_execute_auth_receive(struct nvmet_req *req);
int nvmet_auth_set_key(struct nvmet_host *host, const char *secret,
bool set_ctrl);
int nvmet_auth_set_host_hash(struct nvmet_host *host, const char *hash);
-int nvmet_setup_auth(struct nvmet_ctrl *ctrl);
+u8 nvmet_setup_auth(struct nvmet_ctrl *ctrl);
void nvmet_auth_sq_init(struct nvmet_sq *sq);
void nvmet_destroy_auth(struct nvmet_ctrl *ctrl);
void nvmet_auth_sq_free(struct nvmet_sq *sq);
@@ -740,7 +740,7 @@ int nvmet_auth_ctrl_exponential(struct nvmet_req *req,
int nvmet_auth_ctrl_sesskey(struct nvmet_req *req,
u8 *buf, int buf_size);
#else
-static inline int nvmet_setup_auth(struct nvmet_ctrl *ctrl)
+static inline u8 nvmet_setup_auth(struct nvmet_ctrl *ctrl)
{
return 0;
}
--
2.44.0


2024-04-09 09:40:38

by Daniel Wagner

Subject: [PATCH v5 2/6] nvmet: lock config semaphore when accessing DH-HMAC-CHAP key

From: Hannes Reinecke <[email protected]>

When the DH-HMAC-CHAP key is accessed via configfs we need to take the
config semaphore as a reconnect might be running at the same time.

Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Hannes Reinecke <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/nvme/target/auth.c | 2 ++
drivers/nvme/target/configfs.c | 22 +++++++++++++++++-----
2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/target/auth.c b/drivers/nvme/target/auth.c
index 3ddbc3880cac..9afc28f1ffac 100644
--- a/drivers/nvme/target/auth.c
+++ b/drivers/nvme/target/auth.c
@@ -44,6 +44,7 @@ int nvmet_auth_set_key(struct nvmet_host *host, const char *secret,
dhchap_secret = kstrdup(secret, GFP_KERNEL);
if (!dhchap_secret)
return -ENOMEM;
+ down_write(&nvmet_config_sem);
if (set_ctrl) {
kfree(host->dhchap_ctrl_secret);
host->dhchap_ctrl_secret = strim(dhchap_secret);
@@ -53,6 +54,7 @@ int nvmet_auth_set_key(struct nvmet_host *host, const char *secret,
host->dhchap_secret = strim(dhchap_secret);
host->dhchap_key_hash = key_hash;
}
+ up_write(&nvmet_config_sem);
return 0;
}

diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 77a6e817b315..7c28b9c0ee57 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -1990,11 +1990,17 @@ static struct config_group nvmet_ports_group;
static ssize_t nvmet_host_dhchap_key_show(struct config_item *item,
char *page)
{
- u8 *dhchap_secret = to_host(item)->dhchap_secret;
+ u8 *dhchap_secret;
+ ssize_t ret;

+ down_read(&nvmet_config_sem);
+ dhchap_secret = to_host(item)->dhchap_secret;
if (!dhchap_secret)
- return sprintf(page, "\n");
- return sprintf(page, "%s\n", dhchap_secret);
+ ret = sprintf(page, "\n");
+ else
+ ret = sprintf(page, "%s\n", dhchap_secret);
+ up_read(&nvmet_config_sem);
+ return ret;
}

static ssize_t nvmet_host_dhchap_key_store(struct config_item *item,
@@ -2018,10 +2024,16 @@ static ssize_t nvmet_host_dhchap_ctrl_key_show(struct config_item *item,
char *page)
{
u8 *dhchap_secret = to_host(item)->dhchap_ctrl_secret;
+ ssize_t ret;

+ down_read(&nvmet_config_sem);
+ dhchap_secret = to_host(item)->dhchap_ctrl_secret;
if (!dhchap_secret)
- return sprintf(page, "\n");
- return sprintf(page, "%s\n", dhchap_secret);
+ ret = sprintf(page, "\n");
+ else
+ ret = sprintf(page, "%s\n", dhchap_secret);
+ up_read(&nvmet_config_sem);
+ return ret;
}

static ssize_t nvmet_host_dhchap_ctrl_key_store(struct config_item *item,
--
2.44.0


2024-04-09 13:59:44

by Christoph Hellwig

Subject: Re: [PATCH v5 3/6] nvme-tcp: short-circuit reconnect retries

Looks good:

Reviewed-by: Christoph Hellwig <[email protected]>

2024-04-09 14:05:37

by Christoph Hellwig

Subject: Re: [PATCH v5 4/6] nvme-rdma: short-circuit reconnect retries

On Tue, Apr 09, 2024 at 11:35:08AM +0200, Daniel Wagner wrote:
> From: Hannes Reinecke <[email protected]>
>
> Returning an nvme status from nvme_rdma_setup_ctrl() indicates that the

Shouldn't this an be an a based on my highschool english. Or does
Eeenvme count as a vowel?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <[email protected]>

2024-04-09 14:44:50

by Keith Busch

Subject: Re: [PATCH v5 4/6] nvme-rdma: short-circuit reconnect retries

On Tue, Apr 09, 2024 at 04:00:54PM +0200, Christoph Hellwig wrote:
> On Tue, Apr 09, 2024 at 11:35:08AM +0200, Daniel Wagner wrote:
> > From: Hannes Reinecke <[email protected]>
> >
> > Returning an nvme status from nvme_rdma_setup_ctrl() indicates that the
>
> Shouldn't this an be an a based on my highschool english. Or does
> Eeenvme count as a vowel?

It depends on how you hear it when you read it. If you automatically
expand the acronym, "Non-Volatile ...", then it should get the "a"
article.

If you instead try to pronounce "nvme" directly, it sounds like you're
saying "envy me", like commanding everyone to acknowledge your
awesomeness. Not sure if they had that in mind when deciding on the
name, but it's kind of amusing. Anyway, pronounce it that way, it gets
an "an". :)

2024-04-09 20:21:16

by Sagi Grimberg

Subject: Re: [PATCH v5 3/6] nvme-tcp: short-circuit reconnect retries



On 09/04/2024 12:35, Daniel Wagner wrote:
> From: Hannes Reinecke <[email protected]>
>
> Returning an nvme status from nvme_tcp_setup_ctrl() indicates that the
> association was established and we have received a status from the
> controller; consequently we should honour the DNR bit. If not any future
> reconnect attempts will just return the same error, so we can
> short-circuit the reconnect attempts and fail the connection directly.
>
> Signed-off-by: Hannes Reinecke <[email protected]>
> [dwagner: add helper to decide to reconnect]
> Signed-off-by: Daniel Wagner <[email protected]>
> ---
> drivers/nvme/host/nvme.h | 24 ++++++++++++++++++++++++
> drivers/nvme/host/tcp.c | 23 +++++++++++++++--------
> 2 files changed, 39 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 9b8904a476b8..dfe103283a3d 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -701,6 +701,30 @@ static inline bool nvme_is_path_error(u16 status)
> return (status & 0x700) == 0x300;
> }
>
> +/*
> + * Evaluate the status information returned by the LLDD in order to
> + * decided if a reconnect attempt should be scheduled.
> + *
> + * There are two cases where no reconnect attempt should be attempted:
> + *
> + * 1) The LLDD reports an negative status. There was an error (e.g. no
> + * memory) on the host side and thus abort the operation.
> + * Note, there are exception such as ENOTCONN which is
> + * not an internal driver error, thus we filter these errors
> + * out and retry later.
> + * 2) The DNR bit is set and the specification states no further
> + * connect attempts with the same set of paramenters should be
> + * attempted.
> + */
> +static inline bool nvme_ctrl_reconnect(int status)
> +{
> + if (status < 0 && status != -ENOTCONN)
> + return false;

So if the host failed to allocate a buffer it will never attempt
another reconnect? Doesn't sound right to me...

2024-04-09 20:22:02

by Sagi Grimberg

Subject: Re: [PATCH v5 4/6] nvme-rdma: short-circuit reconnect retries



On 09/04/2024 17:19, Keith Busch wrote:
> On Tue, Apr 09, 2024 at 04:00:54PM +0200, Christoph Hellwig wrote:
>> On Tue, Apr 09, 2024 at 11:35:08AM +0200, Daniel Wagner wrote:
>>> From: Hannes Reinecke <[email protected]>
>>>
>>> Returning an nvme status from nvme_rdma_setup_ctrl() indicates that the
>> Shouldn't this an be an a based on my highschool english. Or does
>> Eeenvme count as a vowel?
> It depends on how you hear it when you read it. If you automatically
> expand the acronym, "Non-Volatile ...", then it should get the "a"
> article.
>
> If you instead try to pronounce "nvme" directly, it sounds like you're
> saying "envy me", like commanding everyone to acknowledge your
> awesomeness. Not sure if they had that in mind when deciding on the
> name, but it's kind of amusing. Anyway, pronounce it that way, it gets
> an "an". :)

:)

2024-04-09 20:23:39

by Sagi Grimberg

Subject: Re: [PATCH v5 6/6] nvmet: return DHCHAP status codes from nvmet_setup_auth()



On 09/04/2024 12:35, Daniel Wagner wrote:
> From: Hannes Reinecke <[email protected]>
>
> A failure in nvmet_setup_auth() does not mean that the NVMe
> authentication command failed, so we should rather return a protocol
> error with a 'failure1' response than an NVMe status.
>
> Also update the type used for dhchap_step and dhchap_status to u8 to
> avoid confusion with nvme status. Furthermore, split dhchap_status and
> nvme status so we don't accidentally mix these return values.

What is the implications of this on the host behavior? In other
words, why is this a part of the series?

2024-04-09 20:28:16

by Sagi Grimberg

Subject: Re: [PATCH v5 4/6] nvme-rdma: short-circuit reconnect retries



On 09/04/2024 12:35, Daniel Wagner wrote:
> From: Hannes Reinecke <[email protected]>
>
> Returning an nvme status from nvme_rdma_setup_ctrl() indicates that the
> association was established and we have received a status from the
> controller; consequently we should honour
honor ?

2024-04-09 20:40:48

by Sagi Grimberg

Subject: Re: [PATCH v5 3/6] nvme-tcp: short-circuit reconnect retries



On 09/04/2024 12:35, Daniel Wagner wrote:
> From: Hannes Reinecke <[email protected]>
>
> Returning an nvme status from nvme_tcp_setup_ctrl() indicates that the
> association was established and we have received a status from the
> controller; consequently we should honour the DNR bit. If not any future
> reconnect attempts will just return the same error, so we can
> short-circuit the reconnect attempts and fail the connection directly.
>
> Signed-off-by: Hannes Reinecke <[email protected]>
> [dwagner: add helper to decide to reconnect]
> Signed-off-by: Daniel Wagner <[email protected]>
> ---
> drivers/nvme/host/nvme.h | 24 ++++++++++++++++++++++++
> drivers/nvme/host/tcp.c | 23 +++++++++++++++--------
> 2 files changed, 39 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 9b8904a476b8..dfe103283a3d 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -701,6 +701,30 @@ static inline bool nvme_is_path_error(u16 status)
> return (status & 0x700) == 0x300;
> }
>
> +/*
> + * Evaluate the status information returned by the LLDD

We usually say transport, less so LLDD.

> in order to
> + * decided if a reconnect attempt should be scheduled.
> + *
> + * There are two cases where no reconnect attempt should be attempted:
> + *
> + * 1) The LLDD reports an negative status. There was an error (e.g. no
> + * memory) on the host side and thus abort the operation.
> + * Note, there are exception such as ENOTCONN which is
> + * not an internal driver error, thus we filter these errors
> + * out and retry later.

What is this ENOTCONN here? From what I see it's just a random
escape value to bypass the status check. I think that 0 would do the
same, wouldn't it?

> + * 2) The DNR bit is set and the specification states no further
> + * connect attempts with the same set of paramenters should be
> + * attempted.
> + */
> +static inline bool nvme_ctrl_reconnect(int status)
> +{
> + if (status < 0 && status != -ENOTCONN)
> + return false;
> + else if (status > 0 && (status & NVME_SC_DNR))
> + return false;
> + return true;
> +}
> +
> /*
> * Fill in the status and result information from the CQE, and then figure out
> * if blk-mq will need to use IPI magic to complete the request, and if yes do
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index fdbcdcedcee9..7e25a96e9870 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -2155,9 +2155,11 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
> nvme_tcp_destroy_io_queues(ctrl, remove);
> }
>
> -static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
> +static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl,
> + int status)
> {
> enum nvme_ctrl_state state = nvme_ctrl_state(ctrl);
> + bool recon = nvme_ctrl_reconnect(status);
>
> /* If we are resetting/deleting then do nothing */
> if (state != NVME_CTRL_CONNECTING) {
> @@ -2165,13 +2167,14 @@ static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
> return;
> }
>
> - if (nvmf_should_reconnect(ctrl)) {
> + if (recon && nvmf_should_reconnect(ctrl)) {

It's weird to have a condition that checks the _and_ of two different
"should reconnect" conditions.

Why don't we simply pass the status to nvmf_should_reconnect()
and evaluate there?

> dev_info(ctrl->device, "Reconnecting in %d seconds...\n",
> ctrl->opts->reconnect_delay);
> queue_delayed_work(nvme_wq, &to_tcp_ctrl(ctrl)->connect_work,
> ctrl->opts->reconnect_delay * HZ);
> } else {
> - dev_info(ctrl->device, "Removing controller...\n");
> + dev_info(ctrl->device, "Removing controller (%d)...\n",
> + status);
> nvme_delete_ctrl(ctrl);
> }
> }
> @@ -2252,10 +2255,12 @@ static void nvme_tcp_reconnect_ctrl_work(struct work_struct *work)
> struct nvme_tcp_ctrl *tcp_ctrl = container_of(to_delayed_work(work),
> struct nvme_tcp_ctrl, connect_work);
> struct nvme_ctrl *ctrl = &tcp_ctrl->ctrl;
> + int ret;
>
> ++ctrl->nr_reconnects;
>
> - if (nvme_tcp_setup_ctrl(ctrl, false))
> + ret = nvme_tcp_setup_ctrl(ctrl, false);
> + if (ret)
> goto requeue;
>
> dev_info(ctrl->device, "Successfully reconnected (%d attempt)\n",
> @@ -2268,7 +2273,7 @@ static void nvme_tcp_reconnect_ctrl_work(struct work_struct *work)
> requeue:
> dev_info(ctrl->device, "Failed reconnect attempt %d\n",
> ctrl->nr_reconnects);
> - nvme_tcp_reconnect_or_remove(ctrl);
> + nvme_tcp_reconnect_or_remove(ctrl, ret);
> }
>
> static void nvme_tcp_error_recovery_work(struct work_struct *work)
> @@ -2295,7 +2300,7 @@ static void nvme_tcp_error_recovery_work(struct work_struct *work)
> return;
> }
>
> - nvme_tcp_reconnect_or_remove(ctrl);
> + nvme_tcp_reconnect_or_remove(ctrl, -ENOTCONN);
> }
>
> static void nvme_tcp_teardown_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
> @@ -2315,6 +2320,7 @@ static void nvme_reset_ctrl_work(struct work_struct *work)
> {
> struct nvme_ctrl *ctrl =
> container_of(work, struct nvme_ctrl, reset_work);
> + int ret;
>
> nvme_stop_ctrl(ctrl);
> nvme_tcp_teardown_ctrl(ctrl, false);
> @@ -2328,14 +2334,15 @@ static void nvme_reset_ctrl_work(struct work_struct *work)
> return;
> }
>
> - if (nvme_tcp_setup_ctrl(ctrl, false))
> + ret = nvme_tcp_setup_ctrl(ctrl, false);
> + if (ret)
> goto out_fail;
>
> return;
>
> out_fail:
> ++ctrl->nr_reconnects;
> - nvme_tcp_reconnect_or_remove(ctrl);
> + nvme_tcp_reconnect_or_remove(ctrl, ret);
> }
>
> static void nvme_tcp_stop_ctrl(struct nvme_ctrl *ctrl)


2024-04-10 06:02:00

by Hannes Reinecke

Subject: Re: [PATCH v5 3/6] nvme-tcp: short-circuit reconnect retries

On 4/9/24 22:20, Sagi Grimberg wrote:
>
>
> On 09/04/2024 12:35, Daniel Wagner wrote:
>> From: Hannes Reinecke <[email protected]>
>>
>> Returning an nvme status from nvme_tcp_setup_ctrl() indicates that the
>> association was established and we have received a status from the
>> controller; consequently we should honour the DNR bit. If not any future
>> reconnect attempts will just return the same error, so we can
>> short-circuit the reconnect attempts and fail the connection directly.
>>
>> Signed-off-by: Hannes Reinecke <[email protected]>
>> [dwagner: add helper to decide to reconnect]
>> Signed-off-by: Daniel Wagner <[email protected]>
>> ---
>>   drivers/nvme/host/nvme.h | 24 ++++++++++++++++++++++++
>>   drivers/nvme/host/tcp.c  | 23 +++++++++++++++--------
>>   2 files changed, 39 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
>> index 9b8904a476b8..dfe103283a3d 100644
>> --- a/drivers/nvme/host/nvme.h
>> +++ b/drivers/nvme/host/nvme.h
>> @@ -701,6 +701,30 @@ static inline bool nvme_is_path_error(u16 status)
>>       return (status & 0x700) == 0x300;
>>   }
>> +/*
>> + * Evaluate the status information returned by the LLDD in order to
>> + * decided if a reconnect attempt should be scheduled.
>> + *
>> + * There are two cases where no reconnect attempt should be attempted:
>> + *
>> + * 1) The LLDD reports an negative status. There was an error (e.g. no
>> + *    memory) on the host side and thus abort the operation.
>> + *    Note, there are exception such as ENOTCONN which is
>> + *    not an internal driver error, thus we filter these errors
>> + *    out and retry later.
>> + * 2) The DNR bit is set and the specification states no further
>> + *    connect attempts with the same set of paramenters should be
>> + *    attempted.
>> + */
>> +static inline bool nvme_ctrl_reconnect(int status)
>> +{
>> +    if (status < 0 && status != -ENOTCONN)
>> +        return false;
>
> So if the host failed to allocate a buffer it will never attempt
> another reconnect? Doesn't sound right to me...

Memory allocation errors are always tricky. If a system is under memory
pressure, yours won't be the only process exhibiting issues, so it's
questionable whether you should retry, or whether it wouldn't be safer
to just abort the operation and retry manually once the system has
recovered.
Also, we just have the interesting case where RDMA goes haywire and
reports a log buffer size of several TB. No amount of retries will
fix that.

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


2024-04-10 06:46:46

by Hannes Reinecke

Subject: Re: [PATCH v5 6/6] nvmet: return DHCHAP status codes from nvmet_setup_auth()

On 4/9/24 22:23, Sagi Grimberg wrote:
>
>
> On 09/04/2024 12:35, Daniel Wagner wrote:
>> From: Hannes Reinecke <[email protected]>
>>
>> A failure in nvmet_setup_auth() does not mean that the NVMe
>> authentication command failed, so we should rather return a protocol
>> error with a 'failure1' response than an NVMe status.
>>
>> Also update the type used for dhchap_step and dhchap_status to u8 to
>> avoid confusion with nvme status. Furthermore, split dhchap_status and
>> nvme status so we don't accidentally mix these return values.
>
> What is the implications of this on the host behavior? In other
> words, why is this a part of the series?

This issue came up while testing the series, where we found that the
target would cause a connection failure (i.e. return the NVMe CQE with an
error status code) rather than a protocol error (i.e. return the NVMe CQE
with a 'good' status and set the 'failure1' status for DH-CHAP), and with
that a termination of the DH-CHAP protocol rather than a connection reset.
So not directly related, but required to get the testcase for this
series working.
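
Condensed from the nvmet_execute_auth_send() hunk in patch 6, the
difference looks like this (illustration only; dhchap_status carries the
DH-CHAP failure reason, not an NVMe status):

	/* before: a nvmet_setup_auth() failure fails the command itself */
	if (nvmet_setup_auth(ctrl) < 0) {
		status = NVME_SC_INTERNAL;
		goto done_failure1;
	}

	/* after: the command completes successfully, but the DH-CHAP
	 * exchange moves to the FAILURE1 step carrying the DH-CHAP
	 * status code, so the host sees a protocol-level failure
	 */
	dhchap_status = nvmet_setup_auth(ctrl);
	if (dhchap_status) {
		req->sq->dhchap_status = dhchap_status;
		req->sq->dhchap_step = NVME_AUTH_DHCHAP_MESSAGE_FAILURE1;
		goto done_kfree;
	}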

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


2024-04-10 10:07:56

by Daniel Wagner

Subject: Re: [PATCH v5 3/6] nvme-tcp: short-circuit reconnect retries

On Tue, Apr 09, 2024 at 11:40:03PM +0300, Sagi Grimberg wrote:
> > +/*
> > + * Evaluate the status information returned by the LLDD
>
> We usually say transport, less so LLDD.

Updated the documentation accordingly.

> > + * 1) The LLDD reports an negative status. There was an error (e.g. no
> > + * memory) on the host side and thus abort the operation.
> > + * Note, there are exception such as ENOTCONN which is
> > + * not an internal driver error, thus we filter these errors
> > + * out and retry later.
>
> What is this ENOTCONN here? From what I see it's just a random
> escape value to bypass the status check. I think that 0 would do the
> same, wouldn't it?

ENOTCONN is issued by the error recovery handler. Sure, 0 would work as
well, so if the consensus is that the handler should do that, see below.

> > - if (nvmf_should_reconnect(ctrl)) {
> > + if (recon && nvmf_should_reconnect(ctrl)) {
>
> It's weird to have a condition that checks the _and_ of two different
> "should reconnect" conditions.
>
> Why don't we simply pass the status to nvmf_should_reconnect()
> and evaluate there?

Sure, makes sense.

> > static void nvme_tcp_error_recovery_work(struct work_struct *work)
> > @@ -2295,7 +2300,7 @@ static void nvme_tcp_error_recovery_work(struct work_struct *work)
> > return;
> > }
> > - nvme_tcp_reconnect_or_remove(ctrl);
> > + nvme_tcp_reconnect_or_remove(ctrl, -ENOTCONN);

Here is the source of ENOTCONN.
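
Roughly, the idea would be to fold the whole evaluation into the fabrics
helper, something like this (sketch only, assuming nvmf_should_reconnect()
today just bounds nr_reconnects against opts->max_reconnects):

bool nvmf_should_reconnect(struct nvme_ctrl *ctrl, int status)
{
	/*
	 * Host-side error (e.g. -ENOMEM): give up, except for -ENOTCONN
	 * which the error recovery path uses to say "no controller answer,
	 * retry later".
	 */
	if (status < 0 && status != -ENOTCONN)
		return false;

	/* The controller answered and set DNR: retrying won't help. */
	if (status > 0 && (status & NVME_SC_DNR))
		return false;

	/* Otherwise fall back to the reconnect budget. */
	return ctrl->opts->max_reconnects == -1 ||
	       ctrl->nr_reconnects < ctrl->opts->max_reconnects;
}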

2024-04-12 00:35:36

by Keith Busch

Subject: Re: [PATCH v5 0/6] nvme-fabrics: short-circuit connect retries

On Tue, Apr 09, 2024 at 11:35:04AM +0200, Daniel Wagner wrote:
> The first patch returns only kernel error codes now and avoids overwriting error
> codes later. The newly introduced helper for deciding if a reconnect should be
> attempted is the only place where we have the logic (and documentation).
>
> On the target side I've separated the nvme status from the dhchap status handling
> which made it a bit clearer. I was tempted to refactor the code in
> nvmet_execute_auth_send to avoid hitting the 80 chars limit but didn't come up
> with something nice yet. So let's keep this change to a minimum before any
> refactoring attempts.
>
> I've tested with blktests and also on real hardware for nvme-fc.

Thanks, series applied to nvme-6.9.

2024-04-12 02:50:34

by Keith Busch

Subject: Re: [PATCH v5 4/6] nvme-rdma: short-circuit reconnect retries

On Tue, Apr 09, 2024 at 11:28:04PM +0300, Sagi Grimberg wrote:
> On 09/04/2024 12:35, Daniel Wagner wrote:
> >
> > Returning an nvme status from nvme_rdma_setup_ctrl() indicates that the
> > association was established and we have received a status from the
> > controller; consequently we should honour
> honor ?

King's English vs. Freedom English? Whichever flavour or color you like,
both are fine here!

2024-04-12 07:24:21

by Daniel Wagner

Subject: Re: [PATCH v5 0/6] nvme-fabrics: short-circuit connect retries

On Thu, Apr 11, 2024 at 06:35:25PM -0600, Keith Busch wrote:
> On Tue, Apr 09, 2024 at 11:35:04AM +0200, Daniel Wagner wrote:
> > The first patch returns only kernel error codes now and avoids overwriting error
> > codes later. The newly introduced helper for deciding if a reconnect should be
> > attempted is the only place where we have the logic (and documentation).
> >
> > On the target side I've separated the nvme status from the dhchap status handling
> > which made it a bit clearer. I was tempted to refactor the code in
> > nvmet_execute_auth_send to avoid hitting the 80 chars limit but didn't come up
> > with something nice yet. So let's keep this change to a minimum before any
> > refactoring attempts.
> >
> > I've tested with blktests and also on real hardware for nvme-fc.
>
> Thanks, series applied to nvme-6.9.

Thanks! I have an updated version here which addresses some of Sagi's
feedback, e.g. using only one helper function. Sorry I didn't send it
out earlier; I got a bit sidetracked in testing because of the 'funky'
results with RDMA.

Do you want me to send a completely fresh series or patches on top of this
series? I'm fine either way.

2024-04-12 15:25:02

by Keith Busch

Subject: Re: [PATCH v5 0/6] nvme-fabrics: short-circuit connect retries

On Fri, Apr 12, 2024 at 09:24:04AM +0200, Daniel Wagner wrote:
> On Thu, Apr 11, 2024 at 06:35:25PM -0600, Keith Busch wrote:
> > On Tue, Apr 09, 2024 at 11:35:04AM +0200, Daniel Wagner wrote:
> > > The first patch returns only kernel error codes now and avoids overwriting error
> > > codes later. The newly introduced helper for deciding if a reconnect should be
> > > attempted is the only place where we have the logic (and documentation).
> > >
> > > On the target side I've separated the nvme status from the dhchap status handling
> > > which made it a bit clearer. I was tempted to refactor the code in
> > > nvmet_execute_auth_send to avoid hitting the 80 chars limit but didn't come up
> > > with something nice yet. So let's keep this change to a minimum before any
> > > refactoring attempts.
> > >
> > > I've tested with blktests and also on real hardware for nvme-fc.
> >
> > Thanks, series applied to nvme-6.9.
>
> Thanks! I have an updated version here which addresses some of Sagi's
> feedback, e.g. using only one helper function. Sorry I didn't send out
> it earlier, I got a bit side tracked in testing because of the 'funky'
> results with RDMA.
>
> Do you want me to send a complete fresh series or patches on top of this
> series? I'm fine either way.

Oh sorry, I didn't notice the discussion carried on after the "review"
tag. Please send me the update, I'll force push.

2024-04-14 08:35:13

by Sagi Grimberg

Subject: Re: [PATCH v5 0/6] nvme-fabrics: short-circuit connect retries



On 12/04/2024 18:24, Keith Busch wrote:
> On Fri, Apr 12, 2024 at 09:24:04AM +0200, Daniel Wagner wrote:
>> On Thu, Apr 11, 2024 at 06:35:25PM -0600, Keith Busch wrote:
>>> On Tue, Apr 09, 2024 at 11:35:04AM +0200, Daniel Wagner wrote:
>>>> The first patch returns only kernel error codes now and avoids overwriting error
>>>> codes later. The newly introduced helper for deciding if a reconnect should be
>>>> attempted is the only place where we have the logic (and documentation).
>>>>
>>>> On the target side I've separated the nvme status from the dhchap status handling
>>>> which made it a bit clearer. I was tempted to refactor the code in
>>>> nvmet_execute_auth_send to avoid hitting the 80 chars limit but didn't come up
>>>> with something nice yet. So let's keep this change to a minimum before any
>>>> refactoring attempts.
>>>>
>>>> I've tested with blktests and also on real hardware for nvme-fc.
>>> Thanks, series applied to nvme-6.9.
>> Thanks! I have an updated version here which addresses some of Sagi's
>> feedback, e.g. using only one helper function. Sorry I didn't send out
>> it earlier, I got a bit side tracked in testing because of the 'funky'
>> results with RDMA.
>>
>> Do you want me to send a complete fresh series or patches on top of this
>> series? I'm fine either way.
> Oh sorry, I didn't notice the discussion carried on after the "review"
> tag. Please send me the update, I'll force push.

I think what we want is special handling for the very specific connect
failure where the host/ctrl credentials mismatch, because out of all the
reasons for a connection failure, this _could_ be transient.
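
As a sketch of what that special case could look like (assuming
NVME_SC_AUTH_REQUIRED is the status a connect with mismatched DH-HMAC-CHAP
credentials ends up with; illustration only, not part of the series):

static inline bool nvme_ctrl_reconnect(int status)
{
	if (status < 0 && status != -ENOTCONN)
		return false;
	if (status > 0 && (status & NVME_SC_DNR)) {
		/*
		 * A credential mismatch may be corrected on either side
		 * while we keep retrying, so treat it as transient.
		 */
		if ((status & ~NVME_SC_DNR) == NVME_SC_AUTH_REQUIRED)
			return true;
		return false;
	}
	return true;
}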