Hello
NVME_CTRL_RESETTING used to strictly cover the nvme initializing
procedure, as introduced in fd634f41 ("nvme: merge probe_work and
reset_work"), but that is no longer the case. NVME_CTRL_RESETTING is
now set before the reset_work is queued, so there can be a big gap
before the reset work handles the outstanding requests. Consequently,
when NVME_CTRL_RESETTING is set, nvme_timeout will see not only the
admin requests from the initializing procedure, but also the IO and
admin requests issued before nvme_dev_disable is invoked.
To fix this, based on Christoph's suggestion, this series splits
NVME_CTRL_RESETTING into NVME_CTRL_RESET_PREPARE and
NVME_CTRL_RESETTING. Before queuing the reset work, the state changes
to NVME_CTRL_RESET_PREPARE; after the disable work completes, it
changes to NVME_CTRL_RESETTING. We can then distinguish the different
requests and handle them separately. For more details, please refer to
the comment in the 2nd patch.
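To illustrate, the intended sequence is roughly the following (a
condensed sketch of the code in the two patches, with error handling
and unrelated details elided):

int nvme_reset_ctrl(struct nvme_ctrl *ctrl)
{
	/* mark the gap between queuing the reset work and running it */
	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESET_PREPARE))
		return -EBUSY;
	if (!queue_work(nvme_wq, &ctrl->reset_work))
		return -EBUSY;
	return 0;
}

static void nvme_reset_work(struct work_struct *work)
{
	/* ... */
	if (dev->ctrl.ctrl_config & NVME_CC_ENABLE)
		nvme_dev_disable(dev, false);	/* cancels all outstanding requests */

	/*
	 * Only now enter RESETTING: any request that expires from here
	 * on must belong to the initializing procedure itself.
	 */
	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING))
		goto out;
	/* ... nvme_pci_enable(), init queues, then NVME_CTRL_LIVE ... */
}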
V3:
- fix wrong reference in loop.c
- other misc changes
V2:
- split NVME_CTRL_RESETTING into NVME_CTRL_RESET_PREPARE and
  NVME_CTRL_RESETTING, and introduce a new patch based on this
- distinguish the requests based on the new state in nvme_timeout
- update the patch comments
Jianchao Wang (2)
0001-nvme-split-resetting-state-into-reset_prepate-and-re.patch
0002-nvme-pci-fix-the-timeout-case-when-reset-is-ongoing.patch
drivers/nvme/host/core.c | 17 ++++++++++++++--
drivers/nvme/host/fc.c | 2 ++
drivers/nvme/host/nvme.h | 1 +
drivers/nvme/host/pci.c | 51 +++++++++++++++++++++++++++++++++-------------
drivers/nvme/host/rdma.c | 8 ++++++++
drivers/nvme/target/loop.c | 5 +++++
6 files changed, 68 insertions(+), 16 deletions(-)
Thanks
Jianchao
Requests may time out while a reset is ongoing. nvme_timeout
will see not only the admin requests from the initializing
procedure, but also the IO and admin requests issued before
nvme_dev_disable is invoked. These requests should be handled
separately.
We can distinguish them through ctrl->state.
If the state is NVME_CTRL_RESET_PREPARE, handle the expired
requests as nvme_cancel_request does.
If the state is NVME_CTRL_RESETTING, the requests must come from
the initializing procedure. Handle them as before, except that we
no longer call nvme_dev_disable here: nvme_reset_work will see the
error and disable the dev itself.
Signed-off-by: Jianchao Wang <[email protected]>
---
drivers/nvme/host/pci.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index e477c35..0530432 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1212,19 +1212,26 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 	}
 	/*
-	 * Shutdown immediately if controller times out while starting. The
-	 * reset work will see the pci device disabled when it gets the forced
-	 * cancellation error. All outstanding requests are completed on
-	 * shutdown, so we return BLK_EH_HANDLED.
+	 * There could be two kinds of expired reqs when reset is ongoing.
+	 * - Outstanding IO or admin requests from previous work before
+	 *   nvme_reset_work invokes nvme_dev_disable. Handle them as
+	 *   nvme_cancel_request does.
+	 * - Outstanding admin requests from the initializing procedure.
+	 *   Set the NVME_REQ_CANCELLED flag on them, then nvme_reset_work
+	 *   will see the error, then disable the device and remove the ctrl.
 	 */
-	if (dev->ctrl.state == NVME_CTRL_RESETTING) {
-		dev_warn(dev->ctrl.device,
-			 "I/O %d QID %d timeout, disable controller\n",
-			 req->tag, nvmeq->qid);
-		nvme_dev_disable(dev, false);
+	switch (dev->ctrl.state) {
+	case NVME_CTRL_RESET_PREPARE:
+		nvme_req(req)->status = NVME_SC_ABORT_REQ;
+		return BLK_EH_HANDLED;
+	case NVME_CTRL_RESETTING:
+		WARN_ON_ONCE(nvmeq->qid);
 		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
 		return BLK_EH_HANDLED;
+	default:
+		break;
 	}
+
 	/*
 	 * Shutdown the controller immediately and schedule a reset if the
 	 * command was already aborted once before and still hasn't been
--
2.7.4
Currently, ctrl->state is changed to NVME_CTRL_RESETTING before
the reset work is queued. This is not strict enough: there can be
a big gap before the reset_work callback is invoked. In addition,
the reset_work callback does some disable work which is, strictly
speaking, not part of the reset and can lead to confusion.
This patch splits NVME_CTRL_RESETTING into NVME_CTRL_RESET_PREPARE
and NVME_CTRL_RESETTING. Before queuing the reset work, the state
changes to NVME_CTRL_RESET_PREPARE; after the disable work
completes, it changes to NVME_CTRL_RESETTING.
Suggested-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jianchao Wang <[email protected]>
---
drivers/nvme/host/core.c | 17 +++++++++++++++--
drivers/nvme/host/fc.c | 2 ++
drivers/nvme/host/nvme.h | 1 +
drivers/nvme/host/pci.c | 28 ++++++++++++++++++++++------
drivers/nvme/host/rdma.c | 8 ++++++++
drivers/nvme/target/loop.c | 5 +++++
6 files changed, 53 insertions(+), 8 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1e46e60..106a437 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -87,7 +87,7 @@ static __le32 nvme_get_log_dw10(u8 lid, size_t size)
 int nvme_reset_ctrl(struct nvme_ctrl *ctrl)
 {
-	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
+	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESET_PREPARE))
 		return -EBUSY;
 	if (!queue_work(nvme_wq, &ctrl->reset_work))
 		return -EBUSY;
@@ -243,7 +243,7 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 			break;
 		}
 		break;
-	case NVME_CTRL_RESETTING:
+	case NVME_CTRL_RESET_PREPARE:
 		switch (old_state) {
 		case NVME_CTRL_NEW:
 		case NVME_CTRL_LIVE:
@@ -253,10 +253,21 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 			break;
 		}
 		break;
+
+	case NVME_CTRL_RESETTING:
+		switch (old_state) {
+		case NVME_CTRL_RESET_PREPARE:
+			changed = true;
+			/* FALLTHRU */
+		default:
+			break;
+		}
+		break;
 	case NVME_CTRL_RECONNECTING:
 		switch (old_state) {
 		case NVME_CTRL_LIVE:
 		case NVME_CTRL_RESETTING:
+		case NVME_CTRL_RESET_PREPARE:
 			changed = true;
 			/* FALLTHRU */
 		default:
@@ -267,6 +278,7 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 		switch (old_state) {
 		case NVME_CTRL_LIVE:
 		case NVME_CTRL_RESETTING:
+		case NVME_CTRL_RESET_PREPARE:
 		case NVME_CTRL_RECONNECTING:
 			changed = true;
 			/* FALLTHRU */
@@ -2603,6 +2615,7 @@ static ssize_t nvme_sysfs_show_state(struct device *dev,
 		[NVME_CTRL_NEW]		= "new",
 		[NVME_CTRL_LIVE]	= "live",
 		[NVME_CTRL_RESETTING]	= "resetting",
+		[NVME_CTRL_RESET_PREPARE]	= "reset-prepare",
 		[NVME_CTRL_RECONNECTING]= "reconnecting",
 		[NVME_CTRL_DELETING]	= "deleting",
 		[NVME_CTRL_DEAD]	= "dead",
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 794e66e..516c1ea 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -547,6 +547,7 @@ nvme_fc_resume_controller(struct nvme_fc_ctrl *ctrl)
 		break;
 	case NVME_CTRL_RESETTING:
+	case NVME_CTRL_RESET_PREPARE:
 		/*
 		 * Controller is already in the process of terminating the
 		 * association. No need to do anything further. The reconnect
@@ -790,6 +791,7 @@ nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
 		break;
 	case NVME_CTRL_RESETTING:
+	case NVME_CTRL_RESET_PREPARE:
 		/*
 		 * Controller is already in the process of terminating the
 		 * association. No need to do anything further. The reconnect
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index ea1aa52..1f095d7 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -120,6 +120,7 @@ enum nvme_ctrl_state {
 	NVME_CTRL_NEW,
 	NVME_CTRL_LIVE,
 	NVME_CTRL_RESETTING,
+	NVME_CTRL_RESET_PREPARE,
 	NVME_CTRL_RECONNECTING,
 	NVME_CTRL_DELETING,
 	NVME_CTRL_DEAD,
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index f5800c3..e477c35 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1141,8 +1141,13 @@ static bool nvme_should_reset(struct nvme_dev *dev, u32 csts)
 	bool nssro = dev->subsystem && (csts & NVME_CSTS_NSSRO);
 	/* If there is a reset ongoing, we shouldn't reset again. */
-	if (dev->ctrl.state == NVME_CTRL_RESETTING)
+	switch (dev->ctrl.state) {
+	case NVME_CTRL_RESETTING:
+	case NVME_CTRL_RESET_PREPARE:
 		return false;
+	default:
+		break;
+	}
 	/* We shouldn't reset unless the controller is on fatal error state
 	 * _or_ if we lost the communication with it.
@@ -1220,7 +1225,6 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
 		return BLK_EH_HANDLED;
 	}
-
 	/*
 	 * Shutdown the controller immediately and schedule a reset if the
 	 * command was already aborted once before and still hasn't been
@@ -2180,9 +2184,16 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	if (pci_is_enabled(pdev)) {
 		u32 csts = readl(dev->bar + NVME_REG_CSTS);
-		if (dev->ctrl.state == NVME_CTRL_LIVE ||
-		    dev->ctrl.state == NVME_CTRL_RESETTING)
+		switch (dev->ctrl.state) {
+		case NVME_CTRL_LIVE:
+		case NVME_CTRL_RESETTING:
+		case NVME_CTRL_RESET_PREPARE:
 			nvme_start_freeze(&dev->ctrl);
+			break;
+		default:
+			break;
+		}
+
 		dead = !!((csts & NVME_CSTS_CFS) || !(csts & NVME_CSTS_RDY) ||
 			pdev->error_state != pci_channel_io_normal);
 	}
@@ -2292,7 +2303,7 @@ static void nvme_reset_work(struct work_struct *work)
 	bool was_suspend = !!(dev->ctrl.ctrl_config & NVME_CC_SHN_NORMAL);
 	int result = -ENODEV;
-	if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING))
+	if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESET_PREPARE))
 		goto out;
 	/*
@@ -2302,6 +2313,11 @@ static void nvme_reset_work(struct work_struct *work)
 	if (dev->ctrl.ctrl_config & NVME_CC_ENABLE)
 		nvme_dev_disable(dev, false);
+	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
+		WARN_ON_ONCE(dev->ctrl.state != NVME_CTRL_DELETING);
+		goto out;
+	}
+
 	result = nvme_pci_enable(dev);
 	if (result)
 		goto out;
@@ -2498,7 +2514,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (result)
 		goto release_pools;
-	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING);
+	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESET_PREPARE);
 	dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
 	queue_work(nvme_wq, &dev->ctrl.reset_work);
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 37af565..8ae073e 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1753,6 +1753,14 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
 	nvme_stop_ctrl(&ctrl->ctrl);
 	nvme_rdma_shutdown_ctrl(ctrl, false);
+	changed = nvme_change_ctrl_state(&ctrl->ctrl,
+			NVME_CTRL_RESET_PREPARE);
+	if (!changed) {
+		/* state change failure is ok if we're in DELETING state */
+		WARN_ON_ONCE(ctrl->ctrl.state != NVME_CTRL_DELETING);
+		return;
+	}
+
 	ret = nvme_rdma_configure_admin_queue(ctrl, false);
 	if (ret)
 		goto out_fail;
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 1e21b28..30db3b7 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -481,6 +481,11 @@ static void nvme_loop_reset_ctrl_work(struct work_struct *work)
 	nvme_stop_ctrl(&ctrl->ctrl);
 	nvme_loop_shutdown_ctrl(ctrl);
+	changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING);
+	if (!changed) {
+		WARN_ON_ONCE(ctrl->ctrl.state != NVME_CTRL_DELETING);
+		return;
+	}
 	ret = nvme_loop_configure_admin_queue(ctrl);
 	if (ret)
 		goto out_disable;
--
2.7.4
> Currently, ctrl->state is changed to NVME_CTRL_RESETTING before
> the reset work is queued. This is not strict enough: there can be
> a big gap before the reset_work callback is invoked. In addition,
> the reset_work callback does some disable work which is, strictly
> speaking, not part of the reset and can lead to confusion.
>
> This patch splits NVME_CTRL_RESETTING into NVME_CTRL_RESET_PREPARE
> and NVME_CTRL_RESETTING. Before queuing the reset work, the state
> changes to NVME_CTRL_RESET_PREPARE; after the disable work
> completes, it changes to NVME_CTRL_RESETTING.
What guarantees that nvme_timeout will do the right thing? If it
is relying on the fact that nvme-pci sets the state to RESETTING
only _after_ quiescing the request queues, it needs to be documented,
as it's a real delicate dependency.
>
> [snip]
>
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index 794e66e..516c1ea 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -547,6 +547,7 @@ nvme_fc_resume_controller(struct nvme_fc_ctrl *ctrl)
> break;
>
> case NVME_CTRL_RESETTING:
> + case NVME_CTRL_RESET_PREPARE:
> /*
> * Controller is already in the process of terminating the
> * association. No need to do anything further. The reconnect
> @@ -790,6 +791,7 @@ nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
> break;
>
> case NVME_CTRL_RESETTING:
> + case NVME_CTRL_RESET_PREPARE:
> /*
> * Controller is already in the process of terminating the
> * association. No need to do anything further. The reconnect
James, can you take a look at this?
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index f5800c3..e477c35 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1141,8 +1141,13 @@ static bool nvme_should_reset(struct nvme_dev *dev, u32 csts)
> bool nssro = dev->subsystem && (csts & NVME_CSTS_NSSRO);
>
> /* If there is a reset ongoing, we shouldn't reset again. */
> - if (dev->ctrl.state == NVME_CTRL_RESETTING)
> + switch (dev->ctrl.state) {
> + case NVME_CTRL_RESETTING:
> + case NVME_CTRL_RESET_PREPARE:
> return false;
> + default:
> + break;
> + }
Isn't the indentation off? Also in other places...
> @@ -2292,7 +2303,7 @@ static void nvme_reset_work(struct work_struct *work)
> bool was_suspend = !!(dev->ctrl.ctrl_config & NVME_CC_SHN_NORMAL);
> int result = -ENODEV;
>
> - if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING))
> + if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESET_PREPARE))
> goto out;
>
> /*
> @@ -2302,6 +2313,11 @@ static void nvme_reset_work(struct work_struct *work)
> if (dev->ctrl.ctrl_config & NVME_CC_ENABLE)
> nvme_dev_disable(dev, false);
>
> + if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
> + WARN_ON_ONCE(dev->ctrl.state != NVME_CTRL_DELETING);
> + goto out;
> + }
> +
This needs documentation on _why_ here.
> - nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING);
> + nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESET_PREPARE);
> dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
Can you rebase on top of my commit:
4caff8fc19f1 nvme-pci: don't open-code nvme_reset_ctrl
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 37af565..8ae073e 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -1753,6 +1753,14 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
> nvme_stop_ctrl(&ctrl->ctrl);
> nvme_rdma_shutdown_ctrl(ctrl, false);
>
> + changed = nvme_change_ctrl_state(&ctrl->ctrl,
> + NVME_CTRL_RESET_PREPARE);
> + if (!changed) {
> + /* state change failure is ok if we're in DELETING state */
> + WARN_ON_ONCE(ctrl->ctrl.state != NVME_CTRL_DELETING);
> + return;
> + }
> +
setting RESET_PREPARE here??
Also, the error recovery code is mutually excluded from reset_work
by trying to set the same state which is protected by the ctrl state
machine, so a similar change is needed there.
So in order not to break rdma ctrl reset (again) please look at:
bcdacf7fc3d8 nvme-rdma: fix concurrent reset and reconnect
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index e477c35..0530432 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1212,19 +1212,26 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
> }
>
> /*
> - * Shutdown immediately if controller times out while starting. The
> - * reset work will see the pci device disabled when it gets the forced
> - * cancellation error. All outstanding requests are completed on
> - * shutdown, so we return BLK_EH_HANDLED.
> + * There could be two kinds of expired reqs when reset is ongoing.
> + * - Outstanding IO or admin requests from previous work before the
> + * nvme_reset_work invokes nvme_dev_disable. Handle them as the
> + * nvme_cancel_request.
> + * - Outstanding admin requests from the initializing procedure.
> + * Set NVME_REQ_CANCELLED flag on them, then nvme_reset_work will
> + * see the error, then disable the device and remove the ctrl.
> */
> - if (dev->ctrl.state == NVME_CTRL_RESETTING) {
> - dev_warn(dev->ctrl.device,
> - "I/O %d QID %d timeout, disable controller\n",
> - req->tag, nvmeq->qid);
> - nvme_dev_disable(dev, false);
> + switch (dev->ctrl.state) {
> + case NVME_CTRL_RESET_PREPARE:
> + nvme_req(req)->status = NVME_SC_ABORT_REQ;
> + return BLK_EH_HANDLED;
> + case NVME_CTRL_RESETTING:
> + WARN_ON_ONCE(nvmeq->qid);
> nvme_req(req)->flags |= NVME_REQ_CANCELLED;
> return BLK_EH_HANDLED;
> + default:
> + break;
> }
> +
Indentation is off...
Hi Keith
Thanks for your kind review and response.
On 01/14/2018 05:48 PM, Sagi Grimberg wrote:
>
>> Currently, ctrl->state is changed to NVME_CTRL_RESETTING before
>> the reset work is queued. This is not strict enough: there can be
>> a big gap before the reset_work callback is invoked. In addition,
>> the reset_work callback does some disable work which is, strictly
>> speaking, not part of the reset and can lead to confusion.
>>
>> This patch splits NVME_CTRL_RESETTING into NVME_CTRL_RESET_PREPARE
>> and NVME_CTRL_RESETTING. Before queuing the reset work, the state
>> changes to NVME_CTRL_RESET_PREPARE; after the disable work
>> completes, it changes to NVME_CTRL_RESETTING.
>
> What guarantees that nvme_timeout will do the right thing? If it
> is relying on the fact that nvme-pci sets the state to RESETTING
> only _after_ quiescing the request queues, it needs to be documented,
> as it's a real delicate dependency.
nvme_dev_disable will iterate over all the outstanding requests with
nvme_cancel_request, which requeues the IO requests and completes the
admin requests. This is the boundary. The state transits from
RESET_PREPARE to RESETTING after nvme_dev_disable.
So when the state is RESETTING, all the previous outstanding requests
have already been handled by nvme_cancel_request; the only requests
that can expire at this moment are those from the adminq of the
initializing procedure.
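To make the boundary concrete, nvme_dev_disable ends with roughly the
following (a condensed sketch of the existing nvme-pci code, details
elided):

static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
{
	/* ... quiesce and delete/suspend the queues ... */

	/* cancel every outstanding IO and admin request */
	blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl);
	blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);
}

Only after this returns does nvme_reset_work move the state to
RESETTING, so a timeout seen in RESETTING can only belong to an admin
command issued by the initializing procedure.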
>
>> [snip]
>
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index f5800c3..e477c35 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -1141,8 +1141,13 @@ static bool nvme_should_reset(struct nvme_dev *dev, u32 csts)
>> bool nssro = dev->subsystem && (csts & NVME_CSTS_NSSRO);
>> /* If there is a reset ongoing, we shouldn't reset again. */
>> - if (dev->ctrl.state == NVME_CTRL_RESETTING)
>> + switch (dev->ctrl.state) {
>> + case NVME_CTRL_RESETTING:
>> + case NVME_CTRL_RESET_PREPARE:
>> return false;
>> + default:
>> + break;
>> + }
>
> Isn't the indentation off? Also in other places...
I see that all the "switch" statements in core.c and pci.c use this style:
switch(x) {
case xxx:
...
case xxx:
...
}
So I followed that. :)
>
>> @@ -2292,7 +2303,7 @@ static void nvme_reset_work(struct work_struct *work)
>> bool was_suspend = !!(dev->ctrl.ctrl_config & NVME_CC_SHN_NORMAL);
>> int result = -ENODEV;
>> - if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING))
>> + if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESET_PREPARE))
>> goto out;
>> /*
>> @@ -2302,6 +2313,11 @@ static void nvme_reset_work(struct work_struct *work)
>> if (dev->ctrl.ctrl_config & NVME_CC_ENABLE)
>> nvme_dev_disable(dev, false);
>> + if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
>> + WARN_ON_ONCE(dev->ctrl.state != NVME_CTRL_DELETING);
>> + goto out;
>> + }
>> +
>
> This needs documentation on _why_ here.
OK, sure.
>
>> - nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING);
>> + nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESET_PREPARE);
>> dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
>
> Can you rebase on top of my commit:
> 4caff8fc19f1 nvme-pci: don't open-code nvme_reset_ctrl
OK, I will.
>
>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>> index 37af565..8ae073e 100644
>> --- a/drivers/nvme/host/rdma.c
>> +++ b/drivers/nvme/host/rdma.c
>> @@ -1753,6 +1753,14 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
>> nvme_stop_ctrl(&ctrl->ctrl);
>> nvme_rdma_shutdown_ctrl(ctrl, false);
>> + changed = nvme_change_ctrl_state(&ctrl->ctrl,
>> + NVME_CTRL_RESET_PREPARE);
>> + if (!changed) {
>> + /* state change failure is ok if we're in DELETING state */
>> + WARN_ON_ONCE(ctrl->ctrl.state != NVME_CTRL_DELETING);
>> + return;
>> + }
>> +
>
> setting RESET_PREPARE here??
This is my fault. I checked the code time after time, but still made a stupid mistake here.
>
> Also, the error recovery code is mutually excluded from reset_work
> by trying to set the same state which is protected by the ctrl state
> machine, so a similar change is needed there.
>
> So in order not to break rdma ctrl reset (again) please look at:
> bcdacf7fc3d8 nvme-rdma: fix concurrent reset and reconnect
>
Ok, I will look into it.
Thanks for your guidance. That's really appreciated.
Thanks
Jianchao
On Mon, Jan 15, 2018 at 10:02:04AM +0800, jianchao.wang wrote:
> Hi Keith
>
> Thanks for your kind review and response.
I agree with Sagi's feedback, but I can't take credit for it. :)
On 01/15/2018 10:11 AM, Keith Busch wrote:
> On Mon, Jan 15, 2018 at 10:02:04AM +0800, jianchao.wang wrote:
>> Hi Keith
>>
>> Thanks for your kind review and response.
>
> I agree with Sagi's feedback, but I can't take credit for it. :)
>
ahh... but still, thank you.
By the way, welcome to China, and enjoy your travels here!
On 1/14/2018 11:48 AM, Sagi Grimberg wrote:
>
> [snip]
>
>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>> @@ -1753,6 +1753,14 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
>>  	nvme_stop_ctrl(&ctrl->ctrl);
>>  	nvme_rdma_shutdown_ctrl(ctrl, false);
>>
>> +	changed = nvme_change_ctrl_state(&ctrl->ctrl,
>> +			NVME_CTRL_RESET_PREPARE);
>> +	if (!changed) {
>> +		/* state change failure is ok if we're in DELETING state */
>> +		WARN_ON_ONCE(ctrl->ctrl.state != NVME_CTRL_DELETING);
>> +		return;
>> +	}
>> +
>
> setting RESET_PREPARE here??
>
> Also, the error recovery code is mutually excluded from reset_work
> by trying to set the same state which is protected by the ctrl state
> machine, so a similar change is needed there.
Sagi,
Do you mean to add this (if so, I agree):
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index d06641b..44ef52a 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -957,6 +957,12 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
 	struct nvme_rdma_ctrl *ctrl = container_of(work,
 			struct nvme_rdma_ctrl, err_work);
 
+	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING)) {
+		/* state change failure should never happen */
+		WARN_ON_ONCE(1);
+		return;
+	}
+
 	nvme_stop_keep_alive(&ctrl->ctrl);
 
 	if (ctrl->ctrl.queue_count > 1) {
@@ -989,7 +995,7 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
 
 static void nvme_rdma_error_recovery(struct nvme_rdma_ctrl *ctrl)
 {
-	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING))
+	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESET_PREPARE))
 		return;
 
 	queue_work(nvme_wq, &ctrl->err_work);
@@ -1760,6 +1766,12 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
 	int ret;
 	bool changed;
 
+	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING)) {
+		/* state change failure should never happen */
+		WARN_ON_ONCE(1);
+		return;
+	}
+
 	nvme_stop_ctrl(&ctrl->ctrl);
 	nvme_rdma_shutdown_ctrl(ctrl, false);
>
> So in order not to break rdma ctrl reset (again) please look at:
> bcdacf7fc3d8 nvme-rdma: fix concurrent reset and reconnect
>
On 1/15/2018 3:28 PM, Max Gurtovoy wrote:
>
> [snip]
>
>>>      case NVME_CTRL_RECONNECTING:
>>>          switch (old_state) {
>>>          case NVME_CTRL_LIVE:
>>>          case NVME_CTRL_RESETTING:
>>> +        case NVME_CTRL_RESET_PREPARE:
I forgot to add that we shouldn't move from RESET_PREPARE to
RECONNECTING (with my suggestion to rdma.c).
We also need to consider adding another check in nvmf_check_init_req
(drivers/nvme/host/fabrics.h) for the new state.
> [snip]
Hi Max
Thanks for your kind response and comments.
On 01/15/2018 09:28 PM, Max Gurtovoy wrote:
>>>
>>
>> setting RESET_PREPARE here??
>>
>> Also, the error recovery code is mutually excluded from reset_work
>> by trying to set the same state which is protected by the ctrl state
>> machine, so a similar change is needed there.
>
> Sagi,
> Do you mean to add this (if so, I agree):
>
>
> [snip]
>
> @@ -1760,6 +1766,12 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
>  	int ret;
>  	bool changed;
>
> +	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING)) {
> +		/* state change failure should never happen */
> +		WARN_ON_ONCE(1);
> +		return;
> +	}
> +
>  	nvme_stop_ctrl(&ctrl->ctrl);
>  	nvme_rdma_shutdown_ctrl(ctrl, false);
The RESET_PREPARE state should cover not only the scheduling gap of
reset_work, but also the device disable procedure. All the previous
state and work are cleared and canceled before the new work starts,
so there is a very clear boundary there. How about changing the state
only after the shutdown:
	nvme_stop_ctrl(&ctrl->ctrl);
	nvme_rdma_shutdown_ctrl(ctrl, false);
+	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING)) {
+		/* state change failure should never happen */
+		WARN_ON_ONCE(1);
+		return;
+	}
What do you think? :)
Thanks
Jianchao
Hi Max
Thanks for your kind comments.
On 01/15/2018 09:36 PM, Max Gurtovoy wrote:
>>>>      case NVME_CTRL_RECONNECTING:
>>>>          switch (old_state) {
>>>>          case NVME_CTRL_LIVE:
>>>>          case NVME_CTRL_RESETTING:
>>>> +        case NVME_CTRL_RESET_PREPARE:
>
> I forgot to add that we shouldn't move from RESET_PREPARE to RECONNECTING (with my suggestion to rdma.c).
> We also need to consider adding another check in nvmf_check_init_req (drivers/nvme/host/fabrics.h) for the new state.
After Sagi's "nvme-rdma: fix concurrent reset and reconnect", the rdma ctrl state is changed to RECONNECTING
after the clearing and shutdown work and before the initializing procedure, on both the reset work path and
the error recovery path. The fc reset work does the same thing.
So if we define the ranges such that RESET_PREPARE covers the scheduling gap plus the disable and clear work,
and RESETTING covers the initializing procedure, then RECONNECTING is very similar to RESETTING.
Maybe we could do it like this:
In nvme fc/rdma
- set state to RESET_PREPARE, queue reset_work/err_work
- clear/shutdown works, set state to RECONNECTING
- initialization, set state to LIVE
In nvme pci
- set state to RESET_PREPARE, queue reset_work
- clear/shutdown works, set state to RESETTING
- initialization, set state to LIVE
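For the fc/rdma side, the reset path would then look roughly like this
(a hypothetical sketch only, to show where the state changes would sit,
using rdma as the example):

static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
{
	/* entered with ctrl->state == NVME_CTRL_RESET_PREPARE */
	nvme_stop_ctrl(&ctrl->ctrl);
	nvme_rdma_shutdown_ctrl(ctrl, false);

	/* clear/shutdown work done; hand over to the re-init phase */
	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RECONNECTING)) {
		/* state change failure is ok if we're in DELETING state */
		WARN_ON_ONCE(ctrl->ctrl.state != NVME_CTRL_DELETING);
		return;
	}

	/* ... configure admin/io queues, then set NVME_CTRL_LIVE ... */
}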
Thanks
Jianchao
On 01/16/2018 01:57 PM, jianchao.wang wrote:
> Hi Max
>
> Thanks for your kind comments.
>
> On 01/15/2018 09:36 PM, Max Gurtovoy wrote:
>>>>>      case NVME_CTRL_RECONNECTING:
>>>>>          switch (old_state) {
>>>>>          case NVME_CTRL_LIVE:
>>>>>          case NVME_CTRL_RESETTING:
>>>>> +        case NVME_CTRL_RESET_PREPARE:
>>
>> I forgot to add that we shouldn't move from RESET_PREPARE to RECONNECTING (with my suggestion to rdma.c).
>> We also need to consider adding another check in nvmf_check_init_req (drivers/nvme/host/fabrics.h) for the new state.
>
> After Sagi's "nvme-rdma: fix concurrent reset and reconnect", the rdma ctrl state is changed to RECONNECTING
> after the clearing and shutdown work and before the initializing procedure, on both the reset work path and
> the error recovery path. The fc reset work does the same thing.
> So if we define the ranges such that RESET_PREPARE covers the scheduling gap plus the disable and clear work,
> and RESETTING covers the initializing procedure, then RECONNECTING is very similar to RESETTING.
>
> Maybe we could do it like this:
> In nvme fc/rdma
> - set state to RESET_PREPARE, queue reset_work/err_work
> - clear/shutdown works, set state to RECONNECTING
> - initialization, set state to LIVE
>
> In nvme pci
> - set state to RESET_PREPARE, queue reset_work
> - clear/shutdown works, set state to RESETTING
> - initialization, set state to LIVE
Hi Christoph, Keith, and Sagi,
Could you please comment on this?
Thanks in advance.
Jianchao
> After Sagi's "nvme-rdma: fix concurrent reset and reconnect", the rdma ctrl state is changed to RECONNECTING
> after the clearing and shutdown work and before the initializing procedure, on both the reset work path and
> the error recovery path. The fc reset work does the same thing.
> So if we define the ranges such that RESET_PREPARE covers the scheduling gap plus the disable and clear work,
> and RESETTING covers the initializing procedure, then RECONNECTING is very similar to RESETTING.
>
> Maybe we could do it like this:
> In nvme fc/rdma
> - set state to RESET_PREPARE, queue reset_work/err_work
> - clear/shutdown works, set state to RECONNECTING
Should be fine.
> In nvme pci
> - set state to RESET_PREPARE, queue reset_work
> - clear/shutdown works, set state to RESETTING
> - initialization, set state to LIVE
Given that we split the reset state and we have a clear symmetry
between the transports, do we want to come up with a unified set of
states that is coherent across all transports?
Maybe we rename them to NVME_CTRL_SHUTTING_DOWN and
NVME_CTRL_ESTABLISHING? I'm open to better names.
On 1/17/2018 2:37 AM, Sagi Grimberg wrote:
>
>> After Sagi's "nvme-rdma: fix concurrent reset and reconnect", the rdma
>> ctrl state is changed to RECONNECTING after the clearing and shutdown
>> work and before the initializing procedure, on both the reset work path
>> and the error recovery path. The fc reset work does the same thing.
>> So if we define the ranges such that RESET_PREPARE covers the scheduling
>> gap plus the disable and clear work, and RESETTING covers the
>> initializing procedure, then RECONNECTING is very similar to RESETTING.
>>
>> Maybe we could do it like this:
>> In nvme fc/rdma
>> - set state to RESET_PREPARE, queue reset_work/err_work
>> - clear/shutdown works, set state to RECONNECTING
>
> Should be fine.
>
>> In nvme pci
>> - set state to RESET_PREPARE, queue reset_work
>> - clear/shutdown works, set state to RESETTING
>> - initialization, set state to LIVE
>
> Given that we split the reset state and we have a clear symmetry
> between the transports, do we want to come up with a unified set of
> states that is coherent across all transports?
>
> Maybe we rename them to NVME_CTRL_SHUTTING_DOWN and
> NVME_CTRL_ESTABLISHING? I'm open to better names.
I'm leaning toward this latter suggestion - we need to define the states
and the actions they take. It seems to me that RESETTING became the
"init controller" part in Jianchao's model. So maybe it's not the
shutting down that needs a new state, but rather the REINIT part.
-- james
Hi James and Sagi
Thanks for your kind response and guidance.
On 01/18/2018 05:08 AM, James Smart wrote:
> On 1/17/2018 2:37 AM, Sagi Grimberg wrote:
>>
>>> After Sagi's "nvme-rdma: fix concurrent reset and reconnect", the rdma ctrl state is changed to RECONNECTING
>>> after the clearing and shutdown work and before the initializing procedure, on both the reset work path and
>>> the error recovery path. The fc reset work does the same thing.
>>> So if we define the ranges such that RESET_PREPARE covers the scheduling gap plus the disable and clear work,
>>> and RESETTING covers the initializing procedure, then RECONNECTING is very similar to RESETTING.
>>>
>>> Maybe we could do it like this:
>>> In nvme fc/rdma
>>> - set state to RESET_PREPARE, queue reset_work/err_work
>>> - clear/shutdown works, set state to RECONNECTING
>>
>> Should be fine.
>>
>>> In nvme pci
>>> - set state to RESET_PREPARE, queue reset_work
>>> - clear/shutdown works, set state to RESETTING
>>> - initialization, set state to LIVE
>>
>> Given that we split the reset state and we have a clear symmetry
>> between the transports, do we want to come up with a unified set of
>> states that is coherent across all transports?
>>
>> Maybe we rename them to NVME_CTRL_SHUTTING_DOWN and
>> NVME_CTRL_ESTABLISHING? I'm open to better names.
>
> I'm leaning toward this latter suggestion - we need to define the states and the actions they take. It seems to me that RESETTING became the "init controller" part in Jianchao's model. So maybe it's not the shutting down that needs a new state, but rather the REINIT part.
In fact, it is nvme-pci that needs a new state: the "shutdown" part and the "reinit" part in nvme-pci are both covered by the RESETTING state,
whereas nvme-fc/rdma already have a RECONNECTING state to mark the "reinit" part.
So we could have a SHUTTING_DOWN state to mark the "shutdown" part in both nvme-pci and nvme-fc/rdma. But, as James pointed out, there is a
big difference in the "reinit" part between nvme-pci and nvme-fc/rdma, so should we use one coherent name, ESTABLISHING, or separate names,
REINIT and RECONNECTING, for the "reinit" part?
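For example, a unified state set might look like this (the names are
purely illustrative, following Sagi's suggestion):

enum nvme_ctrl_state {
	NVME_CTRL_NEW,
	NVME_CTRL_LIVE,
	NVME_CTRL_SHUTTING_DOWN,	/* scheduling gap + disable/clear work */
	NVME_CTRL_ESTABLISHING,		/* "reinit" (pci) / "reconnect" (fc/rdma) */
	NVME_CTRL_DELETING,
	NVME_CTRL_DEAD,
};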
Thanks
Jianchao