2023-05-29 11:14:29

by 许春光

Subject: [RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting

From: Chunguang Xu <[email protected]>

We found that nvme_remove_namespaces() may hang in flush_work(&ctrl->scan_work)
while removing a ctrl. The root cause may be that the ctrl state changes to
NVME_CTRL_DELETING during removal, which interrupts nvme_tcp_error_recovery_work()/
nvme_reset_ctrl_work()/nvme_tcp_reconnect_or_remove(). At this point the ctrl is
frozen and the queues are quiesced. scan_work may still issue IO to load the
partition table; that IO gets blocked, leading to the hang in
flush_work(&ctrl->scan_work).

After analysis, we found there are mainly two cases:
1. Since the ctrl is frozen, scan_work hangs in __bio_queue_enter() when it
issues new IO to load the partition table (a simplified sketch of this wait
follows below).
2. Since the queues are quiesced, requeued timed-out IO may sit in the
hctx->dispatch list, leaving scan_work waiting for IO completion.
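
For reference, case 1 boils down to the block layer freeze count: once the
queue is frozen, new submitters sleep until the queue is unfrozen again. A
minimal sketch of that wait, loosely modeled on __bio_queue_enter() (simplified,
not the verbatim kernel code):

#include <linux/blkdev.h>

/*
 * Rough sketch of why scan_work blocks in case 1: while the queue is frozen,
 * a new submitter cannot take a q_usage_counter reference and sleeps until
 * someone unfreezes the queue (nvme_unfreeze() -> blk_mq_unfreeze_queue()).
 */
static void bio_queue_enter_sketch(struct request_queue *q)
{
	while (!percpu_ref_tryget_live(&q->q_usage_counter)) {
		/* frozen: wait until the freeze depth drops back to zero */
		wait_event(q->mq_freeze_wq, !q->mq_freeze_depth);
	}
	/* the bio can now proceed to blk_mq_submit_bio() */
}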

CallTrace:
Removing nvme_ctrl
[<0>] __flush_work+0x14c/0x280
[<0>] flush_work+0x14/0x20
[<0>] nvme_remove_namespaces+0x45/0x100
[<0>] nvme_do_delete_ctrl+0x79/0xa0
[<0>] nvme_sysfs_delete+0x6b/0x80
[<0>] dev_attr_store+0x18/0x30
[<0>] sysfs_kf_write+0x3f/0x50
[<0>] kernfs_fop_write_iter+0x141/0x1d0
[<0>] vfs_write+0x25b/0x3d0
[<0>] ksys_write+0x6b/0xf0
[<0>] __x64_sys_write+0x1e/0x30
[<0>] do_syscall_64+0x5d/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x72/0xdc

Scan_work:
Stack 0
[<0>] __bio_queue_enter+0x15a/0x210
[<0>] blk_mq_submit_bio+0x260/0x5e0
[<0>] __submit_bio+0xa6/0x1a0
[<0>] submit_bio_noacct_nocheck+0x2e5/0x390
[<0>] submit_bio_noacct+0x1cd/0x560
[<0>] submit_bio+0x3b/0x60
[<0>] submit_bh_wbc+0x137/0x160
[<0>] block_read_full_folio+0x24d/0x470
[<0>] blkdev_read_folio+0x1c/0x30
[<0>] filemap_read_folio+0x44/0x2a0
[<0>] do_read_cache_folio+0x135/0x390
[<0>] read_cache_folio+0x16/0x20
[<0>] read_part_sector+0x3e/0xd0
[<0>] sgi_partition+0x35/0x1d0
[<0>] bdev_disk_changed+0x1f6/0x650
[<0>] blkdev_get_whole+0x7e/0x90
[<0>] blkdev_get_by_dev+0x19c/0x2e0
[<0>] disk_scan_partitions+0x72/0x100
[<0>] device_add_disk+0x415/0x420
[<0>] nvme_scan_ns+0x636/0xcd0
[<0>] nvme_scan_work+0x26f/0x450
[<0>] process_one_work+0x21c/0x430
[<0>] worker_thread+0x4e/0x3c0
[<0>] kthread+0xfb/0x130
[<0>] ret_from_fork+0x29/0x50

Stack 1
[<0>] filemap_read_folio+0x195/0x2a0
[<0>] do_read_cache_folio+0x135/0x390
[<0>] read_cache_folio+0x16/0x20
[<0>] read_part_sector+0x3e/0xd0
[<0>] read_lba+0xcc/0x1b0
[<0>] efi_partition+0xec/0x7f0
[<0>] bdev_disk_changed+0x1f6/0x650
[<0>] blkdev_get_whole+0x7e/0x90
[<0>] blkdev_get_by_dev+0x19c/0x2e0
[<0>] disk_scan_partitions+0x72/0x100
[<0>] device_add_disk+0x433/0x440
[<0>] nvme_scan_ns+0x636/0xcd0
[<0>] nvme_scan_work+0x26f/0x450
[<0>] process_one_work+0x21c/0x430
[<0>] worker_thread+0x4e/0x3c0
[<0>] kthread+0xfb/0x130
[<0>] ret_from_fork+0x29/0x50

This series tries to fix the issue by making sure the ctrl is unfrozen and the
queues are unquiesced when exiting from error recovery or reset.

Chunguang Xu (4):
nvme: unfreeze when exiting from recovery or resetting
nvme: do not retry request for NVME_CTRL_DELETING_NOIO
nvme: optimize nvme_check_ready() for NVME_CTRL_DELETING_NOIO
nvme-tcp: remove admin_q quiescing from nvme_tcp_teardown_io_queues

drivers/nvme/host/core.c | 5 ++++-
drivers/nvme/host/nvme.h | 3 ++-
drivers/nvme/host/tcp.c | 25 ++++++++++++++++---------
3 files changed, 22 insertions(+), 11 deletions(-)

--
2.25.1



2023-05-29 11:15:05

by 许春光

Subject: [RFC PATCH 4/4] nvme-tcp: remove admin_q quiescing from nvme_tcp_teardown_io_queues

From: Chunguang Xu <[email protected]>

Remove admin queue quiescing from nvme_tcp_teardown_io_queues(), as it
will be done by nvme_tcp_teardown_admin_queue() and other functions.

Signed-off-by: Chunguang Xu <[email protected]>
---
drivers/nvme/host/tcp.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index cfebcae7fc9b..75f39a02e685 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2020,7 +2020,6 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
{
if (ctrl->queue_count <= 1)
return;
- nvme_quiesce_admin_queue(ctrl);
nvme_start_freeze(ctrl);
nvme_quiesce_io_queues(ctrl);
nvme_sync_io_queues(ctrl);
--
2.25.1


2023-05-29 11:15:18

by 许春光

Subject: [RFC PATCH 3/4] nvme: optimize nvme_check_ready() for NVME_CTRL_DELETING_NOIO

From: Chunguang Xu <[email protected]>

nvme_check_ready() directly returns queue_live while the controller is in
NVME_CTRL_DELETING; maybe we should do the same for NVME_CTRL_DELETING_NOIO.

Signed-off-by: Chunguang Xu <[email protected]>
---
drivers/nvme/host/nvme.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a2d4f59e0535..35066ad10cd9 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -793,7 +793,8 @@ static inline bool nvme_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
if (likely(ctrl->state == NVME_CTRL_LIVE))
return true;
if (ctrl->ops->flags & NVME_F_FABRICS &&
- ctrl->state == NVME_CTRL_DELETING)
+ (ctrl->state == NVME_CTRL_DELETING ||
+ ctrl->state == NVME_CTRL_DELETING_NOIO))
return queue_live;
return __nvme_check_ready(ctrl, rq, queue_live);
}
--
2.25.1


2023-05-29 11:34:59

by 许春光

Subject: [RFC PATCH 1/4] nvme: unfreeze when exiting from recovery or resetting

From: Chunguang Xu <[email protected]>

Removing the controller will interrupt err_work/connect_work, leaving the
controller frozen and the IO queues quiesced. IOs issued by scan_work will
then be blocked, and nvme_remove_namespaces() will hang on flushing scan_work.
To fix that, unfreeze the controller and unquiesce the IO queues when exiting
from error recovery or reset.

Signed-off-by: Chunguang Xu <[email protected]>
---
drivers/nvme/host/tcp.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index bf0230442d57..cfebcae7fc9b 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2031,12 +2031,24 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
nvme_tcp_destroy_io_queues(ctrl, remove);
}

+static inline void nvme_ctrl_reconnect_exit(struct nvme_ctrl *ctrl)
+{
+ /* fast fail all pending requests */
+ blk_mq_unquiesce_queue(ctrl->admin_q);
+
+ if (ctrl->queue_count > 1) {
+ nvme_unquiesce_io_queues(ctrl);
+ nvme_unfreeze(ctrl);
+ }
+}
+
static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
{
/* If we are resetting/deleting then do nothing */
if (ctrl->state != NVME_CTRL_CONNECTING) {
WARN_ON_ONCE(ctrl->state == NVME_CTRL_NEW ||
ctrl->state == NVME_CTRL_LIVE);
+ nvme_ctrl_reconnect_exit(ctrl);
return;
}

@@ -2107,13 +2119,7 @@ static int nvme_tcp_setup_ctrl(struct nvme_ctrl *ctrl, bool new)
return 0;

destroy_io:
- if (ctrl->queue_count > 1) {
- nvme_quiesce_io_queues(ctrl);
- nvme_sync_io_queues(ctrl);
- nvme_tcp_stop_io_queues(ctrl);
- nvme_cancel_tagset(ctrl);
- nvme_tcp_destroy_io_queues(ctrl, new);
- }
+ nvme_tcp_teardown_io_queues(ctrl, new);
destroy_admin:
nvme_quiesce_admin_queue(ctrl);
blk_sync_queue(ctrl->admin_q);
@@ -2166,6 +2172,7 @@ static void nvme_tcp_error_recovery_work(struct work_struct *work)
/* state change failure is ok if we started ctrl delete */
WARN_ON_ONCE(ctrl->state != NVME_CTRL_DELETING &&
ctrl->state != NVME_CTRL_DELETING_NOIO);
+ nvme_ctrl_reconnect_exit(ctrl);
return;
}

@@ -2197,6 +2204,7 @@ static void nvme_reset_ctrl_work(struct work_struct *work)
/* state change failure is ok if we started ctrl delete */
WARN_ON_ONCE(ctrl->state != NVME_CTRL_DELETING &&
ctrl->state != NVME_CTRL_DELETING_NOIO);
+ nvme_ctrl_reconnect_exit(ctrl);
return;
}

@@ -2213,7 +2221,7 @@ static void nvme_reset_ctrl_work(struct work_struct *work)
static void nvme_tcp_stop_ctrl(struct nvme_ctrl *ctrl)
{
flush_work(&to_tcp_ctrl(ctrl)->err_work);
- cancel_delayed_work_sync(&to_tcp_ctrl(ctrl)->connect_work);
+ flush_delayed_work(&to_tcp_ctrl(ctrl)->connect_work);
}

static void nvme_tcp_free_ctrl(struct nvme_ctrl *nctrl)
--
2.25.1


2023-05-29 11:35:04

by 许春光

Subject: [RFC PATCH 2/4] nvme: do not retry request for NVME_CTRL_DELETING_NOIO

From: Chunguang Xu <[email protected]>

According to the NVME_CTRL_DELETING_NOIO definition, we should not requeue
any request while the controller is in this state.

Signed-off-by: Chunguang Xu <[email protected]>
---
drivers/nvme/host/core.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1f0cbb77b249..304eb4eda6c4 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -346,6 +346,8 @@ enum nvme_disposition {

static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
{
+ struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
+
if (likely(nvme_req(req)->status == 0))
return COMPLETE;

@@ -362,7 +364,8 @@ static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
blk_queue_dying(req->q))
return FAILOVER;
} else {
- if (blk_queue_dying(req->q))
+ if (ctrl->state == NVME_CTRL_DELETING_NOIO ||
+ blk_queue_dying(req->q))
return COMPLETE;
}

--
2.25.1


2023-06-05 23:40:10

by Sagi Grimberg

Subject: Re: [RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting


> From: Chunguang Xu <[email protected]>
>
> We found that nvme_remove_namespaces() may hang in flush_work(&ctrl->scan_work)
> while removing ctrl. The root cause may due to the state of ctrl changed to
> NVME_CTRL_DELETING while removing ctrl , which intterupt nvme_tcp_error_recovery_work()/
> nvme_reset_ctrl_work()/nvme_tcp_reconnect_or_remove(). At this time, ctrl is
> freezed and queue is quiescing . Since scan_work may continue to issue IOs to
> load partition table, make it blocked, and lead to nvme_tcp_error_recovery_work()
> hang in flush_work(&ctrl->scan_work).
>
> After analyzation, we found that there are mainly two case:
> 1. Since ctrl is freeze, scan_work hang in __bio_queue_enter() while it issue
> new IO to load partition table.
> 2. Since queus is quiescing, requeue timeouted IO may hang in hctx->dispatch
> queue, leading scan_work waiting for IO completion.

Hey, can you please look at the discussion on Ming's proposal in
"nvme: add nvme_delete_dead_ctrl for avoiding io deadlock" ?

Looks the same to me.

2023-06-06 14:46:15

by 许春光

Subject: Re: [RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting

Hi Grimberg, I have read Ming's patch. It seems that Ming fixes the case my
patchset missed: Ming mainly fixes the hang when reconnect fails, while my
patchset fixes the issue where the user actively removes the ctrl (nvme
disconnect) while error recovery or reconnect is in progress (max retries not
yet reached). That interrupts error recovery or reconnect, but leaves the ctrl
frozen and the request queues quiesced, so new IO and timed-out IOs cannot
make progress. As a result, nvme_remove_namespaces() hangs on flushing
scan_work or in blk_mq_freeze_queue_wait(), and new IO hangs in
__bio_queue_enter(). It seems that if the first patch adds the following code,
it may cover Ming's case:

static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
{
/* If we are resetting/deleting then do nothing */
if (ctrl->state != NVME_CTRL_CONNECTING) {
WARN_ON_ONCE(ctrl->state == NVME_CTRL_NEW ||
ctrl->state == NVME_CTRL_LIVE);
return;
}

if (nvmf_should_reconnect(ctrl)) {
dev_info(ctrl->device, "Reconnecting in %d seconds...\n",
ctrl->opts->reconnect_delay);
queue_delayed_work(nvme_wq, &to_tcp_ctrl(ctrl)->connect_work,
ctrl->opts->reconnect_delay * HZ);
} else {
dev_info(ctrl->device, "Removing controller...\n");
nvme_delete_ctrl(ctrl);
+ nvme_ctrl_reconnect_exit(ctrl);
}
}

Thanks.


Sagi Grimberg <[email protected]> wrote on Tue, Jun 6, 2023 at 07:09:
>
>
> > From: Chunguang Xu <[email protected]>
> >
> > We found that nvme_remove_namespaces() may hang in flush_work(&ctrl->scan_work)
> > while removing ctrl. The root cause may due to the state of ctrl changed to
> > NVME_CTRL_DELETING while removing ctrl , which intterupt nvme_tcp_error_recovery_work()/
> > nvme_reset_ctrl_work()/nvme_tcp_reconnect_or_remove(). At this time, ctrl is
> > freezed and queue is quiescing . Since scan_work may continue to issue IOs to
> > load partition table, make it blocked, and lead to nvme_tcp_error_recovery_work()
> > hang in flush_work(&ctrl->scan_work).
> >
> > After analyzation, we found that there are mainly two case:
> > 1. Since ctrl is freeze, scan_work hang in __bio_queue_enter() while it issue
> > new IO to load partition table.
> > 2. Since queus is quiescing, requeue timeouted IO may hang in hctx->dispatch
> > queue, leading scan_work waiting for IO completion.
>
> Hey, can you please look at the discussion with Mings' proposal in
> "nvme: add nvme_delete_dead_ctrl for avoiding io deadlock" ?
>
> Looks the same to me.

2023-06-06 15:41:49

by Ming Lei

Subject: Re: [RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting

Hello Chunguang,

On Mon, May 29, 2023 at 06:59:22PM +0800, brookxu.cn wrote:
> From: Chunguang Xu <[email protected]>
>
> We found that nvme_remove_namespaces() may hang in flush_work(&ctrl->scan_work)
> while removing ctrl. The root cause may due to the state of ctrl changed to
> NVME_CTRL_DELETING while removing ctrl , which intterupt nvme_tcp_error_recovery_work()/
> nvme_reset_ctrl_work()/nvme_tcp_reconnect_or_remove(). At this time, ctrl is

I didn't dig into the ctrl state checks in these error handlers yet, but error
handling is supposed to provide forward progress for any controller state.

Can you explain a bit how switching to DELETING interrupts the above
error handling and breaks the forward progress guarantee?

> freezed and queue is quiescing . Since scan_work may continue to issue IOs to
> load partition table, make it blocked, and lead to nvme_tcp_error_recovery_work()
> hang in flush_work(&ctrl->scan_work).
>
> After analyzation, we found that there are mainly two case:
> 1. Since ctrl is freeze, scan_work hang in __bio_queue_enter() while it issue
> new IO to load partition table.

Yeah, nvme freeze usage is fragile, and I suggested to move
nvme_start_freeze() from nvme_tcp_teardown_io_queues to
nvme_tcp_configure_io_queues(), such as the posted change on rdma:

https://lore.kernel.org/linux-block/CAHj4cs-4gQHnp5aiekvJmb6o8qAcb6nLV61uOGFiisCzM49_dg@mail.gmail.com/T/#ma0d6bbfaa0c8c1be79738ff86a2fdcf7582e06b0

> 2. Since queus is quiescing, requeue timeouted IO may hang in hctx->dispatch
> queue, leading scan_work waiting for IO completion.

That still looks like a problem in the related error handling code, which is
supposed to recover and finally unquiesce the queues.


Thanks,
Ming


2023-06-06 15:43:52

by 许春光

Subject: Re: [RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting

Sagi Grimberg <[email protected]> wrote on Tue, Jun 6, 2023 at 07:09:
>
>
> > From: Chunguang Xu <[email protected]>
> >
> > We found that nvme_remove_namespaces() may hang in flush_work(&ctrl->scan_work)
> > while removing ctrl. The root cause may due to the state of ctrl changed to
> > NVME_CTRL_DELETING while removing ctrl , which intterupt nvme_tcp_error_recovery_work()/
> > nvme_reset_ctrl_work()/nvme_tcp_reconnect_or_remove(). At this time, ctrl is
> > freezed and queue is quiescing . Since scan_work may continue to issue IOs to
> > load partition table, make it blocked, and lead to nvme_tcp_error_recovery_work()
> > hang in flush_work(&ctrl->scan_work).
> >
> > After analyzation, we found that there are mainly two case:
> > 1. Since ctrl is freeze, scan_work hang in __bio_queue_enter() while it issue
> > new IO to load partition table.
> > 2. Since queus is quiescing, requeue timeouted IO may hang in hctx->dispatch
> > queue, leading scan_work waiting for IO completion.
>
> Hey, can you please look at the discussion with Mings' proposal in
> "nvme: add nvme_delete_dead_ctrl for avoiding io deadlock" ?

Hi Grimberg:

I have read Ming's patch. It seems that Ming fixed the case my patchset missed:
Ming fixes the hang that occurs when reconnect fails (max retries reached). My
patchset fixes the issue where the user actively removes the ctrl (nvme
disconnect) while the driver is processing error recovery or reconnect (max
retries not yet reached). That interrupts error recovery or reconnect, but the
ctrl is left frozen and the request queues quiesced, so new IO and timed-out
IOs (requeued to hctx->dispatch_list) cannot continue to be processed. As a
result, nvme_remove_namespaces() hangs on flushing scan_work or in
blk_mq_freeze_queue_wait(), and new IO hangs in __bio_queue_enter(). It seems
that if the first patch of my patchset adds the following code, it may cover
Ming's case:

static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
{
/* If we are resetting/deleting then do nothing */
if (ctrl->state != NVME_CTRL_CONNECTING) {
WARN_ON_ONCE(ctrl->state == NVME_CTRL_NEW ||
ctrl->state == NVME_CTRL_LIVE);
return;
}

if (nvmf_should_reconnect(ctrl)) {
dev_info(ctrl->device, "Reconnecting in %d seconds...\n",
ctrl->opts->reconnect_delay);
queue_delayed_work(nvme_wq, &to_tcp_ctrl(ctrl)->connect_work,
ctrl->opts->reconnect_delay * HZ);
} else {
dev_info(ctrl->device, "Removing controller...\n");
+ nvme_ctrl_reconnect_exit(ctrl);
nvme_delete_ctrl(ctrl);
}
}

Thanks.

> Looks the same to me.

2023-06-07 04:33:35

by 许春光

Subject: Re: [RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting

Hi Ming:

Ming Lei <[email protected]> wrote on Tue, Jun 6, 2023 at 23:15:
>
> Hello Chunguang,
>
> On Mon, May 29, 2023 at 06:59:22PM +0800, brookxu.cn wrote:
> > From: Chunguang Xu <[email protected]>
> >
> > We found that nvme_remove_namespaces() may hang in flush_work(&ctrl->scan_work)
> > while removing ctrl. The root cause may due to the state of ctrl changed to
> > NVME_CTRL_DELETING while removing ctrl , which intterupt nvme_tcp_error_recovery_work()/
> > nvme_reset_ctrl_work()/nvme_tcp_reconnect_or_remove(). At this time, ctrl is
>
> I didn't dig into ctrl state check in these error handler yet, but error
> handling is supposed to provide forward progress for any controller state.
>
> Can you explain a bit how switching to DELETING interrupts the above
> error handling and breaks the forward progress guarantee?

Here we have frozen the ctrl; if the ctrl state has changed to DELETING or
DELETING_NOIO (by nvme disconnect), we bail out and leave the ctrl frozen, so
nvme_remove_namespaces() hangs.

static void nvme_tcp_error_recovery_work(struct work_struct *work)
{
...
if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) {
/* state change failure is ok if we started ctrl delete */
WARN_ON_ONCE(ctrl->state != NVME_CTRL_DELETING &&
ctrl->state != NVME_CTRL_DELETING_NOIO);
return;
}

nvme_tcp_reconnect_or_remove(ctrl);
}


On another path, we check the ctrl state while reconnecting; if it has changed
to DELETING or DELETING_NOIO, we bail out and leave the ctrl frozen and the
queues quiesced (through the reset path), and as a result the hang occurs.

static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
{
/* If we are resetting/deleting then do nothing */
if (ctrl->state != NVME_CTRL_CONNECTING) {
WARN_ON_ONCE(ctrl->state == NVME_CTRL_NEW ||
ctrl->state == NVME_CTRL_LIVE);
return;
}
...
}

> > freezed and queue is quiescing . Since scan_work may continue to issue IOs to
> > load partition table, make it blocked, and lead to nvme_tcp_error_recovery_work()
> > hang in flush_work(&ctrl->scan_work).
> >
> > After analyzation, we found that there are mainly two case:
> > 1. Since ctrl is freeze, scan_work hang in __bio_queue_enter() while it issue
> > new IO to load partition table.
>
> Yeah, nvme freeze usage is fragile, and I suggested to move
> nvme_start_freeze() from nvme_tcp_teardown_io_queues to
> nvme_tcp_configure_io_queues(), such as the posted change on rdma:
>
> https://lore.kernel.org/linux-block/CAHj4cs-4gQHnp5aiekvJmb6o8qAcb6nLV61uOGFiisCzM49_dg@mail.gmail.com/T/#ma0d6bbfaa0c8c1be79738ff86a2fdcf7582e06b0

While the driver is reconnecting, I think we should freeze the ctrl or quiesce
the queues, otherwise nvme_fail_nonready_command() may return BLK_STS_RESOURCE
and the IOs may retry frequently (a rough sketch of that path is below). So I
think we had better freeze the ctrl when entering error recovery/reconnect, but
we need to unfreeze it on exit.
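
For reference, a simplified sketch of the not-ready path I mean (loosely based
on the fabrics helper, nvmf_fail_nonready_command() if I recall correctly;
details trimmed):

/*
 * Simplified sketch: if the IO queues are neither frozen nor quiesced while
 * the ctrl is CONNECTING, every dispatched request that is not ready gets
 * BLK_STS_RESOURCE and blk-mq requeues it, so it keeps bouncing until the
 * reconnect finishes; only a deleting/dead ctrl fails it outright.
 */
static blk_status_t fail_nonready_sketch(struct nvme_ctrl *ctrl,
					 struct request *rq)
{
	if (ctrl->state != NVME_CTRL_DELETING &&
	    ctrl->state != NVME_CTRL_DELETING_NOIO &&
	    ctrl->state != NVME_CTRL_DEAD)
		return BLK_STS_RESOURCE;	/* requeued and retried soon */

	/* deleting/dead: fail the request instead of retrying it */
	nvme_req(rq)->status = NVME_SC_HOST_PATH_ERROR;
	blk_mq_start_request(rq);
	nvme_complete_rq(rq);
	return BLK_STS_IOERR;
}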


> > 2. Since queus is quiescing, requeue timeouted IO may hang in hctx->dispatch
> > queue, leading scan_work waiting for IO completion.
>
> That still looks one problem in related error handling code, which is
> supposed to recover and unquiesce queue finally.

If I have not misunderstood, that is what this patchset does.

Thanks.
>
> Thanks,
> Ming
>

2023-06-08 01:14:43

by Ming Lei

Subject: Re: [RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting

On Wed, Jun 07, 2023 at 12:09:17PM +0800, 许春光 wrote:
> Hi Ming:
>
> Ming Lei <[email protected]> wrote on Tue, Jun 6, 2023 at 23:15:
> >
> > Hello Chunguang,
> >
> > On Mon, May 29, 2023 at 06:59:22PM +0800, brookxu.cn wrote:
> > > From: Chunguang Xu <[email protected]>
> > >
> > > We found that nvme_remove_namespaces() may hang in flush_work(&ctrl->scan_work)
> > > while removing ctrl. The root cause may due to the state of ctrl changed to
> > > NVME_CTRL_DELETING while removing ctrl , which intterupt nvme_tcp_error_recovery_work()/
> > > nvme_reset_ctrl_work()/nvme_tcp_reconnect_or_remove(). At this time, ctrl is
> >
> > I didn't dig into ctrl state check in these error handler yet, but error
> > handling is supposed to provide forward progress for any controller state.
> >
> > Can you explain a bit how switching to DELETING interrupts the above
> > error handling and breaks the forward progress guarantee?
>
> Here we freezed ctrl, if ctrl state has changed to DELETING or
> DELETING_NIO(by nvme disconnect), we will break up and lease ctrl
> freeze, so nvme_remove_namespaces() hang.
>
> static void nvme_tcp_error_recovery_work(struct work_struct *work)
> {
> ...
> if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) {
> /* state change failure is ok if we started ctrl delete */
> WARN_ON_ONCE(ctrl->state != NVME_CTRL_DELETING &&
> ctrl->state != NVME_CTRL_DELETING_NOIO);
> return;
> }
>
> nvme_tcp_reconnect_or_remove(ctrl);
> }
>
>
> Another path, we will check ctrl state while reconnecting, if it changes to
> DELETING or DELETING_NIO, we will break up and lease ctrl freeze and
> queue quiescing (through reset path), as a result Hang occurs.
>
> static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
> {
> /* If we are resetting/deleting then do nothing */
> if (ctrl->state != NVME_CTRL_CONNECTING) {
> WARN_ON_ONCE(ctrl->state == NVME_CTRL_NEW ||
> ctrl->state == NVME_CTRL_LIVE);
> return;
> }
> ...
> }
>
> > > freezed and queue is quiescing . Since scan_work may continue to issue IOs to
> > > load partition table, make it blocked, and lead to nvme_tcp_error_recovery_work()
> > > hang in flush_work(&ctrl->scan_work).
> > >
> > > After analyzation, we found that there are mainly two case:
> > > 1. Since ctrl is freeze, scan_work hang in __bio_queue_enter() while it issue
> > > new IO to load partition table.
> >
> > Yeah, nvme freeze usage is fragile, and I suggested to move
> > nvme_start_freeze() from nvme_tcp_teardown_io_queues to
> > nvme_tcp_configure_io_queues(), such as the posted change on rdma:
> >
> > https://lore.kernel.org/linux-block/CAHj4cs-4gQHnp5aiekvJmb6o8qAcb6nLV61uOGFiisCzM49_dg@mail.gmail.com/T/#ma0d6bbfaa0c8c1be79738ff86a2fdcf7582e06b0
>
> While drive reconnecting, I think we should freeze ctrl or quiescing queue,
> otherwise nvme_fail_nonready_command()may return BLK_STS_RESOURCE,
> and the IOs may retry frequently. So I think we may better freeze ctrl
> while entering
> error_recovery/reconnect, but need to unfreeze it while exit.

Quiescing is always done in error handling, and freeze is actually not a must;
it is easy to cause races by calling freeze & unfreeze from different contexts.

But yes, unquiesce should have been done after exiting error handling, or
simply do it in nvme_unquiesce_io_queues().
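
For reference, a minimal sketch of the difference between the two (generic
blk-mq helpers, not tied to any particular transport):

/*
 * Quiesce only stops dispatch to the driver's ->queue_rq(); submitters can
 * still enter the queue and their requests pile up until unquiesce.
 */
static void quiesce_example(struct request_queue *q)
{
	blk_mq_quiesce_queue(q);
	/* ... error handling ... */
	blk_mq_unquiesce_queue(q);
}

/*
 * Freeze additionally blocks new submitters in __bio_queue_enter() and only
 * completes once all in-flight requests have finished, so it must be paired
 * with an unfreeze on every exit path.
 */
static void freeze_example(struct request_queue *q)
{
	blk_mq_freeze_queue(q);
	/* ... update queue state while no request is in flight ... */
	blk_mq_unfreeze_queue(q);
}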

And the following patch should cover all these hangs:


diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3ec38e2b9173..83d3818fc60b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4692,6 +4692,9 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
*/
nvme_mpath_clear_ctrl_paths(ctrl);

+ /* unquiesce io queues so scan work won't hang */
+ nvme_unquiesce_io_queues(ctrl);
+
/* prevent racing with ns scanning */
flush_work(&ctrl->scan_work);

@@ -4701,10 +4704,8 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
* removing the namespaces' disks; fail all the queues now to avoid
* potentially having to clean up the failed sync later.
*/
- if (ctrl->state == NVME_CTRL_DEAD) {
+ if (ctrl->state == NVME_CTRL_DEAD)
nvme_mark_namespaces_dead(ctrl);
- nvme_unquiesce_io_queues(ctrl);
- }

/* this is a no-op when called from the controller reset handler */
nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING_NOIO);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 492f319ebdf3..5d775b76baca 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2578,14 +2578,15 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
dead = nvme_pci_ctrl_is_dead(dev);
if (dev->ctrl.state == NVME_CTRL_LIVE ||
dev->ctrl.state == NVME_CTRL_RESETTING) {
- if (pci_is_enabled(pdev))
- nvme_start_freeze(&dev->ctrl);
/*
* Give the controller a chance to complete all entered requests
* if doing a safe shutdown.
*/
- if (!dead && shutdown)
+ if (!dead && shutdown & pci_is_enabled(pdev)) {
+ nvme_start_freeze(&dev->ctrl);
nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
+ nvme_unfreeze(&dev->ctrl);
+ }
}

nvme_quiesce_io_queues(&dev->ctrl);
@@ -2740,6 +2741,7 @@ static void nvme_reset_work(struct work_struct *work)
* controller around but remove all namespaces.
*/
if (dev->online_queues > 1) {
+ nvme_start_freeze(&dev->ctrl);
nvme_unquiesce_io_queues(&dev->ctrl);
nvme_wait_freeze(&dev->ctrl);
nvme_pci_update_nr_queues(dev);
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 0eb79696fb73..354cce8853c1 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -918,6 +918,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
goto out_cleanup_tagset;

if (!new) {
+ nvme_start_freeze(&ctrl->ctrl);
nvme_unquiesce_io_queues(&ctrl->ctrl);
if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
/*
@@ -926,6 +927,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
* to be safe.
*/
ret = -ENODEV;
+ nvme_unfreeze(&ctrl->ctrl);
goto out_wait_freeze_timed_out;
}
blk_mq_update_nr_hw_queues(ctrl->ctrl.tagset,
@@ -975,7 +977,6 @@ static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl,
bool remove)
{
if (ctrl->ctrl.queue_count > 1) {
- nvme_start_freeze(&ctrl->ctrl);
nvme_quiesce_io_queues(&ctrl->ctrl);
nvme_sync_io_queues(&ctrl->ctrl);
nvme_rdma_stop_io_queues(ctrl);
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index bf0230442d57..5ae08e9cb16d 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1909,6 +1909,7 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
goto out_cleanup_connect_q;

if (!new) {
+ nvme_start_freeze(ctrl);
nvme_unquiesce_io_queues(ctrl);
if (!nvme_wait_freeze_timeout(ctrl, NVME_IO_TIMEOUT)) {
/*
@@ -1917,6 +1918,7 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
* to be safe.
*/
ret = -ENODEV;
+ nvme_unfreeze(ctrl);
goto out_wait_freeze_timed_out;
}
blk_mq_update_nr_hw_queues(ctrl->tagset,
@@ -2021,7 +2023,6 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
if (ctrl->queue_count <= 1)
return;
nvme_quiesce_admin_queue(ctrl);
- nvme_start_freeze(ctrl);
nvme_quiesce_io_queues(ctrl);
nvme_sync_io_queues(ctrl);
nvme_tcp_stop_io_queues(ctrl);




Thanks
Ming


2023-06-08 03:02:43

by 许春光

Subject: Re: [RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting

Ming Lei <[email protected]> wrote on Thu, Jun 8, 2023 at 08:56:
>
> On Wed, Jun 07, 2023 at 12:09:17PM +0800, 许春光 wrote:
> > Hi Ming:
> >
> > Ming Lei <[email protected]> wrote on Tue, Jun 6, 2023 at 23:15:
> > >
> > > Hello Chunguang,
> > >
> > > On Mon, May 29, 2023 at 06:59:22PM +0800, brookxu.cn wrote:
> > > > From: Chunguang Xu <[email protected]>
> > > >
> > > > We found that nvme_remove_namespaces() may hang in flush_work(&ctrl->scan_work)
> > > > while removing ctrl. The root cause may due to the state of ctrl changed to
> > > > NVME_CTRL_DELETING while removing ctrl , which intterupt nvme_tcp_error_recovery_work()/
> > > > nvme_reset_ctrl_work()/nvme_tcp_reconnect_or_remove(). At this time, ctrl is
> > >
> > > I didn't dig into ctrl state check in these error handler yet, but error
> > > handling is supposed to provide forward progress for any controller state.
> > >
> > > Can you explain a bit how switching to DELETING interrupts the above
> > > error handling and breaks the forward progress guarantee?
> >
> > Here we freezed ctrl, if ctrl state has changed to DELETING or
> > DELETING_NIO(by nvme disconnect), we will break up and lease ctrl
> > freeze, so nvme_remove_namespaces() hang.
> >
> > static void nvme_tcp_error_recovery_work(struct work_struct *work)
> > {
> > ...
> > if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) {
> > /* state change failure is ok if we started ctrl delete */
> > WARN_ON_ONCE(ctrl->state != NVME_CTRL_DELETING &&
> > ctrl->state != NVME_CTRL_DELETING_NOIO);
> > return;
> > }
> >
> > nvme_tcp_reconnect_or_remove(ctrl);
> > }
> >
> >
> > Another path, we will check ctrl state while reconnecting, if it changes to
> > DELETING or DELETING_NIO, we will break up and lease ctrl freeze and
> > queue quiescing (through reset path), as a result Hang occurs.
> >
> > static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
> > {
> > /* If we are resetting/deleting then do nothing */
> > if (ctrl->state != NVME_CTRL_CONNECTING) {
> > WARN_ON_ONCE(ctrl->state == NVME_CTRL_NEW ||
> > ctrl->state == NVME_CTRL_LIVE);
> > return;
> > }
> > ...
> > }
> >
> > > > freezed and queue is quiescing . Since scan_work may continue to issue IOs to
> > > > load partition table, make it blocked, and lead to nvme_tcp_error_recovery_work()
> > > > hang in flush_work(&ctrl->scan_work).
> > > >
> > > > After analyzation, we found that there are mainly two case:
> > > > 1. Since ctrl is freeze, scan_work hang in __bio_queue_enter() while it issue
> > > > new IO to load partition table.
> > >
> > > Yeah, nvme freeze usage is fragile, and I suggested to move
> > > nvme_start_freeze() from nvme_tcp_teardown_io_queues to
> > > nvme_tcp_configure_io_queues(), such as the posted change on rdma:
> > >
> > > https://lore.kernel.org/linux-block/CAHj4cs-4gQHnp5aiekvJmb6o8qAcb6nLV61uOGFiisCzM49_dg@mail.gmail.com/T/#ma0d6bbfaa0c8c1be79738ff86a2fdcf7582e06b0
> >
> > While drive reconnecting, I think we should freeze ctrl or quiescing queue,
> > otherwise nvme_fail_nonready_command()may return BLK_STS_RESOURCE,
> > and the IOs may retry frequently. So I think we may better freeze ctrl
> > while entering
> > error_recovery/reconnect, but need to unfreeze it while exit.
>
> quiescing is always done in error handling, and freeze is actually
> not a must, and it is easier to cause race by calling freeze & unfreeze
> from different contexts.

I think if we do not freeze the ctrl, the IO that has already been submitted
(just queued to hctx->dispatch) may stay pending for a long time and trigger
new hung task issues, whereas freezing the ctrl may avoid those hung tasks.

> But yes, unquiesce should have been done after exiting error handling, or
> simply do it in nvme_unquiesce_io_queues().
>
> And the following patch should cover all these hangs:
>
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 3ec38e2b9173..83d3818fc60b 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4692,6 +4692,9 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
> */
> nvme_mpath_clear_ctrl_paths(ctrl);
>
> + /* unquiesce io queues so scan work won't hang */
> + nvme_unquiesce_io_queues(ctrl);
> +
> /* prevent racing with ns scanning */
> flush_work(&ctrl->scan_work);
>
> @@ -4701,10 +4704,8 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
> * removing the namespaces' disks; fail all the queues now to avoid
> * potentially having to clean up the failed sync later.
> */
> - if (ctrl->state == NVME_CTRL_DEAD) {
> + if (ctrl->state == NVME_CTRL_DEAD)
> nvme_mark_namespaces_dead(ctrl);
> - nvme_unquiesce_io_queues(ctrl);
> - }
>
> /* this is a no-op when called from the controller reset handler */
> nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING_NOIO);
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 492f319ebdf3..5d775b76baca 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2578,14 +2578,15 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
> dead = nvme_pci_ctrl_is_dead(dev);
> if (dev->ctrl.state == NVME_CTRL_LIVE ||
> dev->ctrl.state == NVME_CTRL_RESETTING) {
> - if (pci_is_enabled(pdev))
> - nvme_start_freeze(&dev->ctrl);
> /*
> * Give the controller a chance to complete all entered requests
> * if doing a safe shutdown.
> */
> - if (!dead && shutdown)
> + if (!dead && shutdown & pci_is_enabled(pdev)) {
> + nvme_start_freeze(&dev->ctrl);
> nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
> + nvme_unfreeze(&dev->ctrl);
> + }
> }
>
> nvme_quiesce_io_queues(&dev->ctrl);
> @@ -2740,6 +2741,7 @@ static void nvme_reset_work(struct work_struct *work)
> * controller around but remove all namespaces.
> */
> if (dev->online_queues > 1) {
> + nvme_start_freeze(&dev->ctrl);
> nvme_unquiesce_io_queues(&dev->ctrl);
> nvme_wait_freeze(&dev->ctrl);
> nvme_pci_update_nr_queues(dev);
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 0eb79696fb73..354cce8853c1 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -918,6 +918,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
> goto out_cleanup_tagset;
>
> if (!new) {
> + nvme_start_freeze(&ctrl->ctrl);
> nvme_unquiesce_io_queues(&ctrl->ctrl);
> if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
> /*
> @@ -926,6 +927,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
> * to be safe.
> */
> ret = -ENODEV;
> + nvme_unfreeze(&ctrl->ctrl);
> goto out_wait_freeze_timed_out;
> }
> blk_mq_update_nr_hw_queues(ctrl->ctrl.tagset,
> @@ -975,7 +977,6 @@ static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl,
> bool remove)
> {
> if (ctrl->ctrl.queue_count > 1) {
> - nvme_start_freeze(&ctrl->ctrl);
> nvme_quiesce_io_queues(&ctrl->ctrl);
> nvme_sync_io_queues(&ctrl->ctrl);
> nvme_rdma_stop_io_queues(ctrl);
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index bf0230442d57..5ae08e9cb16d 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1909,6 +1909,7 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
> goto out_cleanup_connect_q;
>
> if (!new) {
> + nvme_start_freeze(ctrl);
> nvme_unquiesce_io_queues(ctrl);
> if (!nvme_wait_freeze_timeout(ctrl, NVME_IO_TIMEOUT)) {
> /*
> @@ -1917,6 +1918,7 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
> * to be safe.
> */
> ret = -ENODEV;
> + nvme_unfreeze(ctrl);
> goto out_wait_freeze_timed_out;
> }
> blk_mq_update_nr_hw_queues(ctrl->tagset,
> @@ -2021,7 +2023,6 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
> if (ctrl->queue_count <= 1)
> return;
> nvme_quiesce_admin_queue(ctrl);
> - nvme_start_freeze(ctrl);
> nvme_quiesce_io_queues(ctrl);
> nvme_sync_io_queues(ctrl);
> nvme_tcp_stop_io_queues(ctrl);
>
>
>
>
> Thanks
> Ming
>

2023-06-08 14:07:06

by Ming Lei

Subject: Re: [RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting

On Thu, Jun 08, 2023 at 10:48:50AM +0800, 许春光 wrote:
> Ming Lei <[email protected]> wrote on Thu, Jun 8, 2023 at 08:56:
> >
> > On Wed, Jun 07, 2023 at 12:09:17PM +0800, 许春光 wrote:
> > > Hi Ming:
> > >
> > > Ming Lei <[email protected]> wrote on Tue, Jun 6, 2023 at 23:15:
> > > >
> > > > Hello Chunguang,
> > > >
> > > > On Mon, May 29, 2023 at 06:59:22PM +0800, brookxu.cn wrote:
> > > > > From: Chunguang Xu <[email protected]>
> > > > >
> > > > > We found that nvme_remove_namespaces() may hang in flush_work(&ctrl->scan_work)
> > > > > while removing ctrl. The root cause may due to the state of ctrl changed to
> > > > > NVME_CTRL_DELETING while removing ctrl , which intterupt nvme_tcp_error_recovery_work()/
> > > > > nvme_reset_ctrl_work()/nvme_tcp_reconnect_or_remove(). At this time, ctrl is
> > > >
> > > > I didn't dig into ctrl state check in these error handler yet, but error
> > > > handling is supposed to provide forward progress for any controller state.
> > > >
> > > > Can you explain a bit how switching to DELETING interrupts the above
> > > > error handling and breaks the forward progress guarantee?
> > >
> > > Here we freezed ctrl, if ctrl state has changed to DELETING or
> > > DELETING_NIO(by nvme disconnect), we will break up and lease ctrl
> > > freeze, so nvme_remove_namespaces() hang.
> > >
> > > static void nvme_tcp_error_recovery_work(struct work_struct *work)
> > > {
> > > ...
> > > if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) {
> > > /* state change failure is ok if we started ctrl delete */
> > > WARN_ON_ONCE(ctrl->state != NVME_CTRL_DELETING &&
> > > ctrl->state != NVME_CTRL_DELETING_NOIO);
> > > return;
> > > }
> > >
> > > nvme_tcp_reconnect_or_remove(ctrl);
> > > }
> > >
> > >
> > > Another path, we will check ctrl state while reconnecting, if it changes to
> > > DELETING or DELETING_NIO, we will break up and lease ctrl freeze and
> > > queue quiescing (through reset path), as a result Hang occurs.
> > >
> > > static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
> > > {
> > > /* If we are resetting/deleting then do nothing */
> > > if (ctrl->state != NVME_CTRL_CONNECTING) {
> > > WARN_ON_ONCE(ctrl->state == NVME_CTRL_NEW ||
> > > ctrl->state == NVME_CTRL_LIVE);
> > > return;
> > > }
> > > ...
> > > }
> > >
> > > > > freezed and queue is quiescing . Since scan_work may continue to issue IOs to
> > > > > load partition table, make it blocked, and lead to nvme_tcp_error_recovery_work()
> > > > > hang in flush_work(&ctrl->scan_work).
> > > > >
> > > > > After analyzation, we found that there are mainly two case:
> > > > > 1. Since ctrl is freeze, scan_work hang in __bio_queue_enter() while it issue
> > > > > new IO to load partition table.
> > > >
> > > > Yeah, nvme freeze usage is fragile, and I suggested to move
> > > > nvme_start_freeze() from nvme_tcp_teardown_io_queues to
> > > > nvme_tcp_configure_io_queues(), such as the posted change on rdma:
> > > >
> > > > https://lore.kernel.org/linux-block/CAHj4cs-4gQHnp5aiekvJmb6o8qAcb6nLV61uOGFiisCzM49_dg@mail.gmail.com/T/#ma0d6bbfaa0c8c1be79738ff86a2fdcf7582e06b0
> > >
> > > While drive reconnecting, I think we should freeze ctrl or quiescing queue,
> > > otherwise nvme_fail_nonready_command()may return BLK_STS_RESOURCE,
> > > and the IOs may retry frequently. So I think we may better freeze ctrl
> > > while entering
> > > error_recovery/reconnect, but need to unfreeze it while exit.
> >
> > quiescing is always done in error handling, and freeze is actually
> > not a must, and it is easier to cause race by calling freeze & unfreeze
> > from different contexts.
>
> I think if we donot freeze ctrl, as the IO already submit (just queue
> to hctx->dispatch) and may
> pending for a long time, it may trigger new hang task issue, but
> freeze ctrl may can avoid these
> hang task.

How can the freeze make a difference? If the driver/device can't move on, any
request is stuck, so the IO path waits either in submit_bio() or in the upper
layer after returning from submit_bio().


Thanks,
Ming


2023-06-09 03:54:03

by 许春光

Subject: Re: [RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting

Ming Lei <[email protected]> wrote on Thu, Jun 8, 2023 at 21:51:
>
> On Thu, Jun 08, 2023 at 10:48:50AM +0800, 许春光 wrote:
> > Ming Lei <[email protected]> wrote on Thu, Jun 8, 2023 at 08:56:
> > >
> > > On Wed, Jun 07, 2023 at 12:09:17PM +0800, 许春光 wrote:
> > > > Hi Ming:
> > > >
> > > > Ming Lei <[email protected]> wrote on Tue, Jun 6, 2023 at 23:15:
> > > > >
> > > > > Hello Chunguang,
> > > > >
> > > > > On Mon, May 29, 2023 at 06:59:22PM +0800, brookxu.cn wrote:
> > > > > > From: Chunguang Xu <[email protected]>
> > > > > >
> > > > > > We found that nvme_remove_namespaces() may hang in flush_work(&ctrl->scan_work)
> > > > > > while removing ctrl. The root cause may due to the state of ctrl changed to
> > > > > > NVME_CTRL_DELETING while removing ctrl , which intterupt nvme_tcp_error_recovery_work()/
> > > > > > nvme_reset_ctrl_work()/nvme_tcp_reconnect_or_remove(). At this time, ctrl is
> > > > >
> > > > > I didn't dig into ctrl state check in these error handler yet, but error
> > > > > handling is supposed to provide forward progress for any controller state.
> > > > >
> > > > > Can you explain a bit how switching to DELETING interrupts the above
> > > > > error handling and breaks the forward progress guarantee?
> > > >
> > > > Here we freezed ctrl, if ctrl state has changed to DELETING or
> > > > DELETING_NIO(by nvme disconnect), we will break up and lease ctrl
> > > > freeze, so nvme_remove_namespaces() hang.
> > > >
> > > > static void nvme_tcp_error_recovery_work(struct work_struct *work)
> > > > {
> > > > ...
> > > > if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) {
> > > > /* state change failure is ok if we started ctrl delete */
> > > > WARN_ON_ONCE(ctrl->state != NVME_CTRL_DELETING &&
> > > > ctrl->state != NVME_CTRL_DELETING_NOIO);
> > > > return;
> > > > }
> > > >
> > > > nvme_tcp_reconnect_or_remove(ctrl);
> > > > }
> > > >
> > > >
> > > > Another path, we will check ctrl state while reconnecting, if it changes to
> > > > DELETING or DELETING_NIO, we will break up and lease ctrl freeze and
> > > > queue quiescing (through reset path), as a result Hang occurs.
> > > >
> > > > static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
> > > > {
> > > > /* If we are resetting/deleting then do nothing */
> > > > if (ctrl->state != NVME_CTRL_CONNECTING) {
> > > > WARN_ON_ONCE(ctrl->state == NVME_CTRL_NEW ||
> > > > ctrl->state == NVME_CTRL_LIVE);
> > > > return;
> > > > }
> > > > ...
> > > > }
> > > >
> > > > > > freezed and queue is quiescing . Since scan_work may continue to issue IOs to
> > > > > > load partition table, make it blocked, and lead to nvme_tcp_error_recovery_work()
> > > > > > hang in flush_work(&ctrl->scan_work).
> > > > > >
> > > > > > After analyzation, we found that there are mainly two case:
> > > > > > 1. Since ctrl is freeze, scan_work hang in __bio_queue_enter() while it issue
> > > > > > new IO to load partition table.
> > > > >
> > > > > Yeah, nvme freeze usage is fragile, and I suggested to move
> > > > > nvme_start_freeze() from nvme_tcp_teardown_io_queues to
> > > > > nvme_tcp_configure_io_queues(), such as the posted change on rdma:
> > > > >
> > > > > https://lore.kernel.org/linux-block/CAHj4cs-4gQHnp5aiekvJmb6o8qAcb6nLV61uOGFiisCzM49_dg@mail.gmail.com/T/#ma0d6bbfaa0c8c1be79738ff86a2fdcf7582e06b0
> > > >
> > > > While drive reconnecting, I think we should freeze ctrl or quiescing queue,
> > > > otherwise nvme_fail_nonready_command()may return BLK_STS_RESOURCE,
> > > > and the IOs may retry frequently. So I think we may better freeze ctrl
> > > > while entering
> > > > error_recovery/reconnect, but need to unfreeze it while exit.
> > >
> > > quiescing is always done in error handling, and freeze is actually
> > > not a must, and it is easier to cause race by calling freeze & unfreeze
> > > from different contexts.
> >
> > I think if we donot freeze ctrl, as the IO already submit (just queue
> > to hctx->dispatch) and may
> > pending for a long time, it may trigger new hang task issue, but
> > freeze ctrl may can avoid these
> > hang task.
>
> How can the freeze make the difference? If driver/device can't move on,
> any request is stuck, so the IO path waits in either submit_bio() or
> upper layer after returning from submit_bio().
>

Now error_recovery and reset ctrl are handled somewhat differently:
1. error_recovery will freeze the controller, but it later unquiesces the
queues to fast-fail pending IO, otherwise that IO may cause task hangs during
the reconnection. So when the error_recovery work is interrupted, the ctrl is
left frozen while the queues are unquiesced.

Thinking about it more carefully, new IO will still hang entering the queue,
so this solution alone does not work well; if we try to remove the freeze from
nvme_tcp_teardown_io_queues(), I think we also need to refactor error_recovery.

2. Reset ctrl will freeze the controller and quiesce the queues at the same
time; when the reset is interrupted, the ctrl is left frozen and the queues
quiesced (rough outline below).
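
A rough outline of the two paths as they stand (simplified from the tcp driver
flow described above; not the exact code):

/* 1. error_recovery: freezes, then unquiesces again to fast-fail pending IO */
static void error_recovery_outline(struct nvme_ctrl *ctrl)
{
	nvme_tcp_teardown_io_queues(ctrl, false);  /* start_freeze + quiesce */
	nvme_unquiesce_io_queues(ctrl);            /* fast fail pending IO */
	/* if the ctrl has moved to DELETING we return here: ctrl stays frozen */
	nvme_tcp_reconnect_or_remove(ctrl);
}

/* 2. reset: tears down and leaves both the freeze and the quiesce in place */
static void reset_outline(struct nvme_ctrl *ctrl)
{
	nvme_tcp_teardown_io_queues(ctrl, false);  /* start_freeze + quiesce */
	/* if the ctrl has moved to DELETING we return here: frozen + quiesced */
	nvme_tcp_reconnect_or_remove(ctrl);
}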

I may have got your point; the proposal at
https://lore.kernel.org/linux-block/CAHj4cs-4gQHnp5aiekvJmb6o8qAcb6nLV61uOGFiisCzM49_dg@mail.gmail.com/T/#ma0d6bbfaa0c8c1be79738ff86a2fdcf7582e06b0
seems better.

> Thanks,
> Ming
>

2023-06-09 04:00:30

by 许春光

Subject: Re: [RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting

Ming Lei <[email protected]> wrote on Thu, Jun 8, 2023 at 21:51:
>
> On Thu, Jun 08, 2023 at 10:48:50AM +0800, 许春光 wrote:
> > Ming Lei <[email protected]> wrote on Thu, Jun 8, 2023 at 08:56:
> > >
> > > On Wed, Jun 07, 2023 at 12:09:17PM +0800, 许春光 wrote:
> > > > Hi Ming:
> > > >
> > > > Ming Lei <[email protected]> wrote on Tue, Jun 6, 2023 at 23:15:
> > > > >
> > > > > Hello Chunguang,
> > > > >
> > > > > On Mon, May 29, 2023 at 06:59:22PM +0800, brookxu.cn wrote:
> > > > > > From: Chunguang Xu <[email protected]>
> > > > > >
> > > > > > We found that nvme_remove_namespaces() may hang in flush_work(&ctrl->scan_work)
> > > > > > while removing ctrl. The root cause may due to the state of ctrl changed to
> > > > > > NVME_CTRL_DELETING while removing ctrl , which intterupt nvme_tcp_error_recovery_work()/
> > > > > > nvme_reset_ctrl_work()/nvme_tcp_reconnect_or_remove(). At this time, ctrl is
> > > > >
> > > > > I didn't dig into ctrl state check in these error handler yet, but error
> > > > > handling is supposed to provide forward progress for any controller state.
> > > > >
> > > > > Can you explain a bit how switching to DELETING interrupts the above
> > > > > error handling and breaks the forward progress guarantee?
> > > >
> > > > Here we freezed ctrl, if ctrl state has changed to DELETING or
> > > > DELETING_NIO(by nvme disconnect), we will break up and lease ctrl
> > > > freeze, so nvme_remove_namespaces() hang.
> > > >
> > > > static void nvme_tcp_error_recovery_work(struct work_struct *work)
> > > > {
> > > > ...
> > > > if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) {
> > > > /* state change failure is ok if we started ctrl delete */
> > > > WARN_ON_ONCE(ctrl->state != NVME_CTRL_DELETING &&
> > > > ctrl->state != NVME_CTRL_DELETING_NOIO);
> > > > return;
> > > > }
> > > >
> > > > nvme_tcp_reconnect_or_remove(ctrl);
> > > > }
> > > >
> > > >
> > > > Another path, we will check ctrl state while reconnecting, if it changes to
> > > > DELETING or DELETING_NIO, we will break up and lease ctrl freeze and
> > > > queue quiescing (through reset path), as a result Hang occurs.
> > > >
> > > > static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
> > > > {
> > > > /* If we are resetting/deleting then do nothing */
> > > > if (ctrl->state != NVME_CTRL_CONNECTING) {
> > > > WARN_ON_ONCE(ctrl->state == NVME_CTRL_NEW ||
> > > > ctrl->state == NVME_CTRL_LIVE);
> > > > return;
> > > > }
> > > > ...
> > > > }
> > > >
> > > > > > freezed and queue is quiescing . Since scan_work may continue to issue IOs to
> > > > > > load partition table, make it blocked, and lead to nvme_tcp_error_recovery_work()
> > > > > > hang in flush_work(&ctrl->scan_work).
> > > > > >
> > > > > > After analyzation, we found that there are mainly two case:
> > > > > > 1. Since ctrl is freeze, scan_work hang in __bio_queue_enter() while it issue
> > > > > > new IO to load partition table.
> > > > >
> > > > > Yeah, nvme freeze usage is fragile, and I suggested to move
> > > > > nvme_start_freeze() from nvme_tcp_teardown_io_queues to
> > > > > nvme_tcp_configure_io_queues(), such as the posted change on rdma:
> > > > >
> > > > > https://lore.kernel.org/linux-block/CAHj4cs-4gQHnp5aiekvJmb6o8qAcb6nLV61uOGFiisCzM49_dg@mail.gmail.com/T/#ma0d6bbfaa0c8c1be79738ff86a2fdcf7582e06b0
> > > >
> > > > While drive reconnecting, I think we should freeze ctrl or quiescing queue,
> > > > otherwise nvme_fail_nonready_command()may return BLK_STS_RESOURCE,
> > > > and the IOs may retry frequently. So I think we may better freeze ctrl
> > > > while entering
> > > > error_recovery/reconnect, but need to unfreeze it while exit.
> > >
> > > quiescing is always done in error handling, and freeze is actually
> > > not a must, and it is easier to cause race by calling freeze & unfreeze
> > > from different contexts.
> >
> > I think if we donot freeze ctrl, as the IO already submit (just queue
> > to hctx->dispatch) and may
> > pending for a long time, it may trigger new hang task issue, but
> > freeze ctrl may can avoid these
> > hang task.
>
> How can the freeze make the difference? If driver/device can't move on,
> any request is stuck, so the IO path waits in either submit_bio() or
> upper layer after returning from submit_bio().
>

Now error_recovery and reset ctrl are handled somewhat differently:
1. error_recovery will freeze the controller, but it later unquiesces the
queues to fast-fail pending IO, otherwise that IO may cause task hangs during
the reconnection. So when the error_recovery work is interrupted, the ctrl is
left frozen while the queues are unquiesced.

Thinking about it more carefully, new IO will still hang entering the queue,
so this solution alone does not work well, and I think we also need to
refactor the logic of error_recovery.

2. Reset ctrl will freeze the controller and quiesce the queues at the same
time; when the reset is interrupted, the ctrl is left frozen and the queues
quiesced.

I may have got your point; the proposal at
https://lore.kernel.org/linux-block/CAHj4cs-4gQHnp5aiekvJmb6o8qAcb6nLV61uOGFiisCzM49_dg@mail.gmail.com/T/#ma0d6bbfaa0c8c1be79738ff86a2fdcf7582e06b0
seems better.

> Thanks,
> Ming
>

2023-06-11 08:20:34

by Sagi Grimberg

Subject: Re: [RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting


>> Hi Ming:
>>
>>> Ming Lei <[email protected]> wrote on Tue, Jun 6, 2023 at 23:15:
>>>
>>> Hello Chunguang,
>>>
>>> On Mon, May 29, 2023 at 06:59:22PM +0800, brookxu.cn wrote:
>>>> From: Chunguang Xu <[email protected]>
>>>>
>>>> We found that nvme_remove_namespaces() may hang in flush_work(&ctrl->scan_work)
>>>> while removing ctrl. The root cause may due to the state of ctrl changed to
>>>> NVME_CTRL_DELETING while removing ctrl , which intterupt nvme_tcp_error_recovery_work()/
>>>> nvme_reset_ctrl_work()/nvme_tcp_reconnect_or_remove(). At this time, ctrl is
>>>
>>> I didn't dig into ctrl state check in these error handler yet, but error
>>> handling is supposed to provide forward progress for any controller state.
>>>
>>> Can you explain a bit how switching to DELETING interrupts the above
>>> error handling and breaks the forward progress guarantee?
>>
>> Here we freezed ctrl, if ctrl state has changed to DELETING or
>> DELETING_NIO(by nvme disconnect), we will break up and lease ctrl
>> freeze, so nvme_remove_namespaces() hang.
>>
>> static void nvme_tcp_error_recovery_work(struct work_struct *work)
>> {
>> ...
>> if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) {
>> /* state change failure is ok if we started ctrl delete */
>> WARN_ON_ONCE(ctrl->state != NVME_CTRL_DELETING &&
>> ctrl->state != NVME_CTRL_DELETING_NOIO);
>> return;
>> }
>>
>> nvme_tcp_reconnect_or_remove(ctrl);
>> }
>>
>>
>> Another path, we will check ctrl state while reconnecting, if it changes to
>> DELETING or DELETING_NIO, we will break up and lease ctrl freeze and
>> queue quiescing (through reset path), as a result Hang occurs.
>>
>> static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
>> {
>> /* If we are resetting/deleting then do nothing */
>> if (ctrl->state != NVME_CTRL_CONNECTING) {
>> WARN_ON_ONCE(ctrl->state == NVME_CTRL_NEW ||
>> ctrl->state == NVME_CTRL_LIVE);
>> return;
>> }
>> ...
>> }
>>
>>>> freezed and queue is quiescing . Since scan_work may continue to issue IOs to
>>>> load partition table, make it blocked, and lead to nvme_tcp_error_recovery_work()
>>>> hang in flush_work(&ctrl->scan_work).
>>>>
>>>> After analyzation, we found that there are mainly two case:
>>>> 1. Since ctrl is freeze, scan_work hang in __bio_queue_enter() while it issue
>>>> new IO to load partition table.
>>>
>>> Yeah, nvme freeze usage is fragile, and I suggested to move
>>> nvme_start_freeze() from nvme_tcp_teardown_io_queues to
>>> nvme_tcp_configure_io_queues(), such as the posted change on rdma:
>>>
>>> https://lore.kernel.org/linux-block/CAHj4cs-4gQHnp5aiekvJmb6o8qAcb6nLV61uOGFiisCzM49_dg@mail.gmail.com/T/#ma0d6bbfaa0c8c1be79738ff86a2fdcf7582e06b0
>>
>> While drive reconnecting, I think we should freeze ctrl or quiescing queue,
>> otherwise nvme_fail_nonready_command()may return BLK_STS_RESOURCE,
>> and the IOs may retry frequently. So I think we may better freeze ctrl
>> while entering
>> error_recovery/reconnect, but need to unfreeze it while exit.
>
> quiescing is always done in error handling, and freeze is actually
> not a must, and it is easier to cause race by calling freeze & unfreeze
> from different contexts.
>
> But yes, unquiesce should have been done after exiting error handling, or
> simply do it in nvme_unquiesce_io_queues().
>
> And the following patch should cover all these hangs:
>

Ming, are you sending a formal patchset for this?

>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 3ec38e2b9173..83d3818fc60b 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4692,6 +4692,9 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
> */
> nvme_mpath_clear_ctrl_paths(ctrl);
>
> + /* unquiesce io queues so scan work won't hang */
> + nvme_unquiesce_io_queues(ctrl);

What guarantees that the queues won't be quiesced right after this
by the transport?

I'm still unclear why this affects the scan_work?

> +
> /* prevent racing with ns scanning */
> flush_work(&ctrl->scan_work);
>
> @@ -4701,10 +4704,8 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
> * removing the namespaces' disks; fail all the queues now to avoid
> * potentially having to clean up the failed sync later.
> */
> - if (ctrl->state == NVME_CTRL_DEAD) {
> + if (ctrl->state == NVME_CTRL_DEAD)
> nvme_mark_namespaces_dead(ctrl);
> - nvme_unquiesce_io_queues(ctrl);
> - }
>
> /* this is a no-op when called from the controller reset handler */
> nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING_NOIO);
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 492f319ebdf3..5d775b76baca 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2578,14 +2578,15 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
> dead = nvme_pci_ctrl_is_dead(dev);
> if (dev->ctrl.state == NVME_CTRL_LIVE ||
> dev->ctrl.state == NVME_CTRL_RESETTING) {
> - if (pci_is_enabled(pdev))
> - nvme_start_freeze(&dev->ctrl);
> /*
> * Give the controller a chance to complete all entered requests
> * if doing a safe shutdown.
> */
> - if (!dead && shutdown)
> + if (!dead && shutdown & pci_is_enabled(pdev)) {
> + nvme_start_freeze(&dev->ctrl);
> nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
> + nvme_unfreeze(&dev->ctrl);
> + }
> }
>
> nvme_quiesce_io_queues(&dev->ctrl);
> @@ -2740,6 +2741,7 @@ static void nvme_reset_work(struct work_struct *work)
> * controller around but remove all namespaces.
> */
> if (dev->online_queues > 1) {
> + nvme_start_freeze(&dev->ctrl);
> nvme_unquiesce_io_queues(&dev->ctrl);
> nvme_wait_freeze(&dev->ctrl);
> nvme_pci_update_nr_queues(dev);
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 0eb79696fb73..354cce8853c1 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -918,6 +918,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
> goto out_cleanup_tagset;
>
> if (!new) {
> + nvme_start_freeze(&ctrl->ctrl);
> nvme_unquiesce_io_queues(&ctrl->ctrl);
> if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
> /*
> @@ -926,6 +927,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
> * to be safe.
> */
> ret = -ENODEV;
> + nvme_unfreeze(&ctrl->ctrl);
> goto out_wait_freeze_timed_out;
> }
> blk_mq_update_nr_hw_queues(ctrl->ctrl.tagset,
> @@ -975,7 +977,6 @@ static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl,
> bool remove)
> {
> if (ctrl->ctrl.queue_count > 1) {
> - nvme_start_freeze(&ctrl->ctrl);
> nvme_quiesce_io_queues(&ctrl->ctrl);
> nvme_sync_io_queues(&ctrl->ctrl);
> nvme_rdma_stop_io_queues(ctrl);
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index bf0230442d57..5ae08e9cb16d 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1909,6 +1909,7 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
> goto out_cleanup_connect_q;
>
> if (!new) {
> + nvme_start_freeze(ctrl);
> nvme_unquiesce_io_queues(ctrl);
> if (!nvme_wait_freeze_timeout(ctrl, NVME_IO_TIMEOUT)) {
> /*
> @@ -1917,6 +1918,7 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
> * to be safe.
> */
> ret = -ENODEV;
> + nvme_unfreeze(ctrl);
> goto out_wait_freeze_timed_out;
> }
> blk_mq_update_nr_hw_queues(ctrl->tagset,
> @@ -2021,7 +2023,6 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
> if (ctrl->queue_count <= 1)
> return;
> nvme_quiesce_admin_queue(ctrl);
> - nvme_start_freeze(ctrl);
> nvme_quiesce_io_queues(ctrl);
> nvme_sync_io_queues(ctrl);
> nvme_tcp_stop_io_queues(ctrl);
>
>
>
>
> Thanks
> Ming
>

2023-06-12 01:42:06

by Ming Lei

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting

On Sun, Jun 11, 2023 at 11:11:06AM +0300, Sagi Grimberg wrote:
>
> > > Hi Ming:
> > >
> > > Ming Lei <[email protected]> wrote on Tue, Jun 6, 2023 at 23:15:
> > > >
> > > > Hello Chunguang,
> > > >
> > > > On Mon, May 29, 2023 at 06:59:22PM +0800, brookxu.cn wrote:
> > > > > From: Chunguang Xu <[email protected]>
> > > > >
> > > > > We found that nvme_remove_namespaces() may hang in flush_work(&ctrl->scan_work)
> > > > > while removing ctrl. The root cause may due to the state of ctrl changed to
> > > > > NVME_CTRL_DELETING while removing ctrl , which intterupt nvme_tcp_error_recovery_work()/
> > > > > nvme_reset_ctrl_work()/nvme_tcp_reconnect_or_remove(). At this time, ctrl is
> > > >
> > > > I didn't dig into ctrl state check in these error handler yet, but error
> > > > handling is supposed to provide forward progress for any controller state.
> > > >
> > > > Can you explain a bit how switching to DELETING interrupts the above
> > > > error handling and breaks the forward progress guarantee?
> > >
> > > Here we froze the ctrl; if the ctrl state has changed to DELETING or
> > > DELETING_NOIO (by nvme disconnect), we bail out and leave the ctrl
> > > frozen, so nvme_remove_namespaces() hangs.
> > >
> > > static void nvme_tcp_error_recovery_work(struct work_struct *work)
> > > {
> > > ...
> > > if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) {
> > > /* state change failure is ok if we started ctrl delete */
> > > WARN_ON_ONCE(ctrl->state != NVME_CTRL_DELETING &&
> > > ctrl->state != NVME_CTRL_DELETING_NOIO);
> > > return;
> > > }
> > >
> > > nvme_tcp_reconnect_or_remove(ctrl);
> > > }
> > >
> > >
> > > On another path, we check the ctrl state while reconnecting; if it has
> > > changed to DELETING or DELETING_NOIO, we bail out and leave the ctrl
> > > frozen and the queues quiesced (through the reset path), and as a
> > > result a hang occurs.
> > >
> > > static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
> > > {
> > > /* If we are resetting/deleting then do nothing */
> > > if (ctrl->state != NVME_CTRL_CONNECTING) {
> > > WARN_ON_ONCE(ctrl->state == NVME_CTRL_NEW ||
> > > ctrl->state == NVME_CTRL_LIVE);
> > > return;
> > > }
> > > ...
> > > }
> > >
> > > > > freezed and queue is quiescing . Since scan_work may continue to issue IOs to
> > > > > load partition table, make it blocked, and lead to nvme_tcp_error_recovery_work()
> > > > > hang in flush_work(&ctrl->scan_work).
> > > > >
> > > > > After analyzation, we found that there are mainly two case:
> > > > > 1. Since ctrl is freeze, scan_work hang in __bio_queue_enter() while it issue
> > > > > new IO to load partition table.
> > > >
> > > > Yeah, nvme freeze usage is fragile, and I suggested to move
> > > > nvme_start_freeze() from nvme_tcp_teardown_io_queues to
> > > > nvme_tcp_configure_io_queues(), such as the posted change on rdma:
> > > >
> > > > https://lore.kernel.org/linux-block/CAHj4cs-4gQHnp5aiekvJmb6o8qAcb6nLV61uOGFiisCzM49_dg@mail.gmail.com/T/#ma0d6bbfaa0c8c1be79738ff86a2fdcf7582e06b0
> > >
> > > While the drive is reconnecting, I think we should freeze the ctrl or
> > > quiesce the queues, otherwise nvme_fail_nonready_command() may return
> > > BLK_STS_RESOURCE and the IOs may retry frequently. So I think we had
> > > better freeze the ctrl while entering error_recovery/reconnect, but we
> > > need to unfreeze it on exit.
> >
> > quiescing is always done in error handling, and freeze is actually
> > not a must; it is easy to cause races by calling freeze & unfreeze
> > from different contexts.
> >
> > But yes, unquiesce should have been done after exiting error handling, or
> > we could simply do it in nvme_unquiesce_io_queues().
> >
> > And the following patch should cover all these hangs:
> >
>
> Ming, are you sending a formal patchset for this?

Not yet, will do it.

>
> >
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 3ec38e2b9173..83d3818fc60b 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -4692,6 +4692,9 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
> > */
> > nvme_mpath_clear_ctrl_paths(ctrl);
> > + /* unquiesce io queues so scan work won't hang */
> > + nvme_unquiesce_io_queues(ctrl);
>
> What guarantees that the queues won't be quiesced right after this
> by the transport?

Please see nvme_change_ctrl_state(): once the controller state is
DELETING, new NVME_CTRL_RESETTING/NVME_CTRL_CONNECTING transitions
can't be entered any more.
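
Roughly, the transitions that matter here look like this (a simplified
sketch of the nvme_change_ctrl_state() checks, not the literal upstream
code):

static bool state_change_allowed(enum nvme_ctrl_state old_state,
				 enum nvme_ctrl_state new_state)
{
	switch (new_state) {
	case NVME_CTRL_RESETTING:
		/* reachable only from NEW or LIVE */
		return old_state == NVME_CTRL_NEW ||
		       old_state == NVME_CTRL_LIVE;
	case NVME_CTRL_CONNECTING:
		/* reachable only from NEW or RESETTING */
		return old_state == NVME_CTRL_NEW ||
		       old_state == NVME_CTRL_RESETTING;
	default:
		/* other transitions omitted in this sketch */
		return false;
	}
}

So once the controller has reached DELETING, neither RESETTING nor
CONNECTING (and hence no new quiesce from the error handlers) can be
entered.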

>
> I'm still unclear why this affects the scan_work?

As Chunguang mentioned, if error recovery is terminated by nvme deletion,
the controller can be left in the quiesced state, so in-queue IOs can't
move on; meanwhile, new error recovery can't be started successfully
because the controller state is NVME_CTRL_DELETING, so any pending IOs
(including those from the scan context) can't be completed.
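
Put as a timeline (illustration only, not real code):

   error recovery / reset              nvme disconnect
   ----------------------              -----------------------------
   quiesce (+ freeze) io queues
   state -> CONNECTING fails           state -> DELETING
   bail out; queues stay
   quiesced/frozen
                                       nvme_remove_namespaces()
                                         flush_work(&ctrl->scan_work)
   scan_work:
     submit_bio() blocks in
     __bio_queue_enter()               <- flush_work() never returns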




Thanks,
Ming


2023-06-12 08:19:12

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting


>>>> Hi Ming:
>>>>
>>>> Ming Lei <[email protected]> wrote on Tue, Jun 6, 2023 at 23:15:
>>>>>
>>>>> Hello Chunguang,
>>>>>
>>>>> On Mon, May 29, 2023 at 06:59:22PM +0800, brookxu.cn wrote:
>>>>>> From: Chunguang Xu <[email protected]>
>>>>>>
>>>>>> We found that nvme_remove_namespaces() may hang in flush_work(&ctrl->scan_work)
>>>>>> while removing ctrl. The root cause may due to the state of ctrl changed to
>>>>>> NVME_CTRL_DELETING while removing ctrl , which intterupt nvme_tcp_error_recovery_work()/
>>>>>> nvme_reset_ctrl_work()/nvme_tcp_reconnect_or_remove(). At this time, ctrl is
>>>>>
>>>>> I didn't dig into ctrl state check in these error handler yet, but error
>>>>> handling is supposed to provide forward progress for any controller state.
>>>>>
>>>>> Can you explain a bit how switching to DELETING interrupts the above
>>>>> error handling and breaks the forward progress guarantee?
>>>>
>>>> Here we froze the ctrl; if the ctrl state has changed to DELETING or
>>>> DELETING_NOIO (by nvme disconnect), we bail out and leave the ctrl
>>>> frozen, so nvme_remove_namespaces() hangs.
>>>>
>>>> static void nvme_tcp_error_recovery_work(struct work_struct *work)
>>>> {
>>>> ...
>>>> if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) {
>>>> /* state change failure is ok if we started ctrl delete */
>>>> WARN_ON_ONCE(ctrl->state != NVME_CTRL_DELETING &&
>>>> ctrl->state != NVME_CTRL_DELETING_NOIO);
>>>> return;
>>>> }
>>>>
>>>> nvme_tcp_reconnect_or_remove(ctrl);
>>>> }
>>>>
>>>>
>>>> On another path, we check the ctrl state while reconnecting; if it has
>>>> changed to DELETING or DELETING_NOIO, we bail out and leave the ctrl
>>>> frozen and the queues quiesced (through the reset path), and as a
>>>> result a hang occurs.
>>>>
>>>> static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
>>>> {
>>>> /* If we are resetting/deleting then do nothing */
>>>> if (ctrl->state != NVME_CTRL_CONNECTING) {
>>>> WARN_ON_ONCE(ctrl->state == NVME_CTRL_NEW ||
>>>> ctrl->state == NVME_CTRL_LIVE);
>>>> return;
>>>> }
>>>> ...
>>>> }
>>>>
>>>>>> freezed and queue is quiescing . Since scan_work may continue to issue IOs to
>>>>>> load partition table, make it blocked, and lead to nvme_tcp_error_recovery_work()
>>>>>> hang in flush_work(&ctrl->scan_work).
>>>>>>
>>>>>> After analyzation, we found that there are mainly two case:
>>>>>> 1. Since ctrl is freeze, scan_work hang in __bio_queue_enter() while it issue
>>>>>> new IO to load partition table.
>>>>>
>>>>> Yeah, nvme freeze usage is fragile, and I suggested to move
>>>>> nvme_start_freeze() from nvme_tcp_teardown_io_queues to
>>>>> nvme_tcp_configure_io_queues(), such as the posted change on rdma:
>>>>>
>>>>> https://lore.kernel.org/linux-block/CAHj4cs-4gQHnp5aiekvJmb6o8qAcb6nLV61uOGFiisCzM49_dg@mail.gmail.com/T/#ma0d6bbfaa0c8c1be79738ff86a2fdcf7582e06b0
>>>>
>>>> While the drive is reconnecting, I think we should freeze the ctrl or
>>>> quiesce the queues, otherwise nvme_fail_nonready_command() may return
>>>> BLK_STS_RESOURCE and the IOs may retry frequently. So I think we had
>>>> better freeze the ctrl while entering error_recovery/reconnect, but we
>>>> need to unfreeze it on exit.
>>>
>>> quiescing is always done in error handling, and freeze is actually
>>> not a must; it is easy to cause races by calling freeze & unfreeze
>>> from different contexts.
>>>
>>> But yes, unquiesce should have been done after exiting error handling, or
>>> we could simply do it in nvme_unquiesce_io_queues().
>>>
>>> And the following patch should cover all these hangs:
>>>
>>
>> Ming, are you sending a formal patchset for this?
>
> Not yet, will do it.

I'd like it to make the next pull request going out this week...

>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index 3ec38e2b9173..83d3818fc60b 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -4692,6 +4692,9 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
>>> */
>>> nvme_mpath_clear_ctrl_paths(ctrl);
>>> + /* unquiesce io queues so scan work won't hang */
>>> + nvme_unquiesce_io_queues(ctrl);
>>
>> What guarantees that the queues won't be quiesced right after this
>> by the transport?
>
> Please see nvme_change_ctrl_state(): once the controller state is
> DELETING, new NVME_CTRL_RESETTING/NVME_CTRL_CONNECTING transitions
> can't be entered any more.

Yes, this relies on the fact that nvme_remove_namespaces() is only
called after the DELETING state has been set. OK.
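
i.e. the delete path takes roughly this shape (a simplified sketch of
the assumed ordering, not the literal upstream code):

static void delete_ctrl_sketch(struct nvme_ctrl *ctrl)
{
	/* 1. The controller is marked DELETING first ... */
	nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING);

	/*
	 * 2. ... so by the time nvme_remove_namespaces() runs, no new
	 *    RESETTING/CONNECTING transition can begin, and the unquiesce
	 *    it performs cannot be undone by the transport afterwards.
	 */
	nvme_remove_namespaces(ctrl);
}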

>> I'm still unclear why this affects the scan_work?
>
> As Chunguang mentioned, if error recovery is terminated by nvme deletion,
> the controller can be left in the quiesced state, so in-queue IOs can't
> move on; meanwhile, new error recovery can't be started successfully
> because the controller state is NVME_CTRL_DELETING, so any pending IOs
> (including those from the scan context) can't be completed.

Yes. Please separate this into individual patches when submitting, though.

2023-06-12 08:58:50

by 许春光

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting

Ming Lei <[email protected]> wrote on Mon, Jun 12, 2023 at 09:33:
>
> On Sun, Jun 11, 2023 at 11:11:06AM +0300, Sagi Grimberg wrote:
> >
> > > > Hi Ming:
> > > >
> > > > Ming Lei <[email protected]> wrote on Tue, Jun 6, 2023 at 23:15:
> > > > >
> > > > > Hello Chunguang,
> > > > >
> > > > > On Mon, May 29, 2023 at 06:59:22PM +0800, brookxu.cn wrote:
> > > > > > From: Chunguang Xu <[email protected]>
> > > > > >
> > > > > > We found that nvme_remove_namespaces() may hang in flush_work(&ctrl->scan_work)
> > > > > > while removing ctrl. The root cause may due to the state of ctrl changed to
> > > > > > NVME_CTRL_DELETING while removing ctrl , which intterupt nvme_tcp_error_recovery_work()/
> > > > > > nvme_reset_ctrl_work()/nvme_tcp_reconnect_or_remove(). At this time, ctrl is
> > > > >
> > > > > I didn't dig into ctrl state check in these error handler yet, but error
> > > > > handling is supposed to provide forward progress for any controller state.
> > > > >
> > > > > Can you explain a bit how switching to DELETING interrupts the above
> > > > > error handling and breaks the forward progress guarantee?
> > > >
> > > > Here we froze the ctrl; if the ctrl state has changed to DELETING or
> > > > DELETING_NOIO (by nvme disconnect), we bail out and leave the ctrl
> > > > frozen, so nvme_remove_namespaces() hangs.
> > > >
> > > > static void nvme_tcp_error_recovery_work(struct work_struct *work)
> > > > {
> > > > ...
> > > > if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) {
> > > > /* state change failure is ok if we started ctrl delete */
> > > > WARN_ON_ONCE(ctrl->state != NVME_CTRL_DELETING &&
> > > > ctrl->state != NVME_CTRL_DELETING_NOIO);
> > > > return;
> > > > }
> > > >
> > > > nvme_tcp_reconnect_or_remove(ctrl);
> > > > }
> > > >
> > > >
> > > > On another path, we check the ctrl state while reconnecting; if it has
> > > > changed to DELETING or DELETING_NOIO, we bail out and leave the ctrl
> > > > frozen and the queues quiesced (through the reset path), and as a
> > > > result a hang occurs.
> > > >
> > > > static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
> > > > {
> > > > /* If we are resetting/deleting then do nothing */
> > > > if (ctrl->state != NVME_CTRL_CONNECTING) {
> > > > WARN_ON_ONCE(ctrl->state == NVME_CTRL_NEW ||
> > > > ctrl->state == NVME_CTRL_LIVE);
> > > > return;
> > > > }
> > > > ...
> > > > }
> > > >
> > > > > > freezed and queue is quiescing . Since scan_work may continue to issue IOs to
> > > > > > load partition table, make it blocked, and lead to nvme_tcp_error_recovery_work()
> > > > > > hang in flush_work(&ctrl->scan_work).
> > > > > >
> > > > > > After analyzation, we found that there are mainly two case:
> > > > > > 1. Since ctrl is freeze, scan_work hang in __bio_queue_enter() while it issue
> > > > > > new IO to load partition table.
> > > > >
> > > > > Yeah, nvme freeze usage is fragile, and I suggested to move
> > > > > nvme_start_freeze() from nvme_tcp_teardown_io_queues to
> > > > > nvme_tcp_configure_io_queues(), such as the posted change on rdma:
> > > > >
> > > > > https://lore.kernel.org/linux-block/CAHj4cs-4gQHnp5aiekvJmb6o8qAcb6nLV61uOGFiisCzM49_dg@mail.gmail.com/T/#ma0d6bbfaa0c8c1be79738ff86a2fdcf7582e06b0
> > > >
> > > > While the drive is reconnecting, I think we should freeze the ctrl or
> > > > quiesce the queues, otherwise nvme_fail_nonready_command() may return
> > > > BLK_STS_RESOURCE and the IOs may retry frequently. So I think we had
> > > > better freeze the ctrl while entering error_recovery/reconnect, but we
> > > > need to unfreeze it on exit.
> > >
> > > quiescing is always done in error handling, and freeze is actually
> > > not a must; it is easy to cause races by calling freeze & unfreeze
> > > from different contexts.
> > >
> > > But yes, unquiesce should have been done after exiting error handling, or
> > > we could simply do it in nvme_unquiesce_io_queues().
> > >
> > > And the following patch should cover all these hangs:
> > >
> >
> > Ming, are you sending a formal patchset for this?
>
> Not yet, will do it.

Hi Ming:

Please cc me, thx.

> >
> > >
> > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > > index 3ec38e2b9173..83d3818fc60b 100644
> > > --- a/drivers/nvme/host/core.c
> > > +++ b/drivers/nvme/host/core.c
> > > @@ -4692,6 +4692,9 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
> > > */
> > > nvme_mpath_clear_ctrl_paths(ctrl);
> > > + /* unquiesce io queues so scan work won't hang */
> > > + nvme_unquiesce_io_queues(ctrl);
> >
> > What guarantees that the queues won't be quiesced right after this
> > by the transport?
>
> Please see nvme_change_ctrl_state(): once the controller state is
> DELETING, new NVME_CTRL_RESETTING/NVME_CTRL_CONNECTING transitions
> can't be entered any more.
>
> >
> > I'm still unclear why this affects the scan_work?
>
> As Chunguang mentioned, if error recovery is terminated by nvme deletion,
> the controller can be left in the quiesced state, so in-queue IOs can't
> move on; meanwhile, new error recovery can't be started successfully
> because the controller state is NVME_CTRL_DELETING, so any pending IOs
> (including those from the scan context) can't be completed.
>
>
>
>
> Thanks,
> Ming
>

2023-06-13 01:11:20

by Ming Lei

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting

Hi Sagi,

On Mon, Jun 12, 2023 at 09:36:53AM +0300, Sagi Grimberg wrote:
>
...

> >
> > Not yet, will do it.
>
> Would like it to get to the next pull request going out this week...

The two patches have been sent out:

https://lore.kernel.org/linux-nvme/[email protected]/T/#t

thanks,
Ming