Hi,
I've replaced my 'nvme_start_freeze' patch with the two patches from
James and gave it another test run on top of Ming's 'v2 fix
blk_mq_alloc_request_hctx' series. All looks good.
Thanks,
Daniel
v1:
- https://lore.kernel.org/linux-nvme/[email protected]/
v2:
- https://lore.kernel.org/linux-nvme/[email protected]/
- reviewed tags collected
- added 'update hardware queues' for all transports
- added fix for FC hang in nvme_wait_freeze_timeout
v3:
- dropped 'nvme-fc: Freeze queues before destroying them'
- added James' two patches
Initial cover letter:
this is a followup on the crash I reported in
https://lore.kernel.org/linux-block/[email protected]/
By moving the hardware queue update up, the crash was gone. Unfortunately,
I don't understand why this fixes the crash. The per-cpu access is what
crashes, but I can't see why blk_mq_update_nr_hw_queues() fixes this
problem.
Even though I can't explain why it fixes it, I think it makes sense to
update the hardware queue mapping before we recreate the IO queues.
Thus I avoided saying in the commit message that it fixes anything.
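To illustrate, the reordering in nvme_fc_recreate_io_queues() looks
roughly like this (a sketch from my reading of fc.c, not the exact diff):

	/* tell blk-mq about the new queue count first ... */
	if (prior_ioq_cnt != nr_io_queues) {
		dev_info(ctrl->ctrl.device,
			"reconnect: revising io queue count from %d to %d\n",
			prior_ioq_cnt, nr_io_queues);
		blk_mq_update_nr_hw_queues(&ctrl->tag_set, nr_io_queues);
	}

	/* ... and only then recreate and connect the I/O queues */
	ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
	...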
Also during testing I observed that we hang indefinitely in
blk_mq_freeze_queue_wait(). Again, I can't explain why we get stuck
there, but given that a common pattern for nvme_wait_freeze() is to use
it with a timeout, I think the timeout should be used here too :)
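The pattern I have in mind is the one nvme-rdma/nvme-tcp already use on
reconnect, roughly (quoting from memory, so treat it as a sketch):

	nvme_start_queues(&ctrl->ctrl);
	if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
		/*
		 * If we time out waiting for the freeze we are likely
		 * stuck; fail the reconnect instead of hanging forever.
		 */
		ret = -ENODEV;
		goto out_wait_freeze_timed_out;
	}
	blk_mq_update_nr_hw_queues(ctrl->ctrl.tagset,
			ctrl->ctrl.queue_count - 1);
	nvme_unfreeze(&ctrl->ctrl);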
Anyway, hopefully someone with more understanding of the stack can
explain these problems.
Daniel Wagner (3):
nvme-fc: Update hardware queues before using them
nvme-rdma: Update number of hardware queues before using them
nvme-fc: Wait with a timeout for queue to freeze
Hannes Reinecke (1):
nvme-tcp: Update number of hardware queues before using them
James Smart (2):
nvme-fc: avoid race between time out and tear down
nvme-fc: fix controller reset hang during traffic
drivers/nvme/host/fc.c | 28 +++++++++++++++++++---------
drivers/nvme/host/rdma.c | 13 ++++++-------
drivers/nvme/host/tcp.c | 14 ++++++--------
3 files changed, 31 insertions(+), 24 deletions(-)
--
2.29.2
From: James Smart <[email protected]>
commit fe35ec58f0d3 ("block: update hctx map when use multiple maps")
exposed an issue where we may hang trying to wait for queue freeze
during I/O. We call blk_mq_update_nr_hw_queues which may attempt to freeze
the queue. However, we never started a queue freeze when starting the
reset, which means we have in-flight pending requests that entered the
queue and that we will not complete once the queue is quiesced.
So start a freeze before we quiesce the queue, and unfreeze the queue
after we successfully connected the I/O queues (the unfreeze is already
present in the code). blk_mq_update_nr_hw_queues will be called only
after we are sure that the queue was already frozen.
This follows how the pci driver handles resets.
This patch adds the logic introduced in commit 9f98772ba307 ("nvme-rdma:
fix controller reset hang during traffic").
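For reference, the resulting ordering is roughly the following (a sketch
only, not the actual fc.c code):

	/* teardown: freeze before quiescing and aborting outstanding I/O */
	nvme_start_freeze(&ctrl->ctrl);
	nvme_stop_queues(&ctrl->ctrl);
	nvme_sync_io_queues(&ctrl->ctrl);

	/* ... abort/requeue outstanding requests, then reconnect ... */

	/* reconnect: restart the queues, wait for the freeze to complete,
	 * resize the tag set only once frozen, then unfreeze
	 */
	nvme_start_queues(&ctrl->ctrl);
	nvme_wait_freeze(&ctrl->ctrl);
	blk_mq_update_nr_hw_queues(&ctrl->tag_set, nr_io_queues);
	nvme_unfreeze(&ctrl->ctrl);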
Signed-off-by: James Smart <[email protected]>
CC: Sagi Grimberg <[email protected]>
Tested-by: Daniel Wagner <[email protected]>
Reviewed-by: Daniel Wagner <[email protected]>
---
drivers/nvme/host/fc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 112e62cd8a2a..ad3344f6048d 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2486,6 +2486,7 @@ __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
* (but with error status).
*/
if (ctrl->ctrl.queue_count > 1) {
+ nvme_start_freeze(&ctrl->ctrl);
nvme_stop_queues(&ctrl->ctrl);
nvme_sync_io_queues(&ctrl->ctrl);
blk_mq_tagset_busy_iter(&ctrl->tag_set,
--
2.29.2
On Tue, Jul 20, 2021 at 02:43:47PM +0200, Daniel Wagner wrote:
> v1:
> - https://lore.kernel.org/linux-nvme/[email protected]/
> v2:
> - https://lore.kernel.org/linux-nvme/[email protected]/
> - reviewed tags collected
> - added 'update hardware queues' for all transports
> - added fix for FC hang in nvme_wait_freeze_timeout
> v3:
> - dropped 'nvme-fc: Freeze queues before destroying them'
> - added James' two patches
Forgot to add Hannes' reviewed tag. Sorry!
On Tue, Jul 20, 2021 at 02:48:00PM +0200, Daniel Wagner wrote:
> On Tue, Jul 20, 2021 at 02:43:47PM +0200, Daniel Wagner wrote:
> > v1:
> > - https://lore.kernel.org/linux-nvme/[email protected]/
> > v2:
> > - https://lore.kernel.org/linux-nvme/[email protected]/
> > - reviewed tags collected
> > - added 'update hardware queues' for all transports
> > - added fix for FC hang in nvme_wait_freeze_timeout
> > v3:
> > - dropped 'nvme-fc: Freeze queues before destroying them'
> > - added James' two patches
>
> Forgot to add Hannes' reviewed tag. Sorry!
FTR, I've tested the 'prior_ioq_cnt != nr_io_queues' case. In this
scenario the series works. Though in the case of 'prior_ioq_cnt ==
nr_io_queues' I see hanging I/Os.
On Mon, Jul 26, 2021 at 07:27:04PM +0200, Daniel Wagner wrote:
> FTR, I've tested the 'prior_ioq_cnt != nr_io_queues' case. In this
> scenario the series works. Though in the case of 'prior_ioq_cnt ==
> nr_io_queues' I see hanging I/Os.
Back to staring at this issue. So the hanging I/Os happen in this path
after a remote port has been disabled:
nvme nvme1: NVME-FC{1}: new ctrl: NQN "nqn.1992-08.com.netapp:sn.d646dc63336511e995cb00a0988fb732:subsystem.nvme-svm-dolin-ana_subsystem"
nvme nvme1: NVME-FC{1}: controller connectivity lost. Awaiting Reconnect
nvme nvme1: NVME-FC{1}: io failed due to lldd error 6
nvme nvme1: NVME-FC{1}: io failed due to lldd error 6
nvme nvme1: NVME-FC{1}: io failed due to lldd error 6
nvme nvme1: NVME-FC{1}: io failed due to lldd error 6
nvme nvme1: NVME-FC{1}: io failed due to lldd error 6
nvme nvme1: NVME-FC{1}: io failed due to lldd error 6
nvme nvme1: NVME-FC{1}: connectivity re-established. Attempting reconnect
nvme nvme1: NVME-FC{1}: create association : host wwpn 0x100000109b579ef6 rport wwpn 0x201900a09890f5bf: NQN "nqn.1992-08.com.netapp:sn.d646dc63336511e995cb00a0988fb732:subsystem.nvme-svm-dolin-ana_subsystem"
nvme nvme1: NVME-FC{1}: controller connect complete
and all hanging tasks have the same call trace:
task:fio state:D stack: 0 pid:13545 ppid: 13463 flags:0x00000000
Call Trace:
__schedule+0x2d7/0x8f0
schedule+0x3c/0xa0
blk_queue_enter+0x106/0x1f0
? wait_woken+0x80/0x80
submit_bio_noacct+0x116/0x4b0
? submit_bio+0x4b/0x1a0
submit_bio+0x4b/0x1a0
__blkdev_direct_IO_simple+0x20c/0x350
? update_load_avg+0x1ac/0x5e0
? blkdev_iopoll+0x30/0x30
? blkdev_direct_IO+0x4a2/0x520
blkdev_direct_IO+0x4a2/0x520
? update_load_avg+0x1ac/0x5e0
? update_load_avg+0x1ac/0x5e0
? generic_file_read_iter+0x84/0x140
? __blkdev_direct_IO_simple+0x350/0x350
generic_file_read_iter+0x84/0x140
blkdev_read_iter+0x41/0x50
new_sync_read+0x118/0x1a0
vfs_read+0x15a/0x180
ksys_pread64+0x71/0x90
do_syscall_64+0x3c/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
(gdb) l *blk_queue_enter+0x106
0xffffffff81473736 is in blk_queue_enter (block/blk-core.c:469).
464 * queue dying flag, otherwise the following wait may
465 * never return if the two reads are reordered.
466 */
467 smp_rmb();
468
469 wait_event(q->mq_freeze_wq,
470 (!q->mq_freeze_depth &&
471 blk_pm_resume_queue(pm, q)) ||
472 blk_queue_dying(q));
473 if (blk_queue_dying(q))
On Fri, Jul 30, 2021 at 11:49:07AM +0200, Daniel Wagner wrote:
> On Mon, Jul 26, 2021 at 07:27:04PM +0200, Daniel Wagner wrote:
> > FTR, I've tested the 'prior_ioq_cnt != nr_io_queues' case. In this
> > scenario the series works. Though in the case of 'prior_ioq_cnt ==
> > nr_io_queues' I see hanging I/Os.
>
> Back to staring at this issue. So the hanging I/Os happen in this path
> after a remote port has been disabled:
It turns out the same happens with TCP as the transport. I've got two
connections configured and blocked traffic on the target side with
iptables rules. This is what I see:
nvme nvme16: creating 80 I/O queues.
nvme nvme16: mapped 80/0/0 default/read/poll queues.
nvme nvme16: new ctrl: NQN "nqn.2014-08.org.nvmexpress:NVMf:uuid:de63429f-50a4-4e03-ade6-0be27b75be77", addr 10.161.8.24:4420
nvme nvme17: creating 80 I/O queues.
nvme nvme17: mapped 80/0/0 default/read/poll queues.
nvme nvme17: new ctrl: NQN "nqn.2014-08.org.nvmexpress:NVMf:uuid:de63429f-50a4-4e03-ade6-0be27b75be77", addr 10.161.8.24:4421
nvme nvme17: starting error recovery
nvme nvme17: failed nvme_keep_alive_end_io error=10
nvme nvme17: Reconnecting in 10 seconds...
nvme nvme17: failed to connect socket: -110
nvme nvme17: Failed reconnect attempt 1
nvme nvme17: Reconnecting in 10 seconds...
nvme nvme17: failed to connect socket: -110
nvme nvme17: Failed reconnect attempt 2
nvme nvme17: Reconnecting in 10 seconds...
nvme nvme17: creating 80 I/O queues.
nvme nvme17: Successfully reconnected (3 attempt)
Call Trace:
__schedule+0x2d7/0x8f0
schedule+0x3c/0xa0
blk_queue_enter+0x106/0x1f0
? wait_woken+0x80/0x80
submit_bio_noacct+0x116/0x4b0
? submit_bio+0x4b/0x1a0
submit_bio+0x4b/0x1a0
__blkdev_direct_IO_simple+0x20c/0x350
? blkdev_iopoll+0x30/0x30
? blkdev_direct_IO+0x4a2/0x520
blkdev_direct_IO+0x4a2/0x520
? asm_sysvec_apic_timer_interrupt+0x12/0x20
? generic_file_read_iter+0x84/0x140
? __blkdev_direct_IO_simple+0x350/0x350
generic_file_read_iter+0x84/0x140
blkdev_read_iter+0x41/0x50
new_sync_read+0x118/0x1a0
vfs_read+0x15a/0x180
ksys_pread64+0x71/0x90
do_syscall_64+0x3c/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
I think all transports handle the unfreezing incorrectly in the
recovery path. At least for TCP and FC I could test this. I don't have
an RDMA setup, but the code looks suspiciously similar. I think
nvme_unfreeze() needs to be called always, not just in the case where
the number of queues changes.
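Something along these lines for FC, completely untested and only to
illustrate the idea:

	if (prior_ioq_cnt != nr_io_queues) {
		dev_info(ctrl->ctrl.device,
			"reconnect: revising io queue count from %d to %d\n",
			prior_ioq_cnt, nr_io_queues);
		blk_mq_update_nr_hw_queues(&ctrl->tag_set, nr_io_queues);
	}
	/*
	 * The teardown path froze the queues, so unfreeze unconditionally
	 * instead of only when the queue count changed.
	 */
	nvme_unfreeze(&ctrl->ctrl);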