I've added Christophs' RBs and refactored the last patch as suggested. IMO, it
makes it more readable, so a win :)
Also I'd run some more tests, e.g. with KASAN dissabled to see if we hit some
other races but all looks good. There are still a bunch of fallouts (e.g. my
test systems seems to be broken for the AUTH tests). But as said, it doesn't
hang or produces any crahses anymore.
nvme/002 (create many subsystems and test discovery) [not run]
nvme_trtype=fc is not supported in this test
nvme/003 (test if we're sending keep-alives to a discovery controller)
runtime 11.470s ...
WARNING: Test did not clean up fc device: nvme0
nvme/003 (test if we're sending keep-alives to a discovery controller) [passed]
runtime 11.470s ... 11.399s
nvme/004 (test nvme and nvmet UUID NS descriptors)
runtime 0.580s ...
WARNING: Test did not clean up fc device: nvme0
nvme/004 (test nvme and nvmet UUID NS descriptors) [passed]
runtime 0.580s ... 0.467s
nvme/005 (reset local loopback target)
runtime 0.653s ...
WARNING: Test did not clean up fc device: nvme0
nvme/005 (reset local loopback target) [passed]
runtime 0.653s ... 0.574s
nvme/006 (create an NVMeOF target with a block device-backed ns)
runtime 0.187s ...
WARNING: Test did not clean up fc device: nvme0
nvme/006 (create an NVMeOF target with a block device-backed ns) [passed]
runtime 0.187s ... 0.133s
nvme/007 (create an NVMeOF target with a file-backed ns) [passed]
runtime 0.146s ... 0.089s
nvme/008 (create an NVMeOF host with a block device-backed ns) [passed]
runtime 0.576s ... 0.429s
nvme/009 (create an NVMeOF host with a file-backed ns) [passed]
runtime 0.515s ... 0.411s
nvme/010 (run data verification fio job on NVMeOF block device-backed ns)
runtime 8.543s ...
WARNING: Test did not clean up fc device: nvme0
nvme/010 (run data verification fio job on NVMeOF block device-backed ns) [passed]
runtime 8.543s ... 4.541s
nvme/011 (run data verification fio job on NVMeOF file-backed ns)
runtime 26.668s ...
WARNING: Test did not clean up fc device: nvme0
nvme/011 (run data verification fio job on NVMeOF file-backed ns) [passed]
runtime 26.668s ... 8.701s
nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns)
runtime 16.517s ...
WARNING: Test did not clean up fc device: nvme0
nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns) [passed]
runtime 16.517s ... 5.739s
nvme/013 (run mkfs and data verification fio job on NVMeOF file-backed ns)
runtime 28.152s ...
WARNING: Test did not clean up fc device: nvme0
nvme/013 (run mkfs and data verification fio job on NVMeOF file-backed ns) [passed]
runtime 28.152s ... 8.771s
nvme/014 (flush a NVMeOF block device-backed ns)
runtime 3.492s ...
WARNING: Test did not clean up fc device: nvme0
nvme/014 (flush a NVMeOF block device-backed ns) [passed]
runtime 3.492s ... 2.401s
nvme/015 (unit test for NVMe flush for file backed ns)
runtime 2.952s ...
WARNING: Test did not clean up fc device: nvme0
nvme/015 (unit test for NVMe flush for file backed ns) [passed]
runtime 2.952s ... 1.962s
nvme/016 (create/delete many NVMeOF block device-backed ns and test discovery) [not run]
nvme_trtype=fc is not supported in this test
nvme/017 (create/delete many file-ns and test discovery) [not run]
nvme_trtype=fc is not supported in this test
nvme/018 (unit test NVMe-oF out of range access on a file backend)
runtime 0.607s ...
WARNING: Test did not clean up fc device: nvme0
nvme/018 (unit test NVMe-oF out of range access on a file backend) [passed]
runtime 0.607s ... 0.468s
nvme/019 (test NVMe DSM Discard command on NVMeOF block-device ns)
runtime 0.613s ...
WARNING: Test did not clean up fc device: nvme0
nvme/019 (test NVMe DSM Discard command on NVMeOF block-device ns) [passed]
runtime 0.613s ... 0.534s
nvme/020 (test NVMe DSM Discard command on NVMeOF file-backed ns)
runtime 0.583s ...
WARNING: Test did not clean up fc device: nvme0
nvme/020 (test NVMe DSM Discard command on NVMeOF file-backed ns) [passed]
runtime 0.583s ... 0.495s
nvme/021 (test NVMe list command on NVMeOF file-backed ns)
runtime 0.559s ...
WARNING: Test did not clean up fc device: nvme0
nvme/021 (test NVMe list command on NVMeOF file-backed ns) [passed]
runtime 0.559s ... 0.486s
nvme/022 (test NVMe reset command on NVMeOF file-backed ns)
runtime 0.726s ...
WARNING: Test did not clean up fc device: nvme0
nvme/022 (test NVMe reset command on NVMeOF file-backed ns) [passed]
runtime 0.726s ... 0.618s
nvme/023 (test NVMe smart-log command on NVMeOF block-device ns)
runtime 0.637s ...
WARNING: Test did not clean up fc device: nvme0
nvme/023 (test NVMe smart-log command on NVMeOF block-device ns) [passed]
runtime 0.637s ... 0.504s
nvme/024 (test NVMe smart-log command on NVMeOF file-backed ns)
runtime 0.635s ...
WARNING: Test did not clean up fc device: nvme0
nvme/024 (test NVMe smart-log command on NVMeOF file-backed ns) [passed]
runtime 0.635s ... 0.490s
nvme/025 (test NVMe effects-log command on NVMeOF file-backed ns)
runtime 0.581s ...
WARNING: Test did not clean up fc device: nvme0
nvme/025 (test NVMe effects-log command on NVMeOF file-backed ns) [passed]
runtime 0.581s ... 0.479s
nvme/026 (test NVMe ns-descs command on NVMeOF file-backed ns)
runtime 0.576s ...
WARNING: Test did not clean up fc device: nvme0
nvme/026 (test NVMe ns-descs command on NVMeOF file-backed ns) [passed]
runtime 0.576s ... 0.463s
nvme/027 (test NVMe ns-rescan command on NVMeOF file-backed ns)
runtime 0.663s ...
WARNING: Test did not clean up fc device: nvme0
nvme/027 (test NVMe ns-rescan command on NVMeOF file-backed ns) [passed]
runtime 0.663s ... 0.492s
nvme/028 (test NVMe list-subsys command on NVMeOF file-backed ns)
runtime 0.577s ...
WARNING: Test did not clean up fc device: nvme0
nvme/028 (test NVMe list-subsys command on NVMeOF file-backed ns) [passed]
runtime 0.577s ... 0.501s
nvme/029 (test userspace IO via nvme-cli read/write interface)
runtime 0.885s ...
WARNING: Test did not clean up fc device: nvme0
nvme/029 (test userspace IO via nvme-cli read/write interface) [passed]
runtime 0.885s ... 0.653s
nvme/030 (ensure the discovery generation counter is updated appropriately)
nvme/030 (ensure the discovery generation counter is updated appropriately) [failed]
runtime 0.677s ... 0.505sc device: nvme0
--- tests/nvme/030.out 2023-08-30 10:39:08.428409596 +0200
+++ /home/wagi/work/blktests/results/nodev/nvme/030.out.bad 2024-01-31 09:40:26.552302587 +0100
@@ -1,2 +1,7 @@
Running nvme/030
+warning: use generated hostid instead of hostid file
+warning: use generated hostid instead of hostid file
+warning: use generated hostid instead of hostid file
+warning: use generated hostid instead of hostid file
+warning: use generated hostid instead of hostid file
Test complete
nvme/031 (test deletion of NVMeOF controllers immediately after setup)
runtime 3.622s ...
WARNING: Test did not clean up fc device: nvme0
nvme/031 (test deletion of NVMeOF controllers immediately after setup) [passed]
runtime 3.622s ... 2.907s
nvme/038 (test deletion of NVMeOF subsystem without enabling) [passed]
runtime 0.085s ... 0.036s
nvme/040 (test nvme fabrics controller reset/disconnect operation during I/O)
runtime 6.757s ...
WARNING: Test did not clean up fc device: nvme0
nvme/040 (test nvme fabrics controller reset/disconnect operation during I/O) [passed]
runtime 6.757s ... 6.555s
/usr/bin/zgrep: line 271: /usr/bin/gzip: Permission denied
/usr/bin/zgrep: line 286: /usr/bin/grep: Permission denied
/usr/bin/zgrep: line 271: /usr/bin/gzip: Permission denied
/usr/bin/zgrep: line 286: /usr/bin/grep: Permission denied
/usr/bin/zgrep: line 271: /usr/bin/gzip: Permission denied
/usr/bin/zgrep: line 286: /usr/bin/grep: Permission denied
/usr/bin/zgrep: line 271: /usr/bin/gzip: Permission denied
/usr/bin/zgrep: line 286: /usr/bin/grep: Permission denied
nvme/041 (Create authenticated connections) [not run]
kernel option NVME_AUTH has not been enabled
kernel option NVME_TARGET_AUTH has not been enabled
/usr/bin/zgrep: line 271: /usr/bin/gzip: Permission denied
/usr/bin/zgrep: line 286: /usr/bin/grep: Permission denied
/usr/bin/zgrep: line 286: /usr/bin/grep: Permission denied
/usr/bin/zgrep: line 271: /usr/bin/gzip: Permission denied
/usr/bin/zgrep: line 271: /usr/bin/gzip: Permission denied
/usr/bin/zgrep: line 286: /usr/bin/grep: Permission denied
/usr/bin/zgrep: line 271: /usr/bin/gzip: Permission denied
/usr/bin/zgrep: line 286: /usr/bin/grep: Permission denied
nvme/042 (Test dhchap key types for authenticated connections) [not run]
kernel option NVME_AUTH has not been enabled
kernel option NVME_TARGET_AUTH has not been enabled
/usr/bin/zgrep: line 271: /usr/bin/gzip: Permission denied
/usr/bin/zgrep: line 286: /usr/bin/grep: Permission denied
/usr/bin/zgrep: line 271: /usr/bin/gzip: Permission denied
/usr/bin/zgrep: line 286: /usr/bin/grep: Permission denied
/usr/bin/zgrep: line 286: /usr/bin/grep: Permission denied
/usr/bin/zgrep: line 271: /usr/bin/gzip: Permission denied
/usr/bin/zgrep: line 271: /usr/bin/gzip: Permission denied
/usr/bin/zgrep: line 286: /usr/bin/grep: Permission denied
nvme/043 (Test hash and DH group variations for authenticated connections) [not run]
kernel option NVME_AUTH has not been enabled
kernel option NVME_TARGET_AUTH has not been enabled
/usr/bin/zgrep: line 271: /usr/bin/gzip: Permission denied
/usr/bin/zgrep: line 286: /usr/bin/grep: Permission denied
/usr/bin/zgrep: line 271: /usr/bin/gzip: Permission denied
/usr/bin/zgrep: line 286: /usr/bin/grep: Permission denied
/usr/bin/zgrep: line 271: /usr/bin/gzip: Permission denied
/usr/bin/zgrep: line 286: /usr/bin/grep: Permission denied
/usr/bin/zgrep: line 271: /usr/bin/gzip: Permission denied
/usr/bin/zgrep: line 286: /usr/bin/grep: Permission denied
nvme/044 (Test bi-directional authentication) [not run]
kernel option NVME_AUTH has not been enabled
kernel option NVME_TARGET_AUTH has not been enabled
/usr/bin/zgrep: line 271: /usr/bin/gzip: Permission denied
/usr/bin/zgrep: line 286: /usr/bin/grep: Permission denied
/usr/bin/zgrep: line 271: /usr/bin/gzip: Permission denied
/usr/bin/zgrep: line 286: /usr/bin/grep: Permission denied
/usr/bin/zgrep: line 271: /usr/bin/gzip: Permission denied
/usr/bin/zgrep: line 286: /usr/bin/grep: Permission denied
/usr/bin/zgrep: line 286: /usr/bin/grep: Permission denied
/usr/bin/zgrep: line 271: /usr/bin/gzip: Permission denied
nvme/045 (Test re-authentication) [not run]
kernel option NVME_AUTH has not been enabled
kernel option NVME_TARGET_AUTH has not been enabled
nvme/047 (test different queue types for fabric transports) [not run]
nvme_trtype=fc is not supported in this test
nvme/048 (Test queue count changes on reconnect) [failed]
runtime 0.549s ... 0.378s
--- tests/nvme/048.out 2023-11-28 12:59:52.711505550 +0100
+++ /home/wagi/work/blktests/results/nodev/nvme/048.out.bad 2024-01-31 09:40:47.502748168 +0100
@@ -1,3 +1,7 @@
Running nvme/048
+expected queue count 2 not set
+FAIL
+expected queue count 3 not set
+FAIL
disconnected 1 controller(s)
Test complete
changes:
v5:
- collected RBs
- refactored 'nvmet-fc: use RCU list iterator for assoc_list'
v4:
- dropped patches which got applied
- dropped 'nvmet-fc: free hostport after release reference to tgtport'
- reworked commit message of 'nvmet-fc: untangle cross refcounting objects'
and renamed it to 'nvmet-fc: defer cleanup using RCU properly'
- added 'nvmet-fcloop: swap the list_add_tail arguments'
and 'nvmet-fc: use RCU list iterator for assoc_list'
- added RBs
- https://lore.kernel.org/linux-nvme/[email protected]/
v3:
- collected all patches into one series
- updated ref counting in nvmet-fc
- https://lore.kernel.org/linux-nvme/[email protected]/
v2:
- added RBs
- added new patches
- https://lore.kernel.org/linux-nvme/[email protected]/
v1:
- https://lore.kernel.org/linux-nvme/[email protected]/
*** BLURB HERE ***
Daniel Wagner (12):
nvme-fc: do not wait in vain when unloading module
nvmet-fcloop: swap the list_add_tail arguments
nvmet-fc: release reference on target port
nvmet-fc: defer cleanup using RCU properly
nvmet-fc: free queue and assoc directly
nvmet-fc: hold reference on hostport match
nvmet-fc: remove null hostport pointer check
nvmet-fc: do not tack refs on tgtports from assoc
nvmet-fc: abort command when there is no binding
nvmet-fc: avoid deadlock on delete association path
nvmet-fc: take ref count on tgtport before delete assoc
nvmet-fc: use RCU list iterator for assoc_list
drivers/nvme/host/fc.c | 47 ++-------
drivers/nvme/target/fc.c | 185 +++++++++++++++++++----------------
drivers/nvme/target/fcloop.c | 6 +-
3 files changed, 111 insertions(+), 127 deletions(-)
--
2.43.0
Neither struct nvmet_fc_tgt_queue nor struct nvmet_fc_tgt_assoc are data
structure which are used in a RCU context. So there is no reason to
delay the free operation.
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/nvme/target/fc.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index 11ecfef41bd1..b44b99525c44 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -145,7 +145,6 @@ struct nvmet_fc_tgt_queue {
struct list_head avail_defer_list;
struct workqueue_struct *work_q;
struct kref ref;
- struct rcu_head rcu;
/* array of fcp_iods */
struct nvmet_fc_fcp_iod fod[] __counted_by(sqsize);
} __aligned(sizeof(unsigned long long));
@@ -169,7 +168,6 @@ struct nvmet_fc_tgt_assoc {
struct nvmet_fc_tgt_queue *queues[NVMET_NR_QUEUES + 1];
struct kref ref;
struct work_struct del_work;
- struct rcu_head rcu;
};
@@ -852,7 +850,7 @@ nvmet_fc_tgt_queue_free(struct kref *ref)
destroy_workqueue(queue->work_q);
- kfree_rcu(queue, rcu);
+ kfree(queue);
}
static void
@@ -1185,8 +1183,8 @@ nvmet_fc_target_assoc_free(struct kref *ref)
dev_info(tgtport->dev,
"{%d:%d} Association freed\n",
tgtport->fc_target_port.port_num, assoc->a_id);
- kfree_rcu(assoc, rcu);
nvmet_fc_tgtport_put(tgtport);
+ kfree(assoc);
}
static void
--
2.43.0
An association has always a valid hostport pointer. Remove useless
null pointer check.
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/nvme/target/fc.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index 205a12b1e841..849d9b17c60a 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -488,8 +488,7 @@ nvmet_fc_xmt_disconnect_assoc(struct nvmet_fc_tgt_assoc *assoc)
* message is normal. Otherwise, send unless the hostport has
* already been invalidated by the lldd.
*/
- if (!tgtport->ops->ls_req || !assoc->hostport ||
- assoc->hostport->invalid)
+ if (!tgtport->ops->ls_req || assoc->hostport->invalid)
return;
lsop = kzalloc((sizeof(*lsop) +
@@ -1524,8 +1523,7 @@ nvmet_fc_invalidate_host(struct nvmet_fc_target_port *target_port,
spin_lock_irqsave(&tgtport->lock, flags);
list_for_each_entry_safe(assoc, next,
&tgtport->assoc_list, a_list) {
- if (!assoc->hostport ||
- assoc->hostport->hosthandle != hosthandle)
+ if (assoc->hostport->hosthandle != hosthandle)
continue;
if (!nvmet_fc_tgt_a_get(assoc))
continue;
--
2.43.0
The assoc_list is a RCU protected list, thus use the RCU flavor of list
functions.
Let's use this opportunity and refactor this code and move the lookup
into a helper and give it a descriptive name.
Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/nvme/target/fc.c | 34 ++++++++++++++++++++++------------
1 file changed, 22 insertions(+), 12 deletions(-)
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index 671d096745a5..fd229f310c93 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -1114,14 +1114,27 @@ nvmet_fc_schedule_delete_assoc(struct nvmet_fc_tgt_assoc *assoc)
queue_work(nvmet_wq, &assoc->del_work);
}
+static bool
+nvmet_fc_assoc_exits(struct nvmet_fc_tgtport *tgtport, u64 association_id)
+{
+ struct nvmet_fc_tgt_assoc *a;
+
+ list_for_each_entry_rcu(a, &tgtport->assoc_list, a_list) {
+ if (association_id == a->association_id)
+ return true;
+ }
+
+ return false;
+}
+
static struct nvmet_fc_tgt_assoc *
nvmet_fc_alloc_target_assoc(struct nvmet_fc_tgtport *tgtport, void *hosthandle)
{
- struct nvmet_fc_tgt_assoc *assoc, *tmpassoc;
+ struct nvmet_fc_tgt_assoc *assoc;
unsigned long flags;
+ bool done;
u64 ran;
int idx;
- bool needrandom = true;
if (!tgtport->pe)
return NULL;
@@ -1145,24 +1158,21 @@ nvmet_fc_alloc_target_assoc(struct nvmet_fc_tgtport *tgtport, void *hosthandle)
INIT_WORK(&assoc->del_work, nvmet_fc_delete_assoc_work);
atomic_set(&assoc->terminating, 0);
- while (needrandom) {
+ done = false;
+ do {
get_random_bytes(&ran, sizeof(ran) - BYTES_FOR_QID);
ran = ran << BYTES_FOR_QID_SHIFT;
spin_lock_irqsave(&tgtport->lock, flags);
- needrandom = false;
- list_for_each_entry(tmpassoc, &tgtport->assoc_list, a_list) {
- if (ran == tmpassoc->association_id) {
- needrandom = true;
- break;
- }
- }
- if (!needrandom) {
+ rcu_read_lock();
+ if (!nvmet_fc_assoc_exits(tgtport, ran)) {
assoc->association_id = ran;
list_add_tail_rcu(&assoc->a_list, &tgtport->assoc_list);
+ done = true;
}
+ rcu_read_unlock();
spin_unlock_irqrestore(&tgtport->lock, flags);
- }
+ } while (!done);
return assoc;
--
2.43.0
The hostport data structure is shared between the association, this why
we keep track of the users via a refcount. So we should not decrement
the refcount on a match and free the hostport several times.
Reported by KASAN.
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/nvme/target/fc.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index b44b99525c44..205a12b1e841 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -1068,8 +1068,6 @@ nvmet_fc_alloc_hostport(struct nvmet_fc_tgtport *tgtport, void *hosthandle)
/* new allocation not needed */
kfree(newhost);
newhost = match;
- /* no new allocation - release reference */
- nvmet_fc_tgtport_put(tgtport);
} else {
newhost->tgtport = tgtport;
newhost->hosthandle = hosthandle;
--
2.43.0
The first argument of list_add_tail function is the new element which
should be added to the list which is the second argument. Swap the
arguments to allow processing more than one element at a time.
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/nvme/target/fcloop.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c
index c1faeb1e9e55..1471af250ea6 100644
--- a/drivers/nvme/target/fcloop.c
+++ b/drivers/nvme/target/fcloop.c
@@ -358,7 +358,7 @@ fcloop_h2t_ls_req(struct nvme_fc_local_port *localport,
if (!rport->targetport) {
tls_req->status = -ECONNREFUSED;
spin_lock(&rport->lock);
- list_add_tail(&rport->ls_list, &tls_req->ls_list);
+ list_add_tail(&tls_req->ls_list, &rport->ls_list);
spin_unlock(&rport->lock);
queue_work(nvmet_wq, &rport->ls_work);
return ret;
@@ -391,7 +391,7 @@ fcloop_h2t_xmt_ls_rsp(struct nvmet_fc_target_port *targetport,
if (remoteport) {
rport = remoteport->private;
spin_lock(&rport->lock);
- list_add_tail(&rport->ls_list, &tls_req->ls_list);
+ list_add_tail(&tls_req->ls_list, &rport->ls_list);
spin_unlock(&rport->lock);
queue_work(nvmet_wq, &rport->ls_work);
}
@@ -446,7 +446,7 @@ fcloop_t2h_ls_req(struct nvmet_fc_target_port *targetport, void *hosthandle,
if (!tport->remoteport) {
tls_req->status = -ECONNREFUSED;
spin_lock(&tport->lock);
- list_add_tail(&tport->ls_list, &tls_req->ls_list);
+ list_add_tail(&tls_req->ls_list, &tport->ls_list);
spin_unlock(&tport->lock);
queue_work(nvmet_wq, &tport->ls_work);
return ret;
--
2.43.0
When the target port has not active port binding, there is no point in
trying to process the command as it has to fail anyway. Instead adding
checks to all commands abort the command early.
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/nvme/target/fc.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index fe3246024836..c80b8a066fd1 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -1099,6 +1099,9 @@ nvmet_fc_alloc_target_assoc(struct nvmet_fc_tgtport *tgtport, void *hosthandle)
int idx;
bool needrandom = true;
+ if (!tgtport->pe)
+ return NULL;
+
assoc = kzalloc(sizeof(*assoc), GFP_KERNEL);
if (!assoc)
return NULL;
@@ -2514,8 +2517,9 @@ nvmet_fc_handle_fcp_rqst(struct nvmet_fc_tgtport *tgtport,
fod->req.cmd = &fod->cmdiubuf.sqe;
fod->req.cqe = &fod->rspiubuf.cqe;
- if (tgtport->pe)
- fod->req.port = tgtport->pe->port;
+ if (!tgtport->pe)
+ goto transport_error;
+ fod->req.port = tgtport->pe->port;
/* clear any response payload */
memset(&fod->rspiubuf, 0, sizeof(fod->rspiubuf));
--
2.43.0
In case we return early out of __nvmet_fc_finish_ls_req() we still have
to release the reference on the target port.
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/nvme/target/fc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index 05e6a755b330..3d53d9dc1099 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -360,7 +360,7 @@ __nvmet_fc_finish_ls_req(struct nvmet_fc_ls_req_op *lsop)
if (!lsop->req_queued) {
spin_unlock_irqrestore(&tgtport->lock, flags);
- return;
+ goto out_puttgtport;
}
list_del(&lsop->lsreq_list);
@@ -373,6 +373,7 @@ __nvmet_fc_finish_ls_req(struct nvmet_fc_ls_req_op *lsop)
(lsreq->rqstlen + lsreq->rsplen),
DMA_BIDIRECTIONAL);
+out_puttgtport:
nvmet_fc_tgtport_put(tgtport);
}
--
2.43.0
We have to ensure that the tgtport is not going away
before be have remove all the associations.
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/nvme/target/fc.c | 31 +++++++++++++++++++++++--------
1 file changed, 23 insertions(+), 8 deletions(-)
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index 3e0d391e631b..671d096745a5 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -1090,13 +1090,28 @@ nvmet_fc_alloc_hostport(struct nvmet_fc_tgtport *tgtport, void *hosthandle)
}
static void
-nvmet_fc_delete_assoc(struct work_struct *work)
+nvmet_fc_delete_assoc(struct nvmet_fc_tgt_assoc *assoc)
+{
+ nvmet_fc_delete_target_assoc(assoc);
+ nvmet_fc_tgt_a_put(assoc);
+}
+
+static void
+nvmet_fc_delete_assoc_work(struct work_struct *work)
{
struct nvmet_fc_tgt_assoc *assoc =
container_of(work, struct nvmet_fc_tgt_assoc, del_work);
+ struct nvmet_fc_tgtport *tgtport = assoc->tgtport;
- nvmet_fc_delete_target_assoc(assoc);
- nvmet_fc_tgt_a_put(assoc);
+ nvmet_fc_delete_assoc(assoc);
+ nvmet_fc_tgtport_put(tgtport);
+}
+
+static void
+nvmet_fc_schedule_delete_assoc(struct nvmet_fc_tgt_assoc *assoc)
+{
+ nvmet_fc_tgtport_get(assoc->tgtport);
+ queue_work(nvmet_wq, &assoc->del_work);
}
static struct nvmet_fc_tgt_assoc *
@@ -1127,7 +1142,7 @@ nvmet_fc_alloc_target_assoc(struct nvmet_fc_tgtport *tgtport, void *hosthandle)
assoc->a_id = idx;
INIT_LIST_HEAD(&assoc->a_list);
kref_init(&assoc->ref);
- INIT_WORK(&assoc->del_work, nvmet_fc_delete_assoc);
+ INIT_WORK(&assoc->del_work, nvmet_fc_delete_assoc_work);
atomic_set(&assoc->terminating, 0);
while (needrandom) {
@@ -1483,7 +1498,7 @@ __nvmet_fc_free_assocs(struct nvmet_fc_tgtport *tgtport)
list_for_each_entry_rcu(assoc, &tgtport->assoc_list, a_list) {
if (!nvmet_fc_tgt_a_get(assoc))
continue;
- queue_work(nvmet_wq, &assoc->del_work);
+ nvmet_fc_schedule_delete_assoc(assoc);
nvmet_fc_tgt_a_put(assoc);
}
rcu_read_unlock();
@@ -1536,7 +1551,7 @@ nvmet_fc_invalidate_host(struct nvmet_fc_target_port *target_port,
continue;
assoc->hostport->invalid = 1;
noassoc = false;
- queue_work(nvmet_wq, &assoc->del_work);
+ nvmet_fc_schedule_delete_assoc(assoc);
nvmet_fc_tgt_a_put(assoc);
}
spin_unlock_irqrestore(&tgtport->lock, flags);
@@ -1581,7 +1596,7 @@ nvmet_fc_delete_ctrl(struct nvmet_ctrl *ctrl)
nvmet_fc_tgtport_put(tgtport);
if (found_ctrl) {
- queue_work(nvmet_wq, &assoc->del_work);
+ nvmet_fc_schedule_delete_assoc(assoc);
nvmet_fc_tgt_a_put(assoc);
return;
}
@@ -1888,7 +1903,7 @@ nvmet_fc_ls_disconnect(struct nvmet_fc_tgtport *tgtport,
nvmet_fc_xmt_ls_rsp(tgtport, oldls);
}
- queue_work(nvmet_wq, &assoc->del_work);
+ nvmet_fc_schedule_delete_assoc(assoc);
nvmet_fc_tgt_a_put(assoc);
return false;
--
2.43.0
The module exit path has race between deleting all controllers and
freeing 'left over IDs'. To prevent double free a synchronization
between nvme_delete_ctrl and ida_destroy has been added by the initial
commit.
There is some logic around trying to prevent from hanging forever in
wait_for_completion, though it does not handling all cases. E.g.
blktests is able to reproduce the situation where the module unload
hangs forever.
If we completely rely on the cleanup code executed from the
nvme_delete_ctrl path, all IDs will be freed eventually. This makes
calling ida_destroy unnecessary. We only have to ensure that all
nvme_delete_ctrl code has been executed before we leave
nvme_fc_exit_module. This is done by flushing the nvme_delete_wq
workqueue.
While at it, remove the unused nvme_fc_wq workqueue too.
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/nvme/host/fc.c | 47 ++++++------------------------------------
1 file changed, 6 insertions(+), 41 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index e2308119f8f0..7006f4caac2f 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -221,11 +221,6 @@ static LIST_HEAD(nvme_fc_lport_list);
static DEFINE_IDA(nvme_fc_local_port_cnt);
static DEFINE_IDA(nvme_fc_ctrl_cnt);
-static struct workqueue_struct *nvme_fc_wq;
-
-static bool nvme_fc_waiting_to_unload;
-static DECLARE_COMPLETION(nvme_fc_unload_proceed);
-
/*
* These items are short-term. They will eventually be moved into
* a generic FC class. See comments in module init.
@@ -255,8 +250,6 @@ nvme_fc_free_lport(struct kref *ref)
/* remove from transport list */
spin_lock_irqsave(&nvme_fc_lock, flags);
list_del(&lport->port_list);
- if (nvme_fc_waiting_to_unload && list_empty(&nvme_fc_lport_list))
- complete(&nvme_fc_unload_proceed);
spin_unlock_irqrestore(&nvme_fc_lock, flags);
ida_free(&nvme_fc_local_port_cnt, lport->localport.port_num);
@@ -3894,10 +3887,6 @@ static int __init nvme_fc_init_module(void)
{
int ret;
- nvme_fc_wq = alloc_workqueue("nvme_fc_wq", WQ_MEM_RECLAIM, 0);
- if (!nvme_fc_wq)
- return -ENOMEM;
-
/*
* NOTE:
* It is expected that in the future the kernel will combine
@@ -3915,7 +3904,7 @@ static int __init nvme_fc_init_module(void)
ret = class_register(&fc_class);
if (ret) {
pr_err("couldn't register class fc\n");
- goto out_destroy_wq;
+ return ret;
}
/*
@@ -3939,8 +3928,6 @@ static int __init nvme_fc_init_module(void)
device_destroy(&fc_class, MKDEV(0, 0));
out_destroy_class:
class_unregister(&fc_class);
-out_destroy_wq:
- destroy_workqueue(nvme_fc_wq);
return ret;
}
@@ -3960,45 +3947,23 @@ nvme_fc_delete_controllers(struct nvme_fc_rport *rport)
spin_unlock(&rport->lock);
}
-static void
-nvme_fc_cleanup_for_unload(void)
+static void __exit nvme_fc_exit_module(void)
{
struct nvme_fc_lport *lport;
struct nvme_fc_rport *rport;
-
- list_for_each_entry(lport, &nvme_fc_lport_list, port_list) {
- list_for_each_entry(rport, &lport->endp_list, endp_list) {
- nvme_fc_delete_controllers(rport);
- }
- }
-}
-
-static void __exit nvme_fc_exit_module(void)
-{
unsigned long flags;
- bool need_cleanup = false;
spin_lock_irqsave(&nvme_fc_lock, flags);
- nvme_fc_waiting_to_unload = true;
- if (!list_empty(&nvme_fc_lport_list)) {
- need_cleanup = true;
- nvme_fc_cleanup_for_unload();
- }
+ list_for_each_entry(lport, &nvme_fc_lport_list, port_list)
+ list_for_each_entry(rport, &lport->endp_list, endp_list)
+ nvme_fc_delete_controllers(rport);
spin_unlock_irqrestore(&nvme_fc_lock, flags);
- if (need_cleanup) {
- pr_info("%s: waiting for ctlr deletes\n", __func__);
- wait_for_completion(&nvme_fc_unload_proceed);
- pr_info("%s: ctrl deletes complete\n", __func__);
- }
+ flush_workqueue(nvme_delete_wq);
nvmf_unregister_transport(&nvme_fc_transport);
- ida_destroy(&nvme_fc_local_port_cnt);
- ida_destroy(&nvme_fc_ctrl_cnt);
-
device_destroy(&fc_class, MKDEV(0, 0));
class_unregister(&fc_class);
- destroy_workqueue(nvme_fc_wq);
}
module_init(nvme_fc_init_module);
--
2.43.0
When the target executes a disconnect and the host triggers a reconnect
immediately, the reconnect command still finds an existing association.
The reconnect crashes later on because nvmet_fc_delete_target_assoc
blindly removes resources while the reconnect code wants to use it.
To address this, nvmet_fc_find_target_assoc should not be able to
lookup an association which is being removed. The association list
is already under RCU lifetime management, so let's properly use it
and remove the association from the list and wait for a grace period
before cleaning up all. This means we also can drop the RCU management
on the queues, because this is now handled via the association itself.
A second step split the execution context so that the initial disconnect
command can complete without running the reconnect code in the same
context. As usual, this is done by deferring the ->done to a workqueue.
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/nvme/target/fc.c | 83 ++++++++++++++++++----------------------
1 file changed, 37 insertions(+), 46 deletions(-)
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index 3d53d9dc1099..11ecfef41bd1 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -166,7 +166,7 @@ struct nvmet_fc_tgt_assoc {
struct nvmet_fc_hostport *hostport;
struct nvmet_fc_ls_iod *rcv_disconn;
struct list_head a_list;
- struct nvmet_fc_tgt_queue __rcu *queues[NVMET_NR_QUEUES + 1];
+ struct nvmet_fc_tgt_queue *queues[NVMET_NR_QUEUES + 1];
struct kref ref;
struct work_struct del_work;
struct rcu_head rcu;
@@ -803,14 +803,11 @@ nvmet_fc_alloc_target_queue(struct nvmet_fc_tgt_assoc *assoc,
if (!queue)
return NULL;
- if (!nvmet_fc_tgt_a_get(assoc))
- goto out_free_queue;
-
queue->work_q = alloc_workqueue("ntfc%d.%d.%d", 0, 0,
assoc->tgtport->fc_target_port.port_num,
assoc->a_id, qid);
if (!queue->work_q)
- goto out_a_put;
+ goto out_free_queue;
queue->qid = qid;
queue->sqsize = sqsize;
@@ -832,15 +829,13 @@ nvmet_fc_alloc_target_queue(struct nvmet_fc_tgt_assoc *assoc,
goto out_fail_iodlist;
WARN_ON(assoc->queues[qid]);
- rcu_assign_pointer(assoc->queues[qid], queue);
+ assoc->queues[qid] = queue;
return queue;
out_fail_iodlist:
nvmet_fc_destroy_fcp_iodlist(assoc->tgtport, queue);
destroy_workqueue(queue->work_q);
-out_a_put:
- nvmet_fc_tgt_a_put(assoc);
out_free_queue:
kfree(queue);
return NULL;
@@ -853,12 +848,8 @@ nvmet_fc_tgt_queue_free(struct kref *ref)
struct nvmet_fc_tgt_queue *queue =
container_of(ref, struct nvmet_fc_tgt_queue, ref);
- rcu_assign_pointer(queue->assoc->queues[queue->qid], NULL);
-
nvmet_fc_destroy_fcp_iodlist(queue->assoc->tgtport, queue);
- nvmet_fc_tgt_a_put(queue->assoc);
-
destroy_workqueue(queue->work_q);
kfree_rcu(queue, rcu);
@@ -970,7 +961,7 @@ nvmet_fc_find_target_queue(struct nvmet_fc_tgtport *tgtport,
rcu_read_lock();
list_for_each_entry_rcu(assoc, &tgtport->assoc_list, a_list) {
if (association_id == assoc->association_id) {
- queue = rcu_dereference(assoc->queues[qid]);
+ queue = assoc->queues[qid];
if (queue &&
(!atomic_read(&queue->connected) ||
!nvmet_fc_tgt_q_get(queue)))
@@ -1173,13 +1164,18 @@ nvmet_fc_target_assoc_free(struct kref *ref)
struct nvmet_fc_tgtport *tgtport = assoc->tgtport;
struct nvmet_fc_ls_iod *oldls;
unsigned long flags;
+ int i;
+
+ for (i = NVMET_NR_QUEUES; i >= 0; i--) {
+ if (assoc->queues[i])
+ nvmet_fc_delete_target_queue(assoc->queues[i]);
+ }
/* Send Disconnect now that all i/o has completed */
nvmet_fc_xmt_disconnect_assoc(assoc);
nvmet_fc_free_hostport(assoc->hostport);
spin_lock_irqsave(&tgtport->lock, flags);
- list_del_rcu(&assoc->a_list);
oldls = assoc->rcv_disconn;
spin_unlock_irqrestore(&tgtport->lock, flags);
/* if pending Rcv Disconnect Association LS, send rsp now */
@@ -1209,7 +1205,7 @@ static void
nvmet_fc_delete_target_assoc(struct nvmet_fc_tgt_assoc *assoc)
{
struct nvmet_fc_tgtport *tgtport = assoc->tgtport;
- struct nvmet_fc_tgt_queue *queue;
+ unsigned long flags;
int i, terminating;
terminating = atomic_xchg(&assoc->terminating, 1);
@@ -1218,29 +1214,21 @@ nvmet_fc_delete_target_assoc(struct nvmet_fc_tgt_assoc *assoc)
if (terminating)
return;
+ spin_lock_irqsave(&tgtport->lock, flags);
+ list_del_rcu(&assoc->a_list);
+ spin_unlock_irqrestore(&tgtport->lock, flags);
- for (i = NVMET_NR_QUEUES; i >= 0; i--) {
- rcu_read_lock();
- queue = rcu_dereference(assoc->queues[i]);
- if (!queue) {
- rcu_read_unlock();
- continue;
- }
+ synchronize_rcu();
- if (!nvmet_fc_tgt_q_get(queue)) {
- rcu_read_unlock();
- continue;
- }
- rcu_read_unlock();
- nvmet_fc_delete_target_queue(queue);
- nvmet_fc_tgt_q_put(queue);
+ /* ensure all in-flight I/Os have been processed */
+ for (i = NVMET_NR_QUEUES; i >= 0; i--) {
+ if (assoc->queues[i])
+ flush_workqueue(assoc->queues[i]->work_q);
}
dev_info(tgtport->dev,
"{%d:%d} Association deleted\n",
tgtport->fc_target_port.port_num, assoc->a_id);
-
- nvmet_fc_tgt_a_put(assoc);
}
static struct nvmet_fc_tgt_assoc *
@@ -1493,9 +1481,8 @@ __nvmet_fc_free_assocs(struct nvmet_fc_tgtport *tgtport)
list_for_each_entry_rcu(assoc, &tgtport->assoc_list, a_list) {
if (!nvmet_fc_tgt_a_get(assoc))
continue;
- if (!queue_work(nvmet_wq, &assoc->del_work))
- /* already deleting - release local reference */
- nvmet_fc_tgt_a_put(assoc);
+ queue_work(nvmet_wq, &assoc->del_work);
+ nvmet_fc_tgt_a_put(assoc);
}
rcu_read_unlock();
}
@@ -1548,9 +1535,8 @@ nvmet_fc_invalidate_host(struct nvmet_fc_target_port *target_port,
continue;
assoc->hostport->invalid = 1;
noassoc = false;
- if (!queue_work(nvmet_wq, &assoc->del_work))
- /* already deleting - release local reference */
- nvmet_fc_tgt_a_put(assoc);
+ queue_work(nvmet_wq, &assoc->del_work);
+ nvmet_fc_tgt_a_put(assoc);
}
spin_unlock_irqrestore(&tgtport->lock, flags);
@@ -1582,7 +1568,7 @@ nvmet_fc_delete_ctrl(struct nvmet_ctrl *ctrl)
rcu_read_lock();
list_for_each_entry_rcu(assoc, &tgtport->assoc_list, a_list) {
- queue = rcu_dereference(assoc->queues[0]);
+ queue = assoc->queues[0];
if (queue && queue->nvme_sq.ctrl == ctrl) {
if (nvmet_fc_tgt_a_get(assoc))
found_ctrl = true;
@@ -1594,9 +1580,8 @@ nvmet_fc_delete_ctrl(struct nvmet_ctrl *ctrl)
nvmet_fc_tgtport_put(tgtport);
if (found_ctrl) {
- if (!queue_work(nvmet_wq, &assoc->del_work))
- /* already deleting - release local reference */
- nvmet_fc_tgt_a_put(assoc);
+ queue_work(nvmet_wq, &assoc->del_work);
+ nvmet_fc_tgt_a_put(assoc);
return;
}
@@ -1626,6 +1611,8 @@ nvmet_fc_unregister_targetport(struct nvmet_fc_target_port *target_port)
/* terminate any outstanding associations */
__nvmet_fc_free_assocs(tgtport);
+ flush_workqueue(nvmet_wq);
+
/*
* should terminate LS's as well. However, LS's will be generated
* at the tail end of association termination, so they likely don't
@@ -1871,9 +1858,6 @@ nvmet_fc_ls_disconnect(struct nvmet_fc_tgtport *tgtport,
sizeof(struct fcnvme_ls_disconnect_assoc_acc)),
FCNVME_LS_DISCONNECT_ASSOC);
- /* release get taken in nvmet_fc_find_target_assoc */
- nvmet_fc_tgt_a_put(assoc);
-
/*
* The rules for LS response says the response cannot
* go back until ABTS's have been sent for all outstanding
@@ -1888,8 +1872,6 @@ nvmet_fc_ls_disconnect(struct nvmet_fc_tgtport *tgtport,
assoc->rcv_disconn = iod;
spin_unlock_irqrestore(&tgtport->lock, flags);
- nvmet_fc_delete_target_assoc(assoc);
-
if (oldls) {
dev_info(tgtport->dev,
"{%d:%d} Multiple Disconnect Association LS's "
@@ -1905,6 +1887,9 @@ nvmet_fc_ls_disconnect(struct nvmet_fc_tgtport *tgtport,
nvmet_fc_xmt_ls_rsp(tgtport, oldls);
}
+ queue_work(nvmet_wq, &assoc->del_work);
+ nvmet_fc_tgt_a_put(assoc);
+
return false;
}
@@ -2903,6 +2888,9 @@ nvmet_fc_remove_port(struct nvmet_port *port)
nvmet_fc_portentry_unbind(pe);
+ /* terminate any outstanding associations */
+ __nvmet_fc_free_assocs(pe->tgtport);
+
kfree(pe);
}
@@ -2934,6 +2922,9 @@ static int __init nvmet_fc_init_module(void)
static void __exit nvmet_fc_exit_module(void)
{
+ /* ensure any shutdown operation, e.g. delete ctrls have finished */
+ flush_workqueue(nvmet_wq);
+
/* sanity check - all lports should be removed */
if (!list_empty(&nvmet_fc_target_list))
pr_warn("%s: targetport list not empty\n", __func__);
--
2.43.0
The association life time is tied to the life time of the target port.
That means we should not take extra a refcount when creating a
association.
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/nvme/target/fc.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index 849d9b17c60a..fe3246024836 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -1107,12 +1107,9 @@ nvmet_fc_alloc_target_assoc(struct nvmet_fc_tgtport *tgtport, void *hosthandle)
if (idx < 0)
goto out_free_assoc;
- if (!nvmet_fc_tgtport_get(tgtport))
- goto out_ida;
-
assoc->hostport = nvmet_fc_alloc_hostport(tgtport, hosthandle);
if (IS_ERR(assoc->hostport))
- goto out_put;
+ goto out_ida;
assoc->tgtport = tgtport;
assoc->a_id = idx;
@@ -1142,8 +1139,6 @@ nvmet_fc_alloc_target_assoc(struct nvmet_fc_tgtport *tgtport, void *hosthandle)
return assoc;
-out_put:
- nvmet_fc_tgtport_put(tgtport);
out_ida:
ida_free(&tgtport->assoc_cnt, idx);
out_free_assoc:
@@ -1180,7 +1175,6 @@ nvmet_fc_target_assoc_free(struct kref *ref)
dev_info(tgtport->dev,
"{%d:%d} Association freed\n",
tgtport->fc_target_port.port_num, assoc->a_id);
- nvmet_fc_tgtport_put(tgtport);
kfree(assoc);
}
--
2.43.0
When deleting an association the shutdown path is deadlocking because we
try to flush the nvmet_wq nested. Avoid this by deadlock by deferring
the put work into its own work item.
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/nvme/target/fc.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index c80b8a066fd1..3e0d391e631b 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -111,6 +111,8 @@ struct nvmet_fc_tgtport {
struct nvmet_fc_port_entry *pe;
struct kref ref;
u32 max_sg_cnt;
+
+ struct work_struct put_work;
};
struct nvmet_fc_port_entry {
@@ -247,6 +249,13 @@ static int nvmet_fc_tgt_a_get(struct nvmet_fc_tgt_assoc *assoc);
static void nvmet_fc_tgt_q_put(struct nvmet_fc_tgt_queue *queue);
static int nvmet_fc_tgt_q_get(struct nvmet_fc_tgt_queue *queue);
static void nvmet_fc_tgtport_put(struct nvmet_fc_tgtport *tgtport);
+static void nvmet_fc_put_tgtport_work(struct work_struct *work)
+{
+ struct nvmet_fc_tgtport *tgtport =
+ container_of(work, struct nvmet_fc_tgtport, put_work);
+
+ nvmet_fc_tgtport_put(tgtport);
+}
static int nvmet_fc_tgtport_get(struct nvmet_fc_tgtport *tgtport);
static void nvmet_fc_handle_fcp_rqst(struct nvmet_fc_tgtport *tgtport,
struct nvmet_fc_fcp_iod *fod);
@@ -358,7 +367,7 @@ __nvmet_fc_finish_ls_req(struct nvmet_fc_ls_req_op *lsop)
if (!lsop->req_queued) {
spin_unlock_irqrestore(&tgtport->lock, flags);
- goto out_puttgtport;
+ goto out_putwork;
}
list_del(&lsop->lsreq_list);
@@ -371,8 +380,8 @@ __nvmet_fc_finish_ls_req(struct nvmet_fc_ls_req_op *lsop)
(lsreq->rqstlen + lsreq->rsplen),
DMA_BIDIRECTIONAL);
-out_puttgtport:
- nvmet_fc_tgtport_put(tgtport);
+out_putwork:
+ queue_work(nvmet_wq, &tgtport->put_work);
}
static int
@@ -1396,6 +1405,7 @@ nvmet_fc_register_targetport(struct nvmet_fc_port_info *pinfo,
kref_init(&newrec->ref);
ida_init(&newrec->assoc_cnt);
newrec->max_sg_cnt = template->max_sgl_segments;
+ INIT_WORK(&newrec->put_work, nvmet_fc_put_tgtport_work);
ret = nvmet_fc_alloc_ls_iodlist(newrec);
if (ret) {
--
2.43.0
Looks good:
Reviewed-by: Christoph Hellwig <[email protected]>
On 1/31/24 09:51, Daniel Wagner wrote:
> The module exit path has race between deleting all controllers and
> freeing 'left over IDs'. To prevent double free a synchronization
> between nvme_delete_ctrl and ida_destroy has been added by the initial
> commit.
>
> There is some logic around trying to prevent from hanging forever in
> wait_for_completion, though it does not handling all cases. E.g.
> blktests is able to reproduce the situation where the module unload
> hangs forever.
>
> If we completely rely on the cleanup code executed from the
> nvme_delete_ctrl path, all IDs will be freed eventually. This makes
> calling ida_destroy unnecessary. We only have to ensure that all
> nvme_delete_ctrl code has been executed before we leave
> nvme_fc_exit_module. This is done by flushing the nvme_delete_wq
> workqueue.
>
> While at it, remove the unused nvme_fc_wq workqueue too.
>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Daniel Wagner <[email protected]>
> ---
> drivers/nvme/host/fc.c | 47 ++++++------------------------------------
> 1 file changed, 6 insertions(+), 41 deletions(-)
>
Reviewed-by: Hannes Reinecke <[email protected]>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich
On 1/31/24 09:51, Daniel Wagner wrote:
> The first argument of list_add_tail function is the new element which
> should be added to the list which is the second argument. Swap the
> arguments to allow processing more than one element at a time.
>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Daniel Wagner <[email protected]>
> ---
> drivers/nvme/target/fcloop.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
Reviewed-by: Hannes Reinecke <[email protected]>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich
On 1/31/24 09:51, Daniel Wagner wrote:
> When the target executes a disconnect and the host triggers a reconnect
> immediately, the reconnect command still finds an existing association.
>
> The reconnect crashes later on because nvmet_fc_delete_target_assoc
> blindly removes resources while the reconnect code wants to use it.
>
> To address this, nvmet_fc_find_target_assoc should not be able to
> lookup an association which is being removed. The association list
> is already under RCU lifetime management, so let's properly use it
> and remove the association from the list and wait for a grace period
> before cleaning up all. This means we also can drop the RCU management
> on the queues, because this is now handled via the association itself.
>
> A second step split the execution context so that the initial disconnect
> command can complete without running the reconnect code in the same
> context. As usual, this is done by deferring the ->done to a workqueue.
>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Daniel Wagner <[email protected]>
> ---
> drivers/nvme/target/fc.c | 83 ++++++++++++++++++----------------------
> 1 file changed, 37 insertions(+), 46 deletions(-)
>
Reviewed-by: Hannes Reinecke <[email protected]>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich
On 1/31/24 16:51, Daniel Wagner wrote:
> When deleting an association the shutdown path is deadlocking because we
> try to flush the nvmet_wq nested. Avoid this by deadlock by deferring
> the put work into its own work item.
>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Daniel Wagner <[email protected]>
> ---
> drivers/nvme/target/fc.c | 16 +++++++++++++---
> 1 file changed, 13 insertions(+), 3 deletions(-)
>
Reviewed-by: Hannes Reinecke <[email protected]>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich
On 1/31/24 16:51, Daniel Wagner wrote:
> The assoc_list is a RCU protected list, thus use the RCU flavor of list
> functions.
>
> Let's use this opportunity and refactor this code and move the lookup
> into a helper and give it a descriptive name.
>
> Signed-off-by: Daniel Wagner <[email protected]>
> ---
> drivers/nvme/target/fc.c | 34 ++++++++++++++++++++++------------
> 1 file changed, 22 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
> index 671d096745a5..fd229f310c93 100644
> --- a/drivers/nvme/target/fc.c
> +++ b/drivers/nvme/target/fc.c
> @@ -1114,14 +1114,27 @@ nvmet_fc_schedule_delete_assoc(struct nvmet_fc_tgt_assoc *assoc)
> queue_work(nvmet_wq, &assoc->del_work);
> }
>
> +static bool
> +nvmet_fc_assoc_exits(struct nvmet_fc_tgtport *tgtport, u64 association_id)
> +{
> + struct nvmet_fc_tgt_assoc *a;
> +
> + list_for_each_entry_rcu(a, &tgtport->assoc_list, a_list) {
> + if (association_id == a->association_id)
> + return true;
> + }
> +
> + return false;
> +}
> +
> static struct nvmet_fc_tgt_assoc *
> nvmet_fc_alloc_target_assoc(struct nvmet_fc_tgtport *tgtport, void *hosthandle)
> {
> - struct nvmet_fc_tgt_assoc *assoc, *tmpassoc;
> + struct nvmet_fc_tgt_assoc *assoc;
> unsigned long flags;
> + bool done;
> u64 ran;
> int idx;
> - bool needrandom = true;
>
> if (!tgtport->pe)
> return NULL;
> @@ -1145,24 +1158,21 @@ nvmet_fc_alloc_target_assoc(struct nvmet_fc_tgtport *tgtport, void *hosthandle)
> INIT_WORK(&assoc->del_work, nvmet_fc_delete_assoc_work);
> atomic_set(&assoc->terminating, 0);
>
> - while (needrandom) {
> + done = false;
> + do {
> get_random_bytes(&ran, sizeof(ran) - BYTES_FOR_QID);
> ran = ran << BYTES_FOR_QID_SHIFT;
>
> spin_lock_irqsave(&tgtport->lock, flags);
> - needrandom = false;
> - list_for_each_entry(tmpassoc, &tgtport->assoc_list, a_list) {
> - if (ran == tmpassoc->association_id) {
> - needrandom = true;
> - break;
> - }
> - }
> - if (!needrandom) {
> + rcu_read_lock();
> + if (!nvmet_fc_assoc_exits(tgtport, ran)) {
> assoc->association_id = ran;
> list_add_tail_rcu(&assoc->a_list, &tgtport->assoc_list);
> + done = true;
> }
> + rcu_read_unlock();
> spin_unlock_irqrestore(&tgtport->lock, flags);
Odd. You already take the tgtport lock, so there really is no point in
using rcu_read_lock() here.
Or does the tgtport lock not protect the assoc_list?
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich
On Tue, Feb 06, 2024 at 02:51:24PM +0900, Hannes Reinecke wrote:
> On 1/31/24 16:51, Daniel Wagner wrote:
> > + rcu_read_lock();
> > + if (!nvmet_fc_assoc_exits(tgtport, ran)) {
> > assoc->association_id = ran;
> > list_add_tail_rcu(&assoc->a_list, &tgtport->assoc_list);
> > + done = true;
> > }
> > + rcu_read_unlock();
> > spin_unlock_irqrestore(&tgtport->lock, flags);
>
> Odd. You already take the tgtport lock, so there really is no point in using
> rcu_read_lock() here.
That's a interesting point, but I think it's harmless since there's no
rcu sync within the read section.
https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html#Guaranteed%20Read-to-Write%20Upgrade
That said, the rcu_read_lock() seems like it should be moved to
nvmet_fc_assoc_exits() since that's the only part reading rcu protected
structures.
On Tue, Feb 06, 2024 at 11:22:49AM -0800, Keith Busch wrote:
> That said, the rcu_read_lock() seems like it should be moved to
> nvmet_fc_assoc_exits() since that's the only part reading rcu protected
> structures.
FTR, here is a fix:
https://lore.kernel.org/linux-nvme/[email protected]/