2018-05-15 07:41:41

by Johannes Thumshirn

Subject: [PATCH 0/5] Misc NVMe lockdep fixes

I've "accidently" activated lockdep and ran some tests on nvme, here's
the post-fallout patch series.

Johannes Thumshirn (5):
nvme: fix lockdep warning in nvme_mpath_clear_current_path
nvme: don't hold nvmf_transports_rwsem for more than transport lookups
nvme: call nvmf_create_ctrl before checking for duplicate assignment
nvmet: use atomic allocations when allocating fc requests
nvmet: fcloop: use irqsave spinlocks

drivers/nvme/host/fabrics.c | 20 +++++++---------
drivers/nvme/host/nvme.h | 3 ++-
drivers/nvme/target/fcloop.c | 54 ++++++++++++++++++++++++--------------------
3 files changed, 40 insertions(+), 37 deletions(-)

--
2.16.3



2018-05-15 07:41:24

by Johannes Thumshirn

Subject: [PATCH 4/5] nvmet: use atomic allocations when allocating fc requests

fcloop_fcp_req() runs with the hctx_lock (an rcu_read_lock()-protected
section) held, so memory allocations done in this context have to be
atomic.

This fixes the following lockdep complaint:
[ 9.753313] BUG: sleeping function called from invalid context at mm/slab.h:421
[ 9.754518] in_atomic(): 1, irqs_disabled(): 0, pid: 1420, name: nvme
[ 9.755613] 3 locks held by nvme/1420:
[ 9.756221] #0: (ptrval) (nvmf_dev_mutex){+.+.}, at: nvmf_dev_write+0x6a/0xb7d [nvme_fabrics]
[ 9.757575] #1: (ptrval) (nvmf_transports_rwsem){++++}, at: nvmf_dev_write+0x6e5/0xb7d [nvme_fabrics]
[ 9.759000] #2: (ptrval) (rcu_read_lock){....}, at: hctx_lock+0x56/0xd0
[ 9.760141] CPU: 2 PID: 1420 Comm: nvme Not tainted 4.17.0-rc5+ #883
[ 9.761078] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[ 9.762624] Call Trace:
[ 9.763021] dump_stack+0x78/0xb3
[ 9.763505] ___might_sleep+0x227/0x250
[ 9.764115] kmem_cache_alloc_trace+0x1ae/0x270
[ 9.764793] fcloop_fcp_req+0x32/0x1a0 [nvme_fcloop]
[ 9.765561] nvme_fc_start_fcp_op.part.39+0x193/0x4c0 [nvme_fc]
[ 9.766480] blk_mq_dispatch_rq_list+0x7f/0x4a0
[ 9.767163] ? blk_mq_flush_busy_ctxs+0xa8/0xf0
[ 9.767871] blk_mq_sched_dispatch_requests+0x16e/0x170
[ 9.768644] __blk_mq_run_hw_queue+0x79/0xd0
[ 9.769294] __blk_mq_delay_run_hw_queue+0x11c/0x160
[ 9.770012] blk_mq_run_hw_queue+0x63/0xc0
[ 9.770667] blk_mq_sched_insert_request+0xb2/0x140
[ 9.771399] blk_execute_rq+0x64/0xc0
[ 9.771990] __nvme_submit_sync_cmd+0x63/0xd0 [nvme_core]
[ 9.772765] nvmf_connect_admin_queue+0x11e/0x190 [nvme_fabrics]
[ 9.773659] ? mark_held_locks+0x6b/0x90
[ 9.774798] nvme_fc_create_association+0x35b/0x970 [nvme_fc]
[ 9.775631] nvme_fc_create_ctrl+0x5d2/0x830 [nvme_fc]
[ 9.776423] nvmf_dev_write+0x92d/0xb7d [nvme_fabrics]
[ 9.777188] __vfs_write+0x21/0x130
[ 9.777725] ? selinux_file_permission+0xe9/0x140
[ 9.778417] ? security_file_permission+0x2f/0xb0
[ 9.779158] vfs_write+0xbd/0x1c0
[ 9.779644] ksys_write+0x40/0xa0
[ 9.780184] ? do_syscall_64+0xd/0x190
[ 9.780736] do_syscall_64+0x51/0x190
[ 9.781281] entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Johannes Thumshirn <[email protected]>
---
drivers/nvme/target/fcloop.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c
index 34712def81b1..d2209c60f95f 100644
--- a/drivers/nvme/target/fcloop.c
+++ b/drivers/nvme/target/fcloop.c
@@ -509,7 +509,7 @@ fcloop_fcp_req(struct nvme_fc_local_port *localport,
if (!rport->targetport)
return -ECONNREFUSED;

- tfcp_req = kzalloc(sizeof(*tfcp_req), GFP_KERNEL);
+ tfcp_req = kzalloc(sizeof(*tfcp_req), GFP_ATOMIC);
if (!tfcp_req)
return -ENOMEM;

--
2.16.3


2018-05-15 07:41:55

by Johannes Thumshirn

Subject: [PATCHv2 1/5] nvme: fix lockdep warning in nvme_mpath_clear_current_path

When running blktests' nvme/005 with a lockdep-enabled kernel, the test
case fails due to the following lockdep splat in dmesg:

[ 18.206166] =============================
[ 18.207286] WARNING: suspicious RCU usage
[ 18.208417] 4.17.0-rc5 #881 Not tainted
[ 18.209487] -----------------------------
[ 18.210612] drivers/nvme/host/nvme.h:457 suspicious rcu_dereference_check() usage!
[ 18.213486]
[ 18.213486] other info that might help us debug this:
[ 18.213486]
[ 18.214745]
[ 18.214745] rcu_scheduler_active = 2, debug_locks = 1
[ 18.215798] 3 locks held by kworker/u32:5/1102:
[ 18.216535] #0: (ptrval) ((wq_completion)"nvme-wq"){+.+.}, at: process_one_work+0x152/0x5c0
[ 18.217983] #1: (ptrval) ((work_completion)(&ctrl->scan_work)){+.+.}, at: process_one_work+0x152/0x5c0
[ 18.219584] #2: (ptrval) (&subsys->lock#2){+.+.}, at: nvme_ns_remove+0x43/0x1c0 [nvme_core]
[ 18.221037]
[ 18.221037] stack backtrace:
[ 18.221721] CPU: 12 PID: 1102 Comm: kworker/u32:5 Not tainted 4.17.0-rc5 #881
[ 18.222830] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[ 18.224451] Workqueue: nvme-wq nvme_scan_work [nvme_core]
[ 18.225308] Call Trace:
[ 18.225704] dump_stack+0x78/0xb3
[ 18.226224] nvme_ns_remove+0x1a3/0x1c0 [nvme_core]
[ 18.226975] nvme_validate_ns+0x87/0x850 [nvme_core]
[ 18.227749] ? blk_queue_exit+0x69/0x110
[ 18.228358] ? blk_queue_exit+0x81/0x110
[ 18.228960] ? direct_make_request+0x1a0/0x1a0
[ 18.229649] nvme_scan_work+0x212/0x2d0 [nvme_core]
[ 18.230411] process_one_work+0x1d8/0x5c0
[ 18.231037] ? process_one_work+0x152/0x5c0
[ 18.231705] worker_thread+0x45/0x3e0
[ 18.232282] kthread+0x101/0x140
[ 18.232788] ? process_one_work+0x5c0/0x5c0

The only caller of nvme_mpath_clear_current_path() is nvme_ns_remove(),
which holds the subsys lock, so this is likely a false positive. But by
using rcu_access_pointer() we tell RCU and lockdep that we are only
interested in the pointer value.
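
For illustration only (nothing below is part of the patch, and the helper
is hypothetical): rcu_access_pointer() fetches the pointer value without
asserting that an RCU/SRCU read-side critical section is held, which is
all we need here because the pointer is only compared, never dereferenced.

	/* Hypothetical helper, for illustration only. */
	static inline bool nvme_ns_is_current_path(struct nvme_ns *ns)
	{
		struct nvme_ns_head *head = ns->head;

		/* no read-side lock needed: we only compare the value */
		return head && rcu_access_pointer(head->current_path) == ns;
	}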

Fixes: 32acab3181c7 ("nvme: implement multipath access to nvme subsystems")
Signed-off-by: Johannes Thumshirn <[email protected]>
Suggested-by: Paul E. McKenney <[email protected]>

---
Changes to v1:
- Change rcu_dereference_protected() to rcu_access_pointer() (Paul)
---
drivers/nvme/host/nvme.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 17d2f7cf3fed..af2bb6bc984d 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -22,6 +22,7 @@
#include <linux/lightnvm.h>
#include <linux/sed-opal.h>
#include <linux/fault-inject.h>
+#include <linux/rcupdate.h>

extern unsigned int nvme_io_timeout;
#define NVME_IO_TIMEOUT (nvme_io_timeout * HZ)
@@ -454,7 +455,7 @@ static inline void nvme_mpath_clear_current_path(struct nvme_ns *ns)
{
struct nvme_ns_head *head = ns->head;

- if (head && ns == srcu_dereference(head->current_path, &head->srcu))
+ if (head && ns == rcu_access_pointer(head->current_path))
rcu_assign_pointer(head->current_path, NULL);
}
struct nvme_ns *nvme_find_path(struct nvme_ns_head *head);
--
2.16.3


2018-05-15 07:42:12

by Johannes Thumshirn

Subject: [PATCH 5/5] nvmet: fcloop: use irqsave spinlocks

Lockdep complains about inconsistent hardirq states when using
nvme-fcloop. So use the irqsave spin_lock variants for the
nvmefc_fcp_req's reqlock.

Here's the lockdep report:
[ 11.138417] ================================
[ 11.139085] WARNING: inconsistent lock state
[ 11.139207] nvmet: creating controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:046394e7-bff7-4403-9502-1816800efaa0.
[ 11.139727] 4.17.0-rc5+ #883 Not tainted
[ 11.139732] --------------------------------
[ 11.144441] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[ 11.145341] swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
[ 11.146108] (ptrval) (&(&tfcp_req->reqlock)->rlock){?.+.}, at: fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
[ 11.148633] {HARDIRQ-ON-W} state was registered at:
[ 11.149380] _raw_spin_lock+0x34/0x70
[ 11.149940] fcloop_fcp_recv_work+0x17/0x90 [nvme_fcloop]
[ 11.150746] process_one_work+0x1d8/0x5c0
[ 11.151346] worker_thread+0x45/0x3e0
[ 11.151886] kthread+0x101/0x140
[ 11.152370] ret_from_fork+0x3a/0x50
[ 11.152903] irq event stamp: 36666
[ 11.153402] hardirqs last enabled at (36663): [<ffffffffafc601c3>] default_idle+0x13/0x180
[ 11.154601] hardirqs last disabled at (36664): [<ffffffffafe00964>] interrupt_entry+0xc4/0xe0
[ 11.155817] softirqs last enabled at (36666): [<ffffffffaf267409>] irq_enter+0x59/0x60
[ 11.156952] softirqs last disabled at (36665): [<ffffffffaf2673ee>] irq_enter+0x3e/0x60
[ 11.158092]
[ 11.158092] other info that might help us debug this:
[ 11.159436] Possible unsafe locking scenario:
[ 11.159436]
[ 11.160299] CPU0
[ 11.160663] ----
[ 11.161026] lock(&(&tfcp_req->reqlock)->rlock);
[ 11.161709] <Interrupt>
[ 11.162091] lock(&(&tfcp_req->reqlock)->rlock);
[ 11.163148]
[ 11.163148] *** DEADLOCK ***
[ 11.163148]
[ 11.164007] no locks held by swapper/8/0.
[ 11.164596]
[ 11.164596] stack backtrace:
[ 11.165238] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.17.0-rc5+ #883
[ 11.166180] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[ 11.167673] Call Trace:
[ 11.168037] <IRQ>
[ 11.168349] dump_stack+0x78/0xb3
[ 11.168864] print_usage_bug+0x1ed/0x1fe
[ 11.169440] ? check_usage_backwards+0x110/0x110
[ 11.170111] mark_lock+0x263/0x2a0
[ 11.170995] __lock_acquire+0x675/0x1870
[ 11.171865] ? __lock_acquire+0x48d/0x1870
[ 11.172460] lock_acquire+0xd4/0x220
[ 11.172981] ? fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
[ 11.173709] _raw_spin_lock+0x34/0x70
[ 11.174243] ? fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
[ 11.174978] fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
[ 11.175673] nvmet_fc_transfer_fcp_data+0x9b/0x130 [nvmet_fc]
[ 11.176507] nvmet_req_complete+0x10/0x110 [nvmet]
[ 11.177210] nvmet_bio_done+0x23/0x40 [nvmet]
[ 11.177837] blk_update_request+0xab/0x3b0
[ 11.178434] blk_mq_end_request+0x13/0x60
[ 11.179033] flush_smp_call_function_queue+0x58/0x150
[ 11.179755] smp_call_function_single_interrupt+0x49/0x260
[ 11.180531] call_function_single_interrupt+0xf/0x20
[ 11.181236] </IRQ>
[ 11.181542] RIP: 0010:native_safe_halt+0x2/0x10
[ 11.182186] RSP: 0018:ffffa481006cbec0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff04
[ 11.183265] RAX: ffff9f54f8b86200 RBX: ffff9f54f8b86200 RCX: 0000000000000000
[ 11.184264] RDX: ffff9f54f8b86200 RSI: 0000000000000001 RDI: ffff9f54f8b86200
[ 11.185270] RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000000
[ 11.186271] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 11.187280] R13: 0000000000000000 R14: ffff9f54f8b86200 R15: ffff9f54f8b86200
[ 11.188280] default_idle+0x18/0x180
[ 11.188806] do_idle+0x176/0x260
[ 11.189273] cpu_startup_entry+0x5a/0x60
[ 11.189832] start_secondary+0x18f/0x1b0

Signed-off-by: Johannes Thumshirn <[email protected]>
Cc: James Smart <[email protected]>
---
drivers/nvme/target/fcloop.c | 52 ++++++++++++++++++++++++--------------------
1 file changed, 29 insertions(+), 23 deletions(-)

diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c
index d2209c60f95f..f4a3700201c1 100644
--- a/drivers/nvme/target/fcloop.c
+++ b/drivers/nvme/target/fcloop.c
@@ -406,9 +406,10 @@ fcloop_fcp_recv_work(struct work_struct *work)
container_of(work, struct fcloop_fcpreq, fcp_rcv_work);
struct nvmefc_fcp_req *fcpreq = tfcp_req->fcpreq;
int ret = 0;
+ unsigned long flags;
bool aborted = false;

- spin_lock(&tfcp_req->reqlock);
+ spin_lock_irqsave(&tfcp_req->reqlock, flags);
switch (tfcp_req->inistate) {
case INI_IO_START:
tfcp_req->inistate = INI_IO_ACTIVE;
@@ -417,11 +418,11 @@ fcloop_fcp_recv_work(struct work_struct *work)
aborted = true;
break;
default:
- spin_unlock(&tfcp_req->reqlock);
+ spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
WARN_ON(1);
return;
}
- spin_unlock(&tfcp_req->reqlock);
+ spin_unlock_irqrestore(&tfcp_req->reqlock, flags);

if (unlikely(aborted))
ret = -ECANCELED;
@@ -442,8 +443,9 @@ fcloop_fcp_abort_recv_work(struct work_struct *work)
container_of(work, struct fcloop_fcpreq, abort_rcv_work);
struct nvmefc_fcp_req *fcpreq;
bool completed = false;
+ unsigned long flags;

- spin_lock(&tfcp_req->reqlock);
+ spin_lock_irqsave(&tfcp_req->reqlock, flags);
fcpreq = tfcp_req->fcpreq;
switch (tfcp_req->inistate) {
case INI_IO_ABORTED:
@@ -452,11 +454,11 @@ fcloop_fcp_abort_recv_work(struct work_struct *work)
completed = true;
break;
default:
- spin_unlock(&tfcp_req->reqlock);
+ spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
WARN_ON(1);
return;
}
- spin_unlock(&tfcp_req->reqlock);
+ spin_unlock_irqrestore(&tfcp_req->reqlock, flags);

if (unlikely(completed)) {
/* remove reference taken in original abort downcall */
@@ -468,9 +470,9 @@ fcloop_fcp_abort_recv_work(struct work_struct *work)
nvmet_fc_rcv_fcp_abort(tfcp_req->tport->targetport,
&tfcp_req->tgt_fcp_req);

- spin_lock(&tfcp_req->reqlock);
+ spin_lock_irqsave(&tfcp_req->reqlock, flags);
tfcp_req->fcpreq = NULL;
- spin_unlock(&tfcp_req->reqlock);
+ spin_unlock_irqrestore(&tfcp_req->reqlock, flags);

fcloop_call_host_done(fcpreq, tfcp_req, -ECANCELED);
/* call_host_done releases reference for abort downcall */
@@ -486,11 +488,12 @@ fcloop_tgt_fcprqst_done_work(struct work_struct *work)
struct fcloop_fcpreq *tfcp_req =
container_of(work, struct fcloop_fcpreq, tio_done_work);
struct nvmefc_fcp_req *fcpreq;
+ unsigned long flags;

- spin_lock(&tfcp_req->reqlock);
+ spin_lock_irqsave(&tfcp_req->reqlock, flags);
fcpreq = tfcp_req->fcpreq;
tfcp_req->inistate = INI_IO_COMPLETED;
- spin_unlock(&tfcp_req->reqlock);
+ spin_unlock_irqrestore(&tfcp_req->reqlock, flags);

fcloop_call_host_done(fcpreq, tfcp_req, tfcp_req->status);
}
@@ -594,13 +597,14 @@ fcloop_fcp_op(struct nvmet_fc_target_port *tgtport,
u32 rsplen = 0, xfrlen = 0;
int fcp_err = 0, active, aborted;
u8 op = tgt_fcpreq->op;
+ unsigned long flags;

- spin_lock(&tfcp_req->reqlock);
+ spin_lock_irqsave(&tfcp_req->reqlock, flags);
fcpreq = tfcp_req->fcpreq;
active = tfcp_req->active;
aborted = tfcp_req->aborted;
tfcp_req->active = true;
- spin_unlock(&tfcp_req->reqlock);
+ spin_unlock_irqrestore(&tfcp_req->reqlock, flags);

if (unlikely(active))
/* illegal - call while i/o active */
@@ -608,9 +612,9 @@ fcloop_fcp_op(struct nvmet_fc_target_port *tgtport,

if (unlikely(aborted)) {
/* target transport has aborted i/o prior */
- spin_lock(&tfcp_req->reqlock);
+ spin_lock_irqsave(&tfcp_req->reqlock, flags);
tfcp_req->active = false;
- spin_unlock(&tfcp_req->reqlock);
+ spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
tgt_fcpreq->transferred_length = 0;
tgt_fcpreq->fcp_error = -ECANCELED;
tgt_fcpreq->done(tgt_fcpreq);
@@ -666,9 +670,9 @@ fcloop_fcp_op(struct nvmet_fc_target_port *tgtport,
break;
}

- spin_lock(&tfcp_req->reqlock);
+ spin_lock_irqsave(&tfcp_req->reqlock, flags);
tfcp_req->active = false;
- spin_unlock(&tfcp_req->reqlock);
+ spin_unlock_irqrestore(&tfcp_req->reqlock, flags);

tgt_fcpreq->transferred_length = xfrlen;
tgt_fcpreq->fcp_error = fcp_err;
@@ -682,15 +686,16 @@ fcloop_tgt_fcp_abort(struct nvmet_fc_target_port *tgtport,
struct nvmefc_tgt_fcp_req *tgt_fcpreq)
{
struct fcloop_fcpreq *tfcp_req = tgt_fcp_req_to_fcpreq(tgt_fcpreq);
+ unsigned long flags;

/*
* mark aborted only in case there were 2 threads in transport
* (one doing io, other doing abort) and only kills ops posted
* after the abort request
*/
- spin_lock(&tfcp_req->reqlock);
+ spin_lock_irqsave(&tfcp_req->reqlock, flags);
tfcp_req->aborted = true;
- spin_unlock(&tfcp_req->reqlock);
+ spin_unlock_irqrestore(&tfcp_req->reqlock, flags);

tfcp_req->status = NVME_SC_INTERNAL;

@@ -726,19 +731,20 @@ fcloop_fcp_abort(struct nvme_fc_local_port *localport,
struct fcloop_ini_fcpreq *inireq = fcpreq->private;
struct fcloop_fcpreq *tfcp_req;
bool abortio = true;
+ unsigned long flags;

- spin_lock(&inireq->inilock);
+ spin_lock_irqsave(&inireq->inilock, flags);
tfcp_req = inireq->tfcp_req;
if (tfcp_req)
fcloop_tfcp_req_get(tfcp_req);
- spin_unlock(&inireq->inilock);
+ spin_unlock_irqrestore(&inireq->inilock, flags);

if (!tfcp_req)
/* abort has already been called */
return;

/* break initiator/target relationship for io */
- spin_lock(&tfcp_req->reqlock);
+ spin_lock_irqsave(&tfcp_req->reqlock, flags);
switch (tfcp_req->inistate) {
case INI_IO_START:
case INI_IO_ACTIVE:
@@ -748,11 +754,11 @@ fcloop_fcp_abort(struct nvme_fc_local_port *localport,
abortio = false;
break;
default:
- spin_unlock(&tfcp_req->reqlock);
+ spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
WARN_ON(1);
return;
}
- spin_unlock(&tfcp_req->reqlock);
+ spin_unlock_irqrestore(&tfcp_req->reqlock, flags);

if (abortio)
/* leave the reference while the work item is scheduled */
--
2.16.3


2018-05-15 07:43:27

by Johannes Thumshirn

Subject: [PATCH 2/5] nvme: don't hold nvmf_transports_rwsem for more than transport lookups

Only take nvmf_transports_rwsem when doing a lookup of registered
transports, as subsequent actions may take the hctx_lock, which is an
rcu_read_lock()-protected section, and lockdep then complains about
suspicious RCU usage.

[ 9.703333] =============================
[ 9.704797] WARNING: suspicious RCU usage
[ 9.706250] 4.17.0-rc5+ #883 Not tainted
[ 9.708146] -----------------------------
[ 9.708868] ./include/linux/rcupdate.h:304 Illegal context switch in RCU read-side critical section!
[ 9.710511]
[ 9.710511] other info that might help us debug this:
[ 9.710511]
[ 9.711959]
[ 9.711959] rcu_scheduler_active = 2, debug_locks = 1
[ 9.713142] 3 locks held by nvme/1420:
[ 9.713800] #0: (ptrval) (nvmf_dev_mutex){+.+.}, at: nvmf_dev_write+0x6a/0xb7d [nvme_fabrics]
[ 9.715471] #1: (ptrval) (nvmf_transports_rwsem){++++}, at: nvmf_dev_write+0x6e5/0xb7d [nvme_fabrics]
[ 9.717279] #2: (ptrval) (rcu_read_lock){....}, at: hctx_lock+0x56/0xd0
[ 9.718636]
[ 9.718636] stack backtrace:
[ 9.720266] CPU: 2 PID: 1420 Comm: nvme Not tainted 4.17.0-rc5+ #883
[ 9.721446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[ 9.723003] Call Trace:
[ 9.723453] dump_stack+0x78/0xb3
[ 9.724059] ___might_sleep+0xde/0x250
[ 9.724749] kmem_cache_alloc_trace+0x1ae/0x270
[ 9.725565] fcloop_fcp_req+0x32/0x1a0 [nvme_fcloop]
[ 9.726428] nvme_fc_start_fcp_op.part.39+0x193/0x4c0 [nvme_fc]
[ 9.727425] blk_mq_dispatch_rq_list+0x7f/0x4a0
[ 9.728219] ? blk_mq_flush_busy_ctxs+0xa8/0xf0
[ 9.729035] blk_mq_sched_dispatch_requests+0x16e/0x170
[ 9.729984] __blk_mq_run_hw_queue+0x79/0xd0
[ 9.730737] __blk_mq_delay_run_hw_queue+0x11c/0x160
[ 9.731647] blk_mq_run_hw_queue+0x63/0xc0
[ 9.732357] blk_mq_sched_insert_request+0xb2/0x140
[ 9.733204] blk_execute_rq+0x64/0xc0
[ 9.733840] __nvme_submit_sync_cmd+0x63/0xd0 [nvme_core]
[ 9.734772] nvmf_connect_admin_queue+0x11e/0x190 [nvme_fabrics]
[ 9.735815] ? mark_held_locks+0x6b/0x90
[ 9.736504] nvme_fc_create_association+0x35b/0x970 [nvme_fc]
[ 9.737489] nvme_fc_create_ctrl+0x5d2/0x830 [nvme_fc]
[ 9.738379] nvmf_dev_write+0x92d/0xb7d [nvme_fabrics]
[ 9.739253] __vfs_write+0x21/0x130
[ 9.739895] ? selinux_file_permission+0xe9/0x140
[ 9.740698] ? security_file_permission+0x2f/0xb0
[ 9.741526] vfs_write+0xbd/0x1c0
[ 9.742095] ksys_write+0x40/0xa0
[ 9.742681] ? do_syscall_64+0xd/0x190
[ 9.743335] do_syscall_64+0x51/0x190
[ 9.744415] entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Johannes Thumshirn <[email protected]>
---
drivers/nvme/host/fabrics.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 7ae732a77fe8..febf82639b40 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -957,16 +957,17 @@ nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)

down_read(&nvmf_transports_rwsem);
ops = nvmf_lookup_transport(opts);
+ up_read(&nvmf_transports_rwsem);
if (!ops) {
pr_info("no handler found for transport %s.\n",
opts->transport);
ret = -EINVAL;
- goto out_unlock;
+ goto out_free_opts;
}

if (!try_module_get(ops->module)) {
ret = -EBUSY;
- goto out_unlock;
+ goto out_free_opts;
}

ret = nvmf_check_required_opts(opts, ops->required_opts);
@@ -988,19 +989,15 @@ nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
"controller returned incorrect NQN: \"%s\".\n",
ctrl->subsys->subnqn);
module_put(ops->module);
- up_read(&nvmf_transports_rwsem);
nvme_delete_ctrl_sync(ctrl);
return ERR_PTR(-EINVAL);
}

module_put(ops->module);
- up_read(&nvmf_transports_rwsem);
return ctrl;

out_module_put:
module_put(ops->module);
-out_unlock:
- up_read(&nvmf_transports_rwsem);
out_free_opts:
nvmf_free_options(opts);
return ERR_PTR(ret);
--
2.16.3


2018-05-15 07:43:39

by Johannes Thumshirn

Subject: [PATCH 3/5] nvme: call nvmf_create_ctrl before checking for duplicate assignment

In nvmf_dev_write() we check whether the /dev/nvme-fabrics device
node's private data is already set and then create a controller data
structure. The private data is protected by nvmf_dev_mutex, but there
is no need to hold the mutex while calling nvmf_create_ctrl().

This also reduces the number of lockdep complaints in the 'nvme
connect' with fcloop scenario.

[ 9.703333] =============================
[ 9.704797] WARNING: suspicious RCU usage
[ 9.706250] 4.17.0-rc5+ #883 Not tainted
[ 9.708146] -----------------------------
[ 9.708868] ./include/linux/rcupdate.h:304 Illegal context switch in RCU read-side critical section!
[ 9.710511]
[ 9.710511] other info that might help us debug this:
[ 9.710511]
[ 9.711959]
[ 9.711959] rcu_scheduler_active = 2, debug_locks = 1
[ 9.713142] 3 locks held by nvme/1420:
[ 9.713800] #0: (ptrval) (nvmf_dev_mutex){+.+.}, at: nvmf_dev_write+0x6a/0xb7d [nvme_fabrics]
[ 9.717279] #1: (ptrval) (rcu_read_lock){....}, at: hctx_lock+0x56/0xd0
[ 9.718636]
[ 9.718636] stack backtrace:
[ 9.720266] CPU: 2 PID: 1420 Comm: nvme Not tainted 4.17.0-rc5+ #883
[ 9.721446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[ 9.723003] Call Trace:
[ 9.723453] dump_stack+0x78/0xb3
[ 9.724059] ___might_sleep+0xde/0x250
[ 9.724749] kmem_cache_alloc_trace+0x1ae/0x270
[ 9.725565] fcloop_fcp_req+0x32/0x1a0 [nvme_fcloop]
[ 9.726428] nvme_fc_start_fcp_op.part.39+0x193/0x4c0 [nvme_fc]
[ 9.727425] blk_mq_dispatch_rq_list+0x7f/0x4a0
[ 9.728219] ? blk_mq_flush_busy_ctxs+0xa8/0xf0
[ 9.729035] blk_mq_sched_dispatch_requests+0x16e/0x170
[ 9.729984] __blk_mq_run_hw_queue+0x79/0xd0
[ 9.730737] __blk_mq_delay_run_hw_queue+0x11c/0x160
[ 9.731647] blk_mq_run_hw_queue+0x63/0xc0
[ 9.732357] blk_mq_sched_insert_request+0xb2/0x140
[ 9.733204] blk_execute_rq+0x64/0xc0
[ 9.733840] __nvme_submit_sync_cmd+0x63/0xd0 [nvme_core]
[ 9.734772] nvmf_connect_admin_queue+0x11e/0x190 [nvme_fabrics]
[ 9.735815] ? mark_held_locks+0x6b/0x90
[ 9.736504] nvme_fc_create_association+0x35b/0x970 [nvme_fc]
[ 9.737489] nvme_fc_create_ctrl+0x5d2/0x830 [nvme_fc]
[ 9.738379] nvmf_dev_write+0x92d/0xb7d [nvme_fabrics]
[ 9.739253] __vfs_write+0x21/0x130
[ 9.739895] ? selinux_file_permission+0xe9/0x140
[ 9.740698] ? security_file_permission+0x2f/0xb0
[ 9.741526] vfs_write+0xbd/0x1c0
[ 9.742095] ksys_write+0x40/0xa0
[ 9.742681] ? do_syscall_64+0xd/0x190
[ 9.743335] do_syscall_64+0x51/0x190
[ 9.744415] entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Johannes Thumshirn <[email protected]>
---
drivers/nvme/host/fabrics.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index febf82639b40..757a49b9c5a8 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -1022,18 +1022,17 @@ static ssize_t nvmf_dev_write(struct file *file, const char __user *ubuf,
if (IS_ERR(buf))
return PTR_ERR(buf);

+ ctrl = nvmf_create_ctrl(nvmf_device, buf, count);
+ if (IS_ERR(ctrl))
+ return PTR_ERR(ctrl);
+
mutex_lock(&nvmf_dev_mutex);
if (seq_file->private) {
+ nvme_delete_ctrl_sync(ctrl);
ret = -EINVAL;
goto out_unlock;
}

- ctrl = nvmf_create_ctrl(nvmf_device, buf, count);
- if (IS_ERR(ctrl)) {
- ret = PTR_ERR(ctrl);
- goto out_unlock;
- }
-
seq_file->private = ctrl;

out_unlock:
--
2.16.3


2018-05-15 07:44:31

by Christoph Hellwig

Subject: Re: [PATCH 2/5] nvme: don't hold nvmf_transports_rwsem for more than transport lookups

> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> index 7ae732a77fe8..febf82639b40 100644
> --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -957,16 +957,17 @@ nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
>
> down_read(&nvmf_transports_rwsem);
> ops = nvmf_lookup_transport(opts);
> + up_read(&nvmf_transports_rwsem);

And what protects us from the transport getting unregistered right here
by another thread waiting to acquire nvmf_transports_rwsem?

2018-05-15 07:48:33

by Christoph Hellwig

Subject: Re: [PATCH 3/5] nvme: call nvmf_create_ctrl before checking for duplicate assignment

On Tue, May 15, 2018 at 09:40:41AM +0200, Johannes Thumshirn wrote:
> In nvmf_dev_write() we check whether the /dev/nvme-fabrics device
> node's private data is already set and then create a controller data
> structure. The private data is protected by nvmf_dev_mutex, but there
> is no need to hold the mutex while calling nvmf_create_ctrl().
>
> This also reduces the number of lockdep complaints in the 'nvme
> connect' with fcloop scenario.

Something looks seriously fishy in fcloop. How do we end up in
an RCU critical section here?

In general it seems like fc loop needs to offload any I/O to a workqueue
just like nvme-loop does, but even then I can't see how that is going to
cause an issue in this area.
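
For context, the nvme-loop pattern referred to above (a rough paraphrase
of drivers/nvme/target/loop.c, not verbatim) embeds a work item in the
per-request data preallocated by blk-mq, so ->queue_rq() only schedules
work and never sleeps:

	/* Rough paraphrase of the nvme-loop pattern, not verbatim code. */
	static blk_status_t nvme_loop_queue_rq(struct blk_mq_hw_ctx *hctx,
					       const struct blk_mq_queue_data *bd)
	{
		struct nvme_loop_iod *iod = blk_mq_rq_to_pdu(bd->rq);

		/* set up iod->cmd and iod->req here, without sleeping */

		schedule_work(&iod->work);	/* nvmet executes the request later */
		return BLK_STS_OK;
	}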

2018-05-15 07:52:50

by Johannes Thumshirn

Subject: Re: [PATCH 3/5] nvme: call nvmf_create_ctrl before checking for duplicate assignment

On Tue, May 15, 2018 at 09:50:55AM +0200, Christoph Hellwig wrote:
> On Tue, May 15, 2018 at 09:40:41AM +0200, Johannes Thumshirn wrote:
> > In nvmf_dev_write() we check whether the /dev/nvme-fabrics device
> > node's private data is already set and then create a controller data
> > structure. The private data is protected by nvmf_dev_mutex, but there
> > is no need to hold the mutex while calling nvmf_create_ctrl().
> >
> > This also reduces the number of lockdep complaints in the 'nvme
> > connect' with fcloop scenario.
>
> Something looks seriously fishy in fcloop. How do we end up in
> an RCU critical section here?

__blk_mq_run_hw_queue() calls blk_mq_sched_dispatch_requests() with
the hctx_lock() (a.k.a. rcu_read_lock()) held.

nvmf_connect_admin_queue() does I/O, which causes the above to be called.
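
For reference, hctx_lock() is roughly the following (paraphrased from
block/blk-mq.c of this kernel, not verbatim), so ->queue_rq() for a
non-blocking hw queue runs under rcu_read_lock():

	/* Paraphrased from block/blk-mq.c, not verbatim. */
	static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
	{
		if (!(hctx->flags & BLK_MQ_F_BLOCKING))
			rcu_read_lock();	/* the common, non-blocking case */
		else
			*srcu_idx = srcu_read_lock(hctx->srcu);
	}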

> In general it seems like fc loop needs to offload any I/O to a workqueue
> just like nvme-loop does, but even then I can't see how that is going to
> cause an issue in this area.

Hmm I'll be looking into it.

--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2018-05-15 08:09:15

by Christoph Hellwig

Subject: Re: [PATCHv2 1/5] nvme: fix lockdep warning in nvme_mpath_clear_current_path

Looks good, although I hate having the whole splat in the commit
message...

Reviewed-by: Christoph Hellwig <[email protected]>

2018-05-31 09:26:51

by Sagi Grimberg

Subject: Re: [PATCH 2/5] nvme: don't hold nvmf_transports_rwsem for more than transport lookups


>> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
>> index 7ae732a77fe8..febf82639b40 100644
>> --- a/drivers/nvme/host/fabrics.c
>> +++ b/drivers/nvme/host/fabrics.c
>> @@ -957,16 +957,17 @@ nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
>>
>> down_read(&nvmf_transports_rwsem);
>> ops = nvmf_lookup_transport(opts);
>> + up_read(&nvmf_transports_rwsem);
>
> And what protects us from the transport getting unregister right here
> from anothet thread waiting to acquire nvmf_transports_rwsem?

I think having the module_get inside as well would protect against it.
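
A rough sketch of that suggestion against the code touched by this patch
(illustrative only, untested):

	down_read(&nvmf_transports_rwsem);
	ops = nvmf_lookup_transport(opts);
	if (!ops) {
		up_read(&nvmf_transports_rwsem);
		pr_info("no handler found for transport %s.\n",
			opts->transport);
		ret = -EINVAL;
		goto out_free_opts;
	}
	/* pin the module before the transport can be unregistered */
	if (!try_module_get(ops->module)) {
		up_read(&nvmf_transports_rwsem);
		ret = -EBUSY;
		goto out_free_opts;
	}
	up_read(&nvmf_transports_rwsem);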

2018-05-31 09:28:00

by Sagi Grimberg

Subject: Re: [PATCH 3/5] nvme: call nvmf_create_ctrl before checking for duplicate assignment


>> In general it seems like fc loop needs to offload any I/O to a workqueue
>> just like nvme-loop does, but even then I can't see how that is going to
>> cause an issue in this area.
>
> Hmm I'll be looking into it.

FWIW, I agree it should do that as well.

2018-05-31 09:32:12

by Sagi Grimberg

Subject: Re: [PATCH 4/5] nvmet: use atomic allocations when allocating fc requests


> diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c
> index 34712def81b1..d2209c60f95f 100644
> --- a/drivers/nvme/target/fcloop.c
> +++ b/drivers/nvme/target/fcloop.c
> @@ -509,7 +509,7 @@ fcloop_fcp_req(struct nvme_fc_local_port *localport,
> if (!rport->targetport)
> return -ECONNREFUSED;
>
> - tfcp_req = kzalloc(sizeof(*tfcp_req), GFP_KERNEL);
> + tfcp_req = kzalloc(sizeof(*tfcp_req), GFP_ATOMIC);

Question: why isn't tfcp_req embedded in fcpreq? Don't they have
the same lifetime?

> if (!tfcp_req)
> return -ENOMEM;
>
>

2018-05-31 09:38:22

by Sagi Grimberg

Subject: Re: [PATCH 5/5] nvmet: fcloop: use irqsave spinlocks


> Lockdep complains about inconsistent hardirq states when using
> nvme-fcloop. So use the irqsave spin_lock variants for the
> nvmefc_fcp_req's reqlock.

Is this because the nvmet bdev can trigger the rx path in interrupt
context? I don't understand exactly what is violated.

>
> Here's the lockdep report:
> [ 11.138417] ================================
> [ 11.139085] WARNING: inconsistent lock state
> [ 11.139207] nvmet: creating controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:046394e7-bff7-4403-9502-1816800efaa0.
> [ 11.139727] 4.17.0-rc5+ #883 Not tainted
> [ 11.139732] --------------------------------
> [ 11.144441] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> [ 11.145341] swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> [ 11.146108] (ptrval) (&(&tfcp_req->reqlock)->rlock){?.+.}, at: fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [ 11.148633] {HARDIRQ-ON-W} state was registered at:
> [ 11.149380] _raw_spin_lock+0x34/0x70
> [ 11.149940] fcloop_fcp_recv_work+0x17/0x90 [nvme_fcloop]
> [ 11.150746] process_one_work+0x1d8/0x5c0
> [ 11.151346] worker_thread+0x45/0x3e0
> [ 11.151886] kthread+0x101/0x140
> [ 11.152370] ret_from_fork+0x3a/0x50
> [ 11.152903] irq event stamp: 36666
> [ 11.153402] hardirqs last enabled at (36663): [<ffffffffafc601c3>] default_idle+0x13/0x180
> [ 11.154601] hardirqs last disabled at (36664): [<ffffffffafe00964>] interrupt_entry+0xc4/0xe0
> [ 11.155817] softirqs last enabled at (36666): [<ffffffffaf267409>] irq_enter+0x59/0x60
> [ 11.156952] softirqs last disabled at (36665): [<ffffffffaf2673ee>] irq_enter+0x3e/0x60
> [ 11.158092]
> [ 11.158092] other info that might help us debug this:
> [ 11.159436] Possible unsafe locking scenario:
> [ 11.159436]
> [ 11.160299] CPU0
> [ 11.160663] ----
> [ 11.161026] lock(&(&tfcp_req->reqlock)->rlock);
> [ 11.161709] <Interrupt>
> [ 11.162091] lock(&(&tfcp_req->reqlock)->rlock);
> [ 11.163148]
> [ 11.163148] *** DEADLOCK ***
> [ 11.163148]
> [ 11.164007] no locks held by swapper/8/0.
> [ 11.164596]
> [ 11.164596] stack backtrace:
> [ 11.165238] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.17.0-rc5+ #883
> [ 11.166180] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> [ 11.167673] Call Trace:
> [ 11.168037] <IRQ>
> [ 11.168349] dump_stack+0x78/0xb3
> [ 11.168864] print_usage_bug+0x1ed/0x1fe
> [ 11.169440] ? check_usage_backwards+0x110/0x110
> [ 11.170111] mark_lock+0x263/0x2a0
> [ 11.170995] __lock_acquire+0x675/0x1870
> [ 11.171865] ? __lock_acquire+0x48d/0x1870
> [ 11.172460] lock_acquire+0xd4/0x220
> [ 11.172981] ? fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [ 11.173709] _raw_spin_lock+0x34/0x70
> [ 11.174243] ? fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [ 11.174978] fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [ 11.175673] nvmet_fc_transfer_fcp_data+0x9b/0x130 [nvmet_fc]
> [ 11.176507] nvmet_req_complete+0x10/0x110 [nvmet]
> [ 11.177210] nvmet_bio_done+0x23/0x40 [nvmet]
> [ 11.177837] blk_update_request+0xab/0x3b0
> [ 11.178434] blk_mq_end_request+0x13/0x60
> [ 11.179033] flush_smp_call_function_queue+0x58/0x150
> [ 11.179755] smp_call_function_single_interrupt+0x49/0x260
> [ 11.180531] call_function_single_interrupt+0xf/0x20
> [ 11.181236] </IRQ>
> [ 11.181542] RIP: 0010:native_safe_halt+0x2/0x10
> [ 11.182186] RSP: 0018:ffffa481006cbec0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff04
> [ 11.183265] RAX: ffff9f54f8b86200 RBX: ffff9f54f8b86200 RCX: 0000000000000000
> [ 11.184264] RDX: ffff9f54f8b86200 RSI: 0000000000000001 RDI: ffff9f54f8b86200
> [ 11.185270] RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000000
> [ 11.186271] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> [ 11.187280] R13: 0000000000000000 R14: ffff9f54f8b86200 R15: ffff9f54f8b86200
> [ 11.188280] default_idle+0x18/0x180
> [ 11.188806] do_idle+0x176/0x260
> [ 11.189273] cpu_startup_entry+0x5a/0x60
> [ 11.189832] start_secondary+0x18f/0x1b0
>
> Signed-off-by: Johannes Thumshirn <[email protected]>
> Cc: James Smart <[email protected]>
> ---
> drivers/nvme/target/fcloop.c | 52 ++++++++++++++++++++++++--------------------
> 1 file changed, 29 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c
> index d2209c60f95f..f4a3700201c1 100644
> --- a/drivers/nvme/target/fcloop.c
> +++ b/drivers/nvme/target/fcloop.c
> @@ -406,9 +406,10 @@ fcloop_fcp_recv_work(struct work_struct *work)
> container_of(work, struct fcloop_fcpreq, fcp_rcv_work);
> struct nvmefc_fcp_req *fcpreq = tfcp_req->fcpreq;
> int ret = 0;
> + unsigned long flags;
> bool aborted = false;
>
> - spin_lock(&tfcp_req->reqlock);
> + spin_lock_irqsave(&tfcp_req->reqlock, flags);
> switch (tfcp_req->inistate) {
> case INI_IO_START:
> tfcp_req->inistate = INI_IO_ACTIVE;
> @@ -417,11 +418,11 @@ fcloop_fcp_recv_work(struct work_struct *work)
> aborted = true;
> break;
> default:
> - spin_unlock(&tfcp_req->reqlock);
> + spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
> WARN_ON(1);
> return;
> }
> - spin_unlock(&tfcp_req->reqlock);
> + spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
>
> if (unlikely(aborted))
> ret = -ECANCELED;
> @@ -442,8 +443,9 @@ fcloop_fcp_abort_recv_work(struct work_struct *work)
> container_of(work, struct fcloop_fcpreq, abort_rcv_work);
> struct nvmefc_fcp_req *fcpreq;
> bool completed = false;
> + unsigned long flags;
>
> - spin_lock(&tfcp_req->reqlock);
> + spin_lock_irqsave(&tfcp_req->reqlock, flags);
> fcpreq = tfcp_req->fcpreq;
> switch (tfcp_req->inistate) {
> case INI_IO_ABORTED:
> @@ -452,11 +454,11 @@ fcloop_fcp_abort_recv_work(struct work_struct *work)
> completed = true;
> break;
> default:
> - spin_unlock(&tfcp_req->reqlock);
> + spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
> WARN_ON(1);
> return;
> }
> - spin_unlock(&tfcp_req->reqlock);
> + spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
>
> if (unlikely(completed)) {
> /* remove reference taken in original abort downcall */
> @@ -468,9 +470,9 @@ fcloop_fcp_abort_recv_work(struct work_struct *work)
> nvmet_fc_rcv_fcp_abort(tfcp_req->tport->targetport,
> &tfcp_req->tgt_fcp_req);
>
> - spin_lock(&tfcp_req->reqlock);
> + spin_lock_irqsave(&tfcp_req->reqlock, flags);
> tfcp_req->fcpreq = NULL;
> - spin_unlock(&tfcp_req->reqlock);
> + spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
>
> fcloop_call_host_done(fcpreq, tfcp_req, -ECANCELED);
> /* call_host_done releases reference for abort downcall */
> @@ -486,11 +488,12 @@ fcloop_tgt_fcprqst_done_work(struct work_struct *work)
> struct fcloop_fcpreq *tfcp_req =
> container_of(work, struct fcloop_fcpreq, tio_done_work);
> struct nvmefc_fcp_req *fcpreq;
> + unsigned long flags;
>
> - spin_lock(&tfcp_req->reqlock);
> + spin_lock_irqsave(&tfcp_req->reqlock, flags);
> fcpreq = tfcp_req->fcpreq;
> tfcp_req->inistate = INI_IO_COMPLETED;
> - spin_unlock(&tfcp_req->reqlock);
> + spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
>
> fcloop_call_host_done(fcpreq, tfcp_req, tfcp_req->status);
> }
> @@ -594,13 +597,14 @@ fcloop_fcp_op(struct nvmet_fc_target_port *tgtport,
> u32 rsplen = 0, xfrlen = 0;
> int fcp_err = 0, active, aborted;
> u8 op = tgt_fcpreq->op;
> + unsigned long flags;
>
> - spin_lock(&tfcp_req->reqlock);
> + spin_lock_irqsave(&tfcp_req->reqlock, flags);
> fcpreq = tfcp_req->fcpreq;
> active = tfcp_req->active;
> aborted = tfcp_req->aborted;
> tfcp_req->active = true;
> - spin_unlock(&tfcp_req->reqlock);
> + spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
>
> if (unlikely(active))
> /* illegal - call while i/o active */
> @@ -608,9 +612,9 @@ fcloop_fcp_op(struct nvmet_fc_target_port *tgtport,
>
> if (unlikely(aborted)) {
> /* target transport has aborted i/o prior */
> - spin_lock(&tfcp_req->reqlock);
> + spin_lock_irqsave(&tfcp_req->reqlock, flags);
> tfcp_req->active = false;
> - spin_unlock(&tfcp_req->reqlock);
> + spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
> tgt_fcpreq->transferred_length = 0;
> tgt_fcpreq->fcp_error = -ECANCELED;
> tgt_fcpreq->done(tgt_fcpreq);
> @@ -666,9 +670,9 @@ fcloop_fcp_op(struct nvmet_fc_target_port *tgtport,
> break;
> }
>
> - spin_lock(&tfcp_req->reqlock);
> + spin_lock_irqsave(&tfcp_req->reqlock, flags);
> tfcp_req->active = false;
> - spin_unlock(&tfcp_req->reqlock);
> + spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
>
> tgt_fcpreq->transferred_length = xfrlen;
> tgt_fcpreq->fcp_error = fcp_err;
> @@ -682,15 +686,16 @@ fcloop_tgt_fcp_abort(struct nvmet_fc_target_port *tgtport,
> struct nvmefc_tgt_fcp_req *tgt_fcpreq)
> {
> struct fcloop_fcpreq *tfcp_req = tgt_fcp_req_to_fcpreq(tgt_fcpreq);
> + unsigned long flags;
>
> /*
> * mark aborted only in case there were 2 threads in transport
> * (one doing io, other doing abort) and only kills ops posted
> * after the abort request
> */
> - spin_lock(&tfcp_req->reqlock);
> + spin_lock_irqsave(&tfcp_req->reqlock, flags);
> tfcp_req->aborted = true;
> - spin_unlock(&tfcp_req->reqlock);
> + spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
>
> tfcp_req->status = NVME_SC_INTERNAL;
>
> @@ -726,19 +731,20 @@ fcloop_fcp_abort(struct nvme_fc_local_port *localport,
> struct fcloop_ini_fcpreq *inireq = fcpreq->private;
> struct fcloop_fcpreq *tfcp_req;
> bool abortio = true;
> + unsigned long flags;
>
> - spin_lock(&inireq->inilock);
> + spin_lock_irqsave(&inireq->inilock, flags);
> tfcp_req = inireq->tfcp_req;
> if (tfcp_req)
> fcloop_tfcp_req_get(tfcp_req);
> - spin_unlock(&inireq->inilock);
> + spin_unlock_irqrestore(&inireq->inilock, flags);
>
> if (!tfcp_req)
> /* abort has already been called */
> return;
>
> /* break initiator/target relationship for io */
> - spin_lock(&tfcp_req->reqlock);
> + spin_lock_irqsave(&tfcp_req->reqlock, flags);
> switch (tfcp_req->inistate) {
> case INI_IO_START:
> case INI_IO_ACTIVE:
> @@ -748,11 +754,11 @@ fcloop_fcp_abort(struct nvme_fc_local_port *localport,
> abortio = false;
> break;
> default:
> - spin_unlock(&tfcp_req->reqlock);
> + spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
> WARN_ON(1);
> return;
> }
> - spin_unlock(&tfcp_req->reqlock);
> + spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
>
> if (abortio)
> /* leave the reference while the work item is scheduled */
>

2018-06-12 20:37:10

by James Smart

Subject: Re: [PATCH 4/5] nvmet: use atomic allocations when allocating fc requests



On 5/31/2018 2:31 AM, Sagi Grimberg wrote:
>
> Question: why isn't tfcp_req embedded in fcpreq? Don't they have
> the same lifetime?
>

No, they don't. To properly simulate cable pulls etc., the host side
and the controller side effectively have their own "exchange" structure.
tfcp_req corresponds to the controller side. The lifetimes of the two
halves can differ.

-- james


2018-06-12 20:40:04

by James Smart

Subject: Re: [PATCH 4/5] nvmet: use atomic allocations when allocating fc requests



On 5/15/2018 12:40 AM, Johannes Thumshirn wrote:
> fcloop_fcp_req() runs with the hctx_lock (an rcu_read_lock()-protected
> section) held, so memory allocations done in this context have to be
> atomic.
...
>
> Signed-off-by: Johannes Thumshirn <[email protected]>
> ---
> drivers/nvme/target/fcloop.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c
> index 34712def81b1..d2209c60f95f 100644
> --- a/drivers/nvme/target/fcloop.c
> +++ b/drivers/nvme/target/fcloop.c
> @@ -509,7 +509,7 @@ fcloop_fcp_req(struct nvme_fc_local_port *localport,
> if (!rport->targetport)
> return -ECONNREFUSED;
>
> - tfcp_req = kzalloc(sizeof(*tfcp_req), GFP_KERNEL);
> + tfcp_req = kzalloc(sizeof(*tfcp_req), GFP_ATOMIC);
> if (!tfcp_req)
> return -ENOMEM;
>

Reviewed-by:   James Smart  <[email protected]>

2018-06-12 20:42:53

by James Smart

Subject: Re: [PATCH 5/5] nvmet: fcloop: use irqsave spinlocks



On 5/15/2018 12:40 AM, Johannes Thumshirn wrote:
> Lockdep complains about inconsistent hardirq states when using
> nvme-fcloop. So use the irqsave spin_lock variants for the
> nvmefc_fcp_req's reqlock.
>
> Here's the lockdep report:
> [ 11.138417] ================================
> [ 11.139085] WARNING: inconsistent lock state
> [ 11.139207] nvmet: creating controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:046394e7-bff7-4403-9502-1816800efaa0.
> [ 11.139727] 4.17.0-rc5+ #883 Not tainted
> [ 11.139732] --------------------------------
> [ 11.144441] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> [ 11.145341] swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> [ 11.146108] (ptrval) (&(&tfcp_req->reqlock)->rlock){?.+.}, at: fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [ 11.148633] {HARDIRQ-ON-W} state was registered at:
> [ 11.149380] _raw_spin_lock+0x34/0x70
> [ 11.149940] fcloop_fcp_recv_work+0x17/0x90 [nvme_fcloop]
> [ 11.150746] process_one_work+0x1d8/0x5c0
> [ 11.151346] worker_thread+0x45/0x3e0
> [ 11.151886] kthread+0x101/0x140
> [ 11.152370] ret_from_fork+0x3a/0x50
> [ 11.152903] irq event stamp: 36666
> [ 11.153402] hardirqs last enabled at (36663): [<ffffffffafc601c3>] default_idle+0x13/0x180
> [ 11.154601] hardirqs last disabled at (36664): [<ffffffffafe00964>] interrupt_entry+0xc4/0xe0
> [ 11.155817] softirqs last enabled at (36666): [<ffffffffaf267409>] irq_enter+0x59/0x60
> [ 11.156952] softirqs last disabled at (36665): [<ffffffffaf2673ee>] irq_enter+0x3e/0x60
> [ 11.158092]
> [ 11.158092] other info that might help us debug this:
> [ 11.159436] Possible unsafe locking scenario:
> [ 11.159436]
> [ 11.160299] CPU0
> [ 11.160663] ----
> [ 11.161026] lock(&(&tfcp_req->reqlock)->rlock);
> [ 11.161709] <Interrupt>
> [ 11.162091] lock(&(&tfcp_req->reqlock)->rlock);
> [ 11.163148]
> [ 11.163148] *** DEADLOCK ***
> [ 11.163148]
> [ 11.164007] no locks held by swapper/8/0.
> [ 11.164596]
> [ 11.164596] stack backtrace:
> [ 11.165238] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.17.0-rc5+ #883
> [ 11.166180] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> [ 11.167673] Call Trace:
> [ 11.168037] <IRQ>
> [ 11.168349] dump_stack+0x78/0xb3
> [ 11.168864] print_usage_bug+0x1ed/0x1fe
> [ 11.169440] ? check_usage_backwards+0x110/0x110
> [ 11.170111] mark_lock+0x263/0x2a0
> [ 11.170995] __lock_acquire+0x675/0x1870
> [ 11.171865] ? __lock_acquire+0x48d/0x1870
> [ 11.172460] lock_acquire+0xd4/0x220
> [ 11.172981] ? fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [ 11.173709] _raw_spin_lock+0x34/0x70
> [ 11.174243] ? fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [ 11.174978] fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [ 11.175673] nvmet_fc_transfer_fcp_data+0x9b/0x130 [nvmet_fc]
> [ 11.176507] nvmet_req_complete+0x10/0x110 [nvmet]
> [ 11.177210] nvmet_bio_done+0x23/0x40 [nvmet]
> [ 11.177837] blk_update_request+0xab/0x3b0
> [ 11.178434] blk_mq_end_request+0x13/0x60
> [ 11.179033] flush_smp_call_function_queue+0x58/0x150
> [ 11.179755] smp_call_function_single_interrupt+0x49/0x260
> [ 11.180531] call_function_single_interrupt+0xf/0x20
> [ 11.181236] </IRQ>
> [ 11.181542] RIP: 0010:native_safe_halt+0x2/0x10
> [ 11.182186] RSP: 0018:ffffa481006cbec0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff04
> [ 11.183265] RAX: ffff9f54f8b86200 RBX: ffff9f54f8b86200 RCX: 0000000000000000
> [ 11.184264] RDX: ffff9f54f8b86200 RSI: 0000000000000001 RDI: ffff9f54f8b86200
> [ 11.185270] RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000000
> [ 11.186271] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> [ 11.187280] R13: 0000000000000000 R14: ffff9f54f8b86200 R15: ffff9f54f8b86200
> [ 11.188280] default_idle+0x18/0x180
> [ 11.188806] do_idle+0x176/0x260
> [ 11.189273] cpu_startup_entry+0x5a/0x60
> [ 11.189832] start_secondary+0x18f/0x1b0
>
> Signed-off-by: Johannes Thumshirn <[email protected]>
> Cc: James Smart <[email protected]>
> ---
> drivers/nvme/target/fcloop.c | 52 ++++++++++++++++++++++++--------------------
> 1 file changed, 29 insertions(+), 23 deletions(-)
>

Looks OK. I assume it is, as the bio_done call can be in an interrupt
handler.

Reviewed-by:   James Smart  <[email protected]>



2018-06-12 20:50:03

by Jens Axboe

Subject: Re: [PATCH 5/5] nvmet: fcloop: use irqsave spinlocks

On 5/15/18 1:40 AM, Johannes Thumshirn wrote:
> Lockdep complains about inconsistent hardirq states when using
> nvme-fcloop. So use the irqsave spin_lock variants for the
> nvmefc_fcp_req's reqlock.
>
> Here's the lockdep report:
> [ 11.138417] ================================
> [ 11.139085] WARNING: inconsistent lock state
> [ 11.139207] nvmet: creating controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:046394e7-bff7-4403-9502-1816800efaa0.
> [ 11.139727] 4.17.0-rc5+ #883 Not tainted
> [ 11.139732] --------------------------------
> [ 11.144441] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> [ 11.145341] swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> [ 11.146108] (ptrval) (&(&tfcp_req->reqlock)->rlock){?.+.}, at: fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [ 11.148633] {HARDIRQ-ON-W} state was registered at:
> [ 11.149380] _raw_spin_lock+0x34/0x70
> [ 11.149940] fcloop_fcp_recv_work+0x17/0x90 [nvme_fcloop]
> [ 11.150746] process_one_work+0x1d8/0x5c0
> [ 11.151346] worker_thread+0x45/0x3e0
> [ 11.151886] kthread+0x101/0x140
> [ 11.152370] ret_from_fork+0x3a/0x50
> [ 11.152903] irq event stamp: 36666
> [ 11.153402] hardirqs last enabled at (36663): [<ffffffffafc601c3>] default_idle+0x13/0x180
> [ 11.154601] hardirqs last disabled at (36664): [<ffffffffafe00964>] interrupt_entry+0xc4/0xe0
> [ 11.155817] softirqs last enabled at (36666): [<ffffffffaf267409>] irq_enter+0x59/0x60
> [ 11.156952] softirqs last disabled at (36665): [<ffffffffaf2673ee>] irq_enter+0x3e/0x60
> [ 11.158092]
> [ 11.158092] other info that might help us debug this:
> [ 11.159436] Possible unsafe locking scenario:
> [ 11.159436]
> [ 11.160299] CPU0
> [ 11.160663] ----
> [ 11.161026] lock(&(&tfcp_req->reqlock)->rlock);
> [ 11.161709] <Interrupt>
> [ 11.162091] lock(&(&tfcp_req->reqlock)->rlock);
> [ 11.163148]
> [ 11.163148] *** DEADLOCK ***
> [ 11.163148]
> [ 11.164007] no locks held by swapper/8/0.
> [ 11.164596]
> [ 11.164596] stack backtrace:
> [ 11.165238] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.17.0-rc5+ #883
> [ 11.166180] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> [ 11.167673] Call Trace:
> [ 11.168037] <IRQ>
> [ 11.168349] dump_stack+0x78/0xb3
> [ 11.168864] print_usage_bug+0x1ed/0x1fe
> [ 11.169440] ? check_usage_backwards+0x110/0x110
> [ 11.170111] mark_lock+0x263/0x2a0
> [ 11.170995] __lock_acquire+0x675/0x1870
> [ 11.171865] ? __lock_acquire+0x48d/0x1870
> [ 11.172460] lock_acquire+0xd4/0x220
> [ 11.172981] ? fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [ 11.173709] _raw_spin_lock+0x34/0x70
> [ 11.174243] ? fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [ 11.174978] fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [ 11.175673] nvmet_fc_transfer_fcp_data+0x9b/0x130 [nvmet_fc]
> [ 11.176507] nvmet_req_complete+0x10/0x110 [nvmet]
> [ 11.177210] nvmet_bio_done+0x23/0x40 [nvmet]
> [ 11.177837] blk_update_request+0xab/0x3b0
> [ 11.178434] blk_mq_end_request+0x13/0x60
> [ 11.179033] flush_smp_call_function_queue+0x58/0x150
> [ 11.179755] smp_call_function_single_interrupt+0x49/0x260
> [ 11.180531] call_function_single_interrupt+0xf/0x20
> [ 11.181236] </IRQ>
> [ 11.181542] RIP: 0010:native_safe_halt+0x2/0x10
> [ 11.182186] RSP: 0018:ffffa481006cbec0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff04
> [ 11.183265] RAX: ffff9f54f8b86200 RBX: ffff9f54f8b86200 RCX: 0000000000000000
> [ 11.184264] RDX: ffff9f54f8b86200 RSI: 0000000000000001 RDI: ffff9f54f8b86200
> [ 11.185270] RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000000
> [ 11.186271] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> [ 11.187280] R13: 0000000000000000 R14: ffff9f54f8b86200 R15: ffff9f54f8b86200
> [ 11.188280] default_idle+0x18/0x180
> [ 11.188806] do_idle+0x176/0x260
> [ 11.189273] cpu_startup_entry+0x5a/0x60
> [ 11.189832] start_secondary+0x18f/0x1b0
>
> Signed-off-by: Johannes Thumshirn <[email protected]>
> Cc: James Smart <[email protected]>
> ---
> drivers/nvme/target/fcloop.c | 52 ++++++++++++++++++++++++--------------------
> 1 file changed, 29 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c
> index d2209c60f95f..f4a3700201c1 100644
> --- a/drivers/nvme/target/fcloop.c
> +++ b/drivers/nvme/target/fcloop.c
> @@ -406,9 +406,10 @@ fcloop_fcp_recv_work(struct work_struct *work)
> container_of(work, struct fcloop_fcpreq, fcp_rcv_work);
> struct nvmefc_fcp_req *fcpreq = tfcp_req->fcpreq;
> int ret = 0;
> + unsigned long flags;
> bool aborted = false;
>
> - spin_lock(&tfcp_req->reqlock);
> + spin_lock_irqsave(&tfcp_req->reqlock, flags);

This should just be spin_lock_irq(); you don't need IRQ-saving locks when
you are always called from a workqueue handler.

Ditto the other workqueue handlers. fcloop_fcp_op() part looks fine.
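
Concretely, for the workqueue handlers that would look roughly like this
(a sketch of the suggested simplification, not a tested patch):

	/* e.g. in fcloop_fcp_recv_work(): interrupts are known to be
	 * enabled here, so the plain _irq variants are sufficient */
	spin_lock_irq(&tfcp_req->reqlock);
	switch (tfcp_req->inistate) {
	case INI_IO_START:
		tfcp_req->inistate = INI_IO_ACTIVE;
		break;
	case INI_IO_ABORTED:
		aborted = true;
		break;
	default:
		spin_unlock_irq(&tfcp_req->reqlock);
		WARN_ON(1);
		return;
	}
	spin_unlock_irq(&tfcp_req->reqlock);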

--
Jens Axboe