2023-07-26 18:39:32

by Chengfeng Ye

Subject: [PATCH v2] scsi: lpfc: Fix potential deadlock on &ndlp->lock

&ndlp->lock is acquired by the timer lpfc_els_retry_delay() in softirq
context, so process context code that acquires &ndlp->lock should
disable irq or bh. Otherwise a deadlock could happen if the timer preempts
execution on the same CPU while the lock is held in process context.

The two lock acquisitions inside lpfc_cleanup_pending_mbox() do not
disable irq or softirq.

[Deadlock Scenario]
lpfc_cmpl_els_fdisc()
-> lpfc_cleanup_pending_mbox()
-> spin_lock(&ndlp->lock);
<irq>
-> lpfc_els_retry_delay()
-> lpfc_nlp_get()
-> spin_lock_irqsave(&ndlp->lock, flags); (deadlock here)

This flaw was found by an experimental static analysis tool I am
developing for irq-related deadlocks.

The patch fixes the potential deadlock by using spin_lock_irq() to
disable irq.

Signed-off-by: Chengfeng Ye <[email protected]>
---
drivers/scsi/lpfc/lpfc_sli.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 58d10f8f75a7..8555f6bb9742 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -21049,9 +21049,9 @@ lpfc_cleanup_pending_mbox(struct lpfc_vport *vport)
mb->mbox_flag |= LPFC_MBX_IMED_UNREG;
restart_loop = 1;
spin_unlock_irq(&phba->hbalock);
- spin_lock(&ndlp->lock);
+ spin_lock_irq(&ndlp->lock);
ndlp->nlp_flag &= ~NLP_IGNR_REG_CMPL;
- spin_unlock(&ndlp->lock);
+ spin_unlock_irq(&ndlp->lock);
spin_lock_irq(&phba->hbalock);
break;
}
@@ -21067,9 +21067,9 @@ lpfc_cleanup_pending_mbox(struct lpfc_vport *vport)
ndlp = (struct lpfc_nodelist *)mb->ctx_ndlp;
mb->ctx_ndlp = NULL;
if (ndlp) {
- spin_lock(&ndlp->lock);
+ spin_lock_irq(&ndlp->lock);
ndlp->nlp_flag &= ~NLP_IGNR_REG_CMPL;
- spin_unlock(&ndlp->lock);
+ spin_unlock_irq(&ndlp->lock);
lpfc_nlp_put(ndlp);
}
}
--
2.17.1
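
The locking rule described in the patch message can be illustrated with a
small user-space model. This is only a sketch of the general pattern, not
kernel code and not the lpfc driver: struct toy_cpu and every toy_* function
are hypothetical names made up for the illustration. The model treats a CPU
as two flags and asks whether a timer handler that needs the same lock would
spin forever if it fired while process context holds the lock. It says
nothing about whether two real code paths actually operate on the same lock
instance.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a single CPU. All names here are hypothetical. */
struct toy_cpu {
    bool irqs_enabled; /* can an interrupt or softirq run right now? */
    bool lock_held;    /* is the modeled spinlock currently held? */
};

/* Models spin_lock(): takes the lock, leaves interrupts untouched. */
void toy_lock(struct toy_cpu *cpu)
{
    assert(!cpu->lock_held);
    cpu->lock_held = true;
}

/* Models spin_unlock(). */
void toy_unlock(struct toy_cpu *cpu)
{
    assert(cpu->lock_held);
    cpu->lock_held = false;
}

/* Models spin_lock_irq(): disables interrupts first, then takes the lock. */
void toy_lock_irq(struct toy_cpu *cpu)
{
    cpu->irqs_enabled = false;
    toy_lock(cpu);
}

/* Models spin_unlock_irq(): releases the lock, then re-enables interrupts. */
void toy_unlock_irq(struct toy_cpu *cpu)
{
    toy_unlock(cpu);
    cpu->irqs_enabled = true;
}

/*
 * Models a timer handler on this CPU that wants the same lock.
 * Returns true if the handler would spin forever (self-deadlock).
 */
bool toy_timer_would_deadlock(const struct toy_cpu *cpu)
{
    if (!cpu->irqs_enabled)
        return false;      /* handler is deferred until irqs are back on */
    return cpu->lock_held; /* handler runs now and spins if the lock is held */
}

/* Process context uses the plain lock; the timer fires in the middle. */
bool toy_plain_lock_deadlocks(void)
{
    struct toy_cpu cpu = { .irqs_enabled = true, .lock_held = false };
    bool deadlock;

    toy_lock(&cpu);
    deadlock = toy_timer_would_deadlock(&cpu);
    toy_unlock(&cpu);
    return deadlock;
}

/* Process context uses the irq-disabling variant instead. */
bool toy_irq_lock_deadlocks(void)
{
    struct toy_cpu cpu = { .irqs_enabled = true, .lock_held = false };
    bool deadlock;

    toy_lock_irq(&cpu);
    deadlock = toy_timer_would_deadlock(&cpu);
    toy_unlock_irq(&cpu);
    return deadlock;
}
```

In this model toy_plain_lock_deadlocks() reports a deadlock and
toy_irq_lock_deadlocks() does not, which is the difference between
spin_lock() and spin_lock_irq() that the patch relies on. The model only
applies when both contexts contend for the same lock instance on the same
CPU.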



2023-07-27 05:34:24

by Chengfeng Ye

Subject: Re: [PATCH v2] scsi: lpfc: Fix potential deadlock on &ndlp->lock

Sorry for the interruption. I just noticed that the ndlp node used
inside the timer is not shared with the one in lpfc_cleanup_pending_mbox().

This is a false alarm, and I am sorry again for this.

Best regards,
Chengfeng