Received: by 2002:a05:6358:4e97:b0:b3:742d:4702 with SMTP id ce23csp2886899rwb; Mon, 15 Aug 2022 13:19:44 -0700 (PDT) X-Google-Smtp-Source: AA6agR5zHIyzzoXAAYcS9d/gfPtSKkj3kt+y0WcWhCtHX/zAZgtZiihhn2GV5HU4DXJDsRqMzvfd X-Received: by 2002:a05:6402:5510:b0:43a:76ff:b044 with SMTP id fi16-20020a056402551000b0043a76ffb044mr16113146edb.197.1660594784792; Mon, 15 Aug 2022 13:19:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1660594784; cv=none; d=google.com; s=arc-20160816; b=XMW8E11S6NdqRwZbqR32kwWNac8ytBfkXBN8be45RZwpS3CmCVrLtNKbQb5VuQwZJ1 V6ZEvY3Al3TsH8gHsScYKunIug78uQaSHvVuEi/i6IX8zhKk+nKX5LBJzQXY7y5k5hG7 KylH6OnHnEKCgKx89eM3B6/O8vxrGLIxDYeuKmel/D0aC31nhcIT2J7+H36HVkMjjl6e nCWeTrQ3vNVT+4bKyhgydbv/cT6D28oByPVbBAUTRVTsNhV4WkLWR8+Z4wEeFtFq7BK4 TOa05xXuWv+2vUVONFPiAZToaTmjrDYRIZ5I8atQzhYMFeJEc9Op3Tu6GwGcy+12lvLi HK+A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=nT7JErUJBA9X2dBJF5mgTiO7zDqZDuNuEcLC2knNI5g=; b=pIwthF3xRAbc9QHkqiLQus3AS8VWfZXof5KQsnii7lY0oCjp797OEh1RvJIk2qAiKF FYWkTDabnGUORjHeyuTKxm/mx6BOZO3eG2d26FQcANzoQgu+ZtH7rWhMNIMP6RbkOFYK dhp+Vxv2TdbzznwG9YGSxwtAvzEpuemfk46U4QPAAGIOiBK8RTT33i3BERTsb5E7lA3e yTVGUOTy/J62PDMzZUd7Q8oOKe2hdSY44nFhc21RihqP/C2bYuax7cMU//QlzO9gMFv5 2Cmfy4OV7pfIIRdJdBL3HsIBkXpPn2uw+NJ8J440cU3SBRfdiBoXaDsBkNgy8nt8x/Cv UECg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=DTOEbBQD; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id p17-20020a056402075100b004381b6af19csi8042889edy.155.2022.08.15.13.19.17; Mon, 15 Aug 2022 13:19:44 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=DTOEbBQD; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345595AbiHOTyq (ORCPT + 99 others); Mon, 15 Aug 2022 15:54:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44370 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345281AbiHOTxZ (ORCPT ); Mon, 15 Aug 2022 15:53:25 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1901B45047; Mon, 15 Aug 2022 11:51:18 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 98A8D611F9; Mon, 15 Aug 2022 18:51:17 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 54DE7C433D6; Mon, 15 Aug 2022 18:51:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1660589477; bh=dKazzpAFWzOaHCSaPPCObbfczVjo3Xo3sjYmNNBL5rM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=DTOEbBQDIFq0coPmVBJ01LDDXujiulgkWWIfBCD0j1L0tkZlUuu2opoqj+UO7eeSH NitBltswKzNkYK5Ic6auGHQ7r1SWPW4l5ee6AzHgre6FjmAKychTFR3vhkTDdSj/5r M8C9VwpIXKddLXvS4cD94nDT5VwwvfqsAlRTo4d4= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Justin Tee , James Smart , "Martin K. Petersen" , Sasha Levin Subject: [PATCH 5.15 700/779] scsi: lpfc: Fix EEH support for NVMe I/O Date: Mon, 15 Aug 2022 20:05:44 +0200 Message-Id: <20220815180407.290844270@linuxfoundation.org> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220815180337.130757997@linuxfoundation.org> References: <20220815180337.130757997@linuxfoundation.org> User-Agent: quilt/0.67 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: James Smart [ Upstream commit 25ac2c970be32993f1dff607f8354f3c053d42bc ] Injecting errors on the PCI slot while the driver is handling NVMe I/O will cause crashes and hangs. There are several rather difficult scenarios occurring. The main issue is that the adapter can report a PCI error before or simultaneously to the PCI subsystem reporting the error. Both paths have different entry points and currently there is no interlock between them. Thus multiple teardown paths are competing and all heck breaks loose. Complicating things is the NVMs path. To a large degree, I/O was able to be shutdown for a full FC port on the SCSI stack. But on NVMe, there isn't a similar call. At best, it works on a per-controller basis, but even at the controller level, it's a controller "reset" call. All of which means I/O is still flowing on different CPUs with reset paths expecting hw access (mailbox commands) to execute properly. The following modifications are made: - A new flag is set in PCI error entrypoints so the driver can track being called by that path. - An interlock is added in the SLI hw error path and the PCI error path such that only one of the paths proceeds with the teardown logic. - RPI cleanup is patched such that RPIs are marked unregistered w/o mbx cmds in cases of hw error. - If entering the SLI port re-init calls, a case where SLI error teardown was quick and beat the PCI calls now reporting error, check whether the SLI port is still live on the PCI bus. - In the PCI reset code to bring the adapter back, recheck the IRQ settings. Different checks for SLI3 vs SLI4. - In I/O completions, that may be called as part of the cleanup or underway just before the hw error, check the state of the adapter. If in error, shortcut handling that would expect further adapter completions as the hw error won't be sending them. - In routines waiting on I/O completions, which may have been in progress prior to the hw error, detect the device is being torn down and abort from their waits and just give up. This points to a larger issue in the driver on ref-counting for data structures, as it doesn't have ref-counting on q and port structures. We'll do this fix for now as it would be a major rework to be done differently. - Fix the NVMe cleanup to simulate NVMe I/O completions if I/O is being failed back due to hw error. - In I/O buf allocation, done at the start of new I/Os, check hw state and fail if hw error. Link: https://lore.kernel.org/r/20210910233159.115896-10-jsmart2021@gmail.com Co-developed-by: Justin Tee Signed-off-by: Justin Tee Signed-off-by: James Smart Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/lpfc/lpfc.h | 1 + drivers/scsi/lpfc/lpfc_hbadisc.c | 1 + drivers/scsi/lpfc/lpfc_init.c | 31 +++++++++++++++- drivers/scsi/lpfc/lpfc_nvme.c | 53 ++++++++++++++++++++++++++- drivers/scsi/lpfc/lpfc_scsi.c | 63 +++++++++++++++++++++----------- drivers/scsi/lpfc/lpfc_sli.c | 31 +++++++++++++++- drivers/scsi/lpfc/lpfc_sli4.h | 2 + 7 files changed, 154 insertions(+), 28 deletions(-) diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h index 1044832b6054..b2508a00bafd 100644 --- a/drivers/scsi/lpfc/lpfc.h +++ b/drivers/scsi/lpfc/lpfc.h @@ -1028,6 +1028,7 @@ struct lpfc_hba { * Firmware supports Forced Link Speed * capability */ +#define HBA_PCI_ERR 0x80000 /* The PCI slot is offline */ #define HBA_FLOGI_ISSUED 0x100000 /* FLOGI was issued */ #define HBA_CGN_RSVD1 0x200000 /* Reserved CGN flag */ #define HBA_CGN_DAY_WRAP 0x400000 /* HBA Congestion info day wraps */ diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c index 3bb7c2aa949f..4bb0a15cfcc0 100644 --- a/drivers/scsi/lpfc/lpfc_hbadisc.c +++ b/drivers/scsi/lpfc/lpfc_hbadisc.c @@ -5349,6 +5349,7 @@ lpfc_unreg_rpi(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp) rc = lpfc_sli_issue_mbox(phba, mbox, MBX_NOWAIT); if (rc == MBX_NOT_FINISHED) { + ndlp->nlp_flag &= ~NLP_UNREG_INP; mempool_free(mbox, phba->mbox_mem_pool); acc_plogi = 1; } diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c index a2694fb32b5d..cda571393a5f 100644 --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -1606,6 +1606,11 @@ void lpfc_sli4_offline_eratt(struct lpfc_hba *phba) { spin_lock_irq(&phba->hbalock); + if (phba->link_state == LPFC_HBA_ERROR && + phba->hba_flag & HBA_PCI_ERR) { + spin_unlock_irq(&phba->hbalock); + return; + } phba->link_state = LPFC_HBA_ERROR; spin_unlock_irq(&phba->hbalock); @@ -1945,7 +1950,6 @@ lpfc_handle_eratt_s4(struct lpfc_hba *phba) if (pci_channel_offline(phba->pcidev)) { lpfc_printf_log(phba, KERN_ERR, LOG_TRACE_EVENT, "3166 pci channel is offline\n"); - lpfc_sli4_offline_eratt(phba); return; } @@ -3643,6 +3647,7 @@ lpfc_offline_prep(struct lpfc_hba *phba, int mbx_action) struct lpfc_vport **vports; struct Scsi_Host *shost; int i; + int offline = 0; if (vport->fc_flag & FC_OFFLINE_MODE) return; @@ -3651,6 +3656,8 @@ lpfc_offline_prep(struct lpfc_hba *phba, int mbx_action) lpfc_linkdown(phba); + offline = pci_channel_offline(phba->pcidev); + /* Issue an unreg_login to all nodes on all vports */ vports = lpfc_create_vport_work_array(phba); if (vports != NULL) { @@ -3673,7 +3680,14 @@ lpfc_offline_prep(struct lpfc_hba *phba, int mbx_action) ndlp->nlp_flag &= ~NLP_NPR_ADISC; spin_unlock_irq(&ndlp->lock); - lpfc_unreg_rpi(vports[i], ndlp); + if (offline) { + spin_lock_irq(&ndlp->lock); + ndlp->nlp_flag &= ~(NLP_UNREG_INP | + NLP_RPI_REGISTERED); + spin_unlock_irq(&ndlp->lock); + } else { + lpfc_unreg_rpi(vports[i], ndlp); + } /* * Whenever an SLI4 port goes offline, free the * RPI. Get a new RPI when the adapter port @@ -14070,6 +14084,10 @@ lpfc_pci_resume_one_s3(struct device *dev_d) return error; } + /* Init cpu_map array */ + lpfc_cpu_map_array_init(phba); + /* Init hba_eq_hdl array */ + lpfc_hba_eq_hdl_array_init(phba); /* Configure and enable interrupt */ intr_mode = lpfc_sli_enable_intr(phba, phba->intr_mode); if (intr_mode == LPFC_INTR_ERROR) { @@ -15023,14 +15041,17 @@ lpfc_io_error_detected_s4(struct pci_dev *pdev, pci_channel_state_t state) lpfc_sli4_prep_dev_for_recover(phba); return PCI_ERS_RESULT_CAN_RECOVER; case pci_channel_io_frozen: + phba->hba_flag |= HBA_PCI_ERR; /* Fatal error, prepare for slot reset */ lpfc_sli4_prep_dev_for_reset(phba); return PCI_ERS_RESULT_NEED_RESET; case pci_channel_io_perm_failure: + phba->hba_flag |= HBA_PCI_ERR; /* Permanent failure, prepare for device down */ lpfc_sli4_prep_dev_for_perm_failure(phba); return PCI_ERS_RESULT_DISCONNECT; default: + phba->hba_flag |= HBA_PCI_ERR; /* Unknown state, prepare and request slot reset */ lpfc_printf_log(phba, KERN_ERR, LOG_TRACE_EVENT, "2825 Unknown PCI error state: x%x\n", state); @@ -15074,6 +15095,7 @@ lpfc_io_slot_reset_s4(struct pci_dev *pdev) pci_restore_state(pdev); + phba->hba_flag &= ~HBA_PCI_ERR; /* * As the new kernel behavior of pci_restore_state() API call clears * device saved_state flag, need to save the restored state again. @@ -15098,6 +15120,7 @@ lpfc_io_slot_reset_s4(struct pci_dev *pdev) return PCI_ERS_RESULT_DISCONNECT; } else phba->intr_mode = intr_mode; + lpfc_cpu_affinity_check(phba, phba->cfg_irq_chann); /* Log the current active interrupt mode */ lpfc_log_intr_mode(phba, phba->intr_mode); @@ -15299,6 +15322,10 @@ lpfc_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state) struct lpfc_hba *phba = ((struct lpfc_vport *)shost->hostdata)->phba; pci_ers_result_t rc = PCI_ERS_RESULT_DISCONNECT; + if (phba->link_state == LPFC_HBA_ERROR && + phba->hba_flag & HBA_IOQ_FLUSH) + return PCI_ERS_RESULT_NEED_RESET; + switch (phba->pci_dev_grp) { case LPFC_PCI_DEV_LP: rc = lpfc_io_error_detected_s3(pdev, state); diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c index 4fb3dc5092f5..4e0c0b273e5f 100644 --- a/drivers/scsi/lpfc/lpfc_nvme.c +++ b/drivers/scsi/lpfc/lpfc_nvme.c @@ -937,6 +937,7 @@ lpfc_nvme_io_cmd_wqe_cmpl(struct lpfc_hba *phba, struct lpfc_iocbq *pwqeIn, #ifdef CONFIG_SCSI_LPFC_DEBUG_FS int cpu; #endif + int offline = 0; /* Sanity check on return of outstanding command */ if (!lpfc_ncmd) { @@ -1098,11 +1099,12 @@ lpfc_nvme_io_cmd_wqe_cmpl(struct lpfc_hba *phba, struct lpfc_iocbq *pwqeIn, nCmd->transferred_length = 0; nCmd->rcv_rsplen = 0; nCmd->status = NVME_SC_INTERNAL; + offline = pci_channel_offline(vport->phba->pcidev); } } /* pick up SLI4 exhange busy condition */ - if (bf_get(lpfc_wcqe_c_xb, wcqe)) + if (bf_get(lpfc_wcqe_c_xb, wcqe) && !offline) lpfc_ncmd->flags |= LPFC_SBUF_XBUSY; else lpfc_ncmd->flags &= ~LPFC_SBUF_XBUSY; @@ -2174,6 +2176,10 @@ lpfc_nvme_lport_unreg_wait(struct lpfc_vport *vport, abts_nvme = 0; for (i = 0; i < phba->cfg_hdw_queue; i++) { qp = &phba->sli4_hba.hdwq[i]; + if (!vport || !vport->localport || + !qp || !qp->io_wq) + return; + pring = qp->io_wq->pring; if (!pring) continue; @@ -2181,6 +2187,10 @@ lpfc_nvme_lport_unreg_wait(struct lpfc_vport *vport, abts_scsi += qp->abts_scsi_io_bufs; abts_nvme += qp->abts_nvme_io_bufs; } + if (!vport || !vport->localport || + vport->phba->hba_flag & HBA_PCI_ERR) + return; + lpfc_printf_vlog(vport, KERN_ERR, LOG_TRACE_EVENT, "6176 Lport x%px Localport x%px wait " "timed out. Pending %d [%d:%d]. " @@ -2220,6 +2230,8 @@ lpfc_nvme_destroy_localport(struct lpfc_vport *vport) return; localport = vport->localport; + if (!localport) + return; lport = (struct lpfc_nvme_lport *)localport->private; lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME, @@ -2536,7 +2548,8 @@ lpfc_nvme_unregister_port(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp) * return values is ignored. The upcall is a courtesy to the * transport. */ - if (vport->load_flag & FC_UNLOADING) + if (vport->load_flag & FC_UNLOADING || + unlikely(vport->phba->hba_flag & HBA_PCI_ERR)) (void)nvme_fc_set_remoteport_devloss(remoteport, 0); ret = nvme_fc_unregister_remoteport(remoteport); @@ -2564,6 +2577,42 @@ lpfc_nvme_unregister_port(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp) vport->localport, ndlp->rport, ndlp->nlp_DID); } +/** + * lpfc_sli4_nvme_pci_offline_aborted - Fast-path process of NVME xri abort + * @phba: pointer to lpfc hba data structure. + * @lpfc_ncmd: The nvme job structure for the request being aborted. + * + * This routine is invoked by the worker thread to process a SLI4 fast-path + * NVME aborted xri. Aborted NVME IO commands are completed to the transport + * here. + **/ +void +lpfc_sli4_nvme_pci_offline_aborted(struct lpfc_hba *phba, + struct lpfc_io_buf *lpfc_ncmd) +{ + struct nvmefc_fcp_req *nvme_cmd = NULL; + + lpfc_printf_log(phba, KERN_INFO, LOG_NVME_ABTS, + "6533 %s nvme_cmd %p tag x%x abort complete and " + "xri released\n", __func__, + lpfc_ncmd->nvmeCmd, + lpfc_ncmd->cur_iocbq.iotag); + + /* Aborted NVME commands are required to not complete + * before the abort exchange command fully completes. + * Once completed, it is available via the put list. + */ + if (lpfc_ncmd->nvmeCmd) { + nvme_cmd = lpfc_ncmd->nvmeCmd; + nvme_cmd->transferred_length = 0; + nvme_cmd->rcv_rsplen = 0; + nvme_cmd->status = NVME_SC_INTERNAL; + nvme_cmd->done(nvme_cmd); + lpfc_ncmd->nvmeCmd = NULL; + } + lpfc_release_nvme_buf(phba, lpfc_ncmd); +} + /** * lpfc_sli4_nvme_xri_aborted - Fast-path process of NVME xri abort * @phba: pointer to lpfc hba data structure. diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c index c6944b282e21..ea93b7e07b7e 100644 --- a/drivers/scsi/lpfc/lpfc_scsi.c +++ b/drivers/scsi/lpfc/lpfc_scsi.c @@ -493,8 +493,8 @@ void lpfc_sli4_io_xri_aborted(struct lpfc_hba *phba, struct sli4_wcqe_xri_aborted *axri, int idx) { - uint16_t xri = bf_get(lpfc_wcqe_xa_xri, axri); - uint16_t rxid = bf_get(lpfc_wcqe_xa_remote_xid, axri); + u16 xri = 0; + u16 rxid = 0; struct lpfc_io_buf *psb, *next_psb; struct lpfc_sli4_hdw_queue *qp; unsigned long iflag = 0; @@ -504,15 +504,22 @@ lpfc_sli4_io_xri_aborted(struct lpfc_hba *phba, int rrq_empty = 0; struct lpfc_sli_ring *pring = phba->sli4_hba.els_wq->pring; struct scsi_cmnd *cmd; + int offline = 0; if (!(phba->cfg_enable_fc4_type & LPFC_ENABLE_FCP)) return; - + offline = pci_channel_offline(phba->pcidev); + if (!offline) { + xri = bf_get(lpfc_wcqe_xa_xri, axri); + rxid = bf_get(lpfc_wcqe_xa_remote_xid, axri); + } qp = &phba->sli4_hba.hdwq[idx]; spin_lock_irqsave(&phba->hbalock, iflag); spin_lock(&qp->abts_io_buf_list_lock); list_for_each_entry_safe(psb, next_psb, &qp->lpfc_abts_io_buf_list, list) { + if (offline) + xri = psb->cur_iocbq.sli4_xritag; if (psb->cur_iocbq.sli4_xritag == xri) { list_del_init(&psb->list); psb->flags &= ~LPFC_SBUF_XBUSY; @@ -521,8 +528,15 @@ lpfc_sli4_io_xri_aborted(struct lpfc_hba *phba, qp->abts_nvme_io_bufs--; spin_unlock(&qp->abts_io_buf_list_lock); spin_unlock_irqrestore(&phba->hbalock, iflag); - lpfc_sli4_nvme_xri_aborted(phba, axri, psb); - return; + if (!offline) { + lpfc_sli4_nvme_xri_aborted(phba, axri, + psb); + return; + } + lpfc_sli4_nvme_pci_offline_aborted(phba, psb); + spin_lock_irqsave(&phba->hbalock, iflag); + spin_lock(&qp->abts_io_buf_list_lock); + continue; } qp->abts_scsi_io_bufs--; spin_unlock(&qp->abts_io_buf_list_lock); @@ -534,13 +548,13 @@ lpfc_sli4_io_xri_aborted(struct lpfc_hba *phba, rrq_empty = list_empty(&phba->active_rrq_list); spin_unlock_irqrestore(&phba->hbalock, iflag); - if (ndlp) { + if (ndlp && !offline) { lpfc_set_rrq_active(phba, ndlp, psb->cur_iocbq.sli4_lxritag, rxid, 1); lpfc_sli4_abts_err_handler(phba, ndlp, axri); } - if (phba->cfg_fcp_wait_abts_rsp) { + if (phba->cfg_fcp_wait_abts_rsp || offline) { spin_lock_irqsave(&psb->buf_lock, iflag); cmd = psb->pCmd; psb->pCmd = NULL; @@ -567,25 +581,30 @@ lpfc_sli4_io_xri_aborted(struct lpfc_hba *phba, lpfc_release_scsi_buf_s4(phba, psb); if (rrq_empty) lpfc_worker_wake_up(phba); - return; + if (!offline) + return; + spin_lock_irqsave(&phba->hbalock, iflag); + spin_lock(&qp->abts_io_buf_list_lock); + continue; } } spin_unlock(&qp->abts_io_buf_list_lock); - for (i = 1; i <= phba->sli.last_iotag; i++) { - iocbq = phba->sli.iocbq_lookup[i]; - - if (!(iocbq->iocb_flag & LPFC_IO_FCP) || - (iocbq->iocb_flag & LPFC_IO_LIBDFC)) - continue; - if (iocbq->sli4_xritag != xri) - continue; - psb = container_of(iocbq, struct lpfc_io_buf, cur_iocbq); - psb->flags &= ~LPFC_SBUF_XBUSY; - spin_unlock_irqrestore(&phba->hbalock, iflag); - if (!list_empty(&pring->txq)) - lpfc_worker_wake_up(phba); - return; + if (!offline) { + for (i = 1; i <= phba->sli.last_iotag; i++) { + iocbq = phba->sli.iocbq_lookup[i]; + if (!(iocbq->iocb_flag & LPFC_IO_FCP) || + (iocbq->iocb_flag & LPFC_IO_LIBDFC)) + continue; + if (iocbq->sli4_xritag != xri) + continue; + psb = container_of(iocbq, struct lpfc_io_buf, cur_iocbq); + psb->flags &= ~LPFC_SBUF_XBUSY; + spin_unlock_irqrestore(&phba->hbalock, iflag); + if (!list_empty(&pring->txq)) + lpfc_worker_wake_up(phba); + return; + } } spin_unlock_irqrestore(&phba->hbalock, iflag); } diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c index d8d26cde70b6..e905a246a2d6 100644 --- a/drivers/scsi/lpfc/lpfc_sli.c +++ b/drivers/scsi/lpfc/lpfc_sli.c @@ -1404,7 +1404,8 @@ __lpfc_sli_release_iocbq_s4(struct lpfc_hba *phba, struct lpfc_iocbq *iocbq) } if ((iocbq->iocb_flag & LPFC_EXCHANGE_BUSY) && - (sglq->state != SGL_XRI_ABORTED)) { + (!(unlikely(pci_channel_offline(phba->pcidev)))) && + sglq->state != SGL_XRI_ABORTED) { spin_lock_irqsave(&phba->sli4_hba.sgl_list_lock, iflag); @@ -4583,10 +4584,12 @@ lpfc_sli_flush_io_rings(struct lpfc_hba *phba) lpfc_sli_cancel_iocbs(phba, &txq, IOSTAT_LOCAL_REJECT, IOERR_SLI_DOWN); - /* Flush the txcmpq */ + /* Flush the txcmplq */ lpfc_sli_cancel_iocbs(phba, &txcmplq, IOSTAT_LOCAL_REJECT, IOERR_SLI_DOWN); + if (unlikely(pci_channel_offline(phba->pcidev))) + lpfc_sli4_io_xri_aborted(phba, NULL, 0); } } else { pring = &psli->sli3_ring[LPFC_FCP_RING]; @@ -22035,8 +22038,26 @@ lpfc_get_io_buf_from_multixri_pools(struct lpfc_hba *phba, qp = &phba->sli4_hba.hdwq[hwqid]; lpfc_ncmd = NULL; + if (!qp) { + lpfc_printf_log(phba, KERN_INFO, + LOG_SLI | LOG_NVME_ABTS | LOG_FCP, + "5556 NULL qp for hwqid x%x\n", hwqid); + return lpfc_ncmd; + } multixri_pool = qp->p_multixri_pool; + if (!multixri_pool) { + lpfc_printf_log(phba, KERN_INFO, + LOG_SLI | LOG_NVME_ABTS | LOG_FCP, + "5557 NULL multixri for hwqid x%x\n", hwqid); + return lpfc_ncmd; + } pvt_pool = &multixri_pool->pvt_pool; + if (!pvt_pool) { + lpfc_printf_log(phba, KERN_INFO, + LOG_SLI | LOG_NVME_ABTS | LOG_FCP, + "5558 NULL pvt_pool for hwqid x%x\n", hwqid); + return lpfc_ncmd; + } multixri_pool->io_req_count++; /* If pvt_pool is empty, move some XRIs from public to private pool */ @@ -22112,6 +22133,12 @@ struct lpfc_io_buf *lpfc_get_io_buf(struct lpfc_hba *phba, qp = &phba->sli4_hba.hdwq[hwqid]; lpfc_cmd = NULL; + if (!qp) { + lpfc_printf_log(phba, KERN_WARNING, + LOG_SLI | LOG_NVME_ABTS | LOG_FCP, + "5555 NULL qp for hwqid x%x\n", hwqid); + return lpfc_cmd; + } if (phba->cfg_xri_rebalancing) lpfc_cmd = lpfc_get_io_buf_from_multixri_pools( diff --git a/drivers/scsi/lpfc/lpfc_sli4.h b/drivers/scsi/lpfc/lpfc_sli4.h index 99c5d1e4da5e..5962cf508842 100644 --- a/drivers/scsi/lpfc/lpfc_sli4.h +++ b/drivers/scsi/lpfc/lpfc_sli4.h @@ -1116,6 +1116,8 @@ void lpfc_sli4_fcf_redisc_event_proc(struct lpfc_hba *); int lpfc_sli4_resume_rpi(struct lpfc_nodelist *, void (*)(struct lpfc_hba *, LPFC_MBOXQ_t *), void *); void lpfc_sli4_els_xri_abort_event_proc(struct lpfc_hba *phba); +void lpfc_sli4_nvme_pci_offline_aborted(struct lpfc_hba *phba, + struct lpfc_io_buf *lpfc_ncmd); void lpfc_sli4_nvme_xri_aborted(struct lpfc_hba *phba, struct sli4_wcqe_xri_aborted *axri, struct lpfc_io_buf *lpfc_ncmd); -- 2.35.1