2020-08-27 09:59:41

by Daniel Wagner

Subject: [PATCH 4/4] qla2xxx: Handle incorrect entry_type entries

It was observed on an ISP8324 16Gb HBA with fw=8.08.203 (d0d5) that
pkt->entry_type was MBX_IOCB_TYPE/0x39 with an sp->type of SRB_SCSI_CMD,
which is invalid and should not be possible.

A careful code review of the crash dump didn't reveal any
shortcomings. Reading the entry_type from the crash dump shows the expected
value of STATUS_TYPE/0x03 but the call trace shows that
qla24xx_mbx_iocb_entry() is used.

One possible explanation is that pkt->entry_type doesn't contain the
correct information at the time it is read. That means the driver
observes a data race caused by the firmware.

Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/scsi/qla2xxx/qla_isr.c | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index b787643f5031..0c324e88b189 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -3392,6 +3392,31 @@ void qla24xx_nvme_ls4_iocb(struct scsi_qla_host *vha,
sp->done(sp, comp_status);
}

+static void qla24xx_process_mbx_iocb_response(struct scsi_qla_host *vha,
+ struct rsp_que *rsp, struct sts_entry_24xx *pkt)
+{
+ srb_t *sp;
+
+ sp = qla2x00_get_sp_from_handle(vha, rsp->req, pkt);
+ if (!sp)
+ return;
+
+ if (sp->type == SRB_SCSI_CMD ||
+ sp->type == SRB_NVME_CMD ||
+ sp->type == SRB_TM_CMD) {
+ /* Some firmware versions don't update the entry_type
+ * correctly. It was observed entry_type contained
+ * MBX_IOCB_TYPE instead of the expected STATUS_TYPE
+ * for sp->type SRB_SCSI_CMD, SRB_NVME_CMD or
+ * SRB_TM_CMD.
+ */
+ qla2x00_status_entry(vha, rsp, pkt);
+ return;
+ }
+
+ qla24xx_mbx_iocb_entry(vha, rsp->req, (struct mbx_24xx_entry *)pkt);
+}
+
/**
* qla24xx_process_response_queue() - Process response queue entries.
* @vha: SCSI driver HA context
@@ -3499,8 +3524,7 @@ void qla24xx_process_response_queue(struct scsi_qla_host *vha,
(struct abort_entry_24xx *)pkt);
break;
case MBX_IOCB_TYPE:
- qla24xx_mbx_iocb_entry(vha, rsp->req,
- (struct mbx_24xx_entry *)pkt);
+ qla24xx_process_mbx_iocb_response(vha, rsp, pkt);
break;
case VP_CTRL_IOCB_TYPE:
qla_ctrlvp_completed(vha, rsp->req,
--
2.16.4


2020-08-27 10:21:13

by Martin Wilck

Subject: Re: [PATCH 4/4] qla2xxx: Handle incorrect entry_type entries

On Thu, 2020-08-27 at 11:58 +0200, Daniel Wagner wrote:
> It was observed on an ISP8324 16Gb HBA with fw=8.08.203 (d0d5) that
> pkt->entry_type was MBX_IOCB_TYPE/0x39 with an sp->type SRB_SCSI_CMD
> which is invalid and should not be possible.
>
> A careful code review of the crash dump didn't reveal any short
> comings. Reading the entry_type from the crash dump shows the
> expected
> value of STATUS_TYPE/0x03 but the call trace shows that
> qla24xx_mbx_iocb_entry() is used.
>
> One possible explanation is when pkt->entry_type is read it doesn't
> contain the correct information. That means the driver observes an
> data
> race by the firmware.
>
> Signed-off-by: Daniel Wagner <[email protected]>
> ---
> drivers/scsi/qla2xxx/qla_isr.c | 28 ++++++++++++++++++++++++++--
> 1 file changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/qla2xxx/qla_isr.c
> b/drivers/scsi/qla2xxx/qla_isr.c
> index b787643f5031..0c324e88b189 100644
> --- a/drivers/scsi/qla2xxx/qla_isr.c
> +++ b/drivers/scsi/qla2xxx/qla_isr.c
> @@ -3392,6 +3392,31 @@ void qla24xx_nvme_ls4_iocb(struct
> scsi_qla_host *vha,
> sp->done(sp, comp_status);
> }
>
> +static void qla24xx_process_mbx_iocb_response(struct scsi_qla_host
> *vha,
> + struct rsp_que *rsp, struct sts_entry_24xx *pkt)
> +{
> + srb_t *sp;
> +
> + sp = qla2x00_get_sp_from_handle(vha, rsp->req, pkt);
> + if (!sp)
> + return;
> +
> + if (sp->type == SRB_SCSI_CMD ||
> + sp->type == SRB_NVME_CMD ||
> + sp->type == SRB_TM_CMD) {
> + /* Some firmware version don't update the entry_type
> + * correctly. It was observed entry_type contained
> + * MBCX_IOCB_TYPE instead of the expected STATUS_TYPE
> + * for sp->type SRB_SCSI_CMD, SRB_NVME_CMD or
> + * SRB_TM_CMD.
> + */
> + qla2x00_status_entry(vha, rsp, pkt);
> + return;
> + }
> +
> + qla24xx_mbx_iocb_entry(vha, rsp->req, (struct mbx_24xx_entry
> *)pkt);
> +}
> +
> /**
> * qla24xx_process_response_queue() - Process response queue
> entries.
> * @vha: SCSI driver HA context
> @@ -3499,8 +3524,7 @@ void qla24xx_process_response_queue(struct
> scsi_qla_host *vha,
> (struct abort_entry_24xx *)pkt);
> break;
> case MBX_IOCB_TYPE:
> - qla24xx_mbx_iocb_entry(vha, rsp->req,
> - (struct mbx_24xx_entry *)pkt);
> + qla24xx_process_mbx_iocb_response(vha, rsp,
> pkt);
> break;
> case VP_CTRL_IOCB_TYPE:
> qla_ctrlvp_completed(vha, rsp->req,

Should we perhaps log an error message when we detect a mismatch
between sp->type and entry_type?

Regards,
Martin


2020-08-27 12:12:46

by Daniel Wagner

Subject: Re: [PATCH 4/4] qla2xxx: Handle incorrect entry_type entries

On Thu, Aug 27, 2020 at 12:17:13PM +0200, Martin Wilck wrote:
> Should we perhaps log an error message when we detect a mismatch
> between sp->type and entry_type?

Sure, can do, but does it really help? There is not much we can do in
the driver. I hope the firmware gets fixed eventually. I am not against
it, just not sure if the log entry really helps beyond saying 'you are
using a firmware with a known issue'.

2020-08-27 12:52:07

by Martin Wilck

Subject: Re: [PATCH 4/4] qla2xxx: Handle incorrect entry_type entries

On Thu, 2020-08-27 at 13:46 +0200, Daniel Wagner wrote:
> On Thu, Aug 27, 2020 at 12:17:13PM +0200, Martin Wilck wrote:
> > Should we perhaps log an error message when we detect a mismatch
> > between sp->type and entry_type?
>
> Sure can do, but does it really help? Not much we can do in the
> driver. I hope the firmware gets fixed eventually. I am not against
> it,
> just not sure if the log entry really is helping except saying 'you
> are
> using a firmware with a known issue'.
>

... which might prove insightful to users, and perhaps to developers as
well (by observing under which conditions this problem occurs).
I'd hope so, at least. But you know this issue much better than me.

Regards,
Martin
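
The logging idea discussed in this thread could look roughly like the sketch below. It is a user-space model, not driver code: the enums, the route_mbx_iocb() helper, the mismatch counter and the fprintf() call are all hypothetical stand-ins. Only the 0x03 and 0x39 values and the three sp->type names come from the patch above. In the driver itself the message would presumably go through the driver's own logging helpers instead of stderr.

```c
#include <stdbool.h>
#include <stdio.h>

/* Entry type values as quoted in the commit message. */
enum entry_type { STATUS_TYPE = 0x03, MBX_IOCB_TYPE = 0x39 };

/* Hypothetical subset of SRB types; the numeric values are illustrative. */
enum srb_type { SRB_SCSI_CMD, SRB_NVME_CMD, SRB_TM_CMD, SRB_MB_IOCB };

/* Which handler the response should be routed to (assumed names). */
enum handler { HANDLE_AS_STATUS, HANDLE_AS_MBX };

/* True for the command types that the patch expects to complete
 * with a status entry rather than a mailbox IOCB entry. */
static bool srb_expects_status_entry(enum srb_type type)
{
	return type == SRB_SCSI_CMD ||
	       type == SRB_NVME_CMD ||
	       type == SRB_TM_CMD;
}

/* Decide how to handle a response whose entry_type claims to be
 * MBX_IOCB_TYPE. On a mismatch with sp->type, log it, count it and
 * fall back to status handling, as the patch does. */
static enum handler route_mbx_iocb(enum srb_type sp_type,
				   unsigned int entry_type,
				   unsigned int *mismatches)
{
	if (srb_expects_status_entry(sp_type)) {
		(*mismatches)++;
		fprintf(stderr,
			"entry_type 0x%02x does not match sp->type %d, "
			"handling as status entry\n",
			entry_type, (int)sp_type);
		return HANDLE_AS_STATUS;
	}

	return HANDLE_AS_MBX;
}
```

Counting the mismatches next to logging them is one way to see how often, and under which load, the firmware reports the wrong entry_type without flooding the log; whether that is worth the extra state is a judgment call for the driver maintainers.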