From: Hannes Reinecke <[email protected]>
LUNs going into “failed ready running” state observed on >1T and on
even numbers of size (2T, 4T, 6T, 8T and 10T). The issue occurs when
DIF is enabled at the host.
The kernel logs:
Cannot setup S/G List for HBAIO segs 1/1 SGL 512 SCSI 256: 3 0
The host lpfc driver is failing to setup scatter/gather list
(protection data) for the IO's.
The return type lpfc_bg_setup_sgl()/lpfc_bg_setup_sgl_prot() causes
the compiler to remove the most significant bit. Use an unsigned type
instead.
Signed-off-by: Hannes Reinecke <[email protected]>
[dwagner: added commit message]
Signed-off-by: Daniel Wagner <[email protected]>
---
drivers/scsi/lpfc/lpfc_scsi.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index d26941b131fd..bf879d81846b 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -1918,7 +1918,7 @@ lpfc_bg_setup_bpl_prot(struct lpfc_hba *phba, struct scsi_cmnd *sc,
*
* Returns the number of SGEs added to the SGL.
**/
-static int
+static uint32_t
lpfc_bg_setup_sgl(struct lpfc_hba *phba, struct scsi_cmnd *sc,
struct sli4_sge *sgl, int datasegcnt,
struct lpfc_io_buf *lpfc_cmd)
@@ -1926,8 +1926,8 @@ lpfc_bg_setup_sgl(struct lpfc_hba *phba, struct scsi_cmnd *sc,
struct scatterlist *sgde = NULL; /* s/g data entry */
struct sli4_sge_diseed *diseed = NULL;
dma_addr_t physaddr;
- int i = 0, num_sge = 0, status;
- uint32_t reftag;
+ int i = 0, status;
+ uint32_t reftag, num_sge = 0;
uint8_t txop, rxop;
#ifdef CONFIG_SCSI_LPFC_DEBUG_FS
uint32_t rc;
@@ -2099,7 +2099,7 @@ lpfc_bg_setup_sgl(struct lpfc_hba *phba, struct scsi_cmnd *sc,
*
* Returns the number of SGEs added to the SGL.
**/
-static int
+static uint32_t
lpfc_bg_setup_sgl_prot(struct lpfc_hba *phba, struct scsi_cmnd *sc,
struct sli4_sge *sgl, int datacnt, int protcnt,
struct lpfc_io_buf *lpfc_cmd)
@@ -2123,8 +2123,8 @@ lpfc_bg_setup_sgl_prot(struct lpfc_hba *phba, struct scsi_cmnd *sc,
uint32_t rc;
#endif
uint32_t checking = 1;
- uint32_t dma_offset = 0;
- int num_sge = 0, j = 2;
+ uint32_t dma_offset = 0, num_sge = 0;
+ int j = 2;
struct sli4_hybrid_sgl *sgl_xtra = NULL;
sgpe = scsi_prot_sglist(sc);
--
2.43.0
The change is good, however, I don't think this was really the problem.
We would like to know if this patch really solved an issue they observed.
A good data point to know is what adapter they're using.
If that adapter supports hybrid sgl (i.e. phba->cfg_xpsgl), then we
would have set the max sg_tablesize = LPFC_MAX_SG_TABLESIZE = 0xffff.
But even then, this patch implies that dma_map_sg() returned a crazy
huge amount with the MSB set.
On Wed, Dec 20, 2023 at 8:29 AM Daniel Wagner <[email protected]> wrote:
> From: Hannes Reinecke <[email protected]>
>
> LUNs going into “failed ready running” state observed on >1T and on
> even numbers of size (2T, 4T, 6T, 8T and 10T). The issue occurs when
> DIF is enabled at the host.
>
> The kernel logs:
>
> Cannot setup S/G List for HBAIO segs 1/1 SGL 512 SCSI 256: 3 0
>
> The host lpfc driver is failing to setup scatter/gather list
> (protection data) for the IO's.
>
> The return type lpfc_bg_setup_sgl()/lpfc_bg_setup_sgl_prot() causes
> the compiler to remove the most significant bit. Use an unsigned type
> instead.
>
> Signed-off-by: Hannes Reinecke <[email protected]>
> [dwagner: added commit message]
> Signed-off-by: Daniel Wagner <[email protected]>
> ---
> drivers/scsi/lpfc/lpfc_scsi.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
> index d26941b131fd..bf879d81846b 100644
> --- a/drivers/scsi/lpfc/lpfc_scsi.c
> +++ b/drivers/scsi/lpfc/lpfc_scsi.c
> @@ -1918,7 +1918,7 @@ lpfc_bg_setup_bpl_prot(struct lpfc_hba *phba, struct
> scsi_cmnd *sc,
> *
> * Returns the number of SGEs added to the SGL.
> **/
> -static int
> +static uint32_t
> lpfc_bg_setup_sgl(struct lpfc_hba *phba, struct scsi_cmnd *sc,
> struct sli4_sge *sgl, int datasegcnt,
> struct lpfc_io_buf *lpfc_cmd)
> @@ -1926,8 +1926,8 @@ lpfc_bg_setup_sgl(struct lpfc_hba *phba, struct
> scsi_cmnd *sc,
> struct scatterlist *sgde = NULL; /* s/g data entry */
> struct sli4_sge_diseed *diseed = NULL;
> dma_addr_t physaddr;
> - int i = 0, num_sge = 0, status;
> - uint32_t reftag;
> + int i = 0, status;
> + uint32_t reftag, num_sge = 0;
> uint8_t txop, rxop;
> #ifdef CONFIG_SCSI_LPFC_DEBUG_FS
> uint32_t rc;
> @@ -2099,7 +2099,7 @@ lpfc_bg_setup_sgl(struct lpfc_hba *phba, struct
> scsi_cmnd *sc,
> *
> * Returns the number of SGEs added to the SGL.
> **/
> -static int
> +static uint32_t
> lpfc_bg_setup_sgl_prot(struct lpfc_hba *phba, struct scsi_cmnd *sc,
> struct sli4_sge *sgl, int datacnt, int protcnt,
> struct lpfc_io_buf *lpfc_cmd)
> @@ -2123,8 +2123,8 @@ lpfc_bg_setup_sgl_prot(struct lpfc_hba *phba, struct
> scsi_cmnd *sc,
> uint32_t rc;
> #endif
> uint32_t checking = 1;
> - uint32_t dma_offset = 0;
> - int num_sge = 0, j = 2;
> + uint32_t dma_offset = 0, num_sge = 0;
> + int j = 2;
> struct sli4_hybrid_sgl *sgl_xtra = NULL;
>
> sgpe = scsi_prot_sglist(sc);
> --
> 2.43.0
>
>
--
This electronic communication and the information and any files transmitted
with it, or attached to it, are confidential and are intended solely for
the use of the individual or entity to whom it is addressed and may contain
information that is confidential, legally privileged, protected by privacy
laws, or otherwise restricted from disclosure to anyone else. If you are
not the intended recipient or the person responsible for delivering the
e-mail to the intended recipient, you are hereby notified that any use,
copying, distributing, dissemination, forwarding, printing, or copying of
this e-mail is strictly prohibited. If you received this e-mail in error,
please return the e-mail to the sender, delete it from your computer, and
destroy any printed copy of it.
Hi Dick,
On Fri, Dec 22, 2023 at 10:04:50AM -0800, Dick Kennedy wrote:
> The change is good, however, I don't think this was really the
> problem.
I tried to write the commit message based on the bug report we got. So
yes, it's possible the it is not correct as I was not really involved
and might missinterpret it.
> We would like to know if this patch really solved an issue they
> observed.
Yes, it fixes the reported problem.
> A good data point to know is what adapter they're using.
A bunch of different HPE cards which show this log entry: SN1700E,
SN1610E and SN1200E.
> If that adapter supports hybrid sgl (i.e. phba->cfg_xpsgl), then we
> would have set the max sg_tablesize = LPFC_MAX_SG_TABLESIZE = 0xffff.
>
> But even then, this patch implies that dma_map_sg() returned a crazy
> huge amount with the MSB set.
Sure, though this seems to be the case.
One noteworthy information is that DIF needs to be enabled to trigger
it:
# cat /sys/module/lpfc/parameters/lpfc_enable_bg
1
# cat /sys/module/lpfc/parameters/lpfc_prot_guard
2
Thanks,
Daniel
On Wed, Jan 17, 2024 at 11:56:27AM +0100, Daniel Wagner wrote:
> Hi Dick,
>
> On Fri, Dec 22, 2023 at 10:04:50AM -0800, Dick Kennedy wrote:
> > The change is good, however, I don't think this was really the
> > problem.
>
> I tried to write the commit message based on the bug report we got. So
> yes, it's possible the it is not correct as I was not really involved
> and might missinterpret it.
Any chance to get this moving forward?
On Wed, 20 Dec 2023 17:26:58 +0100, Daniel Wagner wrote:
> LUNs going into “failed ready running” state observed on >1T and on
> even numbers of size (2T, 4T, 6T, 8T and 10T). The issue occurs when
> DIF is enabled at the host.
>
> The kernel logs:
>
> Cannot setup S/G List for HBAIO segs 1/1 SGL 512 SCSI 256: 3 0
>
> [...]
Applied to 6.8/scsi-fixes, thanks!
[1/1] scsi: lpfc: use unsigned type for num_sge
https://git.kernel.org/mkp/scsi/c/d6c1b19153f9
--
Martin K. Petersen Oracle Linux Engineering