2022-01-04 18:32:08

by John Garry

[permalink] [raw]
Subject: [PATCH RFT] scsi: pm8001: Fix FW crash for maxcpus=1

According to the comment in check_fw_ready() we should not check the
IOP1_READY field in register SCRATCH_PAD_1 for 8008 or 8009 controllers.

However we check this very field in process_oq() for processing the highest
index interrupt vector. Change that function to not check IOP1_READY for
those mentioned controllers, but do check ILA_READY in both cases.

The reason I assume that this was not hit earlier was because we always
allocated 64 MSI(X), and just did not pass the vector index check in
process_oq(), i.e. the handler never ran for vector index 63.

Signed-off-by: John Garry <[email protected]>

diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c b/drivers/scsi/pm8001/pm80xx_hwi.c
index 2101fc5761c3..77b8bb30615b 100644
--- a/drivers/scsi/pm8001/pm80xx_hwi.c
+++ b/drivers/scsi/pm8001/pm80xx_hwi.c
@@ -4162,9 +4162,16 @@ static int process_oq(struct pm8001_hba_info *pm8001_ha, u8 vec)
u32 regval;

if (vec == (pm8001_ha->max_q_num - 1)) {
+ u32 mipsall_ready;
+
+ if ((pm8001_ha->chip_id == chip_8008) ||
+ (pm8001_ha->chip_id == chip_8009))
+ mipsall_ready = SCRATCH_PAD_MIPSALL_READY_8PORT;
+ else
+ mipsall_ready = SCRATCH_PAD_MIPSALL_READY_16PORT;
+
regval = pm8001_cr32(pm8001_ha, 0, MSGU_SCRATCH_PAD_1);
- if ((regval & SCRATCH_PAD_MIPSALL_READY) !=
- SCRATCH_PAD_MIPSALL_READY) {
+ if ((regval & mipsall_ready) != mipsall_ready) {
pm8001_ha->controller_fatal_error = true;
pm8001_dbg(pm8001_ha, FAIL,
"Firmware Fatal error! Regval:0x%x\n",
diff --git a/drivers/scsi/pm8001/pm80xx_hwi.h b/drivers/scsi/pm8001/pm80xx_hwi.h
index c7e5d93bea92..c41ed039c92a 100644
--- a/drivers/scsi/pm8001/pm80xx_hwi.h
+++ b/drivers/scsi/pm8001/pm80xx_hwi.h
@@ -1405,8 +1405,12 @@ typedef struct SASProtocolTimerConfig SASProtocolTimerConfig_t;
#define SCRATCH_PAD_BOOT_LOAD_SUCCESS 0x0
#define SCRATCH_PAD_IOP0_READY 0xC00
#define SCRATCH_PAD_IOP1_READY 0x3000
-#define SCRATCH_PAD_MIPSALL_READY (SCRATCH_PAD_IOP1_READY | \
+#define SCRATCH_PAD_MIPSALL_READY_16PORT (SCRATCH_PAD_IOP1_READY | \
SCRATCH_PAD_IOP0_READY | \
+ SCRATCH_PAD_ILA_READY | \
+ SCRATCH_PAD_RAAE_READY)
+#define SCRATCH_PAD_MIPSALL_READY_8PORT (SCRATCH_PAD_IOP0_READY | \
+ SCRATCH_PAD_ILA_READY | \
SCRATCH_PAD_RAAE_READY)

/* boot loader state */
--
2.26.2



2022-01-05 04:03:59

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH RFT] scsi: pm8001: Fix FW crash for maxcpus=1

On 1/5/22 03:26, John Garry wrote:
> According to the comment in check_fw_ready() we should not check the
> IOP1_READY field in register SCRATCH_PAD_1 for 8008 or 8009 controllers.
>
> However we check this very field in process_oq() for processing the highest
> index interrupt vector. Change that function to not check IOP1_READY for
> those mentioned controllers, but do check ILA_READY in both cases.
>
> The reason I assume that this was not hit earlier was because we always
> allocated 64 MSI(X), and just did not pass the vector index check in
> process_oq(), i.e. the handler never ran for vector index 63.
>
> Signed-off-by: John Garry <[email protected]>
>
> diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c b/drivers/scsi/pm8001/pm80xx_hwi.c
> index 2101fc5761c3..77b8bb30615b 100644
> --- a/drivers/scsi/pm8001/pm80xx_hwi.c
> +++ b/drivers/scsi/pm8001/pm80xx_hwi.c
> @@ -4162,9 +4162,16 @@ static int process_oq(struct pm8001_hba_info *pm8001_ha, u8 vec)
> u32 regval;
>
> if (vec == (pm8001_ha->max_q_num - 1)) {
> + u32 mipsall_ready;
> +
> + if ((pm8001_ha->chip_id == chip_8008) ||
> + (pm8001_ha->chip_id == chip_8009))

nit: no need for the inner brackets here.

> + mipsall_ready = SCRATCH_PAD_MIPSALL_READY_8PORT;
> + else
> + mipsall_ready = SCRATCH_PAD_MIPSALL_READY_16PORT;
> +
> regval = pm8001_cr32(pm8001_ha, 0, MSGU_SCRATCH_PAD_1);
> - if ((regval & SCRATCH_PAD_MIPSALL_READY) !=
> - SCRATCH_PAD_MIPSALL_READY) {
> + if ((regval & mipsall_ready) != mipsall_ready) {
> pm8001_ha->controller_fatal_error = true;
> pm8001_dbg(pm8001_ha, FAIL,
> "Firmware Fatal error! Regval:0x%x\n",
> diff --git a/drivers/scsi/pm8001/pm80xx_hwi.h b/drivers/scsi/pm8001/pm80xx_hwi.h
> index c7e5d93bea92..c41ed039c92a 100644
> --- a/drivers/scsi/pm8001/pm80xx_hwi.h
> +++ b/drivers/scsi/pm8001/pm80xx_hwi.h
> @@ -1405,8 +1405,12 @@ typedef struct SASProtocolTimerConfig SASProtocolTimerConfig_t;
> #define SCRATCH_PAD_BOOT_LOAD_SUCCESS 0x0
> #define SCRATCH_PAD_IOP0_READY 0xC00
> #define SCRATCH_PAD_IOP1_READY 0x3000
> -#define SCRATCH_PAD_MIPSALL_READY (SCRATCH_PAD_IOP1_READY | \
> +#define SCRATCH_PAD_MIPSALL_READY_16PORT (SCRATCH_PAD_IOP1_READY | \
> SCRATCH_PAD_IOP0_READY | \
> + SCRATCH_PAD_ILA_READY | \
> + SCRATCH_PAD_RAAE_READY)
> +#define SCRATCH_PAD_MIPSALL_READY_8PORT (SCRATCH_PAD_IOP0_READY | \
> + SCRATCH_PAD_ILA_READY | \
> SCRATCH_PAD_RAAE_READY)
>
> /* boot loader state */

Otherwise, looks OK to me.
I tested with and without max_cpus=1 with a ATTO Technology, Inc.
ExpressSAS 12Gb/s SAS/SATA HBA (rev 06) adapter and everything is OK.
That adapter uses chip_8072 though, not 8008 or 8009.

Feel free to add:

Reviewed-by: Damien Le Moal <[email protected]>
Tested-by: Damien Le Moal <[email protected]>

--
Damien Le Moal
Western Digital Research

2022-01-05 11:28:32

by John Garry

[permalink] [raw]
Subject: Re: [PATCH RFT] scsi: pm8001: Fix FW crash for maxcpus=1

On 05/01/2022 04:03, Damien Le Moal wrote:
> On 1/5/22 03:26, John Garry wrote:
>> According to the comment in check_fw_ready() we should not check the
>> IOP1_READY field in register SCRATCH_PAD_1 for 8008 or 8009 controllers.
>>
>> However we check this very field in process_oq() for processing the highest
>> index interrupt vector. Change that function to not check IOP1_READY for
>> those mentioned controllers, but do check ILA_READY in both cases.
>>
>> The reason I assume that this was not hit earlier was because we always
>> allocated 64 MSI(X), and just did not pass the vector index check in
>> process_oq(), i.e. the handler never ran for vector index 63.
>>
>> Signed-off-by: John Garry<[email protected]>
>>
>> diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c b/drivers/scsi/pm8001/pm80xx_hwi.c
>> index 2101fc5761c3..77b8bb30615b 100644
>> --- a/drivers/scsi/pm8001/pm80xx_hwi.c
>> +++ b/drivers/scsi/pm8001/pm80xx_hwi.c
>> @@ -4162,9 +4162,16 @@ static int process_oq(struct pm8001_hba_info *pm8001_ha, u8 vec)
>> u32 regval;
>>
>> if (vec == (pm8001_ha->max_q_num - 1)) {
>> + u32 mipsall_ready;
>> +
>> + if ((pm8001_ha->chip_id == chip_8008) ||
>> + (pm8001_ha->chip_id == chip_8009))
> nit: no need for the inner brackets here.

ok, I can fix that.

But I would also like opinion from microchip guys/maintainer on why this
code is here at all. Seems strange in the way we check in this register
in the interrupt handler for only a specific vector and, also, why we
check at all in an interrupt handler.
>
>> + mipsall_ready = SCRATCH_PAD_MIPSALL_READY_8PORT;
>> + else
>> + mipsall_ready = SCRATCH_PAD_MIPSALL_READY_16PORT;
>> +
>> regval = pm8001_cr32(pm8001_ha, 0, MSGU_SCRATCH_PAD_1);
>> - if ((regval & SCRATCH_PAD_MIPSALL_READY) !=
>> - SCRATCH_PAD_MIPSALL_READY) {
>> + if ((regval & mipsall_ready) != mipsall_ready) {
>> pm8001_ha->controller_fatal_error = true;
>> pm8001_dbg(pm8001_ha, FAIL,
>> "Firmware Fatal error! Regval:0x%x\n",
>> diff --git a/drivers/scsi/pm8001/pm80xx_hwi.h b/drivers/scsi/pm8001/pm80xx_hwi.h
>> index c7e5d93bea92..c41ed039c92a 100644
>> --- a/drivers/scsi/pm8001/pm80xx_hwi.h
>> +++ b/drivers/scsi/pm8001/pm80xx_hwi.h
>> @@ -1405,8 +1405,12 @@ typedef struct SASProtocolTimerConfig SASProtocolTimerConfig_t;
>> #define SCRATCH_PAD_BOOT_LOAD_SUCCESS 0x0
>> #define SCRATCH_PAD_IOP0_READY 0xC00
>> #define SCRATCH_PAD_IOP1_READY 0x3000
>> -#define SCRATCH_PAD_MIPSALL_READY (SCRATCH_PAD_IOP1_READY | \
>> +#define SCRATCH_PAD_MIPSALL_READY_16PORT (SCRATCH_PAD_IOP1_READY | \
>> SCRATCH_PAD_IOP0_READY | \
>> + SCRATCH_PAD_ILA_READY | \
>> + SCRATCH_PAD_RAAE_READY)
>> +#define SCRATCH_PAD_MIPSALL_READY_8PORT (SCRATCH_PAD_IOP0_READY | \
>> + SCRATCH_PAD_ILA_READY | \
>> SCRATCH_PAD_RAAE_READY)
>>
>> /* boot loader state */
> Otherwise, looks OK to me.
> I tested with and without max_cpus=1 with a ATTO Technology, Inc.
> ExpressSAS 12Gb/s SAS/SATA HBA (rev 06) adapter and everything is OK.
> That adapter uses chip_8072 though, not 8008 or 8009.
>
> Feel free to add:
>
> Reviewed-by: Damien Le Moal<[email protected]>
> Tested-by: Damien Le Moal<[email protected]>

Thanks!

john


2022-01-06 13:03:58

by Ajish.Koshy

[permalink] [raw]
Subject: RE: [PATCH RFT] scsi: pm8001: Fix FW crash for maxcpus=1

> On 05/01/2022 04:03, Damien Le Moal wrote:
> > On 1/5/22 03:26, John Garry wrote:
> >> According to the comment in check_fw_ready() we should not check the
> >> IOP1_READY field in register SCRATCH_PAD_1 for 8008 or 8009
> controllers.
> >>
> >> However we check this very field in process_oq() for processing the
> >> highest index interrupt vector. Change that function to not check
> >> IOP1_READY for those mentioned controllers, but do check ILA_READY in
> both cases.
> >>
> >> The reason I assume that this was not hit earlier was because we
> >> always allocated 64 MSI(X), and just did not pass the vector index
> >> check in process_oq(), i.e. the handler never ran for vector index 63.
> >>
> >> Signed-off-by: John Garry<[email protected]>
> >>
> >> diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c
> >> b/drivers/scsi/pm8001/pm80xx_hwi.c
> >> index 2101fc5761c3..77b8bb30615b 100644
> >> --- a/drivers/scsi/pm8001/pm80xx_hwi.c
> >> +++ b/drivers/scsi/pm8001/pm80xx_hwi.c
> >> @@ -4162,9 +4162,16 @@ static int process_oq(struct pm8001_hba_info
> *pm8001_ha, u8 vec)
> >> u32 regval;
> >>
> >> if (vec == (pm8001_ha->max_q_num - 1)) {
> >> + u32 mipsall_ready;
> >> +
> >> + if ((pm8001_ha->chip_id == chip_8008) ||
> >> + (pm8001_ha->chip_id == chip_8009))
> > nit: no need for the inner brackets here.
>
> ok, I can fix that.
>
> But I would also like opinion from microchip guys/maintainer on why this
> code is here at all. Seems strange in the way we check in this register in the
> interrupt handler for only a specific vector and, also, why we check at all in
> an interrupt handler.

Here is my initial understanding so far based on the code
and data sheet

1. Controller has the capability to communicate
to the host about fatal error condition via configured
interrupt vector MSI/MSI-X.
2. This capability is achieved by setting two fields
a. Enable Controller Fatal error notification
Dowrd 0x1C, Bit[0].
1 - Enable; 0 - Disable
Code: pm8001_ha->main_cfg_tbl.pm80xx_tbl.
fatal_err_interrupt = 0x01;
b. Fatal Error Interrupt Vector Dword 0x1C, bit[15:8]
This parameter configures which interrupt vector
is used to notify the host of the fatal error.
Code: /* Update Fatal error interrupt vector */
pm8001_ha->main_cfg_tbl.pm80xx_tbl.
fatal_err_interrupt |=
((pm8001_ha->max_q_num - 1) << 8);

Probably this will be the reason why we check
the vector in process_oq() for processing
controller fatal error

if (vec == (pm8001_ha->max_q_num - 1)) {

Please do let me know if it helped in clarification.

> >
> >> + mipsall_ready = SCRATCH_PAD_MIPSALL_READY_8PORT;
> >> + else
> >> + mipsall_ready =
> >> + SCRATCH_PAD_MIPSALL_READY_16PORT;
> >> +
> >> regval = pm8001_cr32(pm8001_ha, 0, MSGU_SCRATCH_PAD_1);
> >> - if ((regval & SCRATCH_PAD_MIPSALL_READY) !=
> >> - SCRATCH_PAD_MIPSALL_READY) {
> >> + if ((regval & mipsall_ready) != mipsall_ready) {
> >> pm8001_ha->controller_fatal_error = true;
> >> pm8001_dbg(pm8001_ha, FAIL,
> >> "Firmware Fatal error!
> >> Regval:0x%x\n", diff --git a/drivers/scsi/pm8001/pm80xx_hwi.h
> >> b/drivers/scsi/pm8001/pm80xx_hwi.h
> >> index c7e5d93bea92..c41ed039c92a 100644
> >> --- a/drivers/scsi/pm8001/pm80xx_hwi.h
> >> +++ b/drivers/scsi/pm8001/pm80xx_hwi.h
> >> @@ -1405,8 +1405,12 @@ typedef struct SASProtocolTimerConfig
> SASProtocolTimerConfig_t;
> >> #define SCRATCH_PAD_BOOT_LOAD_SUCCESS 0x0
> >> #define SCRATCH_PAD_IOP0_READY 0xC00
> >> #define SCRATCH_PAD_IOP1_READY 0x3000
> >> -#define SCRATCH_PAD_MIPSALL_READY (SCRATCH_PAD_IOP1_READY |
> \
> >> +#define SCRATCH_PAD_MIPSALL_READY_16PORT
> (SCRATCH_PAD_IOP1_READY | \
> >> SCRATCH_PAD_IOP0_READY | \
> >> + SCRATCH_PAD_ILA_READY | \
> >> + SCRATCH_PAD_RAAE_READY)
> >> +#define SCRATCH_PAD_MIPSALL_READY_8PORT
> (SCRATCH_PAD_IOP0_READY | \
> >> + SCRATCH_PAD_ILA_READY | \
> >> SCRATCH_PAD_RAAE_READY)
> >>
> >> /* boot loader state */
> > Otherwise, looks OK to me.
> > I tested with and without max_cpus=1 with a ATTO Technology, Inc.
> > ExpressSAS 12Gb/s SAS/SATA HBA (rev 06) adapter and everything is OK.
> > That adapter uses chip_8072 though, not 8008 or 8009.
> >
> > Feel free to add:
> >
> > Reviewed-by: Damien Le Moal<[email protected]>
> > Tested-by: Damien Le Moal<[email protected]>
>
> Thanks!
>
> john

Thanks,

Ajish

2022-01-06 15:33:13

by John Garry

[permalink] [raw]
Subject: Re: [PATCH RFT] scsi: pm8001: Fix FW crash for maxcpus=1

On 06/01/2022 13:03, [email protected] wrote:
>> only a specific vector and, also, why we check at all in
>> an interrupt handler.
> Here is my initial understanding so far based on the code
> and data sheet
>
> 1. Controller has the capability to communicate
> to the host about fatal error condition via configured
> interrupt vector MSI/MSI-X.
> 2. This capability is achieved by setting two fields
> a. Enable Controller Fatal error notification
> Dowrd 0x1C, Bit[0].
> 1 - Enable; 0 - Disable
> Code: pm8001_ha->main_cfg_tbl.pm80xx_tbl.
> fatal_err_interrupt = 0x01;
> b. Fatal Error Interrupt Vector Dword 0x1C, bit[15:8]
> This parameter configures which interrupt vector
> is used to notify the host of the fatal error.
> Code: /* Update Fatal error interrupt vector */
> pm8001_ha->main_cfg_tbl.pm80xx_tbl.
> fatal_err_interrupt |=
> ((pm8001_ha->max_q_num - 1) << 8);
>
> Probably this will be the reason why we check
> the vector in process_oq() for processing
> controller fatal error
>
> if (vec == (pm8001_ha->max_q_num - 1)) {
>
> Please do let me know if it helped in clarification.
>

Sounds reasonable. And we only discover the issue for 8008/8009 now as
we have that (pm8001_ha->max_q_num - 1) vector being used for standard IO.

So let me know of any other issue, otherwise I'll send a v2 with the
coding style fixup.

Thanks,
John

2022-01-06 23:59:42

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH RFT] scsi: pm8001: Fix FW crash for maxcpus=1

On 2022/01/07 0:32, John Garry wrote:
> On 06/01/2022 13:03, [email protected] wrote:
>>> only a specific vector and, also, why we check at all in
>>> an interrupt handler.
>> Here is my initial understanding so far based on the code
>> and data sheet
>>
>> 1. Controller has the capability to communicate
>> to the host about fatal error condition via configured
>> interrupt vector MSI/MSI-X.
>> 2. This capability is achieved by setting two fields
>> a. Enable Controller Fatal error notification
>> Dowrd 0x1C, Bit[0].
>> 1 - Enable; 0 - Disable
>> Code: pm8001_ha->main_cfg_tbl.pm80xx_tbl.
>> fatal_err_interrupt = 0x01;
>> b. Fatal Error Interrupt Vector Dword 0x1C, bit[15:8]
>> This parameter configures which interrupt vector
>> is used to notify the host of the fatal error.
>> Code: /* Update Fatal error interrupt vector */
>> pm8001_ha->main_cfg_tbl.pm80xx_tbl.
>> fatal_err_interrupt |=
>> ((pm8001_ha->max_q_num - 1) << 8);
>>
>> Probably this will be the reason why we check
>> the vector in process_oq() for processing
>> controller fatal error
>>
>> if (vec == (pm8001_ha->max_q_num - 1)) {
>>
>> Please do let me know if it helped in clarification.
>>
>
> Sounds reasonable. And we only discover the issue for 8008/8009 now as
> we have that (pm8001_ha->max_q_num - 1) vector being used for standard IO.
>
> So let me know of any other issue, otherwise I'll send a v2 with the
> coding style fixup.

And maybe add comments about the above so that the information does not get lost ?

>
> Thanks,
> John


--
Damien Le Moal
Western Digital Research

2022-01-07 11:05:30

by Ajish.Koshy

[permalink] [raw]
Subject: RE: [PATCH RFT] scsi: pm8001: Fix FW crash for maxcpus=1

Hi John,

> On 06/01/2022 13:03, [email protected] wrote:
> >> only a specific vector and, also, why we check at all in an
> >> interrupt handler.
> > Here is my initial understanding so far based on the code and data
> > sheet
> >
> > 1. Controller has the capability to communicate to the host about
> > fatal error condition via configured interrupt vector MSI/MSI-X.
> > 2. This capability is achieved by setting two fields
> > a. Enable Controller Fatal error notification
> > Dowrd 0x1C, Bit[0].
> > 1 - Enable; 0 - Disable
> > Code: pm8001_ha->main_cfg_tbl.pm80xx_tbl.
> > fatal_err_interrupt = 0x01;
> > b. Fatal Error Interrupt Vector Dword 0x1C, bit[15:8]
> > This parameter configures which interrupt vector
> > is used to notify the host of the fatal error.
> > Code: /* Update Fatal error interrupt vector */
> > pm8001_ha->main_cfg_tbl.pm80xx_tbl.
> > fatal_err_interrupt |=
> > ((pm8001_ha->max_q_num - 1) << 8);
> >
> > Probably this will be the reason why we check the vector in
> > process_oq() for processing controller fatal error
> >
> > if (vec == (pm8001_ha->max_q_num - 1)) {
> >
> > Please do let me know if it helped in clarification.
> >
>
> Sounds reasonable. And we only discover the issue for 8008/8009 now as we
> have that (pm8001_ha->max_q_num - 1) vector being used for standard IO.
>
> So let me know of any other issue, otherwise I'll send a v2 with the coding
> style fixup.

After going through check_fw_ready(), the change here looks fine.

>
> Thanks,
> John