2019-02-05 16:23:53

by Martin Kepplinger

Subject: [PATCH v2] mtd: rawnand: gpmi: fix MX28 bus master lockup problem

Disable BCH soft reset according to MX23 erratum #2847 ("BCH soft
reset may cause bus master lock up") for MX28 too. It has the same
problem.

Observed problem: once per 100,000+ MX28 reboots NAND read failed on
DMA timeout errors:
[ 1.770823] UBI: attaching mtd3 to ubi0
[ 2.768088] gpmi_nand: DMA timeout, last DMA :1
[ 3.958087] gpmi_nand: BCH timeout, last DMA :1
[ 4.156033] gpmi_nand: Error in ECC-based read: -110
[ 4.161136] UBI warning: ubi_io_read: error -110 while reading 64
bytes from PEB 0:0, read only 0 bytes, retry
[ 4.171283] step 1 error
[ 4.173846] gpmi_nand: Chip: 0, Error -1

Without BCH soft reset we successfully executed 1,000,000 MX28 reboots.

I have a quote from NXP regarding this problem, from July 18th 2016:

"As the i.MX23 and i.MX28 are of the same generation, they share many
characteristics. Unfortunately, also the erratas may be shared.
In case of the documented erratas and the workarounds, you can also
apply the workaround solution of one device on the other one. This have
been reported, but I’m afraid that there are not an estimated date for
updating the Errata documents.
Please accept our apologies for any inconveniences this may cause."

Fixes: 6f2a6a52560a ("mtd: nand: gpmi: reset BCH earlier, too, to avoid
NAND startup problems")
Cc: [email protected]
Signed-off-by: Manfred Schlaegl <[email protected]>
Signed-off-by: Martin Kepplinger <[email protected]>
Reviewed-by: Miquel Raynal <[email protected]>
Reviewed-by: Fabio Estevam <[email protected]>
---


revision history
----------------
v2: add Fixes tag, Cc stable and add recent Reviewed-by tags


drivers/mtd/nand/raw/gpmi-nand/gpmi-lib.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/mtd/nand/raw/gpmi-nand/gpmi-lib.c b/drivers/mtd/nand/raw/gpmi-nand/gpmi-lib.c
index bd4cfac6b5aa..a4768df5083f 100644
--- a/drivers/mtd/nand/raw/gpmi-nand/gpmi-lib.c
+++ b/drivers/mtd/nand/raw/gpmi-nand/gpmi-lib.c
@@ -155,9 +155,10 @@ int gpmi_init(struct gpmi_nand_data *this)

/*
* Reset BCH here, too. We got failures otherwise :(
- * See later BCH reset for explanation of MX23 handling
+ * See later BCH reset for explanation of MX23 and MX28 handling
*/
- ret = gpmi_reset_block(r->bch_regs, GPMI_IS_MX23(this));
+ ret = gpmi_reset_block(r->bch_regs,
+ GPMI_IS_MX23(this) || GPMI_IS_MX28(this));
if (ret)
goto err_out;

@@ -263,12 +264,10 @@ int bch_set_geometry(struct gpmi_nand_data *this)
/*
* Due to erratum #2847 of the MX23, the BCH cannot be soft reset on this
* chip, otherwise it will lock up. So we skip resetting BCH on the MX23.
- * On the other hand, the MX28 needs the reset, because one case has been
- * seen where the BCH produced ECC errors constantly after 10000
- * consecutive reboots. The latter case has not been seen on the MX23
- * yet, still we don't know if it could happen there as well.
+ * and MX28.
*/
- ret = gpmi_reset_block(r->bch_regs, GPMI_IS_MX23(this));
+ ret = gpmi_reset_block(r->bch_regs,
+ GPMI_IS_MX23(this) || GPMI_IS_MX28(this));
if (ret)
goto err_out;

--
2.20.1
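
For readers following the thread, the decision this patch changes can be sketched in isolation. The snippet below is a stand-alone model, not the driver code: `enum soc_type`, `enum bch_init_action`, `bch_skip_soft_reset()` and `bch_init_action_for()` are hypothetical names introduced only for illustration. It assumes, as the comment in the diff suggests, that a true second argument to gpmi_reset_block() means the BCH block is brought up without a soft reset.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's GPMI_IS_MX23()/GPMI_IS_MX28() checks. */
enum soc_type { SOC_MX23, SOC_MX28, SOC_OTHER };

/* Hypothetical labels for what happens to the BCH block during init. */
enum bch_init_action { BCH_ENABLE_ONLY, BCH_SOFT_RESET };

/*
 * Mirrors the expression the patch passes as the second argument of
 * gpmi_reset_block(): GPMI_IS_MX23(this) || GPMI_IS_MX28(this).
 * Before the patch only the MX23 test was present.
 */
bool bch_skip_soft_reset(enum soc_type soc)
{
	return soc == SOC_MX23 || soc == SOC_MX28;
}

/*
 * Assumption: skipping the soft reset means the block is only enabled.
 * Chips not covered by erratum #2847 handling still get the soft reset.
 */
enum bch_init_action bch_init_action_for(enum soc_type soc)
{
	return bch_skip_soft_reset(soc) ? BCH_ENABLE_ONLY : BCH_SOFT_RESET;
}
```

In this model, MX23 and MX28 both map to BCH_ENABLE_ONLY, while any other chip maps to BCH_SOFT_RESET, which is the behaviour both hunks of the patch apply.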



2019-02-05 16:41:28

by Han Xu

Subject: RE: [PATCH v2] mtd: rawnand: gpmi: fix MX28 bus master lockup problem



> -----Original Message-----
> From: Martin Kepplinger <[email protected]>
> Sent: Tuesday, February 5, 2019 9:53 AM
> To: Han Xu <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; linux-
> [email protected]
> Cc: [email protected]; [email protected]; Manfred Schlaegl
> <[email protected]>; Fabio Estevam <[email protected]>
> Subject: [PATCH v2] mtd: rawnand: gpmi: fix MX28 bus master lockup
> problem
>
> Disable BCH soft reset according to MX23 erratum #2847 ("BCH soft
> reset may cause bus master lock up") for MX28 too. It has the same
> problem.
>
> Observed problem: once per 100,000+ MX28 reboots NAND read failed on
> DMA timeout errors:
> [ 1.770823] UBI: attaching mtd3 to ubi0
> [ 2.768088] gpmi_nand: DMA timeout, last DMA :1
> [ 3.958087] gpmi_nand: BCH timeout, last DMA :1
> [ 4.156033] gpmi_nand: Error in ECC-based read: -110
> [ 4.161136] UBI warning: ubi_io_read: error -110 while reading 64
> bytes from PEB 0:0, read only 0 bytes, retry
> [ 4.171283] step 1 error
> [ 4.173846] gpmi_nand: Chip: 0, Error -1
>
> Without BCH soft reset we successfully executed 1,000,000 MX28 reboots.
>
> I have a quote from NXP regarding this problem, from July 18th 2016:
>
> "As the i.MX23 and i.MX28 are of the same generation, they share many
> characteristics. Unfortunately, also the erratas may be shared.
> In case of the documented erratas and the workarounds, you can also
> apply the workaround solution of one device on the other one. This have
> been reported, but I’m afraid that there are not an estimated date for
> updating the Errata documents.
> Please accept our apologies for any inconveniences this may cause."
>
> Fixes: 6f2a6a52560a ("mtd: nand: gpmi: reset BCH earlier, too, to avoid
> NAND startup problems")
> Cc: [email protected]
> Signed-off-by: Manfred Schlaegl <[email protected]>
> Signed-off-by: Martin Kepplinger <[email protected]>
> Reviewed-by: Miquel Raynal <[email protected]>
> Reviewed-by: Fabio Estevam <[email protected]>

Acked-by: Han Xu <[email protected]>

> ---
>
>
> revision history
> ----------------
> v2: add Fixes tag, Cc stable and add recent Reviewed-by tags
>
>
> drivers/mtd/nand/raw/gpmi-nand/gpmi-lib.c | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/mtd/nand/raw/gpmi-nand/gpmi-lib.c b/drivers/mtd/nand/raw/gpmi-nand/gpmi-lib.c
> index bd4cfac6b5aa..a4768df5083f 100644
> --- a/drivers/mtd/nand/raw/gpmi-nand/gpmi-lib.c
> +++ b/drivers/mtd/nand/raw/gpmi-nand/gpmi-lib.c
> @@ -155,9 +155,10 @@ int gpmi_init(struct gpmi_nand_data *this)
>
> /*
> * Reset BCH here, too. We got failures otherwise :(
> - * See later BCH reset for explanation of MX23 handling
> + * See later BCH reset for explanation of MX23 and MX28 handling
> */
> - ret = gpmi_reset_block(r->bch_regs, GPMI_IS_MX23(this));
> + ret = gpmi_reset_block(r->bch_regs,
> + GPMI_IS_MX23(this) || GPMI_IS_MX28(this));
> if (ret)
> goto err_out;
>
> @@ -263,12 +264,10 @@ int bch_set_geometry(struct gpmi_nand_data *this)
> /*
> * Due to erratum #2847 of the MX23, the BCH cannot be soft reset on this
> * chip, otherwise it will lock up. So we skip resetting BCH on the MX23.
> - * On the other hand, the MX28 needs the reset, because one case has been
> - * seen where the BCH produced ECC errors constantly after 10000
> - * consecutive reboots. The latter case has not been seen on the MX23
> - * yet, still we don't know if it could happen there as well.
> + * and MX28.
> */
> - ret = gpmi_reset_block(r->bch_regs, GPMI_IS_MX23(this));
> + ret = gpmi_reset_block(r->bch_regs,
> + GPMI_IS_MX23(this) || GPMI_IS_MX28(this));
> if (ret)
> goto err_out;
>
> --
> 2.20.1

2019-02-06 08:36:53

by Boris Brezillon

Subject: Re: [v2] mtd: rawnand: gpmi: fix MX28 bus master lockup problem

From: Your Name <[email protected]>

On Tue, 2019-02-05 at 15:52:51 UTC, Martin Kepplinger wrote:
> Disable BCH soft reset according to MX23 erratum #2847 ("BCH soft
> reset may cause bus master lock up") for MX28 too. It has the same
> problem.
>
> Observed problem: once per 100,000+ MX28 reboots NAND read failed on
> DMA timeout errors:
> [ 1.770823] UBI: attaching mtd3 to ubi0
> [ 2.768088] gpmi_nand: DMA timeout, last DMA :1
> [ 3.958087] gpmi_nand: BCH timeout, last DMA :1
> [ 4.156033] gpmi_nand: Error in ECC-based read: -110
> [ 4.161136] UBI warning: ubi_io_read: error -110 while reading 64
> bytes from PEB 0:0, read only 0 bytes, retry
> [ 4.171283] step 1 error
> [ 4.173846] gpmi_nand: Chip: 0, Error -1
>
> Without BCH soft reset we successfully executed 1,000,000 MX28 reboots.
>
> I have a quote from NXP regarding this problem, from July 18th 2016:
>
> "As the i.MX23 and i.MX28 are of the same generation, they share many
> characteristics. Unfortunately, also the erratas may be shared.
> In case of the documented erratas and the workarounds, you can also
> apply the workaround solution of one device on the other one. This have
> been reported, but I’m afraid that there are not an estimated date for
> updating the Errata documents.
> Please accept our apologies for any inconveniences this may cause."
>
> Fixes: 6f2a6a52560a ("mtd: nand: gpmi: reset BCH earlier, too, to avoid
> NAND startup problems")
> Cc: [email protected]
> Signed-off-by: Manfred Schlaegl <[email protected]>
> Signed-off-by: Martin Kepplinger <[email protected]>
> Reviewed-by: Miquel Raynal <[email protected]>
> Reviewed-by: Fabio Estevam <[email protected]>
> Acked-by: Han Xu <[email protected]>

Applied to http://git.infradead.org/linux-mtd.git mtd/fixes, thanks.

y'all come back now!

2019-02-06 08:44:50

by Boris Brezillon

Subject: Re: [PATCH v2] mtd: rawnand: gpmi: fix MX28 bus master lockup problem

Hi Martin,

On Tue, 5 Feb 2019 16:52:51 +0100
Martin Kepplinger <[email protected]> wrote:

> Disable BCH soft reset according to MX23 erratum #2847 ("BCH soft
> reset may cause bus master lock up") for MX28 too. It has the same
> problem.
>
> Observed problem: once per 100,000+ MX28 reboots NAND read failed on
> DMA timeout errors:
> [ 1.770823] UBI: attaching mtd3 to ubi0
> [ 2.768088] gpmi_nand: DMA timeout, last DMA :1
> [ 3.958087] gpmi_nand: BCH timeout, last DMA :1
> [ 4.156033] gpmi_nand: Error in ECC-based read: -110
> [ 4.161136] UBI warning: ubi_io_read: error -110 while reading 64
> bytes from PEB 0:0, read only 0 bytes, retry
> [ 4.171283] step 1 error
> [ 4.173846] gpmi_nand: Chip: 0, Error -1
>
> Without BCH soft reset we successfully executed 1,000,000 MX28 reboots.
>
> I have a quote from NXP regarding this problem, from July 18th 2016:
>
> "As the i.MX23 and i.MX28 are of the same generation, they share many
> characteristics. Unfortunately, also the erratas may be shared.
> In case of the documented erratas and the workarounds, you can also
> apply the workaround solution of one device on the other one. This have
> been reported, but I’m afraid that there are not an estimated date for
> updating the Errata documents.
> Please accept our apologies for any inconveniences this may cause."
>
> Fixes: 6f2a6a52560a ("mtd: nand: gpmi: reset BCH earlier, too, to avoid
> NAND startup problems")

Please make sure this Fixes line is not wrapped next time.

Thanks,

Boris

2019-02-08 10:50:35

by Martin Kepplinger

Subject: Re: [PATCH v2] mtd: rawnand: gpmi: fix MX28 bus master lockup problem

Hi Sasha,

On 07.02.19 17:32, Sasha Levin wrote:
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: 6f2a6a52560a mtd: nand: gpmi: reset BCH earlier, too, to avoid NAND startup problems.
>
> The bot has tested the following trees: v4.20.6, v4.19.19, v4.14.97, v4.9.154, v4.4.172, v3.18.133.
>
> v4.20.6: Build OK!
> v4.19.19: Build OK!
> v4.14.97: Failed to apply! Possible dependencies:
> Unable to calculate
>
> v4.9.154: Failed to apply! Possible dependencies:
> Unable to calculate
>
> v4.4.172: Failed to apply! Possible dependencies:
> Unable to calculate
>
> v3.18.133: Failed to apply! Possible dependencies:
> Unable to calculate
>
>
> How should we proceed with this patch?

The directory structure has changed since and I've sent the backported
commit, see https://lkml.org/lkml/2019/2/8/200

That should apply to 4.14 and older stable trees.

In case you need that in a different form, please just tell me.

thanks

martin

