2020-01-10 16:27:29

by Zak Hays

[permalink] [raw]
Subject: [PATCH v1] mtd: rawnand: micron: don't error out if internal ECC is set

From: Zak Hays <[email protected]>

Recent changes to the driver require use of on-die correction if
the internal ECC enable bit is set. On some Micron parts, this bit
is enabled by default and there is no method for disabling it.

This is a false assumption though as that bit being enabled does not
necessarily mean that the on-die ECC *has* to be used. It has been
verified with a Micron FAE that other methods of error correction are
still valid even if this bit is set.

HW ECC offers generally higher performance than on-die so it is
preferred in some situations. This also allows multiple NAND parts to
be supported on the same PCB as some parts may not support on-die
error correction.

With that in mind, only throw a warning that the on-die bit is set
and allow the init to continue.

Signed-off-by: Zak Hays <[email protected]>
---
drivers/mtd/nand/raw/nand_micron.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
index 56654030ec7f..ec40c76443be 100644
--- a/drivers/mtd/nand/raw/nand_micron.c
+++ b/drivers/mtd/nand/raw/nand_micron.c
@@ -455,9 +455,7 @@ static int micron_nand_init(struct nand_chip *chip)

if (ondie == MICRON_ON_DIE_MANDATORY &&
chip->ecc.mode != NAND_ECC_ON_DIE) {
- pr_err("On-die ECC forcefully enabled, not supported\n");
- ret = -EINVAL;
- goto err_free_manuf_data;
+ pr_warn("WARNING: On-die ECC forcefully enabled, use caution with other methods\n");
}

if (chip->ecc.mode == NAND_ECC_ON_DIE) {
--
2.17.1


2020-01-17 00:02:22

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH v1] mtd: rawnand: micron: don't error out if internal ECC is set

Hi Zak,

[email protected] wrote on Fri, 10 Jan 2020 11:25:01 -0500:

> From: Zak Hays <[email protected]>
>
> Recent changes to the driver require use of on-die correction if
> the internal ECC enable bit is set. On some Micron parts, this bit
> is enabled by default and there is no method for disabling it.
>
> This is a false assumption though as that bit being enabled does not
> necessarily mean that the on-die ECC *has* to be used. It has been
> verified with a Micron FAE that other methods of error correction are
> still valid even if this bit is set.
>
> HW ECC offers generally higher performance than on-die so it is
> preferred in some situations. This also allows multiple NAND parts to
> be supported on the same PCB as some parts may not support on-die
> error correction.
>
> With that in mind, only throw a warning that the on-die bit is set
> and allow the init to continue.

I don't think I can take this patch as-is. We must find a reliable way
to discriminate Micron parts features. If we cannot (I think we can't
before of the endless list of bugs they have introduced without
documenting them), the best way is to build a static table.

>
> Signed-off-by: Zak Hays <[email protected]>
> ---
> drivers/mtd/nand/raw/nand_micron.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
> index 56654030ec7f..ec40c76443be 100644
> --- a/drivers/mtd/nand/raw/nand_micron.c
> +++ b/drivers/mtd/nand/raw/nand_micron.c
> @@ -455,9 +455,7 @@ static int micron_nand_init(struct nand_chip *chip)
>
> if (ondie == MICRON_ON_DIE_MANDATORY &&
> chip->ecc.mode != NAND_ECC_ON_DIE) {
> - pr_err("On-die ECC forcefully enabled, not supported\n");
> - ret = -EINVAL;
> - goto err_free_manuf_data;
> + pr_warn("WARNING: On-die ECC forcefully enabled, use caution with other methods\n");
> }
>
> if (chip->ecc.mode == NAND_ECC_ON_DIE) {

Thanks,
Miquèl

2020-01-17 07:12:19

by Marco Felsch

[permalink] [raw]
Subject: Re: [PATCH v1] mtd: rawnand: micron: don't error out if internal ECC is set

Hi Zak, Miquel,

On 20-01-16 19:22, Miquel Raynal wrote:
> Hi Zak,
>
> [email protected] wrote on Fri, 10 Jan 2020 11:25:01 -0500:
>
> > From: Zak Hays <[email protected]>
> >
> > Recent changes to the driver require use of on-die correction if
> > the internal ECC enable bit is set. On some Micron parts, this bit
> > is enabled by default and there is no method for disabling it.

Which changes did you mean here?

> > This is a false assumption though as that bit being enabled does not
> > necessarily mean that the on-die ECC *has* to be used. It has been
> > verified with a Micron FAE that other methods of error correction are
> > still valid even if this bit is set.

It would be cool if a micron FAE can provide a document with all the
quirks and how those quirks can be handled.

> > HW ECC offers generally higher performance than on-die so it is
> > preferred in some situations. This also allows multiple NAND parts to
> > be supported on the same PCB as some parts may not support on-die
> > error correction.

By HW ECC you mean the host ecc controller?

> > With that in mind, only throw a warning that the on-die bit is set
> > and allow the init to continue.
>
> I don't think I can take this patch as-is. We must find a reliable way
> to discriminate Micron parts features. If we cannot (I think we can't
> before of the endless list of bugs they have introduced without
> documenting them), the best way is to build a static table.

+1 for 'find a reliable way to discriminate Micron parts features'

Regards,
Marco

> >
> > Signed-off-by: Zak Hays <[email protected]>
> > ---
> > drivers/mtd/nand/raw/nand_micron.c | 4 +---
> > 1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
> > index 56654030ec7f..ec40c76443be 100644
> > --- a/drivers/mtd/nand/raw/nand_micron.c
> > +++ b/drivers/mtd/nand/raw/nand_micron.c
> > @@ -455,9 +455,7 @@ static int micron_nand_init(struct nand_chip *chip)
> >
> > if (ondie == MICRON_ON_DIE_MANDATORY &&
> > chip->ecc.mode != NAND_ECC_ON_DIE) {
> > - pr_err("On-die ECC forcefully enabled, not supported\n");
> > - ret = -EINVAL;
> > - goto err_free_manuf_data;
> > + pr_warn("WARNING: On-die ECC forcefully enabled, use caution with other methods\n");
> > }
> >
> > if (chip->ecc.mode == NAND_ECC_ON_DIE) {
>
> Thanks,
> Miqu?l
>

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2020-01-17 19:34:39

by Zak Hays

[permalink] [raw]
Subject: Re: [PATCH v1] mtd: rawnand: micron: don't error out if internal ECC is set

Hi Marco & Miquel,

On Fri, Jan 17, 2020 at 2:10 AM Marco Felsch <[email protected]> wrote:
>
> Hi Zak, Miquel,
>
> On 20-01-16 19:22, Miquel Raynal wrote:
> > Hi Zak,
> >
> > [email protected] wrote on Fri, 10 Jan 2020 11:25:01 -0500:
> >
> > > From: Zak Hays <[email protected]>
> > >
> > > Recent changes to the driver require use of on-die correction if
> > > the internal ECC enable bit is set. On some Micron parts, this bit
> > > is enabled by default and there is no method for disabling it.
>
> Which changes did you mean here?

I was referring to the combination of these two patches:

9748e1d87573 Thomas Petazzoni mtd: nand: add support for Micron on-die ECC
cb2bf403a462 Chris Packham mtd: rawnand: micron: allow forced on-die ECC

>
> > > This is a false assumption though as that bit being enabled does not
> > > necessarily mean that the on-die ECC *has* to be used. It has been
> > > verified with a Micron FAE that other methods of error correction are
> > > still valid even if this bit is set.
>
> It would be cool if a micron FAE can provide a document with all the
> quirks and how those quirks can be handled.
>

Agreed. I'll ask my contact at Micron if such a document exists and
if they would be willing to share it.

> > > HW ECC offers generally higher performance than on-die so it is
> > > preferred in some situations. This also allows multiple NAND parts to
> > > be supported on the same PCB as some parts may not support on-die
> > > error correction.
>
> By HW ECC you mean the host ecc controller?
>

Yes. I used the term HW ECC as that is how it is referenced in the
device tree (nand-ecc-mode = "hw") and the driver (NAND_ECC_HW).

> > > With that in mind, only throw a warning that the on-die bit is set
> > > and allow the init to continue.
> >
> > I don't think I can take this patch as-is. We must find a reliable way
> > to discriminate Micron parts features. If we cannot (I think we can't
> > before of the endless list of bugs they have introduced without
> > documenting them), the best way is to build a static table.
>

That's understandable. I'll look into this a little more and see if it's
feasible to implement a static table to handle this.

Thanks,
Zak


On Fri, Jan 17, 2020 at 2:28 PM Zak Hays <[email protected]> wrote:
>
> Hi Marco & Miquel,
>
> On Fri, Jan 17, 2020 at 2:10 AM Marco Felsch <[email protected]> wrote:
> >
> > Hi Zak, Miquel,
> >
> > On 20-01-16 19:22, Miquel Raynal wrote:
> > > Hi Zak,
> > >
> > > [email protected] wrote on Fri, 10 Jan 2020 11:25:01 -0500:
> > >
> > > > From: Zak Hays <[email protected]>
> > > >
> > > > Recent changes to the driver require use of on-die correction if
> > > > the internal ECC enable bit is set. On some Micron parts, this bit
> > > > is enabled by default and there is no method for disabling it.
> >
> > Which changes did you mean here?
>
> I was referring to the combination of these two patches:
>
> 9748e1d87573 Thomas Petazzoni mtd: nand: add support for Micron on-die ECC
> cb2bf403a462 Chris Packham mtd: rawnand: micron: allow forced on-die ECC
>
> >
> > > > This is a false assumption though as that bit being enabled does not
> > > > necessarily mean that the on-die ECC *has* to be used. It has been
> > > > verified with a Micron FAE that other methods of error correction are
> > > > still valid even if this bit is set.
> >
> > It would be cool if a micron FAE can provide a document with all the
> > quirks and how those quirks can be handled.
> >
>
> Agreed. I'll ask my contact at Micron if such a document exists and
> if they would be willing to share it.
>
> > > > HW ECC offers generally higher performance than on-die so it is
> > > > preferred in some situations. This also allows multiple NAND parts to
> > > > be supported on the same PCB as some parts may not support on-die
> > > > error correction.
> >
> > By HW ECC you mean the host ecc controller?
> >
>
> Yes. I used the term HW ECC as that is how it is referenced in the
> device tree (nand-ecc-mode = "hw") and the driver (NAND_ECC_HW).
>
> > > > With that in mind, only throw a warning that the on-die bit is set
> > > > and allow the init to continue.
> > >
> > > I don't think I can take this patch as-is. We must find a reliable way
> > > to discriminate Micron parts features. If we cannot (I think we can't
> > > before of the endless list of bugs they have introduced without
> > > documenting them), the best way is to build a static table.
> >
>
> That's understandable. I'll look into this a little more and see if it's
> feasible to implement a static table to handle this.
>
> Thanks,
> Zak