2021-08-08 07:27:41

by Daniel Kestrel

[permalink] [raw]
Subject: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting

Some devices use Micron NAND chips, which use on-die ECC. The hardcoded
setting of NAND_ECC_ENGINE_TYPE_SOFT makes them unusable, because the
software ECC on top of the hardware ECC produces errors for every read
and write access, not to mention that booting does not work, because
the boot loader uses the correct ECC when trying to load the kernel
and stops loading on severe ECC errors.
This patch requires the devices that currently work with the hard coded
setting to set the nand-ecc-mode property to soft in their device
tree.

Signed-off-by: Daniel Kestrel <[email protected]>
Tested-by: Aleksander Jan Bajkowski <[email protected]> # tested on BT Home Hub 5A
---
drivers/mtd/nand/raw/xway_nand.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/drivers/mtd/nand/raw/xway_nand.c b/drivers/mtd/nand/raw/xway_nand.c
index 26751976e502..0a4b0aa7dd4c 100644
--- a/drivers/mtd/nand/raw/xway_nand.c
+++ b/drivers/mtd/nand/raw/xway_nand.c
@@ -148,8 +148,6 @@ static void xway_write_buf(struct nand_chip *chip, const u_char *buf, int len)

static int xway_attach_chip(struct nand_chip *chip)
{
- chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_SOFT;
-
if (chip->ecc.algo == NAND_ECC_ALGO_UNKNOWN)
chip->ecc.algo = NAND_ECC_ALGO_HAMMING;

--
2.17.1


2021-08-16 07:32:27

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting

Hi Daniel,

Daniel Kestrel <[email protected]> wrote on Sun, 8 Aug 2021
09:26:43 +0200:

> Some devices use Micron NAND chips, which use on-die ECC. The hardcoded
> setting of NAND_ECC_ENGINE_TYPE_SOFT makes them unusable, because the
> software ECC on top of the hardware ECC produces errors for every read
> and write access, not to mention that booting does not work, because
> the boot loader uses the correct ECC when trying to load the kernel
> and stops loading on severe ECC errors.
> This patch requires the devices that currently work with the hard coded
> setting to set the nand-ecc-mode property to soft in their device
> tree.
>

Please add a Fixes: and Cc: stable tags, you will also need to send to
[email protected] a different version of the patch for the kernel
5.4 IIUC.

> Signed-off-by: Daniel Kestrel <[email protected]>
> Tested-by: Aleksander Jan Bajkowski <[email protected]> # tested on BT Home Hub 5A
> ---
> drivers/mtd/nand/raw/xway_nand.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/drivers/mtd/nand/raw/xway_nand.c b/drivers/mtd/nand/raw/xway_nand.c
> index 26751976e502..0a4b0aa7dd4c 100644
> --- a/drivers/mtd/nand/raw/xway_nand.c
> +++ b/drivers/mtd/nand/raw/xway_nand.c
> @@ -148,8 +148,6 @@ static void xway_write_buf(struct nand_chip *chip, const u_char *buf, int len)
>
> static int xway_attach_chip(struct nand_chip *chip)
> {
> - chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_SOFT;
> -
> if (chip->ecc.algo == NAND_ECC_ALGO_UNKNOWN)
> chip->ecc.algo = NAND_ECC_ALGO_HAMMING;

You also need to only set the Hamming algorithm when engine_type is
TYPE_SOFT.

Thanks,
Miquèl

2021-08-19 07:24:05

by Daniel Kestrel

[permalink] [raw]
Subject: Re: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting

Hi Miquèl

Am Mo., 16. Aug. 2021 um 09:31 Uhr schrieb Miquel Raynal
<[email protected]>:
>
> Hi Daniel,
>
> Daniel Kestrel <[email protected]> wrote on Sun, 8 Aug 2021
> 09:26:43 +0200:
>
> > Some devices use Micron NAND chips, which use on-die ECC. The hardcoded
> > setting of NAND_ECC_ENGINE_TYPE_SOFT makes them unusable, because the
> > software ECC on top of the hardware ECC produces errors for every read
> > and write access, not to mention that booting does not work, because
> > the boot loader uses the correct ECC when trying to load the kernel
> > and stops loading on severe ECC errors.
> > This patch requires the devices that currently work with the hard coded
> > setting to set the nand-ecc-mode property to soft in their device
> > tree.
> >
>
> Please add a Fixes: and Cc: stable tags, you will also need to send to
> [email protected] a different version of the patch for the kernel
> 5.4 IIUC.
>
> > Signed-off-by: Daniel Kestrel <[email protected]>
> > Tested-by: Aleksander Jan Bajkowski <[email protected]> # tested on BT Home Hub 5A
> > ---
> > drivers/mtd/nand/raw/xway_nand.c | 2 --
> > 1 file changed, 2 deletions(-)
> >
> > diff --git a/drivers/mtd/nand/raw/xway_nand.c b/drivers/mtd/nand/raw/xway_nand.c
> > index 26751976e502..0a4b0aa7dd4c 100644
> > --- a/drivers/mtd/nand/raw/xway_nand.c
> > +++ b/drivers/mtd/nand/raw/xway_nand.c
> > @@ -148,8 +148,6 @@ static void xway_write_buf(struct nand_chip *chip, const u_char *buf, int len)
> >
> > static int xway_attach_chip(struct nand_chip *chip)
> > {
> > - chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_SOFT;
> > -
> > if (chip->ecc.algo == NAND_ECC_ALGO_UNKNOWN)
> > chip->ecc.algo = NAND_ECC_ALGO_HAMMING;
>
> You also need to only set the Hamming algorithm when engine_type is
> TYPE_SOFT.
>
> Thanks,
> Miquèl

I am really struggling with what to do. For one of the affected
devices, they created two device
trees, one for Micron and one for all others. Which obviously had no
effect due to the
hardcoded settings, which led me to Patch 2 and I thought, so be it.
But the process to figure
out if ones device has Micron Chips is essentially flashing an image
and if it does not work,
use the stock OEM recovery and try the other image.
However, since Micron is the only chip that is treated differently, I wonder
if your first proposal, even though it is hacky, is the better
approach to solve the issue
for the Micron devices not booting and throwing ECC errors. What do you think?
Follow up first patch or this one?

Thanks, Daniel.

2021-08-19 08:06:30

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting

Hello,

Kestrel seventyfour <[email protected]> wrote on Thu, 19 Aug
2021 09:21:42 +0200:

> Hi Miquèl
>
> Am Mo., 16. Aug. 2021 um 09:31 Uhr schrieb Miquel Raynal
> <[email protected]>:
> >
> > Hi Daniel,
> >
> > Daniel Kestrel <[email protected]> wrote on Sun, 8 Aug 2021
> > 09:26:43 +0200:
> >
> > > Some devices use Micron NAND chips, which use on-die ECC. The hardcoded
> > > setting of NAND_ECC_ENGINE_TYPE_SOFT makes them unusable, because the
> > > software ECC on top of the hardware ECC produces errors for every read
> > > and write access, not to mention that booting does not work, because
> > > the boot loader uses the correct ECC when trying to load the kernel
> > > and stops loading on severe ECC errors.
> > > This patch requires the devices that currently work with the hard coded
> > > setting to set the nand-ecc-mode property to soft in their device
> > > tree.
> > >
> >
> > Please add a Fixes: and Cc: stable tags, you will also need to send to
> > [email protected] a different version of the patch for the kernel
> > 5.4 IIUC.
> >
> > > Signed-off-by: Daniel Kestrel <[email protected]>
> > > Tested-by: Aleksander Jan Bajkowski <[email protected]> # tested on BT Home Hub 5A
> > > ---
> > > drivers/mtd/nand/raw/xway_nand.c | 2 --
> > > 1 file changed, 2 deletions(-)
> > >
> > > diff --git a/drivers/mtd/nand/raw/xway_nand.c b/drivers/mtd/nand/raw/xway_nand.c
> > > index 26751976e502..0a4b0aa7dd4c 100644
> > > --- a/drivers/mtd/nand/raw/xway_nand.c
> > > +++ b/drivers/mtd/nand/raw/xway_nand.c
> > > @@ -148,8 +148,6 @@ static void xway_write_buf(struct nand_chip *chip, const u_char *buf, int len)
> > >
> > > static int xway_attach_chip(struct nand_chip *chip)
> > > {
> > > - chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_SOFT;
> > > -
> > > if (chip->ecc.algo == NAND_ECC_ALGO_UNKNOWN)
> > > chip->ecc.algo = NAND_ECC_ALGO_HAMMING;
> >
> > You also need to only set the Hamming algorithm when engine_type is
> > TYPE_SOFT.
> >
> > Thanks,
> > Miquèl
>
> I am really struggling with what to do. For one of the affected
> devices, they created two device
> trees, one for Micron and one for all others. Which obviously had no
> effect due to the
> hardcoded settings, which led me to Patch 2 and I thought, so be it.
> But the process to figure
> out if ones device has Micron Chips is essentially flashing an image
> and if it does not work,
> use the stock OEM recovery and try the other image.
> However, since Micron is the only chip that is treated differently, I wonder
> if your first proposal, even though it is hacky, is the better
> approach to solve the issue
> for the Micron devices not booting and throwing ECC errors. What do you think?
> Follow up first patch or this one?

I am not sure we understood each other, your patch is fine, but you
need to do something like:

static int xway_attach_chip(struct nand_chip *chip)
{
if (chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_SOFT &&
chip->ecc.algo == NAND_ECC_ALGO_UNKNOWN)
chip->ecc.algo = NAND_ECC_ALGO_HAMMING;

In the DT you should not force any ECC engine (drop the nand-ecc-xxx
properties) and let the core handle it. It will probably choose the
most suitable engines for you.

Thanks,
Miquèl

2021-08-23 11:20:55

by Daniel Kestrel

[permalink] [raw]
Subject: Re: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting

Hi Miquèl,

Am Do., 19. Aug. 2021 um 10:03 Uhr schrieb Miquel Raynal
<[email protected]>:
>
> Hello,
>
> Kestrel seventyfour <[email protected]> wrote on Thu, 19 Aug
> 2021 09:21:42 +0200:
>
> > Hi Miquèl
> >
> > Am Mo., 16. Aug. 2021 um 09:31 Uhr schrieb Miquel Raynal
> > <[email protected]>:
> > >
> > > Hi Daniel,
> > >
> > > Daniel Kestrel <[email protected]> wrote on Sun, 8 Aug 2021
> > > 09:26:43 +0200:
> > >
> > > > Some devices use Micron NAND chips, which use on-die ECC. The hardcoded
> > > > setting of NAND_ECC_ENGINE_TYPE_SOFT makes them unusable, because the
> > > > software ECC on top of the hardware ECC produces errors for every read
> > > > and write access, not to mention that booting does not work, because
> > > > the boot loader uses the correct ECC when trying to load the kernel
> > > > and stops loading on severe ECC errors.
> > > > This patch requires the devices that currently work with the hard coded
> > > > setting to set the nand-ecc-mode property to soft in their device
> > > > tree.
> > > >
> > >
> > > Please add a Fixes: and Cc: stable tags, you will also need to send to
> > > [email protected] a different version of the patch for the kernel
> > > 5.4 IIUC.
> > >
> > > > Signed-off-by: Daniel Kestrel <[email protected]>
> > > > Tested-by: Aleksander Jan Bajkowski <[email protected]> # tested on BT Home Hub 5A
> > > > ---
> > > > drivers/mtd/nand/raw/xway_nand.c | 2 --
> > > > 1 file changed, 2 deletions(-)
> > > >
> > > > diff --git a/drivers/mtd/nand/raw/xway_nand.c b/drivers/mtd/nand/raw/xway_nand.c
> > > > index 26751976e502..0a4b0aa7dd4c 100644
> > > > --- a/drivers/mtd/nand/raw/xway_nand.c
> > > > +++ b/drivers/mtd/nand/raw/xway_nand.c
> > > > @@ -148,8 +148,6 @@ static void xway_write_buf(struct nand_chip *chip, const u_char *buf, int len)
> > > >
> > > > static int xway_attach_chip(struct nand_chip *chip)
> > > > {
> > > > - chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_SOFT;
> > > > -
> > > > if (chip->ecc.algo == NAND_ECC_ALGO_UNKNOWN)
> > > > chip->ecc.algo = NAND_ECC_ALGO_HAMMING;
> > >
> > > You also need to only set the Hamming algorithm when engine_type is
> > > TYPE_SOFT.
> > >
> > > Thanks,
> > > Miquèl
> >
> > I am really struggling with what to do. For one of the affected
> > devices, they created two device
> > trees, one for Micron and one for all others. Which obviously had no
> > effect due to the
> > hardcoded settings, which led me to Patch 2 and I thought, so be it.
> > But the process to figure
> > out if ones device has Micron Chips is essentially flashing an image
> > and if it does not work,
> > use the stock OEM recovery and try the other image.
> > However, since Micron is the only chip that is treated differently, I wonder
> > if your first proposal, even though it is hacky, is the better
> > approach to solve the issue
> > for the Micron devices not booting and throwing ECC errors. What do you think?
> > Follow up first patch or this one?
>
> I am not sure we understood each other, your patch is fine, but you
> need to do something like:
>
> static int xway_attach_chip(struct nand_chip *chip)
> {
> if (chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_SOFT &&
> chip->ecc.algo == NAND_ECC_ALGO_UNKNOWN)
> chip->ecc.algo = NAND_ECC_ALGO_HAMMING;
>
> In the DT you should not force any ECC engine (drop the nand-ecc-xxx
> properties) and let the core handle it. It will probably choose the
> most suitable engines for you.
>
> Thanks,
> Miquèl

thank you for your response.
If I remove the nand-ecc-xxx properties in the device tree, the device with
the Toshiba NAND chip is working. However, the device with the Micron
NAND fails with NO ECC functions supplied; hardware ECC not possible,
seems to be at line 5367 or equivalent.
https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_base.c#L5367

It looks like the micron nand driver supports on die only if its
specified int the
Device tree:
https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_micron.c#L511
The Micron NAND driver probably needs to set the ECC type to ON DIE if the
variable ondie contains the supported attribute?!

Any ideas? Second patch?

Thanks, Daniel.

2021-08-23 15:25:32

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting

Hi Kestrel,

Kestrel seventyfour <[email protected]> wrote on Mon, 23 Aug
2021 13:19:43 +0200:

> Hi Miquèl,
>
> Am Do., 19. Aug. 2021 um 10:03 Uhr schrieb Miquel Raynal
> <[email protected]>:
> >
> > Hello,
> >
> > Kestrel seventyfour <[email protected]> wrote on Thu, 19 Aug
> > 2021 09:21:42 +0200:
> >
> > > Hi Miquèl
> > >
> > > Am Mo., 16. Aug. 2021 um 09:31 Uhr schrieb Miquel Raynal
> > > <[email protected]>:
> > > >
> > > > Hi Daniel,
> > > >
> > > > Daniel Kestrel <[email protected]> wrote on Sun, 8 Aug 2021
> > > > 09:26:43 +0200:
> > > >
> > > > > Some devices use Micron NAND chips, which use on-die ECC. The hardcoded
> > > > > setting of NAND_ECC_ENGINE_TYPE_SOFT makes them unusable, because the
> > > > > software ECC on top of the hardware ECC produces errors for every read
> > > > > and write access, not to mention that booting does not work, because
> > > > > the boot loader uses the correct ECC when trying to load the kernel
> > > > > and stops loading on severe ECC errors.
> > > > > This patch requires the devices that currently work with the hard coded
> > > > > setting to set the nand-ecc-mode property to soft in their device
> > > > > tree.
> > > > >
> > > >
> > > > Please add a Fixes: and Cc: stable tags, you will also need to send to
> > > > [email protected] a different version of the patch for the kernel
> > > > 5.4 IIUC.
> > > >
> > > > > Signed-off-by: Daniel Kestrel <[email protected]>
> > > > > Tested-by: Aleksander Jan Bajkowski <[email protected]> # tested on BT Home Hub 5A
> > > > > ---
> > > > > drivers/mtd/nand/raw/xway_nand.c | 2 --
> > > > > 1 file changed, 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/mtd/nand/raw/xway_nand.c b/drivers/mtd/nand/raw/xway_nand.c
> > > > > index 26751976e502..0a4b0aa7dd4c 100644
> > > > > --- a/drivers/mtd/nand/raw/xway_nand.c
> > > > > +++ b/drivers/mtd/nand/raw/xway_nand.c
> > > > > @@ -148,8 +148,6 @@ static void xway_write_buf(struct nand_chip *chip, const u_char *buf, int len)
> > > > >
> > > > > static int xway_attach_chip(struct nand_chip *chip)
> > > > > {
> > > > > - chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_SOFT;
> > > > > -
> > > > > if (chip->ecc.algo == NAND_ECC_ALGO_UNKNOWN)
> > > > > chip->ecc.algo = NAND_ECC_ALGO_HAMMING;
> > > >
> > > > You also need to only set the Hamming algorithm when engine_type is
> > > > TYPE_SOFT.
> > > >
> > > > Thanks,
> > > > Miquèl
> > >
> > > I am really struggling with what to do. For one of the affected
> > > devices, they created two device
> > > trees, one for Micron and one for all others. Which obviously had no
> > > effect due to the
> > > hardcoded settings, which led me to Patch 2 and I thought, so be it.
> > > But the process to figure
> > > out if ones device has Micron Chips is essentially flashing an image
> > > and if it does not work,
> > > use the stock OEM recovery and try the other image.
> > > However, since Micron is the only chip that is treated differently, I wonder
> > > if your first proposal, even though it is hacky, is the better
> > > approach to solve the issue
> > > for the Micron devices not booting and throwing ECC errors. What do you think?
> > > Follow up first patch or this one?
> >
> > I am not sure we understood each other, your patch is fine, but you
> > need to do something like:
> >
> > static int xway_attach_chip(struct nand_chip *chip)
> > {
> > if (chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_SOFT &&
> > chip->ecc.algo == NAND_ECC_ALGO_UNKNOWN)
> > chip->ecc.algo = NAND_ECC_ALGO_HAMMING;
> >
> > In the DT you should not force any ECC engine (drop the nand-ecc-xxx
> > properties) and let the core handle it. It will probably choose the
> > most suitable engines for you.
> >
> > Thanks,
> > Miquèl
>
> thank you for your response.
> If I remove the nand-ecc-xxx properties in the device tree, the device with
> the Toshiba NAND chip is working. However, the device with the Micron
> NAND fails with NO ECC functions supplied; hardware ECC not possible,
> seems to be at line 5367 or equivalent.
> https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_base.c#L5367
>
> It looks like the micron nand driver supports on die only if its
> specified int the
> Device tree:
> https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_micron.c#L511
> The Micron NAND driver probably needs to set the ECC type to ON DIE if the
> variable ondie contains the supported attribute?!

You're right but I don't see any easy upstream-able solution here.
Changing the behavior in the Xway driver would certainly break users,
changing the behavior in the Micron driver would certainly break even
more users. The root cause being an absence of proper description (the
integration changed). Honestly I feel stuck, maybe you can try to
register your device, if it fails, change the integration in the driver
(to an ondie ecc engine) then retry?

Thanks,
Miquèl

2021-08-24 07:17:15

by Daniel Kestrel

[permalink] [raw]
Subject: Re: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting

Hi Miquèl,

Am Mo., 23. Aug. 2021 um 17:24 Uhr schrieb Miquel Raynal
<[email protected]>:
>
> Hi Kestrel,
>
> Kestrel seventyfour <[email protected]> wrote on Mon, 23 Aug
> 2021 13:19:43 +0200:
>
> > Hi Miquèl,
> >
> > Am Do., 19. Aug. 2021 um 10:03 Uhr schrieb Miquel Raynal
> > <[email protected]>:
...
> >
> > thank you for your response.
> > If I remove the nand-ecc-xxx properties in the device tree, the device with
> > the Toshiba NAND chip is working. However, the device with the Micron
> > NAND fails with NO ECC functions supplied; hardware ECC not possible,
> > seems to be at line 5367 or equivalent.
> > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_base.c#L5367
> >
> > It looks like the micron nand driver supports on die only if its
> > specified int the
> > Device tree:
> > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_micron.c#L511
> > The Micron NAND driver probably needs to set the ECC type to ON DIE if the
> > variable ondie contains the supported attribute?!
>
> You're right but I don't see any easy upstream-able solution here.
> Changing the behavior in the Xway driver would certainly break users,
> changing the behavior in the Micron driver would certainly break even
> more users. The root cause being an absence of proper description (the
> integration changed). Honestly I feel stuck, maybe you can try to
> register your device, if it fails, change the integration in the driver
> (to an ondie ecc engine) then retry?
>
> Thanks,
> Miquèl

Do you think adding something like below at the following location
https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/xway_nand.c#L223
would be upstreamable (with or without device tree property?)?

err = nand_scan(&data->chip, 1);
if (err /* && of_property_read_bool(np, "lantiq,retry-on-die") */) {
data->chip.ecc.engine_type = NAND_ECC_ENGINE_TYPE_ON_DIE;
err = nand_scan(&data->chip, 1);
if (err) return err;
}

It still throws the kernel warning on first try, but the second try then works.

Thanks Daniel.

2021-08-24 17:27:13

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting

Hello,

Kestrel seventyfour <[email protected]> wrote on Tue, 24 Aug
2021 09:15:49 +0200:

> Hi Miquèl,
>
> Am Mo., 23. Aug. 2021 um 17:24 Uhr schrieb Miquel Raynal
> <[email protected]>:
> >
> > Hi Kestrel,
> >
> > Kestrel seventyfour <[email protected]> wrote on Mon, 23 Aug
> > 2021 13:19:43 +0200:
> >
> > > Hi Miquèl,
> > >
> > > Am Do., 19. Aug. 2021 um 10:03 Uhr schrieb Miquel Raynal
> > > <[email protected]>:
> ...
> > >
> > > thank you for your response.
> > > If I remove the nand-ecc-xxx properties in the device tree, the device with
> > > the Toshiba NAND chip is working. However, the device with the Micron
> > > NAND fails with NO ECC functions supplied; hardware ECC not possible,
> > > seems to be at line 5367 or equivalent.
> > > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_base.c#L5367
> > >
> > > It looks like the micron nand driver supports on die only if its
> > > specified int the
> > > Device tree:
> > > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_micron.c#L511
> > > The Micron NAND driver probably needs to set the ECC type to ON DIE if the
> > > variable ondie contains the supported attribute?!
> >
> > You're right but I don't see any easy upstream-able solution here.
> > Changing the behavior in the Xway driver would certainly break users,
> > changing the behavior in the Micron driver would certainly break even
> > more users. The root cause being an absence of proper description (the
> > integration changed). Honestly I feel stuck, maybe you can try to
> > register your device, if it fails, change the integration in the driver
> > (to an ondie ecc engine) then retry?
> >
> > Thanks,
> > Miquèl
>
> Do you think adding something like below at the following location
> https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/xway_nand.c#L223
> would be upstreamable (with or without device tree property?)?
>
> err = nand_scan(&data->chip, 1);
> if (err /* && of_property_read_bool(np, "lantiq,retry-on-die") */) {
> data->chip.ecc.engine_type = NAND_ECC_ENGINE_TYPE_ON_DIE;
> err = nand_scan(&data->chip, 1);
> if (err) return err;
> }
>
> It still throws the kernel warning on first try, but the second try then works.

Can you please remind me what is xway/lantiq/your setup/how public it
is/who's using this driver?

Thanks,
Miquèl

2021-08-25 08:52:04

by Daniel Kestrel

[permalink] [raw]
Subject: Re: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting

Hi Miquèl,

Am Di., 24. Aug. 2021 um 19:22 Uhr schrieb Miquel Raynal
<[email protected]>:
>
> Hello,
>
> Kestrel seventyfour <[email protected]> wrote on Tue, 24 Aug
> 2021 09:15:49 +0200:
>
> > Hi Miquèl,
> >
> > Am Mo., 23. Aug. 2021 um 17:24 Uhr schrieb Miquel Raynal
> > <[email protected]>:
> > >
> > > Hi Kestrel,
> > >
> > > Kestrel seventyfour <[email protected]> wrote on Mon, 23 Aug
> > > 2021 13:19:43 +0200:
> > >
> > > > Hi Miquèl,
> > > >
> > > > Am Do., 19. Aug. 2021 um 10:03 Uhr schrieb Miquel Raynal
> > > > <[email protected]>:
> > ...
> > > >
> > > > thank you for your response.
> > > > If I remove the nand-ecc-xxx properties in the device tree, the device with
> > > > the Toshiba NAND chip is working. However, the device with the Micron
> > > > NAND fails with NO ECC functions supplied; hardware ECC not possible,
> > > > seems to be at line 5367 or equivalent.
> > > > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_base.c#L5367
> > > >
> > > > It looks like the micron nand driver supports on die only if its
> > > > specified int the
> > > > Device tree:
> > > > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_micron.c#L511
> > > > The Micron NAND driver probably needs to set the ECC type to ON DIE if the
> > > > variable ondie contains the supported attribute?!
> > >
> > > You're right but I don't see any easy upstream-able solution here.
> > > Changing the behavior in the Xway driver would certainly break users,
> > > changing the behavior in the Micron driver would certainly break even
> > > more users. The root cause being an absence of proper description (the
> > > integration changed). Honestly I feel stuck, maybe you can try to
> > > register your device, if it fails, change the integration in the driver
> > > (to an ondie ecc engine) then retry?
> > >
> > > Thanks,
> > > Miquèl
> >
> > Do you think adding something like below at the following location
> > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/xway_nand.c#L223
> > would be upstreamable (with or without device tree property?)?
> >
> > err = nand_scan(&data->chip, 1);
> > if (err /* && of_property_read_bool(np, "lantiq,retry-on-die") */) {
> > data->chip.ecc.engine_type = NAND_ECC_ENGINE_TYPE_ON_DIE;
> > err = nand_scan(&data->chip, 1);
> > if (err) return err;
> > }
> >
> > It still throws the kernel warning on first try, but the second try then works.
>
> Can you please remind me what is xway/lantiq/your setup/how public it
> is/who's using this driver?
>
> Thanks,
> Miquèl

Its for Openwrt, I would like to add support for 3 more devices
AVM fritzbox 3490/5490 and 7490. They all have varying NAND chips.
I have initially created a PR to have my initial patch tested:
https://github.com/openwrt/openwrt/pull/4426
There is already one device supported which has two DTBs one for
Micron and one for non Micron (3370), but its not very straight forward.
Without having this issue solved, flashing those devices would be
possibly having issues depending on NAND chip or the awkward
workaround of flashing one image and if it does not boot, boot the
other one. Without self soldered serial console, it would not very
easy to figure out the NAND manufacturer.
The AVM stock firmware is old kernel and does not use device
tree for NAND, they just query all possible manufacturers and set
up NAND based on manufacturer query.

Thanks, Daniel.

2021-08-25 08:53:46

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting


Kestrel seventyfour <[email protected]> wrote on Wed, 25 Aug
2021 10:47:40 +0200:

> Hi Miquèl,
>
> Am Di., 24. Aug. 2021 um 19:22 Uhr schrieb Miquel Raynal
> <[email protected]>:
> >
> > Hello,
> >
> > Kestrel seventyfour <[email protected]> wrote on Tue, 24 Aug
> > 2021 09:15:49 +0200:
> >
> > > Hi Miquèl,
> > >
> > > Am Mo., 23. Aug. 2021 um 17:24 Uhr schrieb Miquel Raynal
> > > <[email protected]>:
> > > >
> > > > Hi Kestrel,
> > > >
> > > > Kestrel seventyfour <[email protected]> wrote on Mon, 23 Aug
> > > > 2021 13:19:43 +0200:
> > > >
> > > > > Hi Miquèl,
> > > > >
> > > > > Am Do., 19. Aug. 2021 um 10:03 Uhr schrieb Miquel Raynal
> > > > > <[email protected]>:
> > > ...
> > > > >
> > > > > thank you for your response.
> > > > > If I remove the nand-ecc-xxx properties in the device tree, the device with
> > > > > the Toshiba NAND chip is working. However, the device with the Micron
> > > > > NAND fails with NO ECC functions supplied; hardware ECC not possible,
> > > > > seems to be at line 5367 or equivalent.
> > > > > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_base.c#L5367
> > > > >
> > > > > It looks like the micron nand driver supports on die only if its
> > > > > specified int the
> > > > > Device tree:
> > > > > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_micron.c#L511
> > > > > The Micron NAND driver probably needs to set the ECC type to ON DIE if the
> > > > > variable ondie contains the supported attribute?!
> > > >
> > > > You're right but I don't see any easy upstream-able solution here.
> > > > Changing the behavior in the Xway driver would certainly break users,
> > > > changing the behavior in the Micron driver would certainly break even
> > > > more users. The root cause being an absence of proper description (the
> > > > integration changed). Honestly I feel stuck, maybe you can try to
> > > > register your device, if it fails, change the integration in the driver
> > > > (to an ondie ecc engine) then retry?
> > > >
> > > > Thanks,
> > > > Miquèl
> > >
> > > Do you think adding something like below at the following location
> > > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/xway_nand.c#L223
> > > would be upstreamable (with or without device tree property?)?
> > >
> > > err = nand_scan(&data->chip, 1);
> > > if (err /* && of_property_read_bool(np, "lantiq,retry-on-die") */) {
> > > data->chip.ecc.engine_type = NAND_ECC_ENGINE_TYPE_ON_DIE;
> > > err = nand_scan(&data->chip, 1);
> > > if (err) return err;
> > > }
> > >
> > > It still throws the kernel warning on first try, but the second try then works.
> >
> > Can you please remind me what is xway/lantiq/your setup/how public it
> > is/who's using this driver?
> >
> > Thanks,
> > Miquèl
>
> Its for Openwrt, I would like to add support for 3 more devices
> AVM fritzbox 3490/5490 and 7490. They all have varying NAND chips.
> I have initially created a PR to have my initial patch tested:
> https://github.com/openwrt/openwrt/pull/4426
> There is already one device supported which has two DTBs one for
> Micron and one for non Micron (3370), but its not very straight forward.
> Without having this issue solved, flashing those devices would be
> possibly having issues depending on NAND chip or the awkward
> workaround of flashing one image and if it does not boot, boot the
> other one. Without self soldered serial console, it would not very
> easy to figure out the NAND manufacturer.
> The AVM stock firmware is old kernel and does not use device
> tree for NAND, they just query all possible manufacturers and set
> up NAND based on manufacturer query.

But in this case can't you check the 'root' compatible against certain
values and and some kind of quirk in the ->attach() hook to update the
ECC engine to the right one?

Thanks,
Miquèl

2021-08-25 14:10:02

by Daniel Kestrel

[permalink] [raw]
Subject: Re: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting

Hi Miquèl,

Am Mi., 25. Aug. 2021 um 10:51 Uhr schrieb Miquel Raynal
<[email protected]>:
>
>
> Kestrel seventyfour <[email protected]> wrote on Wed, 25 Aug
> 2021 10:47:40 +0200:
>
> > Hi Miquèl,
> >
> > Am Di., 24. Aug. 2021 um 19:22 Uhr schrieb Miquel Raynal
> > <[email protected]>:
> > >
> > > Hello,
> > >
> > > Kestrel seventyfour <[email protected]> wrote on Tue, 24 Aug
> > > 2021 09:15:49 +0200:
> > >
> > > > Hi Miquèl,
> > > >
> > > > Am Mo., 23. Aug. 2021 um 17:24 Uhr schrieb Miquel Raynal
> > > > <[email protected]>:
> > > > >
> > > > > Hi Kestrel,
> > > > >
> > > > > Kestrel seventyfour <[email protected]> wrote on Mon, 23 Aug
> > > > > 2021 13:19:43 +0200:
> > > > >
> > > > > > Hi Miquèl,
> > > > > >
> > > > > > Am Do., 19. Aug. 2021 um 10:03 Uhr schrieb Miquel Raynal
> > > > > > <[email protected]>:
> > > > ...
> > > > > >
> > > > > > thank you for your response.
> > > > > > If I remove the nand-ecc-xxx properties in the device tree, the device with
> > > > > > the Toshiba NAND chip is working. However, the device with the Micron
> > > > > > NAND fails with NO ECC functions supplied; hardware ECC not possible,
> > > > > > seems to be at line 5367 or equivalent.
> > > > > > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_base.c#L5367
> > > > > >
> > > > > > It looks like the micron nand driver supports on die only if its
> > > > > > specified int the
> > > > > > Device tree:
> > > > > > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_micron.c#L511
> > > > > > The Micron NAND driver probably needs to set the ECC type to ON DIE if the
> > > > > > variable ondie contains the supported attribute?!
> > > > >
> > > > > You're right but I don't see any easy upstream-able solution here.
> > > > > Changing the behavior in the Xway driver would certainly break users,
> > > > > changing the behavior in the Micron driver would certainly break even
> > > > > more users. The root cause being an absence of proper description (the
> > > > > integration changed). Honestly I feel stuck, maybe you can try to
> > > > > register your device, if it fails, change the integration in the driver
> > > > > (to an ondie ecc engine) then retry?
> > > > >
> > > > > Thanks,
> > > > > Miquèl
> > > >
> > > > Do you think adding something like below at the following location
> > > > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/xway_nand.c#L223
> > > > would be upstreamable (with or without device tree property?)?
> > > >
> > > > err = nand_scan(&data->chip, 1);
> > > > if (err /* && of_property_read_bool(np, "lantiq,retry-on-die") */) {
> > > > data->chip.ecc.engine_type = NAND_ECC_ENGINE_TYPE_ON_DIE;
> > > > err = nand_scan(&data->chip, 1);
> > > > if (err) return err;
> > > > }
> > > >
> > > > It still throws the kernel warning on first try, but the second try then works.
> > >
> > > Can you please remind me what is xway/lantiq/your setup/how public it
> > > is/who's using this driver?
> > >
> > > Thanks,
> > > Miquèl
> >
> > Its for Openwrt, I would like to add support for 3 more devices
> > AVM fritzbox 3490/5490 and 7490. They all have varying NAND chips.
> > I have initially created a PR to have my initial patch tested:
> > https://github.com/openwrt/openwrt/pull/4426
> > There is already one device supported which has two DTBs one for
> > Micron and one for non Micron (3370), but its not very straight forward.
> > Without having this issue solved, flashing those devices would be
> > possibly having issues depending on NAND chip or the awkward
> > workaround of flashing one image and if it does not boot, boot the
> > other one. Without self soldered serial console, it would not very
> > easy to figure out the NAND manufacturer.
> > The AVM stock firmware is old kernel and does not use device
> > tree for NAND, they just query all possible manufacturers and set
> > up NAND based on manufacturer query.
>
> But in this case can't you check the 'root' compatible against certain
> values and and some kind of quirk in the ->attach() hook to update the
> ECC engine to the right one?
>
> Thanks,
> Miquèl

maybe I wrote the wrong thing, but what do you mean with root compatible.
The controller is always NAND xway and I think there are 4 different
NAND chip types. But the stock firmware just queries the NAND
manufacturer name, not a date or type/model of the devices. E.g. 7490 can
have Micron or Macronix NAND, but querying the Micron id or Macronix
id requires a change to the xway driver?
Do you mean something like in the first patch do query the manufacturer
but just for all the 4?

Thanks, Daniel.

2021-08-25 15:37:31

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting

> > > > > > > thank you for your response.
> > > > > > > If I remove the nand-ecc-xxx properties in the device tree, the device with
> > > > > > > the Toshiba NAND chip is working. However, the device with the Micron
> > > > > > > NAND fails with NO ECC functions supplied; hardware ECC not possible,
> > > > > > > seems to be at line 5367 or equivalent.
> > > > > > > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_base.c#L5367
> > > > > > >
> > > > > > > It looks like the micron nand driver supports on die only if its
> > > > > > > specified int the
> > > > > > > Device tree:
> > > > > > > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_micron.c#L511
> > > > > > > The Micron NAND driver probably needs to set the ECC type to ON DIE if the
> > > > > > > variable ondie contains the supported attribute?!
> > > > > >
> > > > > > You're right but I don't see any easy upstream-able solution here.
> > > > > > Changing the behavior in the Xway driver would certainly break users,
> > > > > > changing the behavior in the Micron driver would certainly break even
> > > > > > more users. The root cause being an absence of proper description (the
> > > > > > integration changed). Honestly I feel stuck, maybe you can try to
> > > > > > register your device, if it fails, change the integration in the driver
> > > > > > (to an ondie ecc engine) then retry?
> > > > > >
> > > > > > Thanks,
> > > > > > Miquèl
> > > > >
> > > > > Do you think adding something like below at the following location
> > > > > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/xway_nand.c#L223
> > > > > would be upstreamable (with or without device tree property?)?
> > > > >
> > > > > err = nand_scan(&data->chip, 1);
> > > > > if (err /* && of_property_read_bool(np, "lantiq,retry-on-die") */) {
> > > > > data->chip.ecc.engine_type = NAND_ECC_ENGINE_TYPE_ON_DIE;
> > > > > err = nand_scan(&data->chip, 1);
> > > > > if (err) return err;
> > > > > }
> > > > >
> > > > > It still throws the kernel warning on first try, but the second try then works.
> > > >
> > > > Can you please remind me what is xway/lantiq/your setup/how public it
> > > > is/who's using this driver?
> > > >
> > > > Thanks,
> > > > Miquèl
> > >
> > > Its for Openwrt, I would like to add support for 3 more devices
> > > AVM fritzbox 3490/5490 and 7490. They all have varying NAND chips.
> > > I have initially created a PR to have my initial patch tested:
> > > https://github.com/openwrt/openwrt/pull/4426
> > > There is already one device supported which has two DTBs one for
> > > Micron and one for non Micron (3370), but its not very straight forward.
> > > Without having this issue solved, flashing those devices would be
> > > possibly having issues depending on NAND chip or the awkward
> > > workaround of flashing one image and if it does not boot, boot the
> > > other one. Without self soldered serial console, it would not very
> > > easy to figure out the NAND manufacturer.
> > > The AVM stock firmware is old kernel and does not use device
> > > tree for NAND, they just query all possible manufacturers and set
> > > up NAND based on manufacturer query.
> >
> > But in this case can't you check the 'root' compatible against certain
> > values and and some kind of quirk in the ->attach() hook to update the
> > ECC engine to the right one?
> >
> > Thanks,
> > Miquèl
>
> maybe I wrote the wrong thing, but what do you mean with root compatible.
> The controller is always NAND xway and I think there are 4 different
> NAND chip types. But the stock firmware just queries the NAND
> manufacturer name, not a date or type/model of the devices. E.g. 7490 can
> have Micron or Macronix NAND, but querying the Micron id or Macronix
> id requires a change to the xway driver?
> Do you mean something like in the first patch do query the manufacturer
> but just for all the 4?
>
> Thanks, Daniel.

You were talking about new boards, if, on each of these boards, you
know what is the NAND chip, you could retrieve the root compatible (by
root I mean the one at the root of the tree / { } in the DT) and
depending on it change the ECC engine that must be used from the driver
itself.


Thanks,
Miquèl

2021-08-26 05:29:35

by Daniel Kestrel

[permalink] [raw]
Subject: Re: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting

Hi Miquèl,

Am Mi., 25. Aug. 2021 um 17:31 Uhr schrieb Miquel Raynal
<[email protected]>:
>
> > > > > > > > thank you for your response.
> > > > > > > > If I remove the nand-ecc-xxx properties in the device tree, the device with
> > > > > > > > the Toshiba NAND chip is working. However, the device with the Micron
> > > > > > > > NAND fails with NO ECC functions supplied; hardware ECC not possible,
> > > > > > > > seems to be at line 5367 or equivalent.
> > > > > > > > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_base.c#L5367
> > > > > > > >
> > > > > > > > It looks like the micron nand driver supports on die only if its
> > > > > > > > specified int the
> > > > > > > > Device tree:
> > > > > > > > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_micron.c#L511
> > > > > > > > The Micron NAND driver probably needs to set the ECC type to ON DIE if the
> > > > > > > > variable ondie contains the supported attribute?!
> > > > > > >
> > > > > > > You're right but I don't see any easy upstream-able solution here.
> > > > > > > Changing the behavior in the Xway driver would certainly break users,
> > > > > > > changing the behavior in the Micron driver would certainly break even
> > > > > > > more users. The root cause being an absence of proper description (the
> > > > > > > integration changed). Honestly I feel stuck, maybe you can try to
> > > > > > > register your device, if it fails, change the integration in the driver
> > > > > > > (to an ondie ecc engine) then retry?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Miquèl
> > > > > >
> > > > > > Do you think adding something like below at the following location
> > > > > > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/xway_nand.c#L223
> > > > > > would be upstreamable (with or without device tree property?)?
> > > > > >
> > > > > > err = nand_scan(&data->chip, 1);
> > > > > > if (err /* && of_property_read_bool(np, "lantiq,retry-on-die") */) {
> > > > > > data->chip.ecc.engine_type = NAND_ECC_ENGINE_TYPE_ON_DIE;
> > > > > > err = nand_scan(&data->chip, 1);
> > > > > > if (err) return err;
> > > > > > }
> > > > > >
> > > > > > It still throws the kernel warning on first try, but the second try then works.
> > > > >
> > > > > Can you please remind me what is xway/lantiq/your setup/how public it
> > > > > is/who's using this driver?
> > > > >
> > > > > Thanks,
> > > > > Miquèl
> > > >
> > > > Its for Openwrt, I would like to add support for 3 more devices
> > > > AVM fritzbox 3490/5490 and 7490. They all have varying NAND chips.
> > > > I have initially created a PR to have my initial patch tested:
> > > > https://github.com/openwrt/openwrt/pull/4426
> > > > There is already one device supported which has two DTBs one for
> > > > Micron and one for non Micron (3370), but its not very straight forward.
> > > > Without having this issue solved, flashing those devices would be
> > > > possibly having issues depending on NAND chip or the awkward
> > > > workaround of flashing one image and if it does not boot, boot the
> > > > other one. Without self soldered serial console, it would not very
> > > > easy to figure out the NAND manufacturer.
> > > > The AVM stock firmware is old kernel and does not use device
> > > > tree for NAND, they just query all possible manufacturers and set
> > > > up NAND based on manufacturer query.
> > >
> > > But in this case can't you check the 'root' compatible against certain
> > > values and and some kind of quirk in the ->attach() hook to update the
> > > ECC engine to the right one?
> > >
> > > Thanks,
> > > Miquèl
> >
> > maybe I wrote the wrong thing, but what do you mean with root compatible.
> > The controller is always NAND xway and I think there are 4 different
> > NAND chip types. But the stock firmware just queries the NAND
> > manufacturer name, not a date or type/model of the devices. E.g. 7490 can
> > have Micron or Macronix NAND, but querying the Micron id or Macronix
> > id requires a change to the xway driver?
> > Do you mean something like in the first patch do query the manufacturer
> > but just for all the 4?
> >
> > Thanks, Daniel.
>
> You were talking about new boards, if, on each of these boards, you
> know what is the NAND chip, you could retrieve the root compatible (by
> root I mean the one at the root of the tree / { } in the DT) and
> depending on it change the ECC engine that must be used from the driver
> itself.
>
>
> Thanks,
> Miquèl

The boards are not supported yet in OpenWrt and there is the
existing fritzbox 3370. Fritzbox 7490 first manufacture date was 2013.
As I stated the 7490 can have either Micron or Macronix NAND with
no indicator. I was just hoping there is a dynamic way to support
both NAND chips without having to build two different images with two
different DTBs. For this case, DTB is too static.

Thanks, Daniel.

2021-09-17 06:22:39

by Jan Hoffmann

[permalink] [raw]
Subject: Re: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting

Hello,

Configuration of the ECC engine type using device tree has actually
worked before. I am using OpenWrt on a Fritzbox 7362 SL, which has a
Micron 29F1G08ABADA flash chip. The bootloader of the device uses on-die
ECC, so that has to be used for Linux as well. It is configured in DTS
using "nand-ecc-mode = "on-die";". This worked fine with kernel 5.4.
However, after switching to kernel 5.10 it is ignored and software ECC
is used instead.

If I understand this correctly, the situation is as follows:

Originally, xway-nand did set defaults for ECC mode and algorithm, but
different values could be configured using device tree.

Commit d7157ff49a5b ("mtd: rawnand: Use the ECC framework user input
parsing bits") broke these default values, as the ECC algorithm is now
unconditionally set from the user configuration in rawnand_dt_init.
Previously, the default value was only overwritten if the device tree
actually contained a value.

This is fixed in d525914b5bd8 ("mtd: rawnand: xway: Move the ECC
initialization to ->attach_chip()"). However, this makes it impossible
to configure the ECC engine type in the device tree, as it is now
overwritten by the default value in xway_attach_chip.

I am not sure if this patch is the best approach for fixing this, as it
would again cause breakage for anyone who relies on the existing
default value. And this kind of breakage seems to have been the reason
for moving the default values to attach_chip in the first place (see
https://lore.kernel.org/lkml/20201105084939.72ea6bfd@xps13/ ).

As similar changes were applied to other NAND drivers, the same issue
probably also exists there. Maybe it makes sense to add a proper fix
for all of them?

Thanks,
Jan

2021-09-18 01:08:05

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting

Hi Jan,

[email protected] wrote on Thu, 16 Sep 2021 21:38:26 +0200:

> Hello,
>
> Configuration of the ECC engine type using device tree has actually
> worked before. I am using OpenWrt on a Fritzbox 7362 SL, which has a
> Micron 29F1G08ABADA flash chip. The bootloader of the device uses on-die
> ECC, so that has to be used for Linux as well. It is configured in DTS
> using "nand-ecc-mode = "on-die";". This worked fine with kernel 5.4.
> However, after switching to kernel 5.10 it is ignored and software ECC
> is used instead.
>
> If I understand this correctly, the situation is as follows:
>
> Originally, xway-nand did set defaults for ECC mode and algorithm, but
> different values could be configured using device tree.
>
> Commit d7157ff49a5b ("mtd: rawnand: Use the ECC framework user input
> parsing bits") broke these default values, as the ECC algorithm is now
> unconditionally set from the user configuration in rawnand_dt_init.
> Previously, the default value was only overwritten if the device tree
> actually contained a value.
>
> This is fixed in d525914b5bd8 ("mtd: rawnand: xway: Move the ECC
> initialization to ->attach_chip()"). However, this makes it impossible
> to configure the ECC engine type in the device tree, as it is now
> overwritten by the default value in xway_attach_chip.
>
> I am not sure if this patch is the best approach for fixing this, as it
> would again cause breakage for anyone who relies on the existing
> default value. And this kind of breakage seems to have been the reason
> for moving the default values to attach_chip in the first place (see
> https://lore.kernel.org/lkml/20201105084939.72ea6bfd@xps13/ ).
>
> As similar changes were applied to other NAND drivers, the same issue
> probably also exists there. Maybe it makes sense to add a proper fix
> for all of them?

I am not sure to understand your message as answer to this thread.
There are two problems here:
1/ The DT values not being taken into account
2/ Kestrel's issue with two different integrations with no way to
distinguish between them.

1/ Has already been fixed (at least that is what I think)
2/ Cannot easily be fixed and I don't think there is anything we can do
besides the manufacturer "fixing" the board description.

Am I missing something?

Thanks,
Miquèl

2021-09-18 07:06:07

by Jan Hoffmann

[permalink] [raw]
Subject: Re: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting

Hi Miquèl,

> I am not sure to understand your message as answer to this thread.
> There are two problems here:
> 1/ The DT values not being taken into account

This is the issue I'm referring to.

> 1/ Has already been fixed (at least that is what I think)

This is not fixed in kernel 5.10. And unless I am missing something
there also doesn't seem be a fix in more recent kernels.

I am aware of commit 33d974e76e21 ("mtd: rawnand: xway: Do not force a
particular software ECC engine"), but that only fixes the ECC
algorithm. The ECC engine type is still hard-coded in xway_attach_chip.

Thanks,
Jan

2021-09-18 07:54:59

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting

Hi Jan,

[email protected] wrote on Fri, 17 Sep 2021 19:45:33 +0200:

> Hi Miquèl,
>
> > I am not sure to understand your message as answer to this thread.
> > There are two problems here:
> > 1/ The DT values not being taken into account
>
> This is the issue I'm referring to.
>
> > 1/ Has already been fixed (at least that is what I think)
>
> This is not fixed in kernel 5.10. And unless I am missing something
> there also doesn't seem be a fix in more recent kernels.
>
> I am aware of commit 33d974e76e21 ("mtd: rawnand: xway: Do not force a
> particular software ECC engine"), but that only fixes the ECC
> algorithm. The ECC engine type is still hard-coded in xway_attach_chip.

Yes this was my understanding, that only software ECC engine was
supported. The mainline driver shows absolutely no signs of hardware
ECC engine support.

Perhaps however you mean that on-die ECC engine are not supported
anymore because of the engine_type being set in attach_chip()? If yes
then indeed there is something to do, perhaps checking if an engine has
been already set is enough? You can try something like:

if (engine_type == unknown)
engine_type = soft;

Thanks,
Miquèl

2021-09-19 11:32:34

by Jan Hoffmann

[permalink] [raw]
Subject: Re: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting

Hi Miquèl,

> Yes this was my understanding, that only software ECC engine was
> supported. The mainline driver shows absolutely no signs of hardware
> ECC engine support.
>
> Perhaps however you mean that on-die ECC engine are not supported
> anymore because of the engine_type being set in attach_chip()?

Yes, this is exactly the issue.

> If yes then indeed there is something to do, perhaps checking if an
> engine has been already set is enough? You can try something like:
>
> if (engine_type == unknown)
> engine_type = soft;

Checking for NAND_ECC_ENGINE_TYPE_INVALID doesn't work, as the engine
type is already set to NAND_ECC_ENGINE_TYPE_ON_HOST by rawnand_dt_init.
The code there seems to expect that chip->ecc.engine_type contains the
default value, which is no longer the case after commit 525914b5bd8
("mtd: rawnand: xway: Move the ECC initialization to ->attach_chip()").

The following in attach_chip works:

if (chip->ecc.engine_type == NAND_ECC_ENGINE_TYPE_ON_HOST)
chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_SOFT;

However, this will also silently use the software ECC engine if anyone
actually configures the on-host hardware ECC engine in the device tree
(which is of course unsupported for xway).

Thanks,
Jan

2021-09-27 16:34:17

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting

Hi Jan,

Sorry for the delay, I had to rethink about the problem first.

[email protected] wrote on Sat, 18 Sep 2021 23:26:48 +0200:

> Hi Miquèl,
>
> > Yes this was my understanding, that only software ECC engine was
> > supported. The mainline driver shows absolutely no signs of hardware
> > ECC engine support.
> >
> > Perhaps however you mean that on-die ECC engine are not supported
> > anymore because of the engine_type being set in attach_chip()?
>
> Yes, this is exactly the issue.
>
> > If yes then indeed there is something to do, perhaps checking if an
> > engine has been already set is enough? You can try something like:
> >
> > if (engine_type == unknown)
> > engine_type = soft;
>
> Checking for NAND_ECC_ENGINE_TYPE_INVALID doesn't work, as the engine
> type is already set to NAND_ECC_ENGINE_TYPE_ON_HOST by rawnand_dt_init.
> The code there seems to expect that chip->ecc.engine_type contains the
> default value, which is no longer the case after commit 525914b5bd8
> ("mtd: rawnand: xway: Move the ECC initialization to ->attach_chip()").

This is indeed an issue and should be fixed. However the solution is I
think already available in the raw NAND core (but I had to dig into the
code again to remember how I wanted to handle this).

> The following in attach_chip works:
>
> if (chip->ecc.engine_type == NAND_ECC_ENGINE_TYPE_ON_HOST)
> chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_SOFT;

If you look at the rawnand_dt_init() function [1] the logic is:

Is the user configuration (DT) valid ? if yes take it.
Is the NAND controller driver setting a default mode? if yes take it.
Otherwise take the raw NAND subsystem default: ON_HOST.

This logic happens very soon in the init process so the default mode
for the driver should be provided before the nand_scan() call.

So what I propose is:
- in the driver probe: set the default to software ECC
- in the attach() hook: drop the line setting the engine_type to SOFT.

Please check the below diff and tell me if it works for you. I'll then
propose a wider series fixing the other drivers as well.

> However, this will also silently use the software ECC engine if anyone
> actually configures the on-host hardware ECC engine in the device tree
> (which is of course unsupported for xway).
>
> Thanks,
> Jan


[1] https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_base.c#L5328

Thanks,
Miquèl

---
diff --git a/drivers/mtd/nand/raw/xway_nand.c b/drivers/mtd/nand/raw/xway_nand.c
index 26751976e502..d9e167dbb717 100644
--- a/drivers/mtd/nand/raw/xway_nand.c
+++ b/drivers/mtd/nand/raw/xway_nand.c
@@ -148,9 +148,8 @@ static void xway_write_buf(struct nand_chip *chip, const u_char *buf, int len)

static int xway_attach_chip(struct nand_chip *chip)
{
- chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_SOFT;
-
- if (chip->ecc.algo == NAND_ECC_ALGO_UNKNOWN)
+ if (chip->ecc.engine_type == NAND_ECC_ENGINE_TYPE_SOFT &&
+ chip->ecc.algo == NAND_ECC_ALGO_UNKNOWN)
chip->ecc.algo = NAND_ECC_ALGO_HAMMING;

return 0;
@@ -219,6 +218,14 @@ static int xway_nand_probe(struct platform_device *pdev)
| NAND_CON_SE_P | NAND_CON_WP_P | NAND_CON_PRE_P
| cs_flag, EBU_NAND_CON);

+ /*
+ * This controller has no hardware ECC engine and the driver assumes
+ * that the default ECC engine should be TYPE_SOFT. Set ->engine_type
+ * before the NAND registration in order to provide a default value.
+ */
+ data->chip.ecc.engine_type = NAND_ECC_ENGINE_TYPE_SOFT;
+
+
/* Scan to find existence of the device */
err = nand_scan(&data->chip, 1);
if (err)

2021-09-27 20:36:09

by Jan Hoffmann

[permalink] [raw]
Subject: Re: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting

Hi Miquèl,

> So what I propose is:
> - in the driver probe: set the default to software ECC
> - in the attach() hook: drop the line setting the engine_type to SOFT.
>
> Please check the below diff and tell me if it works for you. I'll then
> propose a wider series fixing the other drivers as well.

I can confirm that your patch fixes the issue.

I tested using kernel 5.10 on a Fritzbox 7362 SL (with on-die ECC
configured via device tree which was broken before). I also checked on
a Fritzbox 7412 (this device uses software ECC). Both are working with
this patch.

Thanks,
Jan

2021-09-28 08:53:18

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting

Hi Jan,

[email protected] wrote on Mon, 27 Sep 2021 22:32:13 +0200:

> Hi Miquèl,
>
> > So what I propose is:
> > - in the driver probe: set the default to software ECC
> > - in the attach() hook: drop the line setting the engine_type to SOFT.
> >
> > Please check the below diff and tell me if it works for you. I'll then
> > propose a wider series fixing the other drivers as well.
>
> I can confirm that your patch fixes the issue.
>
> I tested using kernel 5.10 on a Fritzbox 7362 SL (with on-die ECC
> configured via device tree which was broken before). I also checked on
> a Fritzbox 7412 (this device uses software ECC). Both are working with
> this patch.

Great, thanks for the quick feedback.

Cheers,
Miquèl