2024-04-17 07:13:54

by Sascha Hauer

[permalink] [raw]
Subject: [PATCH 3/4] mtd: nand: mxc_nand: support software ECC

To support software ECC we still need the driver provided read_oob,
read_page_raw and write_page_raw ops, so set them unconditionally
no matter which engine_type we use. The OOB layout on the other hand
represents the layout the i.MX ECC hardware uses, so set this only
when NAND_ECC_ENGINE_TYPE_ON_HOST is in use.

With these changes the driver can be used with software BCH ECC which
is useful for NAND chips that require a stronger ECC than the i.MX
hardware supports.

Signed-off-by: Sascha Hauer <[email protected]>
---
drivers/mtd/nand/raw/mxc_nand.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/mtd/nand/raw/mxc_nand.c b/drivers/mtd/nand/raw/mxc_nand.c
index fc70c65dea268..f44c130dca18d 100644
--- a/drivers/mtd/nand/raw/mxc_nand.c
+++ b/drivers/mtd/nand/raw/mxc_nand.c
@@ -1394,15 +1394,16 @@ static int mxcnd_attach_chip(struct nand_chip *chip)
chip->ecc.bytes = host->devtype_data->eccbytes;
host->eccsize = host->devtype_data->eccsize;
chip->ecc.size = 512;
- mtd_set_ooblayout(mtd, host->devtype_data->ooblayout);
+
+ chip->ecc.read_oob = mxc_nand_read_oob;
+ chip->ecc.read_page_raw = mxc_nand_read_page_raw;
+ chip->ecc.write_page_raw = mxc_nand_write_page_raw;

switch (chip->ecc.engine_type) {
case NAND_ECC_ENGINE_TYPE_ON_HOST:
+ mtd_set_ooblayout(mtd, host->devtype_data->ooblayout);
chip->ecc.read_page = mxc_nand_read_page;
- chip->ecc.read_page_raw = mxc_nand_read_page_raw;
- chip->ecc.read_oob = mxc_nand_read_oob;
chip->ecc.write_page = mxc_nand_write_page_ecc;
- chip->ecc.write_page_raw = mxc_nand_write_page_raw;
chip->ecc.write_oob = mxc_nand_write_oob;
break;


--
2.39.2



2024-05-06 14:05:28

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH 3/4] mtd: nand: mxc_nand: support software ECC

Hi Sascha,

[email protected] wrote on Wed, 17 Apr 2024 09:13:30 +0200:

> To support software ECC we still need the driver provided read_oob,
> read_page_raw and write_page_raw ops, so set them unconditionally
> no matter which engine_type we use. The OOB layout on the other hand
> represents the layout the i.MX ECC hardware uses, so set this only
> when NAND_ECC_ENGINE_TYPE_ON_HOST is in use.
>
> With these changes the driver can be used with software BCH ECC which
> is useful for NAND chips that require a stronger ECC than the i.MX
> hardware supports.
>
> Signed-off-by: Sascha Hauer <[email protected]>
> ---
> drivers/mtd/nand/raw/mxc_nand.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/mtd/nand/raw/mxc_nand.c b/drivers/mtd/nand/raw/mxc_nand.c
> index fc70c65dea268..f44c130dca18d 100644
> --- a/drivers/mtd/nand/raw/mxc_nand.c
> +++ b/drivers/mtd/nand/raw/mxc_nand.c
> @@ -1394,15 +1394,16 @@ static int mxcnd_attach_chip(struct nand_chip *chip)
> chip->ecc.bytes = host->devtype_data->eccbytes;
> host->eccsize = host->devtype_data->eccsize;
> chip->ecc.size = 512;
> - mtd_set_ooblayout(mtd, host->devtype_data->ooblayout);
> +
> + chip->ecc.read_oob = mxc_nand_read_oob;
> + chip->ecc.read_page_raw = mxc_nand_read_page_raw;
> + chip->ecc.write_page_raw = mxc_nand_write_page_raw;
>
> switch (chip->ecc.engine_type) {
> case NAND_ECC_ENGINE_TYPE_ON_HOST:
> + mtd_set_ooblayout(mtd, host->devtype_data->ooblayout);
> chip->ecc.read_page = mxc_nand_read_page;
> - chip->ecc.read_page_raw = mxc_nand_read_page_raw;
> - chip->ecc.read_oob = mxc_nand_read_oob;
> chip->ecc.write_page = mxc_nand_write_page_ecc;
> - chip->ecc.write_page_raw = mxc_nand_write_page_raw;
> chip->ecc.write_oob = mxc_nand_write_oob;
> break;

You also need to disable the ECC engine by default (and then you're
free to use the raw page helpers).

I thought patch 4 was needed for this patch to work, do you confirm?
Otherwise with the little change requested I might also merge this one.

Thanks, Miquèl

2024-05-06 15:51:19

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH 3/4] mtd: nand: mxc_nand: support software ECC

Hi Miquel,

[email protected] wrote on Mon, 6 May 2024 16:05:08 +0200:

> Hi Sascha,
>
> [email protected] wrote on Wed, 17 Apr 2024 09:13:30 +0200:
>
> > To support software ECC we still need the driver provided read_oob,
> > read_page_raw and write_page_raw ops, so set them unconditionally
> > no matter which engine_type we use. The OOB layout on the other hand
> > represents the layout the i.MX ECC hardware uses, so set this only
> > when NAND_ECC_ENGINE_TYPE_ON_HOST is in use.
> >
> > With these changes the driver can be used with software BCH ECC which
> > is useful for NAND chips that require a stronger ECC than the i.MX
> > hardware supports.
> >
> > Signed-off-by: Sascha Hauer <[email protected]>
> > ---
> > drivers/mtd/nand/raw/mxc_nand.c | 9 +++++----
> > 1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/mtd/nand/raw/mxc_nand.c b/drivers/mtd/nand/raw/mxc_nand.c
> > index fc70c65dea268..f44c130dca18d 100644
> > --- a/drivers/mtd/nand/raw/mxc_nand.c
> > +++ b/drivers/mtd/nand/raw/mxc_nand.c
> > @@ -1394,15 +1394,16 @@ static int mxcnd_attach_chip(struct nand_chip *chip)
> > chip->ecc.bytes = host->devtype_data->eccbytes;
> > host->eccsize = host->devtype_data->eccsize;
> > chip->ecc.size = 512;
> > - mtd_set_ooblayout(mtd, host->devtype_data->ooblayout);
> > +
> > + chip->ecc.read_oob = mxc_nand_read_oob;
> > + chip->ecc.read_page_raw = mxc_nand_read_page_raw;
> > + chip->ecc.write_page_raw = mxc_nand_write_page_raw;

A second thought on this. Maybe you should consider keeping these for
on-host operations only.

The read/write_page_raw operations are supposed to detangle the data
organization to show a proper [all data][all oob] organization to the
user. But of course if the data is stored differently when using
software ECC, you'll expect the implementation to be different (and the
core provides such helpers, even though in your case they use RNDOUT
which is not yet supported).

> >
> > switch (chip->ecc.engine_type) {
> > case NAND_ECC_ENGINE_TYPE_ON_HOST:
> > + mtd_set_ooblayout(mtd, host->devtype_data->ooblayout);
> > chip->ecc.read_page = mxc_nand_read_page;
> > - chip->ecc.read_page_raw = mxc_nand_read_page_raw;
> > - chip->ecc.read_oob = mxc_nand_read_oob;
> > chip->ecc.write_page = mxc_nand_write_page_ecc;
> > - chip->ecc.write_page_raw = mxc_nand_write_page_raw;
> > chip->ecc.write_oob = mxc_nand_write_oob;
> > break;
>
> You also need to disable the ECC engine by default (and then you're
> free to use the raw page helpers).
>
> I thought patch 4 was needed for this patch to work, do you confirm?
> Otherwise with the little change requested I might also merge this one.
>
> Thanks, Miquèl


2024-05-07 07:13:05

by Sascha Hauer

[permalink] [raw]
Subject: Re: [PATCH 3/4] mtd: nand: mxc_nand: support software ECC

On Mon, May 06, 2024 at 05:51:06PM +0200, Miquel Raynal wrote:
> Hi Miquel,
>
> [email protected] wrote on Mon, 6 May 2024 16:05:08 +0200:
>
> > Hi Sascha,
> >
> > [email protected] wrote on Wed, 17 Apr 2024 09:13:30 +0200:
> >
> > > To support software ECC we still need the driver provided read_oob,
> > > read_page_raw and write_page_raw ops, so set them unconditionally
> > > no matter which engine_type we use. The OOB layout on the other hand
> > > represents the layout the i.MX ECC hardware uses, so set this only
> > > when NAND_ECC_ENGINE_TYPE_ON_HOST is in use.
> > >
> > > With these changes the driver can be used with software BCH ECC which
> > > is useful for NAND chips that require a stronger ECC than the i.MX
> > > hardware supports.
> > >
> > > Signed-off-by: Sascha Hauer <[email protected]>
> > > ---
> > > drivers/mtd/nand/raw/mxc_nand.c | 9 +++++----
> > > 1 file changed, 5 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/mtd/nand/raw/mxc_nand.c b/drivers/mtd/nand/raw/mxc_nand.c
> > > index fc70c65dea268..f44c130dca18d 100644
> > > --- a/drivers/mtd/nand/raw/mxc_nand.c
> > > +++ b/drivers/mtd/nand/raw/mxc_nand.c
> > > @@ -1394,15 +1394,16 @@ static int mxcnd_attach_chip(struct nand_chip *chip)
> > > chip->ecc.bytes = host->devtype_data->eccbytes;
> > > host->eccsize = host->devtype_data->eccsize;
> > > chip->ecc.size = 512;
> > > - mtd_set_ooblayout(mtd, host->devtype_data->ooblayout);
> > > +
> > > + chip->ecc.read_oob = mxc_nand_read_oob;
> > > + chip->ecc.read_page_raw = mxc_nand_read_page_raw;
> > > + chip->ecc.write_page_raw = mxc_nand_write_page_raw;
>
> A second thought on this. Maybe you should consider keeping these for
> on-host operations only.
>
> The read/write_page_raw operations are supposed to detangle the data
> organization to show a proper [all data][all oob] organization to the
> user.

Let me take one step back. The organisation in the raw NAND is like this
when using hardware ECC:

[512b data0][16b oob0][512b data1][16b oob1][512b data2][16b oob2][512b data3][16b oob3]

For a standard 2k+64b NAND. The read/write_page_raw operations detangle
this and present the data to the user like this:

[2048b data][64b OOB]

Is this the correct behaviour or should that be changed?
(Side note: The GPMI NAND driver behaves differently here. It has the
same interleaved organisation on the chip and also presents the same
interleaved organisation to the user when using read_page_raw)


With my current approach for software ECC the same layout is used on the
NAND chip. It would interleave the data with the OOB on the NAND chip
and, since using the same read/write_page_raw operations, also presents
[2048b data][64b OOB] to the user.

This works fine currently, but means that NAND_CMD_RNDOUT can't be used.
Using NAND_CMD_RNDOUT to position the cursor at offset 512b for example
doesn't give you the second subpage, but instead oob0. Positioning the
cursor at offset 2048 doesn't give you the start of OOB, but some
position in the middle of data3.

Ok, NAND_CMD_RNDOUT can't be used for hardware ECC and there's no way
around it. For software ECC we could change the organisation in the chip
to be [2048b data][64b oob]. With that NAND_CMD_RNDOUT then could be
used with software ECC.

You say that NAND_CMD_RNDOUT is a basic command that is supported by all
controllers, and yes, it is also supported with the mxc_nand controller.
You just can't control how many bytes are transferred between the NAND
chip and the controller. When using NAND_CMD_RNDOUT to read a few bytes
at a certain page offset we'll end up reading 512 bytes discarding most
of it. For the next ECC block we would move the cursor forward using
another NAND_CMD_RNDOUT command, again read 512 bytes and discard most
it (altough the desired data would have been in the first read already).

So I think NAND_CMD_RNDOUT should really be avoided for this controller,
eventhough we might be able to support it.

Sascha

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2024-05-07 07:46:15

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH 3/4] mtd: nand: mxc_nand: support software ECC

Hi Sascha,

[email protected] wrote on Tue, 7 May 2024 09:12:30 +0200:

> On Mon, May 06, 2024 at 05:51:06PM +0200, Miquel Raynal wrote:
> > Hi Miquel,
> >
> > [email protected] wrote on Mon, 6 May 2024 16:05:08 +0200:
> >
> > > Hi Sascha,
> > >
> > > [email protected] wrote on Wed, 17 Apr 2024 09:13:30 +0200:
> > >
> > > > To support software ECC we still need the driver provided read_oob,
> > > > read_page_raw and write_page_raw ops, so set them unconditionally
> > > > no matter which engine_type we use. The OOB layout on the other hand
> > > > represents the layout the i.MX ECC hardware uses, so set this only
> > > > when NAND_ECC_ENGINE_TYPE_ON_HOST is in use.
> > > >
> > > > With these changes the driver can be used with software BCH ECC which
> > > > is useful for NAND chips that require a stronger ECC than the i.MX
> > > > hardware supports.
> > > >
> > > > Signed-off-by: Sascha Hauer <[email protected]>
> > > > ---
> > > > drivers/mtd/nand/raw/mxc_nand.c | 9 +++++----
> > > > 1 file changed, 5 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/mtd/nand/raw/mxc_nand.c b/drivers/mtd/nand/raw/mxc_nand.c
> > > > index fc70c65dea268..f44c130dca18d 100644
> > > > --- a/drivers/mtd/nand/raw/mxc_nand.c
> > > > +++ b/drivers/mtd/nand/raw/mxc_nand.c
> > > > @@ -1394,15 +1394,16 @@ static int mxcnd_attach_chip(struct nand_chip *chip)
> > > > chip->ecc.bytes = host->devtype_data->eccbytes;
> > > > host->eccsize = host->devtype_data->eccsize;
> > > > chip->ecc.size = 512;
> > > > - mtd_set_ooblayout(mtd, host->devtype_data->ooblayout);
> > > > +
> > > > + chip->ecc.read_oob = mxc_nand_read_oob;
> > > > + chip->ecc.read_page_raw = mxc_nand_read_page_raw;
> > > > + chip->ecc.write_page_raw = mxc_nand_write_page_raw;
> >
> > A second thought on this. Maybe you should consider keeping these for
> > on-host operations only.
> >
> > The read/write_page_raw operations are supposed to detangle the data
> > organization to show a proper [all data][all oob] organization to the
> > user.
>
> Let me take one step back. The organisation in the raw NAND is like this
> when using hardware ECC:
>
> [512b data0][16b oob0][512b data1][16b oob1][512b data2][16b oob2][512b data3][16b oob3]
>
> For a standard 2k+64b NAND. The read/write_page_raw operations detangle
> this and present the data to the user like this:
>
> [2048b data][64b OOB]
>
> Is this the correct behaviour or should that be changed?

I believe so, yes.

> (Side note: The GPMI NAND driver behaves differently here. It has the
> same interleaved organisation on the chip and also presents the same
> interleaved organisation to the user when using read_page_raw)

I'd say the GPMI driver is wrong?

> With my current approach for software ECC the same layout is used on the
> NAND chip. It would interleave the data with the OOB on the NAND chip
> and, since using the same read/write_page_raw operations, also presents
> [2048b data][64b OOB] to the user.

No need, I believe the only reason for interleaving is that your
hardware ECC engine works like that (writes the ECC bytes slightly
after each chunk of data). So if you don't use on-host hardware ECC,
you don't need to deal with this data layout.

> This works fine currently, but means that NAND_CMD_RNDOUT can't be used.
> Using NAND_CMD_RNDOUT to position the cursor at offset 512b for example
> doesn't give you the second subpage, but instead oob0. Positioning the
> cursor at offset 2048 doesn't give you the start of OOB, but some
> position in the middle of data3.
>
> Ok, NAND_CMD_RNDOUT can't be used for hardware ECC and there's no way
> around it. For software ECC we could change the organisation in the chip
> to be [2048b data][64b oob]. With that NAND_CMD_RNDOUT then could be
> used with software ECC.
>
> You say that NAND_CMD_RNDOUT is a basic command that is supported by all
> controllers, and yes, it is also supported with the mxc_nand controller.
> You just can't control how many bytes are transferred between the NAND
> chip and the controller. When using NAND_CMD_RNDOUT to read a few bytes
> at a certain page offset we'll end up reading 512 bytes discarding most
> of it. For the next ECC block we would move the cursor forward using
> another NAND_CMD_RNDOUT command, again read 512 bytes and discard most
> it (altough the desired data would have been in the first read already).

I'm not sure the controller limitations are so bad in this case. The
core helpers (using the same example) will ask for:
- 512b at offset 0
- 512b at offset 512...
- and finally 64b at offset 2048.
In practice it does not look like a huge drawback? I don't understand
in which case so much data would be read and then discarded?

> So I think NAND_CMD_RNDOUT should really be avoided for this controller,
> eventhough we might be able to support it.

I also mentioned the monolithic accessors which try to avoid these
random column changes. You probably want to check them out, they might
just avoid the need for NAND_CMD_RNDOUT by forcing full page accesses
directly. The reason why they were introduced is not exactly our
current use case, but it feels like they might be handy.

658beb663960 ("mtd: rawnand: Expose monolithic read/write_page_raw() helpers")
0e7f4b64ea46 ("mtd: rawnand: Allow controllers to overload soft ECC hooks")

Thanks,
Miquèl

2024-05-07 11:00:25

by Sascha Hauer

[permalink] [raw]
Subject: Re: [PATCH 3/4] mtd: nand: mxc_nand: support software ECC

On Tue, May 07, 2024 at 09:45:38AM +0200, Miquel Raynal wrote:
> Hi Sascha,
>
> [email protected] wrote on Tue, 7 May 2024 09:12:30 +0200:
>
> > On Mon, May 06, 2024 at 05:51:06PM +0200, Miquel Raynal wrote:
> > > Hi Miquel,
> > >
> > > [email protected] wrote on Mon, 6 May 2024 16:05:08 +0200:
> > >
> > > > Hi Sascha,
> > > >
> > > > [email protected] wrote on Wed, 17 Apr 2024 09:13:30 +0200:
> > > >
> > > > > To support software ECC we still need the driver provided read_oob,
> > > > > read_page_raw and write_page_raw ops, so set them unconditionally
> > > > > no matter which engine_type we use. The OOB layout on the other hand
> > > > > represents the layout the i.MX ECC hardware uses, so set this only
> > > > > when NAND_ECC_ENGINE_TYPE_ON_HOST is in use.
> > > > >
> > > > > With these changes the driver can be used with software BCH ECC which
> > > > > is useful for NAND chips that require a stronger ECC than the i.MX
> > > > > hardware supports.
> > > > >
> > > > > Signed-off-by: Sascha Hauer <[email protected]>
> > > > > ---
> > > > > drivers/mtd/nand/raw/mxc_nand.c | 9 +++++----
> > > > > 1 file changed, 5 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/drivers/mtd/nand/raw/mxc_nand.c b/drivers/mtd/nand/raw/mxc_nand.c
> > > > > index fc70c65dea268..f44c130dca18d 100644
> > > > > --- a/drivers/mtd/nand/raw/mxc_nand.c
> > > > > +++ b/drivers/mtd/nand/raw/mxc_nand.c
> > > > > @@ -1394,15 +1394,16 @@ static int mxcnd_attach_chip(struct nand_chip *chip)
> > > > > chip->ecc.bytes = host->devtype_data->eccbytes;
> > > > > host->eccsize = host->devtype_data->eccsize;
> > > > > chip->ecc.size = 512;
> > > > > - mtd_set_ooblayout(mtd, host->devtype_data->ooblayout);
> > > > > +
> > > > > + chip->ecc.read_oob = mxc_nand_read_oob;
> > > > > + chip->ecc.read_page_raw = mxc_nand_read_page_raw;
> > > > > + chip->ecc.write_page_raw = mxc_nand_write_page_raw;
> > >
> > > A second thought on this. Maybe you should consider keeping these for
> > > on-host operations only.
> > >
> > > The read/write_page_raw operations are supposed to detangle the data
> > > organization to show a proper [all data][all oob] organization to the
> > > user.
> >
> > Let me take one step back. The organisation in the raw NAND is like this
> > when using hardware ECC:
> >
> > [512b data0][16b oob0][512b data1][16b oob1][512b data2][16b oob2][512b data3][16b oob3]
> >
> > For a standard 2k+64b NAND. The read/write_page_raw operations detangle
> > this and present the data to the user like this:
> >
> > [2048b data][64b OOB]
> >
> > Is this the correct behaviour or should that be changed?
>
> I believe so, yes.
>
> > (Side note: The GPMI NAND driver behaves differently here. It has the
> > same interleaved organisation on the chip and also presents the same
> > interleaved organisation to the user when using read_page_raw)
>
> I'd say the GPMI driver is wrong?
>
> > With my current approach for software ECC the same layout is used on the
> > NAND chip. It would interleave the data with the OOB on the NAND chip
> > and, since using the same read/write_page_raw operations, also presents
> > [2048b data][64b OOB] to the user.
>
> No need, I believe the only reason for interleaving is that your
> hardware ECC engine works like that (writes the ECC bytes slightly
> after each chunk of data). So if you don't use on-host hardware ECC,
> you don't need to deal with this data layout.

Right, I could use a different layout for software ECC. Using the same
layout for both hardware and software ECC is just quite convenient as
the same mxc_nand_read_page_raw()/mxc_nand_write_page_raw() could be
used for both software and hardware ECC.

Another thing that might be worth considering is that if we use
different functions for raw read/write page is that we would get
different views on the same raw page data if we switch from software to
hardware ECC or the other way round which might be confusing.

>
> > This works fine currently, but means that NAND_CMD_RNDOUT can't be used.
> > Using NAND_CMD_RNDOUT to position the cursor at offset 512b for example
> > doesn't give you the second subpage, but instead oob0. Positioning the
> > cursor at offset 2048 doesn't give you the start of OOB, but some
> > position in the middle of data3.
> >
> > Ok, NAND_CMD_RNDOUT can't be used for hardware ECC and there's no way
> > around it. For software ECC we could change the organisation in the chip
> > to be [2048b data][64b oob]. With that NAND_CMD_RNDOUT then could be
> > used with software ECC.
> >
> > You say that NAND_CMD_RNDOUT is a basic command that is supported by all
> > controllers, and yes, it is also supported with the mxc_nand controller.
> > You just can't control how many bytes are transferred between the NAND
> > chip and the controller. When using NAND_CMD_RNDOUT to read a few bytes
> > at a certain page offset we'll end up reading 512 bytes discarding most
> > of it. For the next ECC block we would move the cursor forward using
> > another NAND_CMD_RNDOUT command, again read 512 bytes and discard most
> > it (altough the desired data would have been in the first read already).
>
> I'm not sure the controller limitations are so bad in this case. The
> core helpers (using the same example) will ask for:
> - 512b at offset 0
> - 512b at offset 512...
> - and finally 64b at offset 2048.
> In practice it does not look like a huge drawback? I don't understand
> in which case so much data would be read and then discarded?

Yes, you're right. I misread the code and thought the core would read
the ECC separately for each subpage. In fact it doesn't do so, the ECC
is always read in one go even for multiple subpages.

>
> > So I think NAND_CMD_RNDOUT should really be avoided for this controller,
> > eventhough we might be able to support it.
>
> I also mentioned the monolithic accessors which try to avoid these
> random column changes. You probably want to check them out, they might
> just avoid the need for NAND_CMD_RNDOUT by forcing full page accesses
> directly. The reason why they were introduced is not exactly our
> current use case, but it feels like they might be handy.
>
> 658beb663960 ("mtd: rawnand: Expose monolithic read/write_page_raw() helpers")
> 0e7f4b64ea46 ("mtd: rawnand: Allow controllers to overload soft ECC hooks")

Yes, I already make use of 0e7f4b64ea46. My problem is only the ecc.read_subpage
hook which can't be overwritten and AFAIK this is the only way
NAND_CMD_RNDOUT might be used in my case.

I think my favourite solution would be to:

- store data/OOB interleaved for both hardware and software ECC
- For software ECC use a similar OOB layout as used with hardware
ECC. This allows us to read a subpage including its ECC data in
a single step (just like with hardware ECC the controller just
reads 512b + 16b for each subpage)
- Allow to disable subpage reads in the NAND core

As a further optimisation we could make ecc.read_subpage overwritable
for ecc->engine_type == NAND_ECC_ENGINE_TYPE_SOFT && ecc->algo ==
NAND_ECC_ALGO_BCH. With the OOB layout described above that would be
easily implementable with the mxc_nand controller.

What do you think?

If you insist I would go the path of making NAND_CMD_RNDOUT work for
software ECC, although I think it would cause me extra work with no
clear gain for me.

Sascha

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2024-05-07 14:09:32

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH 3/4] mtd: nand: mxc_nand: support software ECC

Hi Sascha,

> > No need, I believe the only reason for interleaving is that your
> > hardware ECC engine works like that (writes the ECC bytes slightly
> > after each chunk of data). So if you don't use on-host hardware ECC,
> > you don't need to deal with this data layout.
>
> Right, I could use a different layout for software ECC. Using the same
> layout for both hardware and software ECC is just quite convenient as
> the same mxc_nand_read_page_raw()/mxc_nand_write_page_raw() could be
> used for both software and hardware ECC.

I'm not sure I see why it would be more convenient, as you basically
don't need to provide anything if you use software ECC engines besides
a minimal exec_op() implementation. Anyway, that's clearly not the good
approach for software ECC: the engine decides where it wants to put the
data, there is just no reason to complexify the software layout (which
is free from any constraints).

> Another thing that might be worth considering is that if we use
> different functions for raw read/write page is that we would get
> different views on the same raw page data if we switch from software to
> hardware ECC or the other way round which might be confusing.

Don't worry about that: it's impossible to manage. Data layout might be
different of course, but most importantly once you've chosen an ECC
configuration, any access with another configuration will simply fail.
And I'm not just talking about the ECC/step size parameters, each
engine has it's own base polynomial on which it bases all its internal
calculations, and besides trying very hard to mimic your hardware
engine in software for some very good reason, you'll never want to do
that. Especially since the very first reason why you want software
support is usually to go beyond your hardware engine capabilities in
terms of strength.

Here is a blog post about such situation, if deemed useful:
https://bootlin.com/blog/supporting-a-misbehaving-nand-ecc-engine/

> > > This works fine currently, but means that NAND_CMD_RNDOUT can't be used.
> > > Using NAND_CMD_RNDOUT to position the cursor at offset 512b for example
> > > doesn't give you the second subpage, but instead oob0. Positioning the
> > > cursor at offset 2048 doesn't give you the start of OOB, but some
> > > position in the middle of data3.
> > >
> > > Ok, NAND_CMD_RNDOUT can't be used for hardware ECC and there's no way
> > > around it. For software ECC we could change the organisation in the chip
> > > to be [2048b data][64b oob]. With that NAND_CMD_RNDOUT then could be
> > > used with software ECC.
> > >
> > > You say that NAND_CMD_RNDOUT is a basic command that is supported by all
> > > controllers, and yes, it is also supported with the mxc_nand controller.
> > > You just can't control how many bytes are transferred between the NAND
> > > chip and the controller. When using NAND_CMD_RNDOUT to read a few bytes
> > > at a certain page offset we'll end up reading 512 bytes discarding most
> > > of it. For the next ECC block we would move the cursor forward using
> > > another NAND_CMD_RNDOUT command, again read 512 bytes and discard most
> > > it (altough the desired data would have been in the first read already).
> >
> > I'm not sure the controller limitations are so bad in this case. The
> > core helpers (using the same example) will ask for:
> > - 512b at offset 0
> > - 512b at offset 512...
> > - and finally 64b at offset 2048.
> > In practice it does not look like a huge drawback? I don't understand
> > in which case so much data would be read and then discarded?
>
> Yes, you're right. I misread the code and thought the core would read
> the ECC separately for each subpage. In fact it doesn't do so, the ECC
> is always read in one go even for multiple subpages.

"interleaved" layouts actually force us to perform so much
sub-readings, that's probably one reason why there is no reason to try
using an interleaved layout with software ECC engines.

> > > So I think NAND_CMD_RNDOUT should really be avoided for this controller,
> > > eventhough we might be able to support it.
> >
> > I also mentioned the monolithic accessors which try to avoid these
> > random column changes. You probably want to check them out, they might
> > just avoid the need for NAND_CMD_RNDOUT by forcing full page accesses
> > directly. The reason why they were introduced is not exactly our
> > current use case, but it feels like they might be handy.
> >
> > 658beb663960 ("mtd: rawnand: Expose monolithic read/write_page_raw() helpers")
> > 0e7f4b64ea46 ("mtd: rawnand: Allow controllers to overload soft ECC hooks")
>
> Yes, I already make use of 0e7f4b64ea46. My problem is only the ecc.read_subpage
> hook which can't be overwritten and AFAIK this is the only way
> NAND_CMD_RNDOUT might be used in my case.

It needs to be supported, we don't expect in the core that this command
will not be supported. There may be some constraints and limitations,
and this we can workaround them somehow, but we expect support for
NAND_CMD_RNDOUT.

Look at all users of nand_change_read_column_op(), NAND manufacturer
drivers use it, jedec/onfi drivers use it as well in case of
bitflip in the parameter page, and the core may want to use it (although
in your case I don't think it actually does if you don't try to support
over complex layouts, as software ECC engines will always request the
OOB data whereas subpages are not used in this situation).

> I think my favourite solution would be to:
>
> - store data/OOB interleaved for both hardware and software ECC
> - For software ECC use a similar OOB layout as used with hardware
> ECC. This allows us to read a subpage including its ECC data in
> a single step (just like with hardware ECC the controller just
> reads 512b + 16b for each subpage)
> - Allow to disable subpage reads in the NAND core
>
> As a further optimisation we could make ecc.read_subpage overwritable
> for ecc->engine_type == NAND_ECC_ENGINE_TYPE_SOFT && ecc->algo ==
> NAND_ECC_ALGO_BCH. With the OOB layout described above that would be
> easily implementable with the mxc_nand controller.
>
> What do you think?

I'm sorry but I feel like I need to answer "no" to all three items
above. It would be totally backwards.

> If you insist I would go the path of making NAND_CMD_RNDOUT work for
> software ECC, although I think it would cause me extra work with no
> clear gain for me.

Yes, please. I believe this command is not that complex to implement,
even with strong (and clearly advertised) limitations. I had a look at
your exec_op() implementation and to the datasheet of the imx27, the
controller clearly supports CMD/ADDR/CMD/DATAIN ops. It's true that you
can only request 16, 512 or 528 bytes, but, why not? It just needs
to be clearly identified that reading more data is not supported. Once
the data is in the local SRAM you just take what you need and done.
From a performance perspective, I don't think this operation will be
used often, at least not in the fast path (see above why), but we need
it for the driver to work properly in all situations.

There is perhaps something that is missing in your current
implementation though: there is a check_only boolean in ->exec_op()
which might require additional handling so that the core does not try to
perform unsupported operations. You can do that either manually by
checking the ops entirely by hand if there are only a couple that
cannot be supported, or leverage the core parser otherwise.

In this case you can have a look at the use of the "struct
nand_op_parser" in the subsystem. Please also don't hesitate to take
inspiration from other drivers, as you might need to advertise
limitations such as a maximum number of command, address or
data cycles (in this case, 528 or 512 if it's easier).

Thanks,
Miquèl