2017-04-19 12:13:36

by Pavel Machek

[permalink] [raw]
Subject: fsl_ifc_nand: are blank pages protected by ECC?

Hi!

We have some problems with fsl_ifc_nand ... in the old kernels, but
this one does not seem to be fixed in v4.11, either.

UBIFS complains:

UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630
UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB 282:252630
UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from LEB 282:252630
UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed

Possible explanation is here:

https://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/289605

# I see on the forum that this issue has been raised before - my
# understanding is that the omap2 nand driver does not perform ECC
# detection/correction on empty pages so when UBIFS checks the empty
# space data and doesn't read all 0xFF then it fails and mounts
# read-only. I didn't find any good solution - only a workaround to
# remove the UBIFS check..

So I checked fsl_ifc_nand.c in v4.11-rc, and yes, it seems to have the
same problem:

if (errors == 15) {
/*
* Uncorrectable error.
* OK only if the whole page is blank.
*
* We disable ECCER reporting due to...
* erratum IFC-A002770 -- so report it now if we
* see an uncorrectable error in ECCSTAT.
*/
if (!is_blank(mtd, bufnum))
ctrl->nand_stat |=
IFC_NAND_EVTER_STAT_ECCER;
break;
}

is_blank() checks for all 0xff's, so single-bit 0xfe in the data will
result in_blank() == 0 and uncorrectable error being signaled.

Should the driver be modified somehow?

Thanks,
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Attachments:
(No filename) (2.02 kB)
signature.asc (181.00 B)
Digital signature
Download all attachments

2017-04-19 21:18:08

by Boris Brezillon

[permalink] [raw]
Subject: Re: fsl_ifc_nand: are blank pages protected by ECC?

On Wed, 19 Apr 2017 14:13:32 +0200
Pavel Machek <[email protected]> wrote:

> Hi!
>
> We have some problems with fsl_ifc_nand ... in the old kernels, but
> this one does not seem to be fixed in v4.11, either.
>
> UBIFS complains:
>
> UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630
> UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB 282:252630
> UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from LEB 282:252630
> UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed
>
> Possible explanation is here:
>
> https://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/289605
>
> # I see on the forum that this issue has been raised before - my
> # understanding is that the omap2 nand driver does not perform ECC
> # detection/correction on empty pages so when UBIFS checks the empty
> # space data and doesn't read all 0xFF then it fails and mounts
> # read-only. I didn't find any good solution - only a workaround to
> # remove the UBIFS check..
>
> So I checked fsl_ifc_nand.c in v4.11-rc, and yes, it seems to have the
> same problem:
>
> if (errors == 15) {
> /*
> * Uncorrectable error.
> * OK only if the whole page is blank.
> *
> * We disable ECCER reporting due to...
> * erratum IFC-A002770 -- so report it now if we
> * see an uncorrectable error in ECCSTAT.
> */
> if (!is_blank(mtd, bufnum))
> ctrl->nand_stat |=
> IFC_NAND_EVTER_STAT_ECCER;
> break;
> }
>
> is_blank() checks for all 0xff's, so single-bit 0xfe in the data will
> result in_blank() == 0 and uncorrectable error being signaled.
>
> Should the driver be modified somehow?

Yep, nand_check_erased_ecc_chunk() [1] is here to help you check this
case, unfortunately, it's not directly applicable here, because this
function takes regular pointers and not __iomem ones. You'll either
have to copy the data in an intermediate buffer before calling
nand_check_erased_ecc_chunk(), or cast the SRAM region to a void
pointer (which is usually not a good idea). The last option would be to
open code nand_check_erased_ecc_chunk(), but I'd really like to avoid
that (for maintainability concerns).

[1]http://lxr.free-electrons.com/source/drivers/mtd/nand/nand_base.c#L1414

2017-04-19 22:15:11

by Pavel Machek

[permalink] [raw]
Subject: Re: fsl_ifc_nand: are blank pages protected by ECC?

Hi!

> > We have some problems with fsl_ifc_nand ... in the old kernels, but
> > this one does not seem to be fixed in v4.11, either.
> >
> > UBIFS complains:
> >
> > UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630
> > UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB 282:252630
> > UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from LEB 282:252630
> > UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed
> >
> > Possible explanation is here:
> >
> > https://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/289605
> >
> > # I see on the forum that this issue has been raised before - my
> > # understanding is that the omap2 nand driver does not perform ECC
> > # detection/correction on empty pages so when UBIFS checks the empty
> > # space data and doesn't read all 0xFF then it fails and mounts
> > # read-only. I didn't find any good solution - only a workaround to
> > # remove the UBIFS check..
> >
> > So I checked fsl_ifc_nand.c in v4.11-rc, and yes, it seems to have the
> > same problem:
> >
> > if (errors == 15) {
> > /*
> > * Uncorrectable error.
> > * OK only if the whole page is blank.
> > *
> > * We disable ECCER reporting due to...
> > * erratum IFC-A002770 -- so report it now if we
> > * see an uncorrectable error in ECCSTAT.
> > */
> > if (!is_blank(mtd, bufnum))
> > ctrl->nand_stat |=
> > IFC_NAND_EVTER_STAT_ECCER;
> > break;
> > }
> >
> > is_blank() checks for all 0xff's, so single-bit 0xfe in the data will
> > result in_blank() == 0 and uncorrectable error being signaled.
> >
> > Should the driver be modified somehow?
>
> Yep, nand_check_erased_ecc_chunk() [1] is here to help you check this
> case, unfortunately, it's not directly applicable here, because this
> function takes regular pointers and not __iomem ones. You'll either
> have to copy the data in an intermediate buffer before calling
> nand_check_erased_ecc_chunk(), or cast the SRAM region to a void
> pointer (which is usually not a good idea). The last option would be to
> open code nand_check_erased_ecc_chunk(), but I'd really like to avoid
> that (for maintainability concerns).

Ok, thanks a lot for the pointer, that should be doable.

Core of the code is:

1357 for (; len >= sizeof(long);
1358 len -= sizeof(long), bitmap += sizeof(long)) {
1359 weight = hweight_long(*((unsigned long
*)bitmap));
1360 bitflips += BITS_PER_LONG - weight;
1361 if (unlikely(bitflips > bitflips_threshold))
1362 return -EBADMSG;
1363 }

Someone clearly optimized this code (took care to do long accesses
etc), but afaict hweight is quite a heavy operation:

_GLOBAL(__arch_hweight32)
BEGIN_FTR_SECTION
b __sw_hweight32
nop
nop
nop
nop
nop
nop
FTR_SECTION_ELSE
BEGIN_FTR_SECTION_NESTED(51)
PPC_POPCNTB(R3,R3)
srdi r4,r3,16
add r3,r4,r3
srdi r4,r3,8
add r3,r4,r3
clrldi r3,r3,64-8
blr
FTR_SECTION_ELSE_NESTED(51)
PPC_POPCNTW(R3,R3)
clrldi r3,r3,64-8
blr
ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_POPCNTD, 51)
ALT_FTR_SECTION_END_IFCLR(CPU_FTR_POPCNTB)
EXPORT_SYMBOL(__arch_hweight32)

Would it make sense to only do hweight if *bitmap != ~0ULL ? Would it
make sense to only check for bitflips > bitflips_threshold each 128
bytes or something like that?

Thanks and best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Attachments:
(No filename) (3.94 kB)
signature.asc (181.00 B)
Digital signature
Download all attachments

2017-04-19 22:27:53

by Boris Brezillon

[permalink] [raw]
Subject: Re: fsl_ifc_nand: are blank pages protected by ECC?

On Thu, 20 Apr 2017 00:15:07 +0200
Pavel Machek <[email protected]> wrote:

> Hi!
>
> > > We have some problems with fsl_ifc_nand ... in the old kernels, but
> > > this one does not seem to be fixed in v4.11, either.
> > >
> > > UBIFS complains:
> > >
> > > UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630
> > > UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB 282:252630
> > > UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from LEB 282:252630
> > > UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed
> > >
> > > Possible explanation is here:
> > >
> > > https://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/289605
> > >
> > > # I see on the forum that this issue has been raised before - my
> > > # understanding is that the omap2 nand driver does not perform ECC
> > > # detection/correction on empty pages so when UBIFS checks the empty
> > > # space data and doesn't read all 0xFF then it fails and mounts
> > > # read-only. I didn't find any good solution - only a workaround to
> > > # remove the UBIFS check..
> > >
> > > So I checked fsl_ifc_nand.c in v4.11-rc, and yes, it seems to have the
> > > same problem:
> > >
> > > if (errors == 15) {
> > > /*
> > > * Uncorrectable error.
> > > * OK only if the whole page is blank.
> > > *
> > > * We disable ECCER reporting due to...
> > > * erratum IFC-A002770 -- so report it now if we
> > > * see an uncorrectable error in ECCSTAT.
> > > */
> > > if (!is_blank(mtd, bufnum))
> > > ctrl->nand_stat |=
> > > IFC_NAND_EVTER_STAT_ECCER;
> > > break;
> > > }
> > >
> > > is_blank() checks for all 0xff's, so single-bit 0xfe in the data will
> > > result in_blank() == 0 and uncorrectable error being signaled.
> > >
> > > Should the driver be modified somehow?
> >
> > Yep, nand_check_erased_ecc_chunk() [1] is here to help you check this
> > case, unfortunately, it's not directly applicable here, because this
> > function takes regular pointers and not __iomem ones. You'll either
> > have to copy the data in an intermediate buffer before calling
> > nand_check_erased_ecc_chunk(), or cast the SRAM region to a void
> > pointer (which is usually not a good idea). The last option would be to
> > open code nand_check_erased_ecc_chunk(), but I'd really like to avoid
> > that (for maintainability concerns).
>
> Ok, thanks a lot for the pointer, that should be doable.
>
> Core of the code is:
>
> 1357 for (; len >= sizeof(long);
> 1358 len -= sizeof(long), bitmap += sizeof(long)) {
> 1359 weight = hweight_long(*((unsigned long
> *)bitmap));
> 1360 bitflips += BITS_PER_LONG - weight;
> 1361 if (unlikely(bitflips > bitflips_threshold))
> 1362 return -EBADMSG;
> 1363 }
>
> Someone clearly optimized this code (took care to do long accesses
> etc), but afaict hweight is quite a heavy operation:
>
> _GLOBAL(__arch_hweight32)
> BEGIN_FTR_SECTION
> b __sw_hweight32
> nop
> nop
> nop
> nop
> nop
> nop
> FTR_SECTION_ELSE
> BEGIN_FTR_SECTION_NESTED(51)
> PPC_POPCNTB(R3,R3)
> srdi r4,r3,16
> add r3,r4,r3
> srdi r4,r3,8
> add r3,r4,r3
> clrldi r3,r3,64-8
> blr
> FTR_SECTION_ELSE_NESTED(51)
> PPC_POPCNTW(R3,R3)
> clrldi r3,r3,64-8
> blr
> ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_POPCNTD, 51)
> ALT_FTR_SECTION_END_IFCLR(CPU_FTR_POPCNTB)
> EXPORT_SYMBOL(__arch_hweight32)
>
> Would it make sense to only do hweight if *bitmap != ~0ULL ? Would it
> make sense to only check for bitflips > bitflips_threshold each 128
> bytes or something like that?

I didn't go as far as you did and simply assumed hweight32/64() were
already optimized. Feel free to propose extra improvements.

2017-04-20 11:41:08

by Pavel Machek

[permalink] [raw]
Subject: Re: fsl_ifc_nand: are blank pages protected by ECC?

Hi!

> > Would it make sense to only do hweight if *bitmap != ~0ULL ? Would it
> > make sense to only check for bitflips > bitflips_threshold each 128
> > bytes or something like that?
>
> I didn't go as far as you did and simply assumed hweight32/64() were
> already optimized. Feel free to propose extra improvements.

I'd propose this one (only compile tested, sorry, not sure how to test
this one). If we see ~0UL, there's no need for hweight, and no need to
check number of bitflips. So this should be net win.

Signed-off-by: Pavel Machek <[email protected]>

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index b0524f8..96c27ec 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -1357,7 +1357,10 @@ static int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)

for (; len >= sizeof(long);
len -= sizeof(long), bitmap += sizeof(long)) {
- weight = hweight_long(*((unsigned long *)bitmap));
+ unsigned long d = *((unsigned long *)bitmap);
+ if (d == ~0UL)
+ continue;
+ weight = hweight_long(d);
bitflips += BITS_PER_LONG - weight;
if (unlikely(bitflips > bitflips_threshold))
return -EBADMSG;

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Attachments:
(No filename) (1.30 kB)
signature.asc (181.00 B)
Digital signature
Download all attachments

2017-04-20 12:16:03

by Boris Brezillon

[permalink] [raw]
Subject: Re: fsl_ifc_nand: are blank pages protected by ECC?

On Thu, 20 Apr 2017 13:40:57 +0200
Pavel Machek <[email protected]> wrote:

> Hi!
>
> > > Would it make sense to only do hweight if *bitmap != ~0ULL ? Would it
> > > make sense to only check for bitflips > bitflips_threshold each 128
> > > bytes or something like that?
> >
> > I didn't go as far as you did and simply assumed hweight32/64() were
> > already optimized. Feel free to propose extra improvements.
>
> I'd propose this one (only compile tested, sorry, not sure how to test
> this one). If we see ~0UL, there's no need for hweight, and no need to
> check number of bitflips. So this should be net win.

Looks good to me. Can you send a patch with a real commit message?

Thanks,

Boris

>
> Signed-off-by: Pavel Machek <[email protected]>
>
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index b0524f8..96c27ec 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -1357,7 +1357,10 @@ static int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
>
> for (; len >= sizeof(long);
> len -= sizeof(long), bitmap += sizeof(long)) {
> - weight = hweight_long(*((unsigned long *)bitmap));
> + unsigned long d = *((unsigned long *)bitmap);
> + if (d == ~0UL)
> + continue;
> + weight = hweight_long(d);
> bitflips += BITS_PER_LONG - weight;
> if (unlikely(bitflips > bitflips_threshold))
> return -EBADMSG;
>

2017-04-21 10:08:20

by Pavel Machek

[permalink] [raw]
Subject: Re: fsl_ifc_nand: are blank pages protected by ECC?

Hi!

(Added driver author to the cc list, maybe he can help).

> > Hi!
> >
> > We have some problems with fsl_ifc_nand ... in the old kernels, but
> > this one does not seem to be fixed in v4.11, either.
> >
> > UBIFS complains:
> >
> > UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630
> > UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB 282:252630
> > UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from LEB 282:252630
> > UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed
> >
> > Possible explanation is here:
> >
> > https://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/289605
> >
> > # I see on the forum that this issue has been raised before - my
> > # understanding is that the omap2 nand driver does not perform ECC
> > # detection/correction on empty pages so when UBIFS checks the empty
> > # space data and doesn't read all 0xFF then it fails and mounts
> > # read-only. I didn't find any good solution - only a workaround to
> > # remove the UBIFS check..
> >
> > So I checked fsl_ifc_nand.c in v4.11-rc, and yes, it seems to have the
> > same problem:
> >
> > if (errors == 15) {
> > /*
> > * Uncorrectable error.
> > * OK only if the whole page is blank.
> > *
> > * We disable ECCER reporting due to...
> > * erratum IFC-A002770 -- so report it now if we
> > * see an uncorrectable error in ECCSTAT.
> > */
> > if (!is_blank(mtd, bufnum))
> > ctrl->nand_stat |=
> > IFC_NAND_EVTER_STAT_ECCER;
> > break;
> > }
> >
> > is_blank() checks for all 0xff's, so single-bit 0xfe in the data will
> > result in_blank() == 0 and uncorrectable error being signaled.
> >
> > Should the driver be modified somehow?
>
> Yep, nand_check_erased_ecc_chunk() [1] is here to help you check this
> case, unfortunately, it's not directly applicable here, because this
> function takes regular pointers and not __iomem ones. You'll either
> have to copy the data in an intermediate buffer before calling
> nand_check_erased_ecc_chunk(), or cast the SRAM region to a void
> pointer (which is usually not a good idea). The last option would be to
> open code nand_check_erased_ecc_chunk(), but I'd really like to avoid
> that (for maintainability concerns).

Ok, took a look. __iomem is part of a problem, another part is that
nand_check_erased_ecc_chunk() needs to actually write back 0xff's to
undo the corruption, which would probably be bad idea to do in the
iomem, and next one is that blank actually checks arbitrary number of
regions, based on ecc.layout.

So this could be used to simplify the code (if nand_check_erased_buf
was exported; it is not), but it does not fix the problem as we still
need to undo the corruption.

Hints welcome, especially if you know right place where to put this
checking.

(BTW, switching to ecc.mode = ECC_SOFT will cause compatibility
problems but should make the problem go away, right?)

Thanks,
Pavel

diff --git a/drivers/mtd/nand/fsl_ifc_nand.c b/drivers/mtd/nand/fsl_ifc_nand.c
index d1570f5..df02d4c 100644
--- a/drivers/mtd/nand/fsl_ifc_nand.c
+++ b/drivers/mtd/nand/fsl_ifc_nand.c
@@ -181,17 +181,15 @@ static int is_blank(struct mtd_info *mtd, unsigned int bufnum)
struct mtd_oob_region oobregion = { };
int i, section = 0;

- for (i = 0; i < mtd->writesize / 4; i++) {
- if (__raw_readl(&mainarea[i]) != 0xffffffff)
- return 0;
- }
+ i = nand_check_erased_buf(&mainarea[i], mtd->writesize, 0);
+ if (i)
+ return 0;

mtd_ooblayout_ecc(mtd, section++, &oobregion);
while (oobregion.length) {
- for (i = 0; i < oobregion.length; i++) {
- if (__raw_readb(&oob[oobregion.offset + i]) != 0xff)
- return 0;
- }
+ i = nand_check_erased_buf(&oob[oobregion.offset], oobregion.length, 0);
+ if (i)
+ return 0;

mtd_ooblayout_ecc(mtd, section++, &oobregion);
}




--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Attachments:
(No filename) (4.30 kB)
signature.asc (181.00 B)
Digital signature
Download all attachments

2017-04-21 10:12:22

by Richard Weinberger

[permalink] [raw]
Subject: Re: fsl_ifc_nand: are blank pages protected by ECC?

Pavel,

Am 21.04.2017 um 12:08 schrieb Pavel Machek:
> (BTW, switching to ecc.mode = ECC_SOFT will cause compatibility
> problems but should make the problem go away, right?)

Yes and it is slow.
So, fixing the driver is the way to go. :-)

Thanks,
//richard

2017-04-21 10:51:11

by Pavel Machek

[permalink] [raw]
Subject: [PATCH] nand_base: optimize checking of erased buffers


If we see ~0UL in flash, there's no need for hweight, and no need to
check number of bitflips. So this should be net win.

Signed-off-by: Pavel Machek <[email protected]>

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index b0524f8..96c27ec 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -1357,7 +1357,10 @@ static int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)

for (; len >= sizeof(long);
len -= sizeof(long), bitmap += sizeof(long)) {
- weight = hweight_long(*((unsigned long *)bitmap));
+ unsigned long d = *((unsigned long *)bitmap);
+ if (d == ~0UL)
+ continue;
+ weight = hweight_long(d);
bitflips += BITS_PER_LONG - weight;
if (unlikely(bitflips > bitflips_threshold))
return -EBADMSG;

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Attachments:
(No filename) (937.00 B)
signature.asc (181.00 B)
Digital signature
Download all attachments

2017-04-21 12:04:25

by Boris Brezillon

[permalink] [raw]
Subject: Re: fsl_ifc_nand: are blank pages protected by ECC?

On Fri, 21 Apr 2017 12:08:13 +0200
Pavel Machek <[email protected]> wrote:

> Hi!
>
> (Added driver author to the cc list, maybe he can help).
>
> > > Hi!
> > >
> > > We have some problems with fsl_ifc_nand ... in the old kernels, but
> > > this one does not seem to be fixed in v4.11, either.
> > >
> > > UBIFS complains:
> > >
> > > UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630
> > > UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB 282:252630
> > > UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from LEB 282:252630
> > > UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed
> > >
> > > Possible explanation is here:
> > >
> > > https://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/289605
> > >
> > > # I see on the forum that this issue has been raised before - my
> > > # understanding is that the omap2 nand driver does not perform ECC
> > > # detection/correction on empty pages so when UBIFS checks the empty
> > > # space data and doesn't read all 0xFF then it fails and mounts
> > > # read-only. I didn't find any good solution - only a workaround to
> > > # remove the UBIFS check..
> > >
> > > So I checked fsl_ifc_nand.c in v4.11-rc, and yes, it seems to have the
> > > same problem:
> > >
> > > if (errors == 15) {
> > > /*
> > > * Uncorrectable error.
> > > * OK only if the whole page is blank.
> > > *
> > > * We disable ECCER reporting due to...
> > > * erratum IFC-A002770 -- so report it now if we
> > > * see an uncorrectable error in ECCSTAT.
> > > */
> > > if (!is_blank(mtd, bufnum))
> > > ctrl->nand_stat |=
> > > IFC_NAND_EVTER_STAT_ECCER;
> > > break;
> > > }
> > >
> > > is_blank() checks for all 0xff's, so single-bit 0xfe in the data will
> > > result in_blank() == 0 and uncorrectable error being signaled.
> > >
> > > Should the driver be modified somehow?
> >
> > Yep, nand_check_erased_ecc_chunk() [1] is here to help you check this
> > case, unfortunately, it's not directly applicable here, because this
> > function takes regular pointers and not __iomem ones. You'll either
> > have to copy the data in an intermediate buffer before calling
> > nand_check_erased_ecc_chunk(), or cast the SRAM region to a void
> > pointer (which is usually not a good idea). The last option would be to
> > open code nand_check_erased_ecc_chunk(), but I'd really like to avoid
> > that (for maintainability concerns).
>
> Ok, took a look. __iomem is part of a problem, another part is that
> nand_check_erased_ecc_chunk() needs to actually write back 0xff's to
> undo the corruption, which would probably be bad idea to do in the
> iomem, and next one is that blank actually checks arbitrary number of
> regions, based on ecc.layout.
>
> So this could be used to simplify the code (if nand_check_erased_buf
> was exported; it is not), but it does not fix the problem as we still
> need to undo the corruption.

Actually, there was a good reason for not directly exporting this
buffer (see Brian's comment here [1]), and I don't think we should start
exporting it. This and the fact that passing an iomem pointer sounds
like a bad idea makes me think you should modify the driver to put the
data in a buffer when you want to check for bitflips in erased pages.

>
> Hints welcome, especially if you know right place where to put this
> checking.

Just had a quick look at the driver, and it seems like you could move
things around to check for bitflips in erased pages after you've copied
the data in the user buffer (in fsl_ifc_read_page()).

>
> (BTW, switching to ecc.mode = ECC_SOFT will cause compatibility
> problems but should make the problem go away, right?)

Nope, I don't think switching to ECC_SOFT is the right solution here.

Regards,

Boris

[1]https://patchwork.ozlabs.org/patch/509970/

2017-04-21 13:37:26

by Pavel Machek

[permalink] [raw]
Subject: Re: fsl_ifc_nand: are blank pages protected by ECC?

Hi!

> (Added driver author to the cc list, maybe he can help).

> > > UBIFS complains:
> > >
> > > UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630
> > > UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB 282:252630
> > > UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from LEB 282:252630
> > > UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed
...
> > > is_blank() checks for all 0xff's, so single-bit 0xfe in the data will
> > > result in_blank() == 0 and uncorrectable error being signaled.
> > >
> > > Should the driver be modified somehow?
> >
> > Yep, nand_check_erased_ecc_chunk() [1] is here to help you check this
> > case, unfortunately, it's not directly applicable here, because this

Maybe I figured it out. Unfortunately, it is only compile tested. Does
it look approximately right?

Thanks,
Pavel

--- a/drivers/mtd/nand/fsl_ifc_nand.c
+++ b/drivers/mtd/nand/fsl_ifc_nand.c
@@ -171,32 +171,6 @@ static void set_addr(struct mtd_info *mtd, int column, int page_addr, int oob)
ifc_nand_ctrl->index += mtd->writesize;
}

-static int is_blank(struct mtd_info *mtd, unsigned int bufnum)
-{
- struct nand_chip *chip = mtd_to_nand(mtd);
- struct fsl_ifc_mtd *priv = nand_get_controller_data(chip);
- u8 __iomem *addr = priv->vbase + bufnum * (mtd->writesize * 2);
- u32 __iomem *mainarea = (u32 __iomem *)addr;
- u8 __iomem *oob = addr + mtd->writesize;
- struct mtd_oob_region oobregion = { };
- int i, section = 0;
-
- i = nand_check_erased_buf(&mainarea[i], mtd->writesize, 0);
- if (i)
- return 0;
-
- mtd_ooblayout_ecc(mtd, section++, &oobregion);
- while (oobregion.length) {
- i = nand_check_erased_buf(&oob[oobregion.offset], oobregion.length, 0);
- if (i)
- return 0;
-
- mtd_ooblayout_ecc(mtd, section++, &oobregion);
- }
-
- return 1;
-}
-
/* returns nonzero if entire page is blank */
static int check_read_ecc(struct mtd_info *mtd, struct fsl_ifc_ctrl *ctrl,
u32 *eccstat, unsigned int bufnum)
@@ -272,16 +246,14 @@ static void fsl_ifc_run_command(struct mtd_info *mtd)
if (errors == 15) {
/*
* Uncorrectable error.
- * OK only if the whole page is blank.
+ * We'll check for blank pages later.
*
* We disable ECCER reporting due to...
* erratum IFC-A002770 -- so report it now if we
* see an uncorrectable error in ECCSTAT.
*/
- if (!is_blank(mtd, bufnum))
- ctrl->nand_stat |=
- IFC_NAND_EVTER_STAT_ECCER;
- break;
+ ctrl->nand_stat |= IFC_NAND_EVTER_STAT_ECCER;
+ printk("NAND flash read: error but continuing.\n");
}

mtd->ecc_stats.corrected += errors;
@@ -676,6 +648,40 @@ static int fsl_ifc_wait(struct mtd_info *mtd, struct nand_chip *chip)
return nand_fsr | NAND_STATUS_WP;
}

+/*
+ * The controller does not check for bitflips in erased pages,
+ * therefore software must check instead.
+ */
+static int check_erased_page(struct nand_chip *chip, u8 *buf)
+{
+ struct mtd_info *mtd = nand_to_mtd(chip);
+ u8 *ecc = chip->oob_poi;
+ const int ecc_size = chip->ecc.bytes;
+ const int pkt_size = chip->ecc.size;
+ int i, res, bitflips = 0;
+ struct mtd_oob_region oobregion = { };
+
+ mtd_ooblayout_ecc(mtd, 0, &oobregion);
+ ecc += oobregion.offset;
+
+ for (i = 0; i < chip->ecc.steps; ++i) {
+ res = nand_check_erased_ecc_chunk(buf, pkt_size, ecc, ecc_size,
+ NULL, 0,
+ chip->ecc.strength);
+ if (res < 0)
+ mtd->ecc_stats.failed++;
+
+ bitflips = max(res, bitflips);
+ buf += pkt_size;
+ ecc += ecc_size;
+ }
+
+ mtd_ooblayout_ecc(mtd, 1, &oobregion);
+ BUG_ON(oobregion.length);
+
+ return bitflips;
+}
+
static int fsl_ifc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
uint8_t *buf, int oob_required, int page)
{
@@ -687,11 +693,23 @@ static int fsl_ifc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
if (oob_required)
fsl_ifc_read_buf(mtd, chip->oob_poi, mtd->oobsize);

- if (ctrl->nand_stat & IFC_NAND_EVTER_STAT_ECCER)
- dev_err(priv->dev, "NAND Flash ECC Uncorrectable Error\n");
-
if (ctrl->nand_stat != IFC_NAND_EVTER_STAT_OPC)
mtd->ecc_stats.failed++;
+
+ if (ctrl->nand_stat & IFC_NAND_EVTER_STAT_ECCER) {
+ int res;
+
+ if (!oob_required)
+ fsl_ifc_read_buf(mtd, chip->oob_poi, mtd->oobsize);
+
+ printk("NAND flash: have error (%d)\n", nctrl->max_bitflips);
+ res = check_erased_page(chip, buf);
+ printk("NAND flash: but erased page says %d\n", res);
+
+ dev_err(priv->dev, "NAND Flash ECC Uncorrectable Error or erased page\n");
+ return res;
+ }
+

return nctrl->max_bitflips;
}

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Attachments:
(No filename) (4.62 kB)
signature.asc (181.00 B)
Digital signature
Download all attachments

2017-04-21 13:49:21

by Boris Brezillon

[permalink] [raw]
Subject: Re: fsl_ifc_nand: are blank pages protected by ECC?

On Fri, 21 Apr 2017 15:37:21 +0200
Pavel Machek <[email protected]> wrote:

> Hi!
>
> > (Added driver author to the cc list, maybe he can help).
>
> > > > UBIFS complains:
> > > >
> > > > UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630
> > > > UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB 282:252630
> > > > UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from LEB 282:252630
> > > > UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed
> ...
> > > > is_blank() checks for all 0xff's, so single-bit 0xfe in the data will
> > > > result in_blank() == 0 and uncorrectable error being signaled.
> > > >
> > > > Should the driver be modified somehow?
> > >
> > > Yep, nand_check_erased_ecc_chunk() [1] is here to help you check this
> > > case, unfortunately, it's not directly applicable here, because this
>
> Maybe I figured it out. Unfortunately, it is only compile tested. Does
> it look approximately right?

Yep that's definitely better. Just one thing missing (see below),
otherwise it looks good.

>
> Thanks,
> Pavel
>
> --- a/drivers/mtd/nand/fsl_ifc_nand.c
> +++ b/drivers/mtd/nand/fsl_ifc_nand.c
> @@ -171,32 +171,6 @@ static void set_addr(struct mtd_info *mtd, int column, int page_addr, int oob)
> ifc_nand_ctrl->index += mtd->writesize;
> }
>
> -static int is_blank(struct mtd_info *mtd, unsigned int bufnum)
> -{
> - struct nand_chip *chip = mtd_to_nand(mtd);
> - struct fsl_ifc_mtd *priv = nand_get_controller_data(chip);
> - u8 __iomem *addr = priv->vbase + bufnum * (mtd->writesize * 2);
> - u32 __iomem *mainarea = (u32 __iomem *)addr;
> - u8 __iomem *oob = addr + mtd->writesize;
> - struct mtd_oob_region oobregion = { };
> - int i, section = 0;
> -
> - i = nand_check_erased_buf(&mainarea[i], mtd->writesize, 0);
> - if (i)
> - return 0;
> -
> - mtd_ooblayout_ecc(mtd, section++, &oobregion);
> - while (oobregion.length) {
> - i = nand_check_erased_buf(&oob[oobregion.offset], oobregion.length, 0);
> - if (i)
> - return 0;
> -
> - mtd_ooblayout_ecc(mtd, section++, &oobregion);
> - }
> -
> - return 1;
> -}
> -
> /* returns nonzero if entire page is blank */
> static int check_read_ecc(struct mtd_info *mtd, struct fsl_ifc_ctrl *ctrl,
> u32 *eccstat, unsigned int bufnum)
> @@ -272,16 +246,14 @@ static void fsl_ifc_run_command(struct mtd_info *mtd)
> if (errors == 15) {
> /*
> * Uncorrectable error.
> - * OK only if the whole page is blank.
> + * We'll check for blank pages later.
> *
> * We disable ECCER reporting due to...
> * erratum IFC-A002770 -- so report it now if we
> * see an uncorrectable error in ECCSTAT.
> */
> - if (!is_blank(mtd, bufnum))
> - ctrl->nand_stat |=
> - IFC_NAND_EVTER_STAT_ECCER;
> - break;
> + ctrl->nand_stat |= IFC_NAND_EVTER_STAT_ECCER;
> + printk("NAND flash read: error but continuing.\n");
> }
>
> mtd->ecc_stats.corrected += errors;
> @@ -676,6 +648,40 @@ static int fsl_ifc_wait(struct mtd_info *mtd, struct nand_chip *chip)
> return nand_fsr | NAND_STATUS_WP;
> }
>
> +/*
> + * The controller does not check for bitflips in erased pages,
> + * therefore software must check instead.
> + */
> +static int check_erased_page(struct nand_chip *chip, u8 *buf)
> +{
> + struct mtd_info *mtd = nand_to_mtd(chip);
> + u8 *ecc = chip->oob_poi;
> + const int ecc_size = chip->ecc.bytes;
> + const int pkt_size = chip->ecc.size;
> + int i, res, bitflips = 0;
> + struct mtd_oob_region oobregion = { };
> +
> + mtd_ooblayout_ecc(mtd, 0, &oobregion);
> + ecc += oobregion.offset;
> +
> + for (i = 0; i < chip->ecc.steps; ++i) {
> + res = nand_check_erased_ecc_chunk(buf, pkt_size, ecc, ecc_size,
> + NULL, 0,
> + chip->ecc.strength);
> + if (res < 0)
> + mtd->ecc_stats.failed++;

else
mtd->ecc_stats.corrected += res;


> +
> + bitflips = max(res, bitflips);
> + buf += pkt_size;
> + ecc += ecc_size;
> + }
> +
> + mtd_ooblayout_ecc(mtd, 1, &oobregion);
> + BUG_ON(oobregion.length);

Probably something you should check at registration time only.

> +
> + return bitflips;
> +}
> +
> static int fsl_ifc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
> uint8_t *buf, int oob_required, int page)
> {
> @@ -687,11 +693,23 @@ static int fsl_ifc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
> if (oob_required)
> fsl_ifc_read_buf(mtd, chip->oob_poi, mtd->oobsize);
>
> - if (ctrl->nand_stat & IFC_NAND_EVTER_STAT_ECCER)
> - dev_err(priv->dev, "NAND Flash ECC Uncorrectable Error\n");
> -
> if (ctrl->nand_stat != IFC_NAND_EVTER_STAT_OPC)
> mtd->ecc_stats.failed++;
> +
> + if (ctrl->nand_stat & IFC_NAND_EVTER_STAT_ECCER) {
> + int res;
> +
> + if (!oob_required)
> + fsl_ifc_read_buf(mtd, chip->oob_poi, mtd->oobsize);
> +
> + printk("NAND flash: have error (%d)\n", nctrl->max_bitflips);
> + res = check_erased_page(chip, buf);
> + printk("NAND flash: but erased page says %d\n", res);
> +
> + dev_err(priv->dev, "NAND Flash ECC Uncorrectable Error or erased page\n");
> + return res;
> + }
> +
>
> return nctrl->max_bitflips;
> }
>

2017-04-22 07:02:06

by Pavel Machek

[permalink] [raw]
Subject: Re: fsl_ifc_nand: are blank pages protected by ECC?

Hi!

> > Maybe I figured it out. Unfortunately, it is only compile tested. Does
> > it look approximately right?
>
> Yep that's definitely better. Just one thing missing (see below),
> otherwise it looks good.

Thanks for review. Unfortunately it is still untested, so...

> > + if (res < 0)
> > + mtd->ecc_stats.failed++;
>
> else
> mtd->ecc_stats.corrected += res;

Ok, I copied this from tango_nand. I'll submit a patch to fix it
there...

> > +
> > + bitflips = max(res, bitflips);
> > + buf += pkt_size;
> > + ecc += ecc_size;
> > + }
> > +
> > + mtd_ooblayout_ecc(mtd, 1, &oobregion);
> > + BUG_ON(oobregion.length);
>
> Probably something you should check at registration time only.

Drive defensively, buy a tank ;-). Ok, I'll delete this one in final
version.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Attachments:
(No filename) (934.00 B)
signature.asc (181.00 B)
Digital signature
Download all attachments

2017-04-22 10:40:40

by Pavel Machek

[permalink] [raw]
Subject: [PATCH] tango_nand.c: fix ecc.stats_corrected in empty flash case

Fix ecc.stats_corrected in empty flash case.

Signed-off-by: Pavel Machek <[email protected]>

---

This was suggested by Boris Brezillon in another context. Not tested;
I don't have the hardware.

diff --git a/drivers/mtd/nand/tango_nand.c b/drivers/mtd/nand/tango_nand.c
index 4a5e948..db4bff4 100644
--- a/drivers/mtd/nand/tango_nand.c
+++ b/drivers/mtd/nand/tango_nand.c
@@ -193,6 +193,8 @@ static int check_erased_page(struct nand_chip *chip, u8 *buf)
chip->ecc.strength);
if (res < 0)
mtd->ecc_stats.failed++;
+ else
+ mtd->ecc_stats.corrected += res;

bitflips = max(res, bitflips);
buf += pkt_size;

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Attachments:
(No filename) (766.00 B)
signature.asc (181.00 B)
Digital signature
Download all attachments

2017-04-23 09:58:58

by Pavel Machek

[permalink] [raw]
Subject: tango_nand: is logic right in error cases? (was Re: fsl_ifc_nand: are blank pages protected by ECC?)

Hi!

> > Maybe I figured it out. Unfortunately, it is only compile tested. Does
> > it look approximately right?
>
> Yep that's definitely better. Just one thing missing (see below),
> otherwise it looks good.

I'm copying from tango_nand, therefore I had to check tango_nand, too.

static int check_erased_page(struct nand_chip *chip, u8 *buf)
{
...
res = nand_check_erased_ecc_chunk(buf, pkt_size, ecc, ecc_size,
meta, meta_len,
chip->ecc.strength);
if (res < 0)
mtd->ecc_stats.failed++;
else
mtd->ecc_stats.corrected += res;

bitflips = max(res, bitflips);
...
return bitflips;
}

static int tango_read_page(struct mtd_info *mtd, struct nand_chip *chip,
u8 *buf, int oob_required, int page)
{
...
res = decode_error_report(nfc);
if (res < 0) {
chip->ecc.read_oob_raw(mtd, chip, page);
res = check_erased_page(chip, buf);
}

return res;
}


So nand_check_erased_ecc_chunk() returns < 0 (failed ECC), but then we
perform max() with bitflips (lets say 1, correctable ECC) and return
1? tango_read_page then returns 1 (correctable ECC) forgetting about
failed ECC...?

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Attachments:
(No filename) (1.51 kB)
signature.asc (181.00 B)
Digital signature
Download all attachments

2017-04-24 07:13:09

by Boris Brezillon

[permalink] [raw]
Subject: Re: tango_nand: is logic right in error cases? (was Re: fsl_ifc_nand: are blank pages protected by ECC?)

On Sun, 23 Apr 2017 11:58:45 +0200
Pavel Machek <[email protected]> wrote:

> Hi!
>
> > > Maybe I figured it out. Unfortunately, it is only compile tested. Does
> > > it look approximately right?
> >
> > Yep that's definitely better. Just one thing missing (see below),
> > otherwise it looks good.
>
> I'm copying from tango_nand, therefore I had to check tango_nand, too.
>
> static int check_erased_page(struct nand_chip *chip, u8 *buf)
> {
> ...
> res = nand_check_erased_ecc_chunk(buf, pkt_size, ecc, ecc_size,
> meta, meta_len,
> chip->ecc.strength);
> if (res < 0)
> mtd->ecc_stats.failed++;
> else
> mtd->ecc_stats.corrected += res;
>
> bitflips = max(res, bitflips);
> ...
> return bitflips;
> }
>
> static int tango_read_page(struct mtd_info *mtd, struct nand_chip *chip,
> u8 *buf, int oob_required, int page)
> {
> ...
> res = decode_error_report(nfc);
> if (res < 0) {
> chip->ecc.read_oob_raw(mtd, chip, page);
> res = check_erased_page(chip, buf);
> }
>
> return res;
> }
>
>
> So nand_check_erased_ecc_chunk() returns < 0 (failed ECC), but then we
> perform max() with bitflips (lets say 1, correctable ECC) and return
> 1? tango_read_page then returns 1 (correctable ECC) forgetting about
> failed ECC...?

Yep, that's expected, see what's done in the core to detect
uncorrectable errors and return EBADMSG when appropriate [1].

[1]http://lxr.free-electrons.com/source/drivers/mtd/nand/nand_base.c#L2033