2015-07-30 17:35:05

by Boris Brezillon

[permalink] [raw]
Subject: [RFC PATCH 0/2] mtd: nand: properly handle bitflips in erased pages

Hi Brian,

This is the patch series I promised a while ago.
The implementation should be generic enough to be used by controller
drivers.
The only problem I can see is for the GPMI controller which might produce
non byte aligned ECC data, and the nand_check_erased helpers are only
working with byte aligned buffers.

Best Regards,

Boris

Boris Brezillon (2):
mtd: nand: add nand_check_erased helper functions
mtd: nand: use nand_check_erased_ecc_chunk in default ECC read
functions

drivers/mtd/nand/nand_base.c | 167 ++++++++++++++++++++++++++++++++++++++++---
include/linux/mtd/nand.h | 8 +++
2 files changed, 167 insertions(+), 8 deletions(-)

--
1.9.1


2015-07-30 17:35:06

by Boris Brezillon

[permalink] [raw]
Subject: [RFC PATCH 1/2] mtd: nand: add nand_check_erased helper functions

Add two helper functions to help NAND controller drivers test whether a
specific NAND region is erased or not.

Signed-off-by: Boris Brezillon <[email protected]>
---
drivers/mtd/nand/nand_base.c | 104 +++++++++++++++++++++++++++++++++++++++++++
include/linux/mtd/nand.h | 8 ++++
2 files changed, 112 insertions(+)

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index ceb68ca..1542ea7 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -1101,6 +1101,110 @@ out:
EXPORT_SYMBOL(nand_lock);

/**
+ * nand_check_erased_buf - check if a buffer contains (almost) only 0xff data
+ * @buf: buffer to test
+ * @len: buffer length
+ * @bitflips_threshold:maximum number of bitflips
+ *
+ * Check if a buffer contains only 0xff, which means the underlying region
+ * has been erased and is ready to be programmed.
+ * The bitflips_threshold specify the maximum number of bitflips before
+ * considering the region is not erased.
+ * Note: The logic of this function has been extracted from the memweight
+ * implementation, except that nand_check_erased_buf function exit before
+ * testing the whole buffer if the number of bitflips exceed the
+ * bitflips_threshold value.
+ *
+ * Returns a positive number of bitflips or -ERROR_CODE.
+ */
+int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
+{
+ const unsigned char *bitmap = buf;
+ int bitflips = 0;
+ int weight;
+ int longs;
+
+ for (; len && ((unsigned long)bitmap) % sizeof(long);
+ len--, bitmap++) {
+ weight = hweight8(*bitmap);
+
+ bitflips += sizeof(u8) - weight;
+ if (bitflips > bitflips_threshold)
+ return -EINVAL;
+ }
+
+
+ for (longs = len / sizeof(long); longs;
+ longs--, bitmap += sizeof(long)) {
+ BUG_ON(longs >= INT_MAX / BITS_PER_LONG);
+ weight = hweight_long(*((unsigned long *)bitmap));
+
+ bitflips += sizeof(long) - weight;
+ if (bitflips > bitflips_threshold)
+ return -EINVAL;
+ }
+
+ len %= sizeof(long);
+
+ for (; len > 0; len--, bitmap++) {
+ weight = hweight8(*bitmap);
+ bitflips += sizeof(u8) - weight;
+ }
+
+ return bitflips;
+}
+EXPORT_SYMBOL(nand_check_erased_buf);
+
+/**
+ * nand_check_erased_ecc_chunk - check if an ECC chunk contains (almost) only
+ * 0xff data
+ * @data: data buffer to test
+ * @datalen: data length
+ * @ecc: ECC buffer
+ * @ecclen: ECC length
+ * @extraoob: extra OOB buffer
+ * @extraooblen: extra OOB length
+ * @bitflips_threshold: maximum number of bitflips
+ *
+ * Check if a data buffer and its associated ECC and OOB data contains only
+ * 0xff pattern, which means the underlying region has been erased and is
+ * ready to be programmed.
+ * The bitflips_threshold specify the maximum number of bitflips before
+ * considering the region as not erased.
+ *
+ * Returns a positive number of bitflips or -ERROR_CODE.
+ */
+int nand_check_erased_ecc_chunk(void *data, int datalen,
+ void *ecc, int ecclen,
+ void *extraoob, int extraooblen,
+ int bitflips_threshold)
+{
+ int bitflips = 0;
+ int ret;
+
+ ret = nand_check_erased_buf(data, datalen, bitflips_threshold);
+ if (ret < 0)
+ return ret;
+
+ bitflips += ret;
+ bitflips_threshold -= ret;
+
+ ret = nand_check_erased_buf(ecc, ecclen, bitflips_threshold);
+ if (ret < 0)
+ return ret;
+
+ bitflips += ret;
+ bitflips_threshold -= ret;
+
+ ret = nand_check_erased_buf(extraoob, extraooblen, bitflips_threshold);
+ if (ret < 0)
+ return ret;
+
+ return bitflips + ret;
+}
+EXPORT_SYMBOL(nand_check_erased_ecc_chunk);
+
+/**
* nand_read_page_raw - [INTERN] read raw page data without ecc
* @mtd: mtd info structure
* @chip: nand chip info structure
diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
index 272f429..ae06a07 100644
--- a/include/linux/mtd/nand.h
+++ b/include/linux/mtd/nand.h
@@ -1030,4 +1030,12 @@ struct nand_sdr_timings {

/* get timing characteristics from ONFI timing mode. */
const struct nand_sdr_timings *onfi_async_timing_mode_to_sdr_timings(int mode);
+
+int nand_check_erased_buf(void *data, int datalen,
+ int threshold);
+
+int nand_check_erased_ecc_chunk(void *data, int datalen,
+ void *ecc, int ecclen,
+ void *extraoob, int extraooblen,
+ int threshold);
#endif /* __LINUX_MTD_NAND_H */
--
1.9.1

2015-07-30 17:35:07

by Boris Brezillon

[permalink] [raw]
Subject: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions

The default NAND read functions are relying on an underlying controller
to correct bitflips, but some of those controller cannot properly fix
bitflips in erased pages.
In case of ECC failures, check if the page of subpage is empty before
reporting an ECC failure.

Signed-off-by: Boris Brezillon <[email protected]>
---
drivers/mtd/nand/nand_base.c | 63 ++++++++++++++++++++++++++++++++++++++------
1 file changed, 55 insertions(+), 8 deletions(-)

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 1542ea7..d527c73 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -1396,6 +1396,19 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip,
stat = chip->ecc.correct(mtd, p,
&chip->buffers->ecccode[i], &chip->buffers->ecccalc[i]);
if (stat < 0) {
+ /* check for empty pages with bitflips */
+ int col = (int)(p - bufpoi);
+
+ chip->cmdfunc(mtd, NAND_CMD_RNDOUT, col, -1);
+ chip->read_buf(mtd, p, chip->ecc.size);
+ stat = nand_check_erased_ecc_chunk(p, chip->ecc.size,
+ &chip->buffers->ecccode[i],
+ chip->ecc.bytes,
+ NULL, 0,
+ chip->ecc.strength);
+ }
+
+ if (stat < 0) {
mtd->ecc_stats.failed++;
} else {
mtd->ecc_stats.corrected += stat;
@@ -1445,6 +1458,16 @@ static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,

stat = chip->ecc.correct(mtd, p, &ecc_code[i], &ecc_calc[i]);
if (stat < 0) {
+ /* check for empty pages with bitflips */
+ chip->cmdfunc(mtd, NAND_CMD_RNDOUT, i, -1);
+ chip->read_buf(mtd, p, eccsize);
+ stat = nand_check_erased_ecc_chunk(p, eccsize,
+ &ecc_code[i], eccbytes,
+ NULL, 0,
+ chip->ecc.strength);
+ }
+
+ if (stat < 0) {
mtd->ecc_stats.failed++;
} else {
mtd->ecc_stats.corrected += stat;
@@ -1495,7 +1518,18 @@ static int nand_read_page_hwecc_oob_first(struct mtd_info *mtd,
chip->read_buf(mtd, p, eccsize);
chip->ecc.calculate(mtd, p, &ecc_calc[i]);

- stat = chip->ecc.correct(mtd, p, &ecc_code[i], NULL);
+ stat = chip->ecc.correct(mtd, p, &ecc_code[i], &ecc_calc[i]);
+ if (stat < 0) {
+ /* check for empty pages with bitflips */
+ chip->cmdfunc(mtd, NAND_CMD_RNDOUT,
+ i + mtd->oobsize, -1);
+ chip->read_buf(mtd, p, eccsize);
+ stat = nand_check_erased_ecc_chunk(p, eccsize,
+ &ecc_code[i], eccbytes,
+ NULL, 0,
+ chip->ecc.strength);
+ }
+
if (stat < 0) {
mtd->ecc_stats.failed++;
} else {
@@ -1523,6 +1557,8 @@ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
int i, eccsize = chip->ecc.size;
int eccbytes = chip->ecc.bytes;
int eccsteps = chip->ecc.steps;
+ int eccstepsize = eccsize + eccbytes + chip->ecc.prepad +
+ chip->ecc.postpad;
uint8_t *p = buf;
uint8_t *oob = chip->oob_poi;
unsigned int max_bitflips = 0;
@@ -1542,19 +1578,30 @@ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
chip->read_buf(mtd, oob, eccbytes);
stat = chip->ecc.correct(mtd, p, oob, NULL);

- if (stat < 0) {
- mtd->ecc_stats.failed++;
- } else {
- mtd->ecc_stats.corrected += stat;
- max_bitflips = max_t(unsigned int, max_bitflips, stat);
- }
-
oob += eccbytes;

if (chip->ecc.postpad) {
chip->read_buf(mtd, oob, chip->ecc.postpad);
oob += chip->ecc.postpad;
}
+
+ if (stat < 0) {
+ /* check for empty pages with bitflips */
+ chip->cmdfunc(mtd, NAND_CMD_RNDOUT, i * eccstepsize, -1);
+ chip->read_buf(mtd, p, chip->ecc.size);
+ stat = nand_check_erased_ecc_chunk(p, chip->ecc.size,
+ oob - eccstepsize,
+ eccstepsize,
+ NULL, 0,
+ chip->ecc.strength);
+ }
+
+ if (stat < 0) {
+ mtd->ecc_stats.failed++;
+ } else {
+ mtd->ecc_stats.corrected += stat;
+ max_bitflips = max_t(unsigned int, max_bitflips, stat);
+ }
}

/* Calculate remaining oob bytes */
--
1.9.1

2015-07-31 07:10:43

by Boris Brezillon

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] mtd: nand: add nand_check_erased helper functions

On Thu, 30 Jul 2015 19:34:53 +0200
Boris Brezillon <[email protected]> wrote:

> Add two helper functions to help NAND controller drivers test whether a
> specific NAND region is erased or not.
>
> Signed-off-by: Boris Brezillon <[email protected]>
> ---
> drivers/mtd/nand/nand_base.c | 104 +++++++++++++++++++++++++++++++++++++++++++
> include/linux/mtd/nand.h | 8 ++++
> 2 files changed, 112 insertions(+)
>
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index ceb68ca..1542ea7 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -1101,6 +1101,110 @@ out:
> EXPORT_SYMBOL(nand_lock);
>
> /**
> + * nand_check_erased_buf - check if a buffer contains (almost) only 0xff data
> + * @buf: buffer to test
> + * @len: buffer length
> + * @bitflips_threshold:maximum number of bitflips
> + *
> + * Check if a buffer contains only 0xff, which means the underlying region
> + * has been erased and is ready to be programmed.
> + * The bitflips_threshold specify the maximum number of bitflips before
> + * considering the region is not erased.
> + * Note: The logic of this function has been extracted from the memweight
> + * implementation, except that nand_check_erased_buf function exit before
> + * testing the whole buffer if the number of bitflips exceed the
> + * bitflips_threshold value.
> + *
> + * Returns a positive number of bitflips or -ERROR_CODE.
> + */
> +int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
> +{
> + const unsigned char *bitmap = buf;
> + int bitflips = 0;
> + int weight;
> + int longs;
> +
> + for (; len && ((unsigned long)bitmap) % sizeof(long);
> + len--, bitmap++) {
> + weight = hweight8(*bitmap);
> +
> + bitflips += sizeof(u8) - weight;
> + if (bitflips > bitflips_threshold)
> + return -EINVAL;
> + }
> +
> +
> + for (longs = len / sizeof(long); longs;
> + longs--, bitmap += sizeof(long)) {
> + BUG_ON(longs >= INT_MAX / BITS_PER_LONG);
> + weight = hweight_long(*((unsigned long *)bitmap));
> +
> + bitflips += sizeof(long) - weight;
> + if (bitflips > bitflips_threshold)
> + return -EINVAL;
> + }
> +
> + len %= sizeof(long);
> +
> + for (; len > 0; len--, bitmap++) {
> + weight = hweight8(*bitmap);
> + bitflips += sizeof(u8) - weight;
> + }
> +
> + return bitflips;
> +}
> +EXPORT_SYMBOL(nand_check_erased_buf);
> +
> +/**
> + * nand_check_erased_ecc_chunk - check if an ECC chunk contains (almost) only
> + * 0xff data
> + * @data: data buffer to test
> + * @datalen: data length
> + * @ecc: ECC buffer
> + * @ecclen: ECC length
> + * @extraoob: extra OOB buffer
> + * @extraooblen: extra OOB length
> + * @bitflips_threshold: maximum number of bitflips
> + *
> + * Check if a data buffer and its associated ECC and OOB data contains only
> + * 0xff pattern, which means the underlying region has been erased and is
> + * ready to be programmed.
> + * The bitflips_threshold specify the maximum number of bitflips before
> + * considering the region as not erased.
> + *
> + * Returns a positive number of bitflips or -ERROR_CODE.
> + */
> +int nand_check_erased_ecc_chunk(void *data, int datalen,
> + void *ecc, int ecclen,
> + void *extraoob, int extraooblen,
> + int bitflips_threshold)
> +{
> + int bitflips = 0;
> + int ret;
> +
> + ret = nand_check_erased_buf(data, datalen, bitflips_threshold);
> + if (ret < 0)
> + return ret;
> +
> + bitflips += ret;
> + bitflips_threshold -= ret;
> +
> + ret = nand_check_erased_buf(ecc, ecclen, bitflips_threshold);
> + if (ret < 0)
> + return ret;
> +
> + bitflips += ret;
> + bitflips_threshold -= ret;
> +
> + ret = nand_check_erased_buf(extraoob, extraooblen, bitflips_threshold);
> + if (ret < 0)
> + return ret;
> +

Forgot the memset operations here:

memset(data, 0xff, datalen);
memset(ecc, 0xff, ecclen);
memset(extraoob, 0xff, extraooblen);

> + return bitflips + ret;
> +}
> +EXPORT_SYMBOL(nand_check_erased_ecc_chunk);
> +
> +/**
> * nand_read_page_raw - [INTERN] read raw page data without ecc
> * @mtd: mtd info structure
> * @chip: nand chip info structure
> diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
> index 272f429..ae06a07 100644
> --- a/include/linux/mtd/nand.h
> +++ b/include/linux/mtd/nand.h
> @@ -1030,4 +1030,12 @@ struct nand_sdr_timings {
>
> /* get timing characteristics from ONFI timing mode. */
> const struct nand_sdr_timings *onfi_async_timing_mode_to_sdr_timings(int mode);
> +
> +int nand_check_erased_buf(void *data, int datalen,
> + int threshold);
> +
> +int nand_check_erased_ecc_chunk(void *data, int datalen,
> + void *ecc, int ecclen,
> + void *extraoob, int extraooblen,
> + int threshold);
> #endif /* __LINUX_MTD_NAND_H */



--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

2015-07-31 10:06:42

by Andrea Scian

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] mtd: nand: add nand_check_erased helper functions


Dear Boris,

thanks for pointing this out again.

I'm on the same topic too, using iMX6 (I'll try to test you patch on the
next days, if I found some spare time, unfortunately I got a 3.10
kernel, so I think the patch will not apply cleanly :-( ).

See my comment below (and on the next mail too)

Il 31/07/2015 09:10, Boris Brezillon ha scritto:
> On Thu, 30 Jul 2015 19:34:53 +0200
> Boris Brezillon <[email protected]> wrote:
>
>> Add two helper functions to help NAND controller drivers test whether a
>> specific NAND region is erased or not.
>>
>> Signed-off-by: Boris Brezillon <[email protected]>
>> ---
>> drivers/mtd/nand/nand_base.c | 104 +++++++++++++++++++++++++++++++++++++++++++
>> include/linux/mtd/nand.h | 8 ++++
>> 2 files changed, 112 insertions(+)
>>
>> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
>> index ceb68ca..1542ea7 100644
>> --- a/drivers/mtd/nand/nand_base.c
>> +++ b/drivers/mtd/nand/nand_base.c
>> @@ -1101,6 +1101,110 @@ out:
>> EXPORT_SYMBOL(nand_lock);
>>
>> /**
>> + * nand_check_erased_buf - check if a buffer contains (almost) only 0xff data
>> + * @buf: buffer to test
>> + * @len: buffer length
>> + * @bitflips_threshold:maximum number of bitflips
>> + *
>> + * Check if a buffer contains only 0xff, which means the underlying region
>> + * has been erased and is ready to be programmed.
>> + * The bitflips_threshold specify the maximum number of bitflips before
>> + * considering the region is not erased.
>> + * Note: The logic of this function has been extracted from the memweight
>> + * implementation, except that nand_check_erased_buf function exit before
>> + * testing the whole buffer if the number of bitflips exceed the
>> + * bitflips_threshold value.
>> + *
>> + * Returns a positive number of bitflips or -ERROR_CODE.
>> + */
>> +int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
>> +{
>> + const unsigned char *bitmap = buf;
>> + int bitflips = 0;
>> + int weight;
>> + int longs;
>> +
>> + for (; len && ((unsigned long)bitmap) % sizeof(long);
>> + len--, bitmap++) {
>> + weight = hweight8(*bitmap);
>> +
>> + bitflips += sizeof(u8) - weight;
>> + if (bitflips > bitflips_threshold)
>> + return -EINVAL;

I think it's better to do something like:

if (UNLIKELY(bitflips > bitflips_threshold))
return -EINVAL;

isn't it? :-)
(the same for the other if)


>> + }
>> +
>> +
>> + for (longs = len / sizeof(long); longs;
>> + longs--, bitmap += sizeof(long)) {
>> + BUG_ON(longs >= INT_MAX / BITS_PER_LONG);
>> + weight = hweight_long(*((unsigned long *)bitmap));
>> +
>> + bitflips += sizeof(long) - weight;
>> + if (bitflips > bitflips_threshold)
>> + return -EINVAL;
>> + }
>> +
>> + len %= sizeof(long);
>> +
>> + for (; len > 0; len--, bitmap++) {
>> + weight = hweight8(*bitmap);
>> + bitflips += sizeof(u8) - weight;
>> + }
>> +
>> + return bitflips;
>> +}
>> +EXPORT_SYMBOL(nand_check_erased_buf);
>> +
>> +/**
>> + * nand_check_erased_ecc_chunk - check if an ECC chunk contains (almost) only
>> + * 0xff data
>> + * @data: data buffer to test
>> + * @datalen: data length
>> + * @ecc: ECC buffer
>> + * @ecclen: ECC length
>> + * @extraoob: extra OOB buffer
>> + * @extraooblen: extra OOB length
>> + * @bitflips_threshold: maximum number of bitflips
>> + *
>> + * Check if a data buffer and its associated ECC and OOB data contains only
>> + * 0xff pattern, which means the underlying region has been erased and is
>> + * ready to be programmed.
>> + * The bitflips_threshold specify the maximum number of bitflips before
>> + * considering the region as not erased.
>> + *
>> + * Returns a positive number of bitflips or -ERROR_CODE.
>> + */
>> +int nand_check_erased_ecc_chunk(void *data, int datalen,
>> + void *ecc, int ecclen,
>> + void *extraoob, int extraooblen,
>> + int bitflips_threshold)
>> +{
>> + int bitflips = 0;
>> + int ret;
>> +
>> + ret = nand_check_erased_buf(data, datalen, bitflips_threshold);
>> + if (ret < 0)
>> + return ret;
>> +
>> + bitflips += ret;
>> + bitflips_threshold -= ret;
>> +
>> + ret = nand_check_erased_buf(ecc, ecclen, bitflips_threshold);
>> + if (ret < 0)
>> + return ret;
>> +
>> + bitflips += ret;
>> + bitflips_threshold -= ret;
>> +
>> + ret = nand_check_erased_buf(extraoob, extraooblen, bitflips_threshold);
>> + if (ret < 0)
>> + return ret;
>> +
>
> Forgot the memset operations here:
>
> memset(data, 0xff, datalen);
> memset(ecc, 0xff, ecclen);
> memset(extraoob, 0xff, extraooblen);

Yes, you're right.. I did the same mistake on my first implementation
too ;-)

As additional optimization you may also check if the lower layer already
did the check for you (e.g. if you have an iMXQP as we saw in latest
days), but I think it's a minor one, because you'll face this situation
very very unlikely.

--

Andrea SCIAN

DAVE Embedded Systems

2015-07-31 10:07:26

by Andrea Scian

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions


Dear Boris,


Il 30/07/2015 19:34, Boris Brezillon ha scritto:
> The default NAND read functions are relying on an underlying controller
> to correct bitflips, but some of those controller cannot properly fix
> bitflips in erased pages.
> In case of ECC failures, check if the page of subpage is empty before
> reporting an ECC failure.

I'm still wondering if chip->ecc.strength is the right threshold.

Did you see my comments here [1]? WDYT?

Maybe we can have this discussion in a separate thread, if you want ;-)

Kind Regards,

--

Andrea SCIAN

DAVE Embedded Systems

[1] http://lists.infradead.org/pipermail/linux-mtd/2015-July/060520.html

>
> Signed-off-by: Boris Brezillon <[email protected]>
> ---
> drivers/mtd/nand/nand_base.c | 63 ++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 55 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index 1542ea7..d527c73 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -1396,6 +1396,19 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip,
> stat = chip->ecc.correct(mtd, p,
> &chip->buffers->ecccode[i], &chip->buffers->ecccalc[i]);
> if (stat < 0) {
> + /* check for empty pages with bitflips */
> + int col = (int)(p - bufpoi);
> +
> + chip->cmdfunc(mtd, NAND_CMD_RNDOUT, col, -1);
> + chip->read_buf(mtd, p, chip->ecc.size);
> + stat = nand_check_erased_ecc_chunk(p, chip->ecc.size,
> + &chip->buffers->ecccode[i],
> + chip->ecc.bytes,
> + NULL, 0,
> + chip->ecc.strength);
> + }
> +
> + if (stat < 0) {
> mtd->ecc_stats.failed++;
> } else {
> mtd->ecc_stats.corrected += stat;
> @@ -1445,6 +1458,16 @@ static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
>
> stat = chip->ecc.correct(mtd, p, &ecc_code[i], &ecc_calc[i]);
> if (stat < 0) {
> + /* check for empty pages with bitflips */
> + chip->cmdfunc(mtd, NAND_CMD_RNDOUT, i, -1);
> + chip->read_buf(mtd, p, eccsize);
> + stat = nand_check_erased_ecc_chunk(p, eccsize,
> + &ecc_code[i], eccbytes,
> + NULL, 0,
> + chip->ecc.strength);
> + }
> +
> + if (stat < 0) {
> mtd->ecc_stats.failed++;
> } else {
> mtd->ecc_stats.corrected += stat;
> @@ -1495,7 +1518,18 @@ static int nand_read_page_hwecc_oob_first(struct mtd_info *mtd,
> chip->read_buf(mtd, p, eccsize);
> chip->ecc.calculate(mtd, p, &ecc_calc[i]);
>
> - stat = chip->ecc.correct(mtd, p, &ecc_code[i], NULL);
> + stat = chip->ecc.correct(mtd, p, &ecc_code[i], &ecc_calc[i]);
> + if (stat < 0) {
> + /* check for empty pages with bitflips */
> + chip->cmdfunc(mtd, NAND_CMD_RNDOUT,
> + i + mtd->oobsize, -1);
> + chip->read_buf(mtd, p, eccsize);
> + stat = nand_check_erased_ecc_chunk(p, eccsize,
> + &ecc_code[i], eccbytes,
> + NULL, 0,
> + chip->ecc.strength);
> + }
> +
> if (stat < 0) {
> mtd->ecc_stats.failed++;
> } else {
> @@ -1523,6 +1557,8 @@ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
> int i, eccsize = chip->ecc.size;
> int eccbytes = chip->ecc.bytes;
> int eccsteps = chip->ecc.steps;
> + int eccstepsize = eccsize + eccbytes + chip->ecc.prepad +
> + chip->ecc.postpad;
> uint8_t *p = buf;
> uint8_t *oob = chip->oob_poi;
> unsigned int max_bitflips = 0;
> @@ -1542,19 +1578,30 @@ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
> chip->read_buf(mtd, oob, eccbytes);
> stat = chip->ecc.correct(mtd, p, oob, NULL);
>
> - if (stat < 0) {
> - mtd->ecc_stats.failed++;
> - } else {
> - mtd->ecc_stats.corrected += stat;
> - max_bitflips = max_t(unsigned int, max_bitflips, stat);
> - }
> -
> oob += eccbytes;
>
> if (chip->ecc.postpad) {
> chip->read_buf(mtd, oob, chip->ecc.postpad);
> oob += chip->ecc.postpad;
> }
> +
> + if (stat < 0) {
> + /* check for empty pages with bitflips */
> + chip->cmdfunc(mtd, NAND_CMD_RNDOUT, i * eccstepsize, -1);
> + chip->read_buf(mtd, p, chip->ecc.size);
> + stat = nand_check_erased_ecc_chunk(p, chip->ecc.size,
> + oob - eccstepsize,
> + eccstepsize,
> + NULL, 0,
> + chip->ecc.strength);
> + }
> +
> + if (stat < 0) {
> + mtd->ecc_stats.failed++;
> + } else {
> + mtd->ecc_stats.corrected += stat;
> + max_bitflips = max_t(unsigned int, max_bitflips, stat);
> + }
> }
>
> /* Calculate remaining oob bytes */
>

2015-07-31 10:22:05

by Boris Brezillon

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] mtd: nand: add nand_check_erased helper functions

Hi Andrea,

On Fri, 31 Jul 2015 12:06:32 +0200
Andrea Scian <[email protected]> wrote:

>
> Dear Boris,
>
> thanks for pointing this out again.
>
> I'm on the same topic too, using iMX6 (I'll try to test you patch on the
> next days, if I found some spare time, unfortunately I got a 3.10
> kernel, so I think the patch will not apply cleanly :-( ).
>
> See my comment below (and on the next mail too)
>
> Il 31/07/2015 09:10, Boris Brezillon ha scritto:
> > On Thu, 30 Jul 2015 19:34:53 +0200
> > Boris Brezillon <[email protected]> wrote:
> >
> >> Add two helper functions to help NAND controller drivers test whether a
> >> specific NAND region is erased or not.
> >>
> >> Signed-off-by: Boris Brezillon <[email protected]>
> >> ---
> >> drivers/mtd/nand/nand_base.c | 104 +++++++++++++++++++++++++++++++++++++++++++
> >> include/linux/mtd/nand.h | 8 ++++
> >> 2 files changed, 112 insertions(+)
> >>
> >> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> >> index ceb68ca..1542ea7 100644
> >> --- a/drivers/mtd/nand/nand_base.c
> >> +++ b/drivers/mtd/nand/nand_base.c
> >> @@ -1101,6 +1101,110 @@ out:
> >> EXPORT_SYMBOL(nand_lock);
> >>
> >> /**
> >> + * nand_check_erased_buf - check if a buffer contains (almost) only 0xff data
> >> + * @buf: buffer to test
> >> + * @len: buffer length
> >> + * @bitflips_threshold:maximum number of bitflips
> >> + *
> >> + * Check if a buffer contains only 0xff, which means the underlying region
> >> + * has been erased and is ready to be programmed.
> >> + * The bitflips_threshold specify the maximum number of bitflips before
> >> + * considering the region is not erased.
> >> + * Note: The logic of this function has been extracted from the memweight
> >> + * implementation, except that nand_check_erased_buf function exit before
> >> + * testing the whole buffer if the number of bitflips exceed the
> >> + * bitflips_threshold value.
> >> + *
> >> + * Returns a positive number of bitflips or -ERROR_CODE.
> >> + */
> >> +int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
> >> +{
> >> + const unsigned char *bitmap = buf;
> >> + int bitflips = 0;
> >> + int weight;
> >> + int longs;
> >> +
> >> + for (; len && ((unsigned long)bitmap) % sizeof(long);
> >> + len--, bitmap++) {
> >> + weight = hweight8(*bitmap);
> >> +
> >> + bitflips += sizeof(u8) - weight;
> >> + if (bitflips > bitflips_threshold)
> >> + return -EINVAL;
>
> I think it's better to do something like:
>
> if (UNLIKELY(bitflips > bitflips_threshold))
> return -EINVAL;
>
> isn't it? :-)
> (the same for the other if)

Maybe, or maybe not. It depends on whether you expect to have a lot of
corrupted pages or a lot of blank pages with bitflips ;-).
Anyway, I'm not opposed to this change.

>
>
> >> + }
> >> +
> >> +
> >> + for (longs = len / sizeof(long); longs;
> >> + longs--, bitmap += sizeof(long)) {
> >> + BUG_ON(longs >= INT_MAX / BITS_PER_LONG);
> >> + weight = hweight_long(*((unsigned long *)bitmap));
> >> +
> >> + bitflips += sizeof(long) - weight;
> >> + if (bitflips > bitflips_threshold)
> >> + return -EINVAL;
> >> + }
> >> +
> >> + len %= sizeof(long);
> >> +
> >> + for (; len > 0; len--, bitmap++) {
> >> + weight = hweight8(*bitmap);
> >> + bitflips += sizeof(u8) - weight;
> >> + }
> >> +
> >> + return bitflips;
> >> +}
> >> +EXPORT_SYMBOL(nand_check_erased_buf);
> >> +
> >> +/**
> >> + * nand_check_erased_ecc_chunk - check if an ECC chunk contains (almost) only
> >> + * 0xff data
> >> + * @data: data buffer to test
> >> + * @datalen: data length
> >> + * @ecc: ECC buffer
> >> + * @ecclen: ECC length
> >> + * @extraoob: extra OOB buffer
> >> + * @extraooblen: extra OOB length
> >> + * @bitflips_threshold: maximum number of bitflips
> >> + *
> >> + * Check if a data buffer and its associated ECC and OOB data contains only
> >> + * 0xff pattern, which means the underlying region has been erased and is
> >> + * ready to be programmed.
> >> + * The bitflips_threshold specify the maximum number of bitflips before
> >> + * considering the region as not erased.
> >> + *
> >> + * Returns a positive number of bitflips or -ERROR_CODE.
> >> + */
> >> +int nand_check_erased_ecc_chunk(void *data, int datalen,
> >> + void *ecc, int ecclen,
> >> + void *extraoob, int extraooblen,
> >> + int bitflips_threshold)
> >> +{
> >> + int bitflips = 0;
> >> + int ret;
> >> +
> >> + ret = nand_check_erased_buf(data, datalen, bitflips_threshold);
> >> + if (ret < 0)
> >> + return ret;
> >> +
> >> + bitflips += ret;
> >> + bitflips_threshold -= ret;
> >> +
> >> + ret = nand_check_erased_buf(ecc, ecclen, bitflips_threshold);
> >> + if (ret < 0)
> >> + return ret;
> >> +
> >> + bitflips += ret;
> >> + bitflips_threshold -= ret;
> >> +
> >> + ret = nand_check_erased_buf(extraoob, extraooblen, bitflips_threshold);
> >> + if (ret < 0)
> >> + return ret;
> >> +
> >
> > Forgot the memset operations here:
> >
> > memset(data, 0xff, datalen);
> > memset(ecc, 0xff, ecclen);
> > memset(extraoob, 0xff, extraooblen);
>
> Yes, you're right.. I did the same mistake on my first implementation
> too ;-)

Hehe.

>
> As additional optimization you may also check if the lower layer already
> did the check for you (e.g. if you have an iMXQP as we saw in latest
> days), but I think it's a minor one, because you'll face this situation
> very very unlikely.

If the hardware is capable of doing such test (I mean counting the
number of bits to one and considering the page as erased under a given
limit of bitflips), there's a lot of chance it will implement its own
ecc_read_page function, and will never use this helper.

Best Regards,

Boris

--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

2015-07-31 10:32:25

by Boris Brezillon

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions

Hi Andrea,

Adding Han in Cc.

On Fri, 31 Jul 2015 12:07:21 +0200
Andrea Scian <[email protected]> wrote:

>
> Dear Boris,
>
>
> Il 30/07/2015 19:34, Boris Brezillon ha scritto:
> > The default NAND read functions are relying on an underlying controller
> > to correct bitflips, but some of those controller cannot properly fix
> > bitflips in erased pages.
> > In case of ECC failures, check if the page of subpage is empty before
> > reporting an ECC failure.
>
> I'm still wondering if chip->ecc.strength is the right threshold.
>
> Did you see my comments here [1]? WDYT?

Yes I've read it, and decided to go for ecc->strength as a first
step (I'm more interested in discussing the approach than the threshold
value right now ;-)).

Anyway, as you pointed out in the thread, writing data on an erased
page already containing some bitflips might generate even more
bitflips, so using a different threshold for the erased page check
makes sense. This threshold should definitely be correlated to the ECC
strength, but how, that's the question.

How about taking a rather conservative value like 10% of the specified
ECC strength, and see how it goes.

>
> Maybe we can have this discussion in a separate thread, if you want ;-)

No, I think we should keep discussing it in this thread.

Thanks,

Boris


--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

2015-07-31 13:29:26

by Andrea Scian

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] mtd: nand: add nand_check_erased helper functions


Boris,

Il 31/07/2015 12:21, Boris Brezillon ha scritto:
> Hi Andrea,
>
> On Fri, 31 Jul 2015 12:06:32 +0200
> Andrea Scian <[email protected]> wrote:
>
>>
>> Dear Boris,
>>
>> thanks for pointing this out again.
>>
>> I'm on the same topic too, using iMX6 (I'll try to test you patch on the
>> next days, if I found some spare time, unfortunately I got a 3.10
>> kernel, so I think the patch will not apply cleanly :-( ).
>>
>> See my comment below (and on the next mail too)
>>
>> Il 31/07/2015 09:10, Boris Brezillon ha scritto:
>>> On Thu, 30 Jul 2015 19:34:53 +0200
>>> Boris Brezillon <[email protected]> wrote:
>>>
>>>> Add two helper functions to help NAND controller drivers test whether a
>>>> specific NAND region is erased or not.
>>>>
>>>> Signed-off-by: Boris Brezillon <[email protected]>
>>>> ---
>>>> drivers/mtd/nand/nand_base.c | 104 +++++++++++++++++++++++++++++++++++++++++++
>>>> include/linux/mtd/nand.h | 8 ++++
>>>> 2 files changed, 112 insertions(+)
>>>>
>>>> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
>>>> index ceb68ca..1542ea7 100644
>>>> --- a/drivers/mtd/nand/nand_base.c
>>>> +++ b/drivers/mtd/nand/nand_base.c
>>>> @@ -1101,6 +1101,110 @@ out:
>>>> EXPORT_SYMBOL(nand_lock);
>>>>
>>>> /**
>>>> + * nand_check_erased_buf - check if a buffer contains (almost) only 0xff data
>>>> + * @buf: buffer to test
>>>> + * @len: buffer length
>>>> + * @bitflips_threshold:maximum number of bitflips
>>>> + *
>>>> + * Check if a buffer contains only 0xff, which means the underlying region
>>>> + * has been erased and is ready to be programmed.
>>>> + * The bitflips_threshold specify the maximum number of bitflips before
>>>> + * considering the region is not erased.
>>>> + * Note: The logic of this function has been extracted from the memweight
>>>> + * implementation, except that nand_check_erased_buf function exit before
>>>> + * testing the whole buffer if the number of bitflips exceed the
>>>> + * bitflips_threshold value.
>>>> + *
>>>> + * Returns a positive number of bitflips or -ERROR_CODE.
>>>> + */
>>>> +int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
>>>> +{
>>>> + const unsigned char *bitmap = buf;
>>>> + int bitflips = 0;
>>>> + int weight;
>>>> + int longs;
>>>> +
>>>> + for (; len && ((unsigned long)bitmap) % sizeof(long);
>>>> + len--, bitmap++) {
>>>> + weight = hweight8(*bitmap);
>>>> +
>>>> + bitflips += sizeof(u8) - weight;
>>>> + if (bitflips > bitflips_threshold)
>>>> + return -EINVAL;
>>
>> I think it's better to do something like:
>>
>> if (UNLIKELY(bitflips > bitflips_threshold))
>> return -EINVAL;
>>
>> isn't it? :-)
>> (the same for the other if)
>
> Maybe, or maybe not. It depends on whether you expect to have a lot of
> corrupted pages or a lot of blank pages with bitflips ;-).
> Anyway, I'm not opposed to this change.

I think that everything implemented inside the MTD stack
(NAND/MTD/UBI/UBIFS) should lead to a "working" solid state device, that
do not show any uncorrectable bitflips.
Uncorrectable pages, IMO, should happens, on stable systems, only in
some rare case, because it means that you loss some data (or power
during erase/write. Any other case?).

What is more frequent is that bitflips > mtd->bitflip_threshold (by
default DIV_ROUND_UP(mtd->ecc_strength * 3, 4)), which should avoid
bitflips > ecc_strength

>>
>>
>>>> + }
>>>> +
>>>> +
>>>> + for (longs = len / sizeof(long); longs;
>>>> + longs--, bitmap += sizeof(long)) {
>>>> + BUG_ON(longs >= INT_MAX / BITS_PER_LONG);
>>>> + weight = hweight_long(*((unsigned long *)bitmap));
>>>> +
>>>> + bitflips += sizeof(long) - weight;
>>>> + if (bitflips > bitflips_threshold)
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> + len %= sizeof(long);
>>>> +
>>>> + for (; len > 0; len--, bitmap++) {
>>>> + weight = hweight8(*bitmap);
>>>> + bitflips += sizeof(u8) - weight;
>>>> + }
>>>> +
>>>> + return bitflips;
>>>> +}
>>>> +EXPORT_SYMBOL(nand_check_erased_buf);
>>>> +
>>>> +/**
>>>> + * nand_check_erased_ecc_chunk - check if an ECC chunk contains (almost) only
>>>> + * 0xff data
>>>> + * @data: data buffer to test
>>>> + * @datalen: data length
>>>> + * @ecc: ECC buffer
>>>> + * @ecclen: ECC length
>>>> + * @extraoob: extra OOB buffer
>>>> + * @extraooblen: extra OOB length
>>>> + * @bitflips_threshold: maximum number of bitflips
>>>> + *
>>>> + * Check if a data buffer and its associated ECC and OOB data contains only
>>>> + * 0xff pattern, which means the underlying region has been erased and is
>>>> + * ready to be programmed.
>>>> + * The bitflips_threshold specify the maximum number of bitflips before
>>>> + * considering the region as not erased.
>>>> + *
>>>> + * Returns a positive number of bitflips or -ERROR_CODE.
>>>> + */
>>>> +int nand_check_erased_ecc_chunk(void *data, int datalen,
>>>> + void *ecc, int ecclen,
>>>> + void *extraoob, int extraooblen,
>>>> + int bitflips_threshold)
>>>> +{
>>>> + int bitflips = 0;
>>>> + int ret;
>>>> +
>>>> + ret = nand_check_erased_buf(data, datalen, bitflips_threshold);
>>>> + if (ret < 0)
>>>> + return ret;
>>>> +
>>>> + bitflips += ret;
>>>> + bitflips_threshold -= ret;
>>>> +
>>>> + ret = nand_check_erased_buf(ecc, ecclen, bitflips_threshold);
>>>> + if (ret < 0)
>>>> + return ret;
>>>> +
>>>> + bitflips += ret;
>>>> + bitflips_threshold -= ret;
>>>> +
>>>> + ret = nand_check_erased_buf(extraoob, extraooblen, bitflips_threshold);
>>>> + if (ret < 0)
>>>> + return ret;
>>>> +
>>>
>>> Forgot the memset operations here:
>>>
>>> memset(data, 0xff, datalen);
>>> memset(ecc, 0xff, ecclen);
>>> memset(extraoob, 0xff, extraooblen);
>>
>> Yes, you're right.. I did the same mistake on my first implementation
>> too ;-)
>
> Hehe.
>
>>
>> As additional optimization you may also check if the lower layer already
>> did the check for you (e.g. if you have an iMXQP as we saw in latest
>> days), but I think it's a minor one, because you'll face this situation
>> very very unlikely.
>
> If the hardware is capable of doing such test (I mean counting the
> number of bits to one and considering the page as erased under a given
> limit of bitflips), there's a lot of chance it will implement its own
> ecc_read_page function, and will never use this helper.
>

Ops.. I misunderstand your patch. I think it was something similar to
what Brian already proposed some time ago [1].
IIUC Brial solution works, out of the box, even with the ones that
override read_page callback, as I think most of current nand controller
do (please correct me if I'm wrong).
If we want to add erased block check to omap2.c, atmel_nand.c,
sh_flctl.c we have to modify them all.

I'm really not the right one to make such a decision ;-) but I think you
already thought about it and can tell me the pros and cons of your patch
vs the Brian's one.

What I understand up until now, is that Brian solution does not fit into
all weird stuff that we find in single NAND controller implementation
and this is where your solution come in handy. Am I wrong?

Kind Regards,

--

Andrea SCIAN

DAVE Embedded Systems

[1] http://article.gmane.org/gmane.linux.drivers.mtd/52216

2015-07-31 13:40:19

by Andrea Scian

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions


Boris,

Il 31/07/2015 12:32, Boris Brezillon ha scritto:
> Hi Andrea,
>
> Adding Han in Cc.
>
> On Fri, 31 Jul 2015 12:07:21 +0200
> Andrea Scian <[email protected]> wrote:
>
>>
>> Dear Boris,
>>
>>
>> Il 30/07/2015 19:34, Boris Brezillon ha scritto:
>>> The default NAND read functions are relying on an underlying controller
>>> to correct bitflips, but some of those controller cannot properly fix
>>> bitflips in erased pages.
>>> In case of ECC failures, check if the page of subpage is empty before
>>> reporting an ECC failure.
>>
>> I'm still wondering if chip->ecc.strength is the right threshold.
>>
>> Did you see my comments here [1]? WDYT?
>
> Yes I've read it, and decided to go for ecc->strength as a first
> step (I'm more interested in discussing the approach than the threshold
> value right now ;-)).

I perfectly understand, that's the reason why I ask if you want to move
to another thread ;-)

> Anyway, as you pointed out in the thread, writing data on an erased
> page already containing some bitflips might generate even more
> bitflips, so using a different threshold for the erased page check
> makes sense. This threshold should definitely be correlated to the ECC
> strength, but how, that's the question.
>
> How about taking a rather conservative value like 10% of the specified
> ECC strength, and see how it goes.

Yes, I think that there's no real way to get the right value, other than
feedbacks from on-field testing with various devices.

I'm also thinking about changing how a NAND page is written on the
device, now that we know that even erased page may have (too many!)
bitflips if they has not been so-freshly erased.

Read on NAND device is lot's faster that write, so maybe we can:

a) read the page before write it, check for bitflips on erased area and
write it only if it fit our threshold

b) read the page after write it and check if the bitflips are lower that
a give value

In this way:
- we can use ecc_strength as read threshold, because it fits all the
other NAND read

- we can use "something a bit lower than" mtd->bitflip_threshold on
read-before-write or read-after-write. If we don't do so the block will
be scrubbed next time we read it again (if we are lucky.. if we are
unlucky the block will have bitflip > ecc_strength!): IOW we did a write
that will trigger another erase/write cycle.

Am I misunderstanding something?


>> Maybe we can have this discussion in a separate thread, if you want ;-)
>
> No, I think we should keep discussing it in this thread.

OK

KR,

--

Andrea SCIAN

DAVE Embedded Systems

2015-07-31 13:58:38

by Boris Brezillon

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] mtd: nand: add nand_check_erased helper functions

On Fri, 31 Jul 2015 15:29:21 +0200
Andrea Scian <[email protected]> wrote:

> >>>> +int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
> >>>> +{
> >>>> + const unsigned char *bitmap = buf;
> >>>> + int bitflips = 0;
> >>>> + int weight;
> >>>> + int longs;
> >>>> +
> >>>> + for (; len && ((unsigned long)bitmap) % sizeof(long);
> >>>> + len--, bitmap++) {
> >>>> + weight = hweight8(*bitmap);
> >>>> +
> >>>> + bitflips += sizeof(u8) - weight;
> >>>> + if (bitflips > bitflips_threshold)
> >>>> + return -EINVAL;
> >>
> >> I think it's better to do something like:
> >>
> >> if (UNLIKELY(bitflips > bitflips_threshold))
> >> return -EINVAL;
> >>
> >> isn't it? :-)
> >> (the same for the other if)
> >
> > Maybe, or maybe not. It depends on whether you expect to have a lot of
> > corrupted pages or a lot of blank pages with bitflips ;-).
> > Anyway, I'm not opposed to this change.
>
> I think that everything implemented inside the MTD stack
> (NAND/MTD/UBI/UBIFS) should lead to a "working" solid state device, that
> do not show any uncorrectable bitflips.
> Uncorrectable pages, IMO, should happens, on stable systems, only in
> some rare case, because it means that you loss some data (or power
> during erase/write. Any other case?).
>
> What is more frequent is that bitflips > mtd->bitflip_threshold (by
> default DIV_ROUND_UP(mtd->ecc_strength * 3, 4)), which should avoid
> bitflips > ecc_strength

Yes, you're probably right. I'll add the unlikely keyword in the next
version.


> >>
> >> As additional optimization you may also check if the lower layer already
> >> did the check for you (e.g. if you have an iMXQP as we saw in latest
> >> days), but I think it's a minor one, because you'll face this situation
> >> very very unlikely.
> >
> > If the hardware is capable of doing such test (I mean counting the
> > number of bits to one and considering the page as erased under a given
> > limit of bitflips), there's a lot of chance it will implement its own
> > ecc_read_page function, and will never use this helper.
> >
>
> Ops.. I misunderstand your patch. I think it was something similar to
> what Brian already proposed some time ago [1].
> IIUC Brial solution works, out of the box, even with the ones that
> override read_page callback, as I think most of current nand controller
> do (please correct me if I'm wrong).
> If we want to add erased block check to omap2.c, atmel_nand.c,
> sh_flctl.c we have to modify them all.
>
> I'm really not the right one to make such a decision ;-) but I think you
> already thought about it and can tell me the pros and cons of your patch
> vs the Brian's one.
>
> What I understand up until now, is that Brian solution does not fit into
> all weird stuff that we find in single NAND controller implementation
> and this is where your solution come in handy. Am I wrong?

That's exactly why I'm proposing it as helper functions instead of
trying to apply the erased page check for everybody.
Here are a few missing things to make the test applicable to everybody:
- some controllers are not implementing the read_page_raw function and
thus checking ECC bytes is impossible
- some of them are not able to retrieve OOB bytes (or are only able to
retrieve a small portion of it)
- some controllers do not properly define the ECC layout, which makes
it impossible to check which ECC byte is assigned to which ECC chunk.
- some ECC controllers (like the GPMI one) do not align ECC data on a
byte, which renders the erased check even more complicated

The other reason to not enforce this test for everybody is that some
controllers generate 0xff ECC bytes for 0xff data (the sama5d4 NAND
controller does), which means they are fully capable of correcting
bitflips in erased pages by their own, and if they report a read
failure, there's no need to check for an empty page, this is a real
failure.

By only providing helper functions, we let the NAND controller
drivers decide how to check it, but we're still providing common
functions instead of duplicating the same code in all drivers.

Best Regards,

Boris

--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

2015-07-31 14:10:38

by Boris Brezillon

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions

On Fri, 31 Jul 2015 15:40:13 +0200
Andrea Scian <[email protected]> wrote:

>
> Boris,
>
> Il 31/07/2015 12:32, Boris Brezillon ha scritto:
> > Hi Andrea,
> >
> > Adding Han in Cc.
> >
> > On Fri, 31 Jul 2015 12:07:21 +0200
> > Andrea Scian <[email protected]> wrote:
> >
> >>
> >> Dear Boris,
> >>
> >>
> >> Il 30/07/2015 19:34, Boris Brezillon ha scritto:
> >>> The default NAND read functions are relying on an underlying controller
> >>> to correct bitflips, but some of those controller cannot properly fix
> >>> bitflips in erased pages.
> >>> In case of ECC failures, check if the page of subpage is empty before
> >>> reporting an ECC failure.
> >>
> >> I'm still wondering if chip->ecc.strength is the right threshold.
> >>
> >> Did you see my comments here [1]? WDYT?
> >
> > Yes I've read it, and decided to go for ecc->strength as a first
> > step (I'm more interested in discussing the approach than the threshold
> > value right now ;-)).
>
> I perfectly understand, that's the reason why I ask if you want to move
> to another thread ;-)
>
> > Anyway, as you pointed out in the thread, writing data on an erased
> > page already containing some bitflips might generate even more
> > bitflips, so using a different threshold for the erased page check
> > makes sense. This threshold should definitely be correlated to the ECC
> > strength, but how, that's the question.
> >
> > How about taking a rather conservative value like 10% of the specified
> > ECC strength, and see how it goes.
>
> Yes, I think that there's no real way to get the right value, other than
> feedbacks from on-field testing with various devices.
>
> I'm also thinking about changing how a NAND page is written on the
> device, now that we know that even erased page may have (too many!)
> bitflips if they has not been so-freshly erased.
>
> Read on NAND device is lot's faster that write, so maybe we can:
>
> a) read the page before write it, check for bitflips on erased area and
> write it only if it fit our threshold
>
> b) read the page after write it and check if the bitflips are lower that
> a give value
>
> In this way:
> - we can use ecc_strength as read threshold, because it fits all the
> other NAND read
>
> - we can use "something a bit lower than" mtd->bitflip_threshold on
> read-before-write or read-after-write. If we don't do so the block will
> be scrubbed next time we read it again (if we are lucky.. if we are
> unlucky the block will have bitflip > ecc_strength!): IOW we did a write
> that will trigger another erase/write cycle.
>
> Am I misunderstanding something?

Nope, but this implies doing an extra read after each write :-/

--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

2015-07-31 16:19:34

by Andrea Scian

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions

Il 31/07/2015 16:10, Boris Brezillon ha scritto:
> On Fri, 31 Jul 2015 15:40:13 +0200
> Andrea Scian <[email protected]> wrote:
>
>>
>> Boris,
>>
>> Il 31/07/2015 12:32, Boris Brezillon ha scritto:
>>> Hi Andrea,
>>>
>>> Adding Han in Cc.
>>>
>>> On Fri, 31 Jul 2015 12:07:21 +0200
>>> Andrea Scian <[email protected]> wrote:
>>>
>>>>
>>>> Dear Boris,
>>>>
>>>>
>>>> Il 30/07/2015 19:34, Boris Brezillon ha scritto:
>>>>> The default NAND read functions are relying on an underlying controller
>>>>> to correct bitflips, but some of those controller cannot properly fix
>>>>> bitflips in erased pages.
>>>>> In case of ECC failures, check if the page of subpage is empty before
>>>>> reporting an ECC failure.
>>>>
>>>> I'm still wondering if chip->ecc.strength is the right threshold.
>>>>
>>>> Did you see my comments here [1]? WDYT?
>>>
>>> Yes I've read it, and decided to go for ecc->strength as a first
>>> step (I'm more interested in discussing the approach than the threshold
>>> value right now ;-)).
>>
>> I perfectly understand, that's the reason why I ask if you want to move
>> to another thread ;-)
>>
>>> Anyway, as you pointed out in the thread, writing data on an erased
>>> page already containing some bitflips might generate even more
>>> bitflips, so using a different threshold for the erased page check
>>> makes sense. This threshold should definitely be correlated to the ECC
>>> strength, but how, that's the question.
>>>
>>> How about taking a rather conservative value like 10% of the specified
>>> ECC strength, and see how it goes.
>>
>> Yes, I think that there's no real way to get the right value, other than
>> feedbacks from on-field testing with various devices.
>>
>> I'm also thinking about changing how a NAND page is written on the
>> device, now that we know that even erased page may have (too many!)
>> bitflips if they has not been so-freshly erased.
>>
>> Read on NAND device is lot's faster that write, so maybe we can:
>>
>> a) read the page before write it, check for bitflips on erased area and
>> write it only if it fit our threshold
>>
>> b) read the page after write it and check if the bitflips are lower that
>> a give value
>>
>> In this way:
>> - we can use ecc_strength as read threshold, because it fits all the
>> other NAND read
>>
>> - we can use "something a bit lower than" mtd->bitflip_threshold on
>> read-before-write or read-after-write. If we don't do so the block will
>> be scrubbed next time we read it again (if we are lucky.. if we are
>> unlucky the block will have bitflip > ecc_strength!): IOW we did a write
>> that will trigger another erase/write cycle.
>>
>> Am I misunderstanding something?
>
> Nope, but this implies doing an extra read after each write :-/
>

Let's wait what the others says about this, but I would like to put some
numbers in it.

My micron MLC device says
- read page max 75 uS
- write page typ 1300uS, max 2600uS

If we implement read-before-write (which is, IMO, the best approach), in
the worst overhead we have is 1375uS vs 1300uS, which is ~6%.
Please note that, if you read a page that "is not suitable" for write,
you avoid the write time, schedule it for scrubbing, and use another
free page.

Probably I'm a bit optimistic because we also need to take in account
other latencies (DMA setup, ECC engine, buffer copies and so on) but
it's a starting point ;-)

KR,

--

Andrea SCIAN

DAVE Embedded Systems

2015-07-31 16:27:14

by Boris Brezillon

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions

On Fri, 31 Jul 2015 18:19:30 +0200
Andrea Scian <[email protected]> wrote:

> Il 31/07/2015 16:10, Boris Brezillon ha scritto:
> > On Fri, 31 Jul 2015 15:40:13 +0200
> > Andrea Scian <[email protected]> wrote:
> >
> >>
> >> Boris,
> >>
> >> Il 31/07/2015 12:32, Boris Brezillon ha scritto:
> >>> Hi Andrea,
> >>>
> >>> Adding Han in Cc.
> >>>
> >>> On Fri, 31 Jul 2015 12:07:21 +0200
> >>> Andrea Scian <[email protected]> wrote:
> >>>
> >>>>
> >>>> Dear Boris,
> >>>>
> >>>>
> >>>> Il 30/07/2015 19:34, Boris Brezillon ha scritto:
> >>>>> The default NAND read functions are relying on an underlying controller
> >>>>> to correct bitflips, but some of those controller cannot properly fix
> >>>>> bitflips in erased pages.
> >>>>> In case of ECC failures, check if the page of subpage is empty before
> >>>>> reporting an ECC failure.
> >>>>
> >>>> I'm still wondering if chip->ecc.strength is the right threshold.
> >>>>
> >>>> Did you see my comments here [1]? WDYT?
> >>>
> >>> Yes I've read it, and decided to go for ecc->strength as a first
> >>> step (I'm more interested in discussing the approach than the threshold
> >>> value right now ;-)).
> >>
> >> I perfectly understand, that's the reason why I ask if you want to move
> >> to another thread ;-)
> >>
> >>> Anyway, as you pointed out in the thread, writing data on an erased
> >>> page already containing some bitflips might generate even more
> >>> bitflips, so using a different threshold for the erased page check
> >>> makes sense. This threshold should definitely be correlated to the ECC
> >>> strength, but how, that's the question.
> >>>
> >>> How about taking a rather conservative value like 10% of the specified
> >>> ECC strength, and see how it goes.
> >>
> >> Yes, I think that there's no real way to get the right value, other than
> >> feedbacks from on-field testing with various devices.
> >>
> >> I'm also thinking about changing how a NAND page is written on the
> >> device, now that we know that even erased page may have (too many!)
> >> bitflips if they has not been so-freshly erased.
> >>
> >> Read on NAND device is lot's faster that write, so maybe we can:
> >>
> >> a) read the page before write it, check for bitflips on erased area and
> >> write it only if it fit our threshold
> >>
> >> b) read the page after write it and check if the bitflips are lower that
> >> a give value
> >>
> >> In this way:
> >> - we can use ecc_strength as read threshold, because it fits all the
> >> other NAND read
> >>
> >> - we can use "something a bit lower than" mtd->bitflip_threshold on
> >> read-before-write or read-after-write. If we don't do so the block will
> >> be scrubbed next time we read it again (if we are lucky.. if we are
> >> unlucky the block will have bitflip > ecc_strength!): IOW we did a write
> >> that will trigger another erase/write cycle.
> >>
> >> Am I misunderstanding something?
> >
> > Nope, but this implies doing an extra read after each write :-/
> >
>
> Let's wait what the others says about this, but I would like to put some
> numbers in it.

Sure.

>
> My micron MLC device says
> - read page max 75 uS
> - write page typ 1300uS, max 2600uS
>
> If we implement read-before-write (which is, IMO, the best approach), in
> the worst overhead we have is 1375uS vs 1300uS, which is ~6%.
> Please note that, if you read a page that "is not suitable" for write,
> you avoid the write time, schedule it for scrubbing, and use another
> free page.

Indeed, that's not such a big overhead.

>
> Probably I'm a bit optimistic because we also need to take in account
> other latencies (DMA setup, ECC engine, buffer copies and so on) but
> it's a starting point ;-)

Yep, if you test it, could you provide some speed test results (with
and without this solution).

And I wonder if we shouldn't do it the other way around, write then
read-back and check the content.
Of course this implies doing the extra write even when the erased page
contains too many bitflips, but at least your sure that the data you
put in the page were correct at that time.


--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

2015-08-03 11:16:10

by Andrea Scian

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions


Dear Boris,

Il 31/07/2015 18:27, Boris Brezillon ha scritto:
> On Fri, 31 Jul 2015 18:19:30 +0200
> Andrea Scian <[email protected]> wrote:
>
>> Il 31/07/2015 16:10, Boris Brezillon ha scritto:
>>> On Fri, 31 Jul 2015 15:40:13 +0200
>>> Andrea Scian <[email protected]> wrote:
>>>
>>>> Boris,
>>>>
>>>> Il 31/07/2015 12:32, Boris Brezillon ha scritto:
>>>>> Hi Andrea,
>>>>>
>>>>> Adding Han in Cc.
>>>>>
>>>>> On Fri, 31 Jul 2015 12:07:21 +0200
>>>>> Andrea Scian <[email protected]> wrote:
>>>>>
>>>>>> Dear Boris,
>>>>>>
>>>>>>
>>>>>> Il 30/07/2015 19:34, Boris Brezillon ha scritto:
>>>>>>> The default NAND read functions are relying on an underlying controller
>>>>>>> to correct bitflips, but some of those controller cannot properly fix
>>>>>>> bitflips in erased pages.
>>>>>>> In case of ECC failures, check if the page of subpage is empty before
>>>>>>> reporting an ECC failure.
>>>>>> I'm still wondering if chip->ecc.strength is the right threshold.
>>>>>>
>>>>>> Did you see my comments here [1]? WDYT?
>>>>> Yes I've read it, and decided to go for ecc->strength as a first
>>>>> step (I'm more interested in discussing the approach than the threshold
>>>>> value right now ;-)).
>>>> I perfectly understand, that's the reason why I ask if you want to move
>>>> to another thread ;-)
>>>>
>>>>> Anyway, as you pointed out in the thread, writing data on an erased
>>>>> page already containing some bitflips might generate even more
>>>>> bitflips, so using a different threshold for the erased page check
>>>>> makes sense. This threshold should definitely be correlated to the ECC
>>>>> strength, but how, that's the question.
>>>>>
>>>>> How about taking a rather conservative value like 10% of the specified
>>>>> ECC strength, and see how it goes.
>>>> Yes, I think that there's no real way to get the right value, other than
>>>> feedbacks from on-field testing with various devices.
>>>>
>>>> I'm also thinking about changing how a NAND page is written on the
>>>> device, now that we know that even erased page may have (too many!)
>>>> bitflips if they has not been so-freshly erased.
>>>>
>>>> Read on NAND device is lot's faster that write, so maybe we can:
>>>>
>>>> a) read the page before write it, check for bitflips on erased area and
>>>> write it only if it fit our threshold
>>>>
>>>> b) read the page after write it and check if the bitflips are lower that
>>>> a give value
>>>>
>>>> In this way:
>>>> - we can use ecc_strength as read threshold, because it fits all the
>>>> other NAND read
>>>>
>>>> - we can use "something a bit lower than" mtd->bitflip_threshold on
>>>> read-before-write or read-after-write. If we don't do so the block will
>>>> be scrubbed next time we read it again (if we are lucky.. if we are
>>>> unlucky the block will have bitflip > ecc_strength!): IOW we did a write
>>>> that will trigger another erase/write cycle.
>>>>
>>>> Am I misunderstanding something?
>>> Nope, but this implies doing an extra read after each write :-/
>>>
>> Let's wait what the others says about this, but I would like to put some
>> numbers in it.
> Sure.
>
>> My micron MLC device says
>> - read page max 75 uS
>> - write page typ 1300uS, max 2600uS
>>
>> If we implement read-before-write (which is, IMO, the best approach), in
>> the worst overhead we have is 1375uS vs 1300uS, which is ~6%.
>> Please note that, if you read a page that "is not suitable" for write,
>> you avoid the write time, schedule it for scrubbing, and use another
>> free page.
> Indeed, that's not such a big overhead.
>
>> Probably I'm a bit optimistic because we also need to take in account
>> other latencies (DMA setup, ECC engine, buffer copies and so on) but
>> it's a starting point ;-)
> Yep, if you test it, could you provide some speed test results (with
> and without this solution).

I think I can find some time to do some performance tests on real hardware.
Can you please help me in finding:
- which benchmark to use (currently I'm using bonnie++ on UBIFS, maybe I
can you just mtd_speedtest)
- where to implement those read

For the second point I think we can implement it a UBI or MTD level.
I think the former will allow us to easily schedule scrubbing and choose
another block to issue the write to. However I don't really know how to
implement it (I don't really know so much about the UBI code).

The latter, at least for me, is easier to implement: I think I can find
the place to add the page read on my own, anyway any clue is welcome ;-)
But I think it will be harder (or impossible) to choose where to issue
the write, unless UBI already do so when it saw an MTD write failure.

> And I wonder if we shouldn't do it the other way around, write then
> read-back and check the content.
> Of course this implies doing the extra write even when the erased page
> contains too many bitflips, but at least your sure that the data you
> put in the page were correct at that time.
>

You're right, I think this is something that once we can find inside the
MTD code (something like "check NAND written page" kconfig option) but I
cannot find this option anymore on latest kernel.

You're approach will also have another advantage, currently nearly all
platform will use software implementation for ECC check on erased
blocks, while nearly all of them has hardware ECC once programmed. Using
hardware ECC will remove CPU load and, maybe, be faster than call
nand_check_erased_ecc_chunk()
I also think that the situation of having failure on write is very
unlikely, unless you have a very "used" NAND device or you're not using
Richard's bitrot check. So we'll have a performance impact (issuing
another write) only when it's really needed.

Kind Regards,

--

Andrea SCIAN

DAVE Embedded Systems

2015-08-03 12:42:58

by Boris Brezillon

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions

Adding Artem and Richard in the loop.

On Mon, 3 Aug 2015 13:16:02 +0200
Andrea Scian <[email protected]> wrote:

>
> Dear Boris,
>
> Il 31/07/2015 18:27, Boris Brezillon ha scritto:
> > On Fri, 31 Jul 2015 18:19:30 +0200
> > Andrea Scian <[email protected]> wrote:
> >
> >> Il 31/07/2015 16:10, Boris Brezillon ha scritto:
> >>> On Fri, 31 Jul 2015 15:40:13 +0200
> >>> Andrea Scian <[email protected]> wrote:
> >>>
> >>>> Boris,
> >>>>
> >>>> Il 31/07/2015 12:32, Boris Brezillon ha scritto:
> >>>>> Hi Andrea,
> >>>>>
> >>>>> Adding Han in Cc.
> >>>>>
> >>>>> On Fri, 31 Jul 2015 12:07:21 +0200
> >>>>> Andrea Scian <[email protected]> wrote:
> >>>>>
> >>>>>> Dear Boris,
> >>>>>>
> >>>>>>
> >>>>>> Il 30/07/2015 19:34, Boris Brezillon ha scritto:
> >>>>>>> The default NAND read functions are relying on an underlying controller
> >>>>>>> to correct bitflips, but some of those controller cannot properly fix
> >>>>>>> bitflips in erased pages.
> >>>>>>> In case of ECC failures, check if the page of subpage is empty before
> >>>>>>> reporting an ECC failure.
> >>>>>> I'm still wondering if chip->ecc.strength is the right threshold.
> >>>>>>
> >>>>>> Did you see my comments here [1]? WDYT?
> >>>>> Yes I've read it, and decided to go for ecc->strength as a first
> >>>>> step (I'm more interested in discussing the approach than the threshold
> >>>>> value right now ;-)).
> >>>> I perfectly understand, that's the reason why I ask if you want to move
> >>>> to another thread ;-)
> >>>>
> >>>>> Anyway, as you pointed out in the thread, writing data on an erased
> >>>>> page already containing some bitflips might generate even more
> >>>>> bitflips, so using a different threshold for the erased page check
> >>>>> makes sense. This threshold should definitely be correlated to the ECC
> >>>>> strength, but how, that's the question.
> >>>>>
> >>>>> How about taking a rather conservative value like 10% of the specified
> >>>>> ECC strength, and see how it goes.
> >>>> Yes, I think that there's no real way to get the right value, other than
> >>>> feedbacks from on-field testing with various devices.
> >>>>
> >>>> I'm also thinking about changing how a NAND page is written on the
> >>>> device, now that we know that even erased page may have (too many!)
> >>>> bitflips if they has not been so-freshly erased.
> >>>>
> >>>> Read on NAND device is lot's faster that write, so maybe we can:
> >>>>
> >>>> a) read the page before write it, check for bitflips on erased area and
> >>>> write it only if it fit our threshold
> >>>>
> >>>> b) read the page after write it and check if the bitflips are lower that
> >>>> a give value
> >>>>
> >>>> In this way:
> >>>> - we can use ecc_strength as read threshold, because it fits all the
> >>>> other NAND read
> >>>>
> >>>> - we can use "something a bit lower than" mtd->bitflip_threshold on
> >>>> read-before-write or read-after-write. If we don't do so the block will
> >>>> be scrubbed next time we read it again (if we are lucky.. if we are
> >>>> unlucky the block will have bitflip > ecc_strength!): IOW we did a write
> >>>> that will trigger another erase/write cycle.
> >>>>
> >>>> Am I misunderstanding something?
> >>> Nope, but this implies doing an extra read after each write :-/
> >>>
> >> Let's wait what the others says about this, but I would like to put some
> >> numbers in it.
> > Sure.
> >
> >> My micron MLC device says
> >> - read page max 75 uS
> >> - write page typ 1300uS, max 2600uS
> >>
> >> If we implement read-before-write (which is, IMO, the best approach), in
> >> the worst overhead we have is 1375uS vs 1300uS, which is ~6%.
> >> Please note that, if you read a page that "is not suitable" for write,
> >> you avoid the write time, schedule it for scrubbing, and use another
> >> free page.
> > Indeed, that's not such a big overhead.
> >
> >> Probably I'm a bit optimistic because we also need to take in account
> >> other latencies (DMA setup, ECC engine, buffer copies and so on) but
> >> it's a starting point ;-)
> > Yep, if you test it, could you provide some speed test results (with
> > and without this solution).
>
> I think I can find some time to do some performance tests on real hardware.
> Can you please help me in finding:
> - which benchmark to use (currently I'm using bonnie++ on UBIFS, maybe I
> can you just mtd_speedtest)
> - where to implement those read

I think the test should be done at the UBI layer if we want to check
the real impact of the additional read sequence, but given the answer I
gave to your other question I'm not sure this is relevant anymore ;-).

>
> For the second point I think we can implement it a UBI or MTD level.
> I think the former will allow us to easily schedule scrubbing and choose
> another block to issue the write to. However I don't really know how to
> implement it (I don't really know so much about the UBI code).
>
> The latter, at least for me, is easier to implement: I think I can find
> the place to add the page read on my own, anyway any clue is welcome ;-)
> But I think it will be harder (or impossible) to choose where to issue
> the write, unless UBI already do so when it saw an MTD write failure.
>
> > And I wonder if we shouldn't do it the other way around, write then
> > read-back and check the content.
> > Of course this implies doing the extra write even when the erased page
> > contains too many bitflips, but at least your sure that the data you
> > put in the page were correct at that time.
> >
>
> You're right, I think this is something that once we can find inside the
> MTD code (something like "check NAND written page" kconfig option) but I
> cannot find this option anymore on latest kernel.
>
> You're approach will also have another advantage, currently nearly all
> platform will use software implementation for ECC check on erased
> blocks, while nearly all of them has hardware ECC once programmed. Using
> hardware ECC will remove CPU load and, maybe, be faster than call
> nand_check_erased_ecc_chunk()
> I also think that the situation of having failure on write is very
> unlikely, unless you have a very "used" NAND device or you're not using
> Richard's bitrot check. So we'll have a performance impact (issuing
> another write) only when it's really needed.

I didn't check before suggesting that, but it seems that the UBI layer
is already doing this check for you [1], so if you're using UBI/UBIFS
you shouldn't worry about bitflips in erased pages: if there is any,
and their presence impact the write result, they should be detected.
AFAICT, the only thing that is not checked is whether the number of
bitflips after a write exceed the bitflips threshold or not, and I
guess this can be added.

Best Regards,

Boris

[1]http://lxr.free-electrons.com/source/drivers/mtd/ubi/io.c#L294

--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

2015-08-03 13:39:26

by Andrea Scian

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions

Il 03/08/2015 14:42, Boris Brezillon ha scritto:
> Adding Artem and Richard in the loop.

thanks ;-)

>
> On Mon, 3 Aug 2015 13:16:02 +0200
> Andrea Scian <[email protected]> wrote:
>
>>
>> Dear Boris,
>>
>> Il 31/07/2015 18:27, Boris Brezillon ha scritto:
>>> On Fri, 31 Jul 2015 18:19:30 +0200
>>> Andrea Scian <[email protected]> wrote:
>>>
>>>> Il 31/07/2015 16:10, Boris Brezillon ha scritto:
>>>>> On Fri, 31 Jul 2015 15:40:13 +0200
>>>>> Andrea Scian <[email protected]> wrote:
>>>>>
>>>>>> Boris,
>>>>>>
>>>>>> Il 31/07/2015 12:32, Boris Brezillon ha scritto:
>>>>>>> Hi Andrea,
>>>>>>>
>>>>>>> Adding Han in Cc.
>>>>>>>
>>>>>>> On Fri, 31 Jul 2015 12:07:21 +0200
>>>>>>> Andrea Scian <[email protected]> wrote:
>>>>>>>
>>>>>>>> Dear Boris,
>>>>>>>>
>>>>>>>>
>>>>>>>> Il 30/07/2015 19:34, Boris Brezillon ha scritto:
>>>>>>>>> The default NAND read functions are relying on an underlying controller
>>>>>>>>> to correct bitflips, but some of those controller cannot properly fix
>>>>>>>>> bitflips in erased pages.
>>>>>>>>> In case of ECC failures, check if the page of subpage is empty before
>>>>>>>>> reporting an ECC failure.
>>>>>>>> I'm still wondering if chip->ecc.strength is the right threshold.
>>>>>>>>
>>>>>>>> Did you see my comments here [1]? WDYT?
>>>>>>> Yes I've read it, and decided to go for ecc->strength as a first
>>>>>>> step (I'm more interested in discussing the approach than the threshold
>>>>>>> value right now ;-)).
>>>>>> I perfectly understand, that's the reason why I ask if you want to move
>>>>>> to another thread ;-)
>>>>>>
>>>>>>> Anyway, as you pointed out in the thread, writing data on an erased
>>>>>>> page already containing some bitflips might generate even more
>>>>>>> bitflips, so using a different threshold for the erased page check
>>>>>>> makes sense. This threshold should definitely be correlated to the ECC
>>>>>>> strength, but how, that's the question.
>>>>>>>
>>>>>>> How about taking a rather conservative value like 10% of the specified
>>>>>>> ECC strength, and see how it goes.
>>>>>> Yes, I think that there's no real way to get the right value, other than
>>>>>> feedbacks from on-field testing with various devices.
>>>>>>
>>>>>> I'm also thinking about changing how a NAND page is written on the
>>>>>> device, now that we know that even erased page may have (too many!)
>>>>>> bitflips if they has not been so-freshly erased.
>>>>>>
>>>>>> Read on NAND device is lot's faster that write, so maybe we can:
>>>>>>
>>>>>> a) read the page before write it, check for bitflips on erased area and
>>>>>> write it only if it fit our threshold
>>>>>>
>>>>>> b) read the page after write it and check if the bitflips are lower that
>>>>>> a give value
>>>>>>
>>>>>> In this way:
>>>>>> - we can use ecc_strength as read threshold, because it fits all the
>>>>>> other NAND read
>>>>>>
>>>>>> - we can use "something a bit lower than" mtd->bitflip_threshold on
>>>>>> read-before-write or read-after-write. If we don't do so the block will
>>>>>> be scrubbed next time we read it again (if we are lucky.. if we are
>>>>>> unlucky the block will have bitflip > ecc_strength!): IOW we did a write
>>>>>> that will trigger another erase/write cycle.
>>>>>>
>>>>>> Am I misunderstanding something?
>>>>> Nope, but this implies doing an extra read after each write :-/
>>>>>
>>>> Let's wait what the others says about this, but I would like to put some
>>>> numbers in it.
>>> Sure.
>>>
>>>> My micron MLC device says
>>>> - read page max 75 uS
>>>> - write page typ 1300uS, max 2600uS
>>>>
>>>> If we implement read-before-write (which is, IMO, the best approach), in
>>>> the worst overhead we have is 1375uS vs 1300uS, which is ~6%.
>>>> Please note that, if you read a page that "is not suitable" for write,
>>>> you avoid the write time, schedule it for scrubbing, and use another
>>>> free page.
>>> Indeed, that's not such a big overhead.
>>>
>>>> Probably I'm a bit optimistic because we also need to take in account
>>>> other latencies (DMA setup, ECC engine, buffer copies and so on) but
>>>> it's a starting point ;-)
>>> Yep, if you test it, could you provide some speed test results (with
>>> and without this solution).
>>
>> I think I can find some time to do some performance tests on real hardware.
>> Can you please help me in finding:
>> - which benchmark to use (currently I'm using bonnie++ on UBIFS, maybe I
>> can you just mtd_speedtest)
>> - where to implement those read
>
> I think the test should be done at the UBI layer if we want to check
> the real impact of the additional read sequence, but given the answer I
> gave to your other question I'm not sure this is relevant anymore ;-).
>
>>
>> For the second point I think we can implement it a UBI or MTD level.
>> I think the former will allow us to easily schedule scrubbing and choose
>> another block to issue the write to. However I don't really know how to
>> implement it (I don't really know so much about the UBI code).
>>
>> The latter, at least for me, is easier to implement: I think I can find
>> the place to add the page read on my own, anyway any clue is welcome ;-)
>> But I think it will be harder (or impossible) to choose where to issue
>> the write, unless UBI already do so when it saw an MTD write failure.
>>
>>> And I wonder if we shouldn't do it the other way around, write then
>>> read-back and check the content.
>>> Of course this implies doing the extra write even when the erased page
>>> contains too many bitflips, but at least your sure that the data you
>>> put in the page were correct at that time.
>>>
>>
>> You're right, I think this is something that once we can find inside the
>> MTD code (something like "check NAND written page" kconfig option) but I
>> cannot find this option anymore on latest kernel.
>>
>> You're approach will also have another advantage, currently nearly all
>> platform will use software implementation for ECC check on erased
>> blocks, while nearly all of them has hardware ECC once programmed. Using
>> hardware ECC will remove CPU load and, maybe, be faster than call
>> nand_check_erased_ecc_chunk()
>> I also think that the situation of having failure on write is very
>> unlikely, unless you have a very "used" NAND device or you're not using
>> Richard's bitrot check. So we'll have a performance impact (issuing
>> another write) only when it's really needed.
>
> I didn't check before suggesting that, but it seems that the UBI layer
> is already doing this check for you [1], so if you're using UBI/UBIFS
> you shouldn't worry about bitflips in erased pages: if there is any,
> and their presence impact the write result, they should be detected.
> AFAICT, the only thing that is not checked is whether the number of
> bitflips after a write exceed the bitflips threshold or not, and I
> guess this can be added.

IIUC this is a runtime debug check

if (!ubi_dbg_chk_io(ubi))
....

And thus is disabled by default.
Anyway, this answer my missing "check NAND written page" question above ;-)

IIUC (again) /sys/kernel/debug/ubi/ubi0/chk_io enable this and many
other runtime check, so it's only partially useful to have a first raw
approach to performance impact.

And, yes, those functions only check for valid data, not about bitflip
counts.

Kind Regards,

--

Andrea SCIAN

DAVE Embedded Systems

2015-08-03 19:32:47

by Richard Weinberger

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions

Am 03.08.2015 um 15:39 schrieb Andrea Scian:
>>> I think I can find some time to do some performance tests on real hardware.
>>> Can you please help me in finding:
>>> - which benchmark to use (currently I'm using bonnie++ on UBIFS, maybe I
>>> can you just mtd_speedtest)
>>> - where to implement those read
>>
>> I think the test should be done at the UBI layer if we want to check
>> the real impact of the additional read sequence, but given the answer I
>> gave to your other question I'm not sure this is relevant anymore ;-).

I'm not sure whether introducing a read-before-write check is the best solution.
At least we need hard numbers for slow/old SLC NANDs too.

We has such checks already and got rid of them.
commit 657f28f8811c92724db10d18bbbec70d540147d6
Author: Huang Shijie <[email protected]>
Date: Tue Aug 14 22:38:45 2012 -0400

mtd: kill MTD_NAND_VERIFY_WRITE


Although the goal of 657f28f8 was something else.

In general I don't think putting much MTD/ECC logic into UBI is the way to go.
UBI is a layer above MTD and MTD should do as much as possible wrt. ECC.

>>>
>>> For the second point I think we can implement it a UBI or MTD level.
>>> I think the former will allow us to easily schedule scrubbing and choose
>>> another block to issue the write to. However I don't really know how to
>>> implement it (I don't really know so much about the UBI code).

Implementing this is not much work.
I've done such hacks for various customers to hunt down hardware issues.

>> I didn't check before suggesting that, but it seems that the UBI layer
>> is already doing this check for you [1], so if you're using UBI/UBIFS
>> you shouldn't worry about bitflips in erased pages: if there is any,
>> and their presence impact the write result, they should be detected.
>> AFAICT, the only thing that is not checked is whether the number of
>> bitflips after a write exceed the bitflips threshold or not, and I
>> guess this can be added.
>
> IIUC this is a runtime debug check
>
> if (!ubi_dbg_chk_io(ubi))
> ....
>
> And thus is disabled by default.

That's correct.

Thanks,
//richard

2015-08-04 07:02:23

by Andrea Scian

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions


Richard,

Il 03/08/2015 21:32, Richard Weinberger ha scritto:
> Am 03.08.2015 um 15:39 schrieb Andrea Scian:
>>>> I think I can find some time to do some performance tests on real hardware.
>>>> Can you please help me in finding:
>>>> - which benchmark to use (currently I'm using bonnie++ on UBIFS, maybe I
>>>> can you just mtd_speedtest)
>>>> - where to implement those read
>>>
>>> I think the test should be done at the UBI layer if we want to check
>>> the real impact of the additional read sequence, but given the answer I
>>> gave to your other question I'm not sure this is relevant anymore ;-).
>
> I'm not sure whether introducing a read-before-write check is the best solution.
> At least we need hard numbers for slow/old SLC NANDs too.

We can enable the feature only for MLC, AFAIK it has not been required
for old SLC ;-)

Anyway, maybe I can do some performance test if you point me to the
right userspace tool to use.
As I already say I'm using bonnie++ to stress the device, more from a
stability than from performance point of view.
I'm also used to run mtd_speedtest but this may be useless if we put
some code inside the upper layers.

> We has such checks already and got rid of them.
> commit 657f28f8811c92724db10d18bbbec70d540147d6
> Author: Huang Shijie <[email protected]>
> Date: Tue Aug 14 22:38:45 2012 -0400
>
> mtd: kill MTD_NAND_VERIFY_WRITE
>
>
> Although the goal of 657f28f8 was something else.

Understood, thanks for point this out

>
> In general I don't think putting much MTD/ECC logic into UBI is the way to go.
> UBI is a layer above MTD and MTD should do as much as possible wrt. ECC.
>
>>>>
>>>> For the second point I think we can implement it a UBI or MTD level.
>>>> I think the former will allow us to easily schedule scrubbing and choose
>>>> another block to issue the write to. However I don't really know how to
>>>> implement it (I don't really know so much about the UBI code).
>
> Implementing this is not much work.
> I've done such hacks for various customers to hunt down hardware issues.
>
>>> I didn't check before suggesting that, but it seems that the UBI layer
>>> is already doing this check for you [1], so if you're using UBI/UBIFS
>>> you shouldn't worry about bitflips in erased pages: if there is any,
>>> and their presence impact the write result, they should be detected.
>>> AFAICT, the only thing that is not checked is whether the number of
>>> bitflips after a write exceed the bitflips threshold or not, and I
>>> guess this can be added.
>>
>> IIUC this is a runtime debug check
>>
>> if (!ubi_dbg_chk_io(ubi))
>> ....
>>
>> And thus is disabled by default.
>
> That's correct.

Thanks.
In your opinion, enabling chk_io is correct to rough estimate the
overhead or does it enable too much checks?

TIA,

--

Andrea SCIAN

DAVE Embedded Systems

2015-08-04 07:21:34

by Richard Weinberger

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions

Andrea,

Am 04.08.2015 um 09:02 schrieb Andrea Scian:
>> I'm not sure whether introducing a read-before-write check is the best solution.
>> At least we need hard numbers for slow/old SLC NANDs too.
>
> We can enable the feature only for MLC, AFAIK it has not been required for old SLC ;-)

I think this needs more discussion.

Boris, Brian, will you be at Embedded Linux Conference Europe in Dublin?
Maybe we can discuss these issues (data retention, ff-checks, etc...) in person and figure out where to address them.
I really want to avoid ad-hoc solutions. :)

> Thanks.
> In your opinion, enabling chk_io is correct to rough estimate the overhead or does it enable too much checks?

You mean the other checks bedside of self_check_write()? You can comment them out for your tests.

Thanks,
//richard

2015-08-04 15:42:48

by Andrea Scian

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] mtd: nand: add nand_check_erased helper functions


Boris,

sorry for the later review.
I'm trying to run your patch on our systems. See below

Il 31/07/2015 09:10, Boris Brezillon ha scritto:
> On Thu, 30 Jul 2015 19:34:53 +0200
> Boris Brezillon <[email protected]> wrote:
>
>> Add two helper functions to help NAND controller drivers test whether a
>> specific NAND region is erased or not.
>>
>> Signed-off-by: Boris Brezillon <[email protected]>
>> ---
>> drivers/mtd/nand/nand_base.c | 104 +++++++++++++++++++++++++++++++++++++++++++
>> include/linux/mtd/nand.h | 8 ++++
>> 2 files changed, 112 insertions(+)
>>
>> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
>> index ceb68ca..1542ea7 100644
>> --- a/drivers/mtd/nand/nand_base.c
>> +++ b/drivers/mtd/nand/nand_base.c
>> @@ -1101,6 +1101,110 @@ out:
>> EXPORT_SYMBOL(nand_lock);
>>
>> /**
>> + * nand_check_erased_buf - check if a buffer contains (almost) only 0xff data
>> + * @buf: buffer to test
>> + * @len: buffer length
>> + * @bitflips_threshold:maximum number of bitflips
>> + *
>> + * Check if a buffer contains only 0xff, which means the underlying region
>> + * has been erased and is ready to be programmed.
>> + * The bitflips_threshold specify the maximum number of bitflips before
>> + * considering the region is not erased.
>> + * Note: The logic of this function has been extracted from the memweight
>> + * implementation, except that nand_check_erased_buf function exit before
>> + * testing the whole buffer if the number of bitflips exceed the
>> + * bitflips_threshold value.
>> + *
>> + * Returns a positive number of bitflips or -ERROR_CODE.
>> + */
>> +int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
>> +{
>> + const unsigned char *bitmap = buf;
>> + int bitflips = 0;
>> + int weight;
>> + int longs;
>> +
>> + for (; len && ((unsigned long)bitmap) % sizeof(long);
>> + len--, bitmap++) {
>> + weight = hweight8(*bitmap);
>> +
>> + bitflips += sizeof(u8) - weight;

I think the above like should be

bitflips += sizeof(u8)*BITS_PER_BYTE - weight;


>> + if (bitflips > bitflips_threshold)
>> + return -EINVAL;
>> + }
>> +
>> +
>> + for (longs = len / sizeof(long); longs;
>> + longs--, bitmap += sizeof(long)) {
>> + BUG_ON(longs >= INT_MAX / BITS_PER_LONG);
>> + weight = hweight_long(*((unsigned long *)bitmap));
>> +
>> + bitflips += sizeof(long) - weight;

as above:

bitflips += sizeof(long)*BITS_PER_BYTE - weight;

>> + if (bitflips > bitflips_threshold)
>> + return -EINVAL;
>> + }
>> +
>> + len %= sizeof(long);
>> +
>> + for (; len > 0; len--, bitmap++) {
>> + weight = hweight8(*bitmap);
>> + bitflips += sizeof(u8) - weight;


as above:

bitflips += sizeof(u8)*BITS_PER_BYTE - weight;


>> + }
>> +
>> + return bitflips;
>> +}
>> +EXPORT_SYMBOL(nand_check_erased_buf);
>> +
>> +/**
>> + * nand_check_erased_ecc_chunk - check if an ECC chunk contains (almost) only
>> + * 0xff data
>> + * @data: data buffer to test
>> + * @datalen: data length
>> + * @ecc: ECC buffer
>> + * @ecclen: ECC length
>> + * @extraoob: extra OOB buffer
>> + * @extraooblen: extra OOB length
>> + * @bitflips_threshold: maximum number of bitflips
>> + *
>> + * Check if a data buffer and its associated ECC and OOB data contains only
>> + * 0xff pattern, which means the underlying region has been erased and is
>> + * ready to be programmed.
>> + * The bitflips_threshold specify the maximum number of bitflips before
>> + * considering the region as not erased.
>> + *
>> + * Returns a positive number of bitflips or -ERROR_CODE.
>> + */
>> +int nand_check_erased_ecc_chunk(void *data, int datalen,
>> + void *ecc, int ecclen,
>> + void *extraoob, int extraooblen,
>> + int bitflips_threshold)
>> +{
>> + int bitflips = 0;
>> + int ret;
>> +
>> + ret = nand_check_erased_buf(data, datalen, bitflips_threshold);
>> + if (ret < 0)
>> + return ret;
>> +
>> + bitflips += ret;
>> + bitflips_threshold -= ret;
>> +
>> + ret = nand_check_erased_buf(ecc, ecclen, bitflips_threshold);
>> + if (ret < 0)
>> + return ret;
>> +
>> + bitflips += ret;
>> + bitflips_threshold -= ret;
>> +
>> + ret = nand_check_erased_buf(extraoob, extraooblen, bitflips_threshold);
>> + if (ret < 0)
>> + return ret;
>> +
>
> Forgot the memset operations here:
>
> memset(data, 0xff, datalen);
> memset(ecc, 0xff, ecclen);
> memset(extraoob, 0xff, extraooblen);
>
>> + return bitflips + ret;
>> +}
>> +EXPORT_SYMBOL(nand_check_erased_ecc_chunk);
>> +
>> +/**
>> * nand_read_page_raw - [INTERN] read raw page data without ecc
>> * @mtd: mtd info structure
>> * @chip: nand chip info structure
>> diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
>> index 272f429..ae06a07 100644
>> --- a/include/linux/mtd/nand.h
>> +++ b/include/linux/mtd/nand.h
>> @@ -1030,4 +1030,12 @@ struct nand_sdr_timings {
>>
>> /* get timing characteristics from ONFI timing mode. */
>> const struct nand_sdr_timings *onfi_async_timing_mode_to_sdr_timings(int mode);
>> +
>> +int nand_check_erased_buf(void *data, int datalen,
>> + int threshold);
>> +
>> +int nand_check_erased_ecc_chunk(void *data, int datalen,
>> + void *ecc, int ecclen,
>> + void *extraoob, int extraooblen,
>> + int threshold);
>> #endif /* __LINUX_MTD_NAND_H */
>
>
>

2015-08-04 16:30:30

by Boris Brezillon

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] mtd: nand: add nand_check_erased helper functions



Hi Andrea

Le 4 août 2015 17:42:43 GMT+02:00, Andrea Scian <[email protected]> a écrit :
>
>>> +
>>> + bitflips += sizeof(long) - weight;
>
>as above:
>
>bitflips += sizeof(long)*BITS_PER_BYTE - weight;


Indeed, or we could just replace sizeof(u8) by BITS_PER_BYTE and sizeof(long) by BITS_PER_LONG.

I'll fix that in the next version.

Note that I didn't test the series on a real platform and won't be able to do so for the next two weeks, so you might find other inconsistencies/bugs.

Thanks for testing it.

Best regards,

Boris

2015-08-06 04:28:52

by Andrea Scian

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions

Il 04/08/2015 09:21, Richard Weinberger ha scritto:
> Andrea,
>
> Am 04.08.2015 um 09:02 schrieb Andrea Scian:
>>> I'm not sure whether introducing a read-before-write check is the best solution.
>>> At least we need hard numbers for slow/old SLC NANDs too.
>>
>> We can enable the feature only for MLC, AFAIK it has not been required for old SLC ;-)
>
> I think this needs more discussion.
>
> Boris, Brian, will you be at Embedded Linux Conference Europe in Dublin?
> Maybe we can discuss these issues (data retention, ff-checks, etc...) in person and
> figure out where to address them.
> I really want to avoid ad-hoc solutions. :)

Maybe I'll be at ELCE this year too
I'll be glad to meet all of you in person and participate to this
discussion. :)

It will be nice if also some silicon vendor would like to participate. I
know that someone from micron is actively following us on this ML, but I
don't really know if there's someone here in Europe. :)

>
>> Thanks.
>> In your opinion, enabling chk_io is correct to rough estimate the overhead
>> or does it enable too much checks?
>
> You mean the other checks bedside of self_check_write()? You can comment them out
> for your tests.
>
> Thanks,
> //richard
>

Kind Regards,

--

Andrea SCIAN

DAVE Embedded Systems

2015-08-06 09:28:16

by Boris Brezillon

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions

Hi Richard,

On Tue, 4 Aug 2015 09:21:27 +0200
Richard Weinberger <[email protected]> wrote:

> Andrea,
>
> Am 04.08.2015 um 09:02 schrieb Andrea Scian:
> >> I'm not sure whether introducing a read-before-write check is the best solution.
> >> At least we need hard numbers for slow/old SLC NANDs too.
> >
> > We can enable the feature only for MLC, AFAIK it has not been required for old SLC ;-)
>
> I think this needs more discussion.
>
> Boris, Brian, will you be at Embedded Linux Conference Europe in Dublin?
> Maybe we can discuss these issues (data retention, ff-checks, etc...) in person and figure out where to address them.
> I really want to avoid ad-hoc solutions. :)

I'll be at ELCE and I'd be happy to discuss all these NAND/MTD/UBI
related issues with you, though I think we should keep discussing
those problems on the ML too.

Best Regards,

Boris

--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

2015-08-06 09:42:30

by Richard Weinberger

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions



Am 06.08.2015 um 11:19 schrieb Boris Brezillon:
>> I think this needs more discussion.
>>
>> Boris, Brian, will you be at Embedded Linux Conference Europe in Dublin?
>> Maybe we can discuss these issues (data retention, ff-checks, etc...) in person and figure out where to address them.
>> I really want to avoid ad-hoc solutions. :)
>
> I'll be at ELCE and I'd be happy to discuss all these NAND/MTD/UBI
> related issues with you, though I think we should keep discussing
> those problems on the ML too.

Sure, I did not recommend a secret meeting. ;-)
But often a face to face conversation is better for brain storming.

Thanks,
//richard