2018-04-04 12:44:10

by Abhishek Sahu

[permalink] [raw]
Subject: [PATCH 0/9] Update for QCOM NAND driver

This patch series mainly deals with error handling and erased page
bitflip detection for QCOM NAND driver.

1. The error handling was missing for some of the cases so fixed
the same.

2. Add the support for taking ECC strength from ONFI parameter.
The earlier QCOM boards were coming with 4-bit ECC chip but
now the same boards are coming with 8-bit ECC chip since the
earlier 4-bit parts are obsolete from some vendors.

3. We got few issues related with NAND erased page bitflips. The
QCOM NAND controller can’t detect the bitflip in completely erased
page so added the support to detect the same. It implemented the
logic mentioned in patch [1] which didn’t go in mainline and later
the generic functions were provided [2] to count the number of
bitflips and make all 0xff. This patch series did some optimization
logic to prevent the unnecessary full page raw read and data copy
from QCOM NAND controller to DMA.

4. Following are the testing done for these patches in QCOM IPQ8074
HK01 (4-bit and 8-bit ECC chip) and IPQ806x AP148 boards.
a. Run all mtd test and check if it passes
b. Introduce custom bitflips in erased page and check if it
returns no error/EUCLEAN/EBADMSG depending upon number of
bitflips and position.
c. Introduce failure condition for operational failure and
check if it detects the same.

[1]: https://patchwork.ozlabs.org/patch/328994/
[2]: https://patchwork.ozlabs.org/patch/509970/

Abhishek Sahu (9):
mtd: nand: qcom: use the ecc strength from device parameter
mtd: nand: qcom: wait for desc completion in all BAM channels
mtd: nand: qcom: erased page detection for uncorrectable errors only
mtd: nand: qcom: fix null pointer access for erased buffer detection
mtd: nand: qcom: parse read errors for read oob also
mtd: nand: qcom: support for checking read errors for last codeword
mtd: nand: qcom: check for operation errors in case of raw read
mtd: nand: qcom: helper function for raw read
mtd: nand: qcom: erased page bitflips detection

drivers/mtd/nand/qcom_nandc.c | 468 ++++++++++++++++++++++++++++++------------
1 file changed, 333 insertions(+), 135 deletions(-)

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation



2018-04-04 12:44:45

by Abhishek Sahu

[permalink] [raw]
Subject: [PATCH 4/9] mtd: nand: qcom: fix null pointer access for erased buffer detection

parse_read_errors can be called with only oob buf also in which
case data_buf will be NULL. If data_buf is NULL, then don’t
treat this page as completely erased in case of ECC uncorrectable
error.

Signed-off-by: Abhishek Sahu <[email protected]>
---
drivers/mtd/nand/qcom_nandc.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
index 57c16a6..0ebcc55 100644
--- a/drivers/mtd/nand/qcom_nandc.c
+++ b/drivers/mtd/nand/qcom_nandc.c
@@ -1607,9 +1607,11 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
if (host->bch_enabled) {
erased = (erased_cw & ERASED_CW) == ERASED_CW ?
true : false;
- } else {
+ } else if (data_buf) {
erased = erased_chunk_check_and_fixup(data_buf,
data_len);
+ } else {
+ erased = false;
}

if (erased) {
@@ -1652,7 +1654,8 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
max_bitflips = max(max_bitflips, stat);
}

- data_buf += data_len;
+ if (data_buf)
+ data_buf += data_len;
if (oob_buf)
oob_buf += oob_len + ecc->bytes;
}
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


2018-04-04 12:44:55

by Abhishek Sahu

[permalink] [raw]
Subject: [PATCH 6/9] mtd: nand: qcom: support for checking read errors for last codeword

Add boolean function argument in parse_read_errors to identify
whether the read error has been called for complete page read or
only last codeword read. This will help in subsequent patches to
detect ECC errors in case of last codeword read.

Signed-off-by: Abhishek Sahu <[email protected]>
---
drivers/mtd/nand/qcom_nandc.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
index ba43752..dce97e8 100644
--- a/drivers/mtd/nand/qcom_nandc.c
+++ b/drivers/mtd/nand/qcom_nandc.c
@@ -1570,7 +1570,7 @@ struct read_stats {
* errors. this is equivalent to what 'ecc->correct()' would do.
*/
static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
- u8 *oob_buf)
+ u8 *oob_buf, bool last_cw)
{
struct nand_chip *chip = &host->chip;
struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
@@ -1579,12 +1579,12 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
unsigned int max_bitflips = 0;
struct read_stats *buf;
bool flash_op_err = false;
- int i;
+ int i, cw_cnt = last_cw ? 1 : ecc->steps;

buf = (struct read_stats *)nandc->reg_read_buf;
nandc_read_buffer_sync(nandc, true);

- for (i = 0; i < ecc->steps; i++, buf++) {
+ for (i = 0; i < cw_cnt; i++, buf++) {
u32 flash, buffer, erased_cw;
int data_len, oob_len;

@@ -1743,7 +1743,8 @@ static int read_page_ecc(struct qcom_nand_host *host, u8 *data_buf,
free_descs(nandc);

if (!ret)
- ret = parse_read_errors(host, data_buf_start, oob_buf_start);
+ ret = parse_read_errors(host, data_buf_start, oob_buf_start,
+ false);

return ret;
}
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


2018-04-04 12:45:14

by Abhishek Sahu

[permalink] [raw]
Subject: [PATCH 8/9] mtd: nand: qcom: helper function for raw read

This patch does minor code reorganization for raw reads.
Currently the raw read is required for complete page but for
subsequent patches related with erased codeword bit flips
detection, only few CW should be read. So, this patch adds
helper function and introduces the read CW bitmask which
specifies which CW reads are required in complete page.

Signed-off-by: Abhishek Sahu <[email protected]>
---
drivers/mtd/nand/qcom_nandc.c | 186 +++++++++++++++++++++++++-----------------
1 file changed, 110 insertions(+), 76 deletions(-)

diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
index 40c790e..f5d1fa4 100644
--- a/drivers/mtd/nand/qcom_nandc.c
+++ b/drivers/mtd/nand/qcom_nandc.c
@@ -1590,6 +1590,114 @@ static int check_flash_errors(struct qcom_nand_host *host, int cw_cnt)
}

/*
+ * Helper to perform the page raw read operation. The read_cw_mask will be
+ * used to specify the codewords for which the data should be read. The
+ * single page contains multiple CW. Sometime, only few CW data is required
+ * in complete page. Also, start address will be determined with
+ * this CW mask to skip unnecessary data copy from NAND flash device. Then,
+ * actual data copy from NAND controller to data buffer will be done only
+ * for the CWs which have the mask set.
+ */
+static int
+nandc_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
+ u8 *data_buf, u8 *oob_buf,
+ int page, unsigned long read_cw_mask)
+{
+ struct qcom_nand_host *host = to_qcom_nand_host(chip);
+ struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
+ struct nand_ecc_ctrl *ecc = &chip->ecc;
+ int i, ret;
+ int read_loc, start_step, last_step;
+
+ nand_read_page_op(chip, page, 0, NULL, 0);
+
+ host->use_ecc = false;
+ start_step = ffs(read_cw_mask) - 1;
+ last_step = fls(read_cw_mask);
+
+ clear_bam_transaction(nandc);
+ set_address(host, host->cw_size * start_step, page);
+ update_rw_regs(host, last_step - start_step, true);
+ config_nand_page_read(nandc);
+
+ for (i = start_step; i < last_step; i++) {
+ int data_size1, data_size2, oob_size1, oob_size2;
+ int reg_off = FLASH_BUF_ACC;
+
+ data_size1 = mtd->writesize - host->cw_size * (ecc->steps - 1);
+ oob_size1 = host->bbm_size;
+
+ if (i == (ecc->steps - 1)) {
+ data_size2 = ecc->size - data_size1 -
+ ((ecc->steps - 1) << 2);
+ oob_size2 = (ecc->steps << 2) + host->ecc_bytes_hw +
+ host->spare_bytes;
+ } else {
+ data_size2 = host->cw_data - data_size1;
+ oob_size2 = host->ecc_bytes_hw + host->spare_bytes;
+ }
+
+ /*
+ * Don't perform actual data copy from NAND controller to data
+ * buffer through DMA for this codeword
+ */
+ if (!(read_cw_mask & BIT(i))) {
+ if (nandc->props->is_bam)
+ nandc_set_read_loc(nandc, 0, 0, 0, 1);
+
+ config_nand_cw_read(nandc, false);
+
+ data_buf += data_size1 + data_size2;
+ oob_buf += oob_size1 + oob_size2;
+
+ continue;
+ }
+
+ if (nandc->props->is_bam) {
+ read_loc = 0;
+ nandc_set_read_loc(nandc, 0, read_loc, data_size1, 0);
+ read_loc += data_size1;
+
+ nandc_set_read_loc(nandc, 1, read_loc, oob_size1, 0);
+ read_loc += oob_size1;
+
+ nandc_set_read_loc(nandc, 2, read_loc, data_size2, 0);
+ read_loc += data_size2;
+
+ nandc_set_read_loc(nandc, 3, read_loc, oob_size2, 1);
+ }
+
+ config_nand_cw_read(nandc, false);
+
+ read_data_dma(nandc, reg_off, data_buf, data_size1, 0);
+ reg_off += data_size1;
+ data_buf += data_size1;
+
+ read_data_dma(nandc, reg_off, oob_buf, oob_size1, 0);
+ reg_off += oob_size1;
+ oob_buf += oob_size1;
+
+ read_data_dma(nandc, reg_off, data_buf, data_size2, 0);
+ reg_off += data_size2;
+ data_buf += data_size2;
+
+ read_data_dma(nandc, reg_off, oob_buf, oob_size2, 0);
+ oob_buf += oob_size2;
+ }
+
+ ret = submit_descs(nandc);
+ if (ret)
+ dev_err(nandc->dev, "failure to read raw page\n");
+
+ free_descs(nandc);
+
+ if (!ret)
+ ret = check_flash_errors(host, last_step - start_step);
+
+ return 0;
+}
+
+/*
* reads back status registers set by the controller to notify page read
* errors. this is equivalent to what 'ecc->correct()' would do.
*/
@@ -1839,82 +1947,8 @@ static int qcom_nandc_read_page_raw(struct mtd_info *mtd,
struct nand_chip *chip, uint8_t *buf,
int oob_required, int page)
{
- struct qcom_nand_host *host = to_qcom_nand_host(chip);
- struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
- u8 *data_buf, *oob_buf;
- struct nand_ecc_ctrl *ecc = &chip->ecc;
- int i, ret;
- int read_loc;
-
- nand_read_page_op(chip, page, 0, NULL, 0);
- data_buf = buf;
- oob_buf = chip->oob_poi;
-
- host->use_ecc = false;
-
- clear_bam_transaction(nandc);
- update_rw_regs(host, ecc->steps, true);
- config_nand_page_read(nandc);
-
- for (i = 0; i < ecc->steps; i++) {
- int data_size1, data_size2, oob_size1, oob_size2;
- int reg_off = FLASH_BUF_ACC;
-
- data_size1 = mtd->writesize - host->cw_size * (ecc->steps - 1);
- oob_size1 = host->bbm_size;
-
- if (i == (ecc->steps - 1)) {
- data_size2 = ecc->size - data_size1 -
- ((ecc->steps - 1) << 2);
- oob_size2 = (ecc->steps << 2) + host->ecc_bytes_hw +
- host->spare_bytes;
- } else {
- data_size2 = host->cw_data - data_size1;
- oob_size2 = host->ecc_bytes_hw + host->spare_bytes;
- }
-
- if (nandc->props->is_bam) {
- read_loc = 0;
- nandc_set_read_loc(nandc, 0, read_loc, data_size1, 0);
- read_loc += data_size1;
-
- nandc_set_read_loc(nandc, 1, read_loc, oob_size1, 0);
- read_loc += oob_size1;
-
- nandc_set_read_loc(nandc, 2, read_loc, data_size2, 0);
- read_loc += data_size2;
-
- nandc_set_read_loc(nandc, 3, read_loc, oob_size2, 1);
- }
-
- config_nand_cw_read(nandc, false);
-
- read_data_dma(nandc, reg_off, data_buf, data_size1, 0);
- reg_off += data_size1;
- data_buf += data_size1;
-
- read_data_dma(nandc, reg_off, oob_buf, oob_size1, 0);
- reg_off += oob_size1;
- oob_buf += oob_size1;
-
- read_data_dma(nandc, reg_off, data_buf, data_size2, 0);
- reg_off += data_size2;
- data_buf += data_size2;
-
- read_data_dma(nandc, reg_off, oob_buf, oob_size2, 0);
- oob_buf += oob_size2;
- }
-
- ret = submit_descs(nandc);
- if (ret)
- dev_err(nandc->dev, "failure to read raw page\n");
-
- free_descs(nandc);
-
- if (!ret)
- ret = check_flash_errors(host, ecc->steps);
-
- return 0;
+ return nandc_read_page_raw(mtd, chip, buf, chip->oob_poi, page,
+ BIT(chip->ecc.steps) - 1);
}

/* implements ecc->read_oob() */
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


2018-04-04 12:45:16

by Abhishek Sahu

[permalink] [raw]
Subject: [PATCH 9/9] mtd: nand: qcom: erased page bitflips detection

Some of the newer nand parts can have bit flips in an erased
page due to the process technology used. In this case, qpic
nand controller is not able to identify that page as an erased
page. Currently the driver calls nand_check_erased_ecc_chunk for
identifying the erased pages but this won’t work always since the
checking is being with ECC engine returned data. In case of
bitflips, the ECC engine tries to correct the data and then it
generates the uncorrectable error. Now, this data is not equal to
original raw data. For erased CW identification, the raw data
should be read again from NAND device and this
nand_check_erased_ecc_chunk function should be called for raw
data only.

Now following logic is being added to identify the erased
codeword bitflips.

1. In most of the case, not all the codewords will have bitflips
and only single CW will have bitflips. So, there is no need to
read the complete raw page data. The NAND raw read can be
scheduled for any CW in page. The NAND controller works on CW
basis and it will update the status register after each CW read.
Maintain the bitmask for the CW which generated the uncorrectable
error.
2. Schedule the raw flash read from NAND flash device to
NAND controller buffer for all these CWs between first and last
uncorrectable errors CWs. Copy the content from NAND controller
buffer to actual data buffer only for the uncorrectable errors
CWs so that other CW data content won’t be affected, and
unnecessary data copy can be avoided.
3. Both DATA and OOB need to be checked for number of 0. The
top-level API can be called with only data buf or oob buf so use
chip->databuf if data buf is null and chip->oob_poi if
oob buf is null for copying the raw bytes temporarily.
4. For each CW, check the number of 0 in cw_data and usable
oob bytes, The bbm and spare bytes bit flip won’t affect the ECC
so don’t check the number of bitflips in this area.

Signed-off-by: Abhishek Sahu <[email protected]>
---
drivers/mtd/nand/qcom_nandc.c | 144 ++++++++++++++++++++++++++++++------------
1 file changed, 104 insertions(+), 40 deletions(-)

diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
index f5d1fa4..ec0b7db 100644
--- a/drivers/mtd/nand/qcom_nandc.c
+++ b/drivers/mtd/nand/qcom_nandc.c
@@ -1698,25 +1698,112 @@ static int check_flash_errors(struct qcom_nand_host *host, int cw_cnt)
}

/*
+ * Bitflips can happen in erased codewords also so this function counts the
+ * number of 0 in each CW for which ECC engine returns the uncorrectable
+ * error. The page will be assumed as erased if this count is less than or
+ * equal to the ecc->strength for each CW.
+ *
+ * 1. Both DATA and OOB need to be checked for number of 0. The
+ * top-level API can be called with only data buf or oob buf so use
+ * chip->data_buf if data buf is null and chip->oob_poi if oob buf
+ * is null for copying the raw bytes.
+ * 2. Perform raw read for all the CW which has uncorrectable errors.
+ * 3. For each CW, check the number of 0 in cw_data and usable oob bytes.
+ * The bbm and spare bytes bit flip won’t affect the ECC so don’t check
+ * the number of bitflips in this area.
+ */
+static int
+check_for_erased_page(struct qcom_nand_host *host, u8 *data_buf,
+ u8 *oob_buf, unsigned long uncorrectable_err_cws,
+ int page, unsigned int max_bitflips, bool last_cw)
+{
+ struct nand_chip *chip = &host->chip;
+ struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
+ struct mtd_info *mtd = nand_to_mtd(chip);
+ struct nand_ecc_ctrl *ecc = &chip->ecc;
+ int i, start_step, last_step, ret = 0;
+
+ start_step = ffs(uncorrectable_err_cws) - 1;
+ last_step = fls(uncorrectable_err_cws);
+
+ if (!last_cw) {
+ if (!data_buf)
+ data_buf = chip->data_buf;
+ if (!oob_buf)
+ oob_buf = chip->oob_poi;
+ data_buf += start_step * host->cw_data;
+ oob_buf += start_step * ecc->bytes;
+ }
+
+ clear_read_regs(nandc);
+ nandc_read_page_raw(mtd, chip, data_buf, oob_buf, page,
+ uncorrectable_err_cws);
+
+ for (i = start_step; i < last_step; i++) {
+ int data_size, oob_size;
+
+ if (i == (ecc->steps - 1)) {
+ data_size = ecc->size - ((ecc->steps - 1) << 2);
+ oob_size = (ecc->steps << 2) + host->ecc_bytes_hw;
+ } else {
+ data_size = host->cw_data;
+ oob_size = host->ecc_bytes_hw;
+ }
+
+ if (uncorrectable_err_cws & BIT(i)) {
+ /*
+ * make sure it isn't an erased page reported
+ * as not-erased by HW because of a few bitflips
+ */
+ ret = nand_check_erased_ecc_chunk(data_buf,
+ data_size, oob_buf + host->bbm_size,
+ oob_size, NULL,
+ 0, ecc->strength);
+ if (ret < 0) {
+ mtd->ecc_stats.failed++;
+ } else {
+ mtd->ecc_stats.corrected += ret;
+ max_bitflips =
+ max_t(unsigned int, max_bitflips, ret);
+ }
+ }
+
+ data_buf += data_size;
+ oob_buf += ecc->bytes;
+ }
+
+ return max_bitflips;
+}
+
+/*
* reads back status registers set by the controller to notify page read
* errors. this is equivalent to what 'ecc->correct()' would do.
*/
static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
- u8 *oob_buf, bool last_cw)
+ u8 *oob_buf, bool last_cw, int page)
{
struct nand_chip *chip = &host->chip;
struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
struct mtd_info *mtd = nand_to_mtd(chip);
struct nand_ecc_ctrl *ecc = &chip->ecc;
- unsigned int max_bitflips = 0;
+ unsigned int max_bitflips = 0, uncorrectable_err_cws = 0;
struct read_stats *buf;
bool flash_op_err = false;
- int i, cw_cnt = last_cw ? 1 : ecc->steps;
+ int i, start_cw, cw_cnt = last_cw ? 1 : ecc->steps;
+ u8 *data_buf_start = data_buf, *oob_buf_start = oob_buf;

buf = (struct read_stats *)nandc->reg_read_buf;
nandc_read_buffer_sync(nandc, true);

- for (i = 0; i < cw_cnt; i++, buf++) {
+ if (last_cw) {
+ start_cw = ecc->steps - 1;
+ cw_cnt = 1;
+ } else {
+ start_cw = 0;
+ cw_cnt = ecc->steps;
+ }
+
+ for (i = start_cw; i < start_cw + cw_cnt; i++, buf++) {
u32 flash, buffer, erased_cw;
int data_len, oob_len;

@@ -1746,36 +1833,8 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
erased = false;
}

- if (erased) {
- data_buf += data_len;
- if (oob_buf)
- oob_buf += oob_len + ecc->bytes;
- continue;
- }
-
- if (buffer & BS_UNCORRECTABLE_BIT) {
- int ret, ecclen, extraooblen;
- void *eccbuf;
-
- eccbuf = oob_buf ? oob_buf + oob_len : NULL;
- ecclen = oob_buf ? host->ecc_bytes_hw : 0;
- extraooblen = oob_buf ? oob_len : 0;
-
- /*
- * make sure it isn't an erased page reported
- * as not-erased by HW because of a few bitflips
- */
- ret = nand_check_erased_ecc_chunk(data_buf,
- data_len, eccbuf, ecclen, oob_buf,
- extraooblen, ecc->strength);
- if (ret < 0) {
- mtd->ecc_stats.failed++;
- } else {
- mtd->ecc_stats.corrected += ret;
- max_bitflips =
- max_t(unsigned int, max_bitflips, ret);
- }
- }
+ if (!erased)
+ uncorrectable_err_cws |= BIT(i);
} else if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
flash_op_err = true;
} else {
@@ -1795,7 +1854,12 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
if (flash_op_err)
return -EIO;

- return max_bitflips;
+ if (!uncorrectable_err_cws)
+ return max_bitflips;
+
+ return check_for_erased_page(host, data_buf_start, oob_buf_start,
+ uncorrectable_err_cws, page,
+ max_bitflips, last_cw);
}

/*
@@ -1803,7 +1867,7 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
* ecc->read_oob()
*/
static int read_page_ecc(struct qcom_nand_host *host, u8 *data_buf,
- u8 *oob_buf)
+ u8 *oob_buf, int page)
{
struct nand_chip *chip = &host->chip;
struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
@@ -1876,7 +1940,7 @@ static int read_page_ecc(struct qcom_nand_host *host, u8 *data_buf,

if (!ret)
ret = parse_read_errors(host, data_buf_start, oob_buf_start,
- false);
+ false, page);

return ret;
}
@@ -1917,7 +1981,7 @@ static int copy_last_cw(struct qcom_nand_host *host, int page)
if (host->use_ecc)
ret = parse_read_errors(host, nandc->data_buffer,
nandc->data_buffer + size,
- true);
+ true, page);
else
ret = check_flash_errors(host, 1);
}
@@ -1939,7 +2003,7 @@ static int qcom_nandc_read_page(struct mtd_info *mtd, struct nand_chip *chip,

clear_bam_transaction(nandc);

- return read_page_ecc(host, data_buf, oob_buf);
+ return read_page_ecc(host, data_buf, oob_buf, page);
}

/* implements ecc->read_page_raw() */
@@ -1966,7 +2030,7 @@ static int qcom_nandc_read_oob(struct mtd_info *mtd, struct nand_chip *chip,
set_address(host, 0, page);
update_rw_regs(host, ecc->steps, true);

- return read_page_ecc(host, NULL, chip->oob_poi);
+ return read_page_ecc(host, NULL, chip->oob_poi, page);
}

/* implements ecc->write_page() */
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


2018-04-04 12:45:32

by Abhishek Sahu

[permalink] [raw]
Subject: [PATCH 1/9] mtd: nand: qcom: use the ecc strength from device parameter

Currently the driver uses the ECC strength specified in
device tree. The ONFI or JEDEC device parameter page
contains the ‘ECC correctability’ field which indicates the
number of bits that the host should be able to correct per
512 bytes of data. The ecc correctability is assigned in
chip parameter during device probe time. QPIC/EBI2 NAND
supports 4/8-bit ecc correction. The Same kind of board
can have different NAND parts so use the ecc strength
from device parameter (if its non zero) instead of
device tree.

Signed-off-by: Abhishek Sahu <[email protected]>
---
drivers/mtd/nand/qcom_nandc.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
index 563b759..8dd40de 100644
--- a/drivers/mtd/nand/qcom_nandc.c
+++ b/drivers/mtd/nand/qcom_nandc.c
@@ -2334,6 +2334,14 @@ static int qcom_nand_host_setup(struct qcom_nand_host *host)
return -EINVAL;
}

+ /*
+ * Read the required ecc strength from NAND device and overwrite
+ * the device tree ecc strength for devices which require
+ * ecc correctability bits >= 8
+ */
+ if (chip->ecc_strength_ds >= 8)
+ ecc->strength = 8;
+
wide_bus = chip->options & NAND_BUSWIDTH_16 ? true : false;

if (ecc->strength >= 8) {
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


2018-04-04 12:46:05

by Abhishek Sahu

[permalink] [raw]
Subject: [PATCH 5/9] mtd: nand: qcom: parse read errors for read oob also

read_page and read_oob both calls the read_page_ecc function.
The QCOM NAND controller protect the OOB available bytes with
ECC so read errors should be checked for read_oob also. Now
this patch moves the error checking code inside read_page_ecc
so caller does not have to check explicitly for read errors.

Signed-off-by: Abhishek Sahu <[email protected]>
---
drivers/mtd/nand/qcom_nandc.c | 19 ++++++-------------
1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
index 0ebcc55..ba43752 100644
--- a/drivers/mtd/nand/qcom_nandc.c
+++ b/drivers/mtd/nand/qcom_nandc.c
@@ -1676,6 +1676,7 @@ static int read_page_ecc(struct qcom_nand_host *host, u8 *data_buf,
struct nand_chip *chip = &host->chip;
struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
struct nand_ecc_ctrl *ecc = &chip->ecc;
+ u8 *data_buf_start = data_buf, *oob_buf_start = oob_buf;
int i, ret;

config_nand_page_read(nandc);
@@ -1741,6 +1742,9 @@ static int read_page_ecc(struct qcom_nand_host *host, u8 *data_buf,

free_descs(nandc);

+ if (!ret)
+ ret = parse_read_errors(host, data_buf_start, oob_buf_start);
+
return ret;
}

@@ -1786,20 +1790,14 @@ static int qcom_nandc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
struct qcom_nand_host *host = to_qcom_nand_host(chip);
struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
u8 *data_buf, *oob_buf = NULL;
- int ret;

nand_read_page_op(chip, page, 0, NULL, 0);
data_buf = buf;
oob_buf = oob_required ? chip->oob_poi : NULL;

clear_bam_transaction(nandc);
- ret = read_page_ecc(host, data_buf, oob_buf);
- if (ret) {
- dev_err(nandc->dev, "failure to read page\n");
- return ret;
- }

- return parse_read_errors(host, data_buf, oob_buf);
+ return read_page_ecc(host, data_buf, oob_buf);
}

/* implements ecc->read_page_raw() */
@@ -1889,7 +1887,6 @@ static int qcom_nandc_read_oob(struct mtd_info *mtd, struct nand_chip *chip,
struct qcom_nand_host *host = to_qcom_nand_host(chip);
struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
struct nand_ecc_ctrl *ecc = &chip->ecc;
- int ret;

clear_read_regs(nandc);
clear_bam_transaction(nandc);
@@ -1898,11 +1895,7 @@ static int qcom_nandc_read_oob(struct mtd_info *mtd, struct nand_chip *chip,
set_address(host, 0, page);
update_rw_regs(host, ecc->steps, true);

- ret = read_page_ecc(host, NULL, chip->oob_poi);
- if (ret)
- dev_err(nandc->dev, "failure to read oob\n");
-
- return ret;
+ return read_page_ecc(host, NULL, chip->oob_poi);
}

/* implements ecc->write_page() */
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


2018-04-04 12:46:06

by Abhishek Sahu

[permalink] [raw]
Subject: [PATCH 7/9] mtd: nand: qcom: check for operation errors in case of raw read

Currently there is no error checking for raw read. For raw
reads, there won’t be any ECC failure but the operational
failures are possible so schedule the NAND_FLASH_STATUS read
after each codeword.

Signed-off-by: Abhishek Sahu <[email protected]>
---
drivers/mtd/nand/qcom_nandc.c | 56 +++++++++++++++++++++++++++++++++++--------
1 file changed, 46 insertions(+), 10 deletions(-)

diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
index dce97e8..40c790e 100644
--- a/drivers/mtd/nand/qcom_nandc.c
+++ b/drivers/mtd/nand/qcom_nandc.c
@@ -1099,7 +1099,8 @@ static void config_nand_page_read(struct qcom_nand_controller *nandc)
* Helper to prepare DMA descriptors for configuring registers
* before reading each codeword in NAND page.
*/
-static void config_nand_cw_read(struct qcom_nand_controller *nandc)
+static void
+config_nand_cw_read(struct qcom_nand_controller *nandc, bool use_ecc)
{
if (nandc->props->is_bam)
write_reg_dma(nandc, NAND_READ_LOCATION_0, 4,
@@ -1108,19 +1109,25 @@ static void config_nand_cw_read(struct qcom_nand_controller *nandc)
write_reg_dma(nandc, NAND_FLASH_CMD, 1, NAND_BAM_NEXT_SGL);
write_reg_dma(nandc, NAND_EXEC_CMD, 1, NAND_BAM_NEXT_SGL);

- read_reg_dma(nandc, NAND_FLASH_STATUS, 2, 0);
- read_reg_dma(nandc, NAND_ERASED_CW_DETECT_STATUS, 1,
- NAND_BAM_NEXT_SGL);
+ if (use_ecc) {
+ read_reg_dma(nandc, NAND_FLASH_STATUS, 2, 0);
+ read_reg_dma(nandc, NAND_ERASED_CW_DETECT_STATUS, 1,
+ NAND_BAM_NEXT_SGL);
+ } else {
+ read_reg_dma(nandc, NAND_FLASH_STATUS, 1, NAND_BAM_NEXT_SGL);
+ }
}

/*
* Helper to prepare dma descriptors to configure registers needed for reading a
* single codeword in page
*/
-static void config_nand_single_cw_page_read(struct qcom_nand_controller *nandc)
+static void
+config_nand_single_cw_page_read(struct qcom_nand_controller *nandc,
+ bool use_ecc)
{
config_nand_page_read(nandc);
- config_nand_cw_read(nandc);
+ config_nand_cw_read(nandc, use_ecc);
}

/*
@@ -1201,7 +1208,7 @@ static int nandc_param(struct qcom_nand_host *host)
nandc->buf_count = 512;
memset(nandc->data_buffer, 0xff, nandc->buf_count);

- config_nand_single_cw_page_read(nandc);
+ config_nand_single_cw_page_read(nandc, false);

read_data_dma(nandc, FLASH_BUF_ACC, nandc->data_buffer,
nandc->buf_count, 0);
@@ -1565,6 +1572,23 @@ struct read_stats {
__le32 erased_cw;
};

+/* reads back FLASH_STATUS register set by the controller */
+static int check_flash_errors(struct qcom_nand_host *host, int cw_cnt)
+{
+ struct nand_chip *chip = &host->chip;
+ struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
+ int i;
+
+ for (i = 0; i < cw_cnt; i++) {
+ u32 flash = le32_to_cpu(nandc->reg_read_buf[i]);
+
+ if (flash & (FS_OP_ERR | FS_MPU_ERR))
+ return -EIO;
+ }
+
+ return 0;
+}
+
/*
* reads back status registers set by the controller to notify page read
* errors. this is equivalent to what 'ecc->correct()' would do.
@@ -1707,7 +1731,7 @@ static int read_page_ecc(struct qcom_nand_host *host, u8 *data_buf,
}
}

- config_nand_cw_read(nandc);
+ config_nand_cw_read(nandc, true);

if (data_buf)
read_data_dma(nandc, FLASH_BUF_ACC, data_buf,
@@ -1771,7 +1795,7 @@ static int copy_last_cw(struct qcom_nand_host *host, int page)
set_address(host, host->cw_size * (ecc->steps - 1), page);
update_rw_regs(host, 1, true);

- config_nand_single_cw_page_read(nandc);
+ config_nand_single_cw_page_read(nandc, host->use_ecc);

read_data_dma(nandc, FLASH_BUF_ACC, nandc->data_buffer, size, 0);

@@ -1781,6 +1805,15 @@ static int copy_last_cw(struct qcom_nand_host *host, int page)

free_descs(nandc);

+ if (!ret) {
+ if (host->use_ecc)
+ ret = parse_read_errors(host, nandc->data_buffer,
+ nandc->data_buffer + size,
+ true);
+ else
+ ret = check_flash_errors(host, 1);
+ }
+
return ret;
}

@@ -1854,7 +1887,7 @@ static int qcom_nandc_read_page_raw(struct mtd_info *mtd,
nandc_set_read_loc(nandc, 3, read_loc, oob_size2, 1);
}

- config_nand_cw_read(nandc);
+ config_nand_cw_read(nandc, false);

read_data_dma(nandc, reg_off, data_buf, data_size1, 0);
reg_off += data_size1;
@@ -1878,6 +1911,9 @@ static int qcom_nandc_read_page_raw(struct mtd_info *mtd,

free_descs(nandc);

+ if (!ret)
+ ret = check_flash_errors(host, ecc->steps);
+
return 0;
}

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


2018-04-04 12:46:42

by Abhishek Sahu

[permalink] [raw]
Subject: [PATCH 2/9] mtd: nand: qcom: wait for desc completion in all BAM channels

The BAM has 3 channels - tx, rx and command. command channel
is used for register read/writes, tx channel for data writes
and rx channel for data reads. Currently, the driver assumes the
transfer completion once it gets all the command descriptor
completed. Sometimes, there is race condition in data channel
(tx/rx) and command channel completion and in these cases,
the data in buffer is not valid during the small window between
command descriptor completion and data descriptor completion.

Now, the changes have been made to assign the callback for
channel's final descriptor. The DMA will generate the callback
when all the descriptor have completed in that channel.
The NAND transfer will be completed only when all required
DMA channels have generated the completion callback.

Signed-off-by: Abhishek Sahu <[email protected]>
---
drivers/mtd/nand/qcom_nandc.c | 55 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 54 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
index 8dd40de..17321fc 100644
--- a/drivers/mtd/nand/qcom_nandc.c
+++ b/drivers/mtd/nand/qcom_nandc.c
@@ -213,6 +213,8 @@
#define QPIC_PER_CW_CMD_SGL 32
#define QPIC_PER_CW_DATA_SGL 8

+#define QPIC_NAND_COMPLETION_TIMEOUT msecs_to_jiffies(2000)
+
/*
* Flags used in DMA descriptor preparation helper functions
* (i.e. read_reg_dma/write_reg_dma/read_data_dma/write_data_dma)
@@ -245,6 +247,11 @@
* @tx_sgl_start - start index in data sgl for tx.
* @rx_sgl_pos - current index in data sgl for rx.
* @rx_sgl_start - start index in data sgl for rx.
+ * @first_chan_done - if current transfer already has got first channel
+ * DMA desc completion.
+ * @txn_done - completion for nand transfer.
+ * @last_data_desc - last DMA desc in data channel (tx/rx).
+ * @last_cmd_desc - last DMA desc in command channel.
*/
struct bam_transaction {
struct bam_cmd_element *bam_ce;
@@ -258,6 +265,10 @@ struct bam_transaction {
u32 tx_sgl_start;
u32 rx_sgl_pos;
u32 rx_sgl_start;
+ bool first_chan_done;
+ struct completion txn_done;
+ struct dma_async_tx_descriptor *last_data_desc;
+ struct dma_async_tx_descriptor *last_cmd_desc;
};

/*
@@ -504,6 +515,8 @@ static void free_bam_transaction(struct qcom_nand_controller *nandc)

bam_txn->data_sgl = bam_txn_buf;

+ init_completion(&bam_txn->txn_done);
+
return bam_txn;
}

@@ -523,11 +536,36 @@ static void clear_bam_transaction(struct qcom_nand_controller *nandc)
bam_txn->tx_sgl_start = 0;
bam_txn->rx_sgl_pos = 0;
bam_txn->rx_sgl_start = 0;
+ bam_txn->last_data_desc = NULL;
+ bam_txn->first_chan_done = false;

sg_init_table(bam_txn->cmd_sgl, nandc->max_cwperpage *
QPIC_PER_CW_CMD_SGL);
sg_init_table(bam_txn->data_sgl, nandc->max_cwperpage *
QPIC_PER_CW_DATA_SGL);
+
+ reinit_completion(&bam_txn->txn_done);
+}
+
+/* Callback for DMA descriptor completion */
+static void qpic_bam_dma_done(void *data)
+{
+ struct bam_transaction *bam_txn = data;
+
+ /*
+ * In case of data transfer with NAND, 2 callbacks will be generated.
+ * One for command channel and another one for data channel.
+ * If current transaction has data descriptors then check if its
+ * already got one DMA channel completion callback. In this case
+ * make the NAND transfer complete otherwise mark first_chan_done true
+ * and wait for next channel DMA completion callback.
+ */
+ if (bam_txn->last_data_desc && !bam_txn->first_chan_done) {
+ bam_txn->first_chan_done = true;
+ return;
+ }
+
+ complete(&bam_txn->txn_done);
}

static inline struct qcom_nand_host *to_qcom_nand_host(struct nand_chip *chip)
@@ -756,6 +794,12 @@ static int prepare_bam_async_desc(struct qcom_nand_controller *nandc,

desc->dma_desc = dma_desc;

+ /* update last data/command descriptor */
+ if (chan == nandc->cmd_chan)
+ bam_txn->last_cmd_desc = dma_desc;
+ else
+ bam_txn->last_data_desc = dma_desc;
+
list_add_tail(&desc->node, &nandc->desc_list);

return 0;
@@ -1273,10 +1317,19 @@ static int submit_descs(struct qcom_nand_controller *nandc)
cookie = dmaengine_submit(desc->dma_desc);

if (nandc->props->is_bam) {
+ bam_txn->last_cmd_desc->callback = qpic_bam_dma_done;
+ bam_txn->last_cmd_desc->callback_param = bam_txn;
+ if (bam_txn->last_data_desc) {
+ bam_txn->last_data_desc->callback = qpic_bam_dma_done;
+ bam_txn->last_data_desc->callback_param = bam_txn;
+ }
+
dma_async_issue_pending(nandc->tx_chan);
dma_async_issue_pending(nandc->rx_chan);
+ dma_async_issue_pending(nandc->cmd_chan);

- if (dma_sync_wait(nandc->cmd_chan, cookie) != DMA_COMPLETE)
+ if (!wait_for_completion_timeout(&bam_txn->txn_done,
+ QPIC_NAND_COMPLETION_TIMEOUT))
return -ETIMEDOUT;
} else {
if (dma_sync_wait(nandc->chan, cookie) != DMA_COMPLETE)
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


2018-04-04 12:47:42

by Abhishek Sahu

[permalink] [raw]
Subject: [PATCH 3/9] mtd: nand: qcom: erased page detection for uncorrectable errors only

The NAND flash controller generates ECC uncorrectable error
first in case of completely erased page. Currently driver
applies the erased page detection logic for other operation
errors also so fix this and return EIO for other operational
errors.

Signed-off-by: Abhishek Sahu <[email protected]>
---
drivers/mtd/nand/qcom_nandc.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
index 17321fc..57c16a6 100644
--- a/drivers/mtd/nand/qcom_nandc.c
+++ b/drivers/mtd/nand/qcom_nandc.c
@@ -1578,6 +1578,7 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
struct nand_ecc_ctrl *ecc = &chip->ecc;
unsigned int max_bitflips = 0;
struct read_stats *buf;
+ bool flash_op_err = false;
int i;

buf = (struct read_stats *)nandc->reg_read_buf;
@@ -1599,7 +1600,7 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
buffer = le32_to_cpu(buf->buffer);
erased_cw = le32_to_cpu(buf->erased_cw);

- if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
+ if ((flash & FS_OP_ERR) && (buffer & BS_UNCORRECTABLE_BIT)) {
bool erased;

/* ignore erased codeword errors */
@@ -1641,6 +1642,8 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
max_t(unsigned int, max_bitflips, ret);
}
}
+ } else if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
+ flash_op_err = true;
} else {
unsigned int stat;

@@ -1654,6 +1657,9 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
oob_buf += oob_len + ecc->bytes;
}

+ if (flash_op_err)
+ return -EIO;
+
return max_bitflips;
}

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


2018-04-06 12:32:59

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH 1/9] mtd: nand: qcom: use the ecc strength from device parameter

Hi Abhishek,

On Wed, 4 Apr 2018 18:12:17 +0530, Abhishek Sahu
<[email protected]> wrote:

> Currently the driver uses the ECC strength specified in
> device tree. The ONFI or JEDEC device parameter page
> contains the ‘ECC correctability’ field which indicates the
> number of bits that the host should be able to correct per
> 512 bytes of data.

This is misleading. This field is not about the controller but rather
the chip requirements in terms of minimal strength for nominal use.

> The ecc correctability is assigned in
> chip parameter during device probe time. QPIC/EBI2 NAND
> supports 4/8-bit ecc correction. The Same kind of board
> can have different NAND parts so use the ecc strength
> from device parameter (if its non zero) instead of
> device tree.

That is not what you do.

What you do is forcing the strength to be 8-bit per ECC chunk if the
NAND chip requires at least 8-bit/chunk strength.

The DT property is here to force a strength. Otherwise, Linux will
propose to the NAND controller to use the minimum strength required by
the chip (from either the ONFI/JEDEC parameter page or from a static
table).

IMHO, you have two solutions:
1/ Remove these properties from the board DT (breaks DT backward
compatibility though);
2/ Create another DT for the board.

However, there is something to do in this driver: the strength chosen
should be limited to the controller capabilities (8-bit/512B if I
understand correctly). In this case you have two options: either you
limit the strength like the size [1] if (ecc->strength > 8); or you
just limit the maximum strength to 8 like this [2] and the core will
spawn a warning in the dmesg telling that the ECC strength used is
below the NAND chip requirements.

Thanks,
Miquèl

[1] https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/qcom_nandc.c#L2332
[2] http://code.bulix.org/nyf63w-315268


>
> Signed-off-by: Abhishek Sahu <[email protected]>
> ---
> drivers/mtd/nand/qcom_nandc.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
> index 563b759..8dd40de 100644
> --- a/drivers/mtd/nand/qcom_nandc.c
> +++ b/drivers/mtd/nand/qcom_nandc.c
> @@ -2334,6 +2334,14 @@ static int qcom_nand_host_setup(struct qcom_nand_host *host)
> return -EINVAL;
> }
>
> + /*
> + * Read the required ecc strength from NAND device and overwrite
> + * the device tree ecc strength for devices which require
> + * ecc correctability bits >= 8
> + */
> + if (chip->ecc_strength_ds >= 8)
> + ecc->strength = 8;
> +
> wide_bus = chip->options & NAND_BUSWIDTH_16 ? true : false;
>
> if (ecc->strength >= 8) {



--
Miquel Raynal, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

2018-04-10 06:13:43

by Abhishek Sahu

[permalink] [raw]
Subject: Re: [PATCH 1/9] mtd: nand: qcom: use the ecc strength from device parameter

On 2018-04-06 18:01, Miquel Raynal wrote:
> Hi Abhishek,
>
> On Wed, 4 Apr 2018 18:12:17 +0530, Abhishek Sahu
> <[email protected]> wrote:
>
>> Currently the driver uses the ECC strength specified in
>> device tree. The ONFI or JEDEC device parameter page
>> contains the ‘ECC correctability’ field which indicates the
>> number of bits that the host should be able to correct per
>> 512 bytes of data.
>
> This is misleading. This field is not about the controller but rather
> the chip requirements in terms of minimal strength for nominal use.
>

Thanks Miquel.

Yes. Its NAND chip requirement. I have used the description for
NAND ONFI param page

5.6.1.24. Byte 112: Number of bits ECC correctability
This field indicates the number of bits that the host should be
able to correct per 512 bytes of data.

>> The ecc correctability is assigned in
>> chip parameter during device probe time. QPIC/EBI2 NAND
>> supports 4/8-bit ecc correction. The Same kind of board
>> can have different NAND parts so use the ecc strength
>> from device parameter (if its non zero) instead of
>> device tree.
>
> That is not what you do.
>
> What you do is forcing the strength to be 8-bit per ECC chunk if the
> NAND chip requires at least 8-bit/chunk strength.
>
> The DT property is here to force a strength. Otherwise, Linux will
> propose to the NAND controller to use the minimum strength required by
> the chip (from either the ONFI/JEDEC parameter page or from a static
> table).
>

The main problem is that the same kind of boards can have different
NAND parts.

Lets assume, we have following 2 cases.

1. Non ONFI/JEDEC device for which chip->ecc_strength_ds
will be zero. In this case, the ecc->strength from DT
will be used
2. ONFI/JEDEC device for which chip->ecc_strength_ds > 8.
Since QCOM nand controller can not support
ECC correction greater than 8 bits so we can use 8 bit ECC
itself instead of failing NAND boot completely.

> IMHO, you have two solutions:
> 1/ Remove these properties from the board DT (breaks DT backward
> compatibility though);

- nand-ecc-strength: This is optional property in nand.txt and
Required property in qcom_nandc.txt. We can't remove since
if the device is Non ONFI/JEDEC, then ecc strength will come
from DT only.

> 2/ Create another DT for the board.
>

Its not about board but about part. We have IPQ8074 HK01 board
with 4 bit ECC chip/8 bit ECC chip/non ONFI/JEDEC chip.

> However, there is something to do in this driver: the strength chosen
> should be limited to the controller capabilities (8-bit/512B if I
> understand correctly). In this case you have two options: either you
> limit the strength like the size [1] if (ecc->strength > 8);

Limiting the strength will make all the boards with ecc strength > 8
to fail completely

> just limit the maximum strength to 8 like this [2] and the core will
> spawn a warning in the dmesg telling that the ECC strength used is
> below the NAND chip requirements.

Yes its good idea. I can update the patch with dmesg warning.

Thanks,
Abhishek

>
> Thanks,
> Miquèl
>
> [1]
> https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/qcom_nandc.c#L2332
> [2] http://code.bulix.org/nyf63w-315268
>
>
>>
>> Signed-off-by: Abhishek Sahu <[email protected]>
>> ---
>> drivers/mtd/nand/qcom_nandc.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/mtd/nand/qcom_nandc.c
>> b/drivers/mtd/nand/qcom_nandc.c
>> index 563b759..8dd40de 100644
>> --- a/drivers/mtd/nand/qcom_nandc.c
>> +++ b/drivers/mtd/nand/qcom_nandc.c
>> @@ -2334,6 +2334,14 @@ static int qcom_nand_host_setup(struct
>> qcom_nand_host *host)
>> return -EINVAL;
>> }
>>
>> + /*
>> + * Read the required ecc strength from NAND device and overwrite
>> + * the device tree ecc strength for devices which require
>> + * ecc correctability bits >= 8
>> + */
>> + if (chip->ecc_strength_ds >= 8)
>> + ecc->strength = 8;
>> +
>> wide_bus = chip->options & NAND_BUSWIDTH_16 ? true : false;
>>
>> if (ecc->strength >= 8) {

2018-04-10 07:53:47

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH 1/9] mtd: nand: qcom: use the ecc strength from device parameter

Hi Abhishek,

On Tue, 10 Apr 2018 11:39:35 +0530, Abhishek Sahu
<[email protected]> wrote:

> On 2018-04-06 18:01, Miquel Raynal wrote:
> > Hi Abhishek,
> >
> > On Wed, 4 Apr 2018 18:12:17 +0530, Abhishek Sahu
> > <[email protected]> wrote:
> >
> >> Currently the driver uses the ECC strength specified in
> >> device tree. The ONFI or JEDEC device parameter page
> >> contains the ‘ECC correctability’ field which indicates the
> >> number of bits that the host should be able to correct per
> >> 512 bytes of data.
> >
> > This is misleading. This field is not about the controller but rather
> > the chip requirements in terms of minimal strength for nominal use.
> >
>
> Thanks Miquel.
>
> Yes. Its NAND chip requirement. I have used the description for
> NAND ONFI param page
>
> 5.6.1.24. Byte 112: Number of bits ECC correctability
> This field indicates the number of bits that the host should be
> able to correct per 512 bytes of data.
>
> >> The ecc correctability is assigned in
> >> chip parameter during device probe time. QPIC/EBI2 NAND
> >> supports 4/8-bit ecc correction. The Same kind of board
> >> can have different NAND parts so use the ecc strength
> >> from device parameter (if its non zero) instead of
> >> device tree.
> >
> > That is not what you do.
> >
> > What you do is forcing the strength to be 8-bit per ECC chunk if the
> > NAND chip requires at least 8-bit/chunk strength.
> >
> > The DT property is here to force a strength. Otherwise, Linux will
> > propose to the NAND controller to use the minimum strength required by
> > the chip (from either the ONFI/JEDEC parameter page or from a static
> > table).
> >
>
> The main problem is that the same kind of boards can have different
> NAND parts.
>
> Lets assume, we have following 2 cases.
>
> 1. Non ONFI/JEDEC device for which chip->ecc_strength_ds
> will be zero. In this case, the ecc->strength from DT
> will be used

No, the strength from DT will always be used if the property is
present, no matter the type of chip.

> 2. ONFI/JEDEC device for which chip->ecc_strength_ds > 8.
> Since QCOM nand controller can not support
> ECC correction greater than 8 bits so we can use 8 bit ECC
> itself instead of failing NAND boot completely.

I understand that. But this is still not what you do.

>
> > IMHO, you have two solutions:
> > 1/ Remove these properties from the board DT (breaks DT backward
> > compatibility though);
>
> - nand-ecc-strength: This is optional property in nand.txt and
> Required property in qcom_nandc.txt.

Well, this property is not controller specific but chip specific. The
controller driver does not rely on it, so this property should not be
required.

> We can't remove since
> if the device is Non ONFI/JEDEC, then ecc strength will come
> from DT only.

We can remove it and let the core handle this (as this is generic to
all raw NANDs and not specific to this controller driver). Try it out!

However if the defaults value do not match your expectations, I think
you can add your non-ONFI/JEDEC chip in 'nand_ids.c', this should fill
your strength_ds field and let you avoid using these properties.

>
> > 2/ Create another DT for the board.
> >
>
> Its not about board but about part. We have IPQ8074 HK01 board
> with 4 bit ECC chip/8 bit ECC chip/non ONFI/JEDEC chip.
>
> > However, there is something to do in this driver: the strength chosen
> > should be limited to the controller capabilities (8-bit/512B if I
> > understand correctly). In this case you have two options: either you
> > limit the strength like the size [1] if (ecc->strength > 8);
>
> Limiting the strength will make all the boards with ecc strength > 8
> to fail completely
>
> > just limit the maximum strength to 8 like this [2] and the core will
> > spawn a warning in the dmesg telling that the ECC strength used is
> > below the NAND chip requirements.
>
> Yes its good idea. I can update the patch with dmesg warning.
>
> Thanks,
> Abhishek
>
> >
> > Thanks,
> > Miquèl
> >
> > [1]
> > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/qcom_nandc.c#L2332
> > [2] http://code.bulix.org/nyf63w-315268
> >
> >
> >>
> >> Signed-off-by: Abhishek Sahu <[email protected]>
> >> ---
> >> drivers/mtd/nand/qcom_nandc.c | 8 ++++++++
> >> 1 file changed, 8 insertions(+)
> >>
> >> diff --git a/drivers/mtd/nand/qcom_nandc.c
> >> b/drivers/mtd/nand/qcom_nandc.c
> >> index 563b759..8dd40de 100644
> >> --- a/drivers/mtd/nand/qcom_nandc.c
> >> +++ b/drivers/mtd/nand/qcom_nandc.c
> >> @@ -2334,6 +2334,14 @@ static int qcom_nand_host_setup(struct
> >> qcom_nand_host *host)
> >> return -EINVAL;
> >> }
> >>
> >> + /*
> >> + * Read the required ecc strength from NAND device and overwrite
> >> + * the device tree ecc strength for devices which require
> >> + * ecc correctability bits >= 8
> >> + */
> >> + if (chip->ecc_strength_ds >= 8)
> >> + ecc->strength = 8;
> >> +
> >> wide_bus = chip->options & NAND_BUSWIDTH_16 ? true : false;
> >>
> >> if (ecc->strength >= 8) {
>
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/


Thanks,
Miquèl

--
Miquel Raynal, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

2018-04-10 07:59:56

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH 1/9] mtd: nand: qcom: use the ecc strength from device parameter


> Hi Abhishek,
>
> On Tue, 10 Apr 2018 11:39:35 +0530, Abhishek Sahu
> <[email protected]> wrote:
>
> > On 2018-04-06 18:01, Miquel Raynal wrote:
> > > Hi Abhishek,
> > >
> > > On Wed, 4 Apr 2018 18:12:17 +0530, Abhishek Sahu
> > > <[email protected]> wrote:
> > >
> > >> Currently the driver uses the ECC strength specified in
> > >> device tree. The ONFI or JEDEC device parameter page
> > >> contains the ‘ECC correctability’ field which indicates the
> > >> number of bits that the host should be able to correct per
> > >> 512 bytes of data.
> > >
> > > This is misleading. This field is not about the controller but rather
> > > the chip requirements in terms of minimal strength for nominal use.
> > >
> >
> > Thanks Miquel.
> >
> > Yes. Its NAND chip requirement. I have used the description for
> > NAND ONFI param page
> >
> > 5.6.1.24. Byte 112: Number of bits ECC correctability
> > This field indicates the number of bits that the host should be
> > able to correct per 512 bytes of data.
> >
> > >> The ecc correctability is assigned in
> > >> chip parameter during device probe time. QPIC/EBI2 NAND
> > >> supports 4/8-bit ecc correction. The Same kind of board
> > >> can have different NAND parts so use the ecc strength
> > >> from device parameter (if its non zero) instead of
> > >> device tree.
> > >
> > > That is not what you do.
> > >
> > > What you do is forcing the strength to be 8-bit per ECC chunk if the
> > > NAND chip requires at least 8-bit/chunk strength.
> > >
> > > The DT property is here to force a strength. Otherwise, Linux will
> > > propose to the NAND controller to use the minimum strength required by
> > > the chip (from either the ONFI/JEDEC parameter page or from a static
> > > table).
> > >
> >
> > The main problem is that the same kind of boards can have different
> > NAND parts.
> >
> > Lets assume, we have following 2 cases.
> >
> > 1. Non ONFI/JEDEC device for which chip->ecc_strength_ds
> > will be zero. In this case, the ecc->strength from DT
> > will be used
>
> No, the strength from DT will always be used if the property is
> present, no matter the type of chip.
>
> > 2. ONFI/JEDEC device for which chip->ecc_strength_ds > 8.
> > Since QCOM nand controller can not support
> > ECC correction greater than 8 bits so we can use 8 bit ECC
> > itself instead of failing NAND boot completely.
>
> I understand that. But this is still not what you do.
>
> >
> > > IMHO, you have two solutions:
> > > 1/ Remove these properties from the board DT (breaks DT backward
> > > compatibility though);
> >
> > - nand-ecc-strength: This is optional property in nand.txt and
> > Required property in qcom_nandc.txt.
>
> Well, this property is not controller specific but chip specific. The
> controller driver does not rely on it, so this property should not be
> required.
>
> > We can't remove since
> > if the device is Non ONFI/JEDEC, then ecc strength will come
> > from DT only.
>
> We can remove it and let the core handle this (as this is generic to
> all raw NANDs and not specific to this controller driver). Try it out!
>
> However if the defaults value do not match your expectations, I think
> you can add your non-ONFI/JEDEC chip in 'nand_ids.c', this should fill
> your strength_ds field and let you avoid using these properties.

Actually nand_ids.c should not be filled anymore, instead you can
implement this detection thanks to the part full name in the vendor
code nand_samsung.c, nand_micron.c, nand_macronix.c, nand_hynix.c, etc.
Depending on what part you are using, it might already work.

>
> >
> > > 2/ Create another DT for the board.
> > >
> >
> > Its not about board but about part. We have IPQ8074 HK01 board
> > with 4 bit ECC chip/8 bit ECC chip/non ONFI/JEDEC chip.
> >
> > > However, there is something to do in this driver: the strength chosen
> > > should be limited to the controller capabilities (8-bit/512B if I
> > > understand correctly). In this case you have two options: either you
> > > limit the strength like the size [1] if (ecc->strength > 8);
> >
> > Limiting the strength will make all the boards with ecc strength > 8
> > to fail completely
> >
> > > just limit the maximum strength to 8 like this [2] and the core will
> > > spawn a warning in the dmesg telling that the ECC strength used is
> > > below the NAND chip requirements.
> >
> > Yes its good idea. I can update the patch with dmesg warning.
> >
> > Thanks,
> > Abhishek
> >
> > >
> > > Thanks,
> > > Miquèl
> > >
> > > [1]
> > > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/qcom_nandc.c#L2332
> > > [2] http://code.bulix.org/nyf63w-315268
> > >
> > >
> > >>
> > >> Signed-off-by: Abhishek Sahu <[email protected]>
> > >> ---
> > >> drivers/mtd/nand/qcom_nandc.c | 8 ++++++++
> > >> 1 file changed, 8 insertions(+)
> > >>
> > >> diff --git a/drivers/mtd/nand/qcom_nandc.c
> > >> b/drivers/mtd/nand/qcom_nandc.c
> > >> index 563b759..8dd40de 100644
> > >> --- a/drivers/mtd/nand/qcom_nandc.c
> > >> +++ b/drivers/mtd/nand/qcom_nandc.c
> > >> @@ -2334,6 +2334,14 @@ static int qcom_nand_host_setup(struct
> > >> qcom_nand_host *host)
> > >> return -EINVAL;
> > >> }
> > >>
> > >> + /*
> > >> + * Read the required ecc strength from NAND device and overwrite
> > >> + * the device tree ecc strength for devices which require
> > >> + * ecc correctability bits >= 8
> > >> + */
> > >> + if (chip->ecc_strength_ds >= 8)
> > >> + ecc->strength = 8;
> > >> +
> > >> wide_bus = chip->options & NAND_BUSWIDTH_16 ? true : false;
> > >>
> > >> if (ecc->strength >= 8) {
> >
> > ______________________________________________________
> > Linux MTD discussion mailing list
> > http://lists.infradead.org/mailman/listinfo/linux-mtd/
>
>

Thanks,
Miquèl


--
Miquel Raynal, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

2018-04-10 08:11:26

by Boris Brezillon

[permalink] [raw]
Subject: Re: [PATCH 1/9] mtd: nand: qcom: use the ecc strength from device parameter

On Tue, 10 Apr 2018 09:55:58 +0200
Miquel Raynal <[email protected]> wrote:

> > Hi Abhishek,
> >
> > On Tue, 10 Apr 2018 11:39:35 +0530, Abhishek Sahu
> > <[email protected]> wrote:
> >
> > > On 2018-04-06 18:01, Miquel Raynal wrote:
> > > > Hi Abhishek,
> > > >
> > > > On Wed, 4 Apr 2018 18:12:17 +0530, Abhishek Sahu
> > > > <[email protected]> wrote:
> > > >
> > > >> Currently the driver uses the ECC strength specified in
> > > >> device tree. The ONFI or JEDEC device parameter page
> > > >> contains the ‘ECC correctability’ field which indicates the
> > > >> number of bits that the host should be able to correct per
> > > >> 512 bytes of data.
> > > >
> > > > This is misleading. This field is not about the controller but rather
> > > > the chip requirements in terms of minimal strength for nominal use.
> > > >
> > >
> > > Thanks Miquel.
> > >
> > > Yes. Its NAND chip requirement. I have used the description for
> > > NAND ONFI param page
> > >
> > > 5.6.1.24. Byte 112: Number of bits ECC correctability
> > > This field indicates the number of bits that the host should be
> > > able to correct per 512 bytes of data.
> > >
> > > >> The ecc correctability is assigned in
> > > >> chip parameter during device probe time. QPIC/EBI2 NAND
> > > >> supports 4/8-bit ecc correction. The Same kind of board
> > > >> can have different NAND parts so use the ecc strength
> > > >> from device parameter (if its non zero) instead of
> > > >> device tree.
> > > >
> > > > That is not what you do.
> > > >
> > > > What you do is forcing the strength to be 8-bit per ECC chunk if the
> > > > NAND chip requires at least 8-bit/chunk strength.
> > > >
> > > > The DT property is here to force a strength. Otherwise, Linux will
> > > > propose to the NAND controller to use the minimum strength required by
> > > > the chip (from either the ONFI/JEDEC parameter page or from a static
> > > > table).
> > > >
> > >
> > > The main problem is that the same kind of boards can have different
> > > NAND parts.
> > >
> > > Lets assume, we have following 2 cases.
> > >
> > > 1. Non ONFI/JEDEC device for which chip->ecc_strength_ds
> > > will be zero. In this case, the ecc->strength from DT
> > > will be used
> >
> > No, the strength from DT will always be used if the property is
> > present, no matter the type of chip.
> >
> > > 2. ONFI/JEDEC device for which chip->ecc_strength_ds > 8.
> > > Since QCOM nand controller can not support
> > > ECC correction greater than 8 bits so we can use 8 bit ECC
> > > itself instead of failing NAND boot completely.
> >
> > I understand that. But this is still not what you do.
> >
> > >
> > > > IMHO, you have two solutions:
> > > > 1/ Remove these properties from the board DT (breaks DT backward
> > > > compatibility though);
> > >
> > > - nand-ecc-strength: This is optional property in nand.txt and
> > > Required property in qcom_nandc.txt.
> >
> > Well, this property is not controller specific but chip specific. The
> > controller driver does not rely on it, so this property should not be
> > required.
> >
> > > We can't remove since
> > > if the device is Non ONFI/JEDEC, then ecc strength will come
> > > from DT only.
> >
> > We can remove it and let the core handle this (as this is generic to
> > all raw NANDs and not specific to this controller driver). Try it out!
> >
> > However if the defaults value do not match your expectations, I think
> > you can add your non-ONFI/JEDEC chip in 'nand_ids.c', this should fill
> > your strength_ds field and let you avoid using these properties.
>
> Actually nand_ids.c should not be filled anymore, instead you can
> implement this detection thanks to the part full name in the vendor
> code nand_samsung.c, nand_micron.c, nand_macronix.c, nand_hynix.c, etc.

Usually you don't have to go that far, and the ECC requirements are
encoded somewhere in the ID (after byte 2). When that's not the
case, you may have to check the full ID.

> Depending on what part you are using, it might already work.

Yep.

2018-04-10 09:04:25

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH 3/9] mtd: nand: qcom: erased page detection for uncorrectable errors only

Hi Abhishek,

On Wed, 4 Apr 2018 18:12:19 +0530, Abhishek Sahu
<[email protected]> wrote:

> The NAND flash controller generates ECC uncorrectable error
> first in case of completely erased page. Currently driver
> applies the erased page detection logic for other operation
> errors also so fix this and return EIO for other operational
> errors.

I am sorry I don't understand very well what is the purpose of this
patch, could you please explain it again?

Do you mean that you want to avoid having rising ECC errors when you
read erased pages?

>
> Signed-off-by: Abhishek Sahu <[email protected]>
> ---
> drivers/mtd/nand/qcom_nandc.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
> index 17321fc..57c16a6 100644
> --- a/drivers/mtd/nand/qcom_nandc.c
> +++ b/drivers/mtd/nand/qcom_nandc.c
> @@ -1578,6 +1578,7 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
> struct nand_ecc_ctrl *ecc = &chip->ecc;
> unsigned int max_bitflips = 0;
> struct read_stats *buf;
> + bool flash_op_err = false;
> int i;
>
> buf = (struct read_stats *)nandc->reg_read_buf;
> @@ -1599,7 +1600,7 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
> buffer = le32_to_cpu(buf->buffer);
> erased_cw = le32_to_cpu(buf->erased_cw);
>
> - if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
> + if ((flash & FS_OP_ERR) && (buffer & BS_UNCORRECTABLE_BIT)) {

And later you have another "if (buffer & BS_UNCORRECTABLE_BIT)" which
is then redundant, unless that is not what you actually want to do?

Maybe you can add comments before the if ()/ else if () to explain in
which case you enter each branch.

> bool erased;
>
> /* ignore erased codeword errors */
> @@ -1641,6 +1642,8 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
> max_t(unsigned int, max_bitflips, ret);
> }
> }
> + } else if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
> + flash_op_err = true;
> } else {
> unsigned int stat;
>
> @@ -1654,6 +1657,9 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
> oob_buf += oob_len + ecc->bytes;
> }
>
> + if (flash_op_err)
> + return -EIO;
> +

In you are propagating an error related to the controller, this is
fine, but I think you just want to raise the fact that a NAND
uncorrectable error occurred, in this case you should just increment
mtd->ecc_stats.failed and return 0 (returning max_bitflips here would be
fine too has it would be 0 too).

> return max_bitflips;
> }
>

Thanks,
Miquèl

--
Miquel Raynal, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

2018-04-10 09:16:48

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH 4/9] mtd: nand: qcom: fix null pointer access for erased buffer detection

Hi Abhishek,

On Wed, 4 Apr 2018 18:12:20 +0530, Abhishek Sahu
<[email protected]> wrote:

> parse_read_errors can be called with only oob buf also in which
> case data_buf will be NULL. If data_buf is NULL, then don’t
> treat this page as completely erased in case of ECC uncorrectable
> error.
>
> Signed-off-by: Abhishek Sahu <[email protected]>
> ---
> drivers/mtd/nand/qcom_nandc.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
> index 57c16a6..0ebcc55 100644
> --- a/drivers/mtd/nand/qcom_nandc.c
> +++ b/drivers/mtd/nand/qcom_nandc.c
> @@ -1607,9 +1607,11 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
> if (host->bch_enabled) {
> erased = (erased_cw & ERASED_CW) == ERASED_CW ?
> true : false;

Why the parse_read_errors() function could not be called without
data_buf when using BCH? Are you sure the situation can only happen
without it?

Would the following apply here too, with a:

if (!data_buf) {
erased = false;
} else {
if (host->bch_enabled)
...
else
...
}

> - } else {
> + } else if (data_buf) {
> erased = erased_chunk_check_and_fixup(data_buf,
> data_len);
> + } else {
> + erased = false;
> }
>
> if (erased) {
> @@ -1652,7 +1654,8 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
> max_bitflips = max(max_bitflips, stat);
> }
>
> - data_buf += data_len;
> + if (data_buf)
> + data_buf += data_len;
> if (oob_buf)
> oob_buf += oob_len + ecc->bytes;
> }

Thanks,
Miquèl

--
Miquel Raynal, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

2018-04-10 09:49:08

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH 8/9] mtd: nand: qcom: helper function for raw read

Hi Abhishek,

On Wed, 4 Apr 2018 18:12:24 +0530, Abhishek Sahu
<[email protected]> wrote:

> This patch does minor code reorganization for raw reads.
> Currently the raw read is required for complete page but for
> subsequent patches related with erased codeword bit flips
> detection, only few CW should be read. So, this patch adds
> helper function and introduces the read CW bitmask which
> specifies which CW reads are required in complete page.

I am not sure this is the right approach for subpage reads. If the
controller supports it, you should just enable it in chip->options.

>
> Signed-off-by: Abhishek Sahu <[email protected]>
> ---
> drivers/mtd/nand/qcom_nandc.c | 186 +++++++++++++++++++++++++-----------------
> 1 file changed, 110 insertions(+), 76 deletions(-)
>
> diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
> index 40c790e..f5d1fa4 100644
> --- a/drivers/mtd/nand/qcom_nandc.c
> +++ b/drivers/mtd/nand/qcom_nandc.c
> @@ -1590,6 +1590,114 @@ static int check_flash_errors(struct qcom_nand_host *host, int cw_cnt)
> }
>
> /*
> + * Helper to perform the page raw read operation. The read_cw_mask will be
> + * used to specify the codewords for which the data should be read. The
> + * single page contains multiple CW. Sometime, only few CW data is required
> + * in complete page. Also, start address will be determined with
> + * this CW mask to skip unnecessary data copy from NAND flash device. Then,
> + * actual data copy from NAND controller to data buffer will be done only
> + * for the CWs which have the mask set.
> + */
> +static int
> +nandc_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
> + u8 *data_buf, u8 *oob_buf,
> + int page, unsigned long read_cw_mask)
> +{
> + struct qcom_nand_host *host = to_qcom_nand_host(chip);
> + struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
> + struct nand_ecc_ctrl *ecc = &chip->ecc;
> + int i, ret;
> + int read_loc, start_step, last_step;
> +
> + nand_read_page_op(chip, page, 0, NULL, 0);
> +
> + host->use_ecc = false;
> + start_step = ffs(read_cw_mask) - 1;
> + last_step = fls(read_cw_mask);
> +
> + clear_bam_transaction(nandc);
> + set_address(host, host->cw_size * start_step, page);
> + update_rw_regs(host, last_step - start_step, true);
> + config_nand_page_read(nandc);
> +
> + for (i = start_step; i < last_step; i++) {
> + int data_size1, data_size2, oob_size1, oob_size2;
> + int reg_off = FLASH_BUF_ACC;
> +
> + data_size1 = mtd->writesize - host->cw_size * (ecc->steps - 1);
> + oob_size1 = host->bbm_size;
> +
> + if (i == (ecc->steps - 1)) {
> + data_size2 = ecc->size - data_size1 -
> + ((ecc->steps - 1) << 2);
> + oob_size2 = (ecc->steps << 2) + host->ecc_bytes_hw +
> + host->spare_bytes;
> + } else {
> + data_size2 = host->cw_data - data_size1;
> + oob_size2 = host->ecc_bytes_hw + host->spare_bytes;
> + }
> +
> + /*
> + * Don't perform actual data copy from NAND controller to data
> + * buffer through DMA for this codeword
> + */
> + if (!(read_cw_mask & BIT(i))) {
> + if (nandc->props->is_bam)
> + nandc_set_read_loc(nandc, 0, 0, 0, 1);
> +
> + config_nand_cw_read(nandc, false);
> +
> + data_buf += data_size1 + data_size2;
> + oob_buf += oob_size1 + oob_size2;
> +
> + continue;
> + }
> +
> + if (nandc->props->is_bam) {
> + read_loc = 0;
> + nandc_set_read_loc(nandc, 0, read_loc, data_size1, 0);
> + read_loc += data_size1;
> +
> + nandc_set_read_loc(nandc, 1, read_loc, oob_size1, 0);
> + read_loc += oob_size1;
> +
> + nandc_set_read_loc(nandc, 2, read_loc, data_size2, 0);
> + read_loc += data_size2;
> +
> + nandc_set_read_loc(nandc, 3, read_loc, oob_size2, 1);
> + }
> +
> + config_nand_cw_read(nandc, false);
> +
> + read_data_dma(nandc, reg_off, data_buf, data_size1, 0);
> + reg_off += data_size1;
> + data_buf += data_size1;
> +
> + read_data_dma(nandc, reg_off, oob_buf, oob_size1, 0);
> + reg_off += oob_size1;
> + oob_buf += oob_size1;
> +
> + read_data_dma(nandc, reg_off, data_buf, data_size2, 0);
> + reg_off += data_size2;
> + data_buf += data_size2;
> +
> + read_data_dma(nandc, reg_off, oob_buf, oob_size2, 0);
> + oob_buf += oob_size2;
> + }
> +
> + ret = submit_descs(nandc);
> + if (ret)
> + dev_err(nandc->dev, "failure to read raw page\n");
> +
> + free_descs(nandc);
> +
> + if (!ret)
> + ret = check_flash_errors(host, last_step - start_step);
> +
> + return 0;
> +}
> +
> +/*
> * reads back status registers set by the controller to notify page read
> * errors. this is equivalent to what 'ecc->correct()' would do.
> */
> @@ -1839,82 +1947,8 @@ static int qcom_nandc_read_page_raw(struct mtd_info *mtd,
> struct nand_chip *chip, uint8_t *buf,
> int oob_required, int page)
> {
> - struct qcom_nand_host *host = to_qcom_nand_host(chip);
> - struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
> - u8 *data_buf, *oob_buf;
> - struct nand_ecc_ctrl *ecc = &chip->ecc;
> - int i, ret;
> - int read_loc;
> -
> - nand_read_page_op(chip, page, 0, NULL, 0);
> - data_buf = buf;
> - oob_buf = chip->oob_poi;
> -
> - host->use_ecc = false;
> -
> - clear_bam_transaction(nandc);
> - update_rw_regs(host, ecc->steps, true);
> - config_nand_page_read(nandc);
> -
> - for (i = 0; i < ecc->steps; i++) {
> - int data_size1, data_size2, oob_size1, oob_size2;
> - int reg_off = FLASH_BUF_ACC;
> -
> - data_size1 = mtd->writesize - host->cw_size * (ecc->steps - 1);
> - oob_size1 = host->bbm_size;
> -
> - if (i == (ecc->steps - 1)) {
> - data_size2 = ecc->size - data_size1 -
> - ((ecc->steps - 1) << 2);
> - oob_size2 = (ecc->steps << 2) + host->ecc_bytes_hw +
> - host->spare_bytes;
> - } else {
> - data_size2 = host->cw_data - data_size1;
> - oob_size2 = host->ecc_bytes_hw + host->spare_bytes;
> - }
> -
> - if (nandc->props->is_bam) {
> - read_loc = 0;
> - nandc_set_read_loc(nandc, 0, read_loc, data_size1, 0);
> - read_loc += data_size1;
> -
> - nandc_set_read_loc(nandc, 1, read_loc, oob_size1, 0);
> - read_loc += oob_size1;
> -
> - nandc_set_read_loc(nandc, 2, read_loc, data_size2, 0);
> - read_loc += data_size2;
> -
> - nandc_set_read_loc(nandc, 3, read_loc, oob_size2, 1);
> - }
> -
> - config_nand_cw_read(nandc, false);
> -
> - read_data_dma(nandc, reg_off, data_buf, data_size1, 0);
> - reg_off += data_size1;
> - data_buf += data_size1;
> -
> - read_data_dma(nandc, reg_off, oob_buf, oob_size1, 0);
> - reg_off += oob_size1;
> - oob_buf += oob_size1;
> -
> - read_data_dma(nandc, reg_off, data_buf, data_size2, 0);
> - reg_off += data_size2;
> - data_buf += data_size2;
> -
> - read_data_dma(nandc, reg_off, oob_buf, oob_size2, 0);
> - oob_buf += oob_size2;
> - }
> -
> - ret = submit_descs(nandc);
> - if (ret)
> - dev_err(nandc->dev, "failure to read raw page\n");
> -
> - free_descs(nandc);
> -
> - if (!ret)
> - ret = check_flash_errors(host, ecc->steps);
> -
> - return 0;
> + return nandc_read_page_raw(mtd, chip, buf, chip->oob_poi, page,
> + BIT(chip->ecc.steps) - 1);

I don't understand this. chip->ecc.steps is constant, right? So you
always ask for the same subpage?

> }
>
> /* implements ecc->read_oob() */



--
Miquel Raynal, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

2018-04-10 10:07:10

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH 5/9] mtd: nand: qcom: parse read errors for read oob also

Hi Abhishek,

On Wed, 4 Apr 2018 18:12:21 +0530, Abhishek Sahu
<[email protected]> wrote:

> read_page and read_oob both calls the read_page_ecc function.
> The QCOM NAND controller protect the OOB available bytes with
> ECC so read errors should be checked for read_oob also. Now
> this patch moves the error checking code inside read_page_ecc
> so caller does not have to check explicitly for read errors.
>
> Signed-off-by: Abhishek Sahu <[email protected]>

Nitpick: the prefix should be "mtd: rawnand: qcom: " now as this driver
has been moved to drivers/mtd/nand/raw/.

Otherwise:
Reviewed-by: Miquel Raynal <[email protected]>

> ---
> drivers/mtd/nand/qcom_nandc.c | 19 ++++++-------------
> 1 file changed, 6 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
> index 0ebcc55..ba43752 100644
> --- a/drivers/mtd/nand/qcom_nandc.c
> +++ b/drivers/mtd/nand/qcom_nandc.c
> @@ -1676,6 +1676,7 @@ static int read_page_ecc(struct qcom_nand_host *host, u8 *data_buf,
> struct nand_chip *chip = &host->chip;
> struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
> struct nand_ecc_ctrl *ecc = &chip->ecc;
> + u8 *data_buf_start = data_buf, *oob_buf_start = oob_buf;
> int i, ret;
>
> config_nand_page_read(nandc);
> @@ -1741,6 +1742,9 @@ static int read_page_ecc(struct qcom_nand_host *host, u8 *data_buf,
>
> free_descs(nandc);
>
> + if (!ret)
> + ret = parse_read_errors(host, data_buf_start, oob_buf_start);
> +
> return ret;
> }
>
> @@ -1786,20 +1790,14 @@ static int qcom_nandc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
> struct qcom_nand_host *host = to_qcom_nand_host(chip);
> struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
> u8 *data_buf, *oob_buf = NULL;
> - int ret;
>
> nand_read_page_op(chip, page, 0, NULL, 0);
> data_buf = buf;
> oob_buf = oob_required ? chip->oob_poi : NULL;
>
> clear_bam_transaction(nandc);
> - ret = read_page_ecc(host, data_buf, oob_buf);
> - if (ret) {
> - dev_err(nandc->dev, "failure to read page\n");
> - return ret;
> - }
>
> - return parse_read_errors(host, data_buf, oob_buf);
> + return read_page_ecc(host, data_buf, oob_buf);
> }
>
> /* implements ecc->read_page_raw() */
> @@ -1889,7 +1887,6 @@ static int qcom_nandc_read_oob(struct mtd_info *mtd, struct nand_chip *chip,
> struct qcom_nand_host *host = to_qcom_nand_host(chip);
> struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
> struct nand_ecc_ctrl *ecc = &chip->ecc;
> - int ret;
>
> clear_read_regs(nandc);
> clear_bam_transaction(nandc);
> @@ -1898,11 +1895,7 @@ static int qcom_nandc_read_oob(struct mtd_info *mtd, struct nand_chip *chip,
> set_address(host, 0, page);
> update_rw_regs(host, ecc->steps, true);
>
> - ret = read_page_ecc(host, NULL, chip->oob_poi);
> - if (ret)
> - dev_err(nandc->dev, "failure to read oob\n");
> -
> - return ret;
> + return read_page_ecc(host, NULL, chip->oob_poi);
> }
>
> /* implements ecc->write_page() */



--
Miquel Raynal, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

2018-04-10 10:10:39

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH 6/9] mtd: nand: qcom: support for checking read errors for last codeword

Hi Abhishek,

On Wed, 4 Apr 2018 18:12:22 +0530, Abhishek Sahu
<[email protected]> wrote:

> Add boolean function argument in parse_read_errors to identify
> whether the read error has been called for complete page read or
> only last codeword read. This will help in subsequent patches to
> detect ECC errors in case of last codeword read.

Can you explain when this happen: "last codeword read"? I don't see the
use case.

Thanks,
Miquèl

>
> Signed-off-by: Abhishek Sahu <[email protected]>
> ---
> drivers/mtd/nand/qcom_nandc.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
> index ba43752..dce97e8 100644
> --- a/drivers/mtd/nand/qcom_nandc.c
> +++ b/drivers/mtd/nand/qcom_nandc.c
> @@ -1570,7 +1570,7 @@ struct read_stats {
> * errors. this is equivalent to what 'ecc->correct()' would do.
> */
> static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
> - u8 *oob_buf)
> + u8 *oob_buf, bool last_cw)
> {
> struct nand_chip *chip = &host->chip;
> struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
> @@ -1579,12 +1579,12 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
> unsigned int max_bitflips = 0;
> struct read_stats *buf;
> bool flash_op_err = false;
> - int i;
> + int i, cw_cnt = last_cw ? 1 : ecc->steps;
>
> buf = (struct read_stats *)nandc->reg_read_buf;
> nandc_read_buffer_sync(nandc, true);
>
> - for (i = 0; i < ecc->steps; i++, buf++) {
> + for (i = 0; i < cw_cnt; i++, buf++) {
> u32 flash, buffer, erased_cw;
> int data_len, oob_len;
>
> @@ -1743,7 +1743,8 @@ static int read_page_ecc(struct qcom_nand_host *host, u8 *data_buf,
> free_descs(nandc);
>
> if (!ret)
> - ret = parse_read_errors(host, data_buf_start, oob_buf_start);
> + ret = parse_read_errors(host, data_buf_start, oob_buf_start,
> + false);
>
> return ret;
> }



--
Miquel Raynal, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

2018-04-10 10:16:03

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH 7/9] mtd: nand: qcom: check for operation errors in case of raw read

Hi Abhishek,

On Wed, 4 Apr 2018 18:12:23 +0530, Abhishek Sahu
<[email protected]> wrote:

> Currently there is no error checking for raw read. For raw
> reads, there won’t be any ECC failure but the operational
> failures are possible so schedule the NAND_FLASH_STATUS read
> after each codeword.
>
> Signed-off-by: Abhishek Sahu <[email protected]>
> ---
> drivers/mtd/nand/qcom_nandc.c | 56 +++++++++++++++++++++++++++++++++++--------
> 1 file changed, 46 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
> index dce97e8..40c790e 100644
> --- a/drivers/mtd/nand/qcom_nandc.c
> +++ b/drivers/mtd/nand/qcom_nandc.c
> @@ -1099,7 +1099,8 @@ static void config_nand_page_read(struct qcom_nand_controller *nandc)
> * Helper to prepare DMA descriptors for configuring registers
> * before reading each codeword in NAND page.
> */
> -static void config_nand_cw_read(struct qcom_nand_controller *nandc)
> +static void
> +config_nand_cw_read(struct qcom_nand_controller *nandc, bool use_ecc)
> {
> if (nandc->props->is_bam)
> write_reg_dma(nandc, NAND_READ_LOCATION_0, 4,
> @@ -1108,19 +1109,25 @@ static void config_nand_cw_read(struct qcom_nand_controller *nandc)
> write_reg_dma(nandc, NAND_FLASH_CMD, 1, NAND_BAM_NEXT_SGL);
> write_reg_dma(nandc, NAND_EXEC_CMD, 1, NAND_BAM_NEXT_SGL);
>
> - read_reg_dma(nandc, NAND_FLASH_STATUS, 2, 0);
> - read_reg_dma(nandc, NAND_ERASED_CW_DETECT_STATUS, 1,
> - NAND_BAM_NEXT_SGL);
> + if (use_ecc) {
> + read_reg_dma(nandc, NAND_FLASH_STATUS, 2, 0);
> + read_reg_dma(nandc, NAND_ERASED_CW_DETECT_STATUS, 1,
> + NAND_BAM_NEXT_SGL);
> + } else {
> + read_reg_dma(nandc, NAND_FLASH_STATUS, 1, NAND_BAM_NEXT_SGL);
> + }
> }
>
> /*
> * Helper to prepare dma descriptors to configure registers needed for reading a
> * single codeword in page
> */
> -static void config_nand_single_cw_page_read(struct qcom_nand_controller *nandc)
> +static void
> +config_nand_single_cw_page_read(struct qcom_nand_controller *nandc,
> + bool use_ecc)
> {
> config_nand_page_read(nandc);
> - config_nand_cw_read(nandc);
> + config_nand_cw_read(nandc, use_ecc);
> }
>
> /*
> @@ -1201,7 +1208,7 @@ static int nandc_param(struct qcom_nand_host *host)
> nandc->buf_count = 512;
> memset(nandc->data_buffer, 0xff, nandc->buf_count);
>
> - config_nand_single_cw_page_read(nandc);
> + config_nand_single_cw_page_read(nandc, false);
>
> read_data_dma(nandc, FLASH_BUF_ACC, nandc->data_buffer,
> nandc->buf_count, 0);
> @@ -1565,6 +1572,23 @@ struct read_stats {
> __le32 erased_cw;
> };
>
> +/* reads back FLASH_STATUS register set by the controller */
> +static int check_flash_errors(struct qcom_nand_host *host, int cw_cnt)
> +{
> + struct nand_chip *chip = &host->chip;
> + struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
> + int i;
> +
> + for (i = 0; i < cw_cnt; i++) {
> + u32 flash = le32_to_cpu(nandc->reg_read_buf[i]);
> +
> + if (flash & (FS_OP_ERR | FS_MPU_ERR))
> + return -EIO;

This is already checked in parse_read_error(), maybe it would be
preferable to have different path inside this function depending on the
'raw' nature of the operation?

> + }
> +
> + return 0;
> +}
> +
> /*
> * reads back status registers set by the controller to notify page read
> * errors. this is equivalent to what 'ecc->correct()' would do.
> @@ -1707,7 +1731,7 @@ static int read_page_ecc(struct qcom_nand_host *host, u8 *data_buf,
> }
> }
>
> - config_nand_cw_read(nandc);
> + config_nand_cw_read(nandc, true);
>
> if (data_buf)
> read_data_dma(nandc, FLASH_BUF_ACC, data_buf,
> @@ -1771,7 +1795,7 @@ static int copy_last_cw(struct qcom_nand_host *host, int page)
> set_address(host, host->cw_size * (ecc->steps - 1), page);
> update_rw_regs(host, 1, true);
>
> - config_nand_single_cw_page_read(nandc);
> + config_nand_single_cw_page_read(nandc, host->use_ecc);
>
> read_data_dma(nandc, FLASH_BUF_ACC, nandc->data_buffer, size, 0);
>
> @@ -1781,6 +1805,15 @@ static int copy_last_cw(struct qcom_nand_host *host, int page)
>
> free_descs(nandc);
>
> + if (!ret) {
> + if (host->use_ecc)
> + ret = parse_read_errors(host, nandc->data_buffer,
> + nandc->data_buffer + size,
> + true);
> + else
> + ret = check_flash_errors(host, 1);

This way you would avoid this ^

> + }
> +

As a general way, I don't like very much this kind of error checking
structure:

if (!ret)
ret = something();
...
return ret;

I would rather prefer:

if (ret)
return ret;

return something();

> return ret;
> }
>
> @@ -1854,7 +1887,7 @@ static int qcom_nandc_read_page_raw(struct mtd_info *mtd,
> nandc_set_read_loc(nandc, 3, read_loc, oob_size2, 1);
> }
>
> - config_nand_cw_read(nandc);
> + config_nand_cw_read(nandc, false);
>
> read_data_dma(nandc, reg_off, data_buf, data_size1, 0);
> reg_off += data_size1;
> @@ -1878,6 +1911,9 @@ static int qcom_nandc_read_page_raw(struct mtd_info *mtd,
>
> free_descs(nandc);
>
> + if (!ret)
> + ret = check_flash_errors(host, ecc->steps);
> +

There is not point in doing ret = ... if you return 0 right after.
Please check what would be the most appropriate.

> return 0;
> }
>

Thanks,
Miquèl

--
Miquel Raynal, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

2018-04-10 10:33:37

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH 9/9] mtd: nand: qcom: erased page bitflips detection

Hi Abhishek,

On Wed, 4 Apr 2018 18:12:25 +0530, Abhishek Sahu
<[email protected]> wrote:

> Some of the newer nand parts can have bit flips in an erased
> page due to the process technology used. In this case, qpic

AFAIK, this has always been possible, it was just rare.

> nand controller is not able to identify that page as an erased
> page. Currently the driver calls nand_check_erased_ecc_chunk for
> identifying the erased pages but this won’t work always since the
> checking is being with ECC engine returned data. In case of
> bitflips, the ECC engine tries to correct the data and then it
> generates the uncorrectable error. Now, this data is not equal to
> original raw data. For erased CW identification, the raw data
> should be read again from NAND device and this
> nand_check_erased_ecc_chunk function should be called for raw
> data only.

Absolutely.

>
> Now following logic is being added to identify the erased
> codeword bitflips.
>
> 1. In most of the case, not all the codewords will have bitflips
> and only single CW will have bitflips. So, there is no need to
> read the complete raw page data. The NAND raw read can be
> scheduled for any CW in page. The NAND controller works on CW
> basis and it will update the status register after each CW read.
> Maintain the bitmask for the CW which generated the uncorrectable
> error.
> 2. Schedule the raw flash read from NAND flash device to
> NAND controller buffer for all these CWs between first and last
> uncorrectable errors CWs. Copy the content from NAND controller
> buffer to actual data buffer only for the uncorrectable errors
> CWs so that other CW data content won’t be affected, and
> unnecessary data copy can be avoided.

In case of uncorrectable error, the penalty is huge anyway.

> 3. Both DATA and OOB need to be checked for number of 0. The
> top-level API can be called with only data buf or oob buf so use
> chip->databuf if data buf is null and chip->oob_poi if
> oob buf is null for copying the raw bytes temporarily.

You can do that. But when you do, you should tell the core you used
that buffer and that it cannot rely on what is inside. Please
invalidate the page cache with:

chip->pagebuf = -1;

> 4. For each CW, check the number of 0 in cw_data and usable
> oob bytes, The bbm and spare bytes bit flip won’t affect the ECC
> so don’t check the number of bitflips in this area.

OOB is an area in which you are supposed to find the BBM, the ECC bytes
and the spare bytes. Spare bytes == usable OOB bytes. And the BBM
should be protected too. I don't get this sentence but I don't see its
application neither in the code?

>
> Signed-off-by: Abhishek Sahu <[email protected]>
> ---

Thanks,
Miquèl

--
Miquel Raynal, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

2018-04-12 06:37:24

by Abhishek Sahu

[permalink] [raw]
Subject: Re: [PATCH 3/9] mtd: nand: qcom: erased page detection for uncorrectable errors only

On 2018-04-10 14:29, Miquel Raynal wrote:
> Hi Abhishek,
>
> On Wed, 4 Apr 2018 18:12:19 +0530, Abhishek Sahu
> <[email protected]> wrote:
>
>> The NAND flash controller generates ECC uncorrectable error
>> first in case of completely erased page. Currently driver
>> applies the erased page detection logic for other operation
>> errors also so fix this and return EIO for other operational
>> errors.
>
> I am sorry I don't understand very well what is the purpose of this
> patch, could you please explain it again?
>
> Do you mean that you want to avoid having rising ECC errors when you
> read erased pages?
>
Thanks Miquel for your review.

QCOM NAND flash controller has in built erased page
detection HW.
Following is the flow in the HW if controller tries
to read erased page

1. First ECC uncorrectable error will be generated from
ECC engine since ECC engine first calculates the ECC with
all 0xff and match the calculated ECC with ECC code in OOB
(which is again all 0xff).
2. After getting ECC error, erased CW detection HW checks if
all the bytes in page are 0xff and then it updates the
status in separate register NAND_ERASED_CW_DETECT_STATUS

So the erased CW detect status should be checked only if
ECC engine generated the uncorrectable error.

Currently for all other operational errors also (like TIMEOUT,
MPU errors etc), the erased CW detect register was being
checked.

>>
>> Signed-off-by: Abhishek Sahu <[email protected]>
>> ---
>> drivers/mtd/nand/qcom_nandc.c | 8 +++++++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/mtd/nand/qcom_nandc.c
>> b/drivers/mtd/nand/qcom_nandc.c
>> index 17321fc..57c16a6 100644
>> --- a/drivers/mtd/nand/qcom_nandc.c
>> +++ b/drivers/mtd/nand/qcom_nandc.c
>> @@ -1578,6 +1578,7 @@ static int parse_read_errors(struct
>> qcom_nand_host *host, u8 *data_buf,
>> struct nand_ecc_ctrl *ecc = &chip->ecc;
>> unsigned int max_bitflips = 0;
>> struct read_stats *buf;
>> + bool flash_op_err = false;
>> int i;
>>
>> buf = (struct read_stats *)nandc->reg_read_buf;
>> @@ -1599,7 +1600,7 @@ static int parse_read_errors(struct
>> qcom_nand_host *host, u8 *data_buf,
>> buffer = le32_to_cpu(buf->buffer);
>> erased_cw = le32_to_cpu(buf->erased_cw);
>>
>> - if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
>> + if ((flash & FS_OP_ERR) && (buffer & BS_UNCORRECTABLE_BIT)) {
>
> And later you have another "if (buffer & BS_UNCORRECTABLE_BIT)" which
> is then redundant, unless that is not what you actually want to do?

Yes. That check seems to be redundant. I will fix that.

>
> Maybe you can add comments before the if ()/ else if () to explain in
> which case you enter each branch.

Sure. That would be better. Will add the same in next patch set.

>
>> bool erased;
>>
>> /* ignore erased codeword errors */
>> @@ -1641,6 +1642,8 @@ static int parse_read_errors(struct
>> qcom_nand_host *host, u8 *data_buf,
>> max_t(unsigned int, max_bitflips, ret);
>> }
>> }
>> + } else if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
>> + flash_op_err = true;
>> } else {
>> unsigned int stat;
>>
>> @@ -1654,6 +1657,9 @@ static int parse_read_errors(struct
>> qcom_nand_host *host, u8 *data_buf,
>> oob_buf += oob_len + ecc->bytes;
>> }
>>
>> + if (flash_op_err)
>> + return -EIO;
>> +
>
> In you are propagating an error related to the controller, this is
> fine, but I think you just want to raise the fact that a NAND
> uncorrectable error occurred, in this case you should just increment
> mtd->ecc_stats.failed and return 0 (returning max_bitflips here would
> be
> fine too has it would be 0 too).

The flash_op_err will be for other operational errors only (like
timeout,
MPU error, device failure etc). For correctable errors,

ret = nand_check_erased_ecc_chunk(data_buf,
data_len, eccbuf, ecclen, oob_buf,
extraooblen, ecc->strength);
if (ret < 0) {
mtd->ecc_stats.failed++;
} else {
mtd->ecc_stats.corrected += ret;

Already, it is incrementing mtd->ecc_stats.failed

Thanks,
Abhishek

2018-04-12 06:54:07

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH 3/9] mtd: nand: qcom: erased page detection for uncorrectable errors only

Hi Abhishek,

On Thu, 12 Apr 2018 12:03:58 +0530, Abhishek Sahu
<[email protected]> wrote:

> On 2018-04-10 14:29, Miquel Raynal wrote:
> > Hi Abhishek,
> > > On Wed, 4 Apr 2018 18:12:19 +0530, Abhishek Sahu
> > <[email protected]> wrote:
> > >> The NAND flash controller generates ECC uncorrectable error
> >> first in case of completely erased page. Currently driver
> >> applies the erased page detection logic for other operation
> >> errors also so fix this and return EIO for other operational
> >> errors.
> > > I am sorry I don't understand very well what is the purpose of this
> > patch, could you please explain it again?
> > > Do you mean that you want to avoid having rising ECC errors when you
> > read erased pages?
> > Thanks Miquel for your review.
>
> QCOM NAND flash controller has in built erased page
> detection HW.
> Following is the flow in the HW if controller tries
> to read erased page
>
> 1. First ECC uncorrectable error will be generated from
> ECC engine since ECC engine first calculates the ECC with
> all 0xff and match the calculated ECC with ECC code in OOB
> (which is again all 0xff).
> 2. After getting ECC error, erased CW detection HW checks if
> all the bytes in page are 0xff and then it updates the
> status in separate register NAND_ERASED_CW_DETECT_STATUS
>
> So the erased CW detect status should be checked only if
> ECC engine generated the uncorrectable error.
>
> Currently for all other operational errors also (like TIMEOUT,
> MPU errors etc), the erased CW detect register was being
> checked.

This is very clear, thanks. I don't know very much this controller so I
think you can add this information in the commit message for future
reference.

>
> >> >> Signed-off-by: Abhishek Sahu <[email protected]>
> >> ---
> >> drivers/mtd/nand/qcom_nandc.c | 8 +++++++-
> >> 1 file changed, 7 insertions(+), 1 deletion(-)
> >> >> diff --git a/drivers/mtd/nand/qcom_nandc.c >> b/drivers/mtd/nand/qcom_nandc.c
> >> index 17321fc..57c16a6 100644
> >> --- a/drivers/mtd/nand/qcom_nandc.c
> >> +++ b/drivers/mtd/nand/qcom_nandc.c
> >> @@ -1578,6 +1578,7 @@ static int parse_read_errors(struct >> qcom_nand_host *host, u8 *data_buf,
> >> struct nand_ecc_ctrl *ecc = &chip->ecc;
> >> unsigned int max_bitflips = 0;
> >> struct read_stats *buf;
> >> + bool flash_op_err = false;
> >> int i;
> >> >> buf = (struct read_stats *)nandc->reg_read_buf;
> >> @@ -1599,7 +1600,7 @@ static int parse_read_errors(struct >> qcom_nand_host *host, u8 *data_buf,
> >> buffer = le32_to_cpu(buf->buffer);
> >> erased_cw = le32_to_cpu(buf->erased_cw);
> >> >> - if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
> >> + if ((flash & FS_OP_ERR) && (buffer & BS_UNCORRECTABLE_BIT)) {
> > > And later you have another "if (buffer & BS_UNCORRECTABLE_BIT)" which
> > is then redundant, unless that is not what you actually want to do?
>
> Yes. That check seems to be redundant. I will fix that.
>
> > > Maybe you can add comments before the if ()/ else if () to explain in
> > which case you enter each branch.
>
> Sure. That would be better. Will add the same in next patch set.
>
> > >> bool erased;
> >> >> /* ignore erased codeword errors */
> >> @@ -1641,6 +1642,8 @@ static int parse_read_errors(struct >> qcom_nand_host *host, u8 *data_buf,
> >> max_t(unsigned int, max_bitflips, ret);
> >> }
> >> }
> >> + } else if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
> >> + flash_op_err = true;
> >> } else {
> >> unsigned int stat;
> >> >> @@ -1654,6 +1657,9 @@ static int parse_read_errors(struct >> qcom_nand_host *host, u8 *data_buf,
> >> oob_buf += oob_len + ecc->bytes;
> >> }
> >> >> + if (flash_op_err)
> >> + return -EIO;
> >> +
> > > In you are propagating an error related to the controller, this is
> > fine, but I think you just want to raise the fact that a NAND
> > uncorrectable error occurred, in this case you should just increment
> > mtd->ecc_stats.failed and return 0 (returning max_bitflips here would > be
> > fine too has it would be 0 too).
>
> The flash_op_err will be for other operational errors only (like timeout,
> MPU error, device failure etc). For correctable errors,
>
> ret = nand_check_erased_ecc_chunk(data_buf,
> data_len, eccbuf, ecclen, oob_buf,
> extraooblen, ecc->strength);

Why do you need nand_check_erased_ecc_chunk() if the blank page check
is done in hw?

Thanks,
Miquèl

> if (ret < 0) {
> mtd->ecc_stats.failed++;
> } else {
> mtd->ecc_stats.corrected += ret;
>
> Already, it is incrementing mtd->ecc_stats.failed
>
> Thanks,
> Abhishek



--
Miquel Raynal, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

2018-04-12 07:01:07

by Abhishek Sahu

[permalink] [raw]
Subject: Re: [PATCH 4/9] mtd: nand: qcom: fix null pointer access for erased buffer detection

On 2018-04-10 14:42, Miquel Raynal wrote:
> Hi Abhishek,
>
> On Wed, 4 Apr 2018 18:12:20 +0530, Abhishek Sahu
> <[email protected]> wrote:
>
>> parse_read_errors can be called with only oob buf also in which
>> case data_buf will be NULL. If data_buf is NULL, then don’t
>> treat this page as completely erased in case of ECC uncorrectable
>> error.
>>
>> Signed-off-by: Abhishek Sahu <[email protected]>
>> ---
>> drivers/mtd/nand/qcom_nandc.c | 7 +++++--
>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/mtd/nand/qcom_nandc.c
>> b/drivers/mtd/nand/qcom_nandc.c
>> index 57c16a6..0ebcc55 100644
>> --- a/drivers/mtd/nand/qcom_nandc.c
>> +++ b/drivers/mtd/nand/qcom_nandc.c
>> @@ -1607,9 +1607,11 @@ static int parse_read_errors(struct
>> qcom_nand_host *host, u8 *data_buf,
>> if (host->bch_enabled) {
>> erased = (erased_cw & ERASED_CW) == ERASED_CW ?
>> true : false;
>
> Why the parse_read_errors() function could not be called without
> data_buf when using BCH? Are you sure the situation can only happen
> without it?
>

host->bch_enabled case is different where controller itself tells
regarding erased page in status register.

> Would the following apply here too, with a:
>

erased_chunk_check_and_fixup will be used only for 4 bit RS ECC
code in which there is no support from HW for erased page detection
and we need to check few data bytes value.

Thanks,
Abhishek

> if (!data_buf) {
> erased = false;
> } else {
> if (host->bch_enabled)
> ...
> else
> ...
> }
>
>> - } else {
>> + } else if (data_buf) {
>> erased = erased_chunk_check_and_fixup(data_buf,
>> data_len);
>> + } else {
>> + erased = false;
>> }
>>
>> if (erased) {
>> @@ -1652,7 +1654,8 @@ static int parse_read_errors(struct
>> qcom_nand_host *host, u8 *data_buf,
>> max_bitflips = max(max_bitflips, stat);
>> }
>>
>> - data_buf += data_len;
>> + if (data_buf)
>> + data_buf += data_len;
>> if (oob_buf)
>> oob_buf += oob_len + ecc->bytes;
>> }
>
> Thanks,
> Miquèl

2018-04-12 07:01:49

by Abhishek Sahu

[permalink] [raw]
Subject: Re: [PATCH 3/9] mtd: nand: qcom: erased page detection for uncorrectable errors only

On 2018-04-12 12:19, Miquel Raynal wrote:
> Hi Abhishek,
>
> On Thu, 12 Apr 2018 12:03:58 +0530, Abhishek Sahu
> <[email protected]> wrote:
>
>> On 2018-04-10 14:29, Miquel Raynal wrote:
>> > Hi Abhishek,
>> > > On Wed, 4 Apr 2018 18:12:19 +0530, Abhishek Sahu
>> > <[email protected]> wrote:
>> > >> The NAND flash controller generates ECC uncorrectable error
>> >> first in case of completely erased page. Currently driver
>> >> applies the erased page detection logic for other operation
>> >> errors also so fix this and return EIO for other operational
>> >> errors.
>> > > I am sorry I don't understand very well what is the purpose of this
>> > patch, could you please explain it again?
>> > > Do you mean that you want to avoid having rising ECC errors when you
>> > read erased pages?
>> > Thanks Miquel for your review.
>>
>> QCOM NAND flash controller has in built erased page
>> detection HW.
>> Following is the flow in the HW if controller tries
>> to read erased page
>>
>> 1. First ECC uncorrectable error will be generated from
>> ECC engine since ECC engine first calculates the ECC with
>> all 0xff and match the calculated ECC with ECC code in OOB
>> (which is again all 0xff).
>> 2. After getting ECC error, erased CW detection HW checks if
>> all the bytes in page are 0xff and then it updates the
>> status in separate register NAND_ERASED_CW_DETECT_STATUS
>>
>> So the erased CW detect status should be checked only if
>> ECC engine generated the uncorrectable error.
>>
>> Currently for all other operational errors also (like TIMEOUT,
>> MPU errors etc), the erased CW detect register was being
>> checked.
>
> This is very clear, thanks. I don't know very much this controller so I
> think you can add this information in the commit message for future
> reference.
>

Sure Miquel.
I Will update the commit message to include more detail.

>>
>> >> >> Signed-off-by: Abhishek Sahu <[email protected]>
>> >> ---
>> >> drivers/mtd/nand/qcom_nandc.c | 8 +++++++-
>> >> 1 file changed, 7 insertions(+), 1 deletion(-)
>> >> >> diff --git a/drivers/mtd/nand/qcom_nandc.c >> b/drivers/mtd/nand/qcom_nandc.c
>> >> index 17321fc..57c16a6 100644
>> >> --- a/drivers/mtd/nand/qcom_nandc.c
>> >> +++ b/drivers/mtd/nand/qcom_nandc.c
>> >> @@ -1578,6 +1578,7 @@ static int parse_read_errors(struct >> qcom_nand_host *host, u8 *data_buf,
>> >> struct nand_ecc_ctrl *ecc = &chip->ecc;
>> >> unsigned int max_bitflips = 0;
>> >> struct read_stats *buf;
>> >> + bool flash_op_err = false;
>> >> int i;
>> >> >> buf = (struct read_stats *)nandc->reg_read_buf;
>> >> @@ -1599,7 +1600,7 @@ static int parse_read_errors(struct >> qcom_nand_host *host, u8 *data_buf,
>> >> buffer = le32_to_cpu(buf->buffer);
>> >> erased_cw = le32_to_cpu(buf->erased_cw);
>> >> >> - if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
>> >> + if ((flash & FS_OP_ERR) && (buffer & BS_UNCORRECTABLE_BIT)) {
>> > > And later you have another "if (buffer & BS_UNCORRECTABLE_BIT)" which
>> > is then redundant, unless that is not what you actually want to do?
>>
>> Yes. That check seems to be redundant. I will fix that.
>>
>> > > Maybe you can add comments before the if ()/ else if () to explain in
>> > which case you enter each branch.
>>
>> Sure. That would be better. Will add the same in next patch set.
>>
>> > >> bool erased;
>> >> >> /* ignore erased codeword errors */
>> >> @@ -1641,6 +1642,8 @@ static int parse_read_errors(struct >> qcom_nand_host *host, u8 *data_buf,
>> >> max_t(unsigned int, max_bitflips, ret);
>> >> }
>> >> }
>> >> + } else if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
>> >> + flash_op_err = true;
>> >> } else {
>> >> unsigned int stat;
>> >> >> @@ -1654,6 +1657,9 @@ static int parse_read_errors(struct >> qcom_nand_host *host, u8 *data_buf,
>> >> oob_buf += oob_len + ecc->bytes;
>> >> }
>> >> >> + if (flash_op_err)
>> >> + return -EIO;
>> >> +
>> > > In you are propagating an error related to the controller, this is
>> > fine, but I think you just want to raise the fact that a NAND
>> > uncorrectable error occurred, in this case you should just increment
>> > mtd->ecc_stats.failed and return 0 (returning max_bitflips here would > be
>> > fine too has it would be 0 too).
>>
>> The flash_op_err will be for other operational errors only (like
>> timeout,
>> MPU error, device failure etc). For correctable errors,
>>
>> ret = nand_check_erased_ecc_chunk(data_buf,
>> data_len, eccbuf, ecclen, oob_buf,
>> extraooblen, ecc->strength);
>
> Why do you need nand_check_erased_ecc_chunk() if the blank page check
> is done in hw?
>

This is only applicable for BCH algorithm.
IPQ806x uses RS code for 4 bit ECC which does not have HW blank page
detection.

You can get more detail in function comment of
erased_chunk_check_and_fixup

/*
* when using BCH ECC, the HW flags an error in NAND_FLASH_STATUS if it
read
* an erased CW, and reports an erased CW in
NAND_ERASED_CW_DETECT_STATUS.
*
* when using RS ECC, the HW reports the same erros when reading an
erased CW,
* but it notifies that it is an erased CW by placing special
characters at
* certain offsets in the buffer.
*
* verify if the page is erased or not, and fix up the page for RS ECC
by
* replacing the special characters with 0xff.
*/
static bool erased_chunk_check_and_fixup(u8 *data_buf, int data_len)
{

Thanks,
Abhishek


2018-04-12 07:10:07

by Abhishek Sahu

[permalink] [raw]
Subject: Re: [PATCH 8/9] mtd: nand: qcom: helper function for raw read

On 2018-04-10 15:14, Miquel Raynal wrote:
> Hi Abhishek,
>
> On Wed, 4 Apr 2018 18:12:24 +0530, Abhishek Sahu
> <[email protected]> wrote:
>
>> This patch does minor code reorganization for raw reads.
>> Currently the raw read is required for complete page but for
>> subsequent patches related with erased codeword bit flips
>> detection, only few CW should be read. So, this patch adds
>> helper function and introduces the read CW bitmask which
>> specifies which CW reads are required in complete page.
>
> I am not sure this is the right approach for subpage reads. If the
> controller supports it, you should just enable it in chip->options.
>

Thanks Miquel.

It is internal to this file only. This patch makes one static helper
function which provides the support to read subpages.

>>
>> Signed-off-by: Abhishek Sahu <[email protected]>
>> ---
>> drivers/mtd/nand/qcom_nandc.c | 186
>> +++++++++++++++++++++++++-----------------
>> 1 file changed, 110 insertions(+), 76 deletions(-)
>>
>> diff --git a/drivers/mtd/nand/qcom_nandc.c
>> b/drivers/mtd/nand/qcom_nandc.c
>> index 40c790e..f5d1fa4 100644
>> --- a/drivers/mtd/nand/qcom_nandc.c
>> +++ b/drivers/mtd/nand/qcom_nandc.c
>> @@ -1590,6 +1590,114 @@ static int check_flash_errors(struct
>> qcom_nand_host *host, int cw_cnt)
>> }
>>
>> /*
>> + * Helper to perform the page raw read operation. The read_cw_mask
>> will be
>> + * used to specify the codewords for which the data should be read.
>> The
>> + * single page contains multiple CW. Sometime, only few CW data is
>> required
>> + * in complete page. Also, start address will be determined with
>> + * this CW mask to skip unnecessary data copy from NAND flash device.
>> Then,
>> + * actual data copy from NAND controller to data buffer will be done
>> only
>> + * for the CWs which have the mask set.
>> + */
>> +static int
>> +nandc_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
>> + u8 *data_buf, u8 *oob_buf,
>> + int page, unsigned long read_cw_mask)
>> +{
>> + struct qcom_nand_host *host = to_qcom_nand_host(chip);
>> + struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
>> + struct nand_ecc_ctrl *ecc = &chip->ecc;
>> + int i, ret;
>> + int read_loc, start_step, last_step;
>> +
>> + nand_read_page_op(chip, page, 0, NULL, 0);
>> +
>> + host->use_ecc = false;
>> + start_step = ffs(read_cw_mask) - 1;
>> + last_step = fls(read_cw_mask);
>> +
>> + clear_bam_transaction(nandc);
>> + set_address(host, host->cw_size * start_step, page);
>> + update_rw_regs(host, last_step - start_step, true);
>> + config_nand_page_read(nandc);
>> +
>> + for (i = start_step; i < last_step; i++) {
>> + int data_size1, data_size2, oob_size1, oob_size2;
>> + int reg_off = FLASH_BUF_ACC;
>> +
>> + data_size1 = mtd->writesize - host->cw_size * (ecc->steps - 1);
>> + oob_size1 = host->bbm_size;
>> +
>> + if (i == (ecc->steps - 1)) {
>> + data_size2 = ecc->size - data_size1 -
>> + ((ecc->steps - 1) << 2);
>> + oob_size2 = (ecc->steps << 2) + host->ecc_bytes_hw +
>> + host->spare_bytes;
>> + } else {
>> + data_size2 = host->cw_data - data_size1;
>> + oob_size2 = host->ecc_bytes_hw + host->spare_bytes;
>> + }
>> +
>> + /*
>> + * Don't perform actual data copy from NAND controller to data
>> + * buffer through DMA for this codeword
>> + */
>> + if (!(read_cw_mask & BIT(i))) {
>> + if (nandc->props->is_bam)
>> + nandc_set_read_loc(nandc, 0, 0, 0, 1);
>> +
>> + config_nand_cw_read(nandc, false);
>> +
>> + data_buf += data_size1 + data_size2;
>> + oob_buf += oob_size1 + oob_size2;
>> +
>> + continue;
>> + }
>> +
>> + if (nandc->props->is_bam) {
>> + read_loc = 0;
>> + nandc_set_read_loc(nandc, 0, read_loc, data_size1, 0);
>> + read_loc += data_size1;
>> +
>> + nandc_set_read_loc(nandc, 1, read_loc, oob_size1, 0);
>> + read_loc += oob_size1;
>> +
>> + nandc_set_read_loc(nandc, 2, read_loc, data_size2, 0);
>> + read_loc += data_size2;
>> +
>> + nandc_set_read_loc(nandc, 3, read_loc, oob_size2, 1);
>> + }
>> +
>> + config_nand_cw_read(nandc, false);
>> +
>> + read_data_dma(nandc, reg_off, data_buf, data_size1, 0);
>> + reg_off += data_size1;
>> + data_buf += data_size1;
>> +
>> + read_data_dma(nandc, reg_off, oob_buf, oob_size1, 0);
>> + reg_off += oob_size1;
>> + oob_buf += oob_size1;
>> +
>> + read_data_dma(nandc, reg_off, data_buf, data_size2, 0);
>> + reg_off += data_size2;
>> + data_buf += data_size2;
>> +
>> + read_data_dma(nandc, reg_off, oob_buf, oob_size2, 0);
>> + oob_buf += oob_size2;
>> + }
>> +
>> + ret = submit_descs(nandc);
>> + if (ret)
>> + dev_err(nandc->dev, "failure to read raw page\n");
>> +
>> + free_descs(nandc);
>> +
>> + if (!ret)
>> + ret = check_flash_errors(host, last_step - start_step);
>> +
>> + return 0;
>> +}
>> +
>> +/*
>> * reads back status registers set by the controller to notify page
>> read
>> * errors. this is equivalent to what 'ecc->correct()' would do.
>> */
>> @@ -1839,82 +1947,8 @@ static int qcom_nandc_read_page_raw(struct
>> mtd_info *mtd,
>> struct nand_chip *chip, uint8_t *buf,
>> int oob_required, int page)
>> {
>> - struct qcom_nand_host *host = to_qcom_nand_host(chip);
>> - struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
>> - u8 *data_buf, *oob_buf;
>> - struct nand_ecc_ctrl *ecc = &chip->ecc;
>> - int i, ret;
>> - int read_loc;
>> -
>> - nand_read_page_op(chip, page, 0, NULL, 0);
>> - data_buf = buf;
>> - oob_buf = chip->oob_poi;
>> -
>> - host->use_ecc = false;
>> -
>> - clear_bam_transaction(nandc);
>> - update_rw_regs(host, ecc->steps, true);
>> - config_nand_page_read(nandc);
>> -
>> - for (i = 0; i < ecc->steps; i++) {
>> - int data_size1, data_size2, oob_size1, oob_size2;
>> - int reg_off = FLASH_BUF_ACC;
>> -
>> - data_size1 = mtd->writesize - host->cw_size * (ecc->steps - 1);
>> - oob_size1 = host->bbm_size;
>> -
>> - if (i == (ecc->steps - 1)) {
>> - data_size2 = ecc->size - data_size1 -
>> - ((ecc->steps - 1) << 2);
>> - oob_size2 = (ecc->steps << 2) + host->ecc_bytes_hw +
>> - host->spare_bytes;
>> - } else {
>> - data_size2 = host->cw_data - data_size1;
>> - oob_size2 = host->ecc_bytes_hw + host->spare_bytes;
>> - }
>> -
>> - if (nandc->props->is_bam) {
>> - read_loc = 0;
>> - nandc_set_read_loc(nandc, 0, read_loc, data_size1, 0);
>> - read_loc += data_size1;
>> -
>> - nandc_set_read_loc(nandc, 1, read_loc, oob_size1, 0);
>> - read_loc += oob_size1;
>> -
>> - nandc_set_read_loc(nandc, 2, read_loc, data_size2, 0);
>> - read_loc += data_size2;
>> -
>> - nandc_set_read_loc(nandc, 3, read_loc, oob_size2, 1);
>> - }
>> -
>> - config_nand_cw_read(nandc, false);
>> -
>> - read_data_dma(nandc, reg_off, data_buf, data_size1, 0);
>> - reg_off += data_size1;
>> - data_buf += data_size1;
>> -
>> - read_data_dma(nandc, reg_off, oob_buf, oob_size1, 0);
>> - reg_off += oob_size1;
>> - oob_buf += oob_size1;
>> -
>> - read_data_dma(nandc, reg_off, data_buf, data_size2, 0);
>> - reg_off += data_size2;
>> - data_buf += data_size2;
>> -
>> - read_data_dma(nandc, reg_off, oob_buf, oob_size2, 0);
>> - oob_buf += oob_size2;
>> - }
>> -
>> - ret = submit_descs(nandc);
>> - if (ret)
>> - dev_err(nandc->dev, "failure to read raw page\n");
>> -
>> - free_descs(nandc);
>> -
>> - if (!ret)
>> - ret = check_flash_errors(host, ecc->steps);
>> -
>> - return 0;
>> + return nandc_read_page_raw(mtd, chip, buf, chip->oob_poi, page,
>> + BIT(chip->ecc.steps) - 1);
>
> I don't understand this. chip->ecc.steps is constant, right? So you
> always ask for the same subpage?

We need to do raw read for subpages in which we got uncorrectable
error in next patch for erased page bitflip detection. This patch does
reorganization of raw read and moves common code in helper function
nandc_read_page_raw.

For nomral raw read, all the subpages will be read so
BIT(chip->ecc.steps) - 1 is used for qcom_nandc_read_page_raw.

While for erased page raw read, only those sub pages will be
read for which the controller gives the uncorrectable error.

Thanks,
Abhishek

>
>> }
>>
>> /* implements ecc->read_oob() */

2018-04-12 07:17:31

by Abhishek Sahu

[permalink] [raw]
Subject: Re: [PATCH 5/9] mtd: nand: qcom: parse read errors for read oob also

On 2018-04-10 15:33, Miquel Raynal wrote:
> Hi Abhishek,
>
> On Wed, 4 Apr 2018 18:12:21 +0530, Abhishek Sahu
> <[email protected]> wrote:
>
>> read_page and read_oob both calls the read_page_ecc function.
>> The QCOM NAND controller protect the OOB available bytes with
>> ECC so read errors should be checked for read_oob also. Now
>> this patch moves the error checking code inside read_page_ecc
>> so caller does not have to check explicitly for read errors.
>>
>> Signed-off-by: Abhishek Sahu <[email protected]>
>
> Nitpick: the prefix should be "mtd: rawnand: qcom: " now as this driver
> has been moved to drivers/mtd/nand/raw/.
>
> Otherwise:
> Reviewed-by: Miquel Raynal <[email protected]>

Thanks Miquel for your review.

I will update the same in this patch and other patches.
and rebase my patches over 4.17-rc1 once its available.

Thanks,
Abhishek

2018-04-12 07:37:22

by Abhishek Sahu

[permalink] [raw]
Subject: Re: [PATCH 7/9] mtd: nand: qcom: check for operation errors in case of raw read

On 2018-04-10 15:42, Miquel Raynal wrote:
> Hi Abhishek,
>
> On Wed, 4 Apr 2018 18:12:23 +0530, Abhishek Sahu
> <[email protected]> wrote:
>
>> Currently there is no error checking for raw read. For raw
>> reads, there won’t be any ECC failure but the operational
>> failures are possible so schedule the NAND_FLASH_STATUS read
>> after each codeword.
>>
>> Signed-off-by: Abhishek Sahu <[email protected]>
>> ---
>> drivers/mtd/nand/qcom_nandc.c | 56
>> +++++++++++++++++++++++++++++++++++--------
>> 1 file changed, 46 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/mtd/nand/qcom_nandc.c
>> b/drivers/mtd/nand/qcom_nandc.c
>> index dce97e8..40c790e 100644
>> --- a/drivers/mtd/nand/qcom_nandc.c
>> +++ b/drivers/mtd/nand/qcom_nandc.c
>> @@ -1099,7 +1099,8 @@ static void config_nand_page_read(struct
>> qcom_nand_controller *nandc)
>> * Helper to prepare DMA descriptors for configuring registers
>> * before reading each codeword in NAND page.
>> */
>> -static void config_nand_cw_read(struct qcom_nand_controller *nandc)
>> +static void
>> +config_nand_cw_read(struct qcom_nand_controller *nandc, bool use_ecc)
>> {
>> if (nandc->props->is_bam)
>> write_reg_dma(nandc, NAND_READ_LOCATION_0, 4,
>> @@ -1108,19 +1109,25 @@ static void config_nand_cw_read(struct
>> qcom_nand_controller *nandc)
>> write_reg_dma(nandc, NAND_FLASH_CMD, 1, NAND_BAM_NEXT_SGL);
>> write_reg_dma(nandc, NAND_EXEC_CMD, 1, NAND_BAM_NEXT_SGL);
>>
>> - read_reg_dma(nandc, NAND_FLASH_STATUS, 2, 0);
>> - read_reg_dma(nandc, NAND_ERASED_CW_DETECT_STATUS, 1,
>> - NAND_BAM_NEXT_SGL);
>> + if (use_ecc) {
>> + read_reg_dma(nandc, NAND_FLASH_STATUS, 2, 0);
>> + read_reg_dma(nandc, NAND_ERASED_CW_DETECT_STATUS, 1,
>> + NAND_BAM_NEXT_SGL);
>> + } else {
>> + read_reg_dma(nandc, NAND_FLASH_STATUS, 1, NAND_BAM_NEXT_SGL);
>> + }
>> }
>>
>> /*
>> * Helper to prepare dma descriptors to configure registers needed
>> for reading a
>> * single codeword in page
>> */
>> -static void config_nand_single_cw_page_read(struct
>> qcom_nand_controller *nandc)
>> +static void
>> +config_nand_single_cw_page_read(struct qcom_nand_controller *nandc,
>> + bool use_ecc)
>> {
>> config_nand_page_read(nandc);
>> - config_nand_cw_read(nandc);
>> + config_nand_cw_read(nandc, use_ecc);
>> }
>>
>> /*
>> @@ -1201,7 +1208,7 @@ static int nandc_param(struct qcom_nand_host
>> *host)
>> nandc->buf_count = 512;
>> memset(nandc->data_buffer, 0xff, nandc->buf_count);
>>
>> - config_nand_single_cw_page_read(nandc);
>> + config_nand_single_cw_page_read(nandc, false);
>>
>> read_data_dma(nandc, FLASH_BUF_ACC, nandc->data_buffer,
>> nandc->buf_count, 0);
>> @@ -1565,6 +1572,23 @@ struct read_stats {
>> __le32 erased_cw;
>> };
>>
>> +/* reads back FLASH_STATUS register set by the controller */
>> +static int check_flash_errors(struct qcom_nand_host *host, int
>> cw_cnt)
>> +{
>> + struct nand_chip *chip = &host->chip;
>> + struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
>> + int i;
>> +
>> + for (i = 0; i < cw_cnt; i++) {
>> + u32 flash = le32_to_cpu(nandc->reg_read_buf[i]);
>> +
>> + if (flash & (FS_OP_ERR | FS_MPU_ERR))
>> + return -EIO;
>
> This is already checked in parse_read_error(), maybe it would be
> preferable to have different path inside this function depending on the
> 'raw' nature of the operation?
>

Thanks Miquel,

The parse_read_error will be called only for reads with ECC enabled
which uses 3 status registers. It has other code also related with
erased page detection and more code will be added in last patch
for bitflip detection.

For all others cases, only one status register FLASH_STATUS needs
to be checked and this check_flash_errors does the same.

>> + }
>> +
>> + return 0;
>> +}
>> +
>> /*
>> * reads back status registers set by the controller to notify page
>> read
>> * errors. this is equivalent to what 'ecc->correct()' would do.
>> @@ -1707,7 +1731,7 @@ static int read_page_ecc(struct qcom_nand_host
>> *host, u8 *data_buf,
>> }
>> }
>>
>> - config_nand_cw_read(nandc);
>> + config_nand_cw_read(nandc, true);
>>
>> if (data_buf)
>> read_data_dma(nandc, FLASH_BUF_ACC, data_buf,
>> @@ -1771,7 +1795,7 @@ static int copy_last_cw(struct qcom_nand_host
>> *host, int page)
>> set_address(host, host->cw_size * (ecc->steps - 1), page);
>> update_rw_regs(host, 1, true);
>>
>> - config_nand_single_cw_page_read(nandc);
>> + config_nand_single_cw_page_read(nandc, host->use_ecc);
>>
>> read_data_dma(nandc, FLASH_BUF_ACC, nandc->data_buffer, size, 0);
>>
>> @@ -1781,6 +1805,15 @@ static int copy_last_cw(struct qcom_nand_host
>> *host, int page)
>>
>> free_descs(nandc);
>>
>> + if (!ret) {
>> + if (host->use_ecc)
>> + ret = parse_read_errors(host, nandc->data_buffer,
>> + nandc->data_buffer + size,
>> + true);
>> + else
>> + ret = check_flash_errors(host, 1);
>
> This way you would avoid this ^
>
>> + }
>> +
>
> As a general way, I don't like very much this kind of error checking
> structure:
>
> if (!ret)
> ret = something();
> ...
> return ret;
>
> I would rather prefer:
>
> if (ret)
> return ret;
>
> return something();
>
>> return ret;
>> }
>>

Yes. That would make it more readable.
I will fix that.

>> @@ -1854,7 +1887,7 @@ static int qcom_nandc_read_page_raw(struct
>> mtd_info *mtd,
>> nandc_set_read_loc(nandc, 3, read_loc, oob_size2, 1);
>> }
>>
>> - config_nand_cw_read(nandc);
>> + config_nand_cw_read(nandc, false);
>>
>> read_data_dma(nandc, reg_off, data_buf, data_size1, 0);
>> reg_off += data_size1;
>> @@ -1878,6 +1911,9 @@ static int qcom_nandc_read_page_raw(struct
>> mtd_info *mtd,
>>
>> free_descs(nandc);
>>
>> + if (!ret)
>> + ret = check_flash_errors(host, ecc->steps);
>> +
>
> There is not point in doing ret = ... if you return 0 right after.
> Please check what would be the most appropriate.
>

Thanks Miquel for noticing it.
This 'return 0' was present from the initial commit itself.
I will raise separate patch to fix this.

Thanks,
Abhishek

>> return 0;
>> }
>>
>
> Thanks,
> Miquèl

2018-04-12 08:06:52

by Abhishek Sahu

[permalink] [raw]
Subject: Re: [PATCH 9/9] mtd: nand: qcom: erased page bitflips detection

On 2018-04-10 16:00, Miquel Raynal wrote:
> Hi Abhishek,
>
> On Wed, 4 Apr 2018 18:12:25 +0530, Abhishek Sahu
> <[email protected]> wrote:
>
>> Some of the newer nand parts can have bit flips in an erased
>> page due to the process technology used. In this case, qpic
>
> AFAIK, this has always been possible, it was just rare.
>

Yes Miquel. It was rare earlier.
Now, we are observing this more for newer parts coming.

>> nand controller is not able to identify that page as an erased
>> page. Currently the driver calls nand_check_erased_ecc_chunk for
>> identifying the erased pages but this won’t work always since the
>> checking is being with ECC engine returned data. In case of
>> bitflips, the ECC engine tries to correct the data and then it
>> generates the uncorrectable error. Now, this data is not equal to
>> original raw data. For erased CW identification, the raw data
>> should be read again from NAND device and this
>> nand_check_erased_ecc_chunk function should be called for raw
>> data only.
>
> Absolutely.
>
>>
>> Now following logic is being added to identify the erased
>> codeword bitflips.
>>
>> 1. In most of the case, not all the codewords will have bitflips
>> and only single CW will have bitflips. So, there is no need to
>> read the complete raw page data. The NAND raw read can be
>> scheduled for any CW in page. The NAND controller works on CW
>> basis and it will update the status register after each CW read.
>> Maintain the bitmask for the CW which generated the uncorrectable
>> error.
>> 2. Schedule the raw flash read from NAND flash device to
>> NAND controller buffer for all these CWs between first and last
>> uncorrectable errors CWs. Copy the content from NAND controller
>> buffer to actual data buffer only for the uncorrectable errors
>> CWs so that other CW data content won’t be affected, and
>> unnecessary data copy can be avoided.
>
> In case of uncorrectable error, the penalty is huge anyway.
>

Yes. We can't avoid that.
But we are reducing that by doing raw read for few subpages in
which we got uncorrectale error.

>> 3. Both DATA and OOB need to be checked for number of 0. The
>> top-level API can be called with only data buf or oob buf so use
>> chip->databuf if data buf is null and chip->oob_poi if
>> oob buf is null for copying the raw bytes temporarily.
>
> You can do that. But when you do, you should tell the core you used
> that buffer and that it cannot rely on what is inside. Please
> invalidate the page cache with:
>
> chip->pagebuf = -1;
>

Thanks Miquel. I will check and update the patch.

>> 4. For each CW, check the number of 0 in cw_data and usable
>> oob bytes, The bbm and spare bytes bit flip won’t affect the ECC
>> so don’t check the number of bitflips in this area.
>
> OOB is an area in which you are supposed to find the BBM, the ECC bytes
> and the spare bytes. Spare bytes == usable OOB bytes. And the BBM
> should be protected too. I don't get this sentence but I don't see its
> application neither in the code?
>

QCOM NAND layout does not support the BBM ECC protection.

IN OOB,

For all the possible layouts (4 bit RS/4 bit BCH/8 bit BCH)
it has 16 usable OOB bytes which is protected with ECC.

All the bytes in OOB other than BBM, ECC bytes and usable
OOB bytes are ununsed.

You can refer qcom_nand_host_setup for layout detail.

Thanks,
Abhishek



2018-04-12 10:03:38

by Abhishek Sahu

[permalink] [raw]
Subject: Re: [PATCH 1/9] mtd: nand: qcom: use the ecc strength from device parameter

On 2018-04-10 13:37, Boris Brezillon wrote:
> On Tue, 10 Apr 2018 09:55:58 +0200
> Miquel Raynal <[email protected]> wrote:
>
>> > Hi Abhishek,
>> >
>> > On Tue, 10 Apr 2018 11:39:35 +0530, Abhishek Sahu
>> > <[email protected]> wrote:
>> >
>> > > On 2018-04-06 18:01, Miquel Raynal wrote:
>> > > > Hi Abhishek,
>> > > >
>> > > > On Wed, 4 Apr 2018 18:12:17 +0530, Abhishek Sahu
>> > > > <[email protected]> wrote:
>> > > >
>> > > >> Currently the driver uses the ECC strength specified in
>> > > >> device tree. The ONFI or JEDEC device parameter page
>> > > >> contains the ‘ECC correctability’ field which indicates the
>> > > >> number of bits that the host should be able to correct per
>> > > >> 512 bytes of data.
>> > > >
>> > > > This is misleading. This field is not about the controller but rather
>> > > > the chip requirements in terms of minimal strength for nominal use.
>> > > >
>> > >
>> > > Thanks Miquel.
>> > >
>> > > Yes. Its NAND chip requirement. I have used the description for
>> > > NAND ONFI param page
>> > >
>> > > 5.6.1.24. Byte 112: Number of bits ECC correctability
>> > > This field indicates the number of bits that the host should be
>> > > able to correct per 512 bytes of data.
>> > >
>> > > >> The ecc correctability is assigned in
>> > > >> chip parameter during device probe time. QPIC/EBI2 NAND
>> > > >> supports 4/8-bit ecc correction. The Same kind of board
>> > > >> can have different NAND parts so use the ecc strength
>> > > >> from device parameter (if its non zero) instead of
>> > > >> device tree.
>> > > >
>> > > > That is not what you do.
>> > > >
>> > > > What you do is forcing the strength to be 8-bit per ECC chunk if the
>> > > > NAND chip requires at least 8-bit/chunk strength.
>> > > >
>> > > > The DT property is here to force a strength. Otherwise, Linux will
>> > > > propose to the NAND controller to use the minimum strength required by
>> > > > the chip (from either the ONFI/JEDEC parameter page or from a static
>> > > > table).
>> > > >
>> > >
>> > > The main problem is that the same kind of boards can have different
>> > > NAND parts.
>> > >
>> > > Lets assume, we have following 2 cases.
>> > >
>> > > 1. Non ONFI/JEDEC device for which chip->ecc_strength_ds
>> > > will be zero. In this case, the ecc->strength from DT
>> > > will be used
>> >
>> > No, the strength from DT will always be used if the property is
>> > present, no matter the type of chip.
>> >
>> > > 2. ONFI/JEDEC device for which chip->ecc_strength_ds > 8.
>> > > Since QCOM nand controller can not support
>> > > ECC correction greater than 8 bits so we can use 8 bit ECC
>> > > itself instead of failing NAND boot completely.
>> >
>> > I understand that. But this is still not what you do.
>> >
>> > >
>> > > > IMHO, you have two solutions:
>> > > > 1/ Remove these properties from the board DT (breaks DT backward
>> > > > compatibility though);
>> > >
>> > > - nand-ecc-strength: This is optional property in nand.txt and
>> > > Required property in qcom_nandc.txt.
>> >
>> > Well, this property is not controller specific but chip specific. The
>> > controller driver does not rely on it, so this property should not be
>> > required.
>> >
>> > > We can't remove since
>> > > if the device is Non ONFI/JEDEC, then ecc strength will come
>> > > from DT only.
>> >
>> > We can remove it and let the core handle this (as this is generic to
>> > all raw NANDs and not specific to this controller driver). Try it out!

Thanks Boris and Miquel for your inputs.

Just want to confirm if already its implemented in core layer
or shall I explore regrading this option.

I checked by removing this property alone from dtsi and it was
failing with

"Driver must set ecc.strength when using hardware ECC"

I checked the code in nand_base.c also but couldn't get
anything related with this.

Thanks,
Abhishek

>> >
>> > However if the defaults value do not match your expectations, I think
>> > you can add your non-ONFI/JEDEC chip in 'nand_ids.c', this should fill
>> > your strength_ds field and let you avoid using these properties.
>>
>> Actually nand_ids.c should not be filled anymore, instead you can
>> implement this detection thanks to the part full name in the vendor
>> code nand_samsung.c, nand_micron.c, nand_macronix.c, nand_hynix.c,
>> etc.
>
> Usually you don't have to go that far, and the ECC requirements are
> encoded somewhere in the ID (after byte 2). When that's not the
> case, you may have to check the full ID.
>
>> Depending on what part you are using, it might already work.
>
> Yep.

2018-04-22 16:21:10

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH 8/9] mtd: nand: qcom: helper function for raw read

Hi Abhishek,

On Thu, 12 Apr 2018 12:36:42 +0530, Abhishek Sahu
<[email protected]> wrote:

> On 2018-04-10 15:14, Miquel Raynal wrote:
> > Hi Abhishek,
> > > On Wed, 4 Apr 2018 18:12:24 +0530, Abhishek Sahu
> > <[email protected]> wrote:
> > >> This patch does minor code reorganization for raw reads.
> >> Currently the raw read is required for complete page but for
> >> subsequent patches related with erased codeword bit flips
> >> detection, only few CW should be read. So, this patch adds
> >> helper function and introduces the read CW bitmask which
> >> specifies which CW reads are required in complete page.
> > > I am not sure this is the right approach for subpage reads. If the
> > controller supports it, you should just enable it in chip->options.
> >
> Thanks Miquel.
>
> It is internal to this file only. This patch makes one static helper
> function which provides the support to read subpages.

Drivers should expose raw helpers, why keeping this helper static then?

>
> >> >> Signed-off-by: Abhishek Sahu <[email protected]>
> >> ---
> >> drivers/mtd/nand/qcom_nandc.c | 186 >> +++++++++++++++++++++++++-----------------
> >> 1 file changed, 110 insertions(+), 76 deletions(-)
> >> >> diff --git a/drivers/mtd/nand/qcom_nandc.c >> b/drivers/mtd/nand/qcom_nandc.c
> >> index 40c790e..f5d1fa4 100644
> >> --- a/drivers/mtd/nand/qcom_nandc.c
> >> +++ b/drivers/mtd/nand/qcom_nandc.c
> >> @@ -1590,6 +1590,114 @@ static int check_flash_errors(struct >> qcom_nand_host *host, int cw_cnt)
> >> }
> >> >> /*
> >> + * Helper to perform the page raw read operation. The read_cw_mask >> will be
> >> + * used to specify the codewords for which the data should be read. >> The
> >> + * single page contains multiple CW. Sometime, only few CW data is >> required
> >> + * in complete page. Also, start address will be determined with
> >> + * this CW mask to skip unnecessary data copy from NAND flash device. >> Then,
> >> + * actual data copy from NAND controller to data buffer will be done >> only
> >> + * for the CWs which have the mask set.
> >> + */
> >> +static int
> >> +nandc_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
> >> + u8 *data_buf, u8 *oob_buf,
> >> + int page, unsigned long read_cw_mask)
> >> +{
> >> + struct qcom_nand_host *host = to_qcom_nand_host(chip);
> >> + struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
> >> + struct nand_ecc_ctrl *ecc = &chip->ecc;
> >> + int i, ret;
> >> + int read_loc, start_step, last_step;
> >> +
> >> + nand_read_page_op(chip, page, 0, NULL, 0);
> >> +
> >> + host->use_ecc = false;
> >> + start_step = ffs(read_cw_mask) - 1;
> >> + last_step = fls(read_cw_mask);
> >> +
> >> + clear_bam_transaction(nandc);
> >> + set_address(host, host->cw_size * start_step, page);
> >> + update_rw_regs(host, last_step - start_step, true);
> >> + config_nand_page_read(nandc);
> >> +
> >> + for (i = start_step; i < last_step; i++) {
> >> + int data_size1, data_size2, oob_size1, oob_size2;
> >> + int reg_off = FLASH_BUF_ACC;
> >> +
> >> + data_size1 = mtd->writesize - host->cw_size * (ecc->steps - 1);
> >> + oob_size1 = host->bbm_size;
> >> +
> >> + if (i == (ecc->steps - 1)) {
> >> + data_size2 = ecc->size - data_size1 -
> >> + ((ecc->steps - 1) << 2);
> >> + oob_size2 = (ecc->steps << 2) + host->ecc_bytes_hw +
> >> + host->spare_bytes;
> >> + } else {
> >> + data_size2 = host->cw_data - data_size1;
> >> + oob_size2 = host->ecc_bytes_hw + host->spare_bytes;
> >> + }
> >> +
> >> + /*
> >> + * Don't perform actual data copy from NAND controller to data
> >> + * buffer through DMA for this codeword
> >> + */
> >> + if (!(read_cw_mask & BIT(i))) {
> >> + if (nandc->props->is_bam)
> >> + nandc_set_read_loc(nandc, 0, 0, 0, 1);
> >> +
> >> + config_nand_cw_read(nandc, false);
> >> +
> >> + data_buf += data_size1 + data_size2;
> >> + oob_buf += oob_size1 + oob_size2;
> >> +
> >> + continue;
> >> + }
> >> +
> >> + if (nandc->props->is_bam) {
> >> + read_loc = 0;
> >> + nandc_set_read_loc(nandc, 0, read_loc, data_size1, 0);
> >> + read_loc += data_size1;
> >> +
> >> + nandc_set_read_loc(nandc, 1, read_loc, oob_size1, 0);
> >> + read_loc += oob_size1;
> >> +
> >> + nandc_set_read_loc(nandc, 2, read_loc, data_size2, 0);
> >> + read_loc += data_size2;
> >> +
> >> + nandc_set_read_loc(nandc, 3, read_loc, oob_size2, 1);
> >> + }
> >> +
> >> + config_nand_cw_read(nandc, false);
> >> +
> >> + read_data_dma(nandc, reg_off, data_buf, data_size1, 0);
> >> + reg_off += data_size1;
> >> + data_buf += data_size1;
> >> +
> >> + read_data_dma(nandc, reg_off, oob_buf, oob_size1, 0);
> >> + reg_off += oob_size1;
> >> + oob_buf += oob_size1;
> >> +
> >> + read_data_dma(nandc, reg_off, data_buf, data_size2, 0);
> >> + reg_off += data_size2;
> >> + data_buf += data_size2;
> >> +
> >> + read_data_dma(nandc, reg_off, oob_buf, oob_size2, 0);
> >> + oob_buf += oob_size2;
> >> + }
> >> +
> >> + ret = submit_descs(nandc);
> >> + if (ret)
> >> + dev_err(nandc->dev, "failure to read raw page\n");
> >> +
> >> + free_descs(nandc);
> >> +
> >> + if (!ret)
> >> + ret = check_flash_errors(host, last_step - start_step);
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +/*
> >> * reads back status registers set by the controller to notify page >> read
> >> * errors. this is equivalent to what 'ecc->correct()' would do.
> >> */
> >> @@ -1839,82 +1947,8 @@ static int qcom_nandc_read_page_raw(struct >> mtd_info *mtd,
> >> struct nand_chip *chip, uint8_t *buf,
> >> int oob_required, int page)
> >> {
> >> - struct qcom_nand_host *host = to_qcom_nand_host(chip);
> >> - struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
> >> - u8 *data_buf, *oob_buf;
> >> - struct nand_ecc_ctrl *ecc = &chip->ecc;
> >> - int i, ret;
> >> - int read_loc;
> >> -
> >> - nand_read_page_op(chip, page, 0, NULL, 0);
> >> - data_buf = buf;
> >> - oob_buf = chip->oob_poi;
> >> -
> >> - host->use_ecc = false;
> >> -
> >> - clear_bam_transaction(nandc);
> >> - update_rw_regs(host, ecc->steps, true);
> >> - config_nand_page_read(nandc);
> >> -
> >> - for (i = 0; i < ecc->steps; i++) {
> >> - int data_size1, data_size2, oob_size1, oob_size2;
> >> - int reg_off = FLASH_BUF_ACC;
> >> -
> >> - data_size1 = mtd->writesize - host->cw_size * (ecc->steps - 1);
> >> - oob_size1 = host->bbm_size;
> >> -
> >> - if (i == (ecc->steps - 1)) {
> >> - data_size2 = ecc->size - data_size1 -
> >> - ((ecc->steps - 1) << 2);
> >> - oob_size2 = (ecc->steps << 2) + host->ecc_bytes_hw +
> >> - host->spare_bytes;
> >> - } else {
> >> - data_size2 = host->cw_data - data_size1;
> >> - oob_size2 = host->ecc_bytes_hw + host->spare_bytes;
> >> - }
> >> -
> >> - if (nandc->props->is_bam) {
> >> - read_loc = 0;
> >> - nandc_set_read_loc(nandc, 0, read_loc, data_size1, 0);
> >> - read_loc += data_size1;
> >> -
> >> - nandc_set_read_loc(nandc, 1, read_loc, oob_size1, 0);
> >> - read_loc += oob_size1;
> >> -
> >> - nandc_set_read_loc(nandc, 2, read_loc, data_size2, 0);
> >> - read_loc += data_size2;
> >> -
> >> - nandc_set_read_loc(nandc, 3, read_loc, oob_size2, 1);
> >> - }
> >> -
> >> - config_nand_cw_read(nandc, false);
> >> -
> >> - read_data_dma(nandc, reg_off, data_buf, data_size1, 0);
> >> - reg_off += data_size1;
> >> - data_buf += data_size1;
> >> -
> >> - read_data_dma(nandc, reg_off, oob_buf, oob_size1, 0);
> >> - reg_off += oob_size1;
> >> - oob_buf += oob_size1;
> >> -
> >> - read_data_dma(nandc, reg_off, data_buf, data_size2, 0);
> >> - reg_off += data_size2;
> >> - data_buf += data_size2;
> >> -
> >> - read_data_dma(nandc, reg_off, oob_buf, oob_size2, 0);
> >> - oob_buf += oob_size2;
> >> - }
> >> -
> >> - ret = submit_descs(nandc);
> >> - if (ret)
> >> - dev_err(nandc->dev, "failure to read raw page\n");
> >> -
> >> - free_descs(nandc);
> >> -
> >> - if (!ret)
> >> - ret = check_flash_errors(host, ecc->steps);
> >> -
> >> - return 0;
> >> + return nandc_read_page_raw(mtd, chip, buf, chip->oob_poi, page,
> >> + BIT(chip->ecc.steps) - 1);
> > > I don't understand this. chip->ecc.steps is constant, right? So you
> > always ask for the same subpage?
>
> We need to do raw read for subpages in which we got uncorrectable
> error in next patch for erased page bitflip detection. This patch does
> reorganization of raw read and moves common code in helper function
> nandc_read_page_raw.
>
> For nomral raw read, all the subpages will be read so
> BIT(chip->ecc.steps) - 1 is used for qcom_nandc_read_page_raw.
>
> While for erased page raw read, only those sub pages will be
> read for which the controller gives the uncorrectable error.

Still not okay: the driver should expose a way to do raw reads no
matter the length and the start and you should use that in a generic
way.

>
> Thanks,
> Abhishek
>
> > >> }
> >> >> /* implements ecc->read_oob() */



--
Miquel Raynal, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

2018-04-22 16:27:22

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH 4/9] mtd: nand: qcom: fix null pointer access for erased buffer detection

Hi Abhishek,

On Thu, 12 Apr 2018 12:24:16 +0530, Abhishek Sahu
<[email protected]> wrote:

> On 2018-04-10 14:42, Miquel Raynal wrote:
> > Hi Abhishek,
> > > On Wed, 4 Apr 2018 18:12:20 +0530, Abhishek Sahu
> > <[email protected]> wrote:
> > >> parse_read_errors can be called with only oob buf also in which
> >> case data_buf will be NULL. If data_buf is NULL, then don’t
> >> treat this page as completely erased in case of ECC uncorrectable
> >> error.
> >> >> Signed-off-by: Abhishek Sahu <[email protected]>
> >> ---
> >> drivers/mtd/nand/qcom_nandc.c | 7 +++++--
> >> 1 file changed, 5 insertions(+), 2 deletions(-)
> >> >> diff --git a/drivers/mtd/nand/qcom_nandc.c >> b/drivers/mtd/nand/qcom_nandc.c
> >> index 57c16a6..0ebcc55 100644
> >> --- a/drivers/mtd/nand/qcom_nandc.c
> >> +++ b/drivers/mtd/nand/qcom_nandc.c
> >> @@ -1607,9 +1607,11 @@ static int parse_read_errors(struct >> qcom_nand_host *host, u8 *data_buf,
> >> if (host->bch_enabled) {
> >> erased = (erased_cw & ERASED_CW) == ERASED_CW ?
> >> true : false;
> > > Why the parse_read_errors() function could not be called without
> > data_buf when using BCH? Are you sure the situation can only happen
> > without it?
> >
> host->bch_enabled case is different where controller itself tells
> regarding erased page in status register.
>
> > Would the following apply here too, with a:
> >
> erased_chunk_check_and_fixup will be used only for 4 bit RS ECC
> code in which there is no support from HW for erased page detection
> and we need to check few data bytes value.

So please explain this with a comment.

Thanks,
Miquèl

2018-04-22 16:36:10

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH 1/9] mtd: nand: qcom: use the ecc strength from device parameter

Hi Abhishek,

On Thu, 12 Apr 2018 15:29:48 +0530, Abhishek Sahu
<[email protected]> wrote:

> On 2018-04-10 13:37, Boris Brezillon wrote:
> > On Tue, 10 Apr 2018 09:55:58 +0200
> > Miquel Raynal <[email protected]> wrote:
> > >> > Hi Abhishek,
> >> >
> >> > On Tue, 10 Apr 2018 11:39:35 +0530, Abhishek Sahu
> >> > <[email protected]> wrote:
> >> >
> >> > > On 2018-04-06 18:01, Miquel Raynal wrote:
> >> > > > Hi Abhishek,
> >> > > >
> >> > > > On Wed, 4 Apr 2018 18:12:17 +0530, Abhishek Sahu
> >> > > > <[email protected]> wrote:
> >> > > >
> >> > > >> Currently the driver uses the ECC strength specified in
> >> > > >> device tree. The ONFI or JEDEC device parameter page
> >> > > >> contains the ‘ECC correctability’ field which indicates the
> >> > > >> number of bits that the host should be able to correct per
> >> > > >> 512 bytes of data.
> >> > > >
> >> > > > This is misleading. This field is not about the controller but rather
> >> > > > the chip requirements in terms of minimal strength for nominal use.
> >> > > >
> >> > >
> >> > > Thanks Miquel.
> >> > >
> >> > > Yes. Its NAND chip requirement. I have used the description for
> >> > > NAND ONFI param page
> >> > >
> >> > > 5.6.1.24. Byte 112: Number of bits ECC correctability
> >> > > This field indicates the number of bits that the host should be
> >> > > able to correct per 512 bytes of data.
> >> > >
> >> > > >> The ecc correctability is assigned in
> >> > > >> chip parameter during device probe time. QPIC/EBI2 NAND
> >> > > >> supports 4/8-bit ecc correction. The Same kind of board
> >> > > >> can have different NAND parts so use the ecc strength
> >> > > >> from device parameter (if its non zero) instead of
> >> > > >> device tree.
> >> > > >
> >> > > > That is not what you do.
> >> > > >
> >> > > > What you do is forcing the strength to be 8-bit per ECC chunk if the
> >> > > > NAND chip requires at least 8-bit/chunk strength.
> >> > > >
> >> > > > The DT property is here to force a strength. Otherwise, Linux will
> >> > > > propose to the NAND controller to use the minimum strength required by
> >> > > > the chip (from either the ONFI/JEDEC parameter page or from a static
> >> > > > table).
> >> > > >
> >> > >
> >> > > The main problem is that the same kind of boards can have different
> >> > > NAND parts.
> >> > >
> >> > > Lets assume, we have following 2 cases.
> >> > >
> >> > > 1. Non ONFI/JEDEC device for which chip->ecc_strength_ds
> >> > > will be zero. In this case, the ecc->strength from DT
> >> > > will be used
> >> >
> >> > No, the strength from DT will always be used if the property is
> >> > present, no matter the type of chip.
> >> >
> >> > > 2. ONFI/JEDEC device for which chip->ecc_strength_ds > 8.
> >> > > Since QCOM nand controller can not support
> >> > > ECC correction greater than 8 bits so we can use 8 bit ECC
> >> > > itself instead of failing NAND boot completely.
> >> >
> >> > I understand that. But this is still not what you do.
> >> >
> >> > >
> >> > > > IMHO, you have two solutions:
> >> > > > 1/ Remove these properties from the board DT (breaks DT backward
> >> > > > compatibility though);
> >> > >
> >> > > - nand-ecc-strength: This is optional property in nand.txt and
> >> > > Required property in qcom_nandc.txt.
> >> >
> >> > Well, this property is not controller specific but chip specific. The
> >> > controller driver does not rely on it, so this property should not be
> >> > required.
> >> >
> >> > > We can't remove since
> >> > > if the device is Non ONFI/JEDEC, then ecc strength will come
> >> > > from DT only.
> >> >
> >> > We can remove it and let the core handle this (as this is generic to
> >> > all raw NANDs and not specific to this controller driver). Try it out!
>
> Thanks Boris and Miquel for your inputs.
>
> Just want to confirm if already its implemented in core layer
> or shall I explore regrading this option.
>
> I checked by removing this property alone from dtsi and it was
> failing with
>
> "Driver must set ecc.strength when using hardware ECC"
>
> I checked the code in nand_base.c also but couldn't get
> anything related with this.

I don't know exactly what you did but you should have a look at what
lead you to this path:
https://elixir.bootlin.com/linux/v4.17-rc1/source/drivers/mtd/nand/raw/nand_base.c#L6421

Thanks,
Miquèl


2018-04-23 06:30:04

by Abhishek Sahu

[permalink] [raw]
Subject: Re: [PATCH 8/9] mtd: nand: qcom: helper function for raw read

On 2018-04-22 21:49, Miquel Raynal wrote:
> Hi Abhishek,
>
> On Thu, 12 Apr 2018 12:36:42 +0530, Abhishek Sahu
> <[email protected]> wrote:
>
>> On 2018-04-10 15:14, Miquel Raynal wrote:
>> > Hi Abhishek,
>> > > On Wed, 4 Apr 2018 18:12:24 +0530, Abhishek Sahu
>> > <[email protected]> wrote:
>> > >> This patch does minor code reorganization for raw reads.
>> >> Currently the raw read is required for complete page but for
>> >> subsequent patches related with erased codeword bit flips
>> >> detection, only few CW should be read. So, this patch adds
>> >> helper function and introduces the read CW bitmask which
>> >> specifies which CW reads are required in complete page.
>> > > I am not sure this is the right approach for subpage reads. If the
>> > controller supports it, you should just enable it in chip->options.
>> >
>> Thanks Miquel.
>>
>> It is internal to this file only. This patch makes one static helper
>> function which provides the support to read subpages.
>
> Drivers should expose raw helpers, why keeping this helper static then?
>
>>
>> >> >> Signed-off-by: Abhishek Sahu <[email protected]>
>> >> ---
>> >> drivers/mtd/nand/qcom_nandc.c | 186 >> +++++++++++++++++++++++++-----------------
>> >> 1 file changed, 110 insertions(+), 76 deletions(-)
>> >> >> diff --git a/drivers/mtd/nand/qcom_nandc.c >> b/drivers/mtd/nand/qcom_nandc.c
>> >> index 40c790e..f5d1fa4 100644
>> >> --- a/drivers/mtd/nand/qcom_nandc.c
>> >> +++ b/drivers/mtd/nand/qcom_nandc.c
>> >> @@ -1590,6 +1590,114 @@ static int check_flash_errors(struct >> qcom_nand_host *host, int cw_cnt)
>> >> }
>> >> >> /*
>> >> + * Helper to perform the page raw read operation. The read_cw_mask >> will be
>> >> + * used to specify the codewords for which the data should be read. >> The
>> >> + * single page contains multiple CW. Sometime, only few CW data is >> required
>> >> + * in complete page. Also, start address will be determined with
>> >> + * this CW mask to skip unnecessary data copy from NAND flash device. >> Then,
>> >> + * actual data copy from NAND controller to data buffer will be done >> only
>> >> + * for the CWs which have the mask set.
>> >> + */
>> >> +static int
>> >> +nandc_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
>> >> + u8 *data_buf, u8 *oob_buf,
>> >> + int page, unsigned long read_cw_mask)
>> >> +{
>> >> + struct qcom_nand_host *host = to_qcom_nand_host(chip);
>> >> + struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
>> >> + struct nand_ecc_ctrl *ecc = &chip->ecc;
>> >> + int i, ret;
>> >> + int read_loc, start_step, last_step;
>> >> +
>> >> + nand_read_page_op(chip, page, 0, NULL, 0);
>> >> +
>> >> + host->use_ecc = false;
>> >> + start_step = ffs(read_cw_mask) - 1;
>> >> + last_step = fls(read_cw_mask);
>> >> +
>> >> + clear_bam_transaction(nandc);
>> >> + set_address(host, host->cw_size * start_step, page);
>> >> + update_rw_regs(host, last_step - start_step, true);
>> >> + config_nand_page_read(nandc);
>> >> +
>> >> + for (i = start_step; i < last_step; i++) {
>> >> + int data_size1, data_size2, oob_size1, oob_size2;
>> >> + int reg_off = FLASH_BUF_ACC;
>> >> +
>> >> + data_size1 = mtd->writesize - host->cw_size * (ecc->steps - 1);
>> >> + oob_size1 = host->bbm_size;
>> >> +
>> >> + if (i == (ecc->steps - 1)) {
>> >> + data_size2 = ecc->size - data_size1 -
>> >> + ((ecc->steps - 1) << 2);
>> >> + oob_size2 = (ecc->steps << 2) + host->ecc_bytes_hw +
>> >> + host->spare_bytes;
>> >> + } else {
>> >> + data_size2 = host->cw_data - data_size1;
>> >> + oob_size2 = host->ecc_bytes_hw + host->spare_bytes;
>> >> + }
>> >> +
>> >> + /*
>> >> + * Don't perform actual data copy from NAND controller to data
>> >> + * buffer through DMA for this codeword
>> >> + */
>> >> + if (!(read_cw_mask & BIT(i))) {
>> >> + if (nandc->props->is_bam)
>> >> + nandc_set_read_loc(nandc, 0, 0, 0, 1);
>> >> +
>> >> + config_nand_cw_read(nandc, false);
>> >> +
>> >> + data_buf += data_size1 + data_size2;
>> >> + oob_buf += oob_size1 + oob_size2;
>> >> +
>> >> + continue;
>> >> + }
>> >> +
>> >> + if (nandc->props->is_bam) {
>> >> + read_loc = 0;
>> >> + nandc_set_read_loc(nandc, 0, read_loc, data_size1, 0);
>> >> + read_loc += data_size1;
>> >> +
>> >> + nandc_set_read_loc(nandc, 1, read_loc, oob_size1, 0);
>> >> + read_loc += oob_size1;
>> >> +
>> >> + nandc_set_read_loc(nandc, 2, read_loc, data_size2, 0);
>> >> + read_loc += data_size2;
>> >> +
>> >> + nandc_set_read_loc(nandc, 3, read_loc, oob_size2, 1);
>> >> + }
>> >> +
>> >> + config_nand_cw_read(nandc, false);
>> >> +
>> >> + read_data_dma(nandc, reg_off, data_buf, data_size1, 0);
>> >> + reg_off += data_size1;
>> >> + data_buf += data_size1;
>> >> +
>> >> + read_data_dma(nandc, reg_off, oob_buf, oob_size1, 0);
>> >> + reg_off += oob_size1;
>> >> + oob_buf += oob_size1;
>> >> +
>> >> + read_data_dma(nandc, reg_off, data_buf, data_size2, 0);
>> >> + reg_off += data_size2;
>> >> + data_buf += data_size2;
>> >> +
>> >> + read_data_dma(nandc, reg_off, oob_buf, oob_size2, 0);
>> >> + oob_buf += oob_size2;
>> >> + }
>> >> +
>> >> + ret = submit_descs(nandc);
>> >> + if (ret)
>> >> + dev_err(nandc->dev, "failure to read raw page\n");
>> >> +
>> >> + free_descs(nandc);
>> >> +
>> >> + if (!ret)
>> >> + ret = check_flash_errors(host, last_step - start_step);
>> >> +
>> >> + return 0;
>> >> +}
>> >> +
>> >> +/*
>> >> * reads back status registers set by the controller to notify page >> read
>> >> * errors. this is equivalent to what 'ecc->correct()' would do.
>> >> */
>> >> @@ -1839,82 +1947,8 @@ static int qcom_nandc_read_page_raw(struct >> mtd_info *mtd,
>> >> struct nand_chip *chip, uint8_t *buf,
>> >> int oob_required, int page)
>> >> {
>> >> - struct qcom_nand_host *host = to_qcom_nand_host(chip);
>> >> - struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
>> >> - u8 *data_buf, *oob_buf;
>> >> - struct nand_ecc_ctrl *ecc = &chip->ecc;
>> >> - int i, ret;
>> >> - int read_loc;
>> >> -
>> >> - nand_read_page_op(chip, page, 0, NULL, 0);
>> >> - data_buf = buf;
>> >> - oob_buf = chip->oob_poi;
>> >> -
>> >> - host->use_ecc = false;
>> >> -
>> >> - clear_bam_transaction(nandc);
>> >> - update_rw_regs(host, ecc->steps, true);
>> >> - config_nand_page_read(nandc);
>> >> -
>> >> - for (i = 0; i < ecc->steps; i++) {
>> >> - int data_size1, data_size2, oob_size1, oob_size2;
>> >> - int reg_off = FLASH_BUF_ACC;
>> >> -
>> >> - data_size1 = mtd->writesize - host->cw_size * (ecc->steps - 1);
>> >> - oob_size1 = host->bbm_size;
>> >> -
>> >> - if (i == (ecc->steps - 1)) {
>> >> - data_size2 = ecc->size - data_size1 -
>> >> - ((ecc->steps - 1) << 2);
>> >> - oob_size2 = (ecc->steps << 2) + host->ecc_bytes_hw +
>> >> - host->spare_bytes;
>> >> - } else {
>> >> - data_size2 = host->cw_data - data_size1;
>> >> - oob_size2 = host->ecc_bytes_hw + host->spare_bytes;
>> >> - }
>> >> -
>> >> - if (nandc->props->is_bam) {
>> >> - read_loc = 0;
>> >> - nandc_set_read_loc(nandc, 0, read_loc, data_size1, 0);
>> >> - read_loc += data_size1;
>> >> -
>> >> - nandc_set_read_loc(nandc, 1, read_loc, oob_size1, 0);
>> >> - read_loc += oob_size1;
>> >> -
>> >> - nandc_set_read_loc(nandc, 2, read_loc, data_size2, 0);
>> >> - read_loc += data_size2;
>> >> -
>> >> - nandc_set_read_loc(nandc, 3, read_loc, oob_size2, 1);
>> >> - }
>> >> -
>> >> - config_nand_cw_read(nandc, false);
>> >> -
>> >> - read_data_dma(nandc, reg_off, data_buf, data_size1, 0);
>> >> - reg_off += data_size1;
>> >> - data_buf += data_size1;
>> >> -
>> >> - read_data_dma(nandc, reg_off, oob_buf, oob_size1, 0);
>> >> - reg_off += oob_size1;
>> >> - oob_buf += oob_size1;
>> >> -
>> >> - read_data_dma(nandc, reg_off, data_buf, data_size2, 0);
>> >> - reg_off += data_size2;
>> >> - data_buf += data_size2;
>> >> -
>> >> - read_data_dma(nandc, reg_off, oob_buf, oob_size2, 0);
>> >> - oob_buf += oob_size2;
>> >> - }
>> >> -
>> >> - ret = submit_descs(nandc);
>> >> - if (ret)
>> >> - dev_err(nandc->dev, "failure to read raw page\n");
>> >> -
>> >> - free_descs(nandc);
>> >> -
>> >> - if (!ret)
>> >> - ret = check_flash_errors(host, ecc->steps);
>> >> -
>> >> - return 0;
>> >> + return nandc_read_page_raw(mtd, chip, buf, chip->oob_poi, page,
>> >> + BIT(chip->ecc.steps) - 1);
>> > > I don't understand this. chip->ecc.steps is constant, right? So you
>> > always ask for the same subpage?
>>
>> We need to do raw read for subpages in which we got uncorrectable
>> error in next patch for erased page bitflip detection. This patch
>> does
>> reorganization of raw read and moves common code in helper function
>> nandc_read_page_raw.
>>
>> For nomral raw read, all the subpages will be read so
>> BIT(chip->ecc.steps) - 1 is used for qcom_nandc_read_page_raw.
>>
>> While for erased page raw read, only those sub pages will be
>> read for which the controller gives the uncorrectable error.
>
> Still not okay: the driver should expose a way to do raw reads no
> matter the length and the start and you should use that in a generic
> way.
>

Thanks Miquel.
I will explore regarding that.
Looks like, we can some helper like read_subpage.

Regards,
Abhishek


2018-04-23 06:31:18

by Abhishek Sahu

[permalink] [raw]
Subject: Re: [PATCH 4/9] mtd: nand: qcom: fix null pointer access for erased buffer detection

On 2018-04-22 21:55, Miquel Raynal wrote:
> Hi Abhishek,
>
> On Thu, 12 Apr 2018 12:24:16 +0530, Abhishek Sahu
> <[email protected]> wrote:
>
>> On 2018-04-10 14:42, Miquel Raynal wrote:
>> > Hi Abhishek,
>> > > On Wed, 4 Apr 2018 18:12:20 +0530, Abhishek Sahu
>> > <[email protected]> wrote:
>> > >> parse_read_errors can be called with only oob buf also in which
>> >> case data_buf will be NULL. If data_buf is NULL, then don’t
>> >> treat this page as completely erased in case of ECC uncorrectable
>> >> error.
>> >> >> Signed-off-by: Abhishek Sahu <[email protected]>
>> >> ---
>> >> drivers/mtd/nand/qcom_nandc.c | 7 +++++--
>> >> 1 file changed, 5 insertions(+), 2 deletions(-)
>> >> >> diff --git a/drivers/mtd/nand/qcom_nandc.c >> b/drivers/mtd/nand/qcom_nandc.c
>> >> index 57c16a6..0ebcc55 100644
>> >> --- a/drivers/mtd/nand/qcom_nandc.c
>> >> +++ b/drivers/mtd/nand/qcom_nandc.c
>> >> @@ -1607,9 +1607,11 @@ static int parse_read_errors(struct >> qcom_nand_host *host, u8 *data_buf,
>> >> if (host->bch_enabled) {
>> >> erased = (erased_cw & ERASED_CW) == ERASED_CW ?
>> >> true : false;
>> > > Why the parse_read_errors() function could not be called without
>> > data_buf when using BCH? Are you sure the situation can only happen
>> > without it?
>> >
>> host->bch_enabled case is different where controller itself tells
>> regarding erased page in status register.
>>
>> > Would the following apply here too, with a:
>> >
>> erased_chunk_check_and_fixup will be used only for 4 bit RS ECC
>> code in which there is no support from HW for erased page detection
>> and we need to check few data bytes value.
>
> So please explain this with a comment.
>
> Thanks,
> Miquèl

Sure Miquel.
I will do the same and update the patch with more comments.

Thanks,
Abhishek

2018-04-23 06:46:06

by Abhishek Sahu

[permalink] [raw]
Subject: Re: [PATCH 1/9] mtd: nand: qcom: use the ecc strength from device parameter

On 2018-04-22 22:04, Miquel Raynal wrote:
> Hi Abhishek,
>
> On Thu, 12 Apr 2018 15:29:48 +0530, Abhishek Sahu
> <[email protected]> wrote:
>
>> On 2018-04-10 13:37, Boris Brezillon wrote:
>> > On Tue, 10 Apr 2018 09:55:58 +0200
>> > Miquel Raynal <[email protected]> wrote:
>> > >> > Hi Abhishek,
>> >> >
>> >> > On Tue, 10 Apr 2018 11:39:35 +0530, Abhishek Sahu
>> >> > <[email protected]> wrote:
>> >> >
>> >> > > On 2018-04-06 18:01, Miquel Raynal wrote:
>> >> > > > Hi Abhishek,
>> >> > > >
>> >> > > > On Wed, 4 Apr 2018 18:12:17 +0530, Abhishek Sahu
>> >> > > > <[email protected]> wrote:
>> >> > > >
>> >> > > >> Currently the driver uses the ECC strength specified in
>> >> > > >> device tree. The ONFI or JEDEC device parameter page
>> >> > > >> contains the ‘ECC correctability’ field which indicates the
>> >> > > >> number of bits that the host should be able to correct per
>> >> > > >> 512 bytes of data.
>> >> > > >
>> >> > > > This is misleading. This field is not about the controller but rather
>> >> > > > the chip requirements in terms of minimal strength for nominal use.
>> >> > > >
>> >> > >
>> >> > > Thanks Miquel.
>> >> > >
>> >> > > Yes. Its NAND chip requirement. I have used the description for
>> >> > > NAND ONFI param page
>> >> > >
>> >> > > 5.6.1.24. Byte 112: Number of bits ECC correctability
>> >> > > This field indicates the number of bits that the host should be
>> >> > > able to correct per 512 bytes of data.
>> >> > >
>> >> > > >> The ecc correctability is assigned in
>> >> > > >> chip parameter during device probe time. QPIC/EBI2 NAND
>> >> > > >> supports 4/8-bit ecc correction. The Same kind of board
>> >> > > >> can have different NAND parts so use the ecc strength
>> >> > > >> from device parameter (if its non zero) instead of
>> >> > > >> device tree.
>> >> > > >
>> >> > > > That is not what you do.
>> >> > > >
>> >> > > > What you do is forcing the strength to be 8-bit per ECC chunk if the
>> >> > > > NAND chip requires at least 8-bit/chunk strength.
>> >> > > >
>> >> > > > The DT property is here to force a strength. Otherwise, Linux will
>> >> > > > propose to the NAND controller to use the minimum strength required by
>> >> > > > the chip (from either the ONFI/JEDEC parameter page or from a static
>> >> > > > table).
>> >> > > >
>> >> > >
>> >> > > The main problem is that the same kind of boards can have different
>> >> > > NAND parts.
>> >> > >
>> >> > > Lets assume, we have following 2 cases.
>> >> > >
>> >> > > 1. Non ONFI/JEDEC device for which chip->ecc_strength_ds
>> >> > > will be zero. In this case, the ecc->strength from DT
>> >> > > will be used
>> >> >
>> >> > No, the strength from DT will always be used if the property is
>> >> > present, no matter the type of chip.
>> >> >
>> >> > > 2. ONFI/JEDEC device for which chip->ecc_strength_ds > 8.
>> >> > > Since QCOM nand controller can not support
>> >> > > ECC correction greater than 8 bits so we can use 8 bit ECC
>> >> > > itself instead of failing NAND boot completely.
>> >> >
>> >> > I understand that. But this is still not what you do.
>> >> >
>> >> > >
>> >> > > > IMHO, you have two solutions:
>> >> > > > 1/ Remove these properties from the board DT (breaks DT backward
>> >> > > > compatibility though);
>> >> > >
>> >> > > - nand-ecc-strength: This is optional property in nand.txt and
>> >> > > Required property in qcom_nandc.txt.
>> >> >
>> >> > Well, this property is not controller specific but chip specific. The
>> >> > controller driver does not rely on it, so this property should not be
>> >> > required.
>> >> >
>> >> > > We can't remove since
>> >> > > if the device is Non ONFI/JEDEC, then ecc strength will come
>> >> > > from DT only.
>> >> >
>> >> > We can remove it and let the core handle this (as this is generic to
>> >> > all raw NANDs and not specific to this controller driver). Try it out!
>>
>> Thanks Boris and Miquel for your inputs.
>>
>> Just want to confirm if already its implemented in core layer
>> or shall I explore regrading this option.
>>
>> I checked by removing this property alone from dtsi and it was
>> failing with
>>
>> "Driver must set ecc.strength when using hardware ECC"
>>
>> I checked the code in nand_base.c also but couldn't get
>> anything related with this.
>
> I don't know exactly what you did but you should have a look at what
> lead you to this path:
> https://elixir.bootlin.com/linux/v4.17-rc1/source/drivers/mtd/nand/raw/nand_base.c#L6421
>

Our driver supports both ECC strength 4 bits and 8 bits
and normally till now, we need to specify the ecc strength in device
tree.

Now, since same board can have different ECC strength chip so we
can't fix the ecc strength in device tree and we need to look
the required correction in ONFI param.

We can have some code in generic layer which

1. Provides the way to specify the supported strength in DT by NAND
controller (for our case, it is 4 and 8)
2. Read the chip ONFI/JEDEC Param and choose the configure to use
controller strength according to its requirement.
3. For Non ONFI/JEDEC devices, choose the maximum strength according
to OOB bytes.

I just want to check if we have something like this already in place
or I can add the same in generic code so that this can be used by
other drivers also.

Thanks,
Abhishek

2018-04-23 07:01:01

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH 8/9] mtd: nand: qcom: helper function for raw read

Hi Abhishek,

On Mon, 23 Apr 2018 11:58:42 +0530, Abhishek Sahu
<[email protected]> wrote:

> On 2018-04-22 21:49, Miquel Raynal wrote:
> > Hi Abhishek,
> > > On Thu, 12 Apr 2018 12:36:42 +0530, Abhishek Sahu
> > <[email protected]> wrote:
> > >> On 2018-04-10 15:14, Miquel Raynal wrote:
> >> > Hi Abhishek,
> >> > > On Wed, 4 Apr 2018 18:12:24 +0530, Abhishek Sahu
> >> > <[email protected]> wrote:
> >> > >> This patch does minor code reorganization for raw reads.
> >> >> Currently the raw read is required for complete page but for
> >> >> subsequent patches related with erased codeword bit flips
> >> >> detection, only few CW should be read. So, this patch adds
> >> >> helper function and introduces the read CW bitmask which
> >> >> specifies which CW reads are required in complete page.
> >> > > I am not sure this is the right approach for subpage reads. If the
> >> > controller supports it, you should just enable it in chip->options.
> >> >
> >> Thanks Miquel.
> >> >> It is internal to this file only. This patch makes one static helper
> >> function which provides the support to read subpages.
> > > Drivers should expose raw helpers, why keeping this helper static then?
> > >> >> >> >> Signed-off-by: Abhishek Sahu <[email protected]>
> >> >> ---
> >> >> drivers/mtd/nand/qcom_nandc.c | 186 >> +++++++++++++++++++++++++-----------------
> >> >> 1 file changed, 110 insertions(+), 76 deletions(-)
> >> >> >> diff --git a/drivers/mtd/nand/qcom_nandc.c >> b/drivers/mtd/nand/qcom_nandc.c
> >> >> index 40c790e..f5d1fa4 100644
> >> >> --- a/drivers/mtd/nand/qcom_nandc.c
> >> >> +++ b/drivers/mtd/nand/qcom_nandc.c
> >> >> @@ -1590,6 +1590,114 @@ static int check_flash_errors(struct >> qcom_nand_host *host, int cw_cnt)
> >> >> }
> >> >> >> /*
> >> >> + * Helper to perform the page raw read operation. The read_cw_mask >> will be
> >> >> + * used to specify the codewords for which the data should be read. >> The
> >> >> + * single page contains multiple CW. Sometime, only few CW data is >> required
> >> >> + * in complete page. Also, start address will be determined with
> >> >> + * this CW mask to skip unnecessary data copy from NAND flash device. >> Then,
> >> >> + * actual data copy from NAND controller to data buffer will be done >> only
> >> >> + * for the CWs which have the mask set.
> >> >> + */
> >> >> +static int
> >> >> +nandc_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
> >> >> + u8 *data_buf, u8 *oob_buf,
> >> >> + int page, unsigned long read_cw_mask)
> >> >> +{
> >> >> + struct qcom_nand_host *host = to_qcom_nand_host(chip);
> >> >> + struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
> >> >> + struct nand_ecc_ctrl *ecc = &chip->ecc;
> >> >> + int i, ret;
> >> >> + int read_loc, start_step, last_step;
> >> >> +
> >> >> + nand_read_page_op(chip, page, 0, NULL, 0);
> >> >> +
> >> >> + host->use_ecc = false;
> >> >> + start_step = ffs(read_cw_mask) - 1;
> >> >> + last_step = fls(read_cw_mask);
> >> >> +
> >> >> + clear_bam_transaction(nandc);
> >> >> + set_address(host, host->cw_size * start_step, page);
> >> >> + update_rw_regs(host, last_step - start_step, true);
> >> >> + config_nand_page_read(nandc);
> >> >> +
> >> >> + for (i = start_step; i < last_step; i++) {
> >> >> + int data_size1, data_size2, oob_size1, oob_size2;
> >> >> + int reg_off = FLASH_BUF_ACC;
> >> >> +
> >> >> + data_size1 = mtd->writesize - host->cw_size * (ecc->steps - 1);
> >> >> + oob_size1 = host->bbm_size;
> >> >> +
> >> >> + if (i == (ecc->steps - 1)) {
> >> >> + data_size2 = ecc->size - data_size1 -
> >> >> + ((ecc->steps - 1) << 2);
> >> >> + oob_size2 = (ecc->steps << 2) + host->ecc_bytes_hw +
> >> >> + host->spare_bytes;
> >> >> + } else {
> >> >> + data_size2 = host->cw_data - data_size1;
> >> >> + oob_size2 = host->ecc_bytes_hw + host->spare_bytes;
> >> >> + }
> >> >> +
> >> >> + /*
> >> >> + * Don't perform actual data copy from NAND controller to data
> >> >> + * buffer through DMA for this codeword
> >> >> + */
> >> >> + if (!(read_cw_mask & BIT(i))) {
> >> >> + if (nandc->props->is_bam)
> >> >> + nandc_set_read_loc(nandc, 0, 0, 0, 1);
> >> >> +
> >> >> + config_nand_cw_read(nandc, false);
> >> >> +
> >> >> + data_buf += data_size1 + data_size2;
> >> >> + oob_buf += oob_size1 + oob_size2;
> >> >> +
> >> >> + continue;
> >> >> + }
> >> >> +
> >> >> + if (nandc->props->is_bam) {
> >> >> + read_loc = 0;
> >> >> + nandc_set_read_loc(nandc, 0, read_loc, data_size1, 0);
> >> >> + read_loc += data_size1;
> >> >> +
> >> >> + nandc_set_read_loc(nandc, 1, read_loc, oob_size1, 0);
> >> >> + read_loc += oob_size1;
> >> >> +
> >> >> + nandc_set_read_loc(nandc, 2, read_loc, data_size2, 0);
> >> >> + read_loc += data_size2;
> >> >> +
> >> >> + nandc_set_read_loc(nandc, 3, read_loc, oob_size2, 1);
> >> >> + }
> >> >> +
> >> >> + config_nand_cw_read(nandc, false);
> >> >> +
> >> >> + read_data_dma(nandc, reg_off, data_buf, data_size1, 0);
> >> >> + reg_off += data_size1;
> >> >> + data_buf += data_size1;
> >> >> +
> >> >> + read_data_dma(nandc, reg_off, oob_buf, oob_size1, 0);
> >> >> + reg_off += oob_size1;
> >> >> + oob_buf += oob_size1;
> >> >> +
> >> >> + read_data_dma(nandc, reg_off, data_buf, data_size2, 0);
> >> >> + reg_off += data_size2;
> >> >> + data_buf += data_size2;
> >> >> +
> >> >> + read_data_dma(nandc, reg_off, oob_buf, oob_size2, 0);
> >> >> + oob_buf += oob_size2;
> >> >> + }
> >> >> +
> >> >> + ret = submit_descs(nandc);
> >> >> + if (ret)
> >> >> + dev_err(nandc->dev, "failure to read raw page\n");
> >> >> +
> >> >> + free_descs(nandc);
> >> >> +
> >> >> + if (!ret)
> >> >> + ret = check_flash_errors(host, last_step - start_step);
> >> >> +
> >> >> + return 0;
> >> >> +}
> >> >> +
> >> >> +/*
> >> >> * reads back status registers set by the controller to notify page >> read
> >> >> * errors. this is equivalent to what 'ecc->correct()' would do.
> >> >> */
> >> >> @@ -1839,82 +1947,8 @@ static int qcom_nandc_read_page_raw(struct >> mtd_info *mtd,
> >> >> struct nand_chip *chip, uint8_t *buf,
> >> >> int oob_required, int page)
> >> >> {
> >> >> - struct qcom_nand_host *host = to_qcom_nand_host(chip);
> >> >> - struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
> >> >> - u8 *data_buf, *oob_buf;
> >> >> - struct nand_ecc_ctrl *ecc = &chip->ecc;
> >> >> - int i, ret;
> >> >> - int read_loc;
> >> >> -
> >> >> - nand_read_page_op(chip, page, 0, NULL, 0);
> >> >> - data_buf = buf;
> >> >> - oob_buf = chip->oob_poi;
> >> >> -
> >> >> - host->use_ecc = false;
> >> >> -
> >> >> - clear_bam_transaction(nandc);
> >> >> - update_rw_regs(host, ecc->steps, true);
> >> >> - config_nand_page_read(nandc);
> >> >> -
> >> >> - for (i = 0; i < ecc->steps; i++) {
> >> >> - int data_size1, data_size2, oob_size1, oob_size2;
> >> >> - int reg_off = FLASH_BUF_ACC;
> >> >> -
> >> >> - data_size1 = mtd->writesize - host->cw_size * (ecc->steps - 1);
> >> >> - oob_size1 = host->bbm_size;
> >> >> -
> >> >> - if (i == (ecc->steps - 1)) {
> >> >> - data_size2 = ecc->size - data_size1 -
> >> >> - ((ecc->steps - 1) << 2);
> >> >> - oob_size2 = (ecc->steps << 2) + host->ecc_bytes_hw +
> >> >> - host->spare_bytes;
> >> >> - } else {
> >> >> - data_size2 = host->cw_data - data_size1;
> >> >> - oob_size2 = host->ecc_bytes_hw + host->spare_bytes;
> >> >> - }
> >> >> -
> >> >> - if (nandc->props->is_bam) {
> >> >> - read_loc = 0;
> >> >> - nandc_set_read_loc(nandc, 0, read_loc, data_size1, 0);
> >> >> - read_loc += data_size1;
> >> >> -
> >> >> - nandc_set_read_loc(nandc, 1, read_loc, oob_size1, 0);
> >> >> - read_loc += oob_size1;
> >> >> -
> >> >> - nandc_set_read_loc(nandc, 2, read_loc, data_size2, 0);
> >> >> - read_loc += data_size2;
> >> >> -
> >> >> - nandc_set_read_loc(nandc, 3, read_loc, oob_size2, 1);
> >> >> - }
> >> >> -
> >> >> - config_nand_cw_read(nandc, false);
> >> >> -
> >> >> - read_data_dma(nandc, reg_off, data_buf, data_size1, 0);
> >> >> - reg_off += data_size1;
> >> >> - data_buf += data_size1;
> >> >> -
> >> >> - read_data_dma(nandc, reg_off, oob_buf, oob_size1, 0);
> >> >> - reg_off += oob_size1;
> >> >> - oob_buf += oob_size1;
> >> >> -
> >> >> - read_data_dma(nandc, reg_off, data_buf, data_size2, 0);
> >> >> - reg_off += data_size2;
> >> >> - data_buf += data_size2;
> >> >> -
> >> >> - read_data_dma(nandc, reg_off, oob_buf, oob_size2, 0);
> >> >> - oob_buf += oob_size2;
> >> >> - }
> >> >> -
> >> >> - ret = submit_descs(nandc);
> >> >> - if (ret)
> >> >> - dev_err(nandc->dev, "failure to read raw page\n");
> >> >> -
> >> >> - free_descs(nandc);
> >> >> -
> >> >> - if (!ret)
> >> >> - ret = check_flash_errors(host, ecc->steps);
> >> >> -
> >> >> - return 0;
> >> >> + return nandc_read_page_raw(mtd, chip, buf, chip->oob_poi, page,
> >> >> + BIT(chip->ecc.steps) - 1);
> >> > > I don't understand this. chip->ecc.steps is constant, right? So you
> >> > always ask for the same subpage?
> >> >> We need to do raw read for subpages in which we got uncorrectable
> >> error in next patch for erased page bitflip detection. This patch >> does
> >> reorganization of raw read and moves common code in helper function
> >> nandc_read_page_raw.
> >> >> For nomral raw read, all the subpages will be read so
> >> BIT(chip->ecc.steps) - 1 is used for qcom_nandc_read_page_raw.
> >> >> While for erased page raw read, only those sub pages will be
> >> read for which the controller gives the uncorrectable error.
> > > Still not okay: the driver should expose a way to do raw reads no
> > matter the length and the start and you should use that in a generic
> > way.
> >
> Thanks Miquel.
> I will explore regarding that.
> Looks like, we can some helper like read_subpage.

Of course, when you implement raw accessors you can have static helpers
to clarify the code.

Regards,
Miquèl


2018-04-25 06:33:58

by Abhishek Sahu

[permalink] [raw]
Subject: Re: [PATCH 8/9] mtd: nand: qcom: helper function for raw read

On 2018-04-23 12:28, Miquel Raynal wrote:
> Hi Abhishek,
>
> On Mon, 23 Apr 2018 11:58:42 +0530, Abhishek Sahu
> <[email protected]> wrote:
>
>> On 2018-04-22 21:49, Miquel Raynal wrote:
>> > Hi Abhishek,
>> > > On Thu, 12 Apr 2018 12:36:42 +0530, Abhishek Sahu
>> > <[email protected]> wrote:
>> > >> On 2018-04-10 15:14, Miquel Raynal wrote:
>> >> > Hi Abhishek,
>> >> > > On Wed, 4 Apr 2018 18:12:24 +0530, Abhishek Sahu
>> >> > <[email protected]> wrote:
>> >> > >> This patch does minor code reorganization for raw reads.
>> >> >> Currently the raw read is required for complete page but for
>> >> >> subsequent patches related with erased codeword bit flips
>> >> >> detection, only few CW should be read. So, this patch adds
>> >> >> helper function and introduces the read CW bitmask which
>> >> >> specifies which CW reads are required in complete page.
>> >> > > I am not sure this is the right approach for subpage reads. If the
>> >> > controller supports it, you should just enable it in chip->options.
>> >> >
>> >> Thanks Miquel.
>> >> >> It is internal to this file only. This patch makes one static helper
>> >> function which provides the support to read subpages.
>> > > Drivers should expose raw helpers, why keeping this helper static then?
>> > >> >> >> >> Signed-off-by: Abhishek Sahu <[email protected]>
>> >> >> ---
>> >> >> drivers/mtd/nand/qcom_nandc.c | 186 >> +++++++++++++++++++++++++-----------------
>> >> >> 1 file changed, 110 insertions(+), 76 deletions(-)
>> >> >> >> diff --git a/drivers/mtd/nand/qcom_nandc.c >> b/drivers/mtd/nand/qcom_nandc.c
>> >> >> index 40c790e..f5d1fa4 100644
>> >> >> --- a/drivers/mtd/nand/qcom_nandc.c
>> >> >> +++ b/drivers/mtd/nand/qcom_nandc.c
>> >> >> @@ -1590,6 +1590,114 @@ static int check_flash_errors(struct >> qcom_nand_host *host, int cw_cnt)
>> >> >> }
>> >> >> >> /*
>> >> >> + * Helper to perform the page raw read operation. The read_cw_mask >> will be
>> >> >> + * used to specify the codewords for which the data should be read. >> The
>> >> >> + * single page contains multiple CW. Sometime, only few CW data is >> required
>> >> >> + * in complete page. Also, start address will be determined with
>> >> >> + * this CW mask to skip unnecessary data copy from NAND flash device. >> Then,
>> >> >> + * actual data copy from NAND controller to data buffer will be done >> only
>> >> >> + * for the CWs which have the mask set.
>> >> >> + */
>> >> >> +static int
>> >> >> +nandc_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
>> >> >> + u8 *data_buf, u8 *oob_buf,
>> >> >> + int page, unsigned long read_cw_mask)
>> >> >> +{

<snip>

>> >> >> - free_descs(nandc);
>> >> >> -
>> >> >> - if (!ret)
>> >> >> - ret = check_flash_errors(host, ecc->steps);
>> >> >> -
>> >> >> - return 0;
>> >> >> + return nandc_read_page_raw(mtd, chip, buf, chip->oob_poi, page,
>> >> >> + BIT(chip->ecc.steps) - 1);
>> >> > > I don't understand this. chip->ecc.steps is constant, right? So you
>> >> > always ask for the same subpage?
>> >> >> We need to do raw read for subpages in which we got uncorrectable
>> >> error in next patch for erased page bitflip detection. This patch >> does
>> >> reorganization of raw read and moves common code in helper function
>> >> nandc_read_page_raw.
>> >> >> For nomral raw read, all the subpages will be read so
>> >> BIT(chip->ecc.steps) - 1 is used for qcom_nandc_read_page_raw.
>> >> >> While for erased page raw read, only those sub pages will be
>> >> read for which the controller gives the uncorrectable error.
>> > > Still not okay: the driver should expose a way to do raw reads no
>> > matter the length and the start and you should use that in a generic
>> > way.
>> >
>> Thanks Miquel.
>> I will explore regarding that.
>> Looks like, we can some helper like read_subpage.
>
> Of course, when you implement raw accessors you can have static helpers
> to clarify the code.
>

Hi Miquel,

I checked regarding generic function.

Normally the other NAND controller stores the data in main area
and ecc bytes in oob area. So if page size is 2048+64 then 2048
data bytes will go in main area followed by ECC bytes

Main | OOB
------------------------------------
D1 D2...........D2048|E1..........E64

The QCOM nand contoller follows different layout in which this
2048+64 is internally divided in 528 bytes codeword so it will
come to 4 codewords. Each codeword contains 516 bytes followed
by ECC parity bytes. Also in at 0x1f0 offset in each CW, the bad
block marker is stored. so each CW contains badblock marker also.
The avilable OOB bytes are 16 which is also protected by ECC.

516 * 4 = 2048 data bytes + 16 available oob bytes.

So for QCOM layout, it will be something lile


0 D1.........D495 B1 D496....D516 E1...E7 U1..U4
0x210 D517......D1012 B2 D1016..D1032 E8...E14 U5..U8
0x420 D1033.....D1528 B3 D1529..D1548 E15..E21 U9..U12
0x630 D1549.....D2044 B4 D2045..D2048 O1...O16 E22 E28 U13..U16

Where D - Data bytes
B - BBM bytes
E - ECC parity bytes
U - Unused bytes
O - Avilable OOB bytes

Now, the raw read expect data to be returned in buffer in the
form of

D1.....D2048 B1 E1...E7 U1..U4 B2 E8...E14 U5..U8 B3 E15..E21
U9..U12 B4 O1...O16 E22 E28 U13..U16

Now, the puporose of that helper function is read selected
codewords. This codeword is QCOM specific and other controller
normally doesn't follow this kind of layout.

Now If we make generic fuction for subpgage raw read with any
start and length then that function we can't use for erased page
bitflip detection.

If we give the start as 0 with length 528 then our requirement is
to get 516 data bytes in data buffer and 12 bytes in OOB buffer
but according to standard operations, it will be interpreted as
all data bytes from D1..D528

so that helper function is to read selected Codewords from
QCOM NAND controller. This Codeword is different from subpages

Regarding more info for layout, you can refer comment in

https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/qcom_nandc.c#L2191

Thanks,
Abhishek

2018-04-25 13:02:59

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH 8/9] mtd: nand: qcom: helper function for raw read

Hi Abhishek,

On Wed, 25 Apr 2018 12:02:29 +0530, Abhishek Sahu
<[email protected]> wrote:

> On 2018-04-23 12:28, Miquel Raynal wrote:
> > Hi Abhishek,
> > > On Mon, 23 Apr 2018 11:58:42 +0530, Abhishek Sahu
> > <[email protected]> wrote:
> > >> On 2018-04-22 21:49, Miquel Raynal wrote:
> >> > Hi Abhishek,
> >> > > On Thu, 12 Apr 2018 12:36:42 +0530, Abhishek Sahu
> >> > <[email protected]> wrote:
> >> > >> On 2018-04-10 15:14, Miquel Raynal wrote:
> >> >> > Hi Abhishek,
> >> >> > > On Wed, 4 Apr 2018 18:12:24 +0530, Abhishek Sahu
> >> >> > <[email protected]> wrote:
> >> >> > >> This patch does minor code reorganization for raw reads.
> >> >> >> Currently the raw read is required for complete page but for
> >> >> >> subsequent patches related with erased codeword bit flips
> >> >> >> detection, only few CW should be read. So, this patch adds
> >> >> >> helper function and introduces the read CW bitmask which
> >> >> >> specifies which CW reads are required in complete page.
> >> >> > > I am not sure this is the right approach for subpage reads. If the
> >> >> > controller supports it, you should just enable it in chip->options.
> >> >> >
> >> >> Thanks Miquel.
> >> >> >> It is internal to this file only. This patch makes one static helper
> >> >> function which provides the support to read subpages.
> >> > > Drivers should expose raw helpers, why keeping this helper static then?
> >> > >> >> >> >> Signed-off-by: Abhishek Sahu <[email protected]>
> >> >> >> ---
> >> >> >> drivers/mtd/nand/qcom_nandc.c | 186 >> +++++++++++++++++++++++++-----------------
> >> >> >> 1 file changed, 110 insertions(+), 76 deletions(-)
> >> >> >> >> diff --git a/drivers/mtd/nand/qcom_nandc.c >> b/drivers/mtd/nand/qcom_nandc.c
> >> >> >> index 40c790e..f5d1fa4 100644
> >> >> >> --- a/drivers/mtd/nand/qcom_nandc.c
> >> >> >> +++ b/drivers/mtd/nand/qcom_nandc.c
> >> >> >> @@ -1590,6 +1590,114 @@ static int check_flash_errors(struct >> qcom_nand_host *host, int cw_cnt)
> >> >> >> }
> >> >> >> >> /*
> >> >> >> + * Helper to perform the page raw read operation. The read_cw_mask >> will be
> >> >> >> + * used to specify the codewords for which the data should be read. >> The
> >> >> >> + * single page contains multiple CW. Sometime, only few CW data is >> required
> >> >> >> + * in complete page. Also, start address will be determined with
> >> >> >> + * this CW mask to skip unnecessary data copy from NAND flash device. >> Then,
> >> >> >> + * actual data copy from NAND controller to data buffer will be done >> only
> >> >> >> + * for the CWs which have the mask set.
> >> >> >> + */
> >> >> >> +static int
> >> >> >> +nandc_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
> >> >> >> + u8 *data_buf, u8 *oob_buf,
> >> >> >> + int page, unsigned long read_cw_mask)
> >> >> >> +{
>
> <snip>
>
> >> >> >> - free_descs(nandc);
> >> >> >> -
> >> >> >> - if (!ret)
> >> >> >> - ret = check_flash_errors(host, ecc->steps);
> >> >> >> -
> >> >> >> - return 0;
> >> >> >> + return nandc_read_page_raw(mtd, chip, buf, chip->oob_poi, page,
> >> >> >> + BIT(chip->ecc.steps) - 1);
> >> >> > > I don't understand this. chip->ecc.steps is constant, right? So you
> >> >> > always ask for the same subpage?
> >> >> >> We need to do raw read for subpages in which we got uncorrectable
> >> >> error in next patch for erased page bitflip detection. This patch >> does
> >> >> reorganization of raw read and moves common code in helper function
> >> >> nandc_read_page_raw.
> >> >> >> For nomral raw read, all the subpages will be read so
> >> >> BIT(chip->ecc.steps) - 1 is used for qcom_nandc_read_page_raw.
> >> >> >> While for erased page raw read, only those sub pages will be
> >> >> read for which the controller gives the uncorrectable error.
> >> > > Still not okay: the driver should expose a way to do raw reads no
> >> > matter the length and the start and you should use that in a generic
> >> > way.
> >> >
> >> Thanks Miquel.
> >> I will explore regarding that.
> >> Looks like, we can some helper like read_subpage.
> > > Of course, when you implement raw accessors you can have static helpers
> > to clarify the code.
> >
> Hi Miquel,
>
> I checked regarding generic function.
>
> Normally the other NAND controller stores the data in main area
> and ecc bytes in oob area. So if page size is 2048+64 then 2048
> data bytes will go in main area followed by ECC bytes
>
> Main | OOB
> ------------------------------------
> D1 D2...........D2048|E1..........E64
>
> The QCOM nand contoller follows different layout in which this
> 2048+64 is internally divided in 528 bytes codeword so it will
> come to 4 codewords. Each codeword contains 516 bytes followed
> by ECC parity bytes. Also in at 0x1f0 offset in each CW, the bad
> block marker is stored. so each CW contains badblock marker also.
> The avilable OOB bytes are 16 which is also protected by ECC.
>
> 516 * 4 = 2048 data bytes + 16 available oob bytes.
>
> So for QCOM layout, it will be something lile
>
>
> 0 D1.........D495 B1 D496....D516 E1...E7 U1..U4
> 0x210 D517......D1012 B2 D1016..D1032 E8...E14 U5..U8
> 0x420 D1033.....D1528 B3 D1529..D1548 E15..E21 U9..U12
> 0x630 D1549.....D2044 B4 D2045..D2048 O1...O16 E22 E28 U13..U16
>
> Where D - Data bytes
> B - BBM bytes
> E - ECC parity bytes
> U - Unused bytes
> O - Avilable OOB bytes
>
> Now, the raw read expect data to be returned in buffer in the
> form of
>
> D1.....D2048 B1 E1...E7 U1..U4 B2 E8...E14 U5..U8 B3 E15..E21
> U9..U12 B4 O1...O16 E22 E28 U13..U16
>
> Now, the puporose of that helper function is read selected
> codewords. This codeword is QCOM specific and other controller
> normally doesn't follow this kind of layout.
>
> Now If we make generic fuction for subpgage raw read with any
> start and length then that function we can't use for erased page
> bitflip detection.
>
> If we give the start as 0 with length 528 then our requirement is
> to get 516 data bytes in data buffer and 12 bytes in OOB buffer
> but according to standard operations, it will be interpreted as
> all data bytes from D1..D528
>
> so that helper function is to read selected Codewords from
> QCOM NAND controller. This Codeword is different from subpages

Thanks for the clarification, I understand now. Maybe you can add a
comment about this particular layout and why you need this helper to
check for erased pages.

BTW are these BBM preserved somehow?

Regards,
Miquèl

>
> Regarding more info for layout, you can refer comment in
> https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/qcom_nandc.c#L2191
>
> Thanks,
> Abhishek



--
Miquel Raynal, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

2018-04-26 05:56:29

by Abhishek Sahu

[permalink] [raw]
Subject: Re: [PATCH 8/9] mtd: nand: qcom: helper function for raw read

On 2018-04-25 18:29, Miquel Raynal wrote:
> Hi Abhishek,
>
> On Wed, 25 Apr 2018 12:02:29 +0530, Abhishek Sahu
> <[email protected]> wrote:
>
>> On 2018-04-23 12:28, Miquel Raynal wrote:
>> > Hi Abhishek,
>> > > On Mon, 23 Apr 2018 11:58:42 +0530, Abhishek Sahu
>> > <[email protected]> wrote:
>> > >> On 2018-04-22 21:49, Miquel Raynal wrote:
>> >> > Hi Abhishek,
>> >> > > On Thu, 12 Apr 2018 12:36:42 +0530, Abhishek Sahu
>> >> > <[email protected]> wrote:
>> >> > >> On 2018-04-10 15:14, Miquel Raynal wrote:
>> >> >> > Hi Abhishek,
>> >> >> > > On Wed, 4 Apr 2018 18:12:24 +0530, Abhishek Sahu
>> >> >> > <[email protected]> wrote:
>> >> >> > >> This patch does minor code reorganization for raw reads.
>> >> >> >> Currently the raw read is required for complete page but for
>> >> >> >> subsequent patches related with erased codeword bit flips
>> >> >> >> detection, only few CW should be read. So, this patch adds
>> >> >> >> helper function and introduces the read CW bitmask which
>> >> >> >> specifies which CW reads are required in complete page.
>> >> >> > > I am not sure this is the right approach for subpage reads. If the
>> >> >> > controller supports it, you should just enable it in chip->options.
>> >> >> >
>> >> >> Thanks Miquel.
>> >> >> >> It is internal to this file only. This patch makes one static helper
>> >> >> function which provides the support to read subpages.
>> >> > > Drivers should expose raw helpers, why keeping this helper static then?
>> >> > >> >> >> >> Signed-off-by: Abhishek Sahu <[email protected]>
>> >> >> >> ---
>> >> >> >> drivers/mtd/nand/qcom_nandc.c | 186 >> +++++++++++++++++++++++++-----------------
>> >> >> >> 1 file changed, 110 insertions(+), 76 deletions(-)
>> >> >> >> >> diff --git a/drivers/mtd/nand/qcom_nandc.c >> b/drivers/mtd/nand/qcom_nandc.c
>> >> >> >> index 40c790e..f5d1fa4 100644
>> >> >> >> --- a/drivers/mtd/nand/qcom_nandc.c
>> >> >> >> +++ b/drivers/mtd/nand/qcom_nandc.c
>> >> >> >> @@ -1590,6 +1590,114 @@ static int check_flash_errors(struct >> qcom_nand_host *host, int cw_cnt)
>> >> >> >> }
>> >> >> >> >> /*
>> >> >> >> + * Helper to perform the page raw read operation. The read_cw_mask >> will be
>> >> >> >> + * used to specify the codewords for which the data should be read. >> The
>> >> >> >> + * single page contains multiple CW. Sometime, only few CW data is >> required
>> >> >> >> + * in complete page. Also, start address will be determined with
>> >> >> >> + * this CW mask to skip unnecessary data copy from NAND flash device. >> Then,
>> >> >> >> + * actual data copy from NAND controller to data buffer will be done >> only
>> >> >> >> + * for the CWs which have the mask set.
>> >> >> >> + */
>> >> >> >> +static int
>> >> >> >> +nandc_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
>> >> >> >> + u8 *data_buf, u8 *oob_buf,
>> >> >> >> + int page, unsigned long read_cw_mask)
>> >> >> >> +{
>>
>> <snip>
>>
>> >> >> >> - free_descs(nandc);
>> >> >> >> -
>> >> >> >> - if (!ret)
>> >> >> >> - ret = check_flash_errors(host, ecc->steps);
>> >> >> >> -
>> >> >> >> - return 0;
>> >> >> >> + return nandc_read_page_raw(mtd, chip, buf, chip->oob_poi, page,
>> >> >> >> + BIT(chip->ecc.steps) - 1);
>> >> >> > > I don't understand this. chip->ecc.steps is constant, right? So you
>> >> >> > always ask for the same subpage?
>> >> >> >> We need to do raw read for subpages in which we got uncorrectable
>> >> >> error in next patch for erased page bitflip detection. This patch >> does
>> >> >> reorganization of raw read and moves common code in helper function
>> >> >> nandc_read_page_raw.
>> >> >> >> For nomral raw read, all the subpages will be read so
>> >> >> BIT(chip->ecc.steps) - 1 is used for qcom_nandc_read_page_raw.
>> >> >> >> While for erased page raw read, only those sub pages will be
>> >> >> read for which the controller gives the uncorrectable error.
>> >> > > Still not okay: the driver should expose a way to do raw reads no
>> >> > matter the length and the start and you should use that in a generic
>> >> > way.
>> >> >
>> >> Thanks Miquel.
>> >> I will explore regarding that.
>> >> Looks like, we can some helper like read_subpage.
>> > > Of course, when you implement raw accessors you can have static helpers
>> > to clarify the code.
>> >
>> Hi Miquel,
>>
>> I checked regarding generic function.
>>
>> Normally the other NAND controller stores the data in main area
>> and ecc bytes in oob area. So if page size is 2048+64 then 2048
>> data bytes will go in main area followed by ECC bytes
>>
>> Main | OOB
>> ------------------------------------
>> D1 D2...........D2048|E1..........E64
>>
>> The QCOM nand contoller follows different layout in which this
>> 2048+64 is internally divided in 528 bytes codeword so it will
>> come to 4 codewords. Each codeword contains 516 bytes followed
>> by ECC parity bytes. Also in at 0x1f0 offset in each CW, the bad

Sorry Miquel.
Its 0x1d0 instead of 0x1f0.

>> block marker is stored. so each CW contains badblock marker also.
>> The avilable OOB bytes are 16 which is also protected by ECC.
>>
>> 516 * 4 = 2048 data bytes + 16 available oob bytes.
>>
>> So for QCOM layout, it will be something lile
>>
>>
>> 0 D1.........D495 B1 D496....D516 E1...E7 U1..U4
>> 0x210 D517......D1012 B2 D1016..D1032 E8...E14 U5..U8
>> 0x420 D1033.....D1528 B3 D1529..D1548 E15..E21 U9..U12
>> 0x630 D1549.....D2044 B4 D2045..D2048 O1...O16 E22 E28 U13..U16
>>
>> Where D - Data bytes
>> B - BBM bytes
>> E - ECC parity bytes
>> U - Unused bytes
>> O - Avilable OOB bytes
>>
>> Now, the raw read expect data to be returned in buffer in the
>> form of
>>
>> D1.....D2048 B1 E1...E7 U1..U4 B2 E8...E14 U5..U8 B3 E15..E21
>> U9..U12 B4 O1...O16 E22 E28 U13..U16
>>
>> Now, the puporose of that helper function is read selected
>> codewords. This codeword is QCOM specific and other controller
>> normally doesn't follow this kind of layout.
>>
>> Now If we make generic fuction for subpgage raw read with any
>> start and length then that function we can't use for erased page
>> bitflip detection.
>>
>> If we give the start as 0 with length 528 then our requirement is
>> to get 516 data bytes in data buffer and 12 bytes in OOB buffer
>> but according to standard operations, it will be interpreted as
>> all data bytes from D1..D528
>>
>> so that helper function is to read selected Codewords from
>> QCOM NAND controller. This Codeword is different from subpages
>
> Thanks for the clarification, I understand now. Maybe you can add a
> comment about this particular layout and why you need this helper to
> check for erased pages.
>

Sure Miquel. Will add the same in comment.

> BTW are these BBM preserved somehow?

The QCOM nand layout is such that, the bad block byte for last codeowrd
will come to first byte in spare area. If we have 2048+64 bytes device,
then it will have 4 codewords of size 528. For the last codeword, the
B4 will come at

528 * 3 + 0x1d0 = 0x800 which is first byte in OOB area.

Normally vendor make this byte as non 0xff for factory bad block.
For bad blocks created after that this byte will be marked
as non 0xff in

https://elixir.bootlin.com/linux/v4.16/source/drivers/mtd/nand/qcom_nandc.c#L2657

This bad block byte won't be touched during read/write with ECC.

Thanks,
Abhishek


2018-04-26 07:13:19

by Miquel Raynal

[permalink] [raw]
Subject: Re: [PATCH 8/9] mtd: nand: qcom: helper function for raw read

Hi Abhishek,

On Thu, 26 Apr 2018 11:23:19 +0530, Abhishek Sahu
<[email protected]> wrote:

> On 2018-04-25 18:29, Miquel Raynal wrote:
> > Hi Abhishek,
> > > On Wed, 25 Apr 2018 12:02:29 +0530, Abhishek Sahu
> > <[email protected]> wrote:
> > >> On 2018-04-23 12:28, Miquel Raynal wrote:
> >> > Hi Abhishek,
> >> > > On Mon, 23 Apr 2018 11:58:42 +0530, Abhishek Sahu
> >> > <[email protected]> wrote:
> >> > >> On 2018-04-22 21:49, Miquel Raynal wrote:
> >> >> > Hi Abhishek,
> >> >> > > On Thu, 12 Apr 2018 12:36:42 +0530, Abhishek Sahu
> >> >> > <[email protected]> wrote:
> >> >> > >> On 2018-04-10 15:14, Miquel Raynal wrote:
> >> >> >> > Hi Abhishek,
> >> >> >> > > On Wed, 4 Apr 2018 18:12:24 +0530, Abhishek Sahu
> >> >> >> > <[email protected]> wrote:
> >> >> >> > >> This patch does minor code reorganization for raw reads.
> >> >> >> >> Currently the raw read is required for complete page but for
> >> >> >> >> subsequent patches related with erased codeword bit flips
> >> >> >> >> detection, only few CW should be read. So, this patch adds
> >> >> >> >> helper function and introduces the read CW bitmask which
> >> >> >> >> specifies which CW reads are required in complete page.
> >> >> >> > > I am not sure this is the right approach for subpage reads. If the
> >> >> >> > controller supports it, you should just enable it in chip->options.
> >> >> >> >
> >> >> >> Thanks Miquel.
> >> >> >> >> It is internal to this file only. This patch makes one static helper
> >> >> >> function which provides the support to read subpages.
> >> >> > > Drivers should expose raw helpers, why keeping this helper static then?
> >> >> > >> >> >> >> Signed-off-by: Abhishek Sahu <[email protected]>
> >> >> >> >> ---
> >> >> >> >> drivers/mtd/nand/qcom_nandc.c | 186 >> +++++++++++++++++++++++++-----------------
> >> >> >> >> 1 file changed, 110 insertions(+), 76 deletions(-)
> >> >> >> >> >> diff --git a/drivers/mtd/nand/qcom_nandc.c >> b/drivers/mtd/nand/qcom_nandc.c
> >> >> >> >> index 40c790e..f5d1fa4 100644
> >> >> >> >> --- a/drivers/mtd/nand/qcom_nandc.c
> >> >> >> >> +++ b/drivers/mtd/nand/qcom_nandc.c
> >> >> >> >> @@ -1590,6 +1590,114 @@ static int check_flash_errors(struct >> qcom_nand_host *host, int cw_cnt)
> >> >> >> >> }
> >> >> >> >> >> /*
> >> >> >> >> + * Helper to perform the page raw read operation. The read_cw_mask >> will be
> >> >> >> >> + * used to specify the codewords for which the data should be read. >> The
> >> >> >> >> + * single page contains multiple CW. Sometime, only few CW data is >> required
> >> >> >> >> + * in complete page. Also, start address will be determined with
> >> >> >> >> + * this CW mask to skip unnecessary data copy from NAND flash device. >> Then,
> >> >> >> >> + * actual data copy from NAND controller to data buffer will be done >> only
> >> >> >> >> + * for the CWs which have the mask set.
> >> >> >> >> + */
> >> >> >> >> +static int
> >> >> >> >> +nandc_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
> >> >> >> >> + u8 *data_buf, u8 *oob_buf,
> >> >> >> >> + int page, unsigned long read_cw_mask)
> >> >> >> >> +{
> >> >> <snip>
> >> >> >> >> >> - free_descs(nandc);
> >> >> >> >> -
> >> >> >> >> - if (!ret)
> >> >> >> >> - ret = check_flash_errors(host, ecc->steps);
> >> >> >> >> -
> >> >> >> >> - return 0;
> >> >> >> >> + return nandc_read_page_raw(mtd, chip, buf, chip->oob_poi, page,
> >> >> >> >> + BIT(chip->ecc.steps) - 1);
> >> >> >> > > I don't understand this. chip->ecc.steps is constant, right? So you
> >> >> >> > always ask for the same subpage?
> >> >> >> >> We need to do raw read for subpages in which we got uncorrectable
> >> >> >> error in next patch for erased page bitflip detection. This patch >> does
> >> >> >> reorganization of raw read and moves common code in helper function
> >> >> >> nandc_read_page_raw.
> >> >> >> >> For nomral raw read, all the subpages will be read so
> >> >> >> BIT(chip->ecc.steps) - 1 is used for qcom_nandc_read_page_raw.
> >> >> >> >> While for erased page raw read, only those sub pages will be
> >> >> >> read for which the controller gives the uncorrectable error.
> >> >> > > Still not okay: the driver should expose a way to do raw reads no
> >> >> > matter the length and the start and you should use that in a generic
> >> >> > way.
> >> >> >
> >> >> Thanks Miquel.
> >> >> I will explore regarding that.
> >> >> Looks like, we can some helper like read_subpage.
> >> > > Of course, when you implement raw accessors you can have static helpers
> >> > to clarify the code.
> >> >
> >> Hi Miquel,
> >> >> I checked regarding generic function.
> >> >> Normally the other NAND controller stores the data in main area
> >> and ecc bytes in oob area. So if page size is 2048+64 then 2048
> >> data bytes will go in main area followed by ECC bytes
> >> >> Main | OOB
> >> ------------------------------------
> >> D1 D2...........D2048|E1..........E64
> >> >> The QCOM nand contoller follows different layout in which this
> >> 2048+64 is internally divided in 528 bytes codeword so it will
> >> come to 4 codewords. Each codeword contains 516 bytes followed
> >> by ECC parity bytes. Also in at 0x1f0 offset in each CW, the bad
>
> Sorry Miquel.
> Its 0x1d0 instead of 0x1f0.
>
> >> block marker is stored. so each CW contains badblock marker also.
> >> The avilable OOB bytes are 16 which is also protected by ECC.
> >> >> 516 * 4 = 2048 data bytes + 16 available oob bytes.
> >> >> So for QCOM layout, it will be something lile
> >> >> >> 0 D1.........D495 B1 D496....D516 E1...E7 U1..U4
> >> 0x210 D517......D1012 B2 D1016..D1032 E8...E14 U5..U8
> >> 0x420 D1033.....D1528 B3 D1529..D1548 E15..E21 U9..U12
> >> 0x630 D1549.....D2044 B4 D2045..D2048 O1...O16 E22 E28 U13..U16
> >> >> Where D - Data bytes
> >> B - BBM bytes
> >> E - ECC parity bytes
> >> U - Unused bytes
> >> O - Avilable OOB bytes
> >> >> Now, the raw read expect data to be returned in buffer in the
> >> form of
> >> >> D1.....D2048 B1 E1...E7 U1..U4 B2 E8...E14 U5..U8 B3 E15..E21
> >> U9..U12 B4 O1...O16 E22 E28 U13..U16
> >> >> Now, the puporose of that helper function is read selected
> >> codewords. This codeword is QCOM specific and other controller
> >> normally doesn't follow this kind of layout.
> >> >> Now If we make generic fuction for subpgage raw read with any
> >> start and length then that function we can't use for erased page
> >> bitflip detection.
> >> >> If we give the start as 0 with length 528 then our requirement is
> >> to get 516 data bytes in data buffer and 12 bytes in OOB buffer
> >> but according to standard operations, it will be interpreted as
> >> all data bytes from D1..D528
> >> >> so that helper function is to read selected Codewords from
> >> QCOM NAND controller. This Codeword is different from subpages
> > > Thanks for the clarification, I understand now. Maybe you can add a
> > comment about this particular layout and why you need this helper to
> > check for erased pages.
> >
> Sure Miquel. Will add the same in comment.
>
> > BTW are these BBM preserved somehow?
>
> The QCOM nand layout is such that, the bad block byte for last codeowrd
> will come to first byte in spare area. If we have 2048+64 bytes device,
> then it will have 4 codewords of size 528. For the last codeword, the
> B4 will come at
>
> 528 * 3 + 0x1d0 = 0x800 which is first byte in OOB area.

Great, so maybe that's why the kernel don't mess up with bad blocks. As
long as this byte is !0xff it should not be touched.

For good pages however, I guess you just don't care about the fact that
three bytes are supposedly "markers" and use them as data? They won't
be checked by the kernel anyway.

>
> Normally vendor make this byte as non 0xff for factory bad block.
> For bad blocks created after that this byte will be marked
> as non 0xff in
> https://elixir.bootlin.com/linux/v4.16/source/drivers/mtd/nand/qcom_nandc.c#L2657
>
> This bad block byte won't be touched during read/write with ECC.
>
> Thanks,
> Abhishek
>

Thanks for the clarification,
Miquèl

--
Miquel Raynal, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

2018-04-26 07:44:29

by Abhishek Sahu

[permalink] [raw]
Subject: Re: [PATCH 8/9] mtd: nand: qcom: helper function for raw read

On 2018-04-26 12:41, Miquel Raynal wrote:
> Hi Abhishek,
>
> On Thu, 26 Apr 2018 11:23:19 +0530, Abhishek Sahu
> <[email protected]> wrote:
>
>> On 2018-04-25 18:29, Miquel Raynal wrote:
>> > Hi Abhishek,
>> > > On Wed, 25 Apr 2018 12:02:29 +0530, Abhishek Sahu
>> > <[email protected]> wrote:
>> > >> On 2018-04-23 12:28, Miquel Raynal wrote:
>> >> > Hi Abhishek,
>> >> > > On Mon, 23 Apr 2018 11:58:42 +0530, Abhishek Sahu
>> >> > <[email protected]> wrote:
>> >> > >> On 2018-04-22 21:49, Miquel Raynal wrote:
>> >> >> > Hi Abhishek,
>> >> >> > > On Thu, 12 Apr 2018 12:36:42 +0530, Abhishek Sahu
>> >> >> > <[email protected]> wrote:
>> >> >> > >> On 2018-04-10 15:14, Miquel Raynal wrote:
>> >> >> >> > Hi Abhishek,
>> >> >> >> > > On Wed, 4 Apr 2018 18:12:24 +0530, Abhishek Sahu
>> >> >> >> > <[email protected]> wrote:
>> >> >> >> > >> This patch does minor code reorganization for raw reads.
>> >> >> >> >> Currently the raw read is required for complete page but for
>> >> >> >> >> subsequent patches related with erased codeword bit flips
>> >> >> >> >> detection, only few CW should be read. So, this patch adds
>> >> >> >> >> helper function and introduces the read CW bitmask which
>> >> >> >> >> specifies which CW reads are required in complete page.
>> >> >> >> > > I am not sure this is the right approach for subpage reads. If the
>> >> >> >> > controller supports it, you should just enable it in chip->options.
>> >> >> >> >
>> >> >> >> Thanks Miquel.
>> >> >> >> >> It is internal to this file only. This patch makes one static helper
>> >> >> >> function which provides the support to read subpages.
>> >> >> > > Drivers should expose raw helpers, why keeping this helper static then?
>> >> >> > >> >> >> >> Signed-off-by: Abhishek Sahu <[email protected]>
>> >> >> >> >> ---
>> >> >> >> >> drivers/mtd/nand/qcom_nandc.c | 186 >> +++++++++++++++++++++++++-----------------
>> >> >> >> >> 1 file changed, 110 insertions(+), 76 deletions(-)
>> >> >> >> >> >> diff --git a/drivers/mtd/nand/qcom_nandc.c >> b/drivers/mtd/nand/qcom_nandc.c
>> >> >> >> >> index 40c790e..f5d1fa4 100644
>> >> >> >> >> --- a/drivers/mtd/nand/qcom_nandc.c
>> >> >> >> >> +++ b/drivers/mtd/nand/qcom_nandc.c
>> >> >> >> >> @@ -1590,6 +1590,114 @@ static int check_flash_errors(struct >> qcom_nand_host *host, int cw_cnt)
>> >> >> >> >> }
>> >> >> >> >> >> /*
>> >> >> >> >> + * Helper to perform the page raw read operation. The read_cw_mask >> will be
>> >> >> >> >> + * used to specify the codewords for which the data should be read. >> The
>> >> >> >> >> + * single page contains multiple CW. Sometime, only few CW data is >> required
>> >> >> >> >> + * in complete page. Also, start address will be determined with
>> >> >> >> >> + * this CW mask to skip unnecessary data copy from NAND flash device. >> Then,
>> >> >> >> >> + * actual data copy from NAND controller to data buffer will be done >> only
>> >> >> >> >> + * for the CWs which have the mask set.
>> >> >> >> >> + */
>> >> >> >> >> +static int
>> >> >> >> >> +nandc_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
>> >> >> >> >> + u8 *data_buf, u8 *oob_buf,
>> >> >> >> >> + int page, unsigned long read_cw_mask)
>> >> >> >> >> +{
>> >> >> <snip>
>> >> >> >> >> >> - free_descs(nandc);
>> >> >> >> >> -
>> >> >> >> >> - if (!ret)
>> >> >> >> >> - ret = check_flash_errors(host, ecc->steps);
>> >> >> >> >> -
>> >> >> >> >> - return 0;
>> >> >> >> >> + return nandc_read_page_raw(mtd, chip, buf, chip->oob_poi, page,
>> >> >> >> >> + BIT(chip->ecc.steps) - 1);
>> >> >> >> > > I don't understand this. chip->ecc.steps is constant, right? So you
>> >> >> >> > always ask for the same subpage?
>> >> >> >> >> We need to do raw read for subpages in which we got uncorrectable
>> >> >> >> error in next patch for erased page bitflip detection. This patch >> does
>> >> >> >> reorganization of raw read and moves common code in helper function
>> >> >> >> nandc_read_page_raw.
>> >> >> >> >> For nomral raw read, all the subpages will be read so
>> >> >> >> BIT(chip->ecc.steps) - 1 is used for qcom_nandc_read_page_raw.
>> >> >> >> >> While for erased page raw read, only those sub pages will be
>> >> >> >> read for which the controller gives the uncorrectable error.
>> >> >> > > Still not okay: the driver should expose a way to do raw reads no
>> >> >> > matter the length and the start and you should use that in a generic
>> >> >> > way.
>> >> >> >
>> >> >> Thanks Miquel.
>> >> >> I will explore regarding that.
>> >> >> Looks like, we can some helper like read_subpage.
>> >> > > Of course, when you implement raw accessors you can have static helpers
>> >> > to clarify the code.
>> >> >
>> >> Hi Miquel,
>> >> >> I checked regarding generic function.
>> >> >> Normally the other NAND controller stores the data in main area
>> >> and ecc bytes in oob area. So if page size is 2048+64 then 2048
>> >> data bytes will go in main area followed by ECC bytes
>> >> >> Main | OOB
>> >> ------------------------------------
>> >> D1 D2...........D2048|E1..........E64
>> >> >> The QCOM nand contoller follows different layout in which this
>> >> 2048+64 is internally divided in 528 bytes codeword so it will
>> >> come to 4 codewords. Each codeword contains 516 bytes followed
>> >> by ECC parity bytes. Also in at 0x1f0 offset in each CW, the bad
>>
>> Sorry Miquel.
>> Its 0x1d0 instead of 0x1f0.
>>
>> >> block marker is stored. so each CW contains badblock marker also.
>> >> The avilable OOB bytes are 16 which is also protected by ECC.
>> >> >> 516 * 4 = 2048 data bytes + 16 available oob bytes.
>> >> >> So for QCOM layout, it will be something lile
>> >> >> >> 0 D1.........D495 B1 D496....D516 E1...E7 U1..U4
>> >> 0x210 D517......D1012 B2 D1016..D1032 E8...E14 U5..U8
>> >> 0x420 D1033.....D1528 B3 D1529..D1548 E15..E21 U9..U12
>> >> 0x630 D1549.....D2044 B4 D2045..D2048 O1...O16 E22 E28 U13..U16
>> >> >> Where D - Data bytes
>> >> B - BBM bytes
>> >> E - ECC parity bytes
>> >> U - Unused bytes
>> >> O - Avilable OOB bytes
>> >> >> Now, the raw read expect data to be returned in buffer in the
>> >> form of
>> >> >> D1.....D2048 B1 E1...E7 U1..U4 B2 E8...E14 U5..U8 B3 E15..E21
>> >> U9..U12 B4 O1...O16 E22 E28 U13..U16
>> >> >> Now, the puporose of that helper function is read selected
>> >> codewords. This codeword is QCOM specific and other controller
>> >> normally doesn't follow this kind of layout.
>> >> >> Now If we make generic fuction for subpgage raw read with any
>> >> start and length then that function we can't use for erased page
>> >> bitflip detection.
>> >> >> If we give the start as 0 with length 528 then our requirement is
>> >> to get 516 data bytes in data buffer and 12 bytes in OOB buffer
>> >> but according to standard operations, it will be interpreted as
>> >> all data bytes from D1..D528
>> >> >> so that helper function is to read selected Codewords from
>> >> QCOM NAND controller. This Codeword is different from subpages
>> > > Thanks for the clarification, I understand now. Maybe you can add a
>> > comment about this particular layout and why you need this helper to
>> > check for erased pages.
>> >
>> Sure Miquel. Will add the same in comment.
>>
>> > BTW are these BBM preserved somehow?
>>
>> The QCOM nand layout is such that, the bad block byte for last
>> codeowrd
>> will come to first byte in spare area. If we have 2048+64 bytes
>> device,
>> then it will have 4 codewords of size 528. For the last codeword,
>> the
>> B4 will come at
>>
>> 528 * 3 + 0x1d0 = 0x800 which is first byte in OOB area.
>
> Great, so maybe that's why the kernel don't mess up with bad blocks. As
> long as this byte is !0xff it should not be touched.
>
> For good pages however, I guess you just don't care about the fact that
> three bytes are supposedly "markers" and use them as data? They won't
> be checked by the kernel anyway.
>

Correct Miquql.
The 3 bytes (B1, B2, B3) for first 3 codewords will be ignored.
No data will be written over that. During read/write with ECC,
NAND controller itself ignore that byte at 0x1d0 offset.

Only B4 in last codeword (first byte in spare area) will be used for
bad block marker.

Thanks,
Abhishek

>>
>> Normally vendor make this byte as non 0xff for factory bad block.
>> For bad blocks created after that this byte will be marked
>> as non 0xff in
>>
>> https://elixir.bootlin.com/linux/v4.16/source/drivers/mtd/nand/qcom_nandc.c#L2657
>>
>> This bad block byte won't be touched during read/write with ECC.
>>
>> Thanks,
>> Abhishek
>>
>
> Thanks for the clarification,
> Miquèl