2015-04-27 21:36:45

by Ben Shelton

[permalink] [raw]
Subject: Re: [PATCH 1/3] mtd: nand: Add on-die ECC support

Hi Richard,

On 03/25, Richard Weinberger wrote:
> Some Micron NAND chips offer an on-die ECC (AKA internal ECC)
> feature. It is useful when the host-platform does not offer
> multi-bit ECC and software ECC is not feasible.
>
> Based on original work by David Mosberger <[email protected]>
>
> Signed-off-by: Richard Weinberger <[email protected]>
> ---

[...]

> +
> +static int check_read_status_on_die(struct mtd_info *mtd, int page)
> +{
> + struct nand_chip *chip = mtd->priv;
> + int max_bitflips = 0;
> + uint8_t status;
> +
> + /* Check ECC status of page just transferred into NAND's page buffer */
> + chip->cmdfunc(mtd, NAND_CMD_STATUS, -1, -1);
> + status = chip->read_byte(mtd);
> +
> + /* Switch back to data reading */
> + chip->cmd_ctrl(mtd, NAND_CMD_READ0, NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE);
> + chip->cmd_ctrl(mtd, NAND_CMD_NONE, NAND_NCE | NAND_CTRL_CHANGE);

I tested this against the latest version of the PL353 NAND driver that Punnaiah
has been working to upstream (copying her on this message). With a few changes
to that driver, I got it most of the way through initialization with on-die ECC
enabled, but it segfaults here with a null pointer dereference because the
PL353 driver does not implement chip->cmd_ctrl. Instead, it implements a
custom override of cmd->cmdfunc that does not call cmd_ctrl. Looking through
the other in-tree NAND drivers, it looks like most of them do implement
cmd_ctrl, but quite a few of them do not (e.g. au1550nd, denali, docg4).

What do you think would be the best way to handle this? It seems like this gap
could be bridged from either side -- either the PL353 driver could implement
cmd_ctrl, at least as a stub version that provides the expected behavior in
this case; or the on-die framework could break this out into a callback
function with a default implementation that the driver could override to
perform this behavior in the manner of its choosing.

> +
> + if (status & NAND_STATUS_FAIL) {
> + /* Page has invalid ECC */
> + mtd->ecc_stats.failed++;
> + } else if (status & NAND_STATUS_REWRITE) {
> + /*
> + * Micron on-die ECC doesn't report the number of
> + * bitflips, so we have to count them ourself to see
> + * if the error rate is too high. This is
> + * particularly important for pages with stuck bits.
> + */
> + max_bitflips = check_for_bitflips(mtd, page);
> + }
> + return max_bitflips;
> +}
> +

[...]

> +#else /* !CONFIG_MTD_NAND_ECC_ON_DIE */
> +static inline int mtd_nand_has_on_die(void) { return 0; }
> +static inline int nand_read_subpage_on_die(struct mtd_info *mtd,
> + struct nand_chip *chip,
> + uint32_t data_offs,
> + uint32_t readlen,
> + uint8_t *bufpoi, int page)
> +{
> + BUG();

When I build this without CONFIG_MTD_NAND_ECC_ON_DIE enabled, I get the
following warning here:

In file included from drivers/mtd/nand/nand_base.c:46:0:
include/linux/mtd/nand_ondie.h: In function 'nand_read_subpage_on_die':
include/linux/mtd/nand_ondie.h:28:1: warning: no return statement in function returning non-void [-Wreturn-type]
include/linux/mtd/nand_ondie.h: In function 'nand_read_page_on_die':
include/linux/mtd/nand_ondie.h:34:1: warning: no return statement in function returning non-void [-Wreturn-type]

Perhaps return an error code here, even though you'll never get past the BUG()?

> +}
> +
> +static inline int nand_read_page_on_die(struct mtd_info *mtd, struct nand_chip *chip,
> + uint8_t *buf, int oob_required, int page)
> +{
> + BUG();

Same here.

> +}

Thanks,
Ben


2015-04-27 22:19:24

by Richard Weinberger

[permalink] [raw]
Subject: Re: [PATCH 1/3] mtd: nand: Add on-die ECC support

Am 27.04.2015 um 23:35 schrieb Ben Shelton:
> I tested this against the latest version of the PL353 NAND driver that Punnaiah
> has been working to upstream (copying her on this message). With a few changes
> to that driver, I got it most of the way through initialization with on-die ECC
> enabled, but it segfaults here with a null pointer dereference because the
> PL353 driver does not implement chip->cmd_ctrl. Instead, it implements a
> custom override of cmd->cmdfunc that does not call cmd_ctrl. Looking through
> the other in-tree NAND drivers, it looks like most of them do implement
> cmd_ctrl, but quite a few of them do not (e.g. au1550nd, denali, docg4).
>
> What do you think would be the best way to handle this? It seems like this gap
> could be bridged from either side -- either the PL353 driver could implement
> cmd_ctrl, at least as a stub version that provides the expected behavior in
> this case; or the on-die framework could break this out into a callback
> function with a default implementation that the driver could override to
> perform this behavior in the manner of its choosing.

Oh, I thought every driver has to implement that function. ;-\
But you're right there is a corner case.

What we could do is just using chip->cmdfunc() with a custom NAND command.
i.e. chip->cmdfunc(mtd, NAND_CMD_READMODE, -1, -1);

Gerhard Sittig tried to introduce such a command some time ago:
http://lists.infradead.org/pipermail/linux-mtd/2014-April/053115.html

Maybe Brian can bring some light into that too...

> When I build this without CONFIG_MTD_NAND_ECC_ON_DIE enabled, I get the
> following warning here:
>
> In file included from drivers/mtd/nand/nand_base.c:46:0:
> include/linux/mtd/nand_ondie.h: In function 'nand_read_subpage_on_die':
> include/linux/mtd/nand_ondie.h:28:1: warning: no return statement in function returning non-void [-Wreturn-type]
> include/linux/mtd/nand_ondie.h: In function 'nand_read_page_on_die':
> include/linux/mtd/nand_ondie.h:34:1: warning: no return statement in function returning non-void [-Wreturn-type]
>
> Perhaps return an error code here, even though you'll never get past the BUG()?

What gcc is this?
gcc 4.8 here does not warn, I thought it is smart enough that this function does never
return. Can it be that your .config has CONFIG_BUG=n?
Anyway, this functions clearly needs a return statement. :)

Thanks,
//richard

2015-04-27 22:37:06

by Ben Shelton

[permalink] [raw]
Subject: Re: [PATCH 1/3] mtd: nand: Add on-die ECC support

On 04/28, Richard Weinberger wrote:
> Am 27.04.2015 um 23:35 schrieb Ben Shelton:
> > I tested this against the latest version of the PL353 NAND driver that Punnaiah
> > has been working to upstream (copying her on this message). With a few changes
> > to that driver, I got it most of the way through initialization with on-die ECC
> > enabled, but it segfaults here with a null pointer dereference because the
> > PL353 driver does not implement chip->cmd_ctrl. Instead, it implements a
> > custom override of cmd->cmdfunc that does not call cmd_ctrl. Looking through
> > the other in-tree NAND drivers, it looks like most of them do implement
> > cmd_ctrl, but quite a few of them do not (e.g. au1550nd, denali, docg4).
> >
> > What do you think would be the best way to handle this? It seems like this gap
> > could be bridged from either side -- either the PL353 driver could implement
> > cmd_ctrl, at least as a stub version that provides the expected behavior in
> > this case; or the on-die framework could break this out into a callback
> > function with a default implementation that the driver could override to
> > perform this behavior in the manner of its choosing.
>
> Oh, I thought every driver has to implement that function. ;-\
> But you're right there is a corner case.
>
> What we could do is just using chip->cmdfunc() with a custom NAND command.
> i.e. chip->cmdfunc(mtd, NAND_CMD_READMODE, -1, -1);
>
> Gerhard Sittig tried to introduce such a command some time ago:
> http://lists.infradead.org/pipermail/linux-mtd/2014-April/053115.html

That sounds reasonable to me. That's similar to how we're checking the
NAND status after reads in our current out-of-tree PL353 driver. We
added the extra command:

+ /*
+ * READ0 command only, for checking read status. Note that the real command
+ * here is 0x00, but we can't differentiate between READ0 where we need to
+ * send a READSTART after the address bytes, or a READ0 by itself, after
+ * a read status command to check the on-die ECC status. The high bit is
+ * written into the unused end_cmd field, so we don't need to mask it off.
+ */
+#define NAND_CMD_READ0_ONLY 0x100

and then added it to the struct pl353_nand_command_format of the driver:

static const struct pl353_nand_command_format pl353_nand_commands[] = {
{NAND_CMD_READ0, NAND_CMD_READSTART, 5, PL353_NAND_CMD_PHASE},
+ {NAND_CMD_READ0_ONLY, NAND_CMD_NONE, 0, NAND_CMD_NONE},
{NAND_CMD_RNDOUT, NAND_CMD_RNDOUTSTART, 2, PL353_NAND_CMD_PHASE},
{NAND_CMD_READID, NAND_CMD_NONE, 1, NAND_CMD_NONE},
{NAND_CMD_STATUS, NAND_CMD_NONE, 0, NAND_CMD_NONE},

>
> Maybe Brian can bring some light into that too...
>
> > When I build this without CONFIG_MTD_NAND_ECC_ON_DIE enabled, I get the
> > following warning here:
> >
> > In file included from drivers/mtd/nand/nand_base.c:46:0:
> > include/linux/mtd/nand_ondie.h: In function 'nand_read_subpage_on_die':
> > include/linux/mtd/nand_ondie.h:28:1: warning: no return statement in function returning non-void [-Wreturn-type]
> > include/linux/mtd/nand_ondie.h: In function 'nand_read_page_on_die':
> > include/linux/mtd/nand_ondie.h:34:1: warning: no return statement in function returning non-void [-Wreturn-type]
> >
> > Perhaps return an error code here, even though you'll never get past the BUG()?
>
> What gcc is this?
> gcc 4.8 here does not warn, I thought it is smart enough that this function does never
> return. Can it be that your .config has CONFIG_BUG=n?
> Anyway, this functions clearly needs a return statement. :)

gcc 4.7.2, and you are correct that I had CONFIG_BUG off. :)

Thanks,
Ben

>
> Thanks,
> //richard

2015-04-27 22:42:23

by Richard Weinberger

[permalink] [raw]
Subject: Re: [PATCH 1/3] mtd: nand: Add on-die ECC support



Am 28.04.2015 um 00:36 schrieb Ben Shelton:
> On 04/28, Richard Weinberger wrote:
>> Am 27.04.2015 um 23:35 schrieb Ben Shelton:
>>> I tested this against the latest version of the PL353 NAND driver that Punnaiah
>>> has been working to upstream (copying her on this message). With a few changes
>>> to that driver, I got it most of the way through initialization with on-die ECC
>>> enabled, but it segfaults here with a null pointer dereference because the
>>> PL353 driver does not implement chip->cmd_ctrl. Instead, it implements a
>>> custom override of cmd->cmdfunc that does not call cmd_ctrl. Looking through
>>> the other in-tree NAND drivers, it looks like most of them do implement
>>> cmd_ctrl, but quite a few of them do not (e.g. au1550nd, denali, docg4).
>>>
>>> What do you think would be the best way to handle this? It seems like this gap
>>> could be bridged from either side -- either the PL353 driver could implement
>>> cmd_ctrl, at least as a stub version that provides the expected behavior in
>>> this case; or the on-die framework could break this out into a callback
>>> function with a default implementation that the driver could override to
>>> perform this behavior in the manner of its choosing.
>>
>> Oh, I thought every driver has to implement that function. ;-\
>> But you're right there is a corner case.
>>
>> What we could do is just using chip->cmdfunc() with a custom NAND command.
>> i.e. chip->cmdfunc(mtd, NAND_CMD_READMODE, -1, -1);
>>
>> Gerhard Sittig tried to introduce such a command some time ago:
>> http://lists.infradead.org/pipermail/linux-mtd/2014-April/053115.html
>
> That sounds reasonable to me. That's similar to how we're checking the
> NAND status after reads in our current out-of-tree PL353 driver. We
> added the extra command:
>
> + /*
> + * READ0 command only, for checking read status. Note that the real command
> + * here is 0x00, but we can't differentiate between READ0 where we need to
> + * send a READSTART after the address bytes, or a READ0 by itself, after
> + * a read status command to check the on-die ECC status. The high bit is
> + * written into the unused end_cmd field, so we don't need to mask it off.
> + */
> +#define NAND_CMD_READ0_ONLY 0x100
>
> and then added it to the struct pl353_nand_command_format of the driver:
>
> static const struct pl353_nand_command_format pl353_nand_commands[] = {
> {NAND_CMD_READ0, NAND_CMD_READSTART, 5, PL353_NAND_CMD_PHASE},
> + {NAND_CMD_READ0_ONLY, NAND_CMD_NONE, 0, NAND_CMD_NONE},
> {NAND_CMD_RNDOUT, NAND_CMD_RNDOUTSTART, 2, PL353_NAND_CMD_PHASE},
> {NAND_CMD_READID, NAND_CMD_NONE, 1, NAND_CMD_NONE},
> {NAND_CMD_STATUS, NAND_CMD_NONE, 0, NAND_CMD_NONE},

Yep. All you need to do in check_read_status_on_die() is switching back to reading
mode.

>>
>> Maybe Brian can bring some light into that too...
>>
>>> When I build this without CONFIG_MTD_NAND_ECC_ON_DIE enabled, I get the
>>> following warning here:
>>>
>>> In file included from drivers/mtd/nand/nand_base.c:46:0:
>>> include/linux/mtd/nand_ondie.h: In function 'nand_read_subpage_on_die':
>>> include/linux/mtd/nand_ondie.h:28:1: warning: no return statement in function returning non-void [-Wreturn-type]
>>> include/linux/mtd/nand_ondie.h: In function 'nand_read_page_on_die':
>>> include/linux/mtd/nand_ondie.h:34:1: warning: no return statement in function returning non-void [-Wreturn-type]
>>>
>>> Perhaps return an error code here, even though you'll never get past the BUG()?
>>
>> What gcc is this?
>> gcc 4.8 here does not warn, I thought it is smart enough that this function does never
>> return. Can it be that your .config has CONFIG_BUG=n?
>> Anyway, this functions clearly needs a return statement. :)
>
> gcc 4.7.2, and you are correct that I had CONFIG_BUG off. :)

Yeah, just noticed that BUG() with CONFIG_BUG=n does not have
a nonreturn attribute. So, gcc cannot know...

Thanks,
//richard

2015-04-27 22:53:14

by Brian Norris

[permalink] [raw]
Subject: Re: [PATCH 1/3] mtd: nand: Add on-die ECC support

On Tue, Apr 28, 2015 at 12:42:18AM +0200, Richard Weinberger wrote:
> Am 28.04.2015 um 00:36 schrieb Ben Shelton:
> >>> When I build this without CONFIG_MTD_NAND_ECC_ON_DIE enabled, I get the
> >>> following warning here:
> >>>
> >>> In file included from drivers/mtd/nand/nand_base.c:46:0:
> >>> include/linux/mtd/nand_ondie.h: In function 'nand_read_subpage_on_die':
> >>> include/linux/mtd/nand_ondie.h:28:1: warning: no return statement in function returning non-void [-Wreturn-type]
> >>> include/linux/mtd/nand_ondie.h: In function 'nand_read_page_on_die':
> >>> include/linux/mtd/nand_ondie.h:34:1: warning: no return statement in function returning non-void [-Wreturn-type]
> >>>
> >>> Perhaps return an error code here, even though you'll never get past the BUG()?
> >>
> >> What gcc is this?
> >> gcc 4.8 here does not warn, I thought it is smart enough that this function does never
> >> return. Can it be that your .config has CONFIG_BUG=n?
> >> Anyway, this functions clearly needs a return statement. :)
> >
> > gcc 4.7.2, and you are correct that I had CONFIG_BUG off. :)
>
> Yeah, just noticed that BUG() with CONFIG_BUG=n does not have
> a nonreturn attribute. So, gcc cannot know...

But it's an obvious infinite loop... all of my toolchains (4.2, 4.5,
4.6, 4.8) are able to compile this without complaining (gcc -Wall):

int test() { do { } while (1); }

Not sure if 4.7 is unique.

But hey, others are doing this, so why not join the crowd. e.g.:

include/asm-generic/pgtable.h-567-#ifndef __HAVE_ARCH_PMD_WRITE
include/asm-generic/pgtable.h-568-static inline int pmd_write(pmd_t pmd)
include/asm-generic/pgtable.h-569-{
include/asm-generic/pgtable.h:570: BUG();
include/asm-generic/pgtable.h-571- return 0;
include/asm-generic/pgtable.h-572-}
include/asm-generic/pgtable.h-573-#endif /* __HAVE_ARCH_PMD_WRITE */

2015-04-27 22:57:37

by Richard Weinberger

[permalink] [raw]
Subject: Re: [PATCH 1/3] mtd: nand: Add on-die ECC support

Am 28.04.2015 um 00:53 schrieb Brian Norris:
> On Tue, Apr 28, 2015 at 12:42:18AM +0200, Richard Weinberger wrote:
>> Am 28.04.2015 um 00:36 schrieb Ben Shelton:
>>>>> When I build this without CONFIG_MTD_NAND_ECC_ON_DIE enabled, I get the
>>>>> following warning here:
>>>>>
>>>>> In file included from drivers/mtd/nand/nand_base.c:46:0:
>>>>> include/linux/mtd/nand_ondie.h: In function 'nand_read_subpage_on_die':
>>>>> include/linux/mtd/nand_ondie.h:28:1: warning: no return statement in function returning non-void [-Wreturn-type]
>>>>> include/linux/mtd/nand_ondie.h: In function 'nand_read_page_on_die':
>>>>> include/linux/mtd/nand_ondie.h:34:1: warning: no return statement in function returning non-void [-Wreturn-type]
>>>>>
>>>>> Perhaps return an error code here, even though you'll never get past the BUG()?
>>>>
>>>> What gcc is this?
>>>> gcc 4.8 here does not warn, I thought it is smart enough that this function does never
>>>> return. Can it be that your .config has CONFIG_BUG=n?
>>>> Anyway, this functions clearly needs a return statement. :)
>>>
>>> gcc 4.7.2, and you are correct that I had CONFIG_BUG off. :)
>>
>> Yeah, just noticed that BUG() with CONFIG_BUG=n does not have
>> a nonreturn attribute. So, gcc cannot know...
>
> But it's an obvious infinite loop... all of my toolchains (4.2, 4.5,
> 4.6, 4.8) are able to compile this without complaining (gcc -Wall):
>
> int test() { do { } while (1); }

Not here. gcc 4.8 warns on that.
As soon I add __attribute__ ((noreturn)) it does not longer complain.

Thanks,
//richard

2015-04-27 23:10:06

by Brian Norris

[permalink] [raw]
Subject: Re: [PATCH 1/3] mtd: nand: Add on-die ECC support

On Mon, Apr 27, 2015 at 3:57 PM, Richard Weinberger <[email protected]> wrote:
> Am 28.04.2015 um 00:53 schrieb Brian Norris:
>> On Tue, Apr 28, 2015 at 12:42:18AM +0200, Richard Weinberger wrote:
>>> Am 28.04.2015 um 00:36 schrieb Ben Shelton:
>>>>>> When I build this without CONFIG_MTD_NAND_ECC_ON_DIE enabled, I get the
>>>>>> following warning here:
>>>>>>
>>>>>> In file included from drivers/mtd/nand/nand_base.c:46:0:
>>>>>> include/linux/mtd/nand_ondie.h: In function 'nand_read_subpage_on_die':
>>>>>> include/linux/mtd/nand_ondie.h:28:1: warning: no return statement in function returning non-void [-Wreturn-type]
>>>>>> include/linux/mtd/nand_ondie.h: In function 'nand_read_page_on_die':
>>>>>> include/linux/mtd/nand_ondie.h:34:1: warning: no return statement in function returning non-void [-Wreturn-type]
>>>>>>
>>>>>> Perhaps return an error code here, even though you'll never get past the BUG()?
>>>>>
>>>>> What gcc is this?
>>>>> gcc 4.8 here does not warn, I thought it is smart enough that this function does never
>>>>> return. Can it be that your .config has CONFIG_BUG=n?
>>>>> Anyway, this functions clearly needs a return statement. :)
>>>>
>>>> gcc 4.7.2, and you are correct that I had CONFIG_BUG off. :)
>>>
>>> Yeah, just noticed that BUG() with CONFIG_BUG=n does not have
>>> a nonreturn attribute. So, gcc cannot know...
>>
>> But it's an obvious infinite loop... all of my toolchains (4.2, 4.5,
>> 4.6, 4.8) are able to compile this without complaining (gcc -Wall):
>>
>> int test() { do { } while (1); }
>
> Not here. gcc 4.8 warns on that.
> As soon I add __attribute__ ((noreturn)) it does not longer complain.

Huh? Maybe I have a crazy modified gcc.

$ gcc --version
gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc -Wall -Wextra -c a.c
$ cat a.c
int test() { do {} while (1); }

But:

$ gcc -Wall -Wextra -c b.c
b.c: In function ‘test’:
b.c:1:1: warning: control reaches end of non-void function [-Wreturn-type]
int test() { }
^
$ cat b.c
int test() { }

2015-04-27 23:15:55

by Richard Weinberger

[permalink] [raw]
Subject: Re: [PATCH 1/3] mtd: nand: Add on-die ECC support

Am 28.04.2015 um 01:10 schrieb Brian Norris:
> On Mon, Apr 27, 2015 at 3:57 PM, Richard Weinberger <[email protected]> wrote:
>> Am 28.04.2015 um 00:53 schrieb Brian Norris:
>>> On Tue, Apr 28, 2015 at 12:42:18AM +0200, Richard Weinberger wrote:
>>>> Am 28.04.2015 um 00:36 schrieb Ben Shelton:
>>>>>>> When I build this without CONFIG_MTD_NAND_ECC_ON_DIE enabled, I get the
>>>>>>> following warning here:
>>>>>>>
>>>>>>> In file included from drivers/mtd/nand/nand_base.c:46:0:
>>>>>>> include/linux/mtd/nand_ondie.h: In function 'nand_read_subpage_on_die':
>>>>>>> include/linux/mtd/nand_ondie.h:28:1: warning: no return statement in function returning non-void [-Wreturn-type]
>>>>>>> include/linux/mtd/nand_ondie.h: In function 'nand_read_page_on_die':
>>>>>>> include/linux/mtd/nand_ondie.h:34:1: warning: no return statement in function returning non-void [-Wreturn-type]
>>>>>>>
>>>>>>> Perhaps return an error code here, even though you'll never get past the BUG()?
>>>>>>
>>>>>> What gcc is this?
>>>>>> gcc 4.8 here does not warn, I thought it is smart enough that this function does never
>>>>>> return. Can it be that your .config has CONFIG_BUG=n?
>>>>>> Anyway, this functions clearly needs a return statement. :)
>>>>>
>>>>> gcc 4.7.2, and you are correct that I had CONFIG_BUG off. :)
>>>>
>>>> Yeah, just noticed that BUG() with CONFIG_BUG=n does not have
>>>> a nonreturn attribute. So, gcc cannot know...
>>>
>>> But it's an obvious infinite loop... all of my toolchains (4.2, 4.5,
>>> 4.6, 4.8) are able to compile this without complaining (gcc -Wall):
>>>
>>> int test() { do { } while (1); }
>>
>> Not here. gcc 4.8 warns on that.
>> As soon I add __attribute__ ((noreturn)) it does not longer complain.
>
> Huh? Maybe I have a crazy modified gcc.
>
> $ gcc --version
> gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
> Copyright (C) 2013 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> $ gcc -Wall -Wextra -c a.c
> $ cat a.c
> int test() { do {} while (1); }

Make test static and gcc will warn.

Thanks,
//richard

2015-04-27 23:19:54

by Brian Norris

[permalink] [raw]
Subject: Re: [PATCH 1/3] mtd: nand: Add on-die ECC support

On Mon, Apr 27, 2015 at 4:15 PM, Richard Weinberger <[email protected]> wrote:
> Am 28.04.2015 um 01:10 schrieb Brian Norris:
>> On Mon, Apr 27, 2015 at 3:57 PM, Richard Weinberger <[email protected]> wrote:
>>> Am 28.04.2015 um 00:53 schrieb Brian Norris:
>>>> On Tue, Apr 28, 2015 at 12:42:18AM +0200, Richard Weinberger wrote:
>>>>> Am 28.04.2015 um 00:36 schrieb Ben Shelton:
>>>>>>>> When I build this without CONFIG_MTD_NAND_ECC_ON_DIE enabled, I get the
>>>>>>>> following warning here:
>>>>>>>>
>>>>>>>> In file included from drivers/mtd/nand/nand_base.c:46:0:
>>>>>>>> include/linux/mtd/nand_ondie.h: In function 'nand_read_subpage_on_die':
>>>>>>>> include/linux/mtd/nand_ondie.h:28:1: warning: no return statement in function returning non-void [-Wreturn-type]
>>>>>>>> include/linux/mtd/nand_ondie.h: In function 'nand_read_page_on_die':
>>>>>>>> include/linux/mtd/nand_ondie.h:34:1: warning: no return statement in function returning non-void [-Wreturn-type]
>>>>>>>>
>>>>>>>> Perhaps return an error code here, even though you'll never get past the BUG()?
>>>>>>>
>>>>>>> What gcc is this?
>>>>>>> gcc 4.8 here does not warn, I thought it is smart enough that this function does never
>>>>>>> return. Can it be that your .config has CONFIG_BUG=n?
>>>>>>> Anyway, this functions clearly needs a return statement. :)
>>>>>>
>>>>>> gcc 4.7.2, and you are correct that I had CONFIG_BUG off. :)
>>>>>
>>>>> Yeah, just noticed that BUG() with CONFIG_BUG=n does not have
>>>>> a nonreturn attribute. So, gcc cannot know...
>>>>
>>>> But it's an obvious infinite loop... all of my toolchains (4.2, 4.5,
>>>> 4.6, 4.8) are able to compile this without complaining (gcc -Wall):
>>>>
>>>> int test() { do { } while (1); }
>>>
>>> Not here. gcc 4.8 warns on that.
>>> As soon I add __attribute__ ((noreturn)) it does not longer complain.
>>
>> Huh? Maybe I have a crazy modified gcc.
>>
>> $ gcc --version
>> gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
>> Copyright (C) 2013 Free Software Foundation, Inc.
>> This is free software; see the source for copying conditions. There is NO
>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>
>> $ gcc -Wall -Wextra -c a.c
>> $ cat a.c
>> int test() { do {} while (1); }
>
> Make test static and gcc will warn.

Hmm. That's a strange distinction for gcc to make. Maybe because of
the potential for inlining? Still seems odd.

2015-04-27 23:24:05

by Brian Norris

[permalink] [raw]
Subject: Re: [PATCH 1/3] mtd: nand: Add on-die ECC support

On Tue, Apr 28, 2015 at 12:19:16AM +0200, Richard Weinberger wrote:
> Am 27.04.2015 um 23:35 schrieb Ben Shelton:
> > I tested this against the latest version of the PL353 NAND driver that Punnaiah
> > has been working to upstream (copying her on this message). With a few changes
> > to that driver, I got it most of the way through initialization with on-die ECC
> > enabled, but it segfaults here with a null pointer dereference because the
> > PL353 driver does not implement chip->cmd_ctrl. Instead, it implements a
> > custom override of cmd->cmdfunc that does not call cmd_ctrl. Looking through
> > the other in-tree NAND drivers, it looks like most of them do implement
> > cmd_ctrl, but quite a few of them do not (e.g. au1550nd, denali, docg4).
> >
> > What do you think would be the best way to handle this? It seems like this gap
> > could be bridged from either side -- either the PL353 driver could implement
> > cmd_ctrl, at least as a stub version that provides the expected behavior in
> > this case; or the on-die framework could break this out into a callback
> > function with a default implementation that the driver could override to
> > perform this behavior in the manner of its choosing.
>
> Oh, I thought every driver has to implement that function. ;-\

Nope.

> But you're right there is a corner case.

And it's not the only one! Right now, there's no guarantee even that
read_buf() returns raw data, unmodified by the SoC's controller. Plenty
of drivers actually have HW-enabled ECC turned on by default, and so
they override the chip->ecc.read_page() (and sometimes
chip->ecc.read_page_raw() functions, if we're lucky) with something
that pokes the appropriate hardware instead. I expect anything
comprehensive here is probably going to have to utilize
chip->ecc.read_page_raw(), at least if it's provided by the hardware
driver.

> What we could do is just using chip->cmdfunc() with a custom NAND command.
> i.e. chip->cmdfunc(mtd, NAND_CMD_READMODE, -1, -1);
>
> Gerhard Sittig tried to introduce such a command some time ago:
> http://lists.infradead.org/pipermail/linux-mtd/2014-April/053115.html

Yikes! Please no! It's bad enough to have a ton of drivers doing
switch/case on a bunch of real, somewhat well-known opcodes, but to add
new fake ones? I'd rather not. We're inflicting ourselves with a
kernel-internal version of ioctl(). What's the justification, again? I
don't really remember the context of Gerhard's previous patch.

Brian

Subject: Re: [PATCH 1/3] mtd: nand: Add on-die ECC support

On Tue, Apr 28, 2015 at 4:53 AM, Brian Norris
<[email protected]> wrote:
> On Tue, Apr 28, 2015 at 12:19:16AM +0200, Richard Weinberger wrote:
>> Am 27.04.2015 um 23:35 schrieb Ben Shelton:
>> > I tested this against the latest version of the PL353 NAND driver that Punnaiah
>> > has been working to upstream (copying her on this message). With a few changes
>> > to that driver, I got it most of the way through initialization with on-die ECC
>> > enabled, but it segfaults here with a null pointer dereference because the
>> > PL353 driver does not implement chip->cmd_ctrl. Instead, it implements a
>> > custom override of cmd->cmdfunc that does not call cmd_ctrl. Looking through
>> > the other in-tree NAND drivers, it looks like most of them do implement
>> > cmd_ctrl, but quite a few of them do not (e.g. au1550nd, denali, docg4).
>> >
>> > What do you think would be the best way to handle this? It seems like this gap
>> > could be bridged from either side -- either the PL353 driver could implement
>> > cmd_ctrl, at least as a stub version that provides the expected behavior in
>> > this case; or the on-die framework could break this out into a callback
>> > function with a default implementation that the driver could override to
>> > perform this behavior in the manner of its choosing.
>>
>> Oh, I thought every driver has to implement that function. ;-\
>
> Nope.
>
>> But you're right there is a corner case.
>
> And it's not the only one! Right now, there's no guarantee even that
> read_buf() returns raw data, unmodified by the SoC's controller. Plenty
> of drivers actually have HW-enabled ECC turned on by default, and so
> they override the chip->ecc.read_page() (and sometimes
> chip->ecc.read_page_raw() functions, if we're lucky) with something
> that pokes the appropriate hardware instead. I expect anything
> comprehensive here is probably going to have to utilize
> chip->ecc.read_page_raw(), at least if it's provided by the hardware
> driver.

Yes, overriding the chip->ecc.read_page_raw would solve this. Agree that
read_buf need not be returning raw data always including my new driver for
arasan nand flash controller.

http://lkml.iu.edu/hypermail/linux/kernel/1504.2/00313.html


Regards,
Punnaiah
>
>> What we could do is just using chip->cmdfunc() with a custom NAND command.
>> i.e. chip->cmdfunc(mtd, NAND_CMD_READMODE, -1, -1);
>>
>> Gerhard Sittig tried to introduce such a command some time ago:
>> http://lists.infradead.org/pipermail/linux-mtd/2014-April/053115.html
>
> Yikes! Please no! It's bad enough to have a ton of drivers doing
> switch/case on a bunch of real, somewhat well-known opcodes, but to add
> new fake ones? I'd rather not. We're inflicting ourselves with a
> kernel-internal version of ioctl(). What's the justification, again? I
> don't really remember the context of Gerhard's previous patch.
>
> Brian
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2015-04-28 03:22:20

by Brian Norris

[permalink] [raw]
Subject: Re: [PATCH 1/3] mtd: nand: Add on-die ECC support

On Tue, Apr 28, 2015 at 08:18:12AM +0530, punnaiah choudary kalluri wrote:
> On Tue, Apr 28, 2015 at 4:53 AM, Brian Norris
> <[email protected]> wrote:
> > On Tue, Apr 28, 2015 at 12:19:16AM +0200, Richard Weinberger wrote:
> >> Oh, I thought every driver has to implement that function. ;-\
> >
> > Nope.
> >
> >> But you're right there is a corner case.
> >
> > And it's not the only one! Right now, there's no guarantee even that
> > read_buf() returns raw data, unmodified by the SoC's controller. Plenty
> > of drivers actually have HW-enabled ECC turned on by default, and so
> > they override the chip->ecc.read_page() (and sometimes
> > chip->ecc.read_page_raw() functions, if we're lucky) with something
> > that pokes the appropriate hardware instead. I expect anything
> > comprehensive here is probably going to have to utilize
> > chip->ecc.read_page_raw(), at least if it's provided by the hardware
> > driver.
>
> Yes, overriding the chip->ecc.read_page_raw would solve this.

I'm actually suggesting that (in this patch set, for on-die ECC
support), maybe we *shouldn't* override chip->ecc.read_page_raw() and
leave that to be defined by the driver, and then on-die ECC support
should be added in a way that just calls chip->ecc.read_page_raw(). This
should work for any driver that already properly supports the raw
callbacks.

> Agree that
> read_buf need not be returning raw data always including my new driver for
> arasan nand flash controller.

I agree with that. At the moment, chip->read_buf() really has very
driver-specific meaning. Not sure if that's really a good thing, but
it's the way things are...

> http://lkml.iu.edu/hypermail/linux/kernel/1504.2/00313.html

In the half a minute I just spent looking at this (I may review it
properly later), I noted a few things:

1. you don't implement ecc.read_page_raw(); this means we'll probably
have trouble supporting on-die ECC with your driver, among other things

2. your patch is all white-space mangled. Please use your favorite
search engine to figure out how to get that right. git-send-email is
your friend.

Thanks,
Brian

Subject: Re: [PATCH 1/3] mtd: nand: Add on-die ECC support

On Tue, Apr 28, 2015 at 8:52 AM, Brian Norris
<[email protected]> wrote:
> On Tue, Apr 28, 2015 at 08:18:12AM +0530, punnaiah choudary kalluri wrote:
>> On Tue, Apr 28, 2015 at 4:53 AM, Brian Norris
>> <[email protected]> wrote:
>> > On Tue, Apr 28, 2015 at 12:19:16AM +0200, Richard Weinberger wrote:
>> >> Oh, I thought every driver has to implement that function. ;-\
>> >
>> > Nope.
>> >
>> >> But you're right there is a corner case.
>> >
>> > And it's not the only one! Right now, there's no guarantee even that
>> > read_buf() returns raw data, unmodified by the SoC's controller. Plenty
>> > of drivers actually have HW-enabled ECC turned on by default, and so
>> > they override the chip->ecc.read_page() (and sometimes
>> > chip->ecc.read_page_raw() functions, if we're lucky) with something
>> > that pokes the appropriate hardware instead. I expect anything
>> > comprehensive here is probably going to have to utilize
>> > chip->ecc.read_page_raw(), at least if it's provided by the hardware
>> > driver.
>>
>> Yes, overriding the chip->ecc.read_page_raw would solve this.
>
> I'm actually suggesting that (in this patch set, for on-die ECC
> support), maybe we *shouldn't* override chip->ecc.read_page_raw() and
> leave that to be defined by the driver, and then on-die ECC support
> should be added in a way that just calls chip->ecc.read_page_raw(). This
> should work for any driver that already properly supports the raw
> callbacks.

Ok. Understood.

>
>> Agree that
>> read_buf need not be returning raw data always including my new driver for
>> arasan nand flash controller.
>
> I agree with that. At the moment, chip->read_buf() really has very
> driver-specific meaning. Not sure if that's really a good thing, but
> it's the way things are...
>
>> http://lkml.iu.edu/hypermail/linux/kernel/1504.2/00313.html
>
> In the half a minute I just spent looking at this (I may review it
> properly later), I noted a few things:
>
> 1. you don't implement ecc.read_page_raw(); this means we'll probably
> have trouble supporting on-die ECC with your driver, among other things

On-die ECC is optional as long as the controller has better ecc coverage.
The arasan controller supports up to 24 bit ecc. There is no point to use
on-die ECC and will always use hw ecc even for On-die ecc devices.
This version of driver will not have the support for ecc.read_page_raw but
I will add based on the need in future.


>
> 2. your patch is all white-space mangled. Please use your favorite
> search engine to figure out how to get that right. git-send-email is
> your friend.

Oh sorry. Looks that was the web link issue. Here is the new one.
https://lkml.org/lkml/2015/4/16/311

Also request your time for reviewing this driver.

Thanks,
Punnaiah

>
> Thanks,
> Brian
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2015-04-28 14:04:19

by Josh Cartwright

[permalink] [raw]
Subject: Re: [PATCH 1/3] mtd: nand: Add on-die ECC support

On Tue, Apr 28, 2015 at 09:14:26AM +0530, punnaiah choudary kalluri wrote:
> On Tue, Apr 28, 2015 at 8:52 AM, Brian Norris <[email protected]> wrote:
> > On Tue, Apr 28, 2015 at 08:18:12AM +0530, punnaiah choudary kalluri wrote:
> >> On Tue, Apr 28, 2015 at 4:53 AM, Brian Norris <[email protected]> wrote:
[..]
> >> Agree that read_buf need not be returning raw data always including
> >> my new driver for arasan nand flash controller.
> >
> > I agree with that. At the moment, chip->read_buf() really has very
> > driver-specific meaning. Not sure if that's really a good thing, but
> > it's the way things are...
> >
> >> http://lkml.iu.edu/hypermail/linux/kernel/1504.2/00313.html
> >
> > In the half a minute I just spent looking at this (I may review it
> > properly later), I noted a few things:
> >
> > 1. you don't implement ecc.read_page_raw(); this means we'll probably
> > have trouble supporting on-die ECC with your driver, among other things
>
> On-die ECC is optional as long as the controller has better ecc coverage.
> The arasan controller supports up to 24 bit ecc. There is no point to use
> on-die ECC and will always use hw ecc even for On-die ecc devices.

Maybe this is true of the controller instantiated in the Zynq MPSoC
chips, but the Zynq 7000 series SMC is documented to support only 1-bit
ECC.

(Which is why we're looking at your pl353 driver and this set).

Josh


Attachments:
(No filename) (1.39 kB)
(No filename) (473.00 B)
Download all attachments
Subject: Re: [PATCH 1/3] mtd: nand: Add on-die ECC support

On Tue, Apr 28, 2015 at 7:33 PM, Josh Cartwright <[email protected]> wrote:
> On Tue, Apr 28, 2015 at 09:14:26AM +0530, punnaiah choudary kalluri wrote:
>> On Tue, Apr 28, 2015 at 8:52 AM, Brian Norris <[email protected]> wrote:
>> > On Tue, Apr 28, 2015 at 08:18:12AM +0530, punnaiah choudary kalluri wrote:
>> >> On Tue, Apr 28, 2015 at 4:53 AM, Brian Norris <[email protected]> wrote:
> [..]
>> >> Agree that read_buf need not be returning raw data always including
>> >> my new driver for arasan nand flash controller.
>> >
>> > I agree with that. At the moment, chip->read_buf() really has very
>> > driver-specific meaning. Not sure if that's really a good thing, but
>> > it's the way things are...
>> >
>> >> http://lkml.iu.edu/hypermail/linux/kernel/1504.2/00313.html
>> >
>> > In the half a minute I just spent looking at this (I may review it
>> > properly later), I noted a few things:
>> >
>> > 1. you don't implement ecc.read_page_raw(); this means we'll probably
>> > have trouble supporting on-die ECC with your driver, among other things
>>
>> On-die ECC is optional as long as the controller has better ecc coverage.
>> The arasan controller supports up to 24 bit ecc. There is no point to use
>> on-die ECC and will always use hw ecc even for On-die ecc devices.
>
> Maybe this is true of the controller instantiated in the Zynq MPSoC
> chips, but the Zynq 7000 series SMC is documented to support only 1-bit
> ECC.

True. SMC controller in zynq supports only 1 -bit ECC. One can use
on-die ECC devices
and turn of the controller ECC for better ECC coverage. I am also
exploring the options
to use these patches with the pl353 driver in a generic way and
acceptable to all.

Regards,
Punnaiah
>
> (Which is why we're looking at your pl353 driver and this set).
>
> Josh

2015-05-08 21:27:10

by Ben Shelton

[permalink] [raw]
Subject: Re: [PATCH 1/3] mtd: nand: Add on-die ECC support

On 04/27, Brian Norris wrote:
> On Tue, Apr 28, 2015 at 08:18:12AM +0530, punnaiah choudary kalluri wrote:
> > On Tue, Apr 28, 2015 at 4:53 AM, Brian Norris
> > <[email protected]> wrote:
> > > On Tue, Apr 28, 2015 at 12:19:16AM +0200, Richard Weinberger wrote:
> > >> Oh, I thought every driver has to implement that function. ;-\
> > >
> > > Nope.
> > >
> > >> But you're right there is a corner case.
> > >
> > > And it's not the only one! Right now, there's no guarantee even that
> > > read_buf() returns raw data, unmodified by the SoC's controller. Plenty
> > > of drivers actually have HW-enabled ECC turned on by default, and so
> > > they override the chip->ecc.read_page() (and sometimes
> > > chip->ecc.read_page_raw() functions, if we're lucky) with something
> > > that pokes the appropriate hardware instead. I expect anything
> > > comprehensive here is probably going to have to utilize
> > > chip->ecc.read_page_raw(), at least if it's provided by the hardware
> > > driver.
> >
> > Yes, overriding the chip->ecc.read_page_raw would solve this.
>
> I'm actually suggesting that (in this patch set, for on-die ECC
> support), maybe we *shouldn't* override chip->ecc.read_page_raw() and
> leave that to be defined by the driver, and then on-die ECC support
> should be added in a way that just calls chip->ecc.read_page_raw(). This
> should work for any driver that already properly supports the raw
> callbacks.
>

Hi Richard et al,

I'm guessing it's probably too late for the on-die ECC stuff to land in
4.2 at this point. Is there anything I can do to help this along
(testing, etc.)?

Thanks,
Ben

2015-05-08 21:39:57

by Brian Norris

[permalink] [raw]
Subject: Re: [PATCH 1/3] mtd: nand: Add on-die ECC support

On Fri, May 08, 2015 at 04:26:32PM -0500, Ben Shelton wrote:
> On 04/27, Brian Norris wrote:
> > On Tue, Apr 28, 2015 at 08:18:12AM +0530, punnaiah choudary kalluri wrote:
> > > On Tue, Apr 28, 2015 at 4:53 AM, Brian Norris
> > > <[email protected]> wrote:
> > > > On Tue, Apr 28, 2015 at 12:19:16AM +0200, Richard Weinberger wrote:
> > > >> Oh, I thought every driver has to implement that function. ;-\
> > > >
> > > > Nope.
> > > >
> > > >> But you're right there is a corner case.
> > > >
> > > > And it's not the only one! Right now, there's no guarantee even that
> > > > read_buf() returns raw data, unmodified by the SoC's controller. Plenty
> > > > of drivers actually have HW-enabled ECC turned on by default, and so
> > > > they override the chip->ecc.read_page() (and sometimes
> > > > chip->ecc.read_page_raw() functions, if we're lucky) with something
> > > > that pokes the appropriate hardware instead. I expect anything
> > > > comprehensive here is probably going to have to utilize
> > > > chip->ecc.read_page_raw(), at least if it's provided by the hardware
> > > > driver.
> > >
> > > Yes, overriding the chip->ecc.read_page_raw would solve this.
> >
> > I'm actually suggesting that (in this patch set, for on-die ECC
> > support), maybe we *shouldn't* override chip->ecc.read_page_raw() and
> > leave that to be defined by the driver, and then on-die ECC support
> > should be added in a way that just calls chip->ecc.read_page_raw(). This
> > should work for any driver that already properly supports the raw
> > callbacks.
> >
>
> Hi Richard et al,
>
> I'm guessing it's probably too late for the on-die ECC stuff to land in
> 4.2 at this point.

Not technically. We've got several weeks (approx 5 to 6?) before 4.1 is
released. 4.2 material should be getting finalized by a week or so
before the merge window (i.e., 4 to 5 weeks from now).

> Is there anything I can do to help this along
> (testing, etc.)?

This is going to need to get rewritten. I'm not sure if Richard is going
to tackle this again, as he hasn't responded to the points I brought up.
(Note that Richard is not the first to have tried to implement this,
without initial success.)

Brian

2015-05-08 21:43:39

by Richard Weinberger

[permalink] [raw]
Subject: Re: [PATCH 1/3] mtd: nand: Add on-die ECC support

Am 08.05.2015 um 23:39 schrieb Brian Norris:
> On Fri, May 08, 2015 at 04:26:32PM -0500, Ben Shelton wrote:
>> On 04/27, Brian Norris wrote:
>>> On Tue, Apr 28, 2015 at 08:18:12AM +0530, punnaiah choudary kalluri wrote:
>>>> On Tue, Apr 28, 2015 at 4:53 AM, Brian Norris
>>>> <[email protected]> wrote:
>>>>> On Tue, Apr 28, 2015 at 12:19:16AM +0200, Richard Weinberger wrote:
>>>>>> Oh, I thought every driver has to implement that function. ;-\
>>>>>
>>>>> Nope.
>>>>>
>>>>>> But you're right there is a corner case.
>>>>>
>>>>> And it's not the only one! Right now, there's no guarantee even that
>>>>> read_buf() returns raw data, unmodified by the SoC's controller. Plenty
>>>>> of drivers actually have HW-enabled ECC turned on by default, and so
>>>>> they override the chip->ecc.read_page() (and sometimes
>>>>> chip->ecc.read_page_raw() functions, if we're lucky) with something
>>>>> that pokes the appropriate hardware instead. I expect anything
>>>>> comprehensive here is probably going to have to utilize
>>>>> chip->ecc.read_page_raw(), at least if it's provided by the hardware
>>>>> driver.
>>>>
>>>> Yes, overriding the chip->ecc.read_page_raw would solve this.
>>>
>>> I'm actually suggesting that (in this patch set, for on-die ECC
>>> support), maybe we *shouldn't* override chip->ecc.read_page_raw() and
>>> leave that to be defined by the driver, and then on-die ECC support
>>> should be added in a way that just calls chip->ecc.read_page_raw(). This
>>> should work for any driver that already properly supports the raw
>>> callbacks.
>>>
>>
>> Hi Richard et al,
>>
>> I'm guessing it's probably too late for the on-die ECC stuff to land in
>> 4.2 at this point.
>
> Not technically. We've got several weeks (approx 5 to 6?) before 4.1 is
> released. 4.2 material should be getting finalized by a week or so
> before the merge window (i.e., 4 to 5 weeks from now).
>
>> Is there anything I can do to help this along
>> (testing, etc.)?
>
> This is going to need to get rewritten. I'm not sure if Richard is going
> to tackle this again, as he hasn't responded to the points I brought up.
> (Note that Richard is not the first to have tried to implement this,
> without initial success.)

I'm definitely willing to take the challenge.
But as I'm currently very busy with non-MTD stuff I had no time
to address your comments.

Thanks,
//richard