In do_write_buffer(), the polling for loop can hit a case where
chip_ready() returns 1 while chip_good() returns 0, so the loop never
breaks.
To fix this, checking chip_good() alone is enough, and the loop should
time out if the chip stays bad for a while.
Fixes: dfeae1073583(mtd: cfi_cmdset_0002: Change write buffer to check
correct value)
Signed-off-by: Yi Huaijie <[email protected]>
Signed-off-by: Liu Jian <[email protected]>
---
drivers/mtd/chips/cfi_cmdset_0002.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c
index 72428b6..818e94b 100644
--- a/drivers/mtd/chips/cfi_cmdset_0002.c
+++ b/drivers/mtd/chips/cfi_cmdset_0002.c
@@ -1876,14 +1876,14 @@ static int __xipram do_write_buffer(struct map_info *map, struct flchip *chip,
 			continue;
 		}
 
-		if (time_after(jiffies, timeo) && !chip_ready(map, adr))
-			break;
-
 		if (chip_good(map, adr, datum)) {
 			xip_enable(map, chip, adr);
 			goto op_done;
 		}
 
+		if (time_after(jiffies, timeo))
+			break;
+
 		/* Latency issues. Drop the lock, wait a while and retry */
 		UDELAY(map, chip, adr, 1);
 	}
--
2.7.4
+Przemyslaw
On Fri, 1 Feb 2019 07:30:39 +0800
Liu Jian <[email protected]> wrote:
> In function do_write_buffer(), in the for loop, there is a case
> chip_ready() returns 1 while chip_good() returns 0, so it never
> break the loop.
> To fix this, chip_good() is enough and it should timeout if it stay
> bad for a while.
Looks like Przemyslaw reported and fixed the same problem [1].
>
> Fixes: dfeae1073583(mtd: cfi_cmdset_0002: Change write buffer to check
> correct value)
Can you put the Fixes tag on a single line? The format is
Fixes: <hash> ("message")
> Signed-off-by: Yi Huaijie <[email protected]>
> Signed-off-by: Liu Jian <[email protected]>
[1]http://patchwork.ozlabs.org/patch/1025566/
> ---
> drivers/mtd/chips/cfi_cmdset_0002.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c
> index 72428b6..818e94b 100644
> --- a/drivers/mtd/chips/cfi_cmdset_0002.c
> +++ b/drivers/mtd/chips/cfi_cmdset_0002.c
> @@ -1876,14 +1876,14 @@ static int __xipram do_write_buffer(struct map_info *map, struct flchip *chip,
> continue;
> }
>
> - if (time_after(jiffies, timeo) && !chip_ready(map, adr))
> - break;
> -
> if (chip_good(map, adr, datum)) {
> xip_enable(map, chip, adr);
> goto op_done;
> }
>
> + if (time_after(jiffies, timeo))
> + break;
> +
> /* Latency issues. Drop the lock, wait a while and retry */
> UDELAY(map, chip, adr, 1);
> }
BTW, the patch itself looks good to me. Ikegami, can you confirm it
does the right thing?
Thanks,
Boris
> From: Boris Brezillon <[email protected]>
> Sent: Sunday, February 3, 2019 12:35 AM
>
> BTW, the patch itself looks good to me. Ikegami, can you confirm it does the right thing?
>
> Thanks,
>
> Boris
>
One comment on this patch: if a value is written incorrectly early on, we stay
stuck in the loop even though nothing is going to change. For example, if a value
was written incorrectly after 1us and the loop timeout is 1ms, the function only
returns after 1ms, so this solution is not optimized for performance. I considered
the same approach when working on this change and decided to do it a different way.
Regards,
Przemek
Hi Sobon,
On Tue, 5 Feb 2019 22:28:44 +0000
"Sobon, Przemyslaw" <[email protected]> wrote:
>
> One comment to this patch. If value is written incorrectly quickly we will be
> stuck in the loop even though nothing is going to change. For example a value was
> written incorrectly after 1us, the loop was set to 1ms, function will return
> after 1ms, this solution is not optimized for performance. I considered same
> when working on this change and decided to do it different way.
Seems like you're right if we assume that checking for GOOD state does
not require a delay after the READY check, but if that's not the case
and an extra delay is actually required, you might end up with a BAD
status while it could have turned GOOD at some point with the 'check
only for GOOD state until we timeout' approach.
TBH, I don't know how CFI flashes work, so I'll let you guys sort this
out.
Regards,
Boris
Hi Ikegami,
I have seen a case myself where a value was written and the chip changed
state to "ready", but the value I read back was incorrect.
This can happen as a result of an intermittent issue with the flash. It is
hard to hit this scenario when testing on a limited number of devices,
but with a large enough population you can see it. Another situation
is when a flash chip reaches its maximum number of writes. For
example, say a chip is designed for 100k writes to a page. Once you
reach that number of writes, invalid data can be written to
flash while the chip itself reports that everything was good and switches
to the "ready" state.
Hope this explanation is clear. Please let me know.
Regards,
Przemek
> -----Original Message-----
> From: [email protected] <[email protected]>
> Sent: Thursday, February 7, 2019 3:00 PM
>
> Hi Przemek-san,
>
> Could you please explain in detail the case where the value is written incorrectly?
> I think the value is always written correctly unless there is a bug.
>
> Regards,
> Ikegami
>
On Thu, 2019-02-07 at 23:50 +0000, Sobon, Przemyslaw wrote:
> Hi Ikegami,
>
> I have seen a case myself where a value was written, chip changed
> state to "ready" but when I was reading the value was incorrect.
> This can happen as result of intermittent issue with flash. It is
> hard to fall into scenario when testing on limited number of devices
> but with large enough population you can see that. Another situation
> is when a flash chip reaches its maximum number of writes. So for
> example a chip is designed for 100k writes to a page. Once you
> reach that number of writes you can have invalid data written to
> flash but chip itself reports everything was good and switches to
> "ready" state.
This makes perfect sense, but the AMD flash control I/F does not. You will
find that trying to do advanced things with "toggle" bits is very hard,
especially when you also need to scale it to interleaved flashes.
I think the odd delay when flash fails is quite OK. If you want to
fix this you need to move to the other control I/F (which mimics what Intel has).
Jocke
Hi Przemek-san,
Thank you so much for your explanation.
> I have seen a case myself where a value was written, chip changed
> state to "ready" but when I was reading the value was incorrect.
I also know of similar issues for both buffer write and word write.
In both cases the write error behavior could be reproduced.
Note: the word write issue can also be reproduced now.
Those were resolved by using chip_good() to check the state instead.
> This can happen as result of intermittent issue with flash. It is
> hard to fall into scenario when testing on limited number of devices
> but with large enough population you can see that.
If possible, I would also like to know the details of the issue and its cause.
> Another situation
> is when a flash chip reaches its maximum number of writes. So for
> example a chip is designed for 100k writes to a page. Once you
> reach that number of writes you can have invalid data written to
> flash but chip itself reports everything was good and switches to
> "ready" state.
Yes I see.
Regards,
Ikegami
Best Regards,
liujian
> -----Original Message-----
> From: Tokunori Ikegami [mailto:[email protected]]
> Sent: Friday, February 08, 2019 10:24 PM
> To: 'Sobon, Przemyslaw' <[email protected]>; 'Boris Brezillon'
> <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; liujian (CE) <[email protected]>;
> [email protected]
> Subject: RE: Re: [PATCH] cfi: fix deadloop in cfi_cmdset_0002.c
> do_write_buffer
>
So, do I need to send a v2 patch, or should we use Przemyslaw's new patch?
http://patchwork.ozlabs.org/patch/1038395/
> > > > > > > > ---
> > > > > > > > drivers/mtd/chips/cfi_cmdset_0002.c | 6 +++---
> > > > > > > > 1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c
> > > > > > > > b/drivers/mtd/chips/cfi_cmdset_0002.c
> > > > > > > > index 72428b6..818e94b 100644
> > > > > > > > --- a/drivers/mtd/chips/cfi_cmdset_0002.c
> > > > > > > > +++ b/drivers/mtd/chips/cfi_cmdset_0002.c
> > > > > > > > @@ -1876,14 +1876,14 @@ static int __xipram
> > do_write_buffer(struct map_info *map, struct flchip *chip,
> > > > > > > > continue;
> > > > > > > > }
> > > > > > > >
> > > > > > > > - if (time_after(jiffies, timeo) && !chip_ready(map,
> > adr))
> > > > > > > > - break;
> > > > > > > > -
> > > > > > > > if (chip_good(map, adr, datum)) {
> > > > > > > > xip_enable(map, chip, adr);
> > > > > > > > goto op_done;
> > > > > > > > }
> > > > > > > >
> > > > > > > > + if (time_after(jiffies, timeo))
> > > > > > > > + break;
> > > > > > > > +
> > > > > > > > /* Latency issues. Drop the lock, wait a while
> > > > > > > > and
> > retry */
> > > > > > > > UDELAY(map, chip, adr, 1);
> > > > > > > > }
> > > > > > >
> > > > > >
> > > > > > BTW, the patch itself looks good to me. Ikegami, can you
> > > > > > confirm
> > it does the right thing?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Boris
> > > > > >
> > > > >
> > > > > One comment to this patch. If value is written incorrectly
> > > > > quickly we will be stuck in the loop even though nothing is going to
> change.
> > > > > For example a value was written incorrectly after 1us, the loop
> > > > > was set to 1ms, function will return after 1ms, this solution is
> > > > > not optimized for performance. I considered same when working on
> > > > > this
> > change and decided to do it different way.
> > > >
> > > > Seems like you're right if we assume that checking for GOOD state
> > > > does not require a delay after the READY check, but if that's not
> > > > the case and an extra delay is actually required, you might end up
> > > > with a BAD status while it could have turned GOOD at some point
> > > > with the 'check only for GOOD state until we timeout' approach.
> > > >
> > > > TBH, I don't know how CFI flashes work, so I'll let you guys sort
> > > > this out.
> > > >
> > > > Regards,
> > > >
> > > > Boris
> > > >
> > > > ______________________________________________________
> > > > Linux MTD discussion mailing list
> > > > http://lists.infradead.org/mailman/listinfo/linux-mtd/
> > > >
> > >
> > >
> > ______________________________________________________
> > Linux MTD discussion mailing list
> > http://lists.infradead.org/mailman/listinfo/linux-mtd/