2023-09-23 01:10:13

by ZhaoLong Wang

Subject: [RFC] mtd: Fix error code loss in mtdchar_read() function.

In the first while loop, if the mtd_read() function returns -EBADMSG
and 'retlen' returns 0, the loop break and the function returns value
'total_retlen' is 0, not the error code.

This problem causes the user-space program to encounter EOF when it has
not finished reading the mtd partition, and this also violates the read
system call standard in POSIX.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217939
Signed-off-by: ZhaoLong Wang <[email protected]>
---
drivers/mtd/mtdchar.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mtd/mtdchar.c b/drivers/mtd/mtdchar.c
index 8dc4f5c493fc..ba60dc6bef98 100644
--- a/drivers/mtd/mtdchar.c
+++ b/drivers/mtd/mtdchar.c
@@ -211,7 +211,7 @@ static ssize_t mtdchar_read(struct file *file, char __user *buf, size_t count,
}

kfree(kbuf);
- return total_retlen;
+ return total_retlen ? total_retlen : ret;
} /* mtdchar_read */

static ssize_t mtdchar_write(struct file *file, const char __user *buf, size_t count,
--
2.31.1
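The loop behavior described in the patch above can be modeled in user space. This sketch assumes, based on our reading of the current mtdchar_read(), that data is kept when a read succeeds or returns -EUCLEAN or -EBADMSG, that a read yielding no data ends the loop, and that any other error is returned at once. It is not the kernel code: copy_to_user(), the bounce buffer and the file offset are left out, and struct read_step and model_mtdchar_read() are hypothetical names. It assumes Linux headers, since EUCLEAN is Linux-specific.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical result of one mtd_read() call in the model below. */
struct read_step {
	size_t retlen;	/* bytes the call reported through *retlen */
	int ret;	/* 0 or a negative errno */
};

/* Same test as the kernel helper mtd_is_bitflip_or_eccerr(). */
static int is_bitflip_or_eccerr(int err)
{
	return err == -EUCLEAN || err == -EBADMSG;
}

/*
 * Simplified model of the mtdchar_read() loop: data is kept on success,
 * on -EUCLEAN and on -EBADMSG; a call that yields no data ends the loop;
 * any other error is returned at once. 'patched' selects the RFC's
 * return statement instead of the current one.
 */
static ssize_t model_mtdchar_read(const struct read_step *steps, size_t nsteps,
				  size_t count, int patched)
{
	size_t total_retlen = 0, i = 0;
	int ret = 0;

	while (count > 0 && i < nsteps) {
		size_t retlen = steps[i].retlen;

		ret = steps[i++].ret;
		if (ret == 0 || is_bitflip_or_eccerr(ret)) {
			assert(retlen <= count);
			total_retlen += retlen;
			count -= retlen;
			if (retlen == 0)
				count = 0;	/* no progress: stop reading */
		} else {
			return ret;
		}
	}
	if (patched)
		return total_retlen ? (ssize_t)total_retlen : (ssize_t)ret;
	return (ssize_t)total_retlen;
}
```

With a single step of { 0, -EBADMSG }, the current return statement yields 0, which a reader takes as end of file, while the RFC's yields -EBADMSG. Once some data has been read, both return the byte count.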


2023-09-25 09:26:53

by Richard Weinberger

Subject: Re: [RFC] mtd: Fix error code loss in mtdchar_read() function.

----- Original Message -----
>> 'total_retlen' is 0, not the error code.
>
> Actually after looking at the code, I have no strong opinion
> regarding whether we should return 0 or an error code in this case.
>
> There is this comment right above, and I'm not sure it is still up to
> date because I believe many drivers just don't provide the data upon
> ECC error:
>
> /* Nand returns -EBADMSG on ECC errors, but it returns
>  * the data. For our userspace tools it is important
>  * to dump areas with ECC errors!
>  * For kernel internal usage it also might return -EUCLEAN
>  * to signal the caller that a bitflip has occurred and has
>  * been corrected by the ECC algorithm.
>  * Userspace software which accesses NAND this way
>  * must be aware of the fact that it deals with NAND
>  */
>
>> This problem causes the user-space program to encounter EOF when it has
> >> not finished reading the mtd partition, and this also violates the read
>> system call standard in POSIX.

This is a special purpose device file and not a regular file.
Please explain in detail why this violates POSIX and which program breaks.

As pointed out by Miquel, the comment makes it clear that this behavior is
on purpose. If we now suddenly return -EBADMSG for the described
scenario, we might even break existing MTD userspace.

Thanks,
//richard

2023-09-25 09:55:08

by Richard Weinberger

Subject: Re: [RFC] mtd: Fix error code loss in mtdchar_read() function.

----- Original Message -----
> From: "Miquel Raynal" <[email protected]>
>> As pointed out by Miquel, the comment makes it clear that this behavior is
>> on purpose. If we now suddenly return -EBADMSG for the described
>> scenario, we might even break existing MTD userspace.
>
> The bugzilla link in the commit log [1] mentions:

Oops.

> * dd would just stop in the middle without showing errors
> -> we probably don't care, we expect the userspace to know this is
> NAND when dealing with mtd devices directly, dd is not mtd-aware
> anyway.

Yep. That's fine.

> * ubiformat would loop forever
> -> that one needs attention I guess :)

Hmm. Let me check the source.

Thanks,
//richard

2023-09-25 14:10:54

by Richard Weinberger

Subject: Re: [RFC] mtd: Fix error code loss in mtdchar_read() function.

----- Original Message -----
> From: "ZhaoLong Wang" <[email protected]>
> To: "Miquel Raynal" <[email protected]>, "richard" <[email protected]>, "Vignesh Raghavendra" <[email protected]>
> CC: "linux-mtd" <[email protected]>, "linux-kernel" <[email protected]>, "chengzhihao1"
> <[email protected]>, "ZhaoLong Wang" <[email protected]>, "yi zhang" <[email protected]>, "yangerkun"
> <[email protected]>
> Sent: Saturday, 23 September 2023 02:58:56
> Subject: [RFC] mtd: Fix error code loss in mtdchar_read() function.

> In the first while loop, if the mtd_read() function returns -EBADMSG
> and 'retlen' returns 0, the loop break and the function returns value
> 'total_retlen' is 0, not the error code.

Gave this a second thought. I don't think a NAND driver is allowed to return
fewer than the requested bytes and set EBADMSG.
UBI's IO path has a comment on that:

	/*
	 * The driver should never return -EBADMSG if it failed to read
	 * all the requested data. But some buggy drivers might do
	 * this, so we change it to -EIO.
	 */
	if (read != len && mtd_is_eccerr(err)) {
		ubi_assert(0);
		err = -EIO;
	}

Thanks,
//richard

2023-09-25 14:34:50

by ZhaoLong Wang

Subject: Re: [RFC] mtd: Fix error code loss in mtdchar_read() function.

> There is this comment right above, and I'm not sure it is still up to
> date because I believe many drivers just don't provide the data upon
> ECC error:

After going through the nand_base framework code, I think the current
nand_base framework can limit retlen to 0 when an ECC error occurs. The
prerequisite is that NAND driver developers correctly return the values
that the chip->ecc.read_page() callback is required to return.

However, the read_page() callback comment does not point out the special
meaning of the following two error codes:

* -EUCLEAN - Returned by the MTD layer when maxbitflips is greater than or
  equal to bitflip_threshold.
* -EBADMSG - Returned by the NAND generic layer when the ECC failure
  statistics change and the number of read retries is exhausted.

These two error codes are handled by the upper layers and should not be
returned by NAND drivers, but some driver developers don't realize this.

So I don't think this is worth fixing right now. But is the description of
the callback's return value too simplistic? Is there a more detailed
document for reference?
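The -EUCLEAN path ZhaoLong describes can be summarized in a short model. It assumes, based on our reading of mtd_read_oob() in drivers/mtd/mtdcore.c, that a negative driver or NAND-core result (such as -EBADMSG) is passed through unchanged, that a device without ECC reports nothing, and that a non-negative max-bitflips count at or above bitflip_threshold becomes -EUCLEAN. map_read_result() is a hypothetical name, not a kernel function; EUCLEAN requires Linux headers.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model of how the MTD core turns a driver's non-negative
 * "max bitflips" result into what mtd_read() callers see. The parameter
 * names mirror struct mtd_info fields; this is not the kernel function.
 */
static int map_read_result(int ret_code, unsigned int ecc_strength,
			   unsigned int bitflip_threshold)
{
	if (ret_code < 0)
		return ret_code;	/* e.g. -EBADMSG from the NAND core */
	if (ecc_strength == 0)
		return 0;		/* device without ECC: nothing to report */
	return (unsigned int)ret_code >= bitflip_threshold ? -EUCLEAN : 0;
}
```

The point of the split is that a driver only reports how many bitflips it corrected (or a genuine failure); deciding whether that count deserves -EUCLEAN is left to the MTD layer.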

2023-09-25 16:00:07

by Richard Weinberger

Subject: Re: [RFC] mtd: Fix error code loss in mtdchar_read() function.

----- Original Message -----
> From: "Miquel Raynal" <[email protected]>
>> Gave this a second thought. I don't think a NAND driver is allowed to return
>> fewer than the requested bytes and set EBADMSG.
>> UBI's IO path has a comment on that:
>>
>> 	/*
>> 	 * The driver should never return -EBADMSG if it failed to read
>> 	 * all the requested data. But some buggy drivers might do
>> 	 * this, so we change it to -EIO.
>> 	 */
>> 	if (read != len && mtd_is_eccerr(err)) {
>> 		ubi_assert(0);
>> 		err = -EIO;
>> 	}
>
> Interesting. Shall we add this check to the mtd_read() path as well?
>
> Maybe with a WARN_ON()?

WARN_ON_ONCE(), please. But yes, let's add it.
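A minimal sketch of what the agreed check could look like, assuming it mirrors UBI's logic (a short read reported as -EBADMSG becomes -EIO) and that WARN_ON_ONCE() only reports the first occurrence. check_mtd_read_result() and warnings_printed are hypothetical names, and since WARN_ON_ONCE() does not exist in user space, its "once" behavior is emulated with a static flag.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

static unsigned int warnings_printed;	/* lets a test observe the "once" */

/*
 * Hypothetical post-processing for mtd_read(), combining UBI's check with
 * a WARN_ON_ONCE()-style report: the first offending call logs a warning,
 * later ones are silently converted.
 */
static int check_mtd_read_result(size_t retlen, size_t len, int ret)
{
	static int warned;

	assert(retlen <= len);	/* a driver never returns more than asked */
	if (retlen != len && ret == -EBADMSG) {
		if (!warned) {
			warned = 1;
			warnings_printed++;
			fprintf(stderr,
				"WARNING: short read reported as -EBADMSG (%zu/%zu)\n",
				retlen, len);
		}
		ret = -EIO;
	}
	return ret;
}
```

WARN_ON_ONCE() rather than WARN_ON() keeps a misbehaving driver from flooding the log when the same faulty path is hit on every read.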

Thanks,
//richard

2023-09-25 17:48:22

by Miquel Raynal

Subject: Re: [RFC] mtd: Fix error code loss in mtdchar_read() function.

Hi Richard,

[email protected] wrote on Mon, 25 Sep 2023 11:14:40 +0200 (CEST):

> ----- Original Message -----
> >> 'total_retlen' is 0, not the error code.
> >
> > Actually after looking at the code, I have no strong opinion
> > regarding whether we should return 0 or an error code in this case.
> >
> > There is this comment right above, and I'm not sure it is still up to
> > date because I believe many drivers just don't provide the data upon
> > ECC error:
> >
> > /* Nand returns -EBADMSG on ECC errors, but it returns
> >  * the data. For our userspace tools it is important
> >  * to dump areas with ECC errors!
> >  * For kernel internal usage it also might return -EUCLEAN
> >  * to signal the caller that a bitflip has occurred and has
> >  * been corrected by the ECC algorithm.
> >  * Userspace software which accesses NAND this way
> >  * must be aware of the fact that it deals with NAND
> >  */
> >
> >> This problem causes the user-space program to encounter EOF when it has
> >> not finished reading the mtd partition, and this also violates the read
> >> system call standard in POSIX.
>
> This is a special purpose device file and not a regular file.
> Please explain in detail why this violates POSIX and which program breaks.
>
> As pointed out by Miquel, the comment makes it clear that this behavior is
> on purpose. If we now suddenly return -EBADMSG for the described
> scenario, we might even break existing MTD userspace.

The bugzilla link in the commit log [1] mentions:

* dd would just stop in the middle without showing errors
-> we probably don't care, we expect the userspace to know this is
NAND when dealing with mtd devices directly, dd is not mtd-aware
anyway.

* ubiformat would loop forever
-> that one needs attention I guess :)

[1] https://bugzilla.kernel.org/show_bug.cgi?id=217939
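Miquèl's first point follows from read()'s convention that a return value of 0 means end of file. A small user-space model shows why a dd-like reader stops silently under the current behavior but sees an error under the RFC. fake_dev, fake_read() and copy_all() are hypothetical, the loop is not dd's actual implementation, and the read source returns a negative errno in kernel style rather than -1 with errno set, as libc's read() would.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical read source standing in for read(2) on /dev/mtdX: returns
 * a byte count, 0 for end of file, or a negative errno (kernel style). */
typedef ssize_t (*read_fn)(void *ctx, char *buf, size_t len);

/* Hypothetical device: everything from bad_off onwards is unreadable. */
struct fake_dev {
	size_t pos, size, bad_off;
	int report_error;	/* 0: return 0 there (current), 1: -EBADMSG (RFC) */
};

static ssize_t fake_read(void *ctx, char *buf, size_t len)
{
	struct fake_dev *d = ctx;
	size_t avail;

	if (d->pos >= d->size)
		return 0;		/* real end of the device */
	if (d->pos >= d->bad_off)
		return d->report_error ? -EBADMSG : 0;
	avail = d->bad_off - d->pos;
	if (len > avail)
		len = avail;
	memset(buf, 0xff, len);	/* pretend the pages are erased */
	d->pos += len;
	return (ssize_t)len;
}

/* A dd-like copy loop that follows read()'s contract: 0 means EOF. */
static ssize_t copy_all(read_fn rd, void *ctx, char *dst, size_t cap)
{
	size_t done = 0;

	assert(rd != NULL);
	while (done < cap) {
		ssize_t n = rd(ctx, dst + done, cap - done);

		if (n == 0)
			break;		/* looks like EOF, so stop quietly */
		if (n < 0)
			return n;	/* the error reaches the caller */
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

For a 64-byte device that fails at offset 16, the current behavior makes copy_all() report 16 bytes and stop as if the device had ended, while the RFC behavior makes it return -EBADMSG.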

Thanks,
Miquèl

2023-09-25 18:26:17

by Miquel Raynal

Subject: Re: [RFC] mtd: Fix error code loss in mtdchar_read() function.

Hi Richard,

[email protected] wrote on Mon, 25 Sep 2023 16:03:03 +0200 (CEST):

> ----- Original Message -----
> > From: "ZhaoLong Wang" <[email protected]>
> > To: "Miquel Raynal" <[email protected]>, "richard" <[email protected]>, "Vignesh Raghavendra" <[email protected]>
> > CC: "linux-mtd" <[email protected]>, "linux-kernel" <[email protected]>, "chengzhihao1"
> > <[email protected]>, "ZhaoLong Wang" <[email protected]>, "yi zhang" <[email protected]>, "yangerkun"
> > <[email protected]>
> > Sent: Saturday, 23 September 2023 02:58:56
> > Subject: [RFC] mtd: Fix error code loss in mtdchar_read() function.
>
> > In the first while loop, if the mtd_read() function returns -EBADMSG
> > and 'retlen' returns 0, the loop break and the function returns value
> > 'total_retlen' is 0, not the error code.
>
> Gave this a second thought. I don't think a NAND driver is allowed to return
> fewer than the requested bytes and set EBADMSG.
> UBI's IO path has a comment on that:
>
> 	/*
> 	 * The driver should never return -EBADMSG if it failed to read
> 	 * all the requested data. But some buggy drivers might do
> 	 * this, so we change it to -EIO.
> 	 */
> 	if (read != len && mtd_is_eccerr(err)) {
> 		ubi_assert(0);
> 		err = -EIO;
> 	}

Interesting. Shall we add this check to the mtd_read() path as well?

Maybe with a WARN_ON()?

Thanks,
Miquèl

2023-09-25 19:39:01

by Miquel Raynal

Subject: Re: [RFC] mtd: Fix error code loss in mtdchar_read() function.


[email protected] wrote on Mon, 25 Sep 2023 16:59:31 +0200 (CEST):

> ----- Original Message -----
> > From: "Miquel Raynal" <[email protected]>
> >> Gave this a second thought. I don't think a NAND driver is allowed to return
> >> fewer than the requested bytes and set EBADMSG.
> >> UBI's IO path has a comment on that:
> >>
> >> 	/*
> >> 	 * The driver should never return -EBADMSG if it failed to read
> >> 	 * all the requested data. But some buggy drivers might do
> >> 	 * this, so we change it to -EIO.
> >> 	 */
> >> 	if (read != len && mtd_is_eccerr(err)) {
> >> 		ubi_assert(0);
> >> 		err = -EIO;
> >> 	}
> >
> > Interesting. Shall we add this check to the mtd_read() path as well?
> >
> > Maybe with a WARN_ON()?
>
> WARN_ON_ONCE(), please. But yes, let's add it.

Zhaolong, can you take care of it?

>
> Thanks,
> //richard


Thanks,
Miquèl

2023-09-26 01:08:34

by ZhaoLong Wang

Subject: Re: [RFC] mtd: Fix error code loss in mtdchar_read() function.


> [email protected] wrote on Mon, 25 Sep 2023 16:59:31 +0200 (CEST):
>
>> ----- Original Message -----
>>> From: "Miquel Raynal" <[email protected]>
>>>> Gave this a second thought. I don't think a NAND driver is allowed to return
>>>> fewer than the requested bytes and set EBADMSG.
>>>> UBI's IO path has a comment on that:
>>>>
>>>> 	/*
>>>> 	 * The driver should never return -EBADMSG if it failed to read
>>>> 	 * all the requested data. But some buggy drivers might do
>>>> 	 * this, so we change it to -EIO.
>>>> 	 */
>>>> 	if (read != len && mtd_is_eccerr(err)) {
>>>> 		ubi_assert(0);
>>>> 		err = -EIO;
>>>> 	}
>>> Interesting. Shall we add this check to the mtd_read() path as well?
>>>
>>> Maybe with a WARN_ON()?
>> WARN_ON_ONCE(), please. But yes, let's add it.
> Zhaolong, can you take care of it?
>
>> Thanks,
>> //richard
>
> Thanks,
> Miquèl


Yes! That is a good idea, and I am happy to do it.

Thanks,
Zhaolong

2023-09-26 06:16:28

by Miquel Raynal

Subject: Re: [RFC] mtd: Fix error code loss in mtdchar_read() function.

Hello,

Richard, your advice is welcome here.

[email protected] wrote on Sat, 23 Sep 2023 08:58:56 +0800:

> In the first while loop, if the mtd_read() function returns -EBADMSG

s/the// s/function//, and end the line with a comma.

> and 'retlen' returns 0, the loop break and the function returns value

s/and// and s/returns 0/remains at 0/. The loop breaks and the function
returns 'total_retlen', which is 0, instead of the error code.

> 'total_retlen' is 0, not the error code.

Actually after looking at the code, I have no strong opinion
regarding whether we should return 0 or an error code in this case.

There is this comment right above, and I'm not sure it is still up to
date because I believe many drivers just don't provide the data upon
ECC error:

/* Nand returns -EBADMSG on ECC errors, but it returns
 * the data. For our userspace tools it is important
 * to dump areas with ECC errors!
 * For kernel internal usage it also might return -EUCLEAN
 * to signal the caller that a bitflip has occurred and has
 * been corrected by the ECC algorithm.
 * Userspace software which accesses NAND this way
 * must be aware of the fact that it deals with NAND
 */

> This problem causes the user-space program to encounter EOF when it has
> not finished reading the mtd partition, and this also violates the read
> system call standard in POSIX.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217939
> Signed-off-by: ZhaoLong Wang <[email protected]>
> ---
> drivers/mtd/mtdchar.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/mtd/mtdchar.c b/drivers/mtd/mtdchar.c
> index 8dc4f5c493fc..ba60dc6bef98 100644
> --- a/drivers/mtd/mtdchar.c
> +++ b/drivers/mtd/mtdchar.c
> @@ -211,7 +211,7 @@ static ssize_t mtdchar_read(struct file *file, char __user *buf, size_t count,
> }
>
> kfree(kbuf);
> - return total_retlen;
> + return total_retlen ? total_retlen : ret;

This is kind of wrong: if ret is 0, you return ret when you should
return total_retlen. In practice it does not really matter since the result
is the same, but it makes the code harder to understand IMHO.
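The point about the ternary can be checked by comparing the RFC's expression with a more explicit one over the possible cases. This sketch assumes, as in mtdchar_read(), that total_retlen is a size_t and ret is an int holding 0 or a negative errno; both function names are hypothetical, and the explicit form is only one way to spell it, not a proposed patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* The RFC's return expression. Casting each arm to ssize_t avoids first
 * converting a negative ret to size_t, which the uncast expression does. */
static ssize_t result_ternary(size_t total_retlen, int ret)
{
	return total_retlen ? (ssize_t)total_retlen : (ssize_t)ret;
}

/*
 * One possible clearer spelling: report the error only when no data at
 * all was transferred, otherwise report the byte count. When ret is 0 it
 * never takes the error branch.
 */
static ssize_t result_explicit(size_t total_retlen, int ret)
{
	assert(ret <= 0);	/* assumed: 0 or a negative errno */
	if (total_retlen == 0 && ret < 0)
		return ret;
	return (ssize_t)total_retlen;
}
```

Both forms agree on every case below; the explicit one simply states the intent (error only when nothing was read) instead of returning ret when it happens to be 0.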

> } /* mtdchar_read */
>
> static ssize_t mtdchar_write(struct file *file, const char __user *buf, size_t count,


Thanks,
Miquèl