In the mtdchar_write_ioctl() function, memdup_user() is called with its
'len' parameter set to verbatim values provided by user space via a
struct mtd_write_req. Both the 'len' and 'ooblen' fields of that
structure are 64-bit unsigned integers, which means the MEMWRITE ioctl
can trigger unbounded kernel memory allocation requests.
Fix by iterating over the buffers provided by user space in a loop,
processing at most mtd->erasesize bytes in each iteration. Adopt some
checks from mtd_check_oob_ops() to retain backward user space
compatibility.
Suggested-by: Boris Brezillon <[email protected]>
Signed-off-by: Michał Kępień <[email protected]>
---
Fixing this problem was suggested last month, during a discussion about
a new MEMREAD ioctl. [1] [2]
My primary objective was to not break user space, i.e. to not change
externally visible behavior compared to the current code. The main
problem I faced was that splitting the input data into chunks makes the
MEMWRITE ioctl a _wrapper_ for mtd_write_oob() rather than a direct
_interface_ to it and yet from the user space perspective it still needs
to behave as if nothing changed.
Despite my efforts, the patch does _not_ retain absolute backward
compatibility and I do not know whether this is acceptable.
Specifically, multi-eraseblock-sized writes (requiring multiple loop
iterations to be processed) which succeeded with the original code _may_
now return errors and/or write different contents to the device than the
original code, depending on the MTD mode of operation requested and on
whether the start offset is page-aligned. The documentation for struct
mtd_oob_ops warns about even multi-page write requests, but...
Example:
MTD device parameters:
- mtd->writesize = 2048
- mtd->erasesize = 131072
- 64 bytes of raw OOB space per page
struct mtd_write_req req = {
.start = 2048,
.len = 262144,
.ooblen = 64,
.usr_data = ...,
.usr_oob = ...,
.mode = MTD_OPS_RAW,
};
(This is a 128-page write with OOB data supplied for just one page.)
Current mtdchar_write_ioctl() returns 0 for this request and writes
128 pages of data and 1 page's worth of OOB data (64 bytes) to the
MTD device.
Patched mtdchar_write_ioctl() may return an error because the
original request gets split into two chunks (<data_len, oob_len>):
<131072, 64>
<131072, 0>
and the second chunk with zero OOB data length may make the MTD
driver unhappy in raw mode (resulting in only the first 64 pages of
data and 1 page's worth of OOB data getting written to the MTD
device).
Is an ioctl like this considered a "degenerate" one and therefore
acceptable to break or not?
At any rate, the revised code feels brittle to me and I would not be
particularly surprised if I missed more cases in which it produces
different results than the original code.
I keep on wondering whether the benefits of this change are worth the
extra code complexity, but fortunately it is not my call to make :)
Perhaps I am missing something and my proposal can be simplified? Or
maybe the way I approached this is completely wrong? Any thoughts on
this are welcome.
As the outcome of the discussion around this patch will have a
significant influence on the shape of the v2 of the MEMREAD ioctl, I
decided to submit this one first as a standalone patch.
[1] https://lists.infradead.org/pipermail/linux-mtd/2021-September/088485.html
[2] https://lists.infradead.org/pipermail/linux-mtd/2021-September/088544.html
drivers/mtd/mtdchar.c | 93 ++++++++++++++++++++++++++++++++-----------
1 file changed, 70 insertions(+), 23 deletions(-)
diff --git a/drivers/mtd/mtdchar.c b/drivers/mtd/mtdchar.c
index 155e991d9d75..a3afc390e254 100644
--- a/drivers/mtd/mtdchar.c
+++ b/drivers/mtd/mtdchar.c
@@ -578,9 +578,10 @@ static int mtdchar_write_ioctl(struct mtd_info *mtd,
{
struct mtd_info *master = mtd_get_master(mtd);
struct mtd_write_req req;
- struct mtd_oob_ops ops = {};
const void __user *usr_data, *usr_oob;
- int ret;
+ uint8_t *datbuf = NULL, *oobbuf = NULL;
+ size_t datbuf_len, oobbuf_len;
+ int ret = 0;
if (copy_from_user(&req, argp, sizeof(req)))
return -EFAULT;
@@ -590,33 +591,79 @@ static int mtdchar_write_ioctl(struct mtd_info *mtd,
if (!master->_write_oob)
return -EOPNOTSUPP;
- ops.mode = req.mode;
- ops.len = (size_t)req.len;
- ops.ooblen = (size_t)req.ooblen;
- ops.ooboffs = 0;
-
- if (usr_data) {
- ops.datbuf = memdup_user(usr_data, ops.len);
- if (IS_ERR(ops.datbuf))
- return PTR_ERR(ops.datbuf);
- } else {
- ops.datbuf = NULL;
+
+ if (!usr_data)
+ req.len = 0;
+
+ if (!usr_oob)
+ req.ooblen = 0;
+
+ if (req.start + req.len > mtd->size)
+ return -EINVAL;
+
+ datbuf_len = min_t(size_t, req.len, mtd->erasesize);
+ if (datbuf_len > 0) {
+ datbuf = kmalloc(datbuf_len, GFP_KERNEL);
+ if (!datbuf)
+ return -ENOMEM;
}
- if (usr_oob) {
- ops.oobbuf = memdup_user(usr_oob, ops.ooblen);
- if (IS_ERR(ops.oobbuf)) {
- kfree(ops.datbuf);
- return PTR_ERR(ops.oobbuf);
+ oobbuf_len = min_t(size_t, req.ooblen, mtd->erasesize);
+ if (oobbuf_len > 0) {
+ oobbuf = kmalloc(oobbuf_len, GFP_KERNEL);
+ if (!oobbuf) {
+ kfree(datbuf);
+ return -ENOMEM;
}
- } else {
- ops.oobbuf = NULL;
}
- ret = mtd_write_oob(mtd, (loff_t)req.start, &ops);
+ while (req.len > 0 || (!usr_data && req.ooblen > 0)) {
+ struct mtd_oob_ops ops = {
+ .mode = req.mode,
+ .len = min_t(size_t, req.len, datbuf_len),
+ .ooblen = min_t(size_t, req.ooblen, oobbuf_len),
+ .datbuf = req.len ? datbuf : NULL,
+ .oobbuf = req.ooblen ? oobbuf : NULL,
+ };
- kfree(ops.datbuf);
- kfree(ops.oobbuf);
+ /*
+ * For writes which are not OOB-only, adjust the amount of OOB
+ * data written according to the number of data pages written.
+ * This is necessary to prevent OOB data from being skipped
+ * over in data+OOB writes requiring multiple mtd_write_oob()
+ * calls to be completed.
+ */
+ if (ops.len > 0 && ops.ooblen > 0) {
+ u32 oob_per_page = mtd_oobavail(mtd, &ops);
+ uint32_t pages_to_write = mtd_div_by_ws(ops.len, mtd);
+
+ if (mtd_mod_by_ws(req.start + ops.len, mtd))
+ pages_to_write++;
+
+ ops.ooblen = min_t(size_t, ops.ooblen,
+ pages_to_write * oob_per_page);
+ }
+
+ if (copy_from_user(datbuf, usr_data, ops.len) ||
+ copy_from_user(oobbuf, usr_oob, ops.ooblen)) {
+ ret = -EFAULT;
+ break;
+ }
+
+ ret = mtd_write_oob(mtd, req.start, &ops);
+ if (ret)
+ break;
+
+ req.start += ops.retlen;
+ req.len -= ops.retlen;
+ usr_data += ops.retlen;
+
+ req.ooblen -= ops.oobretlen;
+ usr_oob += ops.oobretlen;
+ }
+
+ kfree(datbuf);
+ kfree(oobbuf);
return ret;
}
--
2.33.1
Hi Michał,
[email protected] wrote on Mon, 25 Oct 2021 10:21:04 +0200:
> In the mtdchar_write_ioctl() function, memdup_user() is called with its
> 'len' parameter set to verbatim values provided by user space via a
> struct mtd_write_req. Both the 'len' and 'ooblen' fields of that
> structure are 64-bit unsigned integers, which means the MEMWRITE ioctl
> can trigger unbounded kernel memory allocation requests.
>
> Fix by iterating over the buffers provided by user space in a loop,
> processing at most mtd->erasesize bytes in each iteration. Adopt some
> checks from mtd_check_oob_ops() to retain backward user space
> compatibility.
>
Thank you very much for this proposal!
> Suggested-by: Boris Brezillon <[email protected]>
> Signed-off-by: Michał Kępień <[email protected]>
> ---
> Fixing this problem was suggested last month, during a discussion about
> a new MEMREAD ioctl. [1] [2]
>
> My primary objective was to not break user space, i.e. to not change
> externally visible behavior compared to the current code. The main
> problem I faced was that splitting the input data into chunks makes the
> MEMWRITE ioctl a _wrapper_ for mtd_write_oob() rather than a direct
> _interface_ to it and yet from the user space perspective it still needs
> to behave as if nothing changed.
>
> Despite my efforts, the patch does _not_ retain absolute backward
> compatibility and I do not know whether this is acceptable.
> Specifically, multi-eraseblock-sized writes (requiring multiple loop
> iterations to be processed) which succeeded with the original code _may_
> now return errors and/or write different contents to the device than the
> original code, depending on the MTD mode of operation requested and on
> whether the start offset is page-aligned. The documentation for struct
> mtd_oob_ops warns about even multi-page write requests, but...
>
> Example:
>
> MTD device parameters:
>
> - mtd->writesize = 2048
> - mtd->erasesize = 131072
> - 64 bytes of raw OOB space per page
>
> struct mtd_write_req req = {
> .start = 2048,
> .len = 262144,
> .ooblen = 64,
> .usr_data = ...,
> .usr_oob = ...,
> .mode = MTD_OPS_RAW,
> };
>
> (This is a 128-page write with OOB data supplied for just one page.)
>
> Current mtdchar_write_ioctl() returns 0 for this request and writes
> 128 pages of data and 1 page's worth of OOB data (64 bytes) to the
> MTD device.
>
> Patched mtdchar_write_ioctl() may return an error because the
> original request gets split into two chunks (<data_len, oob_len>):
>
> <131072, 64>
> <131072, 0>
>
> and the second chunk with zero OOB data length may make the MTD
> driver unhappy in raw mode (resulting in only the first 64 pages of
> data and 1 page's worth of OOB data getting written to the MTD
> device).
Isn't this a driver issue instead? I mean, writing an eraseblock
without providing any OOB data is completely fine, if the driver
accepts 2 blocks + 1 page OOB but refuses 1 block + 1 page OOB and then
1 block, it's broken, no? Have you experienced such a situation in your
testing?
>
> Is an ioctl like this considered a "degenerate" one and therefore
> acceptable to break or not?
I don't think so :)
>
> At any rate, the revised code feels brittle to me and I would not be
> particularly surprised if I missed more cases in which it produces
> different results than the original code.
>
> I keep on wondering whether the benefits of this change are worth the
> extra code complexity, but fortunately it is not my call to make :)
> Perhaps I am missing something and my proposal can be simplified? Or
> maybe the way I approached this is completely wrong? Any thoughts on
> this are welcome.
>
> As the outcome of the discussion around this patch will have a
> significant influence on the shape of the v2 of the MEMREAD ioctl, I
> decided to submit this one first as a standalone patch.
>
> [1] https://lists.infradead.org/pipermail/linux-mtd/2021-September/088485.html
> [2] https://lists.infradead.org/pipermail/linux-mtd/2021-September/088544.html
>
> drivers/mtd/mtdchar.c | 93 ++++++++++++++++++++++++++++++++-----------
> 1 file changed, 70 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/mtd/mtdchar.c b/drivers/mtd/mtdchar.c
> index 155e991d9d75..a3afc390e254 100644
> --- a/drivers/mtd/mtdchar.c
> +++ b/drivers/mtd/mtdchar.c
> @@ -578,9 +578,10 @@ static int mtdchar_write_ioctl(struct mtd_info *mtd,
> {
> struct mtd_info *master = mtd_get_master(mtd);
> struct mtd_write_req req;
> - struct mtd_oob_ops ops = {};
> const void __user *usr_data, *usr_oob;
> - int ret;
> + uint8_t *datbuf = NULL, *oobbuf = NULL;
> + size_t datbuf_len, oobbuf_len;
> + int ret = 0;
>
> if (copy_from_user(&req, argp, sizeof(req)))
> return -EFAULT;
> @@ -590,33 +591,79 @@ static int mtdchar_write_ioctl(struct mtd_info *mtd,
>
> if (!master->_write_oob)
> return -EOPNOTSUPP;
> - ops.mode = req.mode;
> - ops.len = (size_t)req.len;
> - ops.ooblen = (size_t)req.ooblen;
> - ops.ooboffs = 0;
> -
> - if (usr_data) {
> - ops.datbuf = memdup_user(usr_data, ops.len);
> - if (IS_ERR(ops.datbuf))
> - return PTR_ERR(ops.datbuf);
> - } else {
> - ops.datbuf = NULL;
> +
> + if (!usr_data)
> + req.len = 0;
> +
> + if (!usr_oob)
> + req.ooblen = 0;
> +
> + if (req.start + req.len > mtd->size)
> + return -EINVAL;
> +
> + datbuf_len = min_t(size_t, req.len, mtd->erasesize);
> + if (datbuf_len > 0) {
> + datbuf = kmalloc(datbuf_len, GFP_KERNEL);
> + if (!datbuf)
> + return -ENOMEM;
> }
>
> - if (usr_oob) {
> - ops.oobbuf = memdup_user(usr_oob, ops.ooblen);
> - if (IS_ERR(ops.oobbuf)) {
> - kfree(ops.datbuf);
> - return PTR_ERR(ops.oobbuf);
> + oobbuf_len = min_t(size_t, req.ooblen, mtd->erasesize);
> + if (oobbuf_len > 0) {
> + oobbuf = kmalloc(oobbuf_len, GFP_KERNEL);
> + if (!oobbuf) {
> + kfree(datbuf);
> + return -ENOMEM;
> }
> - } else {
> - ops.oobbuf = NULL;
> }
>
> - ret = mtd_write_oob(mtd, (loff_t)req.start, &ops);
> + while (req.len > 0 || (!usr_data && req.ooblen > 0)) {
> + struct mtd_oob_ops ops = {
> + .mode = req.mode,
> + .len = min_t(size_t, req.len, datbuf_len),
> + .ooblen = min_t(size_t, req.ooblen, oobbuf_len),
> + .datbuf = req.len ? datbuf : NULL,
> + .oobbuf = req.ooblen ? oobbuf : NULL,
> + };
>
> - kfree(ops.datbuf);
> - kfree(ops.oobbuf);
> + /*
> + * For writes which are not OOB-only, adjust the amount of OOB
> + * data written according to the number of data pages written.
> + * This is necessary to prevent OOB data from being skipped
> + * over in data+OOB writes requiring multiple mtd_write_oob()
> + * calls to be completed.
> + */
> + if (ops.len > 0 && ops.ooblen > 0) {
> + u32 oob_per_page = mtd_oobavail(mtd, &ops);
> + uint32_t pages_to_write = mtd_div_by_ws(ops.len, mtd);
> +
> + if (mtd_mod_by_ws(req.start + ops.len, mtd))
> + pages_to_write++;
> +
> + ops.ooblen = min_t(size_t, ops.ooblen,
> + pages_to_write * oob_per_page);
> + }
> +
> + if (copy_from_user(datbuf, usr_data, ops.len) ||
> + copy_from_user(oobbuf, usr_oob, ops.ooblen)) {
> + ret = -EFAULT;
> + break;
> + }
> +
> + ret = mtd_write_oob(mtd, req.start, &ops);
> + if (ret)
> + break;
> +
> + req.start += ops.retlen;
> + req.len -= ops.retlen;
> + usr_data += ops.retlen;
> +
> + req.ooblen -= ops.oobretlen;
> + usr_oob += ops.oobretlen;
> + }
> +
> + kfree(datbuf);
> + kfree(oobbuf);
>
> return ret;
> }
Thanks,
Miquèl
Hi Miquèl,
TL;DR: thanks, your comment made me look closer and I found what seems
to be a feasible workaround that I will apply in v2 (along other fixes).
> > Despite my efforts, the patch does _not_ retain absolute backward
> > compatibility and I do not know whether this is acceptable.
> > Specifically, multi-eraseblock-sized writes (requiring multiple loop
> > iterations to be processed) which succeeded with the original code _may_
> > now return errors and/or write different contents to the device than the
> > original code, depending on the MTD mode of operation requested and on
> > whether the start offset is page-aligned. The documentation for struct
> > mtd_oob_ops warns about even multi-page write requests, but...
> >
> > Example:
> >
> > MTD device parameters:
> >
> > - mtd->writesize = 2048
> > - mtd->erasesize = 131072
> > - 64 bytes of raw OOB space per page
> >
> > struct mtd_write_req req = {
> > .start = 2048,
> > .len = 262144,
> > .ooblen = 64,
> > .usr_data = ...,
> > .usr_oob = ...,
> > .mode = MTD_OPS_RAW,
> > };
> >
> > (This is a 128-page write with OOB data supplied for just one page.)
> >
> > Current mtdchar_write_ioctl() returns 0 for this request and writes
> > 128 pages of data and 1 page's worth of OOB data (64 bytes) to the
> > MTD device.
> >
> > Patched mtdchar_write_ioctl() may return an error because the
> > original request gets split into two chunks (<data_len, oob_len>):
> >
> > <131072, 64>
> > <131072, 0>
> >
> > and the second chunk with zero OOB data length may make the MTD
> > driver unhappy in raw mode (resulting in only the first 64 pages of
> > data and 1 page's worth of OOB data getting written to the MTD
> > device).
>
> Isn't this a driver issue instead? I mean, writing an eraseblock
> without providing any OOB data is completely fine, if the driver
> accepts 2 blocks + 1 page OOB but refuses 1 block + 1 page OOB and then
> 1 block, it's broken, no? Have you experienced such a situation in your
> testing?
I may not have expressed myself clearly here, sorry - the example was
already getting a bit lengthy at that point... :)
I tested the patch with nandsim, but I do not think it is that specific
driver that is broken. The catch is that when mtd_write_oob() is
called, nand_do_write_ops() splits multi-page requests into single-page
requests and what it passes to nand_write_page() depends on whether the
struct mtd_oob_ops it was invoked with has the oobbuf field set to NULL
or not. This is okay in itself, but when another request-splitting
"layer" is introduced by my patch, the ioctl may start returning
different result codes than it used to.
Here is what happens with the unpatched code for the example above:
1. mtdchar_write_ioctl() gets called with the following request:
struct mtd_write_req req = {
.start = 2048,
.len = 262144,
.ooblen = 64,
.usr_data = 0x10000000,
.usr_oob = 0x20000000,
.mode = MTD_OPS_RAW,
};
2. The above request is passed through to mtd_write_oob() verbatim:
struct mtd_oob_ops ops = {
.mode = MTD_OPS_RAW,
.len = 262144,
.ooblen = 64,
.datbuf = 0x10000000,
.oobbuf = 0x20000000,
};
3. nand_do_write_ops() splits this request up into page-sized requests.
a) For the first page, the initial 2048 bytes of data + 64 bytes of
OOB data are passed to nand_write_page().
b) For each subsequent page, a 2048-byte chunk of data + 64 bytes
of 0xff bytes are passed to nand_write_page().
Since the oobbuf field in the struct mtd_oob_ops passed is not NULL,
oob_required is set to 1 for all nand_write_page() calls.
4. The above causes the driver to receive 2112 bytes of data for each
page write, which results in the ioctl being successful.
Here is what happens with the patched code:
1. mtdchar_write_ioctl() gets called with the same request as above.
2. The original request gets split into two eraseblock-sized
mtd_write_oob() calls:
a) struct mtd_oob_ops ops = {
.mode = MTD_OPS_RAW,
.len = 131072,
.ooblen = 64,
.datbuf = 0x10000000,
.oobbuf = 0x20000000,
};
b) struct mtd_oob_ops ops = {
.mode = MTD_OPS_RAW,
.len = 131072,
.ooblen = 0,
.datbuf = 0x10020000,
.oobbuf = NULL,
};
(My code sets oobbuf to NULL if ooblen is 0.)
3. nand_do_write_ops() splits the first request up into page-sized
requests the same way as for the original code. It returns
successfully, so mtdchar_write_ioctl() proceeds with the next
eraseblock-sized request.
4. nand_do_write_ops() splits the second request up into page-sized
requests. The first page write contains 2048 bytes of data and no
OOB data; since the oobbuf field in the struct mtd_oob_ops passed is
NULL, oob_required is set to 0.
5. The above causes the driver to receive 2048 bytes of data for a page
write in raw mode, which results in an error that propagates all the
way up to mtdchar_write_ioctl().
The nandsim driver returns the same error if you pass the following
request to the MEMWRITE ioctl:
struct mtd_write_req req = {
.start = 2048,
.len = 2048,
.ooblen = 0,
.usr_data = 0x10000000,
.usr_oob = NULL,
.mode = MTD_OPS_RAW,
};
so it is not the driver that is broken or insane, it is the splitting
process that may cause the MEMWRITE ioctl to return different error
codes than before.
I played with the code a bit more and I found a fix which addresses this
issue without breaking other scenarios: setting oobbuf to the same
pointer for every loop iteration (if ooblen is 0, no OOB data will be
written anyway).
I also tackled the problem of mishandling large non-page-aligned writes
in v1 and I managed to fix it by trimming the first mtd_write_oob() call
so that it ends on an eraseblock boundary. This implicitly makes
subsequent writes page-aligned and seems to fix the problem.
Finally, I reworked the OOB length adjustment code to address other
cases of mishandling non-page-aligned writes.
I could not find any other cases in which the revised code behaves
differently than the original one. I will send v2 soon.
--
Best regards,
Michał Kępień
Hi Michał,
[email protected] wrote on Thu, 25 Nov 2021 13:08:16 +0100:
> Hi Miquèl,
>
> TL;DR: thanks, your comment made me look closer and I found what seems
> to be a feasible workaround that I will apply in v2 (along other fixes).
>
> > > Despite my efforts, the patch does _not_ retain absolute backward
> > > compatibility and I do not know whether this is acceptable.
> > > Specifically, multi-eraseblock-sized writes (requiring multiple loop
> > > iterations to be processed) which succeeded with the original code _may_
> > > now return errors and/or write different contents to the device than the
> > > original code, depending on the MTD mode of operation requested and on
> > > whether the start offset is page-aligned. The documentation for struct
> > > mtd_oob_ops warns about even multi-page write requests, but...
> > >
> > > Example:
> > >
> > > MTD device parameters:
> > >
> > > - mtd->writesize = 2048
> > > - mtd->erasesize = 131072
> > > - 64 bytes of raw OOB space per page
> > >
> > > struct mtd_write_req req = {
> > > .start = 2048,
> > > .len = 262144,
> > > .ooblen = 64,
> > > .usr_data = ...,
> > > .usr_oob = ...,
> > > .mode = MTD_OPS_RAW,
> > > };
> > >
> > > (This is a 128-page write with OOB data supplied for just one page.)
> > >
> > > Current mtdchar_write_ioctl() returns 0 for this request and writes
> > > 128 pages of data and 1 page's worth of OOB data (64 bytes) to the
> > > MTD device.
> > >
> > > Patched mtdchar_write_ioctl() may return an error because the
> > > original request gets split into two chunks (<data_len, oob_len>):
> > >
> > > <131072, 64>
> > > <131072, 0>
> > >
> > > and the second chunk with zero OOB data length may make the MTD
> > > driver unhappy in raw mode (resulting in only the first 64 pages of
> > > data and 1 page's worth of OOB data getting written to the MTD
> > > device).
> >
> > Isn't this a driver issue instead? I mean, writing an eraseblock
> > without providing any OOB data is completely fine, if the driver
> > accepts 2 blocks + 1 page OOB but refuses 1 block + 1 page OOB and then
> > 1 block, it's broken, no? Have you experienced such a situation in your
> > testing?
>
> I may not have expressed myself clearly here, sorry - the example was
> already getting a bit lengthy at that point... :)
>
> I tested the patch with nandsim, but I do not think it is that specific
> driver that is broken. The catch is that when mtd_write_oob() is
> called, nand_do_write_ops() splits multi-page requests into single-page
> requests and what it passes to nand_write_page() depends on whether the
> struct mtd_oob_ops it was invoked with has the oobbuf field set to NULL
> or not. This is okay in itself, but when another request-splitting
> "layer" is introduced by my patch, the ioctl may start returning
> different result codes than it used to.
>
> Here is what happens with the unpatched code for the example above:
>
> 1. mtdchar_write_ioctl() gets called with the following request:
>
> struct mtd_write_req req = {
> .start = 2048,
> .len = 262144,
> .ooblen = 64,
> .usr_data = 0x10000000,
> .usr_oob = 0x20000000,
> .mode = MTD_OPS_RAW,
> };
>
> 2. The above request is passed through to mtd_write_oob() verbatim:
>
> struct mtd_oob_ops ops = {
> .mode = MTD_OPS_RAW,
> .len = 262144,
> .ooblen = 64,
> .datbuf = 0x10000000,
> .oobbuf = 0x20000000,
> };
>
> 3. nand_do_write_ops() splits this request up into page-sized requests.
>
> a) For the first page, the initial 2048 bytes of data + 64 bytes of
> OOB data are passed to nand_write_page().
>
> b) For each subsequent page, a 2048-byte chunk of data + 64 bytes
> of 0xff bytes are passed to nand_write_page().
>
> Since the oobbuf field in the struct mtd_oob_ops passed is not NULL,
> oob_required is set to 1 for all nand_write_page() calls.
>
> 4. The above causes the driver to receive 2112 bytes of data for each
> page write, which results in the ioctl being successful.
>
> Here is what happens with the patched code:
>
> 1. mtdchar_write_ioctl() gets called with the same request as above.
>
> 2. The original request gets split into two eraseblock-sized
> mtd_write_oob() calls:
>
> a) struct mtd_oob_ops ops = {
> .mode = MTD_OPS_RAW,
> .len = 131072,
> .ooblen = 64,
> .datbuf = 0x10000000,
> .oobbuf = 0x20000000,
> };
>
> b) struct mtd_oob_ops ops = {
> .mode = MTD_OPS_RAW,
> .len = 131072,
> .ooblen = 0,
> .datbuf = 0x10020000,
> .oobbuf = NULL,
> };
>
> (My code sets oobbuf to NULL if ooblen is 0.)
>
> 3. nand_do_write_ops() splits the first request up into page-sized
> requests the same way as for the original code. It returns
> successfully, so mtdchar_write_ioctl() proceeds with the next
> eraseblock-sized request.
>
> 4. nand_do_write_ops() splits the second request up into page-sized
> requests. The first page write contains 2048 bytes of data and no
> OOB data; since the oobbuf field in the struct mtd_oob_ops passed is
> NULL, oob_required is set to 0.
>
> 5. The above causes the driver to receive 2048 bytes of data for a page
> write in raw mode, which results in an error that propagates all the
> way up to mtdchar_write_ioctl().
This is definitely far from an expected behavior. Writing a page
without OOB is completely fine.
>
> The nandsim driver returns the same error if you pass the following
> request to the MEMWRITE ioctl:
>
> struct mtd_write_req req = {
> .start = 2048,
> .len = 2048,
> .ooblen = 0,
> .usr_data = 0x10000000,
> .usr_oob = NULL,
> .mode = MTD_OPS_RAW,
> };
>
> so it is not the driver that is broken or insane, it is the splitting
> process that may cause the MEMWRITE ioctl to return different error
> codes than before.
>
> I played with the code a bit more and I found a fix which addresses this
> issue without breaking other scenarios: setting oobbuf to the same
> pointer for every loop iteration (if ooblen is 0, no OOB data will be
> written anyway).
You mean that
{ .user_oob = NULL, .ooblen = 0 }
fails, while
{ .user_oob = random, .ooblen = 0 }
works? This seems a little bit fragile.
Could you tell us the origin of the error? Because in
nand_do_write_ops() if ops->oobbuf is populated then oob_required is
set to true no matter the value set in ooblen.
Plus, the code in mtdchar is clear: .oobbuf is set to NULL if there are
no OOBs provided by the user so I believe this is a situation that
should already work.
> I also tackled the problem of mishandling large non-page-aligned writes
> in v1 and I managed to fix it by trimming the first mtd_write_oob() call
> so that it ends on an eraseblock boundary. This implicitly makes
> subsequent writes page-aligned and seems to fix the problem.
Great!
>
> Finally, I reworked the OOB length adjustment code to address other
> cases of mishandling non-page-aligned writes.
>
> I could not find any other cases in which the revised code behaves
> differently than the original one. I will send v2 soon.
Thanks for all work on this topic!
Cheers,
Miquèl
Hi Miquèl,
> > 5. The above causes the driver to receive 2048 bytes of data for a page
> > write in raw mode, which results in an error that propagates all the
> > way up to mtdchar_write_ioctl().
>
> This is definitely far from an expected behavior. Writing a page
> without OOB is completely fine.
Could it be a nandsim quirk? Sorry, I do not feel qualified enough to
comment on this. I prepared a code flow analysis below, though.
> > The nandsim driver returns the same error if you pass the following
> > request to the MEMWRITE ioctl:
> >
> > struct mtd_write_req req = {
> > .start = 2048,
> > .len = 2048,
> > .ooblen = 0,
> > .usr_data = 0x10000000,
> > .usr_oob = NULL,
> > .mode = MTD_OPS_RAW,
> > };
> >
> > so it is not the driver that is broken or insane, it is the splitting
> > process that may cause the MEMWRITE ioctl to return different error
> > codes than before.
> >
> > I played with the code a bit more and I found a fix which addresses this
> > issue without breaking other scenarios: setting oobbuf to the same
> > pointer for every loop iteration (if ooblen is 0, no OOB data will be
> > written anyway).
>
> You mean that
> { .user_oob = NULL, .ooblen = 0 }
> fails, while
> { .user_oob = random, .ooblen = 0 }
> works? This seems a little bit fragile.
That is indeed the behavior I am observing with nandsim, even on a
kernel which does not include my patch.
> Could you tell us the origin of the error? Because in
> nand_do_write_ops() if ops->oobbuf is populated then oob_required is
> set to true no matter the value set in ooblen.
Correct - and that is what causes the behavior described above (and why
the tweak I came up with works around the problem).
nand_do_write_ops() calls nand_write_page() with 'oob_required' passed
as the fifth parameter. In raw mode, nand_write_page() calls
nand_write_page_raw(). Here is what happens there:
1. nand_prog_page_begin_op() sets up a page programming operation by
sending a few commands to the chip. See nand_exec_prog_page_op()
for details. Note that since the 'prog' parameter is set to false,
the last two instructions from the 'instrs' array are not run yet.
2. 'oob_required' is checked. If it is set to 1, OOB data is sent to
the chip; otherwise, it is not sent.
3. nand_prog_page_end_op() is called to finish the programming
operation.
At that point, the ACTION_PRGPAGE switch case in ns_do_state_action()
(in drivers/mtd/nand/raw/nandsim.c) checks whether the number of bytes
it received so far for this operation (ns->regs.count, updated by
ns_nand_write_buf() as data is pushed to the chip) equals the number of
bytes in a full page with OOB data (num). If not, an error is returned,
which results in the NAND_STATUS_FAIL flag being set in the status byte,
triggering an -EIO.
This does not happen for any other MTD operation mode because the chip
callbacks that nand_write_page() invokes in those other modes cause OOB
data to be sent to the chip.
> Plus, the code in mtdchar is clear: .oobbuf is set to NULL if there are
> no OOBs provided by the user so I believe this is a situation that
> should already work.
Correct, though current mtdchar_write_ioctl() code only looks at the
value of the 'usr_oob' field in the struct mtd_write_req passed to it,
so even if you pass { .usr_oob = <something non-NULL>, .ooblen = 0 }, it
will still set ops.oobbuf to the pointer returned by memdup_user().
--
Best regards,
Michał Kępień