[CCing LKML in the hope of getting good suggestions]
[The patch: http://lists.infradead.org/pipermail/linux-mtd/2011-April/034645.html]
Hi Grant,
Just in case, Jarkko was trying to address the same issue recently:
http://lists.infradead.org/pipermail/linux-mtd/2011-March/034416.html
On Fri, 2011-04-01 at 17:44 -0700, Grant Erickson wrote:
> When handling user space read or write requests via mtd_{read,write},
> exponentially back off on the size of the requested kernel transfer
> buffer until it succeeds or until the requested transfer buffer size
> falls below the page size.
>
> This helps ensure the operation can succeed under low-memory,
> highly-fragmented situations albeit somewhat more slowly.
>
> Signed-off-by: Grant Erickson <[email protected]>
> ---
> drivers/mtd/mtdchar.c | 30 ++++++++++++++----------------
> 1 files changed, 14 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/mtd/mtdchar.c b/drivers/mtd/mtdchar.c
> index 145b3d0d..d887b91 100644
> --- a/drivers/mtd/mtdchar.c
> +++ b/drivers/mtd/mtdchar.c
> @@ -179,6 +179,7 @@ static ssize_t mtd_read(struct file *file, char __user *buf, size_t count,loff_t
> size_t total_retlen=0;
> int ret=0;
> int len;
> + size_t size;
> char *kbuf;
>
> DEBUG(MTD_DEBUG_LEVEL0,"MTD_read\n");
> @@ -192,20 +193,18 @@ static ssize_t mtd_read(struct file *file, char __user *buf, size_t count,loff_t
> /* FIXME: Use kiovec in 2.5 to lock down the user's buffers
> and pass them directly to the MTD functions */
>
> - if (count > MAX_KMALLOC_SIZE)
> - kbuf=kmalloc(MAX_KMALLOC_SIZE, GFP_KERNEL);
> - else
> - kbuf=kmalloc(count, GFP_KERNEL);
> + size = min_t(size_t, count, MAX_KMALLOC_SIZE);
> +
> + do {
> + kbuf=kmalloc(size, GFP_KERNEL);
> + } while (!kbuf && ((size >>= 1) >= PAGE_SIZE));
This should be a bit more complex, I think. First of all, I think it is
better to make this a separate function. Second, you should make sure
the system does not print scary warnings when the allocation fails - use
the __GFP_NOWARN flag, just like Jarkko did.
And third, as I wrote in my answer to Jarkko, allocating large contiguous
buffers is bad for performance: if the system memory is fragmented and
there are no such large contiguous areas, the kernel will start writing
back dirty FS data, killing FS caches, shrinking caches and buggers,
probably even swapping out applications. We do not want MTD to cause
this at all.
Probably we can mitigate this with kmalloc flags. Now, I'm not sure which
flags are optimal, but I'd do:
__GFP_NOWARN | __GFP_WAIT | __GFP_NORETRY
Maybe even the __GFP_WAIT flag could be dropped.
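To illustrate, something like the following completely untested sketch is
what I have in mind - the function name is made up and the flag combination
is only my guess:

static char *mtd_do_kmalloc(size_t *size)
{
        char *kbuf;

        do {
                /* Do not print a warning and do not retry hard when a
                 * large contiguous buffer is unavailable. */
                kbuf = kmalloc(*size, __GFP_NOWARN | __GFP_WAIT |
                               __GFP_NORETRY);
        } while (!kbuf && ((*size >>= 1) >= PAGE_SIZE));

        return kbuf;
}

Then mtd_read() and mtd_write() would both call this helper instead of
duplicating the allocation loop.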
>
> if (!kbuf)
> return -ENOMEM;
>
> while (count) {
>
> - if (count > MAX_KMALLOC_SIZE)
> - len = MAX_KMALLOC_SIZE;
> - else
> - len = count;
> + len = min_t(size_t, count, size);
>
> switch (mfi->mode) {
> case MTD_MODE_OTP_FACTORY:
> @@ -268,6 +267,7 @@ static ssize_t mtd_write(struct file *file, const char __user *buf, size_t count
> {
> struct mtd_file_info *mfi = file->private_data;
> struct mtd_info *mtd = mfi->mtd;
> + size_t size;
> char *kbuf;
> size_t retlen;
> size_t total_retlen=0;
> @@ -285,20 +285,18 @@ static ssize_t mtd_write(struct file *file, const char __user *buf, size_t count
> if (!count)
> return 0;
>
> - if (count > MAX_KMALLOC_SIZE)
> - kbuf=kmalloc(MAX_KMALLOC_SIZE, GFP_KERNEL);
> - else
> - kbuf=kmalloc(count, GFP_KERNEL);
> + size = min_t(size_t, count, MAX_KMALLOC_SIZE);
> +
> + do {
> + kbuf=kmalloc(size, GFP_KERNEL);
> + } while (!kbuf && ((size >>= 1) >= PAGE_SIZE));
>
> if (!kbuf)
> return -ENOMEM;
>
> while (count) {
>
> - if (count > MAX_KMALLOC_SIZE)
> - len = MAX_KMALLOC_SIZE;
> - else
> - len = count;
> + len = min_t(size_t, count, size);
>
> if (copy_from_user(kbuf, buf, len)) {
> kfree(kbuf);
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
On Mon, 2011-04-04 at 10:27 +0300, Artem Bityutskiy wrote:
> And third, as I wrote in my answer to Jarkko, allocating large contiguous
> buffers is bad for performance: if the system memory is fragmented and
> there are no such large contiguous areas, the kernel will start writing
> back dirty FS data, killing FS caches, shrinking caches and buggers,
> probably even swapping out applications. We do not want MTD to cause
> this at all.
s/buggers/buffers/
> Probably we can mitigate this with kmalloc flags. Now, I'm not sure which
> flags are optimal, but I'd do:
>
> __GFP_NOWARN | __GFP_WAIT | __GFP_NORETRY
Of course, I meant that you should use the special flags only as long as you
are allocating more than one contiguous page. But the last PAGE_SIZE
allocation should be done with the standard GFP_KERNEL flags.
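In other words, something like this (again, just an untested sketch with a
made-up name):

static char *mtd_do_kmalloc(size_t *size)
{
        char *kbuf;

        /* Ask politely for more than one contiguous page: no warnings,
         * no hard retries, so the allocator does not start writing back
         * dirty data and shrinking caches just for our buffer. */
        while (*size > PAGE_SIZE) {
                kbuf = kmalloc(*size, __GFP_NOWARN | __GFP_WAIT |
                               __GFP_NORETRY);
                if (kbuf)
                        return kbuf;
                *size >>= 1;
        }

        /* The final single-page attempt may use the standard flags and
         * try as hard as any normal GFP_KERNEL allocation would. */
        return kmalloc(*size, GFP_KERNEL);
}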
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
On 4/4/11 12:27 AM, Artem Bityutskiy wrote:
> [CCing LKML in the hope of getting good suggestions]
> [The patch:
> http://lists.infradead.org/pipermail/linux-mtd/2011-April/034645.html]
>
> Hi Grant,
>
> Just in case, Jarkko was trying to address the same issue recently:
>
> http://lists.infradead.org/pipermail/linux-mtd/2011-March/034416.html
>
> This should be a bit more complex, I think. First of all, I think it is
> better to make this a separate function. Second, you should make sure
> the system does not print scary warnings when the allocation fails - use
> the __GFP_NOWARN flag, just like Jarkko did.
>
> And third, as I wrote in my answer to Jarkko, allocating large contiguous
> buffers is bad for performance: if the system memory is fragmented and
> there are no such large contiguous areas, the kernel will start writing
> back dirty FS data, killing FS caches, shrinking caches and buggers,
> probably even swapping out applications. We do not want MTD to cause
> this at all.
>
> Probably we can mitigate this with kmalloc flags. Now, I'm not sure which
> flags are optimal, but I'd do:
>
> __GFP_NOWARN | __GFP_WAIT | __GFP_NORETRY
>
> Maybe even the __GFP_WAIT flag could be dropped.
Artem:
Thanks for the feedback and the link to Jarkko's very similar patch. Your
suggestions will be incorporated into a subsequent patch.
For reference, I pursued a second uses-less-memory-but-is-more-complex
approach that does get_user_pages and builds up a series of iovecs for the
page extents. This worked well for all read cases I could test; however, for
the write case, the approach required yet more refinement and overhead, since
the head and tail of the transfer need to be deblocked with read-modify-write
due to the NOTALIGNED checks in nand_base.c:nand_do_write_ops. I am happy to
share the work-in-progress with the list if anyone is interested.
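Stripped down, the read side of that experiment looked roughly like the
following; the function name, the single get_user_pages_fast call, and the
error handling are illustrative only, not the actual work-in-progress:

static ssize_t mtd_read_pinned(struct mtd_info *mtd, loff_t from,
                               unsigned long uaddr, size_t count)
{
        int i, nr, nr_pages;
        struct page **pages;
        size_t retlen, total = 0;
        unsigned int off = offset_in_page(uaddr);

        nr_pages = (off + count + PAGE_SIZE - 1) >> PAGE_SHIFT;
        pages = kmalloc(nr_pages * sizeof(*pages), GFP_KERNEL);
        if (!pages)
                return -ENOMEM;

        /* Pin the user pages; write=1 because the device read stores
         * data into them. No kernel bounce buffer is needed at all. */
        nr = get_user_pages_fast(uaddr, nr_pages, 1, pages);
        if (nr < nr_pages) {
                for (i = 0; i < nr; i++)
                        page_cache_release(pages[i]);
                kfree(pages);
                return -EFAULT;
        }

        for (i = 0; i < nr_pages && count; i++) {
                size_t len = min_t(size_t, count, PAGE_SIZE - off);
                u_char *vaddr = kmap(pages[i]);
                int ret = mtd->read(mtd, from, len, &retlen, vaddr + off);

                kunmap(pages[i]);
                if (ret)
                        break;
                from += retlen;
                total += retlen;
                count -= retlen;
                off = 0;
        }

        for (i = 0; i < nr_pages; i++)
                page_cache_release(pages[i]);
        kfree(pages);

        return total ? total : -EIO;
}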
I propose a two-stage approach. This issue has been in the kernel for about
six years.
Can we take a modified version of Jarkko's or my simpler fixes for the first
pass and then iterate toward the get_user_pages scatter/gather approach
later?
Best,
Grant
On Mon, 2011-04-04 at 09:05 -0700, Grant Erickson wrote:
> I propose a two-stage approach. This issue has been in the kernel for about
> six years.
>
> Can we take a modified version of Jarkko's or my simpler fixes for the first
> pass and then iterate toward the get_user_pages scatter/gather approach
Sure, I've just commented on your patch.
--
Best Regards,
Artem Bityutskiy (Битюцкий Артём)