2014-06-20 02:12:39

by Yinghai Lu

Subject: [PATCH] initramfs: Support initrd that is bigger than 2G.

When an initrd (compressed or not) bigger than 2G is used, the kernel
reports data corruption on /dev/ram0.

The root cause:
During initramfs checking, if the image is an initrd, it is transferred
to /initrd.image with sys_write.
sys_write only supports writes of up to 2G-4K bytes, so if the initrd is
larger than that, /initrd.image will be incomplete.

Add a local sys_write_large that calls sys_write in a loop to work
around the problem.

Also need to use that in write_buffer path for cpio that have file is
more than file.

At the same time, we don't need to worry about sys_read/sys_write in
do_mounts_rd.c::crd_load. The decompressor uses fill/flush, which means
it allocates its own buffer, and that buffer is smaller than 2G.

Tested with an uncompressed initrd and with initrds compressed with gz,
bz2, lzma, xz, and lzop.

Signed-off-by: Yinghai Lu <[email protected]>

---
init/initramfs.c | 33 +++++++++++++++++++++++++++++----
1 file changed, 29 insertions(+), 4 deletions(-)

Index: linux-2.6/init/initramfs.c
===================================================================
--- linux-2.6.orig/init/initramfs.c
+++ linux-2.6/init/initramfs.c
@@ -19,6 +19,26 @@
 #include <linux/syscalls.h>
 #include <linux/utime.h>
 
+static long __init sys_write_large(unsigned int fd, char *p,
+				   size_t count)
+{
+	ssize_t left = count;
+	long written;
+
+	/* sys_write only can write MAX_RW_COUNT aka 2G-4K bytes at most */
+	while (left > 0) {
+		written = sys_write(fd, p, left);
+
+		if (written <= 0)
+			break;
+
+		left -= written;
+		p += written;
+	}
+
+	return (written < 0) ? written : count;
+}
+
 static __initdata char *message;
 static void __init error(char *x)
 {
@@ -346,7 +366,7 @@ static int __init do_name(void)
 static int __init do_copy(void)
 {
 	if (count >= body_len) {
-		sys_write(wfd, victim, body_len);
+		sys_write_large(wfd, victim, body_len);
 		sys_close(wfd);
 		do_utime(vcollected, mtime);
 		kfree(vcollected);
@@ -354,7 +374,7 @@ static int __init do_copy(void)
 		state = SkipIt;
 		return 0;
 	} else {
-		sys_write(wfd, victim, count);
+		sys_write_large(wfd, victim, count);
 		body_len -= count;
 		eat(count);
 		return 1;
@@ -604,8 +624,13 @@ static int __init populate_rootfs(void)
 		fd = sys_open("/initrd.image",
 			      O_WRONLY|O_CREAT, 0700);
 		if (fd >= 0) {
-			sys_write(fd, (char *)initrd_start,
-					initrd_end - initrd_start);
+			long written = sys_write_large(fd, (char *)initrd_start,
+						initrd_end - initrd_start);
+
+			if (written != initrd_end - initrd_start)
+				pr_err("/initrd.image: incomplete write (%ld != %ld)\n",
+				       written, initrd_end - initrd_start);
+
 			sys_close(fd);
 			free_initrd();
 		}
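
For reference, the 2G-4K figure in the description comes from the
kernel's per-call cap, MAX_RW_COUNT, which is defined as
INT_MAX & PAGE_MASK. Below is a minimal userspace sketch of that
arithmetic, assuming 4 KiB pages. The SKETCH_* macros and
calls_needed() are hypothetical names for illustration only, not kernel
identifiers.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Same shape as the kernel's MAX_RW_COUNT (INT_MAX & PAGE_MASK). */
#define SKETCH_MAX_RW_COUNT ((unsigned long long)INT_MAX & SKETCH_PAGE_MASK)

/*
 * Minimum number of write calls needed for len bytes when each call
 * can transfer at most SKETCH_MAX_RW_COUNT bytes.
 */
static unsigned long long calls_needed(unsigned long long len)
{
	/* With 4 KiB pages the cap is exactly 2 GiB minus 4 KiB. */
	assert(SKETCH_MAX_RW_COUNT == (1ULL << 31) - 4096);

	return (len + SKETCH_MAX_RW_COUNT - 1) / SKETCH_MAX_RW_COUNT;
}
```

Under these assumptions a 3 GiB initrd needs at least two calls, so a
single sys_write leaves /initrd.image short.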


2014-06-20 04:30:11

by H. Peter Anvin

Subject: Re: [PATCH] initramfs: Support initrd that is bigger than 2G.

On 06/19/2014 07:12 PM, Yinghai Lu wrote:
> When an initrd (compressed or not) bigger than 2G is used, the kernel
> reports data corruption on /dev/ram0.
>
> The root cause:
> During initramfs checking, if the image is an initrd, it is transferred
> to /initrd.image with sys_write.
> sys_write only supports writes of up to 2G-4K bytes, so if the initrd is
> larger than that, /initrd.image will be incomplete.
>
> Add a local sys_write_large that calls sys_write in a loop to work
> around the problem.
>
> Also need to use that in write_buffer path for cpio that have file is
> more than file.

That sentence doesn't make sense.

> At the same time, we don't need to worry about sys_read/sys_write in
> do_mounts_rd.c::crd_load. The decompressor uses fill/flush, which means
> it allocates its own buffer, and that buffer is smaller than 2G.
>
> Tested with an uncompressed initrd and with initrds compressed with gz,
> bz2, lzma, xz, and lzop.
>
> Signed-off-by: Yinghai Lu <[email protected]>

I would call this function xwrite(), which is what it is usually called
in userspace.

It would be nice in order to support very large initrd/initramfs, to
free the memory as it becomes available instead of requiring two copies
of the data in memory at the same time.

Otherwise,

Acked-by: H. Peter Anvin <[email protected]>

-hpa
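
As an illustration of the xwrite() idea, here is a minimal userspace
sketch of a write loop. It is an assumption about what such a helper
could look like, not the code from the patch or from any particular
tool. Unlike sys_write_large as posted, it reports the bytes actually
written when a call makes no progress, and it never reads an
uninitialized result when count is 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Keep calling write() until all count bytes are written, an error
 * occurs, or a call makes no progress.
 * Returns the number of bytes written, or -1 if the first call fails.
 */
static ssize_t xwrite(int fd, const char *p, size_t count)
{
	size_t done = 0;

	assert(p != NULL || count == 0);

	while (done < count) {
		ssize_t rv = write(fd, p + done, count - done);

		if (rv < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: retry */
			/* report partial progress if there was any */
			return done ? (ssize_t)done : -1;
		}
		if (rv == 0)
			break;			/* no progress: stop */

		done += (size_t)rv;
	}

	return (ssize_t)done;
}
```

A caller can compare the return value with count to detect a short
write, which is what the populate_rootfs hunk in the patch does with
pr_err.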

2014-06-20 05:02:54

by Yinghai Lu

Subject: Re: [PATCH] initramfs: Support initrd that is bigger than 2G.

On Thu, Jun 19, 2014 at 9:29 PM, H. Peter Anvin <[email protected]> wrote:
> On 06/19/2014 07:12 PM, Yinghai Lu wrote:
>>
>> Also need to use that in write_buffer path for cpio that have file is
>> more than file.
>
> That sentence doesn't make sense.

I mean this path:
unpack_to_rootfs ===> write_buffer ===> actions[].../do_copy
where the image is an uncompressed cpio that contains one big file (>2G).


>
>
> I would call this function xwrite(), which is what it is usually called
> in userspace.

Good, will change that.

>
> It would be nice in order to support very large initrd/initramfs, to
> free the memory as it becomes available instead of requiring two copies
> of the data in memory at the same time.

For initramfs, the data goes from ramdisk_image/ramdisk_size to tmpfs
directly, and then ramdisk_image/ramdisk_size is freed.

For initrd, the data is first transferred to /initrd.image in tmpfs, and
ramdisk_image/ramdisk_size is freed. Finally, /initrd.image is
decompressed/copied to /dev/ram0 and removed from tmpfs.

So what do you mean by "free the memory"?

Thanks

Yinghai

2014-06-20 05:07:21

by H. Peter Anvin

Subject: Re: [PATCH] initramfs: Support initrd that is bigger than 2G.

On 06/19/2014 10:02 PM, Yinghai Lu wrote:
> On Thu, Jun 19, 2014 at 9:29 PM, H. Peter Anvin <[email protected]> wrote:
>> On 06/19/2014 07:12 PM, Yinghai Lu wrote:
>>>
>>> Also need to use that in write_buffer path for cpio that have file is
>>> more than file.
>>
>> That sentence doesn't make sense.
>
> I mean this path:
> unpack_to_rootfs ===> write_buffer ===> actions[].../do_copy
> where the image is an uncompressed cpio that contains one big file (>2G).

Don't tell me, make the description clear so someone can understand it
10 years from now.
>>
>> It would be nice in order to support very large initrd/initramfs, to
>> free the memory as it becomes available instead of requiring two copies
>> of the data in memory at the same time.
>
> For initramfs, the data goes from ramdisk_image/ramdisk_size to tmpfs
> directly, and then ramdisk_image/ramdisk_size is freed.
>
> For initrd, the data is first transferred to /initrd.image in tmpfs, and
> ramdisk_image/ramdisk_size is freed. Finally, /initrd.image is
> decompressed/copied to /dev/ram0 and removed from tmpfs.
>
> So what do you mean by "free the memory"?
>

For each of those transfers, we don't free the source memory until the
very end. We could free that memory as we process the input, requiring
less total memory.

-hpa
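
To make the memory argument concrete, here is a rough accounting model
in C. It is only a sketch, not kernel code: peak_bytes() and its
parameters are hypothetical, and it assumes the destination grows by one
chunk at a time while the source is either kept whole or released chunk
by chunk.

```c
#include <assert.h>

/*
 * Peak number of bytes held at once while copying total bytes from a
 * source to a destination in pieces of at most chunk bytes.
 * If free_as_we_go is non-zero, each source chunk is released right
 * after it has been copied; otherwise the source is kept until the end.
 */
static unsigned long long peak_bytes(unsigned long long total,
				     unsigned long long chunk,
				     int free_as_we_go)
{
	unsigned long long src_live = total;	/* source bytes still held */
	unsigned long long dst_live = 0;	/* destination bytes written */
	unsigned long long peak = total;

	assert(chunk > 0);

	while (dst_live < total) {
		unsigned long long n = total - dst_live;

		if (n > chunk)
			n = chunk;

		dst_live += n;		/* copy one chunk to the destination */
		if (src_live + dst_live > peak)
			peak = src_live + dst_live;
		if (free_as_we_go)
			src_live -= n;	/* release the chunk just copied */
	}

	return peak;
}
```

In this model, copying 3 GiB in 1 MiB chunks peaks at 6 GiB when the
source is kept until the end, and at 3 GiB plus one chunk when it is
released incrementally. The 1 MiB chunk size is an arbitrary choice for
the example.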

2014-06-20 16:03:56

by Yinghai Lu

Subject: Re: [PATCH] initramfs: Support initrd that is bigger than 2G.

On Thu, Jun 19, 2014 at 10:07 PM, H. Peter Anvin <[email protected]> wrote:
>
> For each of those transfers, we don't free the source memory until the
> very end. We could free that memory as we process the input, requiring
> less total memory.

Yes, that would be a nice enhancement.

Yinghai