When an initrd (compressed or not) is used, the kernel reports data
corruption on /dev/ram0.
The root cause:
During initramfs checking, if the image turns out to be an initrd, it is
copied to /initrd.image with sys_write.
sys_write can only write MAX_RW_COUNT (2G-4K) bytes per call, so if the
initrd in RAM is larger than that, /initrd.image ends up incomplete.
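For reference, the limit comes from MAX_RW_COUNT in include/linux/fs.h,
defined as (INT_MAX & PAGE_MASK): larger reads/writes get silently
truncated to that size. A minimal userspace sketch (assuming a 4K page
size) that just prints the value:

#include <limits.h>
#include <stdio.h>

/* Mirror of the kernel's MAX_RW_COUNT, i.e. (INT_MAX & PAGE_MASK),
 * assuming PAGE_SIZE is 4096.  A single sys_write() is truncated to
 * this many bytes, which is why a >2G initrd cannot be copied with
 * one call.
 */
#define PAGE_SIZE	4096L
#define PAGE_MASK	(~(PAGE_SIZE - 1))
#define MAX_RW_COUNT	((long)INT_MAX & PAGE_MASK)

int main(void)
{
	printf("MAX_RW_COUNT = %ld bytes\n", MAX_RW_COUNT); /* 2147479552 = 2G - 4K */
	return 0;
}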
Add a local sys_write_large() that loops over sys_write() to work around
the problem.
Also use it in the write_buffer/do_copy path, for the case where the
image is an uncompressed cpio containing a single file larger than 2G.
At the same time, we don't need to worry about sys_read/sys_write in
do_mounts_rd.c::crd_load: the decompressors there go through fill/flush
callbacks, so they allocate their own buffers, and those buffers are far
smaller than 2G.
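Conceptually, the crd_load() callbacks look like this (a simplified
sketch, not the exact do_mounts_rd.c code); each call only ever moves a
buffer of at most a few tens of KiB:

/* Simplified fill/flush pattern used by crd_load(): the decompressor
 * pulls input and pushes output in small, decompressor-sized chunks
 * (e.g. a 32K gzip window), so every sys_read()/sys_write() here is
 * far below MAX_RW_COUNT.
 */
static int fill(void *buf, unsigned int len)
{
	return sys_read(crd_infd, buf, len);	/* len is a few KiB */
}

static int flush(void *window, unsigned int outcnt)
{
	sys_write(crd_outfd, window, outcnt);	/* outcnt <= window size */
	return outcnt;
}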
Tested with an uncompressed initrd, and with initrds compressed with
gzip, bzip2, lzma, xz, and lzo.
Signed-off-by: Yinghai Lu <[email protected]>
---
init/initramfs.c | 33 +++++++++++++++++++++++++++++----
1 file changed, 29 insertions(+), 4 deletions(-)
Index: linux-2.6/init/initramfs.c
===================================================================
--- linux-2.6.orig/init/initramfs.c
+++ linux-2.6/init/initramfs.c
@@ -19,6 +19,26 @@
#include <linux/syscalls.h>
#include <linux/utime.h>
+static long __init sys_write_large(unsigned int fd, char *p,
+ size_t count)
+{
+ ssize_t left = count;
+ long written = 0;
+
+ /* sys_write can only write MAX_RW_COUNT (2G-4K) bytes per call */
+ while (left > 0) {
+ written = sys_write(fd, p, left);
+
+ if (written <= 0)
+ break;
+
+ left -= written;
+ p += written;
+ }
+
+ return (written < 0) ? written : count - left;
+}
+
static __initdata char *message;
static void __init error(char *x)
{
@@ -346,7 +366,7 @@ static int __init do_name(void)
static int __init do_copy(void)
{
if (count >= body_len) {
- sys_write(wfd, victim, body_len);
+ sys_write_large(wfd, victim, body_len);
sys_close(wfd);
do_utime(vcollected, mtime);
kfree(vcollected);
@@ -354,7 +374,7 @@ static int __init do_copy(void)
state = SkipIt;
return 0;
} else {
- sys_write(wfd, victim, count);
+ sys_write_large(wfd, victim, count);
body_len -= count;
eat(count);
return 1;
@@ -604,8 +624,13 @@ static int __init populate_rootfs(void)
fd = sys_open("/initrd.image",
O_WRONLY|O_CREAT, 0700);
if (fd >= 0) {
- sys_write(fd, (char *)initrd_start,
- initrd_end - initrd_start);
+ long written = sys_write_large(fd, (char *)initrd_start,
+ initrd_end - initrd_start);
+
+ if (written != initrd_end - initrd_start)
+ pr_err("/initrd.image: incomplete write (%ld != %ld)\n",
+ written, initrd_end - initrd_start);
+
sys_close(fd);
free_initrd();
}
On 06/19/2014 07:12 PM, Yinghai Lu wrote:
> When initrd (compressed or not) is used, kernel report data corrupted
> with /dev/ram0.
>
> The root cause:
> During initramfs checking, if it is initrd, it will be transferred to
> /initrd.image with sys_write.
> sys_write only support 2G-4K write, so if the initrd ram is more than
> that, /initrd.image will not complete at all.
>
> Add local sys_write_large to loop calling sys_write to workaround the
> problem.
>
> Also need to use that in write_buffer path for cpio that have file is
> more than file.
That sentence doesn't make sense.
> At the same time, we don't need to worry about sys_read/sys_write in
> do_mounts_rd.c::crd_load. As decompressor will have fill/flush that
> means it will allocate buffer and buffer is smaller than 2G.
>
> Test with uncompressed initrd, and compressed with gz, bz2, lzma,xz,
> lzop.
>
> Signed-off-by: Yinghai Lu <[email protected]>
I would call this function xwrite(), which is what this kind of helper
is usually called in userspace.
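For illustration, the renamed helper might look something like this (a
sketch only, with -EINTR/-EAGAIN retries added; not necessarily what
will be merged):

static ssize_t __init xwrite(int fd, const char *p, size_t count)
{
	ssize_t out = 0;

	/* sys_write can only handle MAX_RW_COUNT (2G-4K) bytes per call */
	while (count) {
		ssize_t rv = sys_write(fd, p, count);

		if (rv < 0) {
			if (rv == -EINTR || rv == -EAGAIN)
				continue;
			return out ? out : rv;	/* first hard error */
		} else if (rv == 0)
			break;

		p += rv;
		out += rv;
		count -= rv;
	}

	return out;	/* callers compare this against the requested size */
}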
It would be nice, in order to support very large initrds/initramfs
images, to free the source memory as it becomes available instead of
requiring two copies of the data in memory at the same time.
Otherwise,
Acked-by: H. Peter Anvin <[email protected]>
-hpa
On Thu, Jun 19, 2014 at 9:29 PM, H. Peter Anvin <[email protected]> wrote:
> On 06/19/2014 07:12 PM, Yinghai Lu wrote:
>>
>> Also need to use that in write_buffer path for cpio that have file is
>> more than file.
>
> That sentence doesn't make sense.
I mean this path:
unpack_to_rootfs ===> write_buffer ===> actions[].../do_copy
where the image is an uncompressed cpio and that cpio contains one big
file (>2G).
>
>
> I would call this function xwrite(), which is usually called in userspace.
Good, will change that.
>
> It would be nice in order to support very large initrd/initramfs, to
> free the memory as it becomes available instead of requiring two copies
> of the data in memory at the same time.
For initramfs, the content goes from ramdisk_image/ramdisk_size into
tmpfs directly, and the ramdisk_image/ramdisk_size memory gets freed.
For initrd, it is first copied to /initrd.image in tmpfs and the
ramdisk_image/ramdisk_size memory gets freed; at the end /initrd.image
is decompressed/copied to /dev/ram0 and then removed from tmpfs.
So what do you mean "free the memory"?
Thanks
Yinghai
On 06/19/2014 10:02 PM, Yinghai Lu wrote:
> On Thu, Jun 19, 2014 at 9:29 PM, H. Peter Anvin <[email protected]> wrote:
>> On 06/19/2014 07:12 PM, Yinghai Lu wrote:
>>>
>>> Also need to use that in write_buffer path for cpio that have file is
>>> more than file.
>>
>> That sentence doesn't make sense.
>
> I mean this path:
> unpack_to_rootfs ===> write_buffer ===> actions[].../do_copy
> and image is uncompressed cpio, and there is one big file (>2G) in that cpio.
Don't tell me, make the description clear so someone can understand it
10 years from now.
>>
>> It would be nice in order to support very large initrd/initramfs, to
>> free the memory as it becomes available instead of requiring two copies
>> of the data in memory at the same time.
>
> for initramfs, it is from ramdisk_image/ramdisk_size to tmpfs directly.
> and ramdisk_image/ramdisk_size get freed.
>
> for initrd, it is transferred to /initrd.image in tmpfs at first, and
> ramdisk_image/ramdisk_size
> get freed, at last /initrd.image is decompressed/copied to /dev/ram0
> and get removed
> from tempfs.
>
> So what do you mean "free the memory"?
>
For each of those transfers, we don't free the source memory until the
very end. We could free that memory as we process the input, requiring
less total memory.
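A rough sketch of the idea (hypothetical free_initrd_range() helper,
chunk size arbitrary; not part of the posted patch): copy the initrd in
chunks and release each source chunk as soon as it has been written, so
the two full copies never coexist:

#define COPY_CHUNK	(16 << 20)	/* 16 MiB per step, arbitrary */

static int __init copy_and_free_initrd(int fd)
{
	unsigned long pos = initrd_start;

	while (pos < initrd_end) {
		size_t n = min_t(size_t, COPY_CHUNK, initrd_end - pos);

		if (xwrite(fd, (char *)pos, n) != n)
			return -EIO;

		/* hypothetical helper: give back the just-written source pages */
		free_initrd_range(pos, pos + n);
		pos += n;
	}

	return 0;
}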
-hpa
On Thu, Jun 19, 2014 at 10:07 PM, H. Peter Anvin <[email protected]> wrote:
>
> For each of those transfers, we don't free the source memory until the
> very end. We could free that memory as we process the input, requiring
> less total memory.
Yes, that would be a nice enhancement.
Yinghai