2002-08-27 02:45:33

by Jeff Chua

Subject: Re: [BUG] initrd >24MB corruption (fwd)


Alan,

Who else can help with this problem? I tried to write to Werner
Almesberger <[email protected]> (no such email) and Hans Lermen
<[email protected]>, but no response from either.

I'm suspecting that somehow part of initrd is being corrupted during boot
up, or maybe ungzip is not working properly, because I can definitely
gzip/ungzip the ram.gz filesystem I created on all versions of running
Linux. Again, the only difference between ram-18mb.gz (6MB) and
ram-24mb.gz (8MB) is that ram-24mb.gz contains one extra file to fill up
the filesystem to 90%.

Same bzImage, same ramdisk_size=28000, just different initrd files.
ram-18mb.gz boots, ram-24mb.gz hangs.

My gzip is version 1.3.3.

I noticed that lib/inflate.c says it is based on gzip-1.0.3.

Thanks,
Jeff
[ [email protected] ]

---------- Forwarded message ----------
Date: Tue, 27 Aug 2002 08:05:14 +0800 (SGT)
From: Jeff Chua <[email protected]>
To: Alan Cox <[email protected]>
Cc: Linux Kernel <[email protected]>
Subject: Re: [BUG] initrd >24MB corruption


On 26 Aug 2002, Alan Cox wrote:

> > RAMDISK: Compressed image found at block 0 ... then stuck!
> Force a 1K block size when you make the fs

That was the default for mke2fs.

Tried compress instead of gzip. Same problem. I guess the compressed file
is too big for the kernel. The 8MB compressed (from 24MB) didn't work. 6MB
compressed from 18MB worked. The 24MB filesystem has just one extra junk
file in /tmp to fill up the filesystem to 90% and this caused the system
to hang.

I'm thinking it could be the ungzip function in the kernel that's causing
the problem.


Jeff.



2002-08-27 02:51:56

by Erik Andersen

Subject: Re: [BUG] initrd >24MB corruption (fwd)

On Tue Aug 27, 2002 at 10:49:13AM +0800, Jeff Chua wrote:
>
> Alan,
>
> Who else can help with this problem? I tried to write to Werner
> Almesberger <[email protected]> (no such email) and Hans Lermen
> <[email protected]>, but no response from either.
>
> I'm suspecting that somehow part of initrd is being corrupted during boot
> up, or maybe ungzip is not working properly, because I can definitely
> gzip/ungzip the ram.gz filesystem I created on all versions of running
> Linux. Again, the only difference between ram-18mb.gz (6MB) and
> ram-24mb.gz (8MB) is that ram-24mb.gz contains one extra file to fill up
> the filesystem to 90%.
>
> Same bzImage, same ramdisk_size=28000, just different initrd files.
> ram-18mb.gz boots, ram-24mb.gz hangs.

How much total ram does your system have?

-Erik

--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--

2002-08-27 08:51:30

by Jeff Chua

Subject: Re: [BUG] initrd >24MB corruption (fwd)


On Mon, 26 Aug 2002, Erik Andersen wrote:

> How much total ram does your system have?

640MB.


Jeff.


2002-08-27 09:26:37

by Russell King

Subject: Re: [BUG] initrd >24MB corruption (fwd)

On Tue, Aug 27, 2002 at 04:55:00PM +0800, Jeff Chua wrote:
> On Mon, 26 Aug 2002, Erik Andersen wrote:
> > How much total ram does your system have?
>
> 640MB.

It's not that problem then. 8)

I was suspecting that the write() to the ramdisk device was hanging
(which you could confirm by printk'ing an 'i' before and an 'o' after
the write call in flush_window() in init/do_mounts.c or
drivers/block/rd.c. If you end up with 'i' as the last character, it's
the write that hangs, if it's an 'o' then it's gunzip itself.)
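
A minimal user-space sketch of the marker technique described above. It is
not the kernel code: mark() stands in for printk(), and write_fn, fake_write()
and traced_write() are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Most recent marker emitted; '\0' until the first one. */
static char last;

/* Stand-in for printk("i") / printk("o"): print the marker and remember it. */
void mark(char c)
{
    last = c;
    fputc(c, stderr);
}

char last_marker(void)
{
    return last;
}

/* Hypothetical signature standing in for the ramdisk write call. */
typedef long (*write_fn)(const void *buf, unsigned long len);

/* Hypothetical write that always succeeds; used only to exercise the sketch. */
long fake_write(const void *buf, unsigned long len)
{
    (void)buf;
    return (long)len;
}

/* Bracket the write with markers so a hang can be attributed. */
long traced_write(write_fn do_write, const void *buf, unsigned long len)
{
    long n;

    assert(do_write != NULL);
    mark('i');                  /* about to enter the write */
    n = do_write(buf, len);
    mark('o');                  /* the write returned */
    return n;
}
```

If the output stops after an 'i', the write never returned; if it stops after
an 'o', the hang is somewhere after the write, in the decompressor.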

The other possibility is this little critter:

static int __init fill_inbuf(void)
{
        if (exit_code) return -1;

        insize = read(crd_infd, inbuf, INBUFSIZ);
        if (insize == 0) return -1;

        inptr = 1;

        return inbuf[0];
}

You could put printks in the paths that return -1 and see if you're
hitting any of those.
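
A user-space sketch of that instrumentation, under stated assumptions:
fprintf(stderr, ...) stands in for printk(), and the hypothetical helpers
set_source() and read_source() replace read(crd_infd, ...) with an in-memory
buffer so the function can be run outside the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define INBUFSIZ 4096

static unsigned char inbuf[INBUFSIZ];
static unsigned insize;   /* valid bytes in inbuf */
static unsigned inptr;    /* index of the next byte to hand out */
static int exit_code;     /* set elsewhere on error; stays 0 in this sketch */

/* Hypothetical in-memory input replacing the initrd file descriptor. */
static const unsigned char *src;
static size_t src_len, src_pos;

void set_source(const unsigned char *data, size_t len)
{
    src = data;
    src_len = len;
    src_pos = 0;
}

/* Copy up to max bytes from the source; returns 0 at end of input. */
static size_t read_source(unsigned char *dst, size_t max)
{
    size_t n = src_len - src_pos;

    if (n > max)
        n = max;
    if (n)
        memcpy(dst, src + src_pos, n);
    src_pos += n;
    return n;
}

int fill_inbuf(void)
{
    if (exit_code) {
        fprintf(stderr, "fill_inbuf: exit_code set, returning -1\n");
        return -1;
    }

    insize = (unsigned)read_source(inbuf, INBUFSIZ);
    assert(insize <= INBUFSIZ);
    if (insize == 0) {
        fprintf(stderr, "fill_inbuf: read returned 0, returning -1\n");
        return -1;
    }

    inptr = 1;
    return inbuf[0];
}
```

With a two-byte source, the first call returns the first byte and the second
call takes the "read returned 0" path and logs it.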

However, returning '-1' is asking for trouble. When I was looking at
how to handle the "out of space" in the ramdisk issue, I found that
there appears to be NO value that fill_inbuf() can return that will
guarantee to terminate inflate.c at an arbitrary point in the
decompression.

In gzip, we abort the program on error. In the kernel, we don't
have that luxury. (Luckily initramfs uses a cut-down gzip to do
the decompression, which can exit on error.)

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2002-08-28 05:34:16

by Jeff Chua

Subject: Re: [BUG] initrd >24MB corruption (fwd)


On Tue, 27 Aug 2002, Russell King wrote:

> The other possibility is this little critter:
>
> static int __init fill_inbuf(void)
> {
>         if (exit_code) return -1;
>
>         insize = read(crd_infd, inbuf, INBUFSIZ);
>         if (insize == 0) return -1;
>
>         inptr = 1;
>
>         return inbuf[0];
> }
>
> You could put printks in the paths that return -1 and see if you're
> hitting any of those.
>
> However, returning '-1' is asking for trouble. When I was looking at
> how to handle the "out of space" in the ramdisk issue, I found that
> there appears to be NO value that fill_inbuf() can return that will
> guarantee to terminate inflate.c at an arbitrary point in the
> decompression.
>
> In gzip, we abort the program on error. In the kernel, we don't
> have that luxury. (Luckily initramfs uses a cut-down gzip to do
> the decompression, which can exit on error.)


What I found when I ran ram18.gz and ram24.gz was ...

ram18.gz has 18MB filled of available 28MB [6MB gzipped]

ram24.gz has 24MB filled of available 28MB [8MB gzipped]

In the case of ram24.gz, "inflate_stored" from lib/inflate.c was called,
but for ram18.gz this function was never called ... meaning the gzipped
image was corrupted during boot up for ram24.gz. Where ... I don't know.


2002-08-28 05:55:29

by Jeff Chua

Subject: Re: [BUG] initrd >24MB corruption (fwd)


On Tue, 27 Aug 2002, Russell King wrote:

> I was suspecting that the write() to the ramdisk device was hanging
> (which you could confirm by printk'ing an 'i' before and an 'o' after
> the write call in flush_window() in init/do_mounts.c or
> drivers/block/rd.c. If you end up with 'i' as the last character, it's
> the write that hangs, if it's an 'o' then it's gunzip itself.)

The last character is an "o". After that, "inflate_stored" was called ...
meaning the kernel doesn't think the data is compressed anymore. Looks
like somewhere the code can't handle this big chunk of data (8MB), whereas
the 6MB was ok.
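
For context, a small sketch of how a deflate block header selects the decode
path. Each block starts with a 1-bit "final" flag and a 2-bit type, and type 0
is the stored (uncompressed) path. The helper names are hypothetical, and the
sketch assumes the header sits in the low bits of a byte, as at the start of a
stream; inside a stream the bits are read at whatever position the decoder has
reached.

```c
#include <assert.h>

/* Block types as defined by the deflate format. */
enum btype {
    BTYPE_STORED   = 0,   /* raw bytes, no compression */
    BTYPE_FIXED    = 1,   /* fixed Huffman codes */
    BTYPE_DYNAMIC  = 2,   /* dynamic Huffman codes */
    BTYPE_RESERVED = 3    /* invalid */
};

/* Bit 0 of the header: set on the last block of the stream. */
int block_is_final(unsigned char hdr)
{
    return hdr & 1;
}

/* Bits 1-2 of the header: the block type. */
int block_type(unsigned char hdr)
{
    return (hdr >> 1) & 3;
}

/* Name of the inflate routine that handles each block type. */
const char *block_handler(int btype)
{
    assert(btype >= 0 && btype <= 3);
    switch (btype) {
    case BTYPE_STORED:  return "inflate_stored";
    case BTYPE_FIXED:   return "inflate_fixed";
    case BTYPE_DYNAMIC: return "inflate_dynamic";
    default:            return "(bad block type)";
    }
}
```

Stored blocks are legal in a valid stream, so reaching inflate_stored is not
proof of corruption by itself. But if the decoder is fed damaged data, the
type bits are effectively arbitrary and can select the stored path by chance.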

Jeff



2002-08-30 07:09:12

by Jeff Chua

Subject: Re: [BUG] initrd >24MB corruption (fwd)


On Wed, 28 Aug 2002, Jeff Chua wrote:

from linux/arch/i386/mm/init.c

/*
 * paging_init() sets up the page tables - note that the first 8MB are
 * already mapped by head.S.
 *
 * This routines also unmaps the page at virtual kernel address 0, so
 * that we can trap those pesky NULL-reference errors in the kernel.
 */

#if CONFIG_X86_PAE
        /*
         * We will bail out later - printk doesnt work right now so
         * the user would just see a hanging kernel.
         */
        if (cpu_has_pae)
                set_in_cr4(X86_CR4_PAE);
#endif



... does that mean the gzipped fs can only be <8MB? That could explain why
the ram6MB.gz worked and ram8MB.gz doesn't.


Jeff


2002-09-14 02:57:16

by Werner Almesberger

Subject: Re: [BUG] initrd >24MB corruption (fwd)

Jeff Chua wrote:
> Who else can help with this problem? I tried to write to Werner
> Almesberger <[email protected]> (no such email)

That one is gone. [email protected] should work for the
foreseeable future.

> I'm suspecting that somehow part of initrd is being corrupted during boot

The initrd is typically loaded below 16 MB. Your bzImage
uncompresses after the loaded kernel, so if your kernel is, say,
3 MB and compresses to 1 MB (that's a reasonably lean 2.4.19 kernel),
up to about 4.5 MB are overwritten already when getting the kernel
in place. A 6 MB/2 MB kernel would happily scribble over ~8.5 MB.
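
The arithmetic behind those two figures can be sketched as follows. This is
only an illustration under loud assumptions: the fixed 0.5 MB overhead is
inferred from the two examples above (3 + 1 -> 4.5, 6 + 2 -> 8.5) rather than
taken from the boot code, and the initrd is assumed to be placed so that it
ends exactly at the 16 MB limit. The function names are hypothetical.

```c
#include <assert.h>

/* ASSUMPTION: fixed overhead inferred from the two figures in the text. */
#define ASSUMED_OVERHEAD_KB 512u

/* Approximate extent of memory written while getting the kernel in place. */
unsigned scribbled_kb(unsigned kernel_kb, unsigned bzimage_kb)
{
    return kernel_kb + bzimage_kb + ASSUMED_OVERHEAD_KB;
}

/*
 * ASSUMPTION: the initrd occupies [limit_kb - initrd_kb, limit_kb).
 * Returns 1 if the scribbled region reaches into it, else 0.
 */
int initrd_overlaps(unsigned scribbled, unsigned initrd_kb, unsigned limit_kb)
{
    assert(initrd_kb <= limit_kb);
    return scribbled > limit_kb - initrd_kb;
}
```

Under these assumptions, a 6 MB/2 MB kernel (about 8.5 MB scribbled) would
reach into an 8 MB compressed initrd occupying 8 MB to 16 MB, but not into a
6 MB one occupying 10 MB to 16 MB. The actual kernel size in this report is
not stated, so this shows only that the explanation is plausible.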

See also figures 7 and 8 of
http://www.almesberger.net/cv/papers/ols2k-9.ps

> ... does that mean the gzipped fs can only be <8MB? That could explain why
> the ram6MB.gz worked and ram8MB.gz doesn't.

The 8 MB mapping affects mainly the maximum kernel size and
shouldn't matter in this case. If you want to try anyway, you
should be able to increase the mapping by pushing
arch/i386/kernel/head.S:empty_zero_page down by a page, and
adjusting the .org below too.

So, assuming the problem is indeed the kernel overwriting initrd,
there are three things you can do to avoid this:

- use a smaller initrd (they were never meant to be quite
*that* big anyway :-)
- make your kernel smaller
- get your boot loader to load the initrd at a higher
address, or find a boot loader that does (no, I don't
know which ones do, and whether they do this reliably.
Section 2.5 of my booting paper (see above) explains
some potential pitfalls.)

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2002-09-14 03:31:33

by Jeff Chua

Subject: Re: [BUG] initrd >24MB corruption (fwd)

On Sat, 14 Sep 2002, Werner Almesberger wrote:

> So, assuming the problem is indeed the kernel overwriting initrd,
> there are three things you can do to avoid this:
>
> - use a smaller initrd (they were never meant to be quite
> *that* big anyway :-)

First, thanks for replying.

Now, I used "strip" to strip everything including /lib/lib* and managed to
reduce the image from 24MB to 12MB uncompressed (8MB to 5MB compressed), and
avoided the booting problem. Stripping /lib/lib*.so* was the answer!

Thanks,
Jeff.