2003-03-13 08:31:48

by Oliver Tennert

[permalink] [raw]
Subject: Re: Kernel setup() and initrd problems



Seems some more have problems, too. It is possibly related.

> pivot_root() is the currently preferred method. Depending on where
> the initramfs is by the time Linux 2.6 comes out it may be replaced by
> then, but for 2.4, pivot_root() is the way to go.

OK, I am pretty aware of the fact that it is the DOCUMENTED way to go. But
can you tell me ONE SINGLE current distribution using the pivot_root()
call in their initrd to mount the realrootdev?

Have a look at the following linuxrc script:

<SHELLSCRIPT>

#! /bin/sh

export PATH=/bin

echo "Loading module sym53c8xx ..."
insmod sym53c8xx

echo "Loading module jbd ..."
insmod jbd

echo "Loading module ext3 ..."
insmod ext3

mount -n -t proc proc /proc
echo 0x0100 > /proc/sys/kernel/real-root-dev ## <<<---- THIS LINE IS IMPORTANT!!
mount -n -t ext3 /dev/sda4 /mnt
rm -f /mnt/.initrd 2>/dev/null
mkdir -p /mnt/.initrd
cd /mnt
pivot_root . .initrd
umount -n /.initrd/proc
exec sh -c 'umount -n /.initrd ; rmdir /.initrd ; mount -n -oremount,ro /' </dev/console >/dev/console 2>&1

</SHELLSCRIPT>

The fact is, without the "echo 0x0100 ..." line this linuxrc script WILL
NOT be able to mount your root device for kernel >=2.4.19. This is
independent of the distribution used.

So why is that?

I always thought the pivot_root() would make this echo-stuff unnecessary.

The way used by virtually all latest distributions is getting rid of the
pivot stuff altogether, leaving the loading of the modules, and that's it.

Seems totally unclear (and unclean) to me.

best regards

Oliver Tennert





Dr. Oliver Tennert

+49 -7071 -9457-598

e-mail: [email protected]
science + computing AG
Hagellocher Weg 71
D-72070 Tuebingen



2003-03-13 17:05:33

by Kai Germaschewski

[permalink] [raw]
Subject: Re: Kernel setup() and initrd problems

On Thu, 13 Mar 2003, Oliver Tennert wrote:

> Seems some more have problems, too. It is possibly related.
>
> > pivot_root() is the currently preferred method. Depending on where
> > the initramfs is by the time Linux 2.6 comes out it may be replaced by
> > then, but for 2.4, pivot_root() is the way to go.
>
> OK, I am pretty aware of the fact that it is the DOCUMENTED way to go. But
> can you tell me ONE SINGLE current distribution using the pivot_root()
> call in their initrd to mount the realrootdev?

Well, the script you attached shows one distro which does:

> [...]
> echo 0x0100 > /proc/sys/kernel/real-root-dev ## <<<---- THIS LINE IS IMPORTANT!!
> mount -n -t ext3 /dev/sda4 /mnt
> rm -f /mnt/.initrd 2>/dev/null
> mkdir -p /mnt/.initrd
> cd /mnt
> pivot_root . .initrd
> umount -n /.initrd/proc
> exec sh -c 'umount -n /.initrd ; rmdir /.initrd ; mount -n -oremount,ro /' </dev/console >/dev/console 2>&1
>
> </SHELLSCRIPT>
>
> The fact is, without the "echo 0x0100 ..." line this linuxrc script WILL
> NOT be able to mount your root device for kernel >=2.4.19. This is
> independent of the distribution used.
>
> So why is that?
>
> I always thought the pivot_root() would make this echo-stuff unnecessary.
>
> The way used by virtually all latest distributions is getting rid of the
> pivot stuff altogether, leaving the loading of the modules, and that's it.
>
> Seems totally unclear (and unclean) to me.

I agree it's a mess and for all I can tell pivot_root is not used in the
way it was originally designed.

Since I cleaned up the initrd stuff in 2.5 lately, I can at least explain
what's going on:

The kernel finds an initrd, loads that into /dev/ram, mounts that as root
and then basically fork()s and execve()s /linuxrc. So we're still using
the special initrd code, and the /linuxrc which is running has a pid != 1.
At this point, we can do preparations like loading modules and mounting
filesystems. As shown by the script above, we can also change the root
filesystem. However, we can not finish the script by just exec'ing
/sbin/init in the new root, since we're not pid 1. So instead, we need to
exit from the script (actually, the above first exec's and then the new
shells exits soon after).

Now kernel code takes control again, the "after-initrd-code" is run.
Traditionally, that means we now mount the real root device, free the
initrd mem (and also all __init mem a bit alter), and then execve()
/sbin/init, which then gets run as PID 1, and normal startup begins.

However, in the case above, we have already mounted root device, so we
don't want the kernel to mess with it. So we do echo 0x0100 >
/proc/real-root-dev, which tells the kernel that what he thinks is our
current root, /dev/ram, is the real root too, so it skips unmounting
the initrd root and mounting the real one.

I think whoever came up with that just got the idea of pivot_root wrong.
The idea was to get rid of the initrd special case. It should be possible
to do the following, though I didn't work out the details:

Tell the kernel that our root dev is /dev/ram and give it an initrd which
isn't really a classical initrd (with /linuxrc on it), but instead has a
/sbin/init which is similar to the linuxrc above.

Then, the kernel will load the image into /dev/ram, mount that as root and
exec /sbin/init, skipping the special initrd code.

Now, we have to take care of all the remaining business in /sbin/init
ourselves, i.e.

- load modules
- mount real root
- pivot root to real root
- execve /sbin/init on real root, pointing stdin/out/err to /dev/console
on the new root
- umount and free our first (ramdisk) root

--Kai


2003-03-13 17:55:01

by Kevin P. Fleming

[permalink] [raw]
Subject: Re: Kernel setup() and initrd problems

Kai Germaschewski wrote:
> I think whoever came up with that just got the idea of pivot_root wrong.
> The idea was to get rid of the initrd special case. It should be possible
> to do the following, though I didn't work out the details:
>
> Tell the kernel that our root dev is /dev/ram and give it an initrd which
> isn't really a classical initrd (with /linuxrc on it), but instead has a
> /sbin/init which is similar to the linuxrc above.
>
> Then, the kernel will load the image into /dev/ram, mount that as root and
> exec /sbin/init, skipping the special initrd code.
>
> Now, we have to take care of all the remaining business in /sbin/init
> ourselves, i.e.
>
> - load modules
> - mount real root
> - pivot root to real root
> - execve /sbin/init on real root, pointing stdin/out/err to /dev/console
> on the new root
> - umount and free our first (ramdisk) root

I have used exactly this process, and it works as you expect. In this
situation you're not really using the initrd as a "classic" initrd, it's
just a temporary root filesystem. The kernel has no idea what the real
root is going to be, and that determination isn't made until the
initrd's scripts decide what to mount and then pivot_root to it.

Much cleaner than the old way.