As davej started talking about a few months ago at Kernel Summit and
LPC, there's a lot of duplication between distros on the tools used to
generate the initramfs as well as the contents and how the initramfs
works. Ultimately, there's little reason for this not to be something
that is shared and worked on by everyone. Added to this is the fact
that everyone's infrastructures for this have grown up over a long-ish
period of time without significant amounts of reworking for the way that
the kernel and early boot works these days.
Therefore I've started on a new project, dracut, to try to be a new
initramfs tool that can be used across various distributions. From the
README...
Unlike existing initramfs's, this is an attempt at having as little as
possible hard-coded into the initramfs. The initramfs has
(basically) one purpose in life -- getting the rootfs mounted so that we
can transition to the real rootfs. This is all driven off of device
availability. Therefore, instead of scripts hard-coded to do various
things, we depend on udev to create device nodes for us and then when we
have the rootfs's device node, we mount and carry on. This helps to keep
the time required in the initramfs as little as possible so that things
like a 5 second boot aren't made impossible as a result of the very
existence of an initramfs. It's likely that we'll grow some hooks for
running arbitrary commands in the flow of the script, but it's worth
trying to resist the urge as much as we can as hooks are guaranteed to
be the path to slow-down.
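A minimal sketch of the device-driven flow described above, written in Python purely for illustration and not taken from dracut's code. The event list stands in for the device nodes udev would create; `wait_for_root` and the device names are hypothetical.

```python
from typing import Iterable, Optional


def wait_for_root(events: Iterable[str], root_device: str) -> Optional[int]:
    """Consume device events until the root device shows up.

    Returns the number of events consumed, or None if the root device
    never appeared. Nothing is hard-coded about *how* the device came
    to exist; the loop only reacts to availability.
    """
    for count, device in enumerate(events, start=1):
        if device == root_device:
            # A real initramfs would mount the rootfs and switch to it here.
            return count
    return None


# Hypothetical sequence of device nodes appearing during boot.
events = ["/dev/sda", "/dev/sda1", "/dev/sda2", "/dev/sdb"]
print(wait_for_root(events, "/dev/sda2"))
print(wait_for_root(events, "/dev/md0"))
```

The point of the sketch is that no step runs on a timer or in a fixed script order: the only trigger is the root device becoming available.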
Also, there is an attempt to keep things as distribution-agnostic as
possible. Every distribution has their own tool here and it's not
something which is really interesting to have separate across them. So
contributions to help decrease the distro-dependencies are welcome.
The git tree can be found at git://fedorapeople.org/~katzj/dracut.git
for now. See the TODO file for things which still need to be done and
HACKING for some instructions on how to get started using the tool.
There is also a mailing list that is being used for the discussion --
[email protected].
Currently, there are a few Fedora-isms which have crept in just as a
result of it being the shortest path to solving some problems, but I'm
actively trying to get those out sooner rather than later as well as
getting to where I'm using it to boot my laptop.
Comments and discussion welcome
Jeremy
* Seewer Philippe [2008-12-19 08:41]:
>
> Hannes Reinecke wrote:
> [snip]
> > If anyone is interested I can give a short overview of it.
> Please do so, would be appreciated.
A good start is the manual page in section 5:
http://git.opensuse.org/?p=projects/mkinitrd.git;a=blob_plain;f=man/mkinitrd.5.txt;hb=7583c3cc047edc3e8f1a06e8b7925bd27ac0228c
(The git.kernel.org and the opensuse.org repos are basically the same,
we just switched to opensuse.org after the internal maintainership has
been transferred from Hannes to myself because it was easier to add
new users there. The opensuse.org git repo site didn't exist when the
kernel.org mkinitrd repo was created.)
Anyway:
The basic idea is to have most stuff not in the main 'mkinitrd' script
but in modules. Each module has (normally) a setup script part that is
executed when the initrd is created, and a boot part that is executed
when the initrd is running. For example, NFS root is in the 'nfs-util'
package, not in 'mkinitrd'. Same for iSCSI. Or the kdump part is
not in the main mkinitrd but in our 'kdump' package [1]. So the main
initrd package is quite small but still very flexible.
It's also flexible enough to use Busybox as a module that resides in the
'busybox' package and can then be enabled with -F busybox (feature)
when building the initrd.
Only the documentation is a bit weak at the moment; one has to dig
some things out of the sources when writing new modules. But that's easy
to fix. :-)
Hannes may explain more ...
Bernhard
[1] hg clone http://freehg.org/u/bwalle/kdump/
Hannes Reinecke wrote:
[snip]
> If anyone is interested I can give a short overview of it.
Please do so, would be appreciated.
Cheers,
Philippe
Hi all,
Bernhard Walle wrote:
> * Seewer Philippe [2008-12-19 08:41]:
>> Hannes Reinecke wrote:
>> [snip]
>>> If anyone is interested I can give a short overview of it.
>> Please do so, would be appreciated.
>
> A good start is the manual page in section 5:
> http://git.opensuse.org/?p=projects/mkinitrd.git;a=blob_plain;f=man/mkinitrd.5.txt;hb=7583c3cc047edc3e8f1a06e8b7925bd27ac0228c
>
> (The git.kernel.org and the opensuse.org repos are basically the same,
> we just switched to opensuse.org after the internal maintainership has
> been transferred from Hannes to myself because it was easier to add
> new users there. The opensuse.org git repo site didn't exist when the
> kernel.org mkinitrd repo was created.)
>
> Anyway:
>
> The basic idea is to have most stuff not in the main 'mkinitrd' script
> but in modules. Each module has (normally) a setup script part that is
> executed when the initrd is created, and a boot part that is executed
> when the initrd is running. For example, NFS root is in the 'nfs-util'
> package, not in 'mkinitrd'. Same for iSCSI. Or the kdump part is
> not in the main mkinitrd but in our 'kdump' package [1]. So the main
> initrd package is quite small but still very flexible.
>
> It's also flexible enough to use Busybox as a module that resides in the
> 'busybox' package and can then be enabled with -F busybox (feature)
> when building the initrd.
>
> Only the documentation is a bit weak at the moment; one has to dig
> some things out of the sources when writing new modules. But that's easy
> to fix. :-)
>
> Hannes may explain more ...
>
Yes, quite so. The design principles are as follows:
The goal of the initrd is to activate and mount the root fs.
And the root fs _only_. Every other system should be configured
once the main system is running.
So mkinitrd has two parts:
a) Detect the configuration of the rootfs and create the
initramfs
b) Configure the rootfs on boot.
Therefore there are two distinct script types:
setup-XXX.sh - run during initrd creation time to detect
and record the configuration
boot-XXX.sh - run during boot to configure the subsystem
The setup scripts have these tasks:
- Detect the rootfs
- Unwind the storage stack and record the configuration
on each level
- Copy the required contents into the initramfs
- Pack initramfs for use
The boot scripts have these tasks:
- Initial configuration (create required device nodes, start udev)
- Configure the storage stack
- fsck and mount the root fs
So basically the boot scripts have to be called in reverse
order to the setup scripts.
To ensure the order is preserved during each run I've
introduced some 'stages', which are run consecutively.
These stages are documented in mkinitrd.5
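The reverse-order relationship between setup and boot can be sketched as follows. This is an illustration in Python, not mkinitrd code; the layer names and the `unwind` helper are assumptions made up for the example.

```python
from typing import Dict, List


def unwind(top: str, below: Dict[str, str]) -> List[str]:
    """Walk from the root filesystem down through the storage stack.

    `below` maps each layer to the layer it sits on. The returned list
    is the order in which setup-time detection would visit the layers.
    """
    layers = [top]
    while layers[-1] in below:
        layers.append(below[layers[-1]])
    return layers


# Hypothetical stack: a root filesystem on LVM on an md RAID on plain disks.
below = {"rootfs": "lvm", "lvm": "md", "md": "disk"}

setup_order = unwind("rootfs", below)
boot_order = list(reversed(setup_order))

print(setup_order)  # ['rootfs', 'lvm', 'md', 'disk']
print(boot_order)   # ['disk', 'md', 'lvm', 'rootfs']
```

Setup starts at the root filesystem and works downwards because that is the only thing it knows it needs; boot has to bring the layers up from the bottom.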
The neat thing here is that we've split off each
configuration into small scripts, which will be called
if present. This allows for a pretty modular setup
and avoids the massive requirement setting of a
monolithic script.
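As an illustration of "called if present", the following Python sketch groups whatever scripts exist in a directory listing by the setup-XXX.sh / boot-XXX.sh naming mentioned above. The file names and the `group_scripts` helper are hypothetical, and the sketch ignores the stage ordering documented in mkinitrd.5.

```python
import re
from typing import Dict, Iterable, List

# Matches the two script types by name: setup-XXX.sh and boot-XXX.sh.
SCRIPT_RE = re.compile(r"^(setup|boot)-(.+)\.sh$")


def group_scripts(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Sort file names into setup and boot parts, skipping anything else."""
    groups: Dict[str, List[str]] = {"setup": [], "boot": []}
    for name in sorted(filenames):
        match = SCRIPT_RE.match(name)
        if match:
            groups[match.group(1)].append(match.group(2))
    return groups


# Hypothetical directory contents; packages may drop in their own scripts.
listing = ["setup-lvm.sh", "boot-lvm.sh", "setup-nfs.sh", "README", "boot-udev.sh"]
print(group_scripts(listing))
```

A package that ships only a boot part, or only a setup part, simply contributes to one group; nothing in the main script has to know about it in advance.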
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
[email protected] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
On Fri, Dec 19, 2008 at 02:55:26PM +0100, Hannes Reinecke wrote:
>
> The goal of the initrd is to activate and mount the root fs.
> And the root fs _only_. Every other system should be configured
> once the main system is running.
Don't forget resuming from hibernation....
And of course, activating and mounting the root filesystem can be
quite complicated --- it can involve loading driver modules,
activating md and/or lvm, prompting for a password, setting up
networking (dhcp, routing, dns) for iSCSI and/or NFS/AFS/Lustre/et al.,
the equivalent setup for Fibre Channel attached disks, etc. If
there's any cryptography involved, the user may need to be prompted
for a password and/or key and/or fingerprint scan to unlock the TPM unit
to access the key, etc.
There may also be times when it is useful to operate on the root
filesystem in some way before it is mounted; in most cases the
operation can be done on a filesystem mounted read-only, yes --- but at
the cost of needing to reboot afterwards if the root filesystem needs
to be modified by said userspace tool.
Finally, note that part of the discussion at the Kernel Summit, and also
what David Jones was looking to work on, was to do something that
could be included as part of the kernel sources. The idea is that as
responsibility for early boot is moved from the kernel, an mkinitramfs
which is fixed and distributed by the distribution might not work with
a newer kernel.org kernel. So the idea that was explored was adding a
common mkinitramfs with basic functionality into kernel sources, with
the ability for distributions to add various "value add" enhancements
if they like. This way if the kernel wants to move more functionality
(for example, in the area of resuming from hibernation) out of the
kernel into initramfs, it can do so without breaking the ability of
older distributions to use kernel.org kernels.
So IMHO, it's important not only that the distributions standardize on
a single initramfs framework, but that framework get integrated into
the kernel sources.
Regards,
- Ted
On Dec 19, 2008, at 10:27 AM, Theodore Tso wrote:
> On Fri, Dec 19, 2008 at 02:55:26PM +0100, Hannes Reinecke wrote:
>>
>> The goal of the initrd is to activate and mount the root fs.
>> And the root fs _only_. Every other system should be configured
>> once the main system is running.
>
> Don't forget resuming from hibernation....
I haven't, although I haven't sat down to implement it yet.
> And of course, activating and mounting the root filesystem can be
> quite complicated --- it can involve loading driver modules,
> activating md and/or lvm, prompting for a password, setting up
> networking (dhcp, routing, dns) for iSCSI and/or NFS/AFS/Lustre/et al.,
> the equivalent setup for Fibre Channel attached disks, etc. If
> there's any cryptography involved, the user may need to be prompted
> for a password and/or key and/or fingerprint scan to unlock the TPM unit
> to access the key, etc.
Well, driver modules should be being loaded by udev. Period. If
something requires a manual modprobe, that's a bug IMHO. The other
stuff, while non-trivial, is surprisingly doable from udev rules.
> There may also be times when it is useful to operate on the root
> filesystem in some way before it is mounted; in most cases the
> operation can be done on a filesystem mounted read-only, yes --- but at
> the cost of needing to reboot afterwards if the root filesystem needs
> to be modified by said userspace tool.
I think that once you start getting into this realm, though, you end
up with an incredibly over-complicated and slow initramfs. If we
instead focus on keeping things "fast", the reboot afterwards isn't
that costly.
> Finally, note that part of the discussion at the Kernel Summit, and also
> what David Jones was looking to work on, was to do something that
> could be included as part of the kernel sources. The idea is that as
> responsibility for early boot is moved from the kernel, an mkinitramfs
> which is fixed and distributed by the distribution might not work with
> a newer kernel.org kernel. So the idea that was explored was adding a
> common mkinitramfs with basic functionality into kernel sources, with
> the ability for distributions to add various "value add" enhancements
> if they like. This way if the kernel wants to move more functionality
> (for example, in the area of resuming from hibernation) out of the
> kernel into initramfs, it can do so without breaking the ability of
> older distributions to use kernel.org kernels.
>
> So IMHO, it's important not only that the distributions standardize on
> a single initramfs framework, but that framework get integrated into
> the kernel sources.
Yeah, Dave and I have talked a fair bit about that. It's just
significantly easier to get something going _outside_ of the kernel
sources and then work towards integrating it. The plus side of
integrating it is that the existing bits to generate a built-in
initramfs can go away.
Jeremy
Jeremy Katz <[email protected]> writes:
> On Dec 19, 2008, at 10:27 AM, Theodore Tso wrote:
>> On Fri, Dec 19, 2008 at 02:55:26PM +0100, Hannes Reinecke wrote:
>>>
>>> The goal of the initrd is to activate and mount the root fs.
>>> And the root fs _only_. Every other system should be configured
>>> once the main system is running.
[...]
>> There may also be times when it is useful to operate on the root
>> filesystem in some way before it is mounted; in most cases the
>> operation can be done on a filesystem mounted read-only, yes --- but at
>> the cost of needing to reboot afterwards if the root filesystem needs
>> to be modified by said userspace tool.
>
> I think that once you start getting into this realm, though, you end
> up with an incredibly over-complicated and slow initramfs. If we
> instead focus on keeping things "fast", the reboot afterwards isn't
> that costly.
One of the features of the Debian / Ubuntu initramfs infrastructure,
which sounds remarkably like your design (or vice-versa), is that it
drops all the "standard" drivers into the initramfs.
This is, to me, worth several minutes of additional boot time, in terms
of flexibility: being able to modify the hardware and be confident that
the appropriate drivers are in place already makes life much, much
easier.
(In practice I doubt this adds more than a second or five to boot time;
certainly, it takes no longer to get to rootfs mounted than the RHEL 4
systems that have nothing but what is essential in the initrd...)
So, it would certainly be my hope, with my systems administration hat
on, that your proposed system would support similar operation as
an option, at least.
Personally, I think it makes the right default: better correct than
fast, but obviously tastes vary there.
Regards,
Daniel
On Sun, Dec 21, 2008 at 12:50:21AM +1100, Daniel Pittman wrote:
> One of the features of the Debian / Ubuntu initramfs infrastructure,
> which sounds remarkably like your design (or vice-versa), is that it
> drops all the "standard" drivers into the initramfs.
>
> This is, to me, worth several minutes of additional boot time, in terms
> of flexibility: being able to modify the hardware and be confident that
> the appropriate drivers are in place already makes life much, much
> easier.
There's another reason this is really useful.
If something goes wrong, remotely debugging a user's initrd is
a lot easier if you know what it looks like. Right now, in Fedora for example,
where we generate an initrd for each user's system at runtime, we need
to get a copy of the generated initrd, and pull it apart just to find
out what modules ended up in there, what didn't, and then somehow
try to work backwards to try and figure out how the generator got into
that state. After doing this for five years, let me tell you it's
_really_ _really_ painful.
> (In practice I doubt this adds more than a second or five to boot time;
> certainly, it takes no longer to get to rootfs mounted than the RHEL 4
> systems that have nothing but what is essential in the initrd...)
At least in theory, with a kernel-event/udev driven system, the additional
modules shouldn't cause any additional boot time. There wouldn't be
events generated to cause them to be loaded, so they'd just be taking
up space. And the additional load time for a bigger initrd should be
really lost in the noise of the overall boot.
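A rough model of that argument, in Python for illustration only. The alias strings, module names and the `modules_loaded` helper are hypothetical; the sketch only shows that modules without a matching event are never loaded, however many are shipped.

```python
from typing import Dict, Iterable, List, Set


def modules_loaded(
    events: Iterable[str],
    alias_to_module: Dict[str, str],
    available: Set[str],
) -> List[str]:
    """Return the modules that device events would cause to be loaded."""
    loaded: List[str] = []
    for alias in events:
        module = alias_to_module.get(alias)
        if module is not None and module in available and module not in loaded:
            loaded.append(module)
    return loaded


# Hypothetical "generic" initramfs shipping more drivers than this box needs.
available = {"ahci", "e1000", "usb_storage", "aic7xxx"}
alias_to_module = {
    "pci:ahci": "ahci",
    "pci:e1000": "e1000",
    "usb:storage": "usb_storage",
    "pci:aic7xxx": "aic7xxx",
}
# Events actually generated by the hardware present on this machine.
events = ["pci:ahci", "pci:e1000", "pci:ahci"]

loaded = modules_loaded(events, alias_to_module, available)
unused = sorted(available - set(loaded))
print(loaded)  # ['ahci', 'e1000']
print(unused)  # ['aic7xxx', 'usb_storage']
```

The unused modules cost image size, and therefore some load time, but no work at boot.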
Dave
--
http://www.codemonkey.org.uk
On Wed, Dec 17, 2008 at 01:54:54PM -0500, Jeremy Katz wrote:
> As davej started talking about a few months ago at Kernel Summit and
> LPC, there's a lot of duplication between distros on the tools used to
> generate the initramfs as well as the contents and how the initramfs
> works. Ultimately, there's little reason for this not to be something
> that is shared and worked on by everyone. Added to this is the fact
> that everyone's infrastructures for this have grown up over a long-ish
> period of time without significant amounts of reworking for the way that
> the kernel and early boot works these days.
>
> Therefore I've started on a new project, dracut, to try to be a new
> initramfs tool that can be used across various distributions. From the
> README...
It looks like Hannes has also been working on a new, modular initramfs
for a while:
http://git.kernel.org/?p=linux/kernel/git/hare/mkinitrd.git;a=summary
I hope you guys can get together and agree on one implementation..
On Wed, Dec 17, 2008 at 01:54:54PM -0500, Jeremy Katz wrote:
> As davej started talking about a few months ago at Kernel Summit and
> LPC, there's a lot of duplication between distros on the tools used to
> generate the initramfs as well as the contents and how the initramfs
> works. Ultimately, there's little reason for this not to be something
> that is shared and worked on by everyone. Added to this is the fact
> that everyone's infrastructures for this have grown up over a long-ish
> period of time without significant amounts of reworking for the way that
> the kernel and early boot works these days.
>
> Therefore I've started on a new project, dracut, to try to be a new
> initramfs tool that can be used across various distributions. From the
> README...
>
> Unlike existing initramfs's, this is an attempt at having as little as
> possible hard-coded into the initramfs. The initramfs has
> (basically) one purpose in life -- getting the rootfs mounted so that we
> can transition to the real rootfs. This is all driven off of device
> availability. Therefore, instead of scripts hard-coded to do various
> things, we depend on udev to create device nodes for us and then when we
> have the rootfs's device node, we mount and carry on. This helps to keep
> the time required in the initramfs as little as possible so that things
> like a 5 second boot aren't made impossible as a result of the very
> existence of an initramfs. It's likely that we'll grow some hooks for
> running arbitrary commands in the flow of the script, but it's worth
> trying to resist the urge as much as we can as hooks are guaranteed to
> be the path to slow-down.
>
> Also, there is an attempt to keep things as distribution-agnostic as
> possible. Every distribution has their own tool here and it's not
> something which is really interesting to have separate across them. So
> contributions to help decrease the distro-dependencies are welcome.
>
> The git tree can be found at git://fedorapeople.org/~katzj/dracut.git
> for now. See the TODO file for things which still need to be done and
> HACKING for some instructions on how to get started using the tool.
> There is also a mailing list that is being used for the discussion --
> [email protected].
>
> Currently, there are a few Fedora-isms which have crept in just as a
> result of it being the shortest path to solving some problems, but I'm
> actively trying to get those out sooner rather than later as well as
> getting to where I'm using it to boot my laptop.
>
> Comments and discussion welcome
>
> Jeremy
>
Not that I think a unifying tool to create an initramfs is a bad idea
(quite the contrary, I think it would be great), but I'd like to point out that
one of your underlying premises is a bit shaky. That an initramfs has one
purpose, that being to get the rootfs mounted, isn't entirely accurate. Kdump
and various embedded systems are the prime examples here. Many embedded
systems run entirely out of the initramfs, and contain all the code they need to
do so in them. Additionally, kdump, in most environments, attempts to capture
core files entirely from the initramfs as well, operating under the assumption
that the rootfs may not be functioning properly after a crash. By and large
these initramfs images tend to be larger and offer a more typical (if not
standard) user operating environment. I'm looking at your tree now, and it
looks like a good start on standardizing the initramfs for the nominal case. Do
you have plans to include (or are you interested in including) support for
alternate infrastructure (like busybox instead of nash), interactive setup, etc?
Regards
Neil
--
/****************************************************
* Neil Horman <[email protected]>
* Software Engineer, Red Hat
****************************************************/
Christoph Hellwig wrote:
> On Wed, Dec 17, 2008 at 01:54:54PM -0500, Jeremy Katz wrote:
>> As davej started talking about a few months ago at Kernel Summit and
>> LPC, there's a lot of duplication between distros on the tools used to
>> generate the initramfs as well as the contents and how the initramfs
>> works. Ultimately, there's little reason for this not to be something
>> that is shared and worked on by everyone. Added to this is the fact
>> that everyone's infrastructures for this have grown up over a long-ish
>> period of time without significant amounts of reworking for the way that
>> the kernel and early boot works these days.
>>
>> Therefore I've started on a new project, dracut, to try to be a new
>> initramfs tool that can be used across various distributions. From the
>> README...
>
> It looks like Hannes has also been working on a new, modular initramfs
> for a while:
>
> http://git.kernel.org/?p=linux/kernel/git/hare/mkinitrd.git;a=summary
>
> I hope you guys can get together and agree on one implementation..
Thanks hch for pointing this out.
We definitely should get together to hammer out one implementation.
Having different scripts for every distribution is a PITA.
I'm not saying my implementation is the greatest on earth, so
if anyone has any better suggestions I'm all ears.
If anyone is interested I can give a short overview of it.
As per normal, proper documentation is about to be written RSN.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
[email protected] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
2008/12/17 Neil Horman <[email protected]>:
> On Wed, Dec 17, 2008 at 01:54:54PM -0500, Jeremy Katz wrote:
[snip]
>> Therefore I've started on a new project, dracut, to try to be a new
>> initramfs tool that can be used across various distributions. From the
>> README...
>>
>> Unlike existing initramfs's, this is an attempt at having as little as
>> possible hard-coded into the initramfs. The initramfs has
>> (basically) one purpose in life -- getting the rootfs mounted so that we
>> can transition to the real rootfs.
[snip]
> Not that I think a unifying tool to create an initramfs is a bad idea
> (quite the contrary, I think it would be great), but I'd like to point out that
> one of your underlying premises is a bit shaky. That an initramfs has one
> purpose, that being to get the rootfs mounted, isn't entirely accurate. Kdump
> and various embedded systems are the prime examples here. Many embedded
> systems run entirely out of the initramfs, and contain all the code they need to
> do so in them. Additionally, kdump, in most environments, attempts to capture
> core files entirely from the initramfs as well, operating under the assumption
> that the rootfs may not be functioning properly after a crash. By and large
> these initramfs images tend to be larger and offer a more typical (if not
> standard) user operating environment. I'm looking at your tree now, and it
> looks like a good start on standardizing the initramfs for the nominal case. Do
> you have plans to include (or are you interested in including) support for
> alternate infrastructure (like busybox instead of nash), interactive setup, etc?
Seconded: as a user of (various) distributions, I've come across the
following problem: sometimes the initramfs, for whatever reason, does not
manage to find or mount the rootfs. In those cases I hate the developers
of the distribution because there is basically nothing that can be done
in the initramfs to see, test, try, mount, modprobe, umount, ping, telnet,
etc... Please, please, throw in a full-featured busybox. Here is a good
compromise I use:
Currently defined functions:
[, [[, arp, arping, ash, awk, basename, brctl, bunzip2, bzcat,
bzip2, cat, chgrp, chmod, chown, chroot, chvt, clear, cmp, comm,
cp, cpio, cut, date, dc, dd, deallocvt, depmod, df, diff, dirname,
dmesg, dos2unix, dpkg, dpkg-deb, du, dumpkmap, echo, egrep, eject,
env, expr, false, fbset, fdflush, fdformat, fdisk, fgrep, find,
findfs, ftpget, ftpput, grep, gunzip, gzip, halt, head, hexdump,
hostname, hwclock, ifconfig, ifenslave, insmod, ip, ipcalc, kbd_mode,
kill, killall, killall5, less, linux32, linux64, ln, loadfont,
loadkmap, losetup, ls, lsmod, md5sum, microcom, mkdir, mkfifo,
mknod, mkswap, mktemp, more, mount, mv, nice, od, openvt, patch,
pgrep, pidof, ping, ping6, pivot_root, pkill, poweroff, printf,
ps, pwd, readlink, reboot, reset, rm, rmdir, rmmod, route, rpm2cpio,
rtcwake, sed, setarch, setkeycodes, sh, sha1sum, sleep, sort,
stat, strings, stty, su, swapoff, swapon, switch_root, sync, tac,
tail, tar, tee, telnet, test, touch, tr, traceroute, true, tty,
udhcpc, umount, uname, uniq, unix2dos, unzip, vi, wc, wget, which,
xargs, yes, zcat
(linux32 and linux64 are probably not useful for general initramfs). With that
configuration, busybox (v1.13.0.svn, x86-32 executable) is 416224 bytes long
(dynamically linked with libc as only library needed).
Loïc Grenié
Dave Jones wrote:
> On Sun, Dec 21, 2008 at 12:50:21AM +1100, Daniel Pittman wrote:
>
> > One of the features of the Debian / Ubuntu initramfs infrastructure,
> > which sounds remarkably like your design (or vice-versa), is that it
> > drops all the "standard" drivers into the initramfs.
> >
> > This is, to me, worth several minutes of additional boot time, in terms
> > of flexibility: being able to modify the hardware and be confident that
> > the appropriate drivers are in place already makes life much, much
> > easier.
>
> There's another reason this is really useful.
> If something goes wrong, remotely debugging a user's initrd is
> a lot easier if you know what it looks like. Right now, in Fedora for example,
> where we generate an initrd for each user's system at runtime, we need
> to get a copy of the generated initrd, and pull it apart just to find
> out what modules ended up in there, what didn't, and then somehow
> try to work backwards to try and figure out how the generator got into
> that state. After doing this for five years, let me tell you it's
> _really_ _really_ painful.
>
You're telling me.
I ended up adding lots of shell escapes; every time something goes
wrong you'll be dropped into a shell, and execution of the initrd
resumes once you exit it.
Quite handy for fixing up most things.
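The shell-escape pattern can be sketched like this. It is a Python illustration under assumptions, not the mkinitrd implementation: the `rescue` callback stands in for dropping the user into an interactive shell, and the step names are made up.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_steps(steps: List[Step], rescue: Callable[[str, Exception], None]) -> List[str]:
    """Run boot steps in order; on failure, call rescue and then carry on."""
    log: List[str] = []
    for name, step in steps:
        try:
            step()
            log.append(f"{name}: ok")
        except Exception as exc:
            # Stand-in for the shell escape; boot resumes when it returns.
            rescue(name, exc)
            log.append(f"{name}: resumed after rescue")
    return log


def load_drivers() -> None:
    pass  # pretend this worked


def find_root() -> None:
    raise RuntimeError("root device not found")


def mount_root() -> None:
    pass  # pretend the user fixed things in the rescue shell


rescued: List[Tuple[str, str]] = []
log = run_steps(
    [("load_drivers", load_drivers), ("find_root", find_root), ("mount_root", mount_root)],
    rescue=lambda name, exc: rescued.append((name, str(exc))),
)
print(log)
print(rescued)
```

The important property is that a failure pauses the sequence rather than aborting it, so whatever was repaired by hand is picked up by the steps that follow.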
> > (In practice I doubt this adds more than a second or five to boot time;
> > certainly, it takes no longer to get to rootfs mounted than the RHEL 4
> > systems that have nothing but what is essential in the initrd...)
>
> At least in theory, with a kernel-event/udev driven system, the additional
> modules shouldn't cause any additional boot time. There wouldn't be
> events generated to cause them to be loaded, so they'd just be taking
> up space. And the additional load time for a bigger initrd should be
> really lost in the noise of the overall boot.
>
One can but hope. You certainly will notice a load time increase if the size
of the initrd increases by orders of magnitude.
Plus kdump / kexec will need to be configured to have more memory available.
Actually, I do like the callout idea:
Have the initrd configure a 'standard' system, and add some API which will
allow you to hook in additional scripts / services / whatever to configure
non-standard systems.
Which then can be distributed by the individual packages / vendors.
And then we would have a small common initramfs which could well be included
with the kernel sources.
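One possible shape for such a callout API, sketched in Python. Everything here is an assumption for the sake of illustration: the `HookRegistry` class, the stage names and the context keys do not come from any existing tool.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Hook = Callable[[Dict[str, str]], None]


class HookRegistry:
    """Small common core: named stages to which packages attach hooks."""

    def __init__(self) -> None:
        self._hooks: DefaultDict[str, List[Hook]] = defaultdict(list)

    def register(self, stage: str, hook: Hook) -> None:
        """Attach a hook to a stage; hooks run in registration order."""
        self._hooks[stage].append(hook)

    def run(self, stage: str, context: Dict[str, str]) -> Dict[str, str]:
        """Run every hook for a stage; a stage without hooks is a no-op."""
        for hook in self._hooks.get(stage, []):
            hook(context)
        return context


registry = HookRegistry()
# Hypothetical hooks that a network or iSCSI package might ship.
registry.register("pre-mount", lambda ctx: ctx.update(network="up"))
registry.register("pre-mount", lambda ctx: ctx.update(iscsi="logged-in"))

context = registry.run("pre-mount", {"root": "/dev/sda2"})
untouched = registry.run("post-mount", {"root": "/dev/sda2"})
print(context)
print(untouched)
```

A 'standard' system registers nothing and pays nothing; only non-standard setups add hooks, which keeps the common part small.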
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
[email protected] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)