2009-01-13 22:42:28

by H. Peter Anvin

Subject: The policy on initramfs decompression failure

As part of the multi-compression-formats patch, the issue has come up as
to what the preferred policy is on initramfs decompression failure,
due to either corruption or due to the use of a compression format which
the kernel does not support.

I had personally assumed the proper policy would be to panic, since it
is unlikely to mean the system can be booted. However, Ingo brought up
the case where the initramfs is auxiliary to being able to boot the
full system, for example the initramfs supplied is primarily a data
carrier, and either the builtin initramfs or the kernel itself is
sufficient to boot.

By this argument, we should change initramfs decoding failure to a
KERN_CRIT message, and in the (presumably most common) case that it does
not suffice to boot the system, we will get a panic in short order as
the system is unable to find init.

This argument seems to mostly hold water, but it does implement a policy
change over the current code. Furthermore, it does make me concerned
that a *partial* decoding failure (such as can be caused by a corrupt
image, or, say, a gzipped image concatenated to a bzip2 image, with the
kernel only supporting bzip2) could cause a booted-but-dysfunctional
system, which is in many configurations a worse failure mode than a panic.
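
To make the mixed-format case concrete, here is a minimal userspace sketch in C of how a decoder might classify each concatenated segment by its leading magic bytes. It is not the kernel's code: the names `img_format`, `detect_format()` and `can_decode()` are hypothetical, and only the gzip (0x1f 0x8b) and bzip2 ("BZh") signatures are assumed.

```c
#include <stddef.h>

/* Hypothetical format identifiers; the values double as bits in a support mask. */
enum img_format { FMT_UNKNOWN = 0, FMT_GZIP = 1, FMT_BZIP2 = 2 };

/* Classify a segment by its leading magic bytes. */
static enum img_format detect_format(const unsigned char *buf, size_t len)
{
    if (len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b)
        return FMT_GZIP;
    if (len >= 3 && buf[0] == 'B' && buf[1] == 'Z' && buf[2] == 'h')
        return FMT_BZIP2;
    return FMT_UNKNOWN;
}

/* Non-zero if a decompressor for fmt is present in supported_mask. */
static int can_decode(enum img_format fmt, unsigned supported_mask)
{
    return fmt != FMT_UNKNOWN && (supported_mask & (unsigned)fmt) != 0;
}
```

With a bzip2-only mask, the gzip segment of a concatenated image is rejected while the bzip2 segment decodes, which is exactly the partial-failure situation described above.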

Hence I would like to solicit opinions about what the policy should be.

-hpa


2009-01-13 23:18:05

by Alain Knaff

Subject: Re: The policy on initramfs decompression failure

H. Peter Anvin wrote:
> As part of the multi-compression-formats patch, the issue has come up as
> to what the preferred policy is on initramfs decompression failure,
> due to either corruption or due to the use of a compression format which
> the kernel does not support.
>
> I had personally assumed the proper policy would be to panic, since it
> is unlikely to mean the system can be booted. However, Ingo brought up
> the case where the initramfs is auxiliary to being able to boot the
> full system, for example the initramfs supplied is primarily a data
> carrier, and either the builtin initramfs or the kernel itself is
> sufficient to boot.
>
> By this argument, we should change initramfs decoding failure to a
> KERN_CRIT message, and in the (presumably most common) case that it does
> not suffice to boot the system, we will get a panic in short order as
> the system is unable to find init.
>
> This argument seems to mostly hold water, but it does implement a policy
> change over the current code. Furthermore, it does make me concerned
> that a *partial* decoding failure (such as can be caused by a corrupt
> image, or, say, a gzipped image concatenated to a bzip2 image, with the
> kernel only supporting bzip2) could cause a booted-but-dysfunctional
> system, which is in many configurations a worse failure mode than a panic.
>
> Hence I would like to solicit opinions about what the policy should be.
>
> -hpa

There is also the additional issue that continuing to boot might hide
the original error message. Indeed, the kernel might panic eventually
(as you said, for example due to missing init), but in the meantime the
original "junk in compressed archive" might have scrolled off the
screen. And after a panic, shift+pageup does not work to inspect past
messages.

Regards,

Alain

2009-01-13 23:42:59

by Bodo Eggert

Subject: Re: The policy on initramfs decompression failure

H. Peter Anvin wrote:

[initramfs decompression failed]

> I had personally assumed the proper policy would be to panic, since it
> is unlikely to mean the system can be booted. However, Ingo brought up
> the case where the initramfs is auxiliary to being able to boot the
> full system, for example the initramfs supplied is primarily a data
> carrier, and either the builtin initramfs or the kernel itself is
> sufficient to boot.
>
> By this argument, we should change initramfs decoding failure to a
> KERN_CRIT message, and in the (presumably most common) case that it does
> not suffice to boot the system, we will get a panic in short order as
> the system is unable to find init.
>
> This argument seems to mostly hold water, but it does implement a policy
> change over the current code. Furthermore, it does make me concerned
> that a *partial* decoding failure (such as can be caused by a corrupt
> image, or, say, a gzipped image concatenated to a bzip2 image, with the
> kernel only supporting bzip2) could cause a booted-but-dysfunctional
> system, which is in many configurations a worse failure mode than a panic.

If the initrd is not decompressed successfully, and if it's not there for fun,
the system boot will most likely be fubar, and at worst a wrong system or a
less secure configuration may be started. Therefore I'd create a kernel
option. Possible values would be e.g. [*panic|continue] or
[*auto|required|optional|ignore], with:

panic, continue: As expected
auto:            If there is an initrd, it's required, otherwise it's ignored
required:        Panic if the initrd did not exist or was not unpacked correctly
optional:        Like continue
ignore:          Don't load the initrd
*:               Default value

I currently don't know how to name the option.
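
As an illustration only, the four-value variant listed above could be modeled in userspace C roughly as follows. This is a sketch under assumptions, not kernel code: the enum, `parse_initrd_policy()` and `initrd_should_panic()` are hypothetical names, and falling back to "auto" for an unrecognized value is an assumption, not something stated in the proposal.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical policy values, named after the list above. */
enum initrd_policy {
    INITRD_AUTO,      /* default: required only if an initrd was supplied */
    INITRD_REQUIRED,
    INITRD_OPTIONAL,
    INITRD_IGNORE
};

/* Parse the option value; NULL or unknown strings fall back to "auto" (assumption). */
static enum initrd_policy parse_initrd_policy(const char *val)
{
    if (val == NULL)
        return INITRD_AUTO;
    if (strcmp(val, "required") == 0)
        return INITRD_REQUIRED;
    if (strcmp(val, "optional") == 0)
        return INITRD_OPTIONAL;
    if (strcmp(val, "ignore") == 0)
        return INITRD_IGNORE;
    return INITRD_AUTO;
}

/* Decide whether boot should panic, given whether an initrd was
 * supplied and whether it unpacked correctly. */
static bool initrd_should_panic(enum initrd_policy p, bool present, bool unpacked_ok)
{
    switch (p) {
    case INITRD_AUTO:
        return present && !unpacked_ok;
    case INITRD_REQUIRED:
        return !present || !unpacked_ok;
    case INITRD_OPTIONAL:
    case INITRD_IGNORE:
        return false;
    }
    return false;
}
```

The only difference between "auto" and "required" in this model is the case where no initrd was supplied at all.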

2009-01-14 01:19:00

by Theodore Ts'o

Subject: Re: The policy on initramfs decompression failure

On Tue, Jan 13, 2009 at 02:38:49PM -0800, H. Peter Anvin wrote:
> As part of the multi-compression-formats patch, the issue has come up as
> to what the preferred policy is on initramfs decompression failure,
> due to either corruption or due to the use of a compression format which
> the kernel does not support.
>
> I had personally assumed the proper policy would be to panic, since it
> is unlikely to mean the system can be booted. However, Ingo brought up
> the case where the initramfs is auxiliary to being able to boot the
> full system, for example the initramfs supplied is primarily a data
> carrier, and either the builtin initramfs or the kernel itself is
> sufficient to boot.

I would suggest a default policy of "panic", and a way of overriding
the policy, probably with a boot command-line option. The reality is
that most of the time, the failure case of a failed or partially
failed initramfs is not going to be well tested, and as you have pointed
out, the downsides of a booted-but-dysfunctional system are often going
to be worse than a hard failure. The partially decoded case is going
to be even worse, so I can see a three-way policy:

failed-initramfs-decode=panic    Panic on failed initramfs
failed-initramfs-decode=partial  If the initramfs fails part-way in,
                                 decode what you can and let the boot
                                 system see what files could be fully
                                 decoded
failed-initramfs-decode=allow    If the initramfs decoding fails
                                 part-of-the-way in, continue the
                                 boot, but do not provide the partial
                                 initramfs --- i.e., this is the
                                 all-or-nothing option

If this is too complicated, I'd be happy with the "panic on failed
initramfs". After all, the user can always simply delete the initrd
specifier from their grub boot configuration, and simply retry the
boot....
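
A rough userspace model of the three-way policy above, in C, might look like this. It is a sketch only: the option values come from the table, but the enum names, `parse_decode_policy()` and `on_decode_result()` are hypothetical, and treating any unrecognized value as "panic" is an assumption that follows the suggested default.

```c
#include <string.h>

/* Policy values taken from the table above. */
enum decode_policy { DECODE_PANIC, DECODE_PARTIAL, DECODE_ALLOW };

/* Hypothetical outcomes the boot code would act on. */
enum decode_action {
    ACT_CONTINUE,      /* decode succeeded, boot normally */
    ACT_PANIC,         /* stop the boot */
    ACT_KEEP_PARTIAL,  /* boot with whatever files were fully decoded */
    ACT_DISCARD_ALL    /* boot, but drop the partially decoded initramfs */
};

/* Map the command-line value to a policy; default is "panic". */
static enum decode_policy parse_decode_policy(const char *val)
{
    if (val != NULL && strcmp(val, "partial") == 0)
        return DECODE_PARTIAL;
    if (val != NULL && strcmp(val, "allow") == 0)
        return DECODE_ALLOW;
    return DECODE_PANIC;
}

/* Decide what to do once the decoder reports success or failure. */
static enum decode_action on_decode_result(enum decode_policy p, int failed)
{
    if (!failed)
        return ACT_CONTINUE;
    switch (p) {
    case DECODE_PARTIAL:
        return ACT_KEEP_PARTIAL;
    case DECODE_ALLOW:
        return ACT_DISCARD_ALL;
    case DECODE_PANIC:
    default:
        return ACT_PANIC;
    }
}
```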

- Ted

2009-01-14 05:40:42

by Ingo Molnar

Subject: Re: The policy on initramfs decompression failure


* Alain Knaff wrote:

> There is also the additional issue that continuing to boot might hide
> the original error message. [...]

I have hit this pointless panic during testing, that's the motivation for
this whole question. The initrd was unimportant in that bootup - but that
is generally true of bzImage bootups.

And your argument makes little sense: if there is something wrong then one
looks at the logs _anyway_. Are you suggesting that all warnings that
signal some potential badness should result in a panic? That is
nonsensical.

What you seem to be arguing for is to introduce a kernel option that says
"panic on warnings" - so that folks cannot miss warnings. _That_ would be
a fair argument.

Panics are rarely good, unless the user asks for it, period. We've been
flipping over BUG_ON()s to WARN_ON() everywhere where it matters in
practice.

Ingo

2009-01-14 05:46:20

by Ingo Molnar

Subject: Re: The policy on initramfs decompression failure


* Bodo Eggert wrote:

> If the initrd is not decompressed successfully, [...]

No, that's not the issue - i think hpa's description was misleading in
that respect.

This is not some sort of corruption. I have hit this pointless panic
during testing: there was nothing wrong with either the initrd or the
system, the bzImage simply did not include the right decompressor .config
option to even read the initrd.

The analogue is if i booted a kernel with CONFIG_MODULES disabled. I do it
all the time, it always worked without problems and the initrd with
modules in it cannot be interpreted in any sane way without CONFIG_MODULES -
still it works just fine because the initrd is uninteresting as far as the
modules go.

So basically now the kernel has regressed in its bzImage utility: "oh, i
dont have a decompressor for the initrd. PANIC!". And that is a step
backwards. Unless you use bzImage i dont think you can really appreciate
this argument.

I would not mind a warning message though, that bit makes sense.

Ingo

2009-01-14 06:47:59

by Alain Knaff

Subject: Re: The policy on initramfs decompression failure

Ingo Molnar wrote:
> So basically now the kernel has regressed in its bzImage utility: "oh, i
> dont have a decompressor for the initrd. PANIC!". And that is a step
> backwards.

Well, for the precise case of gzip I agree with you. The reason why you
are right in that special case is that gzip used to be the *only*
decompressor, so it was always available, and you couldn't
easily choose a config option which removes the decompressor for a gzip
initrd.

However, if you think in more general terms, you _could_ get into that
case by attempting to boot up using a bzip2-compressed initrd. If you
fed such an initrd to the old unpatched code, you'd get an exception at
exactly the same place (populate_rootfs) for exactly the same reason
(kernel doesn't have a decompressor for the initrd). Please think about
it...

I'm not against fixing old problems with new code, that's in general a
good thing. What I do have an issue with is that this seems to become
_mandatory_, at least in this case...

And the result of such strictness will not be better code, but stagnant
code. Nobody will be able to address one problem, because anybody
attempting to do that will have his patch rejected because it doesn't
also solve the "hunger in the 3rd world" problem.

> Unless you use bzImage i dont think you can really appreciate
> this argument.

Maybe that's the source of our misunderstanding. What is this bzImage?
(I suppose it's not just the kernel name/format but something more. But
what?)

Regards,

Alain

2009-01-14 06:51:46

by Alain Knaff

Subject: Re: The policy on initramfs decompression failure

Theodore Tso wrote:
> failed-initramfs-decode=panic    Panic on failed initramfs
> failed-initramfs-decode=partial  If the initramfs fails part-way in,
>                                  decode what you can and let the boot
>                                  system see what files could be fully
>                                  decoded
> failed-initramfs-decode=allow    If the initramfs decoding fails
>                                  part-of-the-way in, continue the
>                                  boot, but do not provide the partial
>                                  initramfs --- i.e., this is the
>                                  all-or-nothing option

Interesting approach... but wouldn't it make more sense to have that be
global? Or else, eventually every single panic will have such a
tri-state switch, with associated option parsing and overhead, leading
to bloat.

> If this is too complicated, I'd be happy with the "panic on failed
> initramfs". After all, the user can always simply delete the initrd
> specifier from their grub boot configuration, and simply retry the
> boot....

Exactly!

Regards,

Alain

2009-01-14 07:03:14

by Alain Knaff

Subject: Re: The policy on initramfs decompression failure

Ingo Molnar wrote:
> And your argument makes little sense: if there is something wrong then one
> looks at the logs _anyway_.

Unfortunately, not everybody has the knowledge or equipment ready to set
up a serial console... And logs in the classical sense (in a logfile...)
don't exist yet at that early stage of boot, because it happens _before_
the kernel is able to write to the filesystem...

> Are you suggesting that all warnings that
> signal some potential badness should result in a panic? That is
> nonsensical.

There must be some misunderstanding somewhere. I didn't make any such
suggestion. I agree with you, such a suggestion would be nonsensical.

> What you seem to be arguing for is to introduce a kernel option that says
> "panic on warnings" - so that folks cannot miss warnings. _That_ would be
> a fair argument.

That would be an interesting idea, but might lead to the opposite
problem (kernel stopping _before_ the real problem happens).

Maybe what we could do is "fix" panic() such that it doesn't disable
Shift-Pgup. But I admit that such a change may not be trivial to
implement, as there may be cases where the interrupt system is fubar,
and all interrupt handlers (including keyboard) would need to be disabled.

> Panics are rarely good, unless the user asks for it, period. We've been
> flipping over BUG_ON()s to WARN_ON() everywhere where it matters in
> practice.
>
> Ingo

That is a valid philosophical discussion. But shouldn't we move it to a
thread of its own?

Regards,

Alain

2009-01-14 07:46:19

by Ingo Molnar

Subject: Re: The policy on initramfs decompression failure


* Alain Knaff wrote:

> > Unless you use bzImage i dont think you can really appreciate this
> > argument.
>
> Maybe that's the source of our misunderstanding. What is this bzImage?
> (I suppose it's not just the kernel name/format but something more. But
> what?)

Pure bzImages are what many kernel developers use to boot static kernel
images, with drivers built in, often with no module support, etc.:

  $ make help | grep -i bzImage
  * bzImage      - Compressed kernel image (arch/x86/boot/bzImage)

the bzImage method never led to a panic related to a ramdisk before
(unless the ramdisk was materially corrupted - which is not the case
here), and should not lead to a panic afterwards either.

We use panics/crashes when the kernel meets a problem that makes
continuing impossible, but that is not an issue here.

Ingo

2009-01-14 07:49:18

by Ingo Molnar

Subject: Re: The policy on initramfs decompression failure


* Alain Knaff wrote:

> Ingo Molnar wrote:
> > And your argument makes little sense: if there is something wrong then one
> > looks at the logs _anyway_.
>
> Unfortunately, not everybody has the knowledge or equipment ready to set
> up a serial console... [...]

By your argument the ton of warnings we emit in various situations are
wrong too and all should be panic()s. That argument is bogus.

Not looking at the logs makes boot problem analysis harder of course. Nor
is your argument actually true: you can use printk_delay or any other
method. Or you can use the VGA console and use shift-pageup ...

Ingo

2009-01-14 08:24:22

by Alain Knaff

Subject: Re: The policy on initramfs decompression failure

Ingo Molnar wrote:
> * Alain Knaff wrote:
>
>> Ingo Molnar wrote:
>>> And your argument makes little sense: if there is something wrong then one
>>> looks at the logs _anyway_.
>> Unfortunately, not everybody has the knowledge or equipment ready to set
>> up a serial console... [...]
>
> By your argument the ton of warnings we emit in various situations are
> wrong too and all should be panic()s.

That is not my argument. I never said something like that.

I don't know, but I have to wonder about the strength of _your_ position if
the only way to defend it is to put words into other people's mouths.

> That argument is bogus.

Indeed that argument is bogus. However, I'm not sure where it is coming from...

> Not looking at the logs makes boot problem analysis harder of course. Nor
> is your argument actually true: you can use printk_delay or any other
> method.

This is interesting information. Where can I find documentation about this?
Neither google, nor a find-grep in the kernel sources turned up anything
useful...

> Or you can use the VGA console and use shift-pageup ...

No you can't. Try it. Or is this only a kvm artifact?

>
> Ingo

Alain

2009-01-14 10:37:46

by Ingo Molnar

Subject: Re: The policy on initramfs decompression failure


* Alain Knaff wrote:

> Ingo Molnar wrote:
> > * Alain Knaff wrote:
> >
> >> Ingo Molnar wrote:
> >>> And your argument makes little sense: if there is something wrong then one
> >>> looks at the logs _anyway_.
> >> Unfortunately, not everybody has the knowledge or equipment ready to set
> >> up a serial console... [...]
> >
> > By your argument the ton of warnings we emit in various situations are
> > wrong too and all should be panic()s.
>
> That is not my argument. I never said something like that.

I did not say that it is your argument, i said it is _by_ your argument:
i.e. it is a logical extension of your argument.

Exactly how is such a warning different from other warnings that the
kernel already emits? For which people supposedly have to set up a serial
console? (which they dont have to)

Answer: it is not different, and it is exactly as hard or easy to find as
the other ones. I.e. why should this warning get a special treatment? I
already told the kernel that i dont want a gzip ramfs image decompressor
by turning off the (otherwise default-enabled) option. panic()ing on that
decision, overriding my decision and escalating it into a non-working
system is silly and a bug.

Ingo

2009-01-14 15:19:59

by Bodo Eggert

Subject: Re: The policy on initramfs decompression failure

On Wed, 14 Jan 2009, Ingo Molnar wrote:
> * Bodo Eggert wrote:

> > If the initrd is not decompressed successfully, [...]
>
> No, that's not the issue - i think hpa's description was misleading in
> that respect.
>
> This is not some sort of corruption. I have hit this pointless panic
> during testing: there was nothing wrong with either the initrd or the
> system, the bzImage simply did not include the right decompressor .config
> option to even read the initrd.

An initrd in an unknown compression format is as good or as bad as a
corrupted one. The kernel can't decide if it's got /dev/random or e.g. a
RAR archive. Therefore it must and should behave the same.

> The analogue is if i booted a kernel with CONFIG_MODULES disabled. I do it
> all the time, it always worked without problems and the initrd with
> modules in it cannot be interpreted in any sane way without CONFIG_MODULES -
> still it works just fine because the initrd is uninteresting as far as the
> modules go.

> So basically now the kernel has regressed in its bzImage utility: "oh, i
> dont have a decompressor for the initrd. PANIC!". And that is a step
> backwards. Unless you use bzImage i dont think you can really appreciate
> this argument.

If there is no initrd, you won't get a panic. If you use a gzip initrd
with a bz2-only kernel what do you expect? What do you expect if you
say "root=/dev/internal-disk", but /dev/usb/attacker's-USB-stick is
currently the only working alternative?

I think having a kernel parameter is the right thing, since it
won't decrease security, it gives everything you want and it allows
you to skip even "good" initrds if they turn out not to be good.

> I would not mind a warning message though, that bit makes sense.

"Warning, I'm starting a setup which you didn't intend to start
at all! Muahahahaha, good luck!"
--
Interchangeable parts aren't.

2009-01-14 17:46:42

by H. Peter Anvin

Subject: Re: The policy on initramfs decompression failure

Ingo Molnar wrote:
>
> By your argument the ton of warnings we emit in various situations are
> wrong too and all should be panic()s. That argument is bogus.
>

Thought about this whole thing some more, and it seems to me as follows:
what we really want, and need, is a "panic-level=X" option, where X will
naturally vary for different users. I suspect there are many users
today who would prefer a panic (and reboot) on a KERN_CRIT message, even
at runtime. For finer control, we need a message subsystem tag, but
that is something that would be highly desirable anyway.

As such, the initramfs decompression failure should be a KERN_CRIT or
KERN_ALERT message, and not a panic per se.
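
To sketch what a "panic-level=X" check could look like, here is a small userspace model in C. It assumes the usual kernel log-level numbering (0 = KERN_EMERG through 7 = KERN_DEBUG, lower meaning more severe); the enum names and `message_triggers_panic()` are hypothetical, as is the convention that a negative level disables the feature.

```c
#include <stdbool.h>

/* Mirrors the kernel's log-level numbering; names are local to this sketch. */
enum loglevel {
    LVL_EMERG = 0,
    LVL_ALERT = 1,
    LVL_CRIT = 2,
    LVL_ERR = 3,
    LVL_WARNING = 4,
    LVL_NOTICE = 5,
    LVL_INFO = 6,
    LVL_DEBUG = 7
};

/* Return true if a message at msg_level should escalate to a panic.
 * A negative panic_level means "never panic on a message" (assumption). */
static bool message_triggers_panic(int panic_level, int msg_level)
{
    return panic_level >= 0 && msg_level <= panic_level;
}
```

With panic-level set to the KERN_CRIT value, a KERN_CRIT or KERN_ALERT decompression failure would escalate to a panic, while an ordinary KERN_ERR or KERN_WARNING message would not.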

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-01-14 18:36:04

by Bodo Eggert

Subject: Re: The policy on initramfs decompression failure

Ingo Molnar wrote:

> Exactly how is such a warning different from other warnings that the
> kernel already emits? For which people supposedly have to set up a serial
> console? (which they dont have to)
>
> Answer:

A warning is given if the system knows the correct way to deal with the
situation, even if it shouldn't be there. How would you know that any system
being at the compiled-in default root device location (e.g. /dev/sda1, if I
did not have raid) WILL NOT boot unless I intend it to boot? Maybe it's a
rescue system supposed to run from initrd only? Or a public terminal
supposedly running from initrd + network only, where /dev/sda is the
client's USB stick? Or it's a remote setup, and panic() will reboot into the
old, working setup?

2009-01-18 12:55:47

by Bodo Eggert

Subject: Re: The policy on initramfs decompression failure

H. Peter Anvin wrote:

> Thought about this whole thing some more, and it seems to me as follows:
> what we really want, and need, is a "panic-level=X" option, where X will
> naturally vary for different users. I suspect there are many users
> today who would prefer a panic (and reboot) on a KERN_CRIT message, even
> at runtime. For finer control, we need a message subsystem tag, but
> that is something that would be highly desirable anyway.

This will be fun if there are read errors on the CDROM.

> As such, the initramfs decompression failure should be a KERN_CRIT or
> KERN_ALERT message, and not a panic per se.

Only if you can argue that not using the initrd WILL NEVER be bad.