ChangeLog: Partially fix B0RKEN kernel usability
checkpatch.pl'd, tested, applies cleanly to 2.6.32-rc7.
Seemingly best to go via trusted mmotm.
Thanks as always,
Signed-off-by: Andreas Mohr <[email protected]>
--- linux-2.6/init/main.c.orig 2009-11-16 20:13:08.000000000 +0100
+++ linux-2.6/init/main.c 2009-11-16 20:14:51.000000000 +0100
@@ -846,7 +846,8 @@ static noinline int init_post(void)
 	run_init_process("/bin/init");
 	run_init_process("/bin/sh");
 
-	panic("No init found. Try passing init= option to kernel.");
+	panic("No init found. Try passing init= option to kernel. "
+	      "See Linux Documentation/init.txt for guidance.");
 }
 
 static int __init kernel_init(void * unused)
--- /dev/null 2009-11-10 08:07:33.390012116 +0100
+++ linux-2.6/Documentation/init.txt 2009-11-16 20:17:57.000000000 +0100
@@ -0,0 +1,44 @@
+Explaining the dreaded "No init found." boot hang message
+=========================================================
+
+OK, so you've got this pretty unintuitive message (currently located
+in init/main.c) and are wondering what the H*** went wrong.
+Some high-level reasons for failure (listed roughly in order of execution)
+to load the init binary are:
+A) Unable to mount root FS
+B) init binary doesn't exist on rootfs
+C) other requirements not met
+D) binary exists but dependencies not available
+E) binary cannot be loaded
+
+Detailed explanations:
+0) Set "debug" kernel parameter (in bootloader or CONFIG_CMDLINE)
+to get more detailed kernel messages.
+A) Please make sure you have the correct root FS type
+(and root= kernel parameter points to the correct partition),
+required drivers such as storage hardware (such as SCSI or USB!)
+and filesystem (ext3, jffs2 etc.) are builtin (alternatively as modules by
+using initrd)
+C) Possibly a conflict in console= setup --> initial console unavailable.
+E.g. some serial consoles are unreliable due to serial IRQ issues (e.g. missing
+interrupt-based configuration).
+Try using a different console= device or e.g. netconsole=.
+D) e.g. crucial library dependencies of the init binary such as
+/lib/ld-linux.so.2 missing or broken. Use readelf -d <INIT>|grep NEEDED
+to find out which libraries are required.
+E) make sure the binary's architecture matches your hardware.
+E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware.
+Or did you try loading a non-binary file here!?! (shell script?)
+To find out more, add code patch to display kernel_execve()s return values.
+
+Please extend this explanation whenever you find new failure causes
+(after all loading the init binary is a CRITICAL and hard transition step
+which needs to be made as painless as possible), then submit patch to LKML.
+Further TODOs:
+- Implement the various run_init_process() invocations via a struct array
+ which can then store the kernel_execve() result value and on failure
+ log it all by iterating over _all_ results (very important usability fix).
+- try to make the implementation itself more helpful in general,
+ e.g. by providing additional error messages at affected places.
+
+Andreas Mohr <andi at lisas period de>
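The first TODO item above (a struct array that stores each kernel_execve() result) could look roughly like the following userspace sketch. All names in it (struct init_attempt, try_init_candidates(), report_init_failures(), stub_exec_missing()) are hypothetical and not taken from init/main.c. The exec hook merely stands in for kernel_execve() and is assumed to return 0 on success or a negative errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record: one candidate init path plus the result of trying it. */
struct init_attempt {
	const char *path;
	int result;	/* 0 on success, negative errno on failure */
};

/* Hypothetical exec hook; in the kernel this role is played by kernel_execve(). */
typedef int (*exec_fn)(const char *path);

/*
 * Try each candidate in order and record the result of every attempt.
 * Returns the index of the first candidate that started, or -1 if all failed.
 */
int try_init_candidates(struct init_attempt *tab, size_t n, exec_fn do_exec)
{
	size_t i;

	assert(tab != NULL && do_exec != NULL);
	for (i = 0; i < n; i++) {
		tab[i].result = do_exec(tab[i].path);
		if (tab[i].result == 0)
			return (int)i;
	}
	return -1;
}

/* Log every recorded result, so that no individual error code is lost. */
void report_init_failures(const struct init_attempt *tab, size_t n, FILE *out)
{
	size_t i;

	for (i = 0; i < n; i++)
		fprintf(out, "init candidate %s failed: error %d\n",
			tab[i].path, tab[i].result);
}

/* Demonstration stub: pretends that every candidate is missing. */
int stub_exec_missing(const char *path)
{
	(void)path;
	return -ENOENT;
}
```

With the stub, every entry ends up with -ENOENT, and report_init_failures() prints one line per candidate instead of a single generic panic text.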
On 20:40 Mon 16 Nov , Andreas Mohr wrote:
> ChangeLog: Partially fix B0RKEN kernel usability
Improving error messages is a good idea, but I'm not sure how much this
patch actually helps.
> --- linux-2.6/init/main.c.orig 2009-11-16 20:13:08.000000000 +0100
> +++ linux-2.6/init/main.c 2009-11-16 20:14:51.000000000 +0100
> @@ -846,7 +846,8 @@ static noinline int init_post(void)
> run_init_process("/bin/init");
> run_init_process("/bin/sh");
>
> - panic("No init found. Try passing init= option to kernel.");
> + panic("No init found. Try passing init= option to kernel. "
> + "See Linux Documentation/init.txt for guidance.");
I think that the people who know where to look after reading this are
mainly the people who don't need to read that file, with one exception -
point (C) later on.
> +OK, so you've got this pretty unintuitive message (currently located
> +in init/main.c) and are wondering what the H*** went wrong.
> +Some high-level reasons for failure (listed roughly in order of execution)
> +to load the init binary are:
> +A) Unable to mount root FS
Whenever the root FS has been unable to mount, I've always received an
error message that included the string "VFS: Unable to mount root fs".
Has this changed recently? What sort of setup causes one to receive "No
init found" instead?
> +B) init binary doesn't exist on rootfs
> +C) other requirements not met
The introduction to this list already stated that it is not exhaustive,
so this entry adds no new information. After reading the detailed
explanation, "broken console device" seems more appropriate here.
> +D) binary exists but dependencies not available
> +E) binary cannot be loaded
To me, (B), (D) and (E) are the same thing, and could just be "binary
cannot be loaded". The details can be expanded upon in the next
section.
> +Detailed explanations:
<snip>
> +C) Possibly a conflict in console= setup --> initial console unavailable.
> +E.g. some serial consoles are unreliable due to serial IRQ issues (e.g. missing
> +interrupt-based configuration).
> +Try using a different console= device or e.g. netconsole=.
This appears to be by far the most interesting point in this file, since
it clarifies that "No init found." might be caused by a configuration
problem which seems completely unrelated to loading init.
> +D) e.g. crucial library dependencies of the init binary such as
> +/lib/ld-linux.so.2 missing or broken. Use readelf -d <INIT>|grep NEEDED
> +to find out which libraries are required.
> +E) make sure the binary's architecture matches your hardware.
> +E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware.
> +Or did you try loading a non-binary file here!?! (shell script?)
Linux is perfectly happy to load a shell script as init, so this comment
is very misleading.
--
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)
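Cases (B), (D) and (E) all surface as a failed exec, but the error code often narrows down which one it was. The sketch below maps a negative errno to a likely cause. The function name explain_exec_error() and the hint texts are illustrative assumptions, not kernel output. Note that ENOENT covers both a missing binary and a missing interpreter or dynamic loader, so (B) and (D) cannot be told apart by the code alone.

```c
#include <assert.h>
#include <errno.h>

/*
 * Map a negative errno from an exec attempt to a likely cause.
 * The wording of each hint is illustrative only.
 */
const char *explain_exec_error(int err)
{
	assert(err <= 0);	/* expects 0 or a negative errno value */

	switch (-err) {
	case ENOENT:
		return "file missing, or its interpreter/dynamic loader is missing";
	case ENOEXEC:
		return "format not recognised (e.g. wrong architecture)";
	case EACCES:
		return "file not executable, or path not searchable";
	default:
		return "other failure; check the numeric error code";
	}
}
```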
On Mon, Nov 16, 2009 at 03:35:45PM -0500, Nick Bowler wrote:
> On 20:40 Mon 16 Nov , Andreas Mohr wrote:
> > --- linux-2.6/init/main.c.orig 2009-11-16 20:13:08.000000000 +0100
> > +++ linux-2.6/init/main.c 2009-11-16 20:14:51.000000000 +0100
> > @@ -846,7 +846,8 @@ static noinline int init_post(void)
> > run_init_process("/bin/init");
> > run_init_process("/bin/sh");
> >
> > - panic("No init found. Try passing init= option to kernel.");
> > + panic("No init found. Try passing init= option to kernel. "
> > + "See Linux Documentation/init.txt for guidance.");
>
> I think that the people who know where to look after reading this are
> mainly the people who don't need to read that file, with one exception -
> point (C) later on.
I'm afraid I have to disagree with some parts in this mail.
Even as an LKML regular, I have hit far more problems with this boot
failure than I would ever have expected.
Less-involved people will simply raise their eyebrows at
"Documentation/init.txt", Google the term (as long as they've got a second
working computer, that is ;) and be happy.
> > +OK, so you've got this pretty unintuitive message (currently located
> > +in init/main.c) and are wondering what the H*** went wrong.
> > +Some high-level reasons for failure (listed roughly in order of execution)
> > +to load the init binary are:
> > +A) Unable to mount root FS
>
> Whenever the root FS has been unable to mount, I've always received an
> error message that included the string "VFS: Unable to mount root fs".
> Has this changed recently? What sort of setup causes one to receive "No
> init found" instead?
That message _might_ appear (I think it often does), but you never know
whether it is actually output in 100% of these cases; it may depend, for
example, on whether "debug" is specified or not, to name just one
factor.
And given the avalanchy multitude of problems in this area my
staunch opinion is that this guidance should be committed NOW regardless
of whether it's got a "perfect" appearance (i.e. 100% of the content is
fully accurate, lists all required hints and doesn't contain false positives).
So far we've provided almost NOTHING, so let's at least add something, soon.
I'll just give further examples:
a) [same day] saw http://lkml.org/lkml/2009/11/10/526 during some light LKML reading
b) [same day] _first_ pastebin plea for help that I encountered on #openwrt
- guess what it was about?
c) [next day] wasting half a day at work due to Red Hat's sheer
inability to make a system work with more than 7MB/s on SATA hardware.
Even worse, trying to fix this up by going the way of building a custom
2.6.31.5 (something I'm doing all the time elsewhere), I even managed to hit
SEVERE Red Hat initrd root device issues (culminating in "No init found."),
with about a hundred UNSOLVED Google results in trying to make a buggy
initrd / nash setup accept a different root device.
Talk about double fault, for crying out loud.
d) [two days later] a private, thankful reply to my patch mail from another
power user, citing Debian initrd problems where ldd issues caused .so files
to get lost, thus producing a "No init found." message.
> > +B) init binary doesn't exist on rootfs
> > +C) other requirements not met
>
> The introduction to this list already stated that it is not exhaustive,
> so this entry adds no new information. After reading the detailed
> explanation, "broken console device" seems more appropriate here.
Indeed, it's better to have one-liners with specific issues and then
multi-liners elaborating on these issues, I'll update it.
> > +D) binary exists but dependencies not available
> > +E) binary cannot be loaded
>
> To me, (B), (D) and (E) are the same thing, and could just be "binary
> cannot be loaded". The details can be expanded upon in the next
> section.
>
> > +Detailed explanations:
> <snip>
> > +C) Possibly a conflict in console= setup --> initial console unavailable.
> > +E.g. some serial consoles are unreliable due to serial IRQ issues (e.g. missing
> > +interrupt-based configuration).
> > +Try using a different console= device or e.g. netconsole=.
>
> This appears to be by far the most interesting point in this file, since
> it clarifies that "No init found." might be caused by a configuration
> problem which seems completely unrelated to loading init.
Users don't care much whether the message is "No init found." or
"console broken." or whatever, all they know is that their system
doesn't work and that they want immediate help and earnest attempts
in getting this thing resolved.
Of course it would be nice if each problem area (e.g. console setup) logged
its own fair share of messages, but as long as that is not fully the case,
and since I am less well placed than e.g. core developers to track down
every place that lacks a message, we need (certainly imperfect) helper
documentation NOW.
> > +D) e.g. crucial library dependencies of the init binary such as
> > +/lib/ld-linux.so.2 missing or broken. Use readelf -d <INIT>|grep NEEDED
> > +to find out which libraries are required.
> > +E) make sure the binary's architecture matches your hardware.
> > +E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware.
> > +Or did you try loading a non-binary file here!?! (shell script?)
>
> Linux is perfectly happy to load a shell script as init, so this comment
> is very misleading.
Oh, interesting. I've seen a warning about this in a forum, thus I added
it here, but I don't have experience with this myself, so I guess it's
ok after all, thanks!
(and there are several reports that seem to confirm that a shell script is
possible; the #! line is handled by the kernel's script loader, which then
executes the named interpreter)
This part should thus be altered to mention that a script needs to have its fully
working interpreter binary plus dependencies available.
I'll submit a new version of this patch very soon.
Thanks,
Andreas Mohr
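Since a script used as init depends on the interpreter named in its #! line, a first check is to extract that path and confirm it exists on the root filesystem. The following is a minimal userspace sketch of the extraction step only. The function name shebang_interpreter() is hypothetical, and this is not the kernel's own parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Extract the interpreter path from the first line of a script.
 * Returns 0 and fills 'out' on success, -1 if the line has no usable "#!".
 * Userspace illustration only.
 */
int shebang_interpreter(const char *line, char *out, size_t outlen)
{
	size_t len;

	assert(line != NULL && out != NULL);
	if (outlen == 0 || strncmp(line, "#!", 2) != 0)
		return -1;
	line += 2;
	line += strspn(line, " \t");		/* optional blanks after "#!" */
	len = strcspn(line, " \t\r\n");		/* path ends at blank or end of line */
	if (len == 0 || len >= outlen)
		return -1;
	memcpy(out, line, len);
	out[len] = '\0';
	return 0;
}
```

The returned path (e.g. /bin/sh) is then the binary whose presence and library dependencies need checking, exactly as for a non-script init.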
On Tue, Nov 17, 2009 at 09:40:16PM +0100, Andreas Mohr wrote:
> I'll submit a new version of this patch very soon.
Well, took quite a while longer, partly due to broken Broadcom USB host
(OpenWrt fix to be submitted) and non-working USB-audio on nicer platforms.
Took most of the comments into account (thanks!), improved some wording.
Patch against current git, compile- and runtime-tested,
checkpatch.pl'd (with a single nice hierarchy warning resulting from mixing
git diff output and manual /dev/null diffing).
Thanks!
Signed-off-by: Andreas Mohr <[email protected]>
diff --git a/init/main.c b/init/main.c
index dac44a9..33748c6 100644
--- a/init/main.c
+++ b/init/main.c
@@ -836,7 +836,8 @@ static noinline int init_post(void)
 	run_init_process("/bin/init");
 	run_init_process("/bin/sh");
 
-	panic("No init found. Try passing init= option to kernel.");
+	panic("No init found. Try passing init= option to kernel. "
+	      "See Linux Documentation/init.txt for guidance.");
 }
 
 static int __init kernel_init(void * unused)
--- /dev/null 2009-12-27 16:25:29.521258205 +0100
+++ Documentation/init.txt 2009-12-27 15:47:46.000000000 +0100
@@ -0,0 +1,49 @@
+Explaining the dreaded "No init found." boot hang message
+=========================================================
+
+OK, so you've got this pretty unintuitive message (currently located
+in init/main.c) and are wondering what the H*** went wrong.
+Some high-level reasons for failure (listed roughly in order of execution)
+to load the init binary are:
+A) Unable to mount root FS
+B) init binary doesn't exist on rootfs
+C) broken console device
+D) binary exists but dependencies not available
+E) binary cannot be loaded
+
+Detailed explanations:
+0) Set "debug" kernel parameter (in bootloader config file or CONFIG_CMDLINE)
+ to get more detailed kernel messages.
+A) make sure you have the correct root FS type
+ (and root= kernel parameter points to the correct partition),
+ required drivers such as storage hardware (such as SCSI or USB!)
+ and filesystem (ext3, jffs2 etc.) are builtin (alternatively as modules,
+ to be pre-loaded by an initrd)
+C) Possibly a conflict in console= setup --> initial console unavailable.
+ E.g. some serial consoles are unreliable due to serial IRQ issues (e.g.
+ missing interrupt-based configuration).
+ Try using a different console= device or e.g. netconsole= .
+D) e.g. required library dependencies of the init binary such as
+ /lib/ld-linux.so.2 missing or broken. Use readelf -d <INIT>|grep NEEDED
+ to find out which libraries are required.
+E) make sure the binary's architecture matches your hardware.
+ E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware.
+ In case you tried loading a non-binary file here (shell script?),
+ you should make sure that the script specifies an interpreter in its shebang
+ header line (#!/...) that is fully working (including its library
+ dependencies). And before tackling scripts, better first test a simple
+ non-script binary such as /bin/sh and confirm its successful execution.
+ To find out more, add code to init/main.c to display kernel_execve()s
+ return values.
+
+Please extend this explanation whenever you find new failure causes
+(after all loading the init binary is a CRITICAL and hard transition step
+which needs to be made as painless as possible), then submit patch to LKML.
+Further TODOs:
+- Implement the various run_init_process() invocations via a struct array
+ which can then store the kernel_execve() result value and on failure
+ log it all by iterating over _all_ results (very important usability fix).
+- try to make the implementation itself more helpful in general,
+ e.g. by providing additional error messages at affected places.
+
+Andreas Mohr <andi at lisas period de>
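For point E) above, the architecture of an init binary can be read straight from its ELF header. The sketch below is a userspace illustration under stated assumptions: the function name elf_machine() and the two offset macros are made up for this example, while the offsets themselves and the machine numbers (3 = i386, 40 = ARM, 62 = x86_64) come from the ELF specification.

```c
#include <assert.h>
#include <stddef.h>

/* Offsets from the ELF specification (names are local to this example). */
#define EI_DATA_OFFSET		5	/* 1 = little endian, 2 = big endian */
#define E_MACHINE_OFFSET	18	/* same for 32-bit and 64-bit ELF */

/*
 * Return the e_machine value of an ELF header held in 'buf',
 * or -1 if the buffer does not start with a valid-looking ELF ident.
 */
int elf_machine(const unsigned char *buf, size_t len)
{
	assert(buf != NULL);
	if (len < E_MACHINE_OFFSET + 2)
		return -1;
	if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
		return -1;
	if (buf[EI_DATA_OFFSET] == 1)		/* little endian */
		return buf[E_MACHINE_OFFSET] | (buf[E_MACHINE_OFFSET + 1] << 8);
	if (buf[EI_DATA_OFFSET] == 2)		/* big endian */
		return (buf[E_MACHINE_OFFSET] << 8) | buf[E_MACHINE_OFFSET + 1];
	return -1;
}
```

Feeding it the first 20 bytes of the init binary and comparing the result against the value expected for the target hardware would catch an i386 vs. x86_64 or x86 vs. ARM mismatch; a return of -1 suggests the file is not an ELF binary at all (for example, a script).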
On Sun, 27 Dec 2009, Andreas Mohr wrote:
> Well, took quite a while longer, partly due to broken Broadcom USB host
> (OpenWrt fix to be submitted) and non-working USB-audio on nicer platforms.
>
> Took most of the comments into account (thanks!), improved some wording.
>
> Patch against current git, compile- and runtime-tested,
> checkpatch.pl'd (with a single nice hierarchy warning resulting from mixing
> git diff output and manual /dev/null diffing).
>
> Thanks!
>
It looks like this patch got mangled when added to mmotm-2010-02-01-16-25
in init-mainc-improve-usability-in-case-of-init-binary-failure.patch since
it added init.txt to the root directory instead of Documentation, even
though the patch below is correct.
> Signed-off-by: Andreas Mohr <[email protected]>
>
> diff --git a/init/main.c b/init/main.c
> index dac44a9..33748c6 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -836,7 +836,8 @@ static noinline int init_post(void)
> run_init_process("/bin/init");
> run_init_process("/bin/sh");
>
> - panic("No init found. Try passing init= option to kernel.");
> + panic("No init found. Try passing init= option to kernel. "
> + "See Linux Documentation/init.txt for guidance.");
> }
>
> static int __init kernel_init(void * unused)
> --- /dev/null 2009-12-27 16:25:29.521258205 +0100
> +++ Documentation/init.txt 2009-12-27 15:47:46.000000000 +0100
> @@ -0,0 +1,49 @@
> +Explaining the dreaded "No init found." boot hang message
> +=========================================================
> +
> +OK, so you've got this pretty unintuitive message (currently located
> +in init/main.c) and are wondering what the H*** went wrong.
> +Some high-level reasons for failure (listed roughly in order of execution)
> +to load the init binary are:
> +A) Unable to mount root FS
> +B) init binary doesn't exist on rootfs
> +C) broken console device
> +D) binary exists but dependencies not available
> +E) binary cannot be loaded
> +
> +Detailed explanations:
> +0) Set "debug" kernel parameter (in bootloader config file or CONFIG_CMDLINE)
> + to get more detailed kernel messages.
> +A) make sure you have the correct root FS type
> + (and root= kernel parameter points to the correct partition),
> + required drivers such as storage hardware (such as SCSI or USB!)
> + and filesystem (ext3, jffs2 etc.) are builtin (alternatively as modules,
> + to be pre-loaded by an initrd)
> +C) Possibly a conflict in console= setup --> initial console unavailable.
> + E.g. some serial consoles are unreliable due to serial IRQ issues (e.g.
> + missing interrupt-based configuration).
> + Try using a different console= device or e.g. netconsole= .
> +D) e.g. required library dependencies of the init binary such as
> + /lib/ld-linux.so.2 missing or broken. Use readelf -d <INIT>|grep NEEDED
> + to find out which libraries are required.
> +E) make sure the binary's architecture matches your hardware.
> + E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware.
> + In case you tried loading a non-binary file here (shell script?),
> + you should make sure that the script specifies an interpreter in its shebang
> + header line (#!/...) that is fully working (including its library
> + dependencies). And before tackling scripts, better first test a simple
> + non-script binary such as /bin/sh and confirm its successful execution.
> + To find out more, add code to init/main.c to display kernel_execve()s
> + return values.
> +
> +Please extend this explanation whenever you find new failure causes
> +(after all loading the init binary is a CRITICAL and hard transition step
> +which needs to be made as painless as possible), then submit patch to LKML.
> +Further TODOs:
> +- Implement the various run_init_process() invocations via a struct array
> + which can then store the kernel_execve() result value and on failure
> + log it all by iterating over _all_ results (very important usability fix).
> +- try to make the implementation itself more helpful in general,
> + e.g. by providing additional error messages at affected places.
> +
> +Andreas Mohr <andi at lisas period de>
On Mon, 1 Feb 2010 23:10:51 -0800 (PST) David Rientjes <[email protected]> wrote:
> On Sun, 27 Dec 2009, Andreas Mohr wrote:
>
> > Well, took quite a while longer, partly due to broken Broadcom USB host
> > (OpenWrt fix to be submitted) and non-working USB-audio on nicer platforms.
> >
> > Took most of the comments into account (thanks!), improved some wording.
> >
> > Patch against current git, compile- and runtime-tested,
> > checkpatch.pl'd (with a single nice hierarchy warning resulting from mixing
> > git diff output and manual /dev/null diffing).
> >
> > Thanks!
> >
>
> It looks like this patch got mangled when added to mmotm-2010-02-01-16-25
> in init-mainc-improve-usability-in-case-of-init-binary-failure.patch since
> it added init.txt to the root directory instead of Documentation,
ah, thanks.
> even though the patch below is correct.
Nope, the patch was wrong:
> > --- a/init/main.c
> > +++ b/init/main.c
> ...
> > --- /dev/null 2009-12-27 16:25:29.521258205 +0100
> > +++ Documentation/init.txt 2009-12-27 15:47:46.000000000 +0100
Should've been a/Documentation/init.txt
On Tue, Feb 02, 2010 at 09:20:41AM -0800, Andrew Morton wrote:
> On Mon, 1 Feb 2010 23:10:51 -0800 (PST) David Rientjes <[email protected]> wrote:
>
> > On Sun, 27 Dec 2009, Andreas Mohr wrote:
> >
> > > Well, took quite a while longer, partly due to broken Broadcom USB host
> > > (OpenWrt fix to be submitted) and non-working USB-audio on nicer platforms.
> > >
> > > Took most of the comments into account (thanks!), improved some wording.
> > >
> > > Patch against current git, compile- and runtime-tested,
> > > checkpatch.pl'd (with a single nice hierarchy warning resulting from mixing
> > > git diff output and manual /dev/null diffing).
> > >
> > > Thanks!
> > >
> >
> > It looks like this patch got mangled when added to mmotm-2010-02-01-16-25
> > in init-mainc-improve-usability-in-case-of-init-binary-failure.patch since
> > it added init.txt to the root directory instead of Documentation,
>
> ah, thanks.
>
> > even though the patch below is correct.
>
> Nope, the patch was wrong:
>
> > > --- a/init/main.c
> > > +++ b/init/main.c
> > ...
> > > --- /dev/null 2009-12-27 16:25:29.521258205 +0100
> > > +++ Documentation/init.txt 2009-12-27 15:47:46.000000000 +0100
>
> Should've been a/Documentation/init.txt
Indeed, which is why I had mentioned it in the submission (above),
but [fatally, as it turned out] did not bother to fix this "minor" issue.
Lots of sorries,
Andreas Mohr