2011-05-01 01:52:44

by werner

Subject: 2.6.39-rc5-git2 boot crashs

Enclosed is still the /var/log/debug file. UNFORTUNATELY,
IT DOESN'T SHOW THE KERNEL VERSION (THIS IS SOMETHING
YOU COULD CORRECT !!!) Today's boots at 14:17:43 and
21:00:01 used 2.6.38.4 (during the 1st run, I
compiled/packaged 2.6.39-rc5-git4 without problems; the
2nd is the one running now). In between were runs with
2.6.39-rc5-git4, ON WHICH DEBUG (in the kernel hacking
config) IS SWITCHED ON (on 2.6.38.4 it's NOT switched on).
Perhaps that's helpful anyhow.
wl
---
Professional hosting for everyone - http://www.host.ru


Attachments:
debug.bz2 (241.83 kB)

2011-05-01 02:53:09

by Linus Torvalds

Subject: Re: 2.6.39-rc5-git2 boot crashs

On Sat, Apr 30, 2011 at 6:52 PM, werner <[email protected]> wrote:
>
> Enclosed is still the /var/log/debug file.

Can you please just do what I asked, namely do the minimal config.

That will also make it _much_ faster for you to compile the kernel
(you can probably do it in 10 minutes or less if you just get a really
nice config that only has the stuff you actually require - on my
machine I compile my kernel in about 3 minutes from scratch).

And not only would it tell us something (namely whether the problem
persists or not), but if it _does_ persist even with a minimal kernel
that only has the drivers you need, it would make it much more
feasible to do a bisect because now the compile/test cycle would be
much shorter.
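Linus's argument is about how bisection scales: each test halves the remaining candidates, so the number of test builds grows only logarithmically, but every one of them pays the full compile/install/reboot cycle. A rough Python estimate; the candidate count and cycle times are hypothetical numbers chosen for illustration, not figures from this thread:

```python
import math


def bisect_steps(candidates: int) -> int:
    """Worst-case number of tests a binary search needs to isolate a single
    culprit among `candidates` possibilities (commits or config options)."""
    if candidates <= 1:
        return 0
    return math.ceil(math.log2(candidates))


def bisect_hours(candidates: int, minutes_per_cycle: float) -> float:
    """Total wall-clock time, assuming every step costs one full
    compile + install + reboot + test cycle."""
    return bisect_steps(candidates) * minutes_per_cycle / 60


if __name__ == "__main__":
    # Hypothetical: ~1000 candidate commits, a slow "everything built in"
    # cycle versus a minimal-config cycle.
    for minutes in (60, 10):
        print(f"{minutes} min/cycle: {bisect_hours(1000, minutes):.1f} h")
    # 60 min/cycle: 10.0 h
    # 10 min/cycle: 1.7 h
```

The number of steps is the same in both cases; only the cycle time differs, which is what makes a smaller config turn a bisect from a multi-day job into an afternoon.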

So please, just do it. Because quite frankly, by now you're mainly
just wasting everybody's time.

> UNFORTUNATELY, IT DOESN'T SHOW THE
> KERNEL VERSION (THIS IS SOMETHING YOU COULD CORRECT !!!)

The kernel version shows up in dmesg as the very first thing. Something like

[ 0.000000] Linux version 2.6.39-rc5-00127-g1be6a1f89f13 ([email protected]) (gcc version 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #9 SMP PREEMPT Fri Apr 29 16:28:05 PDT 2011

You've just lost it, possibly because your dmesg file ended up growing
too large, or possibly because some tool just didn't bother to save
it.
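Since the banner is the first kernel message of every boot, a saved log can be matched boot by boot to the kernel that produced it by searching for it. A minimal Python sketch, assuming the saved line still contains the literal "Linux version <release>" (syslog files add their own timestamp and host prefix, so only that part is matched); the function name is hypothetical:

```python
import re

# Matches the release string in the boot banner, e.g.
# "[ 0.000000] Linux version 2.6.39-rc5-00127-g1be6a1f89f13 (...)".
# Only "Linux version <release>" is matched, so syslog-style prefixes
# such as "May  1 14:17:43 host kernel: " do not matter.
BANNER = re.compile(r"Linux version (\S+)")


def boot_versions(log_text: str) -> list[str]:
    """Return the kernel release of every boot banner in log_text, in order."""
    return BANNER.findall(log_text)


if __name__ == "__main__":
    sample = (
        "[ 0.000000] Linux version 2.6.39-rc5-00127-g1be6a1f89f13 "
        "(gcc version 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #9 SMP\n"
        "May  1 21:00:01 host kernel: Linux version 2.6.38.4 (root@host)\n"
    )
    print(boot_versions(sample))
    # ['2.6.39-rc5-00127-g1be6a1f89f13', '2.6.38.4']
```

If a boot is missing from the result, its banner never made it into that file. For the kernel that is currently running, `uname -r` or /proc/version reports the same release string.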

Linus

2011-05-01 02:54:49

by Linus Torvalds

Subject: Re: 2.6.39-rc5-git2 boot crashs

2011/4/30 werner <[email protected]>:
>
> On the other hand, occasional crashes also occur during booting at an
> earlier stage. See the 2nd photo attached.

Oh, you have KMEMLEAK enabled too.

Disable it. It is very useful when looking for leaks and when it
works, but it often doesn't work, and just leads to problems.
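For a scripted build like werner's, the option can be switched off by editing .config before the build instead of going through menuconfig. A Python sketch (the helper name is hypothetical) that rewrites CONFIG_DEBUG_KMEMLEAK into the `# CONFIG_... is not set` form Kconfig itself writes; running `make oldconfig` afterwards lets Kconfig sort out any options that depend on it:

```python
def disable_option(config_text: str, symbol: str) -> str:
    """Return config_text with CONFIG_<symbol> switched off.

    Kconfig records a disabled bool/tristate as '# CONFIG_FOO is not set';
    an existing assignment is replaced in place, otherwise the line is
    appended. Options that merely share the prefix (CONFIG_FOO_BAR) are
    left untouched.
    """
    name = f"CONFIG_{symbol}"
    unset = f"# {name} is not set"
    out, found = [], False
    for line in config_text.splitlines():
        if line.startswith(name + "=") or line == unset:
            out.append(unset)
            found = True
        else:
            out.append(line)
    if not found:
        out.append(unset)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    cfg = (
        "CONFIG_SMP=y\n"
        "CONFIG_DEBUG_KMEMLEAK=y\n"
        "CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=400\n"
    )
    print(disable_option(cfg, "DEBUG_KMEMLEAK"), end="")
```

The leftover CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE line in the example is the kind of dependent option that `make oldconfig` cleans up afterwards.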

Linus

2011-05-01 18:07:38

by werner

Subject: Re: 2.6.39-rc5-git2 boot crashs

Yes, my kernel compiling / packaging / lilo.conf-writing
script always does this: it creates 3 boot modes:
normal/graphics, text, and initrd, and for this it also
builds an initrd. See below. Normally this also works without
any problem. This is necessary if someone wants to make an
installer. And precisely because of this, I compile plenty of
things into vmlinuz, not as modules, because nowadays
people have all kinds of devices with touchpads, touchscreens,
etc. Last night I tested all these boot methods,
but every attempt hit the same error: it didn't
find the boot device, and then crashed. This is all the more
strange because there is also a bootwait option on the command
line (see below). (Nowadays people often install
my distro on a USB key, or, on an Eee PC, have just
the starter on the internal flash chip and the system
with the root fs on a USB hard disk that takes time to be
discovered.)


But now, for the 2nd compilation, I used the Slackware
huge configuration. And this works!! It boots normally,
and it also doesn't crash when unzipping big files. So you
were right to use a smaller config to better find out
the reason. However, remember that in the end (until
-rc20) the everythingyes config should also work, because
this was always the case, and it has to work because
nowadays people have all kinds of computers, laptops,
phones, coffee machines, washing machines, etc. running Linux,
and the distro kernel has to run ANYWAY on ANY DEVICE.


That successful Slackware huge config is in the middle
between the Slackware smp config explained above, which
didn't boot, and my normal everythingyes
configuration, which gave plenty of problems. So the
differences in both directions can make it easier to find
out a) why the Slackware smp config didn't boot at all,
not finding the root fs, and, in the other direction, b) why
my everythingyes config, which always worked (also
on 2.6.34.8), gives so many problems with 2.6.39-rcX.

Please find enclosed, as the 2nd file, the successful config,
which is still running on my computer and hasn't crashed yet
(made by loading the Slackware huge smp config into menuconfig
and exiting without changes). Also, as the 3rd file, there is the
diff between this and my everythingyes config used for
-git2 to -git4; from these differences it should be
easier to find what causes the many problems, in
the sense of comparison b) above.

However, I should mention that this successful config has
a problem with the graphics. On my neighbour's laptop,
it switched to a wrong text mode where the characters are
small and appear 3 times at the top of the screen. And on my
computer, I cannot switch to the text console with
CTRL-ALT-F1 etc.


wl




2011-05-01 18:20:47

by Linus Torvalds

Subject: Re: 2.6.39-rc5-git2 boot crashs

2011/5/1 werner <[email protected]>:
>
> But now, for the 2nd compilation, I used the Slackware huge configuration.
> And this works!! It boots normally, and it also doesn't crash when unzipping
> big files. So you were right to use a smaller config to better find out
> the reason.

Ok, good. Now we have confirmation that it's not the SATA driver
itself that causes problems, it's some other driver.

So what I'd suggest you try to do is a "config bisect" to see exactly
_what_ config option it is that makes things break. Steve Rostedt
wrote a tool ("ktest") for this exact thing, but I'm not entirely sure
that it will work for your situation. I've added Steve to the email
participants, and I'd suggest you read up on it:

http://lwn.net/Articles/414064/

would seem to be a good starting point.

Of course, you could just try to do it manually too - just turn one
subsystem at a time from a module in the working slackware config into
a compiled-in thing, so that eventually you end up with the
non-working "almost everything compiled in" case. And see which
subsystem it is that causes problems.

And then when you find the subsystem that makes the problem re-appear,
you'd need to go back and try one driver at a time.
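The manual approach can be scripted. A Python sketch, assuming a "subsystem" is approximated by a symbol-name prefix (the prefixes and helper names below are hypothetical examples): each step turns every `=m` option under one more prefix into `=y`, on top of the previous steps, and leaves every other line, including `# ... is not set`, untouched. Each resulting .config should still go through `make oldconfig` before building.

```python
def build_in(config_text: str, prefix: str) -> str:
    """Turn every 'CONFIG_...=m' line whose symbol starts with `prefix`
    into '=y'; all other lines, including '# ... is not set', are kept."""
    out = []
    for line in config_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.startswith(prefix) and value == "m":
            line = f"{key}=y"
        out.append(line)
    return "\n".join(out) + "\n"


def build_in_stepwise(config_text: str, prefixes: list[str]):
    """Yield (prefix, config_text) after each step; the changes accumulate,
    so the last step has every listed subsystem built in."""
    for prefix in prefixes:
        config_text = build_in(config_text, prefix)
        yield prefix, config_text


if __name__ == "__main__":
    working = (
        "CONFIG_USB_STORAGE=m\n"
        "CONFIG_SND_HDA_INTEL=m\n"
        "CONFIG_EXT4_FS=y\n"
        "# CONFIG_DRM is not set\n"
    )
    # Hypothetical subsystem order; in practice the prefixes would come
    # from the diff between the working and the failing config.
    for prefix, cfg in build_in_stepwise(working, ["CONFIG_USB", "CONFIG_SND"]):
        print(f"--- after building in {prefix}*")
        print(cfg, end="")
```

Boot each step in turn: the first prefix after which the crash returns points at the subsystem, and running the same loop over the individual symbols under that prefix narrows it down to one driver. A prefix is only a rough stand-in for a subsystem (CONFIG_USB also matches unrelated symbols that happen to start with those letters), which is why it is labeled an assumption here.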

Linus

2011-05-02 13:04:56

by Steven Rostedt

Subject: Re: 2.6.39-rc5-git2 boot crashs

On Sun, 2011-05-01 at 11:20 -0700, Linus Torvalds wrote:
> 2011/5/1 werner <[email protected]>:

> So what I'd suggest you try to do is a "config bisect" to see exactly
> _what_ config option it is that makes things break. Steve Rostedt
> wrote a tool ("ktest") for this exact thing, but I'm not entirely sure
> that it will work for your situation. I've added Steve to the email
> participants, and I'd suggest you read up on it:
>
> http://lwn.net/Articles/414064/
>
> would seem to be a good starting point.

Note, from his email he has:


> That successful Slackware huge config is in the middle between the
> Slackware smp config explained above, which didn't boot, and my
> normal everythingyes configuration, which gave plenty of problems.

If I understand this correctly, these are three different configs, A, B
and C, with the relationship A < B < C, where A doesn't boot, B
does, and C boots with problems.
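The "A < B < C" relationship can be checked mechanically by comparing the .config files. A Python sketch, reading "<" as "every option set in the smaller config is also set in the larger one" (an interpretation, not something spelled out in the thread; the helper names are hypothetical). Kernel trees also carry a scripts/diffconfig script for plain config diffs.

```python
def parse_config(text: str) -> dict[str, str]:
    """Map each set CONFIG_* symbol to its value ('y', 'm', a number, a string).
    '# CONFIG_FOO is not set' lines are skipped, so unset means absent."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            opts[key] = value
    return opts


def is_below(smaller: dict[str, str], larger: dict[str, str]) -> bool:
    """True if every option set in `smaller` is also set in `larger`
    (values may differ, e.g. =m in one and =y in the other)."""
    return all(key in larger for key in smaller)


def config_diff(old: dict[str, str], new: dict[str, str]):
    """Return (added, removed, changed) symbol lists going from old to new."""
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    changed = sorted(k for k in old if k in new and old[k] != new[k])
    return added, removed, changed


if __name__ == "__main__":
    # Toy stand-ins for the three configs, not the real attachments.
    a = parse_config("CONFIG_SMP=y\n# CONFIG_DRM is not set\n")
    b = parse_config("CONFIG_SMP=y\nCONFIG_USB_STORAGE=m\n")
    c = parse_config("CONFIG_SMP=y\nCONFIG_USB_STORAGE=y\nCONFIG_DRM=y\n")
    print(is_below(a, b), is_below(b, c))  # True True
    print(config_diff(b, c))  # (['CONFIG_DRM'], [], ['CONFIG_USB_STORAGE'])
```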

config-bisect can definitely find the issues between B and C, which I
think is a second issue. As for the problems between A and B, it may not
work so well. That would require a "reverse bisect", as "config bisect" is
much like git bisect in that it expects things to work and then suddenly
break. The difference between git bisect and config bisect is that a
reverse bisect won't work with config bisect (it might if you're lucky).
That's because git may have branches, but configs have nasty dependencies.

The way config bisect finds a bad config (one that, if set, will break
the kernel) is the following algorithm:

o You feed it a good config (does not have config that breaks the
kernel), and a config that breaks the kernel (contains a bad config).

again:

o It selects all the configs that are in both of the configs, plus
half of the configs that are in the difference between the two, and
runs make oldnoconfig on the result. Because these configs can select other
configs or may depend on other configs not selected, it is possible that
we end up with the bad or good config again. So ktest does a diff on
this config to make sure it is different. If it is not, then it will
select the other half instead. If that is also the same, then it will
select only one config at a time until it finds a config that is
different.

o It runs the test
(you can add CONFIG_BISECT_TYPE = build, BISECT_MANUAL = 1, which will
just build the kernel and then wait for you to say if it was good or
bad. This is handy if you don't want to set up all the automation of
ktest and only want it to do the config bisect for you. Then you can
install and reboot the kernel and then tell ktest if it worked or not).
This still requires the build machine to be a separate box from
the test machine.

o If the test fails, it takes this config as the new bad config, and the
next difference will be between this config and the good config.

o If the test passes, it assumes all the configs that were selected
are good, and will permanently select them for the remainder of the
tests. We need to permanently select these configs because later configs
(and perhaps the bad config) may depend on them.

o If there are no more configs to compare, then the last config to be
selected is the bad config; otherwise goto "again".
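The steps above can be modelled as a short Python sketch. It is a toy illustration of the algorithm as described here, not ktest's code (ktest itself is a Perl script): configs are sets of enabled option names, `oldnoconfig()` is a hypothetical stand-in for `make oldnoconfig` that only understands "depends on", and `boots` stands in for the build/install/boot/test step (or the manual good/bad answer). It assumes the good config is a subset of the bad one and that both already satisfy their dependencies, which guarantees every round narrows the difference.

```python
def oldnoconfig(selected: frozenset, depends: dict) -> frozenset:
    """Toy stand-in for `make oldnoconfig`: keep dropping options whose
    'depends on' requirements are not all enabled. Real Kconfig also has
    select, tristates and defaults; none of that is modelled here."""
    cfg = set(selected)
    changed = True
    while changed:
        changed = False
        for opt in list(cfg):
            if not depends.get(opt, set()) <= cfg:
                cfg.discard(opt)
                changed = True
    return frozenset(cfg)


def config_bisect(good, bad, boots, depends):
    """Return the option in `bad` that breaks the kernel.

    good, bad: sets of enabled options (good must be a subset of bad and
    both must already satisfy their dependencies).
    boots(cfg) -> bool: build/install/boot/test of one candidate config.
    """
    good, bad = frozenset(good), frozenset(bad)
    if not good <= bad:
        raise ValueError("this sketch expects good to be a subset of bad")
    while True:
        diff = sorted(bad - good)
        if not diff:
            raise RuntimeError("good and bad configs are identical")
        if len(diff) == 1:
            return diff[0]  # no more configs to compare: this is the bad one
        common = good & bad  # options present in both configs
        half = len(diff) // 2
        # First half, then the other half, then one option at a time.
        attempts = [diff[:half], diff[half:]] + [[opt] for opt in diff]
        for subset in attempts:
            trial = oldnoconfig(common | frozenset(subset), depends)
            if trial not in (good, bad):
                break
        else:
            raise RuntimeError("dependencies prevent any intermediate config")
        if boots(trial):
            good = trial  # everything in trial stays selected from now on
        else:
            bad = trial  # a smaller bad config; keep narrowing


if __name__ == "__main__":
    # Hypothetical options: BAD_DRV crashes the kernel and depends on BUS.
    depends = {"BAD_DRV": {"BUS"}}
    good = {"CORE"}
    bad = {"CORE", "BUS", "BAD_DRV", "SOUND", "WIFI"}
    print(config_bisect(good, bad, lambda cfg: "BAD_DRV" not in cfg, depends))
    # BAD_DRV
```

In the example the first trial (BAD_DRV plus BUS) fails. In the next round, BAD_DRV on its own is dropped by the dependency check, leaving a config identical to the good one, so the sketch moves on to BUS alone, which boots and leaves BAD_DRV as the only remaining difference.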


This works great when you are looking for a bad config. I've
used this 4 or 5 times already, with great results. But I do not
think it will help if there's a good config that makes the kernel work
again. If we select that config on the first pass, then all new configs
will contain this working config.


>
> Of course, you could just try to do it manually too - just turn one
> subsystem at a time from a module in the working slackware config into
> a compiled-in thing, so that eventually you end up with the
> non-working "almost everything compiled in" case. And see which
> subsystem it is that causes problems.
>
> And then when you find the subsystem that makes the problem re-appear,
> you'd need to go back and try each driver at a time.

I'm confused, as I thought the working config was between the two broken
configs he has. Maybe I just misunderstood.

-- Steve