2008-12-29 07:03:20

by Igor Podlesny

Subject: > I even didn't have a backtrace.

2008/12/29 Igor Podlesny <[email protected]>:
> 2008/12/29 Willy Tarreau <[email protected]>:
>> On Mon, Dec 29, 2008 at 12:39:55PM +0700, Igor Podlesny wrote:
> [...]
>> Well, I won't say that I find them 100% rock solid, but you seem to be
>> able to reproduce a lot of serious issues. Have you filed bug reports
>> to get them fixed? You cannot expect people to fix bugs they're not
>> aware of!
>
> I didn't even have a backtrace. What would I fill in? "It just crashed
> twice, dear Bugzilla"? :-)
>
BTW, I wonder -- can the kernel store crash-related information (if
any) in RAM at certain addresses, so that it survives a warm reboot and
gets displayed "in dmesg" on the next boot?



2008-12-29 12:52:45

by Igor Podlesny

Subject: Re: > I even didn't have a backtrace.

2008/12/29 Sitsofe Wheeler <[email protected]>:
> Igor Podlesny wrote:
>>
>> BTW, I wonder -- can the kernel store crash-related information (if
>> any) in RAM at certain addresses, so that it survives a warm reboot and
>> gets displayed "in dmesg" on the next boot?
>
> Perhaps you are thinking of kdump - http://lwn.net/Articles/108595/ ?

Similar, but not exactly. I just thought it unlikely that the BIOS
erases memory contents during its "fast checks", so kernel diagnostics
could be left at certain addresses, probably duplicated for safety, and
a newly booted kernel could check whether anything had been left for it
in RAM. That's simpler than kdump/kexec; the only question is whether
memory really is left intact (at least partly) while the BIOS runs.
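The scheme described above (a record left in RAM, duplicated for safety, and checked by the next kernel) might look roughly like the following user-space sketch. Everything in it is hypothetical: `struct crash_record`, `crash_save()`, `crash_recover()`, the magic value and the choice of FNV-1a as checksum are illustrative assumptions, not an existing kernel interface, and a plain array stands in for the reserved memory region that would have to survive the warm reboot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout; not an existing kernel interface. */
#define CRASH_MAGIC   0x43525348u /* "CRSH", arbitrary marker value */
#define CRASH_MSG_LEN 240
#define CRASH_COPIES  2           /* each record is stored twice */

struct crash_record {
    uint32_t magic;
    uint32_t len;                 /* bytes of msg actually used */
    uint32_t checksum;            /* FNV-1a over len and msg[0..len) */
    char msg[CRASH_MSG_LEN];
};

static uint32_t fnv1a(uint32_t h, const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Caller must ensure r->len <= CRASH_MSG_LEN. */
static uint32_t crash_checksum(const struct crash_record *r)
{
    uint32_t h = fnv1a(2166136261u, (const unsigned char *)&r->len,
                       sizeof r->len);
    return fnv1a(h, (const unsigned char *)r->msg, r->len);
}

/* Crash path: write the same (possibly truncated) text into every copy. */
void crash_save(struct crash_record *area, const char *text)
{
    size_t n = strlen(text);
    if (n > CRASH_MSG_LEN)
        n = CRASH_MSG_LEN;
    for (int c = 0; c < CRASH_COPIES; c++) {
        struct crash_record *r = &area[c];
        r->magic = 0;             /* invalidate the copy while rewriting it */
        memset(r->msg, 0, sizeof r->msg);
        memcpy(r->msg, text, n);
        r->len = (uint32_t)n;
        r->checksum = crash_checksum(r);
        r->magic = CRASH_MAGIC;   /* set last; real code would also need a write barrier */
    }
}

/* Boot path: copy the first intact record into out (NUL-terminated),
 * then wipe all copies so the report is shown only once.
 * Returns 1 if a record was recovered, 0 otherwise. */
int crash_recover(struct crash_record *area, char *out, size_t outsz)
{
    assert(outsz > 0);
    int found = 0;
    for (int c = 0; c < CRASH_COPIES && !found; c++) {
        const struct crash_record *r = &area[c];
        if (r->magic != CRASH_MAGIC || r->len > CRASH_MSG_LEN)
            continue;             /* never written, or garbage */
        if (crash_checksum(r) != r->checksum)
            continue;             /* damaged copy: try the next one */
        size_t n = r->len < outsz - 1 ? r->len : outsz - 1;
        memcpy(out, r->msg, n);
        out[n] = '\0';
        found = 1;
    }
    memset(area, 0, CRASH_COPIES * sizeof *area);
    return found;
}
```

The magic number plus checksum is what lets the new kernel tell a real leftover report from whatever random contents RAM happened to hold; the second copy only helps if damage to the memory is local rather than total.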


2008-12-29 13:25:26

by Sitsofe Wheeler

Subject: Re: > I even didn't have a backtrace.

Igor Podlesny wrote:
> Similar, but not exactly. I just thought it unlikely that the BIOS
> erases memory contents during its "fast checks", so kernel diagnostics

My understanding is that the only part of the system that isn't
cleared across a reboot (the hardware clock) is already being abused
for suspend tracing:
http://mjmwired.net/kernel/Documentation/power/s2ram.txt .

2008-12-29 14:07:50

by Matthew Garrett

Subject: Re: > I even didn't have a backtrace.

On Mon, Dec 29, 2008 at 12:59:57PM +0000, Sitsofe Wheeler wrote:
> Igor Podlesny wrote:
> > Similar, but not exactly. I just thought it unlikely that the BIOS
> > erases memory contents during its "fast checks", so kernel diagnostics
>
> My understanding is that the only part of the system that isn't
> cleared across a reboot (the hardware clock) is already being abused
> for suspend tracing:
> http://mjmwired.net/kernel/Documentation/power/s2ram.txt .

That's the only part of the system that's guaranteed to persist over a
power cycle. You probably have a little more flexibility with a warm
reboot.

--
Matthew Garrett | [email protected]

2008-12-29 19:24:36

by Bodo Eggert

Subject: Re: > I even didn't have a backtrace.

Igor Podlesny <[email protected]> wrote:
> 2008/12/29 Sitsofe Wheeler <[email protected]>:
>> Igor Podlesny wrote:

>>> BTW, I wonder -- can the kernel store crash-related information (if
>>> any) in RAM at certain addresses, so that it survives a warm reboot and
>>> gets displayed "in dmesg" on the next boot?
>>
>> Perhaps you are thinking of kdump - http://lwn.net/Articles/108595/ ?
>
> Similar, but not exactly. I just thought it unlikely that the BIOS
> erases memory contents during its "fast checks", so kernel diagnostics
> could be left at certain addresses, probably duplicated for safety, and
> a newly booted kernel could check whether anything had been left for it
> in RAM. That's simpler than kdump/kexec; the only question is whether
> memory really is left intact (at least partly) while the BIOS runs.

You may want to read "ISA System Architecture" by Tom Shanley, Don
Anderson and John Swindle (MindShare, Inc.):

http://books.google.com/books?id=iXE6mwUCNWQC&pg=PA110&lpg=PA112&ots=ZH_c88LEqd&dq=post+reset+flag+0072+0040&ie=ISO-8859-1&output=html
http://books.google.com/books?id=iXE6mwUCNWQC&pg=PA112&lpg=PA112&dq=post+reset+flag+0072+0040&source=web&ots=ZH_c88LEqd&sig=3EbSpcxbKrPvd1ixhysJK4AuOrA&hl=en&sa=X&oi=book_result&resnum=1&ct=result
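The search terms in those links point at the POST reset flag, a word in the BIOS Data Area at real-mode address 0040:0072h. By the classic PC BIOS convention, the value 1234h there marks a warm boot, and POST then skips its memory test, which makes it more likely (though nothing guarantees it) that most of RAM survives the reboot. The small sketch below only shows the address arithmetic and the byte layout; `real_mode_linear()` and `set_warm_boot_flag()` are hypothetical helpers, and a byte array stands in for low physical memory.

```c
#include <stdint.h>

/* BIOS Data Area word at 0040:0072h: the POST reset flag. */
#define BDA_RESET_FLAG_SEG 0x0040u
#define BDA_RESET_FLAG_OFF 0x0072u
#define RESET_FLAG_WARM    0x1234u /* "warm boot": POST skips the memory test */

/* Real-mode translation: linear address = segment * 16 + offset. */
static uint32_t real_mode_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Store the flag little-endian (as on x86) into a byte array that stands
 * in for low physical memory. This is a simulation only: real code would
 * need a mapping of physical address 0x472. */
static void set_warm_boot_flag(uint8_t *low_mem)
{
    uint32_t a = real_mode_linear(BDA_RESET_FLAG_SEG, BDA_RESET_FLAG_OFF);
    low_mem[a]     = (uint8_t)(RESET_FLAG_WARM & 0xffu);
    low_mem[a + 1] = (uint8_t)(RESET_FLAG_WARM >> 8);
}
```

So 0040:0072h is linear address 0x472, and the flag is stored as the bytes 0x34, 0x12.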

HTH.

2008-12-29 23:34:18

by Alan

Subject: Re: > I even didn't have a backtrace.

One place to hide stuff is the upper area of video RAM, as a lot of
video cards don't clear their RAM on a crash/reboot. I used to use a
32MB 3Dfx card as a buffer, to good effect.

Alan

2008-12-30 02:17:51

by Daniel Barkalow

Subject: Re: > I even didn't have a backtrace.

On Mon, 29 Dec 2008, Igor Podlesny wrote:

> 2008/12/29 Igor Podlesny <[email protected]>:
> > 2008/12/29 Willy Tarreau <[email protected]>:
> >> On Mon, Dec 29, 2008 at 12:39:55PM +0700, Igor Podlesny wrote:
> > [...]
> >> Well, I won't say that I find them 100% rock solid, but you seem to be
> >> able to reproduce a lot of serious issues. Have you filed bug reports
> >> to get them fixed? You cannot expect people to fix bugs they're not
> >> aware of!
> >
> > I didn't even have a backtrace. What would I fill in? "It just crashed
> > twice, dear Bugzilla"? :-)
> >
> BTW, I wonder -- can the kernel store crash-related information (if
> any) in RAM at certain addresses, so that it survives a warm reboot and
> gets displayed "in dmesg" on the next boot?

Usually, you get one of:

- some bus is locked, and the CPU can't get kernel code from RAM, let
alone write anything anywhere
- triple fault, and the system spontaneously reboots
- system is unaware that anything's wrong, but nothing runs
- any attempt to get data useful for debugging hangs
- system is alive enough that you can interact with it and get info

That doesn't leave much opportunity for the kernel to notice that the
system has crashed and put something in memory for a warm reboot to find.
There isn't really any case where the kernel reboots intentionally, knows
that the system is crashing, and is still able to gather much
information.

These days, about the only common reason for the kernel to panic is a
missing filesystem or hardware driver, so that it can't find the root
partition or init; those panics really ought to let the user debug
interactively (at least scroll up and read the messages) unless the
system is configured to reboot on panic.

On your original question (what to put in a bug report): your .config
and boot dmesg, plus anything the crashes had in common (like "both times
I was running hwclock" or "I was copying big files over XFS..."). If you
can trigger it reliably, or at least repeatably, that's at least as good
as a backtrace.

You might post any oopses from your kernel logs, too. Also, a failure to
suspend tends to leave useful messages in dmesg and isn't too hard to
describe (at least compared to a failure to resume), and there's been a
push to get all drivers to support suspend, even those for desktop
hardware, so that can probably be fixed. (Of course, relatively few
people actually suspend their desktops, so bugs there go unnoticed more
easily.)

-Daniel
*This .sig left intentionally blank*