The machine froze while compiling galeon (1.1.2, 778 files, ~50MB total).
xmms was also playing something (alsa-0.5.12, Ensoniq AudioPCI ES1371),
and there was some ssh traffic (slow; NIC is a Digital Equipment Corporation DECchip 21142/43),
plus NFS traffic (kernel automounter). XFree86 4.2.0, USB devices (a mouse, for example).
Low static electricity.
It looks really bad :(
Ok, continue...
alt-sysrq-b rebooted it, and the sync seems to have worked as well:
Feb 7 23:45:31 steel kernel: CPU 0: Machine Check Exception: 0000000000000004
Feb 7 23:45:31 steel kernel: Bank 4: b200000000040151
Feb 7 23:45:31 steel kernel: Kernel panic: CPU context corrupt
Feb 7 23:46:07 steel kernel: <6>SysRq : Emergency Sync
Feb 7 23:46:07 steel kernel: Syncing device 03:02 ... OK
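For reference, the number after "Machine Check Exception:" appears to be the global IA32_MCG_STATUS register. That is an assumption based on how the i386 machine check handler of this period prints it. Intel's manuals define bit 0 as RIPV (restart IP valid), bit 1 as EIPV (error IP valid) and bit 2 as MCIP (machine check in progress). The following is a minimal Python sketch that decodes the value. The helper names are hypothetical and are not part of any existing tool:

```python
# Bit positions in IA32_MCG_STATUS as defined in Intel's architecture manuals.
MCG_STATUS_BITS = {
    0: "RIPV (restart IP valid)",
    1: "EIPV (error IP valid)",
    2: "MCIP (machine check in progress)",
}


def decode_mcg_status(value: int) -> list:
    """Return the names of the MCG_STATUS flags set in `value` (hypothetical helper)."""
    return [name for bit, name in sorted(MCG_STATUS_BITS.items()) if (value >> bit) & 1]


def can_restart(value: int) -> bool:
    """True only if RIPV is set, i.e. execution may resume at the saved IP."""
    return bool(value & 1)


if __name__ == "__main__":
    status = int("0000000000000004", 16)  # value from the log above
    print(decode_mcg_status(status))
    print(can_restart(status))
```

For the value in this log (0x4), only MCIP is set. RIPV is clear, so execution could not simply resume at the interrupted instruction. That is consistent with the kernel panicking instead of continuing.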
I pressed sysrq-s many times; at those moments the sound played for a second,
two or three times.
No serial console output, sorry; I thought the system had become stable.
I had booted 2.5.4-pre1 before and recovered my home reiserfs (--rebuild-tree)
from the mess it left, then rebooted into 2.4.18-pre8-K2 and got the panic.
-alex
P.S. No nasty suspicions about the processor, please. No funds reserved
for a new one :)
PIII-700, ASUS CUV4X (VIA Apollo Pro 133A), <512MB
ver_linux:
Linux steel 2.4.18-pre8-K2 #2 Thu Feb 7 00:02:26 CET 2002 i686 unknown
Gnu C 2.95.3
Gnu make 3.79.1
binutils 2.11.2
util-linux 2.11n
mount 2.11n
modutils 2.4.12
e2fsprogs 1.23
reiserfsprogs 3.x.0j
Linux C Library 2.2.4
Dynamic linker (ldd) 2.2.4
Procps 2.0.7
Console-tools 0.3.3
Sh-utils 2.0
Modules Loaded nfs lockd sunrpc ide-cd cdrom snd-seq-midi snd-seq-midi-event snd-seq snd-card-ens1371 snd-ens1371 snd-pcm snd-timer snd-rawmidi snd-seq-device snd-ac97-codec snd-mixer snd soundcore autofs4 tulip mousedev usbmouse usb-uhci usbcore input reiserfs ext3 jbd nls_iso8859-1 nls_cp437 vfat fat
On Fri, Feb 08, 2002 at 12:18:31AM +0100, Alex Riesen wrote:
> Feb 7 23:45:31 steel kernel: CPU 0: Machine Check Exception: 0000000000000004
> Feb 7 23:45:31 steel kernel: Bank 4: b200000000040151
> Feb 7 23:45:31 steel kernel: Kernel panic: CPU context corrupt
Machine checks are indicative of hardware fault.
Overclocking, inadequate cooling and bad memory are the usual causes.
> P.S. No nasty suspicions about the processor, please. No funds reserved
> for a new one :)
The truth hurts 8(
--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
On Fri, Feb 08, 2002 at 12:36:53AM +0100, Dave Jones wrote:
> On Fri, Feb 08, 2002 at 12:18:31AM +0100, Alex Riesen wrote:
>
> > Feb 7 23:45:31 steel kernel: CPU 0: Machine Check Exception: 0000000000000004
> > Feb 7 23:45:31 steel kernel: Bank 4: b200000000040151
> > Feb 7 23:45:31 steel kernel: Kernel panic: CPU context corrupt
>
> Machine checks are indicative of hardware fault.
> Overclocking, inadequate cooling and bad memory are the usual causes.
no overclocking, memtest passed (1 pass, 1 hour), native intel cooler.
Space radiation, maybe 8)
> > P.S. No nasty suspicions about the processor, please. No funds reserved
> > for a new one :)
>
> The truth hurts 8(
oh dear...
On Friday, February 08, 2002 at 22:18 +0100, Alex Riesen wrote:
> On Fri, Feb 08, 2002 at 12:36:53AM +0100, Dave Jones wrote:
> > On Fri, Feb 08, 2002 at 12:18:31AM +0100, Alex Riesen wrote:
> >
> > > Feb 7 23:45:31 steel kernel: CPU 0: Machine Check Exception: 0000000000000004
> > > Feb 7 23:45:31 steel kernel: Bank 4: b200000000040151
> > > Feb 7 23:45:31 steel kernel: Kernel panic: CPU context corrupt
> >
> > Machine checks are indicative of hardware fault.
> > Overclocking, inadequate cooling and bad memory are the usual causes.
>
> no overclocking, memtest passed (1 pass, 1 hour), native intel cooler.
> Space radiation, maybe 8)
We run it overnight in our lab, to be sure...
Good luck!
-Dieter
--
Dieter Nützel
Graduate Student, Computer Science
University of Hamburg
Department of Computer Science
Hi!
> > Feb 7 23:45:31 steel kernel: CPU 0: Machine Check Exception: 0000000000000004
> > Feb 7 23:45:31 steel kernel: Bank 4: b200000000040151
> > Feb 7 23:45:31 steel kernel: Kernel panic: CPU context corrupt
>
> Machine checks are indicative of hardware fault.
> Overclocking, inadequate cooling and bad memory are the usual
> causes.
Maybe you should print something like
Machine Check Exception: .... (hardware problem!)
so that we get fewer reports like this?
Pavel
--
(about SSSCA) "I don't say this lightly. However, I really think that the U.S.
no longer is classifiable as a democracy, but rather as a plutocracy." --hpa
On Sat, 9 Feb 2002, Pavel Machek wrote:
> > > Feb 7 23:45:31 steel kernel: CPU 0: Machine Check Exception: 0000000000000004
> > > Feb 7 23:45:31 steel kernel: Bank 4: b200000000040151
> > > Feb 7 23:45:31 steel kernel: Kernel panic: CPU context corrupt
> > Machine checks are indicative of hardware fault.
> > Overclocking, inadequate cooling and bad memory are the usual
> > causes.
> Maybe you should print something like
> Machine Check Exception: .... (hardware problem!)
> so that we get fewer reports like this?
When I get around to finishing the diagnosis tool, I'll add
something like "Feed to decodemca for more info".
--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
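A diagnosis tool of the kind Dave describes would first need to pull the raw values out of the kernel log. The following is a minimal Python sketch of that step. It assumes the 2.4 message format quoted in this thread. The patterns and parse_mce_log() are hypothetical and are not taken from decodemca:

```python
import re

# Hypothetical patterns for the two kinds of MCE lines seen in this thread.
CPU_RE = re.compile(r"CPU (\d+): Machine Check Exception: ([0-9a-fA-F]+)")
BANK_RE = re.compile(r"Bank (\d+): ([0-9a-fA-F]+)")


def parse_mce_log(lines):
    """Collect the CPU number, MCG_STATUS and per-bank status values from syslog lines."""
    report = {"cpu": None, "mcg_status": None, "banks": {}}
    for line in lines:
        m = CPU_RE.search(line)
        if m:
            report["cpu"] = int(m.group(1))
            report["mcg_status"] = int(m.group(2), 16)
            continue
        m = BANK_RE.search(line)
        if m:
            # Only the status word right after "Bank N:" is captured.
            report["banks"][int(m.group(1))] = int(m.group(2), 16)
    return report


if __name__ == "__main__":
    log = [
        "Feb 7 23:45:31 steel kernel: CPU 0: Machine Check Exception: 0000000000000004",
        "Feb 7 23:45:31 steel kernel: Bank 4: b200000000040151",
        "Feb 7 23:45:31 steel kernel: Kernel panic: CPU context corrupt",
    ]
    print(parse_mce_log(log))
```

The "Kernel panic: CPU context corrupt" line matches neither pattern, so the parser skips it. Its output is just the numbers, ready to hand to a bit-level decoder.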
> > Maybe you should print something like
> > Machine Check Exception: .... (hardware problem!)
> > so that we get fewer reports like this?
>
> When I get around to finishing the diagnosis tool, I'll add
> something like "Feed to decodemca for more info".
For a lot of processors the MCE values are not documented. Strangely for
once Intel are the good guys and AMD seem to be sitting on the docs.
I can well understand that it is a hardware problem.
But if nobody is interested in reports like this,
why dump them out at all? Just save what we can and hang silently,
with no reports; they're boring 8-]
What does "Bank 4: b200000000040151" mean?
If it points to memory, can anyone help me find out which slot it is?
(memtest86 hasn't found anything, btw; I doubt that counts.)
-alex
P.S. If someone is going to change the machine check message,
could you please avoid lame descriptions like "(hardware problem!)"?
I'm sure the majority are experienced enough to understand what the
words "Machine Check" mean.
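Intel's architectural layout of the IA32_MCi_STATUS registers partly answers the question about "Bank 4: b200000000040151". Bits 63 to 57 are the VAL, OVER, UC, EN, MISCV, ADDRV and PCC flags. Bits 31 to 16 hold a model-specific error code, and bits 15 to 0 hold the architectural MCA error code. The following is a minimal Python sketch of that split. It decodes only the plain cache-hierarchy form of the error code (0000 0001 RRRR TTLL). decode_mci_status() is a hypothetical helper. The physical unit behind bank 4 is model-specific, so the sketch does not cover it:

```python
# Flag bits in the upper part of IA32_MCi_STATUS (Intel architectural layout).
STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MCi_MISC holds extra information
    58: "ADDRV",  # MCi_ADDR holds the error address
    57: "PCC",    # processor context corrupt
}

# Sub-fields of the cache-hierarchy compound error code 0000 0001 RRRR TTLL.
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TRANSACTION = {0: "instruction", 1: "data", 2: "generic"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "generic level"}


def decode_mci_status(value: int) -> dict:
    """Split a 64-bit MCi_STATUS value into flags and error codes (sketch)."""
    mca_code = value & 0xFFFF
    result = {
        # Highest bit first, matching the register's layout.
        "flags": [name for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
                  if (value >> bit) & 1],
        "model_specific_code": (value >> 16) & 0xFFFF,
        "mca_code": mca_code,
    }
    # Only the plain cache-hierarchy form is decoded; other codes stay raw.
    if (mca_code & 0xFF00) == 0x0100:
        result["cache_error"] = (
            LEVEL[mca_code & 0x3],
            TRANSACTION.get((mca_code >> 2) & 0x3, "reserved"),
            REQUEST.get((mca_code >> 4) & 0xF, "reserved"),
        )
    return result


if __name__ == "__main__":
    print(decode_mci_status(0xB200000000040151))
```

For the value in this thread, the sketch reports the following:

* Flags VAL, UC, EN and PCC are set.
* The model-specific code is 0x0004.
* The error code decodes as an L1 instruction-fetch (IRD) error.

ADDRV is clear, so the processor logged no address that could be mapped to a memory slot. The PCC flag also fits the "CPU context corrupt" panic text, though that link is an inference rather than something stated in the thread.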
On Sat, Feb 09, 2002 at 11:23:58PM +0100, Pavel Machek wrote:
> Hi!
>
> > > Feb 7 23:45:31 steel kernel: CPU 0: Machine Check Exception: 0000000000000004
> > > Feb 7 23:45:31 steel kernel: Bank 4: b200000000040151
> > > Feb 7 23:45:31 steel kernel: Kernel panic: CPU context corrupt
> >
> > Machine checks are indicative of hardware fault.
> > Overclocking, inadequate cooling and bad memory are the usual
> > causes.
>
> Maybe you should print something like
>
> Machine Check Exception: .... (hardware problem!)
>
> so that we get fewer reports like this?
> Pavel
Hi!
> What does "Bank 4: b200000000040151" mean?
> If it points to memory, can anyone help me find out which slot it is?
> (memtest86 hasn't found anything, btw; I doubt that counts.)
> -alex
>
> P.S. If someone is going to change the machine check message,
> could you please avoid lame descriptions like "(hardware problem!)"?
> I'm sure the majority are experienced enough to understand what the
> words "Machine Check" mean.
Ugh? If you understand that it's a hardware problem, why did you bother
contacting l-k? l-k is certainly not interested in debugging hardware
problems...
...and it is not exactly obvious that "Machine Check" means a
hardware problem...
Pavel
--
Casualties in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.