2007-09-06 18:47:43

by Daniel Exner

Subject: Kernel Panic on 2.6.23-rc5

Hi!

I'm not really sure if this is a regression or if I simply hit a hardware
problem.
After some time of work (mostly hours, sometimes minutes) my system freezes,
with blinking LEDs and no response to SysRq, but I finally captured
this using netconsole:

CPU 0: Machine Check Exception: 0000000000000004
Bank 4: b200000000070f0f
Kernel panic - not syncing: CPU context corrupt

I also keep getting ext3 errors about reading or writing in wrong zones.
I will now run memtest, and that means many passes.

--
Greetings
Daniel Exner


2007-09-06 19:06:58

by Michal Piotrowski

Subject: Re: Kernel Panic on 2.6.23-rc5

Hi Daniel,

On 06/09/07, Daniel Exner <[email protected]> wrote:
> Hi!
>
> I'm not really sure if this is a regression or if I simply hit a hardware
> problem.
> After some time of work (mostly hours, sometimes minutes) my system freezes,
> with blinking LEDs and no response to SysRq, but I finally captured
> this using netconsole:
>
> CPU 0: Machine Check Exception: 0000000000000004
> Bank 4: b200000000070f0f
> Kernel panic - not syncing: CPU context corrupt

It is a hardware problem.

You may want to use mcelog ftp://ftp.x86-64.org/pub/linux/tools/mcelog/

>
> I also keep getting ext3 errors about reading or writing in wrong zones.
> I will now run memtest, and that means many passes.
>
> --
> Greetings
> Daniel Exner

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/

2007-09-06 22:16:59

by Daniel Exner

Subject: Re: Kernel Panic on 2.6.23-rc5

06 September 2007 Michal Piotrowski wrote:
[..]
> > CPU 0: Machine Check Exception: 0000000000000004
> > Bank 4: b200000000070f0f
> > Kernel panic - not syncing: CPU context corrupt
>
> It is a hardware problem.
>
> You may want to use mcelog ftp://ftp.x86-64.org/pub/linux/tools/mcelog/
I tried this on grml64 (because I'm normally on x86).
Results:

--
WARNING: with --dmi mcelog --ascii must run on the same machine with the
same BIOS/memory configuration as where the machine check occurred.
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 0 data cache STATUS 0 MCGSTATUS 4
Bank 4: b200000000070f0f
Kernel panic - not syncing: CPU context corrupt
---

This really doesn't tell me anything the above didn't.
Next thing I tried was:

parsemce -e 0 -b 4 -s b200000000070f0f -a 0
Output:
Status: (0) Restart IP invalid.
parsebank(4): b200000000070f0f @ 0
External tag parity error
CPU state corrupt. Restart not possible
Error enabled in control register
Error not corrected.
Bus and interconnect error
Participation: Generic
Timeout:
Request: Generic error
Transaction type : Invalid
Memory/IO : Other


Which doesn't tell me anything either :(
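
For readers wondering where parsemce gets these lines from, here is a minimal
Python sketch that picks apart the two values from the panic. It assumes the
standard x86 machine-check register layout (bit 63 VAL down to bit 57 PCC in
the bank status, and the 0000 1PPT RRRR IILL pattern for bus and interconnect
error codes). The function names are made up for illustration and are not part
of mcelog or parsemce. The model-specific bits 16-31 (0x0007 here) depend on
the CPU and are only extracted, not interpreted.

```python
# Sketch only: decode MCG_STATUS and a bank status word, assuming the
# standard x86 machine-check bit layout. Names are illustrative.

def decode_mcg_status(mcg):
    """Return the three architectural MCG_STATUS flags."""
    return {
        "RIPV": bool(mcg & 0x1),  # restart IP valid
        "EIPV": bool(mcg & 0x2),  # error IP valid
        "MCIP": bool(mcg & 0x4),  # machine check in progress
    }

def decode_bank_status(status):
    """Split a 64-bit bank status into flags, MCA code and model code."""
    flags = {
        "VAL":   bool((status >> 63) & 1),  # register contents valid
        "OVER":  bool((status >> 62) & 1),  # an earlier error was overwritten
        "UC":    bool((status >> 61) & 1),  # error not corrected
        "EN":    bool((status >> 60) & 1),  # error enabled in control register
        "MISCV": bool((status >> 59) & 1),  # MISC register valid
        "ADDRV": bool((status >> 58) & 1),  # ADDR register valid
        "PCC":   bool((status >> 57) & 1),  # processor context corrupt
    }
    mca_code = status & 0xFFFF            # bits 0-15
    model_code = (status >> 16) & 0xFFFF  # bits 16-31, CPU specific
    return flags, mca_code, model_code

def decode_bus_error(mca_code):
    """Decode a bus/interconnect code (0000 1PPT RRRR IILL), else None."""
    if (mca_code & 0xF800) != 0x0800:
        return None
    return {
        "participation": (mca_code >> 9) & 0x3,  # PP, 3 = generic
        "timeout": bool((mca_code >> 8) & 1),    # T
        "request": (mca_code >> 4) & 0xF,        # RRRR, 0 = generic error
        "memory_io": (mca_code >> 2) & 0x3,      # II, 3 = other
        "level": mca_code & 0x3,                 # LL, 3 = generic
    }

if __name__ == "__main__":
    flags, mca_code, model_code = decode_bank_status(0xB200000000070F0F)
    print(decode_mcg_status(0x4))
    print(flags, hex(mca_code), hex(model_code))
    print(decode_bus_error(mca_code))
```

For b200000000070f0f this gives VAL, UC, EN and PCC set, which lines up with
"Error not corrected", "Error enabled in control register" and "CPU state
corrupt" above. MCG status 4 has MCIP set and RIPV clear, i.e. the restart IP
is invalid.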

Google suggests anything from a broken CPU (bad) or broken RAM (even worse) to
a broken mainboard (ouch..)

I'm going to let memtest run overnight. This is the easiest test I guess :)
--
Greetings
Daniel Exner

2007-09-12 07:20:46

by Daniel Exner

Subject: Re: Kernel Panic on 2.6.23-rc5 (solved)

Hi!

Michal Piotrowski:
> On 06/09/07, Daniel Exner <[email protected]> wrote:
> > I'm not really sure if this is a regression or if I simply hit a hardware
> > problem.
> > After some time of work (mostly hours, sometimes minutes) my system
> > freezes, with blinking LEDs and no response to SysRq, but I
> > finally captured this using netconsole:
> >
> > CPU 0: Machine Check Exception: 0000000000000004
> > Bank 4: b200000000070f0f
> > Kernel panic - not syncing: CPU context corrupt
> It is a hardware problem.
You were right. I swapped the power supply (a hardware guy's first guess ;)
The box has now been up for 2 days 9 hours with no kernel panic so far :)

I really should use sensord to show undervoltages in syslog..
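
As a rough illustration of the kind of check meant here (not how sensord
itself works), a small Python sketch that flags supply rails reading outside a
tolerance band. The 5% default and the sample readings are assumptions for the
example, not measurements from this box, and reading real values from
lm-sensors is left out.

```python
# Sketch only: flag voltage rails that deviate too far from nominal.
# The helper name, the tolerance and the readings are hypothetical.

def out_of_tolerance(nominal, measured, tolerance=0.05):
    """True if measured deviates from nominal by more than the tolerance."""
    return abs(measured - nominal) > abs(nominal) * tolerance

# rail name -> (nominal volts, measured volts); made-up sample values
readings = {
    "+12V": (12.0, 11.2),
    "+5V": (5.0, 4.95),
    "+3.3V": (3.3, 3.28),
}

bad_rails = [
    name
    for name, (nominal, measured) in readings.items()
    if out_of_tolerance(nominal, measured)
]

for name in bad_rails:
    nominal, measured = readings[name]
    print(f"WARNING: {name} rail reads {measured} V (nominal {nominal} V)")
```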

> You may want to use mcelog ftp://ftp.x86-64.org/pub/linux/tools/mcelog/
This is a nice tool, but why is it only available for x86_64?
The MCE reporting facility is in place on x86, too.

Anyway, I am only sending this mail to say: not the kernel's fault.

--
Greetings
Daniel Exner