2008-11-26 23:12:45

by Carsten Menke

Subject: Panic: NMI Watchdog detected LOCKUP on CPU

Hi List,

I hope this is not too wrong a place to ask about the issue in the subject, as the problem arises with a rather old 2.6.15 kernel from an Ubuntu 6.06 installation. But as nobody has even looked at the report I filed in the Ubuntu bug database 2 months ago, I'll try my luck here. I have an HP ML 350 G5 server running and I occasionally get kernel panics (anywhere from once a week to once every 2 months). I just want a pointer as to whether this may be a hardware issue or a kernel issue, which in turn may have been fixed in a later version. As HP supports only Red Hat, it will be rather difficult to get someone out on service (we have an extended on-site warranty) to replace parts: first, we're running Ubuntu on the server, and second, the diagnostic CD (Smart Start) shows that everything is OK (that of course doesn't mean I trust these tools too much).

One thing I noted is that the smbd process always appears in the panic log. The second thing is that it seems to be unrelated to the system load, as it happened once during full working hours and another time during the night when the server was absolutely idle (the server never does much more than a load of 20% anyway).

So here is the log I captured via syslogd from netconsole, and the dmesg. Apologies that, due to syslog, some lines are split across multiple lines which actually show up as one on the console.

I hope someone can get me somewhere.

Carsten



Attachments:
syslog.mail.log (10.39 kB)
dmesg.txt (28.17 kB)

2008-11-27 00:00:32

by Robert Hancock

Subject: Re: Panic: NMI Watchdog detected LOCKUP on CPU

Carsten Menke wrote:
> Hi List,
>
> I hope this is not too wrong a place to ask about the issue in the subject, as the problem arises with a rather old 2.6.15 kernel from an Ubuntu 6.06 installation. But as nobody has even looked at the report I filed in the Ubuntu bug database 2 months ago, I'll try my luck here. I have an HP ML 350 G5 server running and I occasionally get kernel panics (anywhere from once a week to once every 2 months). I just want a pointer as to whether this may be a hardware issue or a kernel issue, which in turn may have been fixed in a later version. As HP supports only Red Hat, it will be rather difficult to get someone out on service (we have an extended on-site warranty) to replace parts: first, we're running Ubuntu on the server, and second, the diagnostic CD (Smart Start) shows that everything is OK (that of course doesn't mean I trust these tools too much).
>
> One thing I noted is that the smbd process always appears in the panic log. The second thing is that it seems to be unrelated to the system load, as it happened once during full working hours and another time during the night when the server was absolutely idle (the server never does much more than a load of 20% anyway).
>
> So here is the log I captured via syslogd from netconsole, and the dmesg. Apologies that, due to syslog, some lines are split across multiple lines which actually show up as one on the console.
>
> I hope someone can get me somewhere.
>
> Carsten
>
>

Normally, hardware problems on server-class hardware don't cause these
kinds of symptoms; they usually spew NMIs, etc., but it's not
impossible. 2.6.15 is pretty ancient though. Can you try a newer kernel?

2008-11-27 00:20:10

by Carsten Menke

Subject: Re: Panic: NMI Watchdog detected LOCKUP on CPU


-------- Original Message --------
> Date: Wed, 26 Nov 2008 18:00:11 -0600
> From: Robert Hancock <[email protected]>
> To: [email protected]
> Subject: Re: Panic: NMI Watchdog detected LOCKUP on CPU

> Normally, hardware problems on server-class hardware don't cause these
> kinds of symptoms; they usually spew NMIs, etc., but it's not
> impossible. 2.6.15 is pretty ancient though. Can you try a newer kernel?
>

Yes, I will definitely try that, but I hoped that someone might know whether this was a particular problem of that kernel version (according to the panic log), and whether chances are good that it is fixed in later versions. The second problem is that it can take fairly long until I know whether the problem is gone, because sometimes it takes 2 months until a crash happens. So the second question is how to prove that this is not hardware, or how to prove that it definitely is hardware. I have read the list and saw that NMI CPU lockup messages can also be due to kernel bugs.
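To put rough numbers on that waiting problem, here is a minimal Python sketch. It assumes (and this is only an assumption, not something the logs establish) that the crashes arrive at random like a Poisson process, so the chance of seeing no crash in t days by pure luck is exp(-t / mean interval). The function name and the 95% threshold are made up for illustration.

```python
import math

def crash_free_days_needed(mean_days_between_crashes, confidence=0.95):
    """Crash-free days needed before "fixed" is believable at `confidence`.

    Hypothetical model: crashes follow a Poisson process, so the probability
    of no crash in t days by chance alone is exp(-t / mean).
    Solving exp(-t / mean) = 1 - confidence for t gives the formula below.
    """
    return -mean_days_between_crashes * math.log(1.0 - confidence)

# The two extremes from this thread: a panic every week vs. every two months.
for mean_days in (7, 60):
    days = crash_free_days_needed(mean_days)
    print(f"mean interval {mean_days} days -> wait about {round(days)} days")
```

Under that assumption the wait comes out at roughly three times the mean interval between crashes, so about three weeks if the panics come weekly and about six months if they come every two months.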

Regards

Carsten
P.S. Before these problems happened, the server had been running well for a year, but I've also done the usual system upgrades.