2003-09-03 21:20:44

by Robert L. Harris

Subject: nmi errors?



Can anyone tell me what this is?

16:00:09 mailserver kernel: Uhhuh. NMI received for unknown reason 31.
16:00:09 mailserver kernel: Dazed and confused, but trying to continue
16:00:09 mailserver kernel: Do you have a strange power saving mode enabled?
16:00:34 mailserver kernel: Uhhuh. NMI received for unknown reason 21.
16:00:34 mailserver kernel: Dazed and confused, but trying to continue

A coworker put a script on a server which loads up quite a few arrays
with pre-set values and then compares the values against arrays. As soon as he
kicked off the script I got a lot of these in my log files. Not long after, the
machine crashed hard.

Quad proc P3-550
16Gigs of RAM
Kernel: 2.4.22-rc2-ac3

CONFIG_HIGHMEM64G=y
CONFIG_HIGHMEM=y

Anyone have any thoughts or know what this means? Do I have a HIGHMEM
problem?
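
For reference, the number in the "unknown reason" message is the byte the kernel read from I/O port 0x61, printed in hex. Below is a minimal Python sketch of decoding it. The bit names follow the conventional PC/AT layout of that port, which is an assumption about this particular board, and `decode_nmi_reason` and `has_known_source` are hypothetical helpers, not kernel interfaces.

```python
# Conventional PC/AT meaning of the bits in port 0x61 (assumed for this board).
NMI_REASON_BITS = {
    0x80: "memory parity error / SERR",
    0x40: "I/O channel check (IOCHK)",
    0x20: "timer 2 output",
    0x10: "refresh toggle",
    0x08: "IOCHK reporting disabled",
    0x04: "parity reporting disabled",
    0x02: "speaker data enabled",
    0x01: "timer 2 gate enabled",
}


def decode_nmi_reason(reason):
    """Return the names of the bits set in an NMI reason byte, high bit first."""
    return [name for bit, name in NMI_REASON_BITS.items() if reason & bit]


def has_known_source(reason):
    """True if the byte flags a parity/SERR or IOCHK source (bits 0x80, 0x40)."""
    return bool(reason & 0xC0)


for text in ("31", "21"):  # values from the log above, which are hexadecimal
    reason = int(text, 16)
    print(text, has_known_source(reason), decode_nmi_reason(reason))
```

Neither 0x31 nor 0x21 has bit 0x80 or 0x40 set, which would be consistent with the kernel calling the reason unknown rather than reporting a parity or I/O check error.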

:wq!
---------------------------------------------------------------------------
Robert L. Harris | GPG Key ID: E344DA3B
@ x-hkp://pgp.mit.edu
DISCLAIMER:
These are MY OPINIONS ALONE. I speak for no-one else.

Life is not a destination, it's a journey.
Microsoft produces 15 car pileups on the highway.
Don't stop traffic to stand and gawk at the tragedy.



2003-09-03 21:25:44

by Richard B. Johnson

Subject: Re: nmi errors?

On Wed, 3 Sep 2003, Robert L. Harris wrote:

>
>
> Can anyone tell me what this is?
>
> 16:00:09 mailserver kernel: Uhhuh. NMI received for unknown reason 31.
> 16:00:09 mailserver kernel: Dazed and confused, but trying to continue
> 16:00:09 mailserver kernel: Do you have a strange power saving mode enabled?
> 16:00:34 mailserver kernel: Uhhuh. NMI received for unknown reason 21.
> 16:00:34 mailserver kernel: Dazed and confused, but trying to continue
>
> A coworker put a script on a server which loads up quite a few arrays
> with pre-set values and then compares the values against arrays. As soon as he
> kicked off the script I got a lot of these in my log files. Not long after, the
> machine crashed hard.
>

Possible bad RAM.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (794.73 BogoMips).
Note 96.31% of all statistics are fiction.


2003-09-03 21:34:24

by Robert L. Harris

Subject: Re: nmi errors?



We ran "memtest" on the machine over the weekend and it completed 3
times without any problems. Know a better or different test?






2003-09-04 12:17:18

by Richard B. Johnson

Subject: Re: nmi errors?

On Wed, 3 Sep 2003, Robert L. Harris wrote:

>
>
> We ran "memtest" on the machine over the weekend and it completed 3
> times without any problems. Know a better or different test?
>
>

Write 0x80 to port 0x70 (bit 7 of that port masks NMI), and hope
nobody accesses the RTC, since any RTC access rewrites that port. This
will (should) disable the NMI line. Then see if the error messages
go away. If they do, it's a real NMI and you really do have bad
RAM somewhere. If they don't, your motherboard is getting glitched,
either by bad design or by something plugged into a slot that doesn't
meet the correct timing specs.
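
A rough sketch of that write from userspace, in Python, assuming a Linux box where root can reach I/O ports through /dev/port. `index_byte` and `write_port` are hypothetical helper names, not anything from the kernel, and the actual port write is left commented out.

```python
import os

CMOS_INDEX_PORT = 0x70   # RTC/CMOS index register
NMI_DISABLE_BIT = 0x80   # bit 7 of the index register masks NMI


def index_byte(cmos_index=0x00, nmi_disabled=True):
    """Build the byte for port 0x70: low 7 bits pick a CMOS register, bit 7 masks NMI."""
    value = cmos_index & 0x7F
    if nmi_disabled:
        value |= NMI_DISABLE_BIT
    return value


def write_port(port, value, device="/dev/port"):
    """Write one byte to an I/O port through /dev/port (assumes Linux and root)."""
    fd = os.open(device, os.O_WRONLY)
    try:
        os.lseek(fd, port, os.SEEK_SET)   # the file offset is the port number
        os.write(fd, bytes([value]))
    finally:
        os.close(fd)


# As root, on the affected machine only:
# write_port(CMOS_INDEX_PORT, index_byte())   # writes 0x80 to port 0x70
```

Anything that touches the RTC afterwards writes a fresh index to port 0x70 and may clear bit 7 again, so the mask is only good for a short experiment.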

If everything works, in spite of the NMI, just comment out the
kernel printk() and cross your fingers.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (794.73 BogoMips).
Note 96.31% of all statistics are fiction.


2003-09-04 15:21:14

by Martin Schlemmer

Subject: Re: nmi errors?

On Wed, 2003-09-03 at 23:34, Robert L. Harris wrote:
> We ran "memtest" on the machine over the weekend and it completed 3
> times without any problems. Know a better or different test?
>

You might try enabling all the tests and all addresses, and setting the
cache to always on, in memtest. The typical key sequence is:

c - 1 - 2 - 2 - 3 - 3 - 3

Another option is goldmemory, which in its default setup is much the same
as memtest with the above config, but it is shareware, not GPL.

--
Martin Schlemmer


2003-09-04 15:26:13

by Robert L. Harris

Subject: Re: nmi errors?



I ran some tests Richard gave me, which said it wasn't bad RAM but a bad
motherboard. I just upgraded to 2.4.22-bk10 and it ran MUCH better,
able to use all 16Gigs happily for quite some time, until a couple of the
processes started finishing. Then it gave the NMIs en masse; I guess it
couldn't shove them off to the log server in time.

I'll try the below just to make sure but this is getting odd.


Thus spake Martin Schlemmer:

> On Wed, 2003-09-03 at 23:34, Robert L. Harris wrote:
> > We ran "memtest" on the machine over the weekend and it completed 3
> > times without any problems. Know a better or different test?
> >
>
> You might try enabling all the tests and all addresses, and setting the
> cache to always on, in memtest. The typical key sequence is:
>
> c - 1 - 2 - 2 - 3 - 3 - 3
>
> Another option is goldmemory, which in its default setup is much the same
> as memtest with the above config, but it is shareware, not GPL.


