2002-09-06 16:50:38

by jeff

Subject: Linux SMP kernel bug with > 512M ram


I've been having problems with a few of our servers, and I can't find this
problem mentioned anywhere else. None of the dual-processor machines will run
with more than 512 MB of RAM on the newer SMP kernels (2.4.7-10enterprise #1
SMP). Two of the dual P3 1 GHz machines crash after a few minutes, presumably
once memory usage gets high enough. The errors they spit out vary, but it only
happens when I go over 512 MB of RAM, and only on dual-processor machines. I had
a slightly different problem when I tried to set it up on a dual P2 266 machine:
when I go over 512 MB there, the system takes an hour to boot, and everything
crawls from there. I asked a friend of mine to try this newer kernel on his
dual-processor server, and he reports the same thing (when he goes over 512 MB,
it crashes). Has anybody had this problem? Is there a fix?

Regards,

Jeffrey Moss
[email protected]







2002-09-06 17:01:31

by Alan

Subject: Re: Linux SMP kernel bug with > 512M ram

On Fri, 2002-09-06 at 17:55, [email protected] wrote:
> I've been having problems with a few of our servers and I can't seem to find this
> problem mentioned anywhere else. All of the dual processor machines will not operate
> with greater than 512 megs of ram with the newer SMP kernels (2.4.7-10enterprise #1

2.4.7 is hardly "new". Red Hat has issued errata kernels going up to
2.4.9.

> SMP). Two of the dual P3 1ghz machines crash after a few minutes, when the memory
> usage gets high enough, I presume. The errors they spit out vary, but its only when
> I go over 512megs of ram, and only on dual processor machines. I had a slightly

Chipset or memory hardware problems seem the most likely cause if the
errors seem random or weird.

> different problem when I tried to set it up on a dual p2 266 machine, when I go over
> 512 megs there, the system takes an hour to boot up, and everything crawls from

That's BIOS. That's a well-known BIOS problem where the BIOS doesn't
configure the MTRR registers properly. You can work around that one by
tweaking the settings by hand or probably by getting a newer BIOS.
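
For reference, a minimal sketch of what checking the MTRR setup by hand might look like, assuming the kernel exposes `/proc/mtrr` and that the slowdown comes from RAM above some boundary not being covered by a write-back region. The helper names (`parse_mtrr`, `uncovered_ram`, `mtrr_write_line`) and the 768 MB figure are hypothetical, chosen only for illustration; the sketch also ignores overlapping uncachable regions, which take precedence over write-back on real hardware.

```python
import re

# Accepts both the older layout ("size= 512MB: write-back, count=1")
# and the later one ("size= 512MB, count=1: write-back").
_MTRR_LINE = re.compile(
    r"reg\d+:\s*base=0x([0-9a-fA-F]+)\s*\(\s*\d+MB\),\s*size=\s*(\d+)(MB|KB)"
    r"(?:,\s*count=\d+)?:\s*([a-z-]+)"
)


def parse_mtrr(text):
    """Parse /proc/mtrr text into a list of (base, size, type) tuples in bytes."""
    regions = []
    for line in text.splitlines():
        match = _MTRR_LINE.search(line)
        if not match:
            continue  # skip lines that do not describe a register
        base = int(match.group(1), 16)
        unit = 1 << 20 if match.group(3) == "MB" else 1 << 10
        regions.append((base, int(match.group(2)) * unit, match.group(4)))
    return regions


def uncovered_ram(regions, ram_bytes):
    """Return (start, end) ranges below ram_bytes with no write-back region."""
    spans = sorted((b, b + s) for b, s, kind in regions if kind == "write-back")
    gaps, cursor = [], 0
    for start, end in spans:
        if start >= ram_bytes:
            break
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < ram_bytes:
        gaps.append((cursor, ram_bytes))
    return gaps


def mtrr_write_line(base, size):
    """Build the text one would write to /proc/mtrr to add a write-back region.

    An MTRR region must have a power-of-two size and a base aligned to it.
    """
    if size <= 0 or size & (size - 1) or base % size:
        raise ValueError("size must be a power of two and base aligned to it")
    return f"base=0x{base:08x} size=0x{size:x} type=write-back"
```

On an affected machine, one would pass the contents of `/proc/mtrr` to `parse_mtrr`, call `uncovered_ram` with the installed RAM size, and, as root, write the string from `mtrr_write_line` for each gap back to `/proc/mtrr`. A gap that is not a power of two in size would have to be split into several regions first.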


2002-09-06 17:04:03

by Martin J. Bligh

Subject: Re: Linux SMP kernel bug with > 512M ram

Works fine for me with up to 32 GB of RAM. Can you try a vaguely
current kernel?

Martin.

--On Friday, September 06, 2002 4:55 PM +0000 [email protected] wrote:

>
> I've been having problems with a few of our servers and I can't seem to find this
> problem mentioned anywhere else. All of the dual processor machines will not operate
> with greater than 512 megs of ram with the newer SMP kernels (2.4.7-10enterprise #1
> SMP). Two of the dual P3 1ghz machines crash after a few minutes, when the memory
> usage gets high enough, I presume. The errors they spit out vary, but its only when
> I go over 512megs of ram, and only on dual processor machines. I had a slightly
> different problem when I tried to set it up on a dual p2 266 machine, when I go over
> 512 megs there, the system takes an hour to boot up, and everything crawls from
> there. I asked a friend of mine to try this newer kernel with his dual processor
> server, and he says the same thing (when I go over 512, it crashes). Has anybody had
> this problem? Is there a fix?
>
> Regards,
>
> Jeffrey Moss
> [email protected]


2002-09-06 20:45:48

by jeff

Subject: Re: Linux SMP kernel bug with > 512M ram

I don't think it's chipset or memory. The machines that crash have different-brand
motherboards with different chipsets, and I ran docmem for 24 hours on each stick of
RAM and found no errors. The RAM worked fine in my Windows XP machine, and it works
fine when I use the non-SMP kernel and/or when I take the RAM down to 2 sticks
(512 MB). I'm posting here because I believe I have narrowed it down to a bug in the
kernel.

> On Fri, 2002-09-06 at 17:55, [email protected] wrote:
> > I've been having problems with a few of our servers and I can't seem to find this
> > problem mentioned anywhere else. All of the dual processor machines will not operate
> > with greater than 512 megs of ram with the newer SMP kernels (2.4.7-10enterprise #1
>
> 2.4.7 is hardly "new". Red Hat has issued errata kernels going up to
> 2.4.9.
>
> > SMP). Two of the dual P3 1ghz machines crash after a few minutes, when the memory
> > usage gets high enough, I presume. The errors they spit out vary, but its only when
> > I go over 512megs of ram, and only on dual processor machines. I had a slightly
>
> Chipset or memory hardware problems seem the most likely cause if the
> errors seem random or weird
>
> > different problem when I tried to set it up on a dual p2 266 machine, when I go over
> > 512 megs there, the system takes an hour to boot up, and everything crawls from
>
> Thats BIOS. Thats a well known BIOS problem where the BIOS doesnt
> configure the mtrr registers properly. You can work around that one by
> tweaking the settings by hand or probably by getting a newer BIOS
>
>
>

2002-09-06 20:50:14

by David Miller

Subject: Re: Linux SMP kernel bug with > 512M ram

From: [email protected]
Date: 6 Sep 2002 20:50:26 -0000

> I'm posting here because I believe I have narrowed it down to a bug in the
> kernel.

And Alan also said there are errata upgrades available that
quite possibly will cure your problems.

2002-09-06 21:45:17

by jeff

Subject: Re: Linux SMP kernel bug with > 512M ram

It's a VIA motherboard, and I found the problem:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=47160
Thanks for the help.

Regards,

Jeffrey Moss
[email protected]



> From: "Steve Wolfe" <[email protected]>
> Subject: Re: Linux SMP kernel bug with > 512M ram
> Date: Fri, 6 Sep 2002 15:29:29 -0600
>
> > I don't think its chipset or memory, the machines that crash have
> > different brand motherboards with different chipsets, I ran docmem for
> > 24 hours on each stick of ram and found no errors. The ram worked fine
> > in my WindowsXP machine, and it works fine when I use the non-smp
> > kernel, and/or when I take the ram down to 2 sticks (512 meg). I'm
> > posting here because I believe I have narrowed it down to a bug in the
> > kernel.
>
> I've also been bitten in the rear end by memory bugs, and I can assure
> you, they can be devilish to find. First, rather than running a memory
> test on each individual stick, you MUST run the memory test on the actual
> system, with the actual RAM. Oftentimes, weird, bizarre errors crop up
> in certain combinations and not in others.
>
> It's also a little difficult to convince people that it's a bug in the
> kernel, when there is an incredible number of people who run SMP kernels
> with >512 MB. I maintain a decent number of such machines, and have seen
> absolutely *zero* problems of a similar nature that weren't traced back to
> defective or incompatible hardware.
>
> I'm curious - what kinds of motherboards are you using, and what kinds
> of RAM? What kind of memory timings are specified in the BIOS?
>
> steve
>
>