2002-06-15 22:22:00

by Steve Cole

Subject: Dual Athlon 2000 XP MP nightmare

I'm not sure that what I'm experiencing is a kernel problem, but I thought
I would stick my foot in the door nonetheless, since I have no real
indication of what is going on.

I have a dual Athlon 2000+ XP MP system. It's crashing very frequently and
looks to be getting worse. It seems to crash less with 2.4.19pre10-ac2
which supports the 760 bus and 744x IDE controller, but with something that
is as intermittent as this, who can tell?

Machine specs:

ASUS A7M266-D motherboard, 1006 BIOS rev.
2GB ECC registered memory
4 x 15K RPM Seagate UltraSCSI drives
2 x 2960 (AIC7892 rev 2) controllers
2 x 3C59x 3Com ethernet controllers
!USB card to free up IRQs (removed later)
400W power supply
240W power supply driving two of the hard drives + CD ROM
Budget vid card
2 drives partitioned 30%/70%, 30% mirrored together for boot, 70% mirrored
in RAID 0+1 with other drives

I get EIP errors and Null pointer exception errors during full kernel
panics. I've had a lot of file system corruption in ReiserFS originally and
now in EXT2, both fixable though Reiser seemed worse. Uptime is measured in
hours - usually 12 or more, sometimes two or three.

I can't come up with any reasons for this that point at the kernel, but on
the other hand, nothing is ever logged regarding SCSI I/O problems (verbose
logging turned on in kernel with extra queue checks). I've replaced the
memory to no avail, and updated the BIOSes of both motherboard and Adaptec
cards. No memory errors are logged and one pass of memtest86 found no
memory errors.

Yet, the machine crashes semi-randomly (load seems to play some part in
this) and often crashes during the shutdown/reboot phase if it's run
reliably for a few hours.

If it's hardware for sure, please just indicate that and I'll move on. I'm
getting semi-desperate. :(


2002-06-16 21:07:27

by Austin Gonyou

Subject: Re: Dual Athlon 2000 XP MP nightmare

On Sun, 2002-06-16 at 03:30, Mark Hounschell wrote:
...
> First make sure you have MP cpus NOT XP's. The XP's are not certified by amd to run
> SMP. Second, try append="mem=nopentium" in your lilo.conf file. I have a dual 1900+ MP
> box and without that I have random lockups also.
>

Ahh..yes....sooo true. I thought that the subject was just
mis-represented, not necessarily wrong. I wholly agree on that. Also, if
you are using MPs, please try the KDB.
> Mark
--
Austin Gonyou <[email protected]>

2002-06-17 02:43:20

by Shawn

Subject: Re: Dual Athlon 2000 XP MP nightmare

On 06/16, Shawn said something like:
> On 06/15, Dr. David Alan Gilbert said something like:
> > > which supports the 760 bus and 744x IDE controller, but with something that
> > > is as intermittent as this, who can tell?
> >
> > Can I clarify something - are the processors XP's or MP's ? If they are
> > XP's well then that isn't a supported operation and might well not work
> > relliably.
>
> XP MPs exist now.

Ah, just looked closer, and it appears they do in fact just call them
MPs. The "1900+" labeling scheme throws me off.

--
Shawn Leas
[email protected]

I didn't get a toy train like the other kids, I got
a toy subway instead; you couldn't see anything but
every now and then you'd hear this rumbling noise go by.
-- Stephen Wright

2002-06-17 02:38:45

by Shawn

Subject: Re: Dual Athlon 2000 XP MP nightmare

On 06/15, Dr. David Alan Gilbert said something like:
> > which supports the 760 bus and 744x IDE controller, but with something that
> > is as intermittent as this, who can tell?
>
> Can I clarify something - are the processors XP's or MP's ? If they are
> XP's well then that isn't a supported operation and might well not work
> relliably.

XP MPs exist now.

--
Shawn Leas
[email protected]

When I was crossing the border into Canada, they asked if I had
any firearms with me. I said, "Well, what do you need?"
-- Stephen Wright

2002-06-19 12:05:01

by Bill Davidsen

Subject: Re: Dual Athlon 2000 XP MP nightmare

On Sun, 16 Jun 2002, Allan Sandfeld Jensen wrote:

> On Sunday 16 June 2002 10:30, Mark Hounschell wrote:

> > First make sure you have MP cpus NOT XP's. The XP's are not certified by
> > amd to run SMP. Second, try append="mem=nopentium" in your lilo.conf file.
> > I have a dual 1900+ MP box and without that I have random lockups also.
> >
> BS and FUD!

What's wrong with this suggestion, from someone who believes it works?
Other than suggesting that it be hand entered instead of put in lilo?
Disabling 4M pages is unlikely to solve the problem, but (a) the poster
has tried it and I bet you haven't, and (b) all the things you suggest
require hardware action, while a boot option can be done with less effort
and chance of damage.

What you suggest is more likely to work, but I see no reason not to try
the simple fix first, with low time and effort budget.

> First try to remove one processor, and test the motherboard in single CPU
> configuration. If you still see crashes replace the motherboard. I also have
> a defective Asus A7M266-D. It crashes in any configuration of CPUs, power
> supplies and video cards.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-06-15 22:58:46

by Dave Gilbert (Home)

Subject: Re: Dual Athlon 2000 XP MP nightmare

* Steve Cole ([email protected]) wrote:
> I'm not sure that what I'm experiencing is a kernel problem, but I thought
> I would stick my foot in the door nonetheless, since I have no real
> indication of what is going on.

Hi Steve,

> I have a dual Athlon 2000+ XP MP system. It's crashing very frequently and
> looks to be getting worse. It seems to crash less with 2.4.19pre10-ac2
> which supports the 760 bus and 744x IDE controller, but with something that
> is as intermittent as this, who can tell?

Can I clarify something - are the processors XP's or MP's ? If they are
XP's well then that isn't a supported operation and might well not work
reliably.

> I get EIP errors and Null pointer exception errors during full kernel
> panics. I've had a lot of file system corruption in ReiserFS originally and
> now in EXT2, both fixable though Reiser seemed worse. Uptime is measured in
> hours - usually 12 or more, sometimes two or three.

Have you tried running this single processor? Is it reliable?

Dave
---------------- Have a happy GNU millennium! ----------------------
/ Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K| Happy \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM, SPARC and HP-PA | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/

2002-06-16 00:38:11

by Austin Gonyou

Subject: Re: Dual Athlon 2000 XP MP nightmare

On Sat, 2002-06-15 at 17:21, Steve Cole wrote:
> I'm not sure that what I'm experiencing is a kernel problem, but I thought
> I would stick my foot in the door nonetheless, since I have no real
> indication of what is going on.


One thing that could help here is KDB (the kernel debugger). If you get
the -aa series kernels/patches you can get this functionality. (Dunno
if -ac has it built in.) Once the machine crashes, as long as it's not
a black screen, you can get a back-trace on the process and possibly
even find the section of code that's actually causing the problem right
off the bat (not usual, and YMMV). Even if not, it's an enormous amount
of info that you can't get ATM.


> If it's hardware for sure, please just indicate that and I'll move on. I'm
> getting semi-desperate. :(

--
Austin Gonyou <[email protected]>

2002-06-16 06:35:55

by Austin Gonyou

Subject: Re: Dual Athlon 2000 XP MP nightmare

It mostly sounds like some misbehaving kernel code. No hard locks, it
sounds like, but kernel panics and no joy. I'd for sure recommend KDB
for this chore; it will at least shed light on the misbehaving piece.

On Sat, 2002-06-15 at 19:54, Hugh wrote:
> Dear Steve, Richard, and others,
>
> >I'm not sure that what I'm experiencing is a kernel problem, but I thought
> >I would stick my foot in the door nonetheless, since I have no real
> >indication of what is going on.
> >
....
>
> That was when I returned the motherboard the second time.
> The first time, the board gave me a CMOS error. The third time, the
> board even did not give me the first beep. I now think that I should
> have bought the Tyan board instead of the ASUS.
>
> Regards,
>
> G. Hugh Song
--
Austin Gonyou <[email protected]>

2002-06-16 08:29:55

by Mark Hounschell

Subject: Re: Dual Athlon 2000 XP MP nightmare

Steve Cole wrote:
>
> I'm not sure that what I'm experiencing is a kernel problem, but I thought
> I would stick my foot in the door nonetheless, since I have no real
> indication of what is going on.
>
> I have a dual Athlon 2000+ XP MP system. It's crashing very frequently and
> looks to be getting worse. It seems to crash less with 2.4.19pre10-ac2
> which supports the 760 bus and 744x IDE controller, but with something that
> is as intermittent as this, who can tell?
>
> Machine specs:
>
> ASUS A7M266-D motherboard, 1006 BIOS rev.
> 2GB ECC registered memory
> 4 x 15K RPM Seagate UltraSCSI drives
> 2 x 2960 (AIC7892 rev 2) controllers
> 2 x 3C59x 3Com ethernet controllers
> !USB card to free up IRQs (removed later)
> 400W power supply
> 240W power supply driving two of the hard drives + CD ROM
> Budget vid card
> 2 drives partitioned 30%/70%, 30% mirrored together for boot, 70% mirrored
> in RAID 0+1 with other drives
>
> I get EIP errors and Null pointer exception errors during full kernel
> panics. I've had a lot of file system corruption in ReiserFS originally and
> now in EXT2, both fixable though Reiser seemed worse. Uptime is measured in
> hours - usually 12 or more, sometimes two or three.
>
> I can't come up with any reasons for this that point at the kernel, but on
> the other hand, nothing is ever logged regarding SCSI I/O problems (verbose
> logging turned on in kernel with extra queue checks). I've replaced the
> memory to no avail, and updated the BIOS' of both motherboard and Adaptec
> cards. No memory errors are logged and one pass of memtest86 found no
> memory errors.
>
> Yet, the machine crashes semi-randomly (load seems to play some part in
> this) and often crashes during the shutdown/reboot phase if it's run
> reliably for a few hours.
>
> If it's hardware for sure, please just indicate that and I'll move on. I'm
> getting semi-desperate. :(
>

First make sure you have MP cpus NOT XP's. The XP's are not certified by amd to run
SMP. Second, try append="mem=nopentium" in your lilo.conf file. I have a dual 1900+ MP
box and without that I have random lockups also.

Mark
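
For reference, a minimal sketch of what the lilo.conf suggestion above
might look like. The image path, label, and root device are placeholders
(assumptions, not taken from this thread); only the append line comes
from the message.

```
# /etc/lilo.conf (fragment) -- hypothetical entry for illustration only
image=/boot/vmlinuz          # assumed kernel image path
    label=linux              # assumed boot label
    root=/dev/sda1           # assumed root device
    read-only
    # Boot option suggested in the message above; it stops the kernel
    # from using 4M pages.
    append="mem=nopentium"
```

After editing the file, /sbin/lilo has to be rerun so the boot map picks
up the change. To try the option once without editing the file, it can
instead be typed at the LILO boot: prompt after the image label, for
example "linux mem=nopentium", where "linux" is the assumed label from
the sketch.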

2002-06-16 12:29:41

by Allan Sandfeld Jensen

Subject: Re: Dual Athlon 2000 XP MP nightmare

On Sunday 16 June 2002 10:30, Mark Hounschell wrote:
> Steve Cole wrote:
> > I'm not sure that what I'm experiencing is a kernel problem, but I
> > thought I would stick my foot in the door nonetheless, since I have no
> > real indication of what is going on.
> >
> > I have a dual Athlon 2000+ XP MP system. It's crashing very frequently
> > and looks to be getting worse. It seems to crash less with
> > 2.4.19pre10-ac2 which supports the 760 bus and 744x IDE controller, but
> > with something that is as intermittent as this, who can tell?
> >
> > Machine specs:
> >
> > ASUS A7M266-D motherboard, 1006 BIOS rev.
> > 2GB ECC registered memory
> > 4 x 15K RPM Seagate UltraSCSI drives
> > 2 x 2960 (AIC7892 rev 2) controllers
> > 2 x 3C59x 3Com ethernet controllers
> > !USB card to free up IRQs (removed later)
> > 400W power supply
> > 240W power supply driving two of the hard drives + CD ROM
> > Budget vid card
> > 2 drives partitioned 30%/70%, 30% mirrored together for boot, 70%
> > mirrored in RAID 0+1 with other drives
> >
> > I get EIP errors and Null pointer exception errors during full kernel
> > panics. I've had a lot of file system corruption in ReiserFS originally
> > and now in EXT2, both fixable though Reiser seemed worse. Uptime is
> > measured in hours - usually 12 or more, sometimes two or three.
> >
> > I can't come up with any reasons for this that point at the kernel, but
> > on the other hand, nothing is ever logged regarding SCSI I/O problems
> > (verbose logging turned on in kernel with extra queue checks). I've
> > replaced the memory to no avail, and updated the BIOS' of both
> > motherboard and Adaptec cards. No memory errors are logged and one pass
> > of memtest86 found no memory errors.
> >
> > Yet, the machine crashes semi-randomly (load seems to play some part in
> > this) and often crashes during the shutdown/reboot phase if it's run
> > reliably for a few hours.
> >
> > If it's hardware for sure, please just indicate that and I'll move on.
> > I'm getting semi-desperate. :(
>
> First make sure you have MP cpus NOT XP's. The XP's are not certified by
> amd to run SMP. Second, try append="mem=nopentium" in your lilo.conf file.
> I have a dual 1900+ MP box and without that I have random lockups also.
>
BS and FUD!

First try to remove one processor, and test the motherboard in single CPU
configuration. If you still see crashes replace the motherboard. I also have
a defective Asus A7M266-D. It crashes in any configuration of CPUs, power
supplies and video cards.

2002-06-16 00:53:29

by Hugh

Subject: Re: Dual Athlon 2000 XP MP nightmare

Dear Steve, Richard, and others,

>I'm not sure that what I'm experiencing is a kernel problem, but I thought
>I would stick my foot in the door nonetheless, since I have no real
>indication of what is going on.
>
>I have a dual Athlon 2000+ XP MP system. It's crashing very frequently and
>looks to be getting worse. It seems to crash less with 2.4.19pre10-ac2
>which supports the 760 bus and 744x IDE controller, but with something that
>is as intermittent as this, who can tell?

I found three of us trying to use Athlon dual as a linux server so far.

>Machine specs:
>
> ASUS A7M266-D motherboard, 1006 BIOS rev.
> 2GB ECC registered memory
> 4 x 15K RPM Seagate UltraSCSI drives

So far the same.

> 2 x 2960 (AIC7892 rev 2) controllers
> 2 x 3C59x 3Com ethernet controllers
> !USB card to free up IRQs (removed later)
> 400W power supply

I also have a 400 W supply.

> 240W power supply driving two of the hard drives + CD ROM
> Budget vid card
> 2 drives partitioned 30%/70%, 30% mirrored together for boot, 70% mirrored
>in RAID 0+1 with other drives

I got intermittent memory-related errors while running LaTeX on a
1000-page book manuscript.
When it ran without errors, the speed of this machine was most impressive:

Athlon dual 1.6GHz : 16 sec
P4 Xeon dual 1.7GHz : 52 sec
P III single at 700 MHz : 35 sec

No error log in /var/log/messages. Very rarely, but certainly, the
Athlon dual crashed with a ReiserFS error (horrible!! I lost the file
for a chapter and recovered it only from a backup). Initially, I
thought that it was a kernel problem in the recent kernels. In the
end, I realized that the kernel version really does not matter. I
finally checked the memory modules one by one using memtest86 with ECC
disabled in the CMOS setup. No problem. I then installed all four
memory modules and ran memtest86 again. Alas!!!! It found memory errors.
I did the test three times. All three times it stopped in the
middle with slightly different error messages, and each time I
reinstalled the memory modules just to be sure about the contact.

That was when I returned the motherboard the second time.
The first time, the board gave me a CMOS error. The third time, the
board did not even give me the first beep. I now think that I should
have bought the Tyan board instead of the ASUS.

Regards,

G. Hugh Song