2003-07-30 21:55:40

by Ian S. Nelson

Subject: Dell 2650 Dual Xeon freezing up frequently

I'm running a RedHat 2.4.20 kernel on some 2650's, all dual Xeon
(Pentium 4, jacksonized so it looks like 4 processors). 2 have 1GB of
RAM and 1 has 2GB of RAM. They all wedge, sometimes after a few
minutes, sometimes after hours.

I hooked up a serial console to capture a kernel panic or something else
that would be fun to debug, no such luck. It just locks up. No nothing.


I'm looking at the 2.4.21 change logs and I'm not seeing anything that
sounds like it would fix this: a couple of possible SMP issues, but nothing
that identifies Pentium 4 Xeon problems.
I've added one networking module but the problem happens without it
being loaded, so my crap doesn't smell bad, yet ;-)

I'm spinning stuff on it in uniprocessor mode at the moment, seeing if
that fixes anything.

any free clues?


thanks,
Ian



2003-07-30 22:13:05

by Matt Domsch

Subject: Re: Dell 2650 Dual Xeon freezing up frequently

On Wed, 2003-07-30 at 16:52, Ian S. Nelson wrote:
> I'm running a RedHat 2.4.20 kernel on some 2650's, all dual Xeon
> (Pentium 4, jacksonized so it looks like 4 processors). 2 have 1GB of
> RAM and 1 has 2GB of RAM. They all wedge, sometimes after a few
> minutes, sometimes after hours.

Make sure you have an up-to-date tg3 network driver. There was a brief
period of time around when 2.4.20 was released where it didn't have all
the hardware quirk workarounds in it. Something in the current 2.4.x
tree should work fine.
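
One way to see which driver and version a NIC is actually bound to is
`ethtool -i`. The following is only a rough sketch, assuming ethtool is
installed and that the interface is called eth0 (both are assumptions,
not something stated in this thread); the helper names are made up for
illustration.

```python
import subprocess


def parse_ethtool_info(text):
    """Turn the 'key: value' lines printed by `ethtool -i` into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as a PCI
        # bus address (which contains colons) stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def driver_info(interface="eth0"):
    """Run `ethtool -i` on one interface; 'eth0' is an assumed name."""
    result = subprocess.run(
        ["ethtool", "-i", interface],
        capture_output=True, text=True, check=True,
    )
    return parse_ethtool_info(result.stdout)
```

Calling `driver_info()` on the affected box and looking at the "driver"
and "version" entries should show whether tg3 is in use and which
revision is loaded.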

Thanks,
Matt

--
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions http://www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com

2003-07-30 22:50:58

by Mikael Pettersson

Subject: Re: Dell 2650 Dual Xeon freezing up frequently

On Wed, 30 Jul 2003 15:52:58 -0600, Ian S. Nelson wrote:
>I'm running a RedHat 2.4.20 kernel on some 2650's, all dual Xeon
>(Pentium 4, jacksonized so it looks like 4 processors). 2 have 1GB of
>RAM and 1 has 2GB of RAM. They all wedge, sometimes after a few
>minutes, sometimes after hours.
>
>I hooked up a serial console to capture a kernel panic or something else
>that would be fun to debug, no such luck. It just locks up. No nothing.

Welcome to the club. Our PE2650 used to hang hard anywhere from a few
days to two weeks after each reboot. This started after our RH7.3 to
RH8.0 upgrade and may be related to problems with the tg3 NIC driver.
(RH7.3 was stable, but then I think we used a bcm<something> driver.)

After having these problems from January to March or April, with a
series of RH upgrade kernels, I accidentally found that enabling
the I/O-APIC nmi_watchdog solved all problems. (I can only speculate
that somehow the regular I/O-APIC NMIs prevent some hang somewhere.)
It's now running stable as a rock since April.
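
For anyone wanting to try the same thing: the I/O-APIC watchdog is
normally selected with the nmi_watchdog=1 boot parameter, and when it is
active the NMI line in /proc/interrupts should keep counting up. Below
is a small sketch for checking that, assuming the usual "NMI:" row
layout; the function names and the 5 second wait are arbitrary choices
for illustration.

```python
import time


def nmi_counts(text):
    """Return the per-CPU NMI counters from /proc/interrupts-style text."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            # Keep only the numeric columns; some kernels append a
            # textual description after the counters.
            return [int(f) for f in fields[1:] if f.isdigit()]
    return []


def watchdog_is_ticking(path="/proc/interrupts", wait_seconds=5):
    """Sample the NMI counters twice and report whether any CPU advanced."""
    with open(path) as f:
        before = nmi_counts(f.read())
    time.sleep(wait_seconds)
    with open(path) as f:
        after = nmi_counts(f.read())
    return bool(before) and any(b > a for a, b in zip(before, after))
```

If `watchdog_is_ticking()` comes back False, the counters did not move
during the wait, which suggests the watchdog is not enabled.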

/Mikael

2003-07-30 23:02:19

by Alan

Subject: Re: Dell 2650 Dual Xeon freezing up frequently

On Wed, 2003-07-30 at 22:52, Ian S. Nelson wrote:
> I'm running a RedHat 2.4.20 kernel on some 2650's, all dual Xeon
> (Pentium 4, jacksonized so it looks like 4 processors). 2 have 1GB of
> RAM and 1 has 2GB of RAM. They all wedge, sometimes after a few
> minutes, sometimes after hours.

With tg3 networking? If so, make sure you either have a current errata or
switch to the Broadcom-provided driver; that may help.

2003-08-04 05:14:30

by James Bourne

Subject: Re: Dell 2650 Dual Xeon freezing up frequently

On Wed, 30 Jul 2003, Ian S. Nelson wrote:

> I'm running a RedHat 2.4.20 kernel on some 2650's, all dual Xeon
> (Pentium 4, jacksonized so it looks like 4 processors). 2 have 1GB of
> RAM and 1 has 2GB of RAM. They all wedge, sometimes after a few
> minutes, sometimes after hours.
>
> I hooked up a serial console to capture a kernel panic or something else
> that would be fun to debug, no such luck. It just locks up. No nothing.
>
>
> I'm looking at the 2.4.21 change logs and I'm not seeing anything that
> sounds like it would fix this: a couple of possible SMP issues, but nothing
> that identifies Pentium 4 Xeon problems.
> I've added one networking module but the problem happens without it
> being loaded, so my crap doesn't smell bad, yet ;-)
>
> I'm spinning stuff on it in uniprocessor mode at the moment, seeing if
> that fixes anything.

Try replacing the tg3 driver with the one found in newer kernels (2.4.22pre
or 2.4.21) or make sure you are using the latest RH kernel with the updated
tg3 driver. Do not use the bcm5700.o driver BTW, it has problems.

Another problem could be the aacraid controller, but they normally have a
lot of noise associated with a hang. Unfortunately it's unclear at this
time if that is a hardware problem, firmware problem, or driver problem.

Regards
James Bourne

>
> any free clues?
>
>
> thanks,
> Ian
>
>

--
James Bourne | Email: [email protected]
Unix Systems Administrator | WWW: http://www.hardrock.org
Custom Unix Programming | Linux: The choice of a GNU generation
----------------------------------------------------------------------
"All you need's an occasional kick in the philosophy." Frank Herbert