2005-10-25 16:45:07

by Jack Howarth

Subject: W2100Z Critical temperature explained

Has anyone else run into the following problem with the 2.6.12
or 2.6.13 kernels on a Sun W2100Z dual opteron workstation? I found
that the Fedora Core 4 kernel 2.6.12-1.1456_FC4smp was causing random
shutdowns with error messages that 'Critical Temperature was reached: 68 C'.
This was occurring repeatedly under that kernel. After switching to the
latest FC4 kernel, 2.6.13-1.1532_FC4smp, these temperature events seemed
to have been eliminated for now.
However in calling the Sun Java Desktop support group, I was told
that the earlier BIOS versions on the W2100Z had bugs that can cause the
errors I was seeing as well as causing the CPU fans to self-destruct.
The fix is apparently to upgrade the BIOS to the current one on their
Supplemental 2.1 CD. Is there some site this sort of information should
be added to? Perhaps Linux would be well served if a list of
motherboard BIOS versions were kept, with notes added regarding compatibility with
various Linux kernels. Certainly in cases like these where destruction
can occur due to the bugs in the firmware, this merits being passed
along to the Linux kernel users.
Jack



2005-10-25 18:13:11

by Alan

Subject: Re: W2100Z Critical temperature explained

On Tue, 2005-10-25 at 12:43 -0400, Jack Howarth wrote:
> various Linux kernels. Certainly in cases like these where destruction
> can occur due to the bugs in the firmware, this merits being passed
> along to the Linux kernel users.

If that is indeed the case and as serious as you describe then I'm sure
Sun will be contacting all their customers urgently to advise them of
the flawed hardware as they would with faulty PSU's or other items.

Trying to track which of the billion broken PCs in the world work on a
given day with a given card combination is something that's almost
computationally infeasible.

Alan

2005-10-25 23:04:24

by Jack Howarth

Subject: Re: W2100Z Critical temperature explained

Alan,
Actually I found a discussion of the issue that I believe I am seeing
at...

http://supportforum.sun.com/hardware/index.php?t=msg&goto=18308&rid=6746&SQ=d7bff636081bc7374f3e861f6672e008

There may be more than one cause, but it seems clear that the earlier
BIOS is less tolerant of fans as they start to wear. The newer BIOS
probes the fans several times. Hence the user who had to put in a new fan
so he could stay booted long enough to flash the new BIOS, after which the
old fan was usable again. Ugh.
Jack

2005-11-10 16:28:03

by Kjartan Maraas

Subject: Re: W2100Z Critical temperature explained

On Tue, 2005-10-25 at 19:42 +0100, Alan Cox wrote:
> On Maw, 2005-10-25 at 12:43 -0400, Jack Howarth wrote:
> > various Linux kernels. Certainly in cases like these where destruction
> > can occur due to the bugs in the firmware, this merits being passed
> > along to the Linux kernel users.
>
> If that is indeed the case and as serious as you describe then I'm sure
> Sun will be contacting all their customers urgently to advise them of
> the flawed hardware as they would with faulty PSU's or other items.
>
> Trying to track which of the billion broken PCs in the world work on a
> given day with a given card combination is something that's almost
> computationally infeasible.
>
FWIW I've seen this from time to time on my HP/Compaq nc4010 laptop as
well, but there's been no updated BIOS for this one in close to a
year...

Cheers
Kjartan