2006-09-04 22:09:31

by Pavel Machek

Subject: x60 - spontaneous thermal shutdown

Hi!

x60 shut down after quite a while of uptime, in period of quite heavy
load:

Sep 4 23:33:01 amd kernel: ACPI: Critical trip point
Sep 4 23:33:01 amd kernel: Critical temperature reached (128 C), shutting down.
Sep 4 23:33:01 amd shutdown[32585]: shutting down for system halt
Sep 4 23:34:42 amd init: Switching to runlevel: 0

I do not think cpu reached 128C, as I still have my machine... Did
anyone else see that?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


2006-09-04 22:26:18

by Andreas Mohr

Subject: Re: x60 - spontaneous thermal shutdown

Hi,

On Mon, Sep 04, 2006 at 11:40:59PM +0200, Pavel Machek wrote:
> Hi!
>
> x60 shut down after quite a while of uptime, in period of quite heavy
> load:
>
> Sep 4 23:33:01 amd kernel: ACPI: Critical trip point
> Sep 4 23:33:01 amd kernel: Critical temperature reached (128 C), shutting down.
> Sep 4 23:33:01 amd shutdown[32585]: shutting down for system halt
> Sep 4 23:34:42 amd init: Switching to runlevel: 0
>
> I do not think cpu reached 128C, as I still have my machine... Did
> anyone else see that?

Could this be in any way related to the (in)famous Random Shutdown issues
on a little too many Apple MacBooks?
(since the x60 incidentally just happens to be Core Duo architecture, too)

Those Random Shutdown issues at least in several cases appear to happen
due to trouble with the temperature sensor or mainboard issues.
Thermal management is in quite some trouble there, judging from
the rather diverse aspects of machine shutdown failure...
(fan not working, CPU overheating, NOT overheating but shutting down
directly after boot, ...)

There's nothing like rushing out immature hardware to unsuspecting consumers...

Andreas Mohr

2006-09-04 22:36:15

by Pavel Machek

Subject: Re: x60 - spontaneous thermal shutdown

Hi!

> > x60 shut down after quite a while of uptime, in period of quite heavy
> > load:
> >
> > Sep 4 23:33:01 amd kernel: ACPI: Critical trip point
> > Sep 4 23:33:01 amd kernel: Critical temperature reached (128 C), shutting down.
> > Sep 4 23:33:01 amd shutdown[32585]: shutting down for system halt
> > Sep 4 23:34:42 amd init: Switching to runlevel: 0
> >
> > I do not think cpu reached 128C, as I still have my machine... Did
> > anyone else see that?
>
> Could this be in any way related to the (in)famous Random Shutdown issues
> on a little too many Apple MacBooks?
> (since the x60 incidentally just happens to be Core Duo
> architecture, too)

Well, but those macbooks were really overheating, no? This seems like
sensor failure, because I do not think cpu had 128 Celsius, without
going through 100 Celsius, first.

> Those Random Shutdown issues at least in several cases appear to happen
> due to trouble with the temperature sensor or mainboard issues.
> Thermal management is in quite some trouble there, judging from
> the rather diverse aspects of machine shutdown failure...
> (fan not working, CPU overheating, NOT overheating but shutting down
> directly after boot, ...)

I had fan working at the time of shutdown, and machine was able to
boot immediately afterwards. That means that 128 celsius was sensor
error.
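
The plausibility argument above (a real overheat should pass through
intermediate temperatures first) can be sketched as a rate-of-change check.
This is only an illustration: the function name, the 5 C/s threshold, and
the sample readings are made up for the example, not taken from the x60 or
from any kernel code.

```python
def find_suspect_readings(samples, max_rate_c_per_s=5.0):
    """Return indices of readings that rise implausibly fast.

    samples: list of (time_in_seconds, temperature_in_celsius) tuples,
             ordered by time.
    max_rate_c_per_s: assumed upper bound on how fast the sensor value
             can really climb (hypothetical threshold).
    """
    suspects = []
    for i in range(1, len(samples)):
        t_prev, c_prev = samples[i - 1]
        t_cur, c_cur = samples[i]
        dt = t_cur - t_prev
        if dt <= 0:
            # Skip out-of-order or duplicate timestamps.
            continue
        if (c_cur - c_prev) / dt > max_rate_c_per_s:
            suspects.append(i)
    return suspects


# Hypothetical readings: a steady ~72 C with one sudden 128 C spike.
samples = [(0, 71), (1, 72), (2, 128), (3, 73)]
print(find_suspect_readings(samples))
```

A reading flagged this way is a candidate sensor or communication glitch
rather than a genuine thermal event, which is what the shutdown here
looks like.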


--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-09-04 22:53:08

by Andreas Mohr

Subject: Re: x60 - spontaneous thermal shutdown

Hi,

On Tue, Sep 05, 2006 at 12:35:20AM +0200, Pavel Machek wrote:
> Well, but those macbooks were really overheating, no? This seems like
> sensor failure, because I do not think cpu had 128 Celsius, without
> going through 100 Celsius, first.

No, in several cases it was a problem with a broken/damaged sensor cable.
But it seems there are a number of different problems with thermal
management, a non-working sensor cable being only one of those (albeit a
significant one).
Googling "MacBook random shutdown" will turn up tons of information.

> I had fan working at the time of shutdown, and machine was able to
> boot immediately afterwards. That means that 128 celsius was sensor
> error.

Let's hope people get that braindamage resolved, either via BIOS updates
(hmm, but probably not helpful in case of ACPI?) or by shipping working
hardware and repairing machines properly (an astonishing number of people
have already had their 2nd or 3rd unsuccessful repair).

Andreas Mohr

2006-09-05 06:12:20

by Jan Engelhardt

Subject: Re: x60 - spontaneous thermal shutdown


>> > Sep 4 23:33:01 amd kernel: ACPI: Critical trip point
>> > Sep 4 23:33:01 amd kernel: Critical temperature reached (128 C), shutting down.
>> > Sep 4 23:33:01 amd shutdown[32585]: shutting down for system halt
>> > Sep 4 23:34:42 amd init: Switching to runlevel: 0
>> >
>> > I do not think cpu reached 128C, as I still have my machine... Did
>> > anyone else see that?
>>
>> Could this be in any way related to the (in)famous Random Shutdown issues
>> on a little too many Apple MacBooks?
>> (since the x60 incidentally just happens to be Core Duo
>> architecture, too)
>
>Well, but those macbooks were really overheating, no? This seems like
>sensor failure, because I do not think cpu had 128 Celsius, without
>going through 100 Celsius, first.
>
>I had fan working at the time of shutdown, and machine was able to
>boot immediately afterwards. That means that 128 celsius was sensor
>error.

If it had been near 128 C for some time, the plastic case the mainboard is
housed in would have been extremely hot, and one would probably have burned
one's fingers.


Jan Engelhardt

2006-09-11 09:46:26

by Stefan Seyfried

Subject: Re: x60 - spontaneous thermal shutdown

On Mon, Sep 04, 2006 at 11:40:59PM +0200, Pavel Machek wrote:
> Hi!
>
> x60 shut down after quite a while of uptime, in period of quite heavy
> load:
>
> Sep 4 23:33:01 amd kernel: ACPI: Critical trip point
> Sep 4 23:33:01 amd kernel: Critical temperature reached (128 C), shutting down.
> Sep 4 23:33:01 amd shutdown[32585]: shutting down for system halt
> Sep 4 23:34:42 amd init: Switching to runlevel: 0
>
> I do not think cpu reached 128C, as I still have my machine... Did
> anyone else see that?

my usual suspect: use ec_intr=0. I have seen this rather often on HP machines.
I attributed it to "communication problems with embedded controller" and
ec_intr=0 seemed to help somehow. But then, this was some kernel versions
ago and I have not encountered it recently.
--
Stefan Seyfried \ "I didn't want to write for pay. I
QA / R&D Team Mobile Devices \ wanted to be paid for what I write."
SUSE LINUX Products GmbH, Nürnberg \ -- Leonard Cohen

2006-09-11 14:10:25

by Rafael J. Wysocki

Subject: Re: x60 - spontaneous thermal shutdown

On Monday, 11 September 2006 11:46, Stefan Seyfried wrote:
> On Mon, Sep 04, 2006 at 11:40:59PM +0200, Pavel Machek wrote:
> > Hi!
> >
> > x60 shut down after quite a while of uptime, in period of quite heavy
> > load:
> >
> > Sep 4 23:33:01 amd kernel: ACPI: Critical trip point
> > Sep 4 23:33:01 amd kernel: Critical temperature reached (128 C), shutting down.
> > Sep 4 23:33:01 amd shutdown[32585]: shutting down for system halt
> > Sep 4 23:34:42 amd init: Switching to runlevel: 0
> >
> > I do not think cpu reached 128C, as I still have my machine... Did
> > anyone else see that?
>
> my usual suspect: use ec_intr=0.

Is this a kernel command line parameter?

I'm having some suspend/resume related problems on HPC 6325 now, and they
seem to be related to the embedded controller.

Greetings,
Rafael


--
You never change things by fighting the existing reality.
R. Buckminster Fuller

2006-09-11 15:13:22

by Stefan Seyfried

Subject: Re: x60 - spontaneous thermal shutdown

On Mon, Sep 11, 2006 at 04:10:36PM +0200, Rafael J. Wysocki wrote:
> On Monday, 11 September 2006 11:46, Stefan Seyfried wrote:
> > On Mon, Sep 04, 2006 at 11:40:59PM +0200, Pavel Machek wrote:
> > > Hi!
> > >
> > > x60 shut down after quite a while of uptime, in period of quite heavy
> > > load:
> > >
> > > Sep 4 23:33:01 amd kernel: ACPI: Critical trip point
> > > Sep 4 23:33:01 amd kernel: Critical temperature reached (128 C), shutting down.
> > > Sep 4 23:33:01 amd shutdown[32585]: shutting down for system halt
> > > Sep 4 23:34:42 amd init: Switching to runlevel: 0
> > >
> > > I do not think cpu reached 128C, as I still have my machine... Did
> > > anyone else see that?
> >
> > my usual suspect: use ec_intr=0.
>
> Is this a kernel command line parameter?

yes.

seife@susi:~> dmesg | grep "^ACPI: EC"
ACPI: EC polling mode.
seife@susi:~> cat /proc/cmdline
root=/dev/hda5 vga=0x317 sysrq=yes resume=/dev/hda1 splash=silent showopts ec_intr=0

with ec_intr=1 (default), you'll get "ACPI: EC interrupt mode."
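
The same check can be scripted. A minimal sketch, assuming only the
behaviour described in this mail (ec_intr=0 selects polling mode, anything
else or no parameter means interrupt mode); the function name is made up
for illustration and is not a kernel or distribution interface:

```python
def ec_mode_from_cmdline(cmdline):
    """Guess the ACPI EC mode from a kernel command line string.

    Assumes the semantics described in this thread: ec_intr=0 means
    polling mode, while ec_intr=1 or a missing parameter (the default)
    means interrupt mode.
    """
    mode = "interrupt"  # default when ec_intr is not given
    for token in cmdline.split():
        if token.startswith("ec_intr="):
            value = token.split("=", 1)[1]
            mode = "polling" if value == "0" else "interrupt"
    return mode


# Command line copied from the /proc/cmdline output above.
cmdline = ("root=/dev/hda5 vga=0x317 sysrq=yes resume=/dev/hda1 "
           "splash=silent showopts ec_intr=0")
print(ec_mode_from_cmdline(cmdline))
```

On a running system the string would come from reading /proc/cmdline, and
the dmesg line remains the authoritative answer, since it reports what the
kernel actually did.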

> I'm having some suspend/resume related problems on HPC 6325 now, and they
> seem to be related to the embedded controller.

Well, polling mode is always on my "things to try"-list for those unspecified
ACPI failures :-)
--
Stefan Seyfried
QA / R&D Team Mobile Devices | "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg | "Well, surrounding them's out."