2009-11-03 10:57:19

by Pavel Machek

Subject: 2.6.32-rc5: unexpected thermal shutdown?

Hi!

I found this in the syslog afterwards.

Nov 3 09:59:14 amd kernel: Critical temperature reached (128 C), shutting down.
Nov 3 09:59:14 amd shutdown[17819]: shutting down for system halt
Nov 3 09:59:15 amd init: Switching to runlevel: 0

Now, the machine was lying on the bed at that point, so... maybe 32-rc5
consumes more power now, and maybe I was just unlucky...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(Czech, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


2009-11-03 17:25:01

by Frans Pop

Subject: Re: 2.6.32-rc5: unexpected thermal shutdown?

Pavel Machek wrote:
> I found this in the syslog afterwards.
>
> Nov 3 09:59:14 amd kernel: Critical temperature reached (128 C), shutting down.
> Nov 3 09:59:14 amd shutdown[17819]: shutting down for system halt
> Nov 3 09:59:15 amd init: Switching to runlevel: 0

Looks like what happened to me earlier this year. See
http://bugzilla.kernel.org/show_bug.cgi?id=13918.

> Now, the machine was lying on the bed at that point, so...

So probably both the fan intake and hot air outlet were blocked,
effectively preventing cooling.

What hardware is this?
What's the output of 'grep . /proc/acpi/thermal_zone/TZ*/*'?
Any thermal zones in there that don't have a "passive" trip point?
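To answer that last question mechanically, here is a minimal sketch, assuming the legacy /proc/acpi/thermal_zone layout used in this thread (one trip_points file per zone, with lines starting "critical", "passive", etc.). The function name zones_without_passive is hypothetical; the base directory is a parameter so it can be pointed at a copy of the files.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every ACPI thermal zone whose
# trip_points file contains no "passive" entry.
# Assumes the legacy procfs layout: <base>/<ZONE>/trip_points
zones_without_passive() {
    base=${1:-/proc/acpi/thermal_zone}
    for f in "$base"/*/trip_points; do
        # Skip unmatched globs and unreadable files.
        [ -r "$f" ] || continue
        if ! grep -q '^passive' "$f"; then
            # The zone name is the directory that holds trip_points.
            basename "$(dirname "$f")"
        fi
    done
}
```

After sourcing it, running zones_without_passive with no arguments on the X60 output shown later in this thread would be expected to list only THM0.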

Cheers,
FJP

2009-11-03 18:37:43

by Pavel Machek

Subject: Re: 2.6.32-rc5: unexpected thermal shutdown?

On Tue 2009-11-03 18:25:03, Frans Pop wrote:
> Pavel Machek wrote:
> > I found this in the syslog afterwards.
> >
> > Nov 3 09:59:14 amd kernel: Critical temperature reached (128 C), shutting down.
> > Nov 3 09:59:14 amd shutdown[17819]: shutting down for system halt
> > Nov 3 09:59:15 amd init: Switching to runlevel: 0
>
> Looks like what happened to me earlier this year. See
> http://bugzilla.kernel.org/show_bug.cgi?id=13918.

Will take a look.

> > Now, the machine was lying on the bed at that point, so...
>
> So probably both the fan intake and hot air outlet were blocked,
> effectively preventing cooling.

Well, no. The outlet was not blocked; parts of the intake maybe.

> What hardware is this?

Thinkpad x60.

> What's the output of 'grep . /proc/acpi/thermal_zone/TZ*/*'?
> Any thermal zones in there that don't have a "passive" trip point?

128 C means the temperature sensor is "slightly fake". It seems to just
report 128 in THM0 when the temperature exceeds some other limit.

pavel@amd:~$ grep . /proc/acpi/thermal_zone/*/*
/proc/acpi/thermal_zone/THM0/cooling_mode:<setting not supported>
/proc/acpi/thermal_zone/THM0/polling_frequency:<polling disabled>
/proc/acpi/thermal_zone/THM0/state:state: ok
/proc/acpi/thermal_zone/THM0/temperature:temperature: 58 C
/proc/acpi/thermal_zone/THM0/trip_points:critical (S5): 127 C
/proc/acpi/thermal_zone/THM1/cooling_mode:<setting not supported>
/proc/acpi/thermal_zone/THM1/polling_frequency:<polling disabled>
/proc/acpi/thermal_zone/THM1/state:state: ok
/proc/acpi/thermal_zone/THM1/temperature:temperature: 59 C
/proc/acpi/thermal_zone/THM1/trip_points:critical (S5): 97 C
/proc/acpi/thermal_zone/THM1/trip_points:passive: 93 C: tc1=5 tc2=4 tsp=600 devices=CPU0 CPU1
pavel@amd:~$

Pavel

2009-11-03 19:12:52

by Frans Pop

Subject: Re: 2.6.32-rc5: unexpected thermal shutdown?

On Tuesday 03 November 2009, you wrote:
> pavel@amd:~$ grep . /proc/acpi/thermal_zone/*/*
> /proc/acpi/thermal_zone/THM0/cooling_mode:<setting not supported>
> /proc/acpi/thermal_zone/THM0/polling_frequency:<polling disabled>
> /proc/acpi/thermal_zone/THM0/state:state: ok
> /proc/acpi/thermal_zone/THM0/temperature:temperature: 58 C
> /proc/acpi/thermal_zone/THM0/trip_points:critical (S5): 127 C

Right, so this zone does not have a passive trip point. If it reaches
critical temp before THM1, it would cause exactly what you saw.

Try recreating the same situation while watching the temp for both thermal
zones.

The solution would be to force a passive cooling point for this zone, for
example with the following (X is either 0 or 1, depending on the order in
which the zones are defined in the BIOS):
echo 95000 > /sys/class/thermal/thermal_zoneX/passive

This should work with current kernels, but Andrew has a patch set from me
for 2.6.33 with some improvements: http://lkml.org/lkml/2009/10/26/41.
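To work out whether X is 0 or 1, one option is to compare each sysfs zone's trip points with the /proc output: the zone whose only trip point is critical at 127000 should correspond to THM0. A minimal sketch, assuming the standard sysfs thermal attributes trip_point_N_type, trip_point_N_temp and passive, all in millidegrees Celsius (which is why the echo above writes 95000 for 95 C). The helper names show_trip_points and set_passive_trip are hypothetical, and writing to sysfs needs root.

```shell
#!/bin/sh
# Hypothetical helper: print every sysfs thermal zone with its trip points
# as type=temperature pairs (temperatures in millidegrees Celsius).
show_trip_points() {
    base=${1:-/sys/class/thermal}
    for zone in "$base"/thermal_zone*; do
        [ -d "$zone" ] || continue
        printf '%s:' "$(basename "$zone")"
        for t in "$zone"/trip_point_*_type; do
            [ -r "$t" ] || continue
            # .../trip_point_0_type -> .../trip_point_0_temp
            temp_file=${t%_type}_temp
            printf ' %s=%s' "$(cat "$t")" "$(cat "$temp_file")"
        done
        printf '\n'
    done
}

# Hypothetical helper: set the passive trip point of thermal_zone<zone>,
# given in whole degrees Celsius (converted to millidegrees for sysfs).
set_passive_trip() {
    zone=$1
    celsius=$2
    base=${3:-/sys/class/thermal}
    echo $((celsius * 1000)) > "$base/thermal_zone$zone/passive"
}
```

Once the right zone is identified, set_passive_trip 0 95 (as root) is equivalent to the echo 95000 command for X=0.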

> 128 C means the temperature sensor is "slightly fake". It seems to just
> report 128 in THM0 when the temperature exceeds some other limit.

Hmm. If THM0 does not have *any* other values between 58 and 128 then the
above will probably not work. If it makes a few jumps, you should adjust
the trip value in my example accordingly. On my HP2510p, the two zones that
have no passive trip point in the BIOS luckily do have a "real" sensor.
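One way to find out which values THM0 actually reports between 58 and 128 is to sample its temperature file for a while and list the distinct readings. A minimal sketch, assuming the "temperature: 58 C" line format shown above; the function name sample_distinct_temps and its defaults (60 samples, 1 second apart) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read an ACPI temperature file <count> times,
# <interval> seconds apart, and print the distinct readings (degrees C),
# sorted numerically.
sample_distinct_temps() {
    file=$1
    count=${2:-60}
    interval=${3:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        # "temperature:   58 C" -> second whitespace-separated field
        awk '{ print $2 }' "$file"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done | sort -n | uniq
}
```

For example, sample_distinct_temps /proc/acpi/thermal_zone/THM0/temperature 600 1 would watch THM0 for about ten minutes; running a second copy on THM1 at the same time covers both zones.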

Cheers,
FJP

2009-11-03 19:17:42

by Pavel Machek

Subject: Re: 2.6.32-rc5: unexpected thermal shutdown?

On Tue 2009-11-03 20:12:52, Frans Pop wrote:
> On Tuesday 03 November 2009, you wrote:
> > pavel@amd:~$ grep . /proc/acpi/thermal_zone/*/*
> > /proc/acpi/thermal_zone/THM0/cooling_mode:<setting not supported>
> > /proc/acpi/thermal_zone/THM0/polling_frequency:<polling disabled>
> > /proc/acpi/thermal_zone/THM0/state:state: ok
> > /proc/acpi/thermal_zone/THM0/temperature:temperature: 58 C
> > /proc/acpi/thermal_zone/THM0/trip_points:critical (S5): 127 C
>
> Right, so this zone does not have a passive trip point. If it reaches
> critical temp before THM1, it would cause exactly what you saw.
>
> Try recreating the same situation while watching the temp for both thermal
> zones.

Well, I'll try _not_ to recreate this situation :-). If it starts
happening too often, I'll play with passive limits. But... thanks!

> > 128 C means the temperature sensor is "slightly fake". It seems to just
> > report 128 in THM0 when the temperature exceeds some other limit.
>
> Hmm. If THM0 does not have *any* other values between 58 and 128 then the
> above will probably not work. If it makes a few jumps, you should adjust

It has some values between 58 and 128, but I'm not sure how many.

Pavel